Alex Merced is a developer advocate for Dremio and has worked as a developer and instructor for companies like GenEd Systems, Crossfield Digital, CampusGuard and General Assembly. Alex is passionate about technology and has published tech content through blogs, videos and his podcasts Datanation and Web Dev 101. He has also contributed a variety of libraries in the JavaScript and Python worlds, including SencilloDB, CoquitoJS, dremio-simple-query and more.
We’re always looking for ways to better handle and save money on our data. That’s why the “data lakehouse” is becoming so popular. It offers a mix of the flexibility of data lakes and the ease of use and performance of data warehouses. The goal? Make data handling easier and cheaper. So, how do we […]
In the age of data-centric applications, storing, accessing, and managing data can significantly influence an organization’s ability to derive value from a data lakehouse. At the heart of this conversation are data lakehouse table formats, which are metadata layers that allow tools to interact with data lake storage like a traditional database. But why do […]
In the ever-evolving data landscape, the need for robust and scalable data storage solutions is growing exponentially. The essence of data-driven decisions lies in the capability to harness vast amounts of structured and unstructured data from various sources, process them, and prepare them for analysis. In this realm, the concept of the “lakehouse” has emerged […]
Transcript Note: This transcript was created using speech recognition software. While it has been reviewed by human transcribers, it may contain errors. Opening Alex Merced: Hey, everybody. This is Alex Merced, developer/advocate here at Dremio, and your host here every week on Gnarly Data Waves. This week, I will be presenting about ETL, ELT, and […]
Unlock the potential of data engineering in our "ELT, ETL & the Dremio Data Lakehouse" webinar! Discover how Dremio's no-copy architecture revolutionizes ETL & ELT patterns, optimizing data processing and cutting costs.
The Dremio Data Lakehouse has emerged as a game-changing solution in data analytics, combining the best of data lakes and data warehouses into a unified architecture. With its versatile capabilities, Dremio opens up a world of possibilities for organizations across various use cases in the realm of either modernizing or upgrading their current data systems, […]
In the era of big data and analytics, the quality of data plays a critical role in making informed decisions and extracting meaningful insights. However, ensuring data quality can be complex, requiring thorough checks and validations. In this blog article, we explore 10 essential data quality checks using three powerful tools: SQL, Pandas, and Polars. […]
Flink is a supercharged tool for processing data in real time or in batches. It’s open source and has a unified programming model, so you can build some serious data processing pipelines. But here’s where things get interesting. When you bring Apache Iceberg and Project Nessie into the mix, Flink becomes even more awesome. Iceberg is […]
Apache Iceberg is an open table format that enables robust, affordable, and quick analytics on the data lakehouse and is poised to change the data industry in ways we can only begin to imagine. Check out our Apache Iceberg 101 course to learn all the nuts and bolts about Iceberg. By storing your data in […]
Apache Iceberg is a data lakehouse table format that has been taking the data world by storm with robust support from tools like Dremio, Fivetran, Airbyte, AWS, Snowflake, Tabular, Presto, Apache Flink, Apache Spark, Trino, and so many more. Although one of the tools most data professionals use is Apache Spark and many introductory tutorials […]
Data engineering is an essential part of data science and analytics, as it involves transforming raw data into a usable form. With the rapid advancement of generative AI, it is becoming increasingly important for data engineers to know its capabilities and potential implications. Generative AI is a type of artificial intelligence (AI) used to create […]
Apache Iceberg is a data lake table format that is quickly growing its adoption across the data space. If you want to become more familiar with Apache Iceberg, check out this Apache Iceberg 101 article with everything you need to go from zero to hero. If you are a data engineer, data analyst, or data […]
Storage, compute, and regulatory costs can really add up when it comes to working with and managing your data. In traditional proprietary data warehouses, you must store your data in proprietary formats, organized in proprietary catalogs, to be queried with a proprietary engine. The result is vendor lock-in, which over time allows vendors to price […]
What’s a Table Format? One of the significant trends in data architecture is the idea of the data lakehouse, which combines the benefits of the data lake and the data warehouse, as exemplified by the following image: The centerpiece of this architecture is the table format, a metadata layer on top of your data lake […]
Apache Iceberg is an open table format that enables robust, affordable, and quick analytics on the data lakehouse and is poised to change the data industry in ways we can only begin to imagine. Check out our Apache Iceberg 101 course to learn all the nuts and bolts about Iceberg. The bottom line: Converting your […]
The Apache Iceberg format has taken the data lakehouse world by storm, becoming the keystone pillar of many firms’ data infrastructure. This article shows you how you can connect your Apache Iceberg tables to tools like Tableau so you can generate BI dashboards directly from your Iceberg tables without the need for cubes or extracts. […]
In today’s modern data lakes, you work with a separation of data and metadata with open table formats like Apache Iceberg giving you vastly improved query performance, the ability to time-travel, evolve your table’s partitions/schema, and much more. Open table formats rely on metadata catalogs to track where the metadata lives so engines can access […]