March 8, 2022

How Dremio Sonar and Arctic Bring the Lakehouse to Life

Mark Lyons · Vice President of Product Management, Dremio

Until now, data lakes have been too difficult for most companies. Despite significant innovation in query engines and table formats to bring data warehouse performance and functionality to the data lake, companies are stuck figuring out how to build and maintain data lakes instead of deriving value from data.

That’s why we built Dremio Cloud, the world’s first free, fully managed data lakehouse platform that makes creating insights from data easier than ever. (If you haven’t done so, I highly suggest you check out the blog from our founder, Tomer, that discusses the vision and motivation behind the Dremio Cloud platform.)

In this blog, we’ll take a closer look at Dremio Cloud’s key services: Dremio Sonar, a lakehouse query engine, and Dremio Arctic, an intelligent metastore for Apache Iceberg that provides a unique Git-like experience for the lakehouse. These services bring new capabilities to the data lakehouse, beyond what was previously possible in both data lakes and warehouses, so we’ll take some time to discuss the key features and technologies that make this possible.

Dremio Sonar: A Lakehouse Query Engine

Sonar is a lakehouse query engine that provides lightning-fast SQL queries directly on data lakes and a self-service user experience that makes data consumable, consistent, and collaborative.

Simply put, Sonar helps organizations access more data freely so they can make better business decisions. Sonar does this by combining a best-in-class query engine with a seamless, self-service user experience for data consumers. Here’s a quick rundown of the key technologies that make this possible:

  • Query Engine (powered by Apache Arrow): Sonar’s query engine delivers all the performance and functionality of a data warehouse directly on the data lake, including DML operations. The query engine is built to support all SQL workloads on the lakehouse, from ad hoc and exploratory queries to mission-critical BI dashboards. You can also connect to a variety of RDBMSs, so analysts can join data between the lake and other data sources, all with ANSI SQL compatibility.
  • Reflections: A query acceleration technology that speeds up queries behind the scenes, so data applications and analysts can interact seamlessly with data without needing to worry about optimizing their data and queries. Reflections enable sub-second query response times by automatically and transparently rewriting query plans to utilize different aggregations or layouts of tables and views.
  • Spaces: An integrated semantic layer that enables data teams to deliver a consistent and secure view of data to data consumers, and enables analysts to curate, analyze, and share datasets in a self-service manner. Spaces enable datasets across lakes and other sources to be exposed as reusable metrics.
  • SQL Runner and SQL Profiler: Sonar provides a best-in-class integrated experience for analysts who know and love SQL, including a feature-rich IDE (SQL Runner) and the world’s easiest and most advanced tool for understanding and troubleshooting query performance (SQL Profiler). 
  • Arrow Flight and Arrow Flight SQL: A next-generation interface for interacting with databases that is 20x faster than ODBC and JDBC and supports a variety of programming languages.
  • Frictionless Tool Integrations: Native connectors in leading BI and analytics tools, including Power BI, Tableau, dbt, Hex, and Preset, enable users to quickly and easily work with their data from the tools they already use.
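To make the idea behind Reflections a little more concrete, here is a minimal sketch of answering a query from a precomputed aggregation when one covers it. This is a toy illustration in plain Python, not Sonar's implementation: the `SALES` rows and the `build_aggregate` and `sum_by` helpers are hypothetical names invented for this example, and a real planner works on query plans rather than in-memory lists.

```python
from collections import defaultdict

# Hypothetical raw table: one row per sale (illustrative data only).
SALES = [
    {"region": "east", "product": "a", "amount": 10},
    {"region": "east", "product": "b", "amount": 5},
    {"region": "west", "product": "a", "amount": 7},
    {"region": "west", "product": "a", "amount": 3},
]


def build_aggregate(rows, dims, measure):
    """Precompute SUM(measure) grouped by dims (a toy stand-in for a materialization)."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row[measure]
    return {"dims": tuple(dims), "measure": measure, "totals": dict(totals)}


def sum_by(rows, group_by, measure, aggregates=()):
    """Answer SUM(measure) GROUP BY group_by.

    If a precomputed aggregate has the same measure and at least the
    requested grouping columns, roll it up instead of scanning raw rows.
    Returns (result, source) so the caller can see which path was taken.
    """
    for agg in aggregates:
        if agg["measure"] == measure and set(group_by) <= set(agg["dims"]):
            positions = [agg["dims"].index(d) for d in group_by]
            result = defaultdict(int)
            for key, total in agg["totals"].items():
                result[tuple(key[i] for i in positions)] += total
            return dict(result), "aggregate"

    # No matching aggregate: fall back to scanning the raw rows.
    result = defaultdict(int)
    for row in rows:
        result[tuple(row[d] for d in group_by)] += row[measure]
    return dict(result), "raw"


by_region_product = build_aggregate(SALES, ["region", "product"], "amount")
totals, source = sum_by(SALES, ["region"], "amount", [by_region_product])
print(source, totals)
```

The caller writes the same query either way. Only the path taken changes, which is the sense in which the acceleration is transparent to analysts.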

Sonar works for everyone, from BI teams with stringent SLAs, to developers building data applications, to citizen data analysts trying to answer new business questions. Enterprises around the world use Sonar to drive their SQL workloads, including 3 of the Fortune 5.

Dremio Arctic: An Intelligent Metastore for Apache Iceberg

Arctic is an intelligent metastore for Apache Iceberg that gives users a Git-like experience to branch, tag, and time travel across datasets, while automatically optimizing the underlying files so that analytics stay fast as users, use cases, and data volumes grow.

Simply put, Arctic helps organizations automate lakehouse operations and simplify data workflows. It brings several key capabilities to the lakehouse that innovate far beyond the legacy metastores of earlier data lakes:

  • Metastore (powered by Nessie): A metastore service that enables a Git-like experience for the lakehouse across any engine, including Sonar, Flink, Presto, and Spark. Data engineers can use Git-like branching to transform data across tables and schemas in isolation without impacting production workloads, and merge changes once they’ve been validated. In addition, teams can time travel across their entire lakehouse environment (not just a single table) to reproduce ML models or dashboards based on a specific point in time. The metastore service enables multi-statement (and multi-engine) transactions, safe and easy experimentation, and referential integrity enforcement.
  • Data Optimization: Background compute jobs that automate all the tedious bits of data management for the lakehouse, including compaction, repartitioning, and indexing. With Arctic, teams no longer need to worry about how data is physically organized in files, and any compute engine running on that data can operate more efficiently.
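For readers who think in code, the Git-like model can be sketched as a catalog in which every commit records the state of all tables at once, and branches are just named pointers to commits. The `ToyCatalog` class below is a hypothetical, in-memory illustration written for this post. It is not the Nessie or Arctic API, and its merge only handles the simple fast-forward case.

```python
class ToyCatalog:
    """Toy catalog: each commit maps every table name to a snapshot label."""

    def __init__(self):
        # Commit ids are list indexes; commit 0 is the empty root.
        self.commits = [{"parent": None, "tables": {}}]
        self.branches = {"main": 0}

    def create_branch(self, name, from_branch="main"):
        # A new branch starts at the same commit as its source branch.
        self.branches[name] = self.branches[from_branch]

    def commit(self, branch, changes):
        """Record changes to one or more tables as a single atomic commit."""
        head = self.branches[branch]
        tables = {**self.commits[head]["tables"], **changes}
        self.commits.append({"parent": head, "tables": tables})
        self.branches[branch] = len(self.commits) - 1
        return self.branches[branch]

    def _is_ancestor(self, ancestor, commit_id):
        # Walk parent links from commit_id back toward the root.
        while commit_id is not None:
            if commit_id == ancestor:
                return True
            commit_id = self.commits[commit_id]["parent"]
        return False

    def merge(self, source, target="main"):
        """Fast-forward target to source; diverged histories are rejected."""
        src, dst = self.branches[source], self.branches[target]
        if not self._is_ancestor(dst, src):
            raise ValueError("target has diverged; this sketch only fast-forwards")
        self.branches[target] = src

    def tables_at(self, ref):
        """Return the table state at a branch name or a commit id (time travel)."""
        commit_id = self.branches[ref] if isinstance(ref, str) else ref
        return dict(self.commits[commit_id]["tables"])


catalog = ToyCatalog()
first = catalog.commit("main", {"orders": "snapshot-1"})

# Transform several tables in isolation on a branch.
catalog.create_branch("etl")
catalog.commit("etl", {"orders": "snapshot-2", "customers": "snapshot-1"})
print(catalog.tables_at("main"))   # production is unchanged so far

# Publish both table changes together, then look back in time.
catalog.merge("etl")
print(catalog.tables_at("main"))
print(catalog.tables_at(first))
```

Because a commit captures every table together, merging the branch publishes the changes to `orders` and `customers` in one step, and reading an older commit id returns a consistent view of the whole catalog rather than of a single table.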

Until now, data lakes and lakehouses have been difficult to adopt because of the overhead needed to build and manage them. Arctic is a revolutionary service that not only makes lakehouses easier than ever before by automating data management tasks, but also gives data teams entirely new ways to work with data.

