Deliver Lightning-Fast, Self-Service Analytics
Maximize the power of your data with Dremio—the data lake engine. Dremio operationalizes your data lake storage and speeds your analytics processes with a high-performance and high-efficiency query engine while also democratizing data access for data scientists and analysts via a governed self-service layer. The result is fast, easy data analytics for data consumers at the lowest cost per query for IT and data lake owners.
Separate your data and compute
Leverage Dremio to run your live, interactive queries directly against your petabyte-scale data in your own data lake storage, avoiding data copies, movement and lock-in altogether. Dremio goes beyond separation of storage and compute to separate data and compute with an open, best-of-breed architecture in which any compute engine can work with your data. Your data stays in its existing systems and formats, on-prem or within your AWS or Azure account, and under your control. Use Dremio alongside hundreds of other technologies that also work with data lake storage, including ETL services, data science tools and compute engines.
Accelerate with an Apache Arrow-based query engine—and save >90%
Harness multi-stage acceleration to drive lightning-fast queries directly on your data lake storage. Dremio’s combination of patent-pending technologies speeds queries by up to 100x. This means that for any given performance level, you’ll only need a fraction of the cloud compute infrastructure and associated costs. And when you combine this efficiency with the additional savings from elastic engines, you’ll eliminate >90% of your AWS compute costs compared to traditional SQL engine approaches.
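As a rough illustration of how these savings could compound, the sketch below assumes that engine efficiency and elastic engines reduce compute cost multiplicatively. The 4x figure is the ad-hoc speedup quoted on this page for the Arrow-based engine, and the 60% figure is the elastic engines saving quoted on this page. The function name and the multiplicative model are assumptions made for illustration, not a published costing method.

```python
def combined_compute_savings(speedup: float, elastic_saving: float) -> float:
    """Fraction of compute cost eliminated under a simple multiplicative model.

    Assumption: a query engine that is `speedup` times faster needs 1/speedup
    of the compute, and elastic engines remove a further `elastic_saving`
    fraction of what remains.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    if not 0.0 <= elastic_saving <= 1.0:
        raise ValueError("elastic_saving must be between 0 and 1")
    remaining_after_speedup = 1.0 / speedup
    remaining = remaining_after_speedup * (1.0 - elastic_saving)
    return 1.0 - remaining


# 4x faster engine, then 60% saved by elastic engines on the remainder.
savings = combined_compute_savings(speedup=4.0, elastic_saving=0.60)
print(f"{savings:.0%}")  # prints 90%
```

Under these assumptions, a quarter of the original compute remains after the speedup, and 40% of that quarter remains after elastic scaling, leaving about 10% of the original cost.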
Powered by Apache Arrow
Drive up to 4x faster ad-hoc queries on the same compute infrastructure with Dremio’s high-performance execution engine, powered by Apache Arrow. Co-created by Dremio, Arrow is now the industry standard for columnar, in-memory analytics—and Dremio is the first and only query execution engine built from the ground up to take advantage of it.
Eliminate latency issues
Solve the challenge of low-latency read access to cloud data lake storage with massively parallel readers and Predictive Pipelining to fetch data with extreme concurrency. Add Columnar Cloud Cache (C3) to automatically cache and refresh data on local NVMe storage within AWS and Azure as it’s being accessed, enabling NVMe-level speed on data lake storage.
Turn on extreme speed
Accelerate BI dashboarding and reporting queries by another 10-100x with Data Reflections in Dremio. Reflections are physically optimized Parquet data structures that Dremio invisibly and automatically incorporates into query plans to provide maximum query speed when you need it. And that’s not all: Dremio automatically keeps Reflection data up to date behind the scenes.
Move data 1000x faster
For your data science initiatives, replace legacy ODBC and JDBC protocols with Arrow Flight, a high-speed distributed protocol designed to handle modern big data workloads. Arrow Flight provides a 1000x increase in throughput between client applications and Dremio, enough to populate a client-side Python or R data frame with millions of records in seconds.
Enjoy a service-like experience in your own AWS VPC with Dremio AWS Edition
Harness deep levels of automation, complete multi-tenancy and unparalleled cost-per-query resource efficiency with the new Dremio AWS Edition. We’ve automated the entire process of deploying, running, scaling and protecting Dremio on your AWS data lake—and we’re driving down your AWS infrastructure costs at the same time.
Elastic engines: optimize compute resource consumption
Eliminate under- and over-provisioning of compute resources, eliminate workload contention and further slash cloud costs by 60% or more with elastic engines. Configure any number of query engines, each one not only sized and tailored to the workload it supports, but equipped with elastic, on-demand scale. Engines automatically start and scale up to full size, then scale back down and stop, according to query traffic, eliminating cloud costs for idle workloads. Individual engine performance also scales linearly with additional execution nodes for a dramatic acceleration of long-running query workloads at the same cloud infrastructure cost.
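To make the start, scale and stop behavior concrete, here is a minimal simulation, assuming an engine that starts when queries arrive, stays up for a short idle grace period, and then stops. The function name, the hourly traffic profile, the node count and the one-hour grace period are all hypothetical values chosen for illustration; they are not Dremio defaults.

```python
from typing import List


def billed_node_hours(hourly_queries: List[int], nodes: int,
                      idle_grace_hours: int = 1) -> int:
    """Node-hours billed for an engine that starts on demand and stops when idle."""
    running = False
    idle = 0
    billed = 0
    for queries in hourly_queries:
        if queries > 0:
            running = True   # query traffic starts (or keeps) the engine
            idle = 0
        elif running:
            idle += 1
            if idle > idle_grace_hours:
                running = False  # idle too long: stop and stop paying
        if running:
            billed += nodes
    return billed


# Hypothetical 12-hour traffic profile for one workload on a 4-node engine.
traffic = [0, 0, 5, 8, 0, 0, 0, 3, 0, 0, 0, 0]
elastic = billed_node_hours(traffic, nodes=4)
always_on = len(traffic) * 4
print(elastic, always_on)  # prints 20 48
```

In this made-up profile, the elastic engine is billed for 20 node-hours instead of 48, because idle hours outside the grace period cost nothing. Real savings depend entirely on your own traffic pattern.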
Parallel projects: complete automation, full multi-tenancy
Enable data engineers and data analysts to deploy an optimized Dremio instance from scratch, start querying their data in minutes and effortlessly stay current with the latest Dremio features. Parallel projects provide a service-like experience in your AWS account, with end-to-end lifecycle automation, best practice configurations and upgrades delivered with a simple restart. Parallel projects are fully isolated, multi-tenant Dremio instances, which enable business unit independence and facilitate compliance.
Deliver self-service with IT governance
Dremio establishes views into data (called virtual datasets) in a semantic layer on top of your physical data, so data analysts and engineers can manage, curate and share data—while maintaining governance and security—but without the overhead and complexity of copying data. Connect any BI or data science tool, including Tableau, Power BI, Looker and Jupyter Notebooks, to Dremio and start exploring and mining your data lake for value.
Dremio’s semantic layer is fully virtual, indexed and searchable. The relationships among your data sources, virtual datasets, transformations and queries are all maintained in Dremio’s data graph, so you know exactly where each virtual dataset came from. Role-based access control makes sure that everyone has access to exactly what they need (and nothing else), and SSO enables a seamless authentication experience.
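The idea of a virtual dataset, a curated view over physical data with no copy behind it, can be sketched with an ordinary SQL view. The example below uses Python's built-in SQLite module purely as a stand-in; it is not Dremio SQL, and the table, view and column names are hypothetical.

```python
import sqlite3

# In-memory database standing in for physical data in the lake.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_orders (order_id INTEGER, region TEXT, amount REAL, card_number TEXT);
INSERT INTO raw_orders VALUES
    (1, 'EMEA', 120.0, '4111-0000'),
    (2, 'APAC',  80.0, '4111-0001'),
    (3, 'EMEA',  45.5, '4111-0002');

-- A curated view: filters rows and hides the sensitive column.
CREATE VIEW emea_orders AS
    SELECT order_id, amount FROM raw_orders WHERE region = 'EMEA';
""")

rows = conn.execute(
    "SELECT order_id, amount FROM emea_orders ORDER BY order_id"
).fetchall()
print(rows)  # prints [(1, 120.0), (3, 45.5)]

# The view stores no rows of its own, so new physical data appears immediately.
conn.execute("INSERT INTO raw_orders VALUES (4, 'EMEA', 10.0, '4111-0003')")
count = conn.execute("SELECT COUNT(*) FROM emea_orders").fetchone()[0]
print(count)  # prints 3
```

Analysts who query the view see only the curated columns, while the underlying data stays in one place.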
Optionally join data from other sources with your data lake
Where necessary, you can connect to and join with external data sources for advanced analytics use cases. There is no need to move the data: Dremio provides optimized data access to each external data source, which slashes your time to value and lets you migrate data and analytics workloads to your data lake storage at your own pace.
Harness flexible deployment options across clouds
Run Dremio on AWS, Azure and/or on-premises, and get a consistent, modern and self-service experience. Network permitting, you can even query data across disparate regions or clouds in a hybrid fashion. Better still, use Dremio to accelerate your cloud journey by migrating your semantic layer from your on-prem Dremio instance to your cloud Dremio instance.