explainer
What is a Data Lake?
A data lake is a raw, unfiltered central repository of data used by businesses to keep all possible information for later analysis.
explainer
What is a Data Lake Engine?
A data lake engine is an application or service which queries and/or processes the vast sets of data living inside data lake storage.
explainer
What is Data Lineage?
Data lineage refers to the lifecycle of data: its origins and where it goes. The ability to track and monitor these data sources can improve the data flow process.
explainer
Data Lake vs. Data Warehouse
Data lakes and data warehouses are both widely used (often together), but they are not the same. Understanding the differences, and how each can help your business, can empower your business intelligence.
webinar
Eliminate Data Transfer Bottlenecks with Apache Arrow Flight
Join us as we explore how Apache Arrow Flight solves data transfer bottlenecks by providing a new and modern standard for transporting large data between networked applications. We’ll even run a live bake-off to demonstrate how Arrow Flight enables more than 10x faster transfer rates for highly parallel systems compared to pyodbc.
webinar
Query Engine as Code with HashiCorp
HashiCorp's Terraform is the world's most frequently used tool for infrastructure provisioning using Infrastructure as Code (IaC). Watch this webinar to see how easily complex applications like Dremio can be generated automatically and reproducibly using Terraform. This automates and holistically manages the lifecycle of the required resources.
white paper
The Next-Generation Cloud Data Lake: An Open, No-Copy Data Architecture
To address data access bottlenecks and rising data warehousing costs, a next-gen cloud data lake architecture has emerged that brings together the best attributes of the data warehouse and the data lake. This new open data architecture is built to maximize data access with minimal data movement and no data copies.
keynote
Data Lakes Drive Decisions: A Virtual Fireside Chat
Mai-Lan Tomsen Bukovec, Global Vice President at AWS, discusses emerging trends in data lakes and how they are powering the next generation of applications.
keynote
When People Are the Problem - Obstacles to a Data-Driven Culture
Billy Bosworth, CEO at Dremio, discusses a career’s worth of insights into why some companies get it right and others get it wrong because of common obstacles.
technical talk
The 2021 State of Data Operations: Emerging Challenges in Expanding Cloud Data Ecosystems
This talk covers the current and future state of data access governance and sensitive data analytics, and how to avoid challenges when creating a data governance strategy.
technical talk
Migrating to Parquet - The Veraset Story
Veraset is a data-as-a-service (DaaS) company that delivers PBs of geospatial data to customers across a variety of industries.
technical talk
Implementing a Data Mesh Architecture at JPMC
This session from JPMC discusses JPMC’s data lake via data mesh architecture and the wholesale credit risk use case for data lake via data mesh.
technical talk
Iceberg at Adobe: Challenges, Lessons & Achievements
We were on our second iteration of Adobe Experience Platform's (AEP) data lake when Apache Iceberg first came up on our radar.
technical talk
High-Performance Big Data Analytics Processing Using Hardware Acceleration
This talk will give an introduction to FPGAs and discuss their advantages and challenges in the context of big data analytics.
technical talk
Enabling Real-Time Analytics for Data Lakes with Apache Ignite
In this presentation, Matthew Halliday will show the audience how no-code ETL gives data teams the freedom to move fast and innovate with data.
technical talk
Effectively Cataloging Data Lakes with Amundsen and Dremio
This session will introduce the need for cataloging, an overview of Amundsen, integration with Dremio and conclude with a demo.
technical talk
Data Observability for Data Lakes: The Next Frontier of Data Engineering
This talk will introduce the concept of “data downtime” and how to eliminate it in your data lake, as well as the rest of your data ecosystem.
technical talk
Centralized Security and Governance in the Cloud
This talk will highlight new capabilities of centralized access control and how it can be used to provide robust security and governance.
technical talk
Arrow Flight and Flight SQL: Accelerating Data Movement
This talk will look at Apache Arrow Flight and how it plays a key role in accelerating the movement of data in modern data architectures.
technical talk
Analytics Engineering in Data Lakes with dbt
This talk presents how an analytics engineering toolset, dbt, is a natural fit to enable the intuitive workflows of data warehousing in the cloud data lake.
technical talk
A Git-Like Experience for Data Lakes
Project Nessie decouples transactions and makes distributed transactions real, using table format capabilities to provide Git-like semantics for data lakes.
technical talk
Power BI Best Practices for Working with Big Data
In this session you’ll learn how to choose between Import mode and DirectQuery mode for your dataset, and find out how composite models and aggregations can help.
technical talk
Plan, Design and Build a Successful Data Lake on AWS
This talk shares how to best plan, design and build a successful data lake that will scale and evolve as your business needs and demands increase.
keynote
Analytics for Everyone - Unlocking the Full Power of All Your Data
Francois Ajenstat, CPO at Tableau, joins a Virtual Fireside Chat session on key market learnings, the analytics market and analytics trends in the years to come.
keynote
The New Data Tier
Tomer Shiran, Founder and CPO at Dremio, presents the evolution of cloud data lakes and the separation of compute and data.
technical talk
Serverless Cloud Data Lake with Spark for Serving Weather Data
This session presents TWC’s architecture with serverless cloud data lake on top of Apache Spark and how that enables highly elastic and economic data serving.
technical talk
Scaling Data Access and Governance on Data Lakes: Challenges and Common Approaches
This talk discusses the challenges that teams face when trying to secure their data lake access, as well as common approaches and trade-offs.
technical talk
Introducing InfluxDB IOx, a Federated In-Memory Columnar Store Backed by Object Storage
Paul Dix, CTO at InfluxData, introduces InfluxDB IOx, the future open source core of the InfluxDB time series database.
technical talk
High Frequency Small Files vs. Slow Moving Datasets
In this presentation, we will review the design of Flux, its place in AEP's data lake, the challenges we had in operationalizing it and the final results.
technical talk
GOing Native with Arrow Flight and Dremio
This session describes how and why FactSet pursued native Golang connectors to Dremio, as well as a native Golang Apache Arrow Flight server and client implementation.
technical talk
From Discovering Data to Trusting Data
Mark Grover, Amundsen creator, presents an overview of Amundsen and details of both automated and curated metadata to show trusted and non-trusted data in Amundsen.
technical talk
Flexible Data Lake Architectures for Seamless Real-time Data and Machine Learning Integrations
This talk will go through various design problems spawned from integrations and solutions used at GFT and showcase those use cases in data lake architecture.
technical talk
Designing Performant, Scalable, and Secure Data Lakes
Rukmani Gopalan, PM at Microsoft, presents the do's and don'ts of building enterprise data lakes, including patterns, pipeline, organization and security.
technical talk
Deep Dive into Iceberg SQL Extensions
This talk will focus on the Iceberg SQL extensions, a recent development in the Iceberg community to efficiently manage tables through SQL.
technical talk
Data Lineage with Apache Airflow
This talk demonstrates how metadata management with Marquez helps maintain inter-DAG dependencies, catalog historical runs, and minimize data quality issues.
technical talk
Apache Iceberg: What's New
Ryan Blue, Software Engineer, Data Platform at Netflix, presents the latest features and updates on Apache Iceberg technology.
technical talk
5 Lessons Learnt from Building Context Aware Smart Data Lake
This session features five lessons learned from building a context-aware smart data lake.
webinar
Analyze Your Entire Cloud Data Lake in Real Time
Join technical experts from Tableau and Dremio as they discuss how to enable fast access to more complete data and accelerate query performance. They’ll demonstrate how you can easily connect Tableau to your data lake with Dremio to immediately begin driving better business decisions.
webinar
5 Big Data Predictions for 2021
Watch this webinar as Tomer Shiran, Dremio CPO and co-founder, discusses the five major trends he predicts will emerge in the new year that will make modern cloud data lakes the center of gravity for data architectures.
white paper
Building a Modern Architecture for Interactive Analytics on Amazon S3
Amazon S3 cloud object stores provide an ideal platform for data lake storage. However, to meet the performance requirements of data consumers, data teams need to extract subsets of data from the data lake and replicate it in a data warehouse. This extra step adds cost, slows time to insight and undermines the very benefits that data lakes were meant to achieve.
white paper
A Modern Cloud Data Architecture for Financial Services
This solution brief explains how, by adopting a modern cloud data architecture, financial services organizations can drive monumental change, dramatically improving time to insight to unlock new business models, grow revenue streams and deepen customer relationships.
webinar
Enable High-Concurrency, Low-Latency BI on a Cloud Data Lake to Shrink Your Data Warehouse Cost
Watch this webinar where we’ll explore how innovative, new Dremio features enable high-concurrency, low-latency BI queries directly on Amazon S3 and Azure Data Lake Storage.
webinar
A Modern Architecture for Interactive Analytics on AWS Data Lakes
Built upon cost-efficient cloud object stores such as Amazon S3, cloud data lakes benefit from an open and loosely-coupled architecture that minimizes the risk of vendor lock-in as well as the risk of being locked out of future innovation.
case study
DATEV Software Improves Product Development Using Dremio for Self-Service Access to Software Usage Data
Dremio provides DATEV product managers with self-service access to software usage data to help them understand and improve their software and customer experience.
white paper
Ultimate Guide to the Cloud Data Lake Engine
This guide describes how to evaluate cloud data lake engine offerings based on their ability to deliver on their promise of improving performance, data accessibility and operational efficiency as compared with earlier methods of querying the data lake.
case study
NewDay Transitions from Disparate Legacy Systems to a Unified Data Platform Running the Dremio Data Lake Engine
NewDay is transforming its data infrastructure from legacy systems to a cloud-based, secure PCI DSS-compliant data platform on AWS, leveraging the Dremio data lake engine for self-service data access.
white paper
Top Considerations for Building an Open Cloud Data Lake
In this paper, we explore the top considerations for building a cloud data lake including architectural principles, when to use cloud data lake engines and how to empower non-technical users.
ebook
Migrating from On-Prem to a Modern Cloud Data Lake: The Cost-Efficiency and Performance Benefits
Cloud data lakes certainly offer many advantages over their on-prem counterparts. But simply residing in the cloud does not make a data lake “modern.” Learn more about Dremio’s unique approach to powering modern cloud data lakes.
case study
Dremio Drives Increased Revenue by Improving Supply Chain Analytics for Global Consumer Products Company
A global consumer products company is using Dremio to improve supply chain analytics—increasing the efficiency of data access, analytics and insights about supply chain and consumer demand.
case study
InCrowd Sports Uses Dremio to Provide Clients with Deeper Insights to Create Personalized Fan Experiences
InCrowd Sports uses Dremio to create a single 360° customer view across multiple data sources so they can offer their sports clients deeper insights and personalized fan experiences.
case study
NCR Uses Dremio to Deliver Business Insights at a Faster Clip
To meet the demand for faster data insights, increase analytic capacity and contain costs, NCR chose Dremio to help modernize its data analytics infrastructure.
case study
Digital Insurance Agency AP Intego Enhances Service and Revenue by Using Dremio to Gain 360° Customer View
AP Intego is improving customer service and revenue by using the Dremio data lake engine to integrate customer data from multiple sources into a single 360° customer view.
case study
Dremio Data Lake Engine Accelerates Insights and Drives Efficiency in Henkel’s Supply Chain
Henkel uses the Dremio data lake engine to join data silos, accelerate business insights and save millions through productivity improvements in their global supply chain.
webinar
Webinar Series: Cloud Data Lake Query Engine Showdown
In this webinar series, Serge demonstrates why Dremio successfully strikes the right query performance and cost-efficiency balance to win in head-to-head comparisons with specific distros of Presto.
technical talk
Smart Data Lakes for Predictive and Prescriptive Analytics
Kumar Madalli, VP of Search, Data Science and Big Data Platforms @ Telenav, presents "Smart Data Lakes for Predictive and Prescriptive Analytics – Telenav's Journey" at Subsurface Summer 2020.
technical talk
Reducing Time to Market with S7 Airlines Self Service Data Platform
Areg Azaryan, Enterprise Data Platform Product Owner @ S7 Airlines, presents "Reducing Time to Market with S7 Airlines Self Service Data Platform" at Subsurface Summer 2020 -- the first-ever cloud data lake conference.
technical talk
Lessons Learned From Running Apache Iceberg at Petabyte Scale
Anton Okolnychyi, Apache Iceberg PMC Member and Apache Spark Contributor, presents "Lessons Learned From Running Apache Iceberg at Petabyte Scale" at Subsurface Summer 2020 -- the first-ever cloud data lake conference.
technical talk
Lessons Learned from Operating an Exabyte Scale Data Lake at Microsoft
Raji Easwaran, Group Program Manager @ Microsoft Azure, presents "Lessons Learned from Operating an Exabyte Scale Data Lake at Microsoft" at Subsurface Summer 2020 -- the first-ever cloud data lake conference.
technical talk
How to Build an IoT Data Lake
Tim Doernemann, Senior Lead Software Engineer and Michael Cammert, Senior Manager @ Software AG present "How to Build an IoT Data Lake" at Subsurface Summer 2020 -- the first-ever cloud data lake conference.
technical talk
Hiveberg: Integrating Apache Iceberg with the Hive Metastore
Adrian Woodhead, Principal Data Engineer, and Christine Mathiesen, Software Engineer @ Expedia present "Hiveberg: Integrating Apache Iceberg with the Hive Metastore" at Subsurface Summer 2020 -- the first-ever cloud data lake conference.
technical talk
Functional Data Engineering - A Set of Best Practices
Maxime Beauchemin, CEO and Founder, Preset, presents "Functional Data Engineering - A Set of Best Practices" at Subsurface Summer 2020 -- the first-ever cloud data lake conference.
technical talk
Extracting Value from Data Assets
Yannis Katsanos, Head of Customer Data Science @ Exelon Utilities, presents "Extracting Value from Data Assets" at Subsurface Summer 2020 -- the first-ever cloud data lake conference.
technical talk
Data Lineage and Observability with Marquez
Julien Le Dem, co-founder and CTO, Datakin, presents "Data Lineage and Observability with Marquez" at Subsurface Summer 2020 -- the first-ever cloud data lake conference.
technical talk
Building an Efficient Data Pipeline for Data Intensive Workloads
Ryan Murray, OSS Developer @ Dremio, presents "Building an Efficient Data Pipeline for Data Intensive Workloads" at Subsurface Summer 2020 -- the first-ever cloud data lake conference.
technical talk
AWS Data Lake Architectures
Raghu Prabhu, Global Manager, Data Lakes @ AWS, presents "AWS Data Lake Architectures" at Subsurface Summer 2020 -- the first-ever cloud data lake conference.
technical talk
Apache Arrow: A New Gold Standard for Dataset Transport
Wes McKinney, Director @ Ursa Labs, presents "Apache Arrow: A New Gold Standard for Dataset Transport" at Subsurface Summer 2020 -- the first-ever cloud data lake conference.
keynote
Welcome with Billy Bosworth
Billy Bosworth starts the show with his welcome address at Subsurface Summer 2020 -- the first-ever cloud data lake conference.
keynote
The Future of Intelligent Storage in Big Data
Daniel Weeks, Big Data Compute Team Lead @ Netflix, presents the keynote session "The Future of Intelligent Storage in Big Data" at Subsurface Summer 2020 -- the first-ever cloud data lake conference.
keynote
The Future Is Open - The Rise of the Cloud Data Lake
Tomer Shiran, co-founder and CPO @ Dremio, presents the opening keynote session "The Future Is Open - The Rise of the Cloud Data Lake" at Subsurface Summer 2020 -- the first-ever cloud data lake conference.
keynote
Five Data Trends You Should Know
Tomasz Tunguz, Managing Director @ Redpoint Ventures, presents the closing keynote "Five Data Trends You Should Know" at Subsurface Summer 2020 -- the first-ever cloud data lake conference.
keynote
Closing Session with Billy Bosworth
Billy Bosworth presents the closing session at Subsurface Summer 2020 -- the first-ever cloud data lake conference.
webinar
Dremio vs. Presto Benchmarks - Top 3 Performance and Cost Comparisons That Matter Most
In this webinar, you will see proof that Dremio is the fastest and most cost-effective cloud data lake query engine available.
benchmark
Dremio vs. Presto – Performance and Efficiency Benchmark
In this paper, you will see comprehensive and detailed query performance and EC2 compute efficiency benchmark results and analysis for recent versions of Dremio and Presto. You’ll read about the differences between Dremio and multiple flavors of Presto: PrestoDB, PrestoSQL, Starburst Presto and AWS Athena.
white paper
Dremio vs. Presto – Performance and Efficiency Benchmark
In this paper, you will see comprehensive and detailed query performance and EC2 compute efficiency benchmark results and analysis for recent versions of Dremio and Presto. You’ll read about the differences between Dremio and multiple flavors of Presto: PrestoDB, PrestoSQL, Starburst Presto and AWS Athena.
explainer
What is AWS Glue?
AWS Glue is a fully managed extract, transform and load (ETL) service that automates the time-consuming data preparation process for subsequent data analysis.
white paper
Reducing the Cost of Cloud Data Analytics
In this paper we look at three popular architecture choices for cloud data analytics, then describe how Dremio can help you accelerate projects and productivity at a fraction of the cost of cloud data warehouses and simple query engines.
white paper
Dremio Architecture Guide
Wondering what Dremio is and how it works? Download the Dremio Architecture Guide to understand Dremio in depth.
white paper
The Rise of the Cloud Data Lake Engine: Architecting for Real-Time SQL Queries
Thanks to the advancement of cloud object storage, new processors and elastic compute, the cloud data lake has matured. It now supports many workloads, including both data science and business intelligence (BI), traditionally owned by the enterprise data warehouse (EDW). However, it faces some final growing pains when it comes to speed, complexity and efficiency.
white paper
Ensuring an Open Data Lake Future
In this paper, we outline several factors that should be considered to ensure an open data lake architecture, avoid vendor lock-in and reduce the risk of being locked out of future industry innovation.
webinar
The Rise of the Cloud Data Lake Engine: Architecting for Real-Time Queries
Join Eckerson Group and Dremio as we discuss the components of a cloud data lake engine, as well as common benefits, adoption trends and use cases.
white paper
Dremio Semantic Layer - Best Practices for Efficient and Productive Analytics
Download this whitepaper to learn how to use Dremio to architect a self-service semantic layer as well as its best practices and endless customization options.
webinar
Building an Efficient Data Architecture for Maximum Productivity
Are your data engineering teams spending weeks or even months on tedious ETL, OLAP cubes and BI extracts in order to provision data sets with enough performance for your BI and data science stakeholders?
webinar
Dremio Grundlagen II (Webinar in German Language)
In this webinar, we present the architecture of Dremio, based on its core components: Apache Arrow and our semantic layer.
webinar
Optimize Cloud Data Lake for Query Performance and Economics
Introducing Dremio AWS Edition, the lightning-fast data lake engine with a service-like experience and unparalleled resource efficiency.
webinar
Best Practices for Building a Fast and Reliable IoT Data Pipeline
Learn how to seamlessly migrate your organizational data from an on-premise data lake to the cloud—and more quickly enjoy all of the resulting benefits.
ebook
11 Best Practices for Migrating to a Cloud Data Lake
Learn how to seamlessly migrate your organizational data from an on-premise data lake to the cloud—and more quickly enjoy all of the resulting benefits.
webinar
Building a Best-In-Class Data Lake
In this webinar, we will explore the advantages of cloud data lakes, the major challenges when building a data lake on the cloud, and what a modern data architecture looks like and how to best implement it.
webinar
Dremio Grundlagen (Webinar in German Language)
In this webinar, we would like to present our view of modern data analytics in 2020 and also give some insight into Dremio as a company.
webinar
How a Self-Service Semantic Layer for Your Data Lake Saves You Money
In this webinar, we discuss common challenges with semantic layers and how to overcome them, how a semantic layer reduces pipeline complexity, and best practices to successfully implement a semantic layer on the data lake.
webinar
5 Reasons Why You Are Still Asking Your Big Data Small Questions
We surveyed hundreds of data consumers, data architects, and data executives to understand and get a clear picture of where cloud and data lake modernization is, where it is headed, and why it still represents a challenge for many.
ebook
3 Steps for Making High-Performance BI Work Directly with Cloud Data Lake Storage
Now you can enjoy all of the benefits cloud data lake storage has to offer by empowering your cloud data lake to support the majority of your analytics workloads.
webinar
How Dremio and Tableau enable cloud data lake analytics at InCrowd Sports
In this webinar, we explore how to accelerate query performance for BI and data science workloads, visualize data directly on the data lake, secure and govern the data lake, leverage multiple data sources and bring them to life with Tableau, and create a self-service semantic layer on the data lake.
webinar
Top 5 Data Industry Predictions for 2020: What Should You Expect?
In this webinar, we walk through our top five data industry predictions for 2020.
webinar
Enabling Self-Service Interactive Analytics on All of Your Data
In this webinar, Jason Hughes discusses how Dremio enables self-service interactive analytics.
webinar
Using a Data Lake Engine to Create a Scalable and Lightning Fast Data Pipeline
Learn how Dremio, the data lake engine, can help build a scalable and lightning fast data pipeline.
webinar
Dremio 4.0 – Technical Deep Dive
A deep dive into the new features of Dremio 4.0.
webinar
Creating a Cloud Data Lake for a $1 Trillion Organization
The Dremio team sits down to discuss exactly how a trillion dollar organization can build a data lake.
webinar
Dremio 3.3 – Technical Deep Dive
A deep dive into the new features of Dremio 3.3.
explainer
What is a Cloud Data Lake?
A cloud data lake is a cloud-hosted centralized repository that allows you to store all your structured and unstructured data at any scale, typically using an object store such as Amazon S3 or Microsoft Azure Data Lake Storage (ADLS).
explainer
Azure Storage Types and Use Cases
Within Azure there are two types of storage accounts, four types of storage, four levels of data redundancy and three tiers for storing files. We will focus on exploring each one of these options in detail to help you understand which offering adapts better to your big data storage needs.
webinar
Running SQL-Based Workloads in the Cloud Using Apache Arrow
At Strata 2019, Jacques Nadeau discussed how cloud-based SQL workloads can benefit from Apache Arrow.
webinar
Data Reflections: Accelerate your Queries Without Copies
In order to make raw data available for business users or data scientists to consume, companies often develop complex ETL pipelines in which data is copied many times between systems. These can be hard to maintain and prone to breakage.
webinar
Dremio 3.2 - Technical Deep Dive
A deep dive into the new features of Dremio 3.2.
webinar
Interactive Data Science and BI on the Hadoop Data Lake
Our VP of Marketing Kelly Stirman discusses how Dremio lets users build data science and BI workflows on their Hadoop data lake.
webinar
Running SQL Based Workloads in The Cloud at 20x - 200x Lower Cost Using Apache Arrow
Jacques Nadeau discusses the impacts of Apache Arrow and Gandiva when running SQL workloads in the cloud.
explainer
What is Apache Iceberg?
Apache Iceberg is a new table format that is rapidly becoming an industry standard for managing data in data lakes.
explainer
What is Azure Data Lake Storage (ADLS)?
Azure Data Lake Storage (ADLS) is a fully-managed, elastic, scalable, and secure file system that supports HDFS semantics and works with the Hadoop ecosystem.
webinar
Data Science Across Data Sources With Apache Arrow
Jacques Nadeau.
webinar
Dremio 3.1 - Technical Deep Dive
A deep dive into the new features of Dremio 3.1.
explainer
Starting Apache Arrow
Our CTO Jacques Nadeau sat down for a fireside chat with Wes McKinney, discussing the past, present, and future of Apache Arrow.
webinar
Dremio 3.0 - Technical Deep Dive
A deep dive into the new features of Dremio 3.0.
webinar
Dremio 2.1 - Technical Deep Dive
A deep dive into the new features of Dremio 2.1.
webinar
Conquering Slow, Dirty and Distributed Data with Apache Arrow and Dremio
At the 2018 Data Science Summit, CEO Tomer Shiran spoke about Dremio and Apache Arrow, outlining how projects like Pandas are utilizing Arrow to achieve high performance data processing and interoperability across systems.
webinar
Using LLVM to Accelerate Processing of Data in Apache Arrow
Dremio CEO Tomer Shiran was a guest on the Software Engineering Daily podcast to talk about how Dremio works and who it benefits.
explainer
Origin and History of Apache Arrow
A background and overview of the Apache Arrow project from the PMC Chair, Jacques Nadeau.
explainer
What is Apache Arrow?
Apache Arrow is an open source project, initiated by over a dozen open source communities, which provides a standard columnar in-memory data representation and processing framework. Arrow has emerged as a popular way to handle in-memory data for analytical purposes.
story
The 2018 Bee: A Visual Story
A visualized preview of the 2018 Scripps National Spelling Bee.
webinar
Vectorized Query Processing With Apache Arrow
Dremio software engineer Siddharth Teotia provides an overview of Apache Arrow and breaks down the benefits of vectorized query processing at our May 2018 Apache Arrow SF meetup.
webinar
Apache Arrow Integration with Spark
IBM software engineer Bryan Cutler provides an overview of Apache Arrow integration with Spark.
webinar
Apache Arrow In Theory, In Practice
An in-depth walkthrough of Apache Arrow with Dremio CTO Jacques Nadeau from the May 2018 Apache Arrow SF meetup.
webinar
Evolve 2018: Pitch Your Tech in 5 Minutes
Dremio's CMO Kelly Stirman pitches the benefits of Dremio at Evolve 2018.
story
Serena The Great: A Visual Story
A visual story of the most dominant player in tennis.
webinar
Making Big Data Self-Service for Users
Kelly Stirman sits down with Truth in IT to discuss how Dremio can help make big data self-service for its users.
webinar
Dremio 2.0 - Technical Deep Dive
A deep dive into the new features of Dremio 2.0.
webinar
Fast and Flexible: Interactive BI Arrives
CMO Kelly Stirman provides an overview of data lake challenges and how users can navigate the growing complexity of self-service data with help from Dremio.
webinar
Data Access for Data Science
CTO Jacques Nadeau spoke at the 2018 AnacondaCON, detailing how Apache Arrow and Dremio enable users to access and analyze data across disparate data sources.
webinar
Making BI Work with a Data Lake
CMO Kelly Stirman discusses how to incorporate Business Intelligence into Data Lakes, and then provides some real world examples using Dremio.
white paper
Dremio Security Architecture Guide
Wondering what Dremio’s security features are and how they work? Download the Dremio Security Architecture Guide to understand Dremio in depth.
white paper
Dremio Data Reflections Overview & Best Practices
Data Reflections are Dremio’s patented ability to accelerate queries on data from any source and any size. Download this white paper to learn about how Data Reflections work and best practices for designing and managing their use.
white paper
6 Self-Service Analytics Obstacles Data Teams Should Not Ignore
Read this paper to learn about the reasons progress toward self-service analytics has regressed and what data engineers and architects can do about it.
story
The Winter Olympics: An Interactive Visualization
A visual story of the Winter Games.
webinar
Making Data Fast and Easy to Use with Data Reflections
Tomer Shiran discusses data reflections and how they can help speed up data access and analysis.
webinar
How New Companies Can Contribute to Open Source
Jacques Nadeau talks with Swapnil Bhartiya, founder of TFiR, about the ways new companies can contribute to Open Source.
webinar
Vectorized Query Processing Using Apache Arrow
Dremio software engineer Siddharth Teotia provides an overview of Apache Arrow and breaks down the benefits of vectorized query processing.
webinar
Building an Analytics Stack on AWS with Dremio
CEO Justin Bock of Bock Corporation sits down with Dremio's Kelly Stirman to discuss building an analytics stack on AWS using Dremio.
webinar
Jacques Nadeau Discusses Dremio and Big Data with theCube
CTO and Co-Founder Jacques Nadeau sits down with theCube to discuss Dremio's role in the future of Big Data.
story
Trump's Greatest Tweets: A Sentiment Analysis
A sentiment analysis of Trump's tweets.
webinar
The Heterogeneous Data Lake
A webinar by Tomer Shiran about the rise of Heterogeneous Data.
webinar
The Columnar Roadmap: Apache Parquet and Apache Arrow
A presentation by Julien Le Dem about the Columnar Roadmap using Apache Parquet and Apache Arrow.
webinar
Simplifying and Accelerating Data Access for Python
Sudheesh Katkam discusses how you can use Python with Dremio to simplify and accelerate access to several different data sources together.
webinar
Using Arrow, Calcite and Parquet to build a Relational Cache
Jacques Nadeau talks about how layering in-memory caching, columnar storage and relational caching can combine to provide a substantial improvement in overall data science and analytical workloads.
webinar
Improving Python and Spark Performance and Interoperability with Apache Arrow
A talk with Julien Le Dem and Ji Lin about using Apache Arrow to improve the performance of Apache Spark and Python while scaling up data processing.
explainer
What are ETL Tools?
ETL stands for extract, transform and load. ETL tools move data between systems. If ETL were for people instead of data, it would be public and private transportation. Companies use ETL to safely and reliably move their data from one system to another.
explainer
What is Data Engineering?
Data Engineering helps make data more useful and accessible for consumers of data, sourcing and preparing it for data scientists.
explainer
What is a Data Warehouse?
At its simplest, a data warehouse is a system used for storing and reporting on data. The data typically originates in multiple systems, then it is moved into the data warehouse for long-term storage and analysis.
explainer
What is a Data Pipeline?
A data pipeline is a series of steps or actions (typically automated) to move and combine data from various sources for analysis or visualization.
webinar
Arrow C++ Roadmap and pandas2
Arrow C++ roadmap and Pandas2 talk from Wes McKinney, Arrow committer and creator of Python Pandas.
webinar
Apache Arrow: In Theory, In Practice
A talk through Apache Arrow with Dremio CTO Jacques Nadeau.
webinar
Dremio 1.3 - Technical Deep Dive
A deep dive into the new features of Dremio 1.3.
webinar
New Options for Moving Analytics to the Cloud
A Dremio webinar that explores new options for moving your analytics to the cloud.
webinar
Intro to Self-Service Data With Dremio
A webinar that introduces new users to self-service data with Dremio.
story
Big Data Debt Calculator
Calculate the debt that your company has accrued from big data.
webinar
What are Dremio and Apache Arrow?
CTO and co-founder Jacques Nadeau sits down with Datameer to discuss the launch of Apache Arrow and the future of Dremio.