Dremio Jekyll

The Cloud Data Lake Site

Data Lake Technologies

Apache Iceberg

Apache Iceberg is an open table format for large analytic datasets that enables multiple applications to work together on the same data in a transactionally consistent manner.
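
Iceberg gets this consistency by writing each change as a new immutable snapshot and committing it with an atomic swap of the table's current-metadata pointer in the catalog. A writer whose base has gone stale retries on top of the latest version. The following is a minimal in-memory sketch of that optimistic-concurrency idea under simplifying assumptions (one table, append-only commits, a lock standing in for the catalog's atomic swap). `Catalog`, `Snapshot` and `append_files` are hypothetical names, not the Iceberg API.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Immutable table version listing every data file it contains (hypothetical)."""
    snapshot_id: int
    data_files: tuple


class Catalog:
    """Toy catalog: holds the single pointer to the table's current snapshot."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.current = Snapshot(0, ())

    def compare_and_swap(self, expected: Snapshot, new: Snapshot) -> bool:
        # Move the pointer only if no other writer committed since `expected` was read.
        with self._lock:
            if self.current is not expected:
                return False
            self.current = new
            return True


def append_files(catalog: Catalog, new_files: list, max_retries: int = 5) -> Snapshot:
    """Optimistic commit: build a new snapshot on the latest base, retry on conflict."""
    for _ in range(max_retries):
        base = catalog.current
        candidate = Snapshot(base.snapshot_id + 1, base.data_files + tuple(new_files))
        if catalog.compare_and_swap(base, candidate):
            return candidate
    raise RuntimeError("gave up after repeated concurrent commits")


# Writer A reads the table, then writer B commits first.
catalog = Catalog()
stale_base = catalog.current
append_files(catalog, ["b-0001.parquet"])
# A's commit against the stale base is rejected instead of silently dropping B's file.
print(catalog.compare_and_swap(stale_base, Snapshot(1, ("a-0001.parquet",))))  # False
append_files(catalog, ["a-0001.parquet"])  # retrying on the new base keeps both files
print(catalog.current.data_files)  # ('b-0001.parquet', 'a-0001.parquet')
```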

Project Nessie

Project Nessie brings Git-like semantics (commits, branches and tags) to data lakes and enables a single transaction spanning operations from multiple users and analytics engines.
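
Conceptually, Nessie records table metadata pointers in commits, and a branch is a named pointer to a commit, much as in Git. Because one commit can update several tables, changes to all of them become visible together, and work on a branch stays invisible to `main` until it is merged. The toy model below illustrates this. `Repo`, `Commit` and `merge` are hypothetical, the metadata pointers are placeholder strings, and only fast-forward merges are handled. It is not the Nessie API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    """Immutable commit: table name -> metadata pointer, plus a parent link."""
    tables: dict
    parent: Optional["Commit"]
    message: str


class Repo:
    """Toy Git-like catalog; branches are named pointers to commits."""

    def __init__(self) -> None:
        self.branches = {"main": Commit({}, None, "initial")}

    def create_branch(self, name: str, source: str = "main") -> None:
        # Branching copies a pointer, not data.
        self.branches[name] = self.branches[source]

    def commit(self, branch: str, changes: dict, message: str) -> Commit:
        # Every table in `changes` moves in one commit: readers see all or none.
        head = self.branches[branch]
        new = Commit({**head.tables, **changes}, head, message)
        self.branches[branch] = new
        return new

    def merge(self, target: str, source: str) -> None:
        """Fast-forward `target` to `source` if target's head is an ancestor of source's."""
        node = self.branches[source]
        while node is not None:
            if node is self.branches[target]:
                self.branches[target] = self.branches[source]
                return
            node = node.parent
        raise ValueError("branches have diverged; this toy only fast-forwards")


repo = Repo()
repo.create_branch("etl")
repo.commit("etl", {"orders": "orders/v2", "customers": "customers/v5"}, "nightly load")
print(repo.branches["main"].tables)  # {}: main is untouched until the merge
repo.merge("main", "etl")
print(repo.branches["main"].tables)  # {'orders': 'orders/v2', 'customers': 'customers/v5'}
```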

Apache Arrow

Apache Arrow provides a columnar in-memory format for flat and hierarchical data. It supports zero-copy reads for fast data access without serialization overhead.
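
Arrow lays each column out as contiguous typed buffers, so an engine can scan one column without touching the others, and a slice can reference the same memory instead of copying it. The sketch below imitates that layout with the standard library's `array` and `memoryview`. It is not `pyarrow`, and it omits the null bitmaps, offset buffers and nested types that the real format defines; `column_slice` is a hypothetical helper.

```python
from array import array

# A toy record batch: each column is one contiguous, typed buffer
# rather than a list of per-row objects.
batch = {
    "id": array("q", [1, 2, 3, 4]),               # 64-bit signed integers
    "price": array("d", [9.5, 3.25, 7.0, 1.5]),   # 64-bit floats
}


def column_slice(batch: dict, name: str, start: int, stop: int) -> memoryview:
    """Zero-copy window over part of one column: no values are copied."""
    return memoryview(batch[name])[start:stop]


window = column_slice(batch, "price", 1, 3)
print(window.tolist())  # [3.25, 7.0]

# Arrow treats buffers as immutable; we write to this one only to show that
# the window shares memory with the column instead of holding a copy.
batch["price"][1] = 4.0
print(window.tolist())  # [4.0, 7.0]
```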

Apache Arrow Flight

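Apache Arrow Flight is an open source framework for high-performance transport of Arrow data over the network. Because it streams columnar record batches instead of converting data row by row, it can deliver data transfer rates up to ten times faster than ODBC, JDBC and pyodbc.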
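
Flight is built on gRPC and moves data as Arrow record batches, so column buffers can be sent and used on arrival without per-value conversion. The standard-library toy below contrasts that style with row-by-row text serialization for a single float64 column. It has no networking, makes no performance claims, and is not the Flight API (`pyarrow.flight` provides the real client and server classes); all function names are hypothetical.

```python
import struct
from array import array


def send_columnar(column: array) -> bytes:
    # Ship the column buffer as-is, prefixed with its length in values.
    if column.typecode != "d":
        raise TypeError("this toy only handles float64 ('d') columns")
    return struct.pack("<q", len(column)) + column.tobytes()


def receive_columnar(payload: bytes) -> memoryview:
    # Reinterpret the received bytes as float64 values; nothing is parsed per value.
    # Assumes sender and receiver share the same native byte order.
    (count,) = struct.unpack_from("<q", payload)
    return memoryview(payload)[8:8 + 8 * count].cast("d")


def send_row_by_row(column) -> bytes:
    # Row-oriented contrast: each value is individually converted to text...
    return "\n".join(repr(value) for value in column).encode()


def receive_row_by_row(payload: bytes) -> list:
    # ...and individually parsed back on the receiving side.
    return [float(line) for line in payload.decode().splitlines()]


prices = array("d", [9.5, 3.25, 7.0])
print(receive_columnar(send_columnar(prices)).tolist())  # [9.5, 3.25, 7.0]
print(receive_row_by_row(send_row_by_row(prices)))       # [9.5, 3.25, 7.0]
```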

Amazon S3

Amazon S3 (Simple Storage Service) is an AWS object storage service that is commonly used as the storage layer of a data lake for analytics.
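
S3 keeps objects in a flat key space per bucket; the "folders" that data lake tools display are just key prefixes. Listing with a prefix and a `/` delimiter, as the `ListObjectsV2` API supports, returns the keys directly under the prefix plus rolled-up common prefixes. The hypothetical `list_objects` function below imitates that behavior in memory; it is not the boto3 client.

```python
def list_objects(keys, prefix: str = "", delimiter: str = "/"):
    """Imitate an S3-style listing over a flat key space.

    Keys with no delimiter after `prefix` are returned as objects; deeper keys
    are rolled up into common prefixes, which tools display as folders.
    """
    contents, common_prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            contents.append(key)
    return contents, sorted(common_prefixes)


bucket = {
    "sales/2023/part-0.parquet": b"...",
    "sales/2024/part-0.parquet": b"...",
    "sales/_manifest.json": b"...",
}
print(list_objects(bucket, prefix="sales/"))
# (['sales/_manifest.json'], ['sales/2023/', 'sales/2024/'])
```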

Amundsen

Amundsen is a metadata-driven data discovery application that indexes data resources (tables, dashboards, streams, etc.) and powers a PageRank-style search that ranks them by usage patterns.
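
One way to picture usage-based ranking is to boost text matches by how often each resource is queried. The scoring rule below (name matches weighted over description matches, multiplied by a log-damped query count) is an assumption made for illustration, not Amundsen's actual ranking, and `Resource`, `search` and `query_count` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    description: str
    query_count: int  # hypothetical usage signal, e.g. queries over the last 30 days


def search(resources: list, term: str) -> list:
    """Return names of matching resources, highest score first (toy relevance model)."""
    term = term.lower()
    scored = []
    for r in resources:
        if term in r.name.lower():
            match = 2.0  # a hit in the name counts more than one in the description
        elif term in r.description.lower():
            match = 1.0
        else:
            continue
        # log1p gives heavily used resources a boost that grows slowly with usage.
        scored.append((match * (1.0 + math.log1p(r.query_count)), r.name))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]


catalog = [
    Resource("orders", "raw order events", 10),
    Resource("orders_clean", "deduplicated orders", 5000),
    Resource("customers", "customer dimension; joins to orders", 20000),
]
print(search(catalog, "orders"))  # ['orders_clean', 'customers', 'orders']
```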

Apache Airflow

Apache Airflow is an open-source workflow management platform, originally developed at Airbnb, for programmatically authoring, scheduling and monitoring workflows.
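
In Airflow, a workflow is a directed acyclic graph (DAG) of tasks defined in Python, and the scheduler starts each task once its upstream tasks have succeeded. The sketch below imitates only that ordering rule with the standard library's `graphlib` (Python 3.9+). It is not Airflow's API, where DAGs are declared with `airflow.DAG` and operators are chained with `>>`; the task names and the `run` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Workflow defined as code: task -> set of upstream tasks (hypothetical names).
pipeline = {
    "extract_orders": set(),
    "extract_customers": set(),
    "transform": {"extract_orders", "extract_customers"},
    "load_warehouse": {"transform"},
    "notify": {"load_warehouse"},
}


def run(dag: dict, execute) -> list:
    """Start each task only after all of its upstream tasks have finished."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    order = []
    while sorter.is_active():
        for task in sorted(sorter.get_ready()):  # these tasks could run in parallel
            execute(task)
            order.append(task)
            sorter.done(task)
    return order


print(run(pipeline, execute=lambda task: None))
# ['extract_customers', 'extract_orders', 'transform', 'load_warehouse', 'notify']
```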

Apache Parquet

Apache Parquet is an open source columnar storage file format that works with any choice of data processing framework, data model or programming language.
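
Parquet splits rows into row groups and stores each column of a group contiguously as a column chunk, with statistics such as minimum and maximum values in the file metadata. A reader can then fetch only the columns a query needs and skip row groups whose statistics rule out the filter. The toy below models that layout in memory; it is not the Parquet binary format or `pyarrow.parquet`, and `ColumnChunk`, `write_row_groups` and `scan` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ColumnChunk:
    values: list
    min_value: object
    max_value: object


def write_row_groups(rows: list, columns: list, rows_per_group: int) -> list:
    """Split rows into row groups; store each column contiguously with min/max stats."""
    groups = []
    for start in range(0, len(rows), rows_per_group):
        chunk = rows[start:start + rows_per_group]
        group = {}
        for i, col in enumerate(columns):
            vals = [row[i] for row in chunk]
            group[col] = ColumnChunk(vals, min(vals), max(vals))
        groups.append(group)
    return groups


def scan(groups: list, column: str, lo, hi):
    """Read only `column`, skipping row groups whose stats rule out [lo, hi]."""
    out, groups_read = [], 0
    for g in groups:
        chunk = g[column]
        if chunk.max_value < lo or chunk.min_value > hi:
            continue  # the whole row group is skipped without reading its values
        groups_read += 1
        out.extend(v for v in chunk.values if lo <= v <= hi)
    return out, groups_read


rows = [(day, day * 10.0) for day in range(1, 31)]  # (day, amount)
groups = write_row_groups(rows, ["day", "amount"], rows_per_group=10)
print(scan(groups, "day", 12, 14))  # ([12, 13, 14], 1): only 1 of 3 groups read
```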

Apache Spark

Apache Spark is an analytics engine that provides an interface for programming clusters with implicit data parallelism and fault tolerance.
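
Spark's model can be sketched in a few lines: the user writes logic for one partition, the engine runs partitions in parallel, and a partition lost to a failure is recomputed from its input (its lineage) rather than restarting the whole job. In the toy below, threads stand in for executors and a simulated crash triggers the retry. `run_stage` and `flaky_sum_of_squares` are hypothetical, and this is not the PySpark API.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_stage(data: list, num_partitions: int, task, max_attempts: int = 3) -> list:
    """Apply `task` to every partition in parallel, rerunning partitions that fail."""
    # Round-robin partitioning; the user's `task` only ever sees one partition.
    partitions = [data[i::num_partitions] for i in range(num_partitions)]

    def run_with_retry(part):
        for attempt in range(1, max_attempts + 1):
            try:
                return task(part)
            except RuntimeError:
                # The partition's input is still known, so it can simply be recomputed.
                if attempt == max_attempts:
                    raise

    # Threads stand in for executors; a real engine spreads tasks across machines.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        return list(pool.map(run_with_retry, partitions))


_crashed = set()
_crash_lock = threading.Lock()


def flaky_sum_of_squares(part: list) -> int:
    """Sum of squares that simulates one executor crash the first time it sees 7."""
    with _crash_lock:
        if 7 in part and 7 not in _crashed:
            _crashed.add(7)
            raise RuntimeError("executor lost")
    return sum(x * x for x in part)


partials = run_stage(list(range(10)), num_partitions=3, task=flaky_sum_of_squares)
print(sum(partials))  # 285, the same result as a run with no failure
```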

AWS Glue

AWS Glue is a fully managed extract, transform and load (ETL) service that automates the time-consuming data preparation process for data analysis.
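
The three stages a Glue job automates can be sketched with the standard library alone: read raw records, clean and retype them, and write them to a queryable store. In the toy below, the CSV sample, column names and SQLite target are hypothetical stand-ins, and none of it is Glue's own API (Glue jobs typically run Spark scripts on managed infrastructure).

```python
import csv
import io
import sqlite3

# Hypothetical raw export; row 2 is missing its amount.
RAW = """order_id,amount,currency
1,19.99,usd
2,,usd
3,5.00,eur
"""


def extract(text: str) -> list:
    """Extract: parse raw CSV into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list) -> list:
    """Transform: drop incomplete rows, cast types, normalize currency codes."""
    return [
        (int(r["order_id"]), float(r["amount"]), r["currency"].upper())
        for r in rows
        if r["amount"]
    ]


def load(rows: list, conn: sqlite3.Connection) -> None:
    """Load: write the cleaned rows into a queryable table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW)), conn)
print(conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
# [(1, 19.99, 'USD'), (3, 5.0, 'EUR')]
```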

AWS Lake Formation

AWS Lake Formation is a fully managed service that makes it easier to set up a data lake and bring data into it from various sources using predefined templates (blueprints).

Azure Data Lake Storage (ADLS)

Microsoft Azure Data Lake Storage (ADLS) is a fully managed, elastic, scalable and secure file system suitable for storing a large variety of data.

Hive Metastore

Hive Metastore (HMS) stores metadata for Apache Hive tables and other services in a backend relational database (RDBMS), such as MySQL or PostgreSQL.
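
Engines such as Hive and Spark ask the metastore where a table's files live and what its schema is before planning a query, and the metastore answers from its relational tables. The sketch below models that lookup with SQLite. Its three-table schema is a simplified, hypothetical stand-in (the real HMS schema is much larger), and `register_table` and `lookup_table` are illustrative names, not the metastore's Thrift API.

```python
import sqlite3

# Simplified, hypothetical schema; not the real HMS tables.
SCHEMA = """
CREATE TABLE dbs (db_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE tbls (
    tbl_id INTEGER PRIMARY KEY,
    db_id INTEGER NOT NULL REFERENCES dbs(db_id),
    name TEXT NOT NULL,
    location TEXT NOT NULL,   -- where the data files live
    file_format TEXT NOT NULL,
    UNIQUE (db_id, name)
);
CREATE TABLE cols (tbl_id INTEGER NOT NULL, position INTEGER NOT NULL,
                   name TEXT NOT NULL, type TEXT NOT NULL);
"""


def register_table(conn, db, table, location, file_format, columns):
    """Record a table's schema and storage location; no data files are touched."""
    conn.execute("INSERT OR IGNORE INTO dbs (name) VALUES (?)", (db,))
    (db_id,) = conn.execute("SELECT db_id FROM dbs WHERE name = ?", (db,)).fetchone()
    cur = conn.execute(
        "INSERT INTO tbls (db_id, name, location, file_format) VALUES (?, ?, ?, ?)",
        (db_id, table, location, file_format),
    )
    conn.executemany(
        "INSERT INTO cols VALUES (?, ?, ?, ?)",
        [(cur.lastrowid, i, name, typ) for i, (name, typ) in enumerate(columns)],
    )
    conn.commit()


def lookup_table(conn, db, table):
    """The kind of question an engine asks the metastore before planning a query."""
    row = conn.execute(
        "SELECT t.tbl_id, t.location, t.file_format FROM tbls t "
        "JOIN dbs d ON d.db_id = t.db_id WHERE d.name = ? AND t.name = ?",
        (db, table),
    ).fetchone()
    if row is None:
        raise KeyError(f"{db}.{table}")
    tbl_id, location, file_format = row
    columns = conn.execute(
        "SELECT name, type FROM cols WHERE tbl_id = ? ORDER BY position", (tbl_id,)
    ).fetchall()
    return {"location": location, "format": file_format, "columns": columns}


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
register_table(conn, "sales", "orders", "s3://example-bucket/sales/orders/", "PARQUET",
               [("order_id", "bigint"), ("amount", "double")])
print(lookup_table(conn, "sales", "orders"))
```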