Data Sharing of Apache Iceberg tables and other data in the Dremio Lakehouse

Data sharing is becoming increasingly important in the data world. Not all the data we need can be generated in-house, and our data can also be a valuable asset for generating revenue or building strategic partnerships. Leveraging tools that enable data sharing can significantly enhance the value of your data. In this blog, we aim to clarify the different types of data sharing and explore how the Dremio Lakehouse Platform can enhance data sharing capabilities within your data platform.

Types of Data Sharing

Data sharing falls into three primary categories:

Data Marketplaces: These platforms let you list your datasets so that others can subscribe to them, either for free or for a fee. Popular examples include Snowflake Marketplace and AWS Data Exchange.

Sharing Data with Compute: In this model, you provide access to both your data and the compute resources needed to query it. Users can work with the data, but you bear the cost of compute and data storage.

Sharing Data without Compute: Here, users are given access to the data without a preconfigured compute engine. Examples include using Delta Sharing for Delta Lake tables or bringing an Apache Iceberg catalog to your preferred compute engine.

How Dremio Facilitates Data Sharing

Dremio offers features that support and enhance all three of these data-sharing pathways, enabling seamless and efficient data sharing within your data platform.

Dremio doesn’t operate its own data marketplace, but it can connect to platforms like S3, AWS Glue, and Snowflake. This lets you get more value from datasets you've purchased by joining them with data you hold elsewhere. Additionally, because Dremio integrates Nessie, Nessie's upcoming interoperability with Snowflake may soon make it possible to list datasets curated in Dremio on Snowflake’s Marketplace.

Using Dremio, you can create users and assign them different access levels within your Dremio organization. Users who have been granted access to a dataset can query it by logging into Dremio or by using the REST API, Apache Arrow Flight, or JDBC/ODBC. In this scenario, their queries run on your Dremio cluster, so you are sharing both your data and your compute resources. Alternatively, if you use the integrated Dremio Catalogs powered by Nessie, you can grant a user access to the catalog itself. They can then connect it to engines like Apache Spark, Apache Flink, Presto, Trino, and more, and query the tables in the catalog with their own compute resources.
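To make the shared-compute path more concrete, here is a minimal Python sketch of how a user might prepare a SQL query for submission over Dremio's REST API. It assumes a Dremio Software deployment that exposes the `/api/v3/sql` endpoint and accepts a personal access token as a bearer token. The host name, token, and dataset name are hypothetical placeholders, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_sql_request(base_url: str, token: str, sql: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that submits a SQL query.

    Assumption: the server exposes the /api/v3/sql endpoint and accepts
    a personal access token in a Bearer Authorization header.
    """
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v3/sql",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical host, token, and dataset, used only for illustration.
request = build_sql_request(
    "https://dremio.example.com",
    "PERSONAL_ACCESS_TOKEN",
    "SELECT * FROM sales.orders LIMIT 10",
)

# Sending the request with urllib.request.urlopen(request) would submit the
# query as a job that runs on the data owner's Dremio cluster.
```

Because the query executes on the owner's cluster, the user only needs network access and a token. The compute cost stays with whoever operates Dremio.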

In summary, Dremio enables:

Query Federation: Integrate data from multiple data marketplaces using Dremio’s query federation capabilities.
Shared Compute: Grant users access to individual datasets and data sources, allowing them to use your Dremio cluster for queries.
Catalog Access: Provide users access to a Dremio catalog, which they can then query with any supporting tool or library, bringing their own compute resources.
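For the catalog access path, the sketch below shows roughly what "bringing the catalog to your own engine" can look like for Apache Spark. It builds the Spark properties commonly used to register a Nessie-backed Iceberg catalog. The catalog name, URI, warehouse path, and token are hypothetical, and the exact endpoint and authentication settings for a given Dremio catalog are assumptions to check against your environment.

```python
def nessie_spark_conf(
    catalog_name: str,
    uri: str,
    warehouse: str,
    token: str,
    ref: str = "main",
) -> dict[str, str]:
    """Return Spark properties that register a Nessie-backed Iceberg catalog."""
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        # Enable the Iceberg and Nessie SQL extensions.
        "spark.sql.extensions": (
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,"
            "org.projectnessie.spark.extensions.NessieSparkSessionExtensions"
        ),
        # Register the catalog and point it at the Nessie implementation.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.catalog-impl": "org.apache.iceberg.nessie.NessieCatalog",
        f"{prefix}.uri": uri,
        f"{prefix}.ref": ref,  # branch or tag to read from
        f"{prefix}.authentication.type": "BEARER",
        f"{prefix}.authentication.token": token,
        f"{prefix}.warehouse": warehouse,
    }


# Hypothetical values, used only for illustration.
conf = nessie_spark_conf(
    catalog_name="nessie",
    uri="https://nessie.example.com/api/v2",
    warehouse="s3://example-bucket/warehouse",
    token="PERSONAL_ACCESS_TOKEN",
)
```

Each key and value can be passed to `SparkSession.builder.config(key, value)`, provided the Iceberg and Nessie runtime jars are on the Spark classpath. Queries then run on the consumer's own Spark cluster, so the data owner shares only the tables, not compute.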

If you use other catalogs, such as standalone Nessie, Apache Gravitino, or AWS Glue, each has its own mechanism for granting access, so you can still share data without compute through Apache Iceberg catalog access.

Conclusion

Dremio offers a versatile and powerful platform for data sharing, whether through integrating with existing data marketplaces, providing shared compute resources, or enabling independent data access via catalogs. By leveraging these capabilities, you can maximize the value of your data, streamline collaboration, and create new opportunities for revenue and partnerships. Dremio’s comprehensive approach to data sharing ensures that you can meet your organization’s needs while maintaining control and governance over your data assets.

Here are Some Exercises for you to See Dremio’s Features at Work on Your Laptop

Explore Dremio University to learn more about Data Lakehouses and Apache Iceberg and the associated Enterprise Use Cases. You can even learn how to deploy Dremio via Docker and explore these technologies hands-on.
