Whether you're a newcomer to the world of big data and data lakes or an experienced pro looking to expand your knowledge, the Dremio Wiki provides insights and guidance for all your data-related needs. Dive in and unlock the power of your data today!
Learn about the fundamental properties of ACID transactions and how they guarantee reliability and data integrity in databases.
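A minimal sketch of ACID atomicity using Python's built-in sqlite3 module; the table, account names, and balances are invented for illustration. Either both updates in the transfer apply, or neither does:

```python
import sqlite3

# In-memory database for illustration; schema and data are made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move funds atomically: commit both updates, or roll both back."""
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            balance = conn.execute("SELECT balance FROM accounts WHERE name = ?",
                                   (src,)).fetchone()[0]
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
    except ValueError:
        pass  # the rollback has already restored the previous state

transfer(conn, "alice", "bob", 30)   # succeeds
transfer(conn, "alice", "bob", 500)  # fails; balances are left unchanged
print(conn.execute("SELECT name, balance FROM accounts ORDER BY name").fetchall())
# -> [('alice', 70), ('bob', 80)]
```

The failed second transfer leaves no trace: the rollback undoes the partial debit, which is the atomicity guarantee in miniature.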
Apache Beam is an open-source platform for processing big data that provides a unified programming model and can run on different execution engines.
Apache Camel is an open-source framework for enterprise integration patterns. It provides a set of predefined components for integrating diverse systems and data sources.
Apache CXF is an open-source, fully featured web service framework. It provides an efficient, reliable and flexible architecture for creating and consuming SOAP and RESTful web services.
Apache Flink is an open-source data processing framework for building real-time and batch processing pipelines.
Apache HBase is an open-source, column-oriented, distributed database built on top of Apache Hadoop, designed to store and manage massive amounts of unstructured data.
Apache Hive is a data warehouse technology that facilitates querying and managing large datasets stored in distributed storage systems like Hadoop.
Apache Hudi (Hadoop upserts, deletes, and incrementals) is an open-source data management framework designed for big data workloads.
Apache Knox is a security layer for your Hadoop ecosystem, providing a single point of authentication and security.
Apache Lucene is a powerful, open-source information retrieval library that provides easy-to-use and scalable search capabilities to applications.
Apache NiFi is a tool for data integration and flow management, with a graphical user interface, large processor library, and data processing capabilities.
Learn about Apache Pig: a high-level platform for data processing and ETL workflows. Features include Pig Latin, UDFs, scalability, and interoperability.
Apache ServiceMix is an open-source integration container that provides a lightweight and flexible integration framework. It is built on top of Apache Karaf and Apache Camel.
Apache Solr is a fast and reliable search engine platform that offers a wide range of features like faceted search, hit highlighting, and more.
Apache Spark is an open-source distributed computing system that can handle large amounts of data processing tasks. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
Discover the power of big data analytics – gain insights, improve decision-making, and increase efficiency for businesses today.
BLOBs are a powerful data type for storing large volumes of binary data, such as images, videos, and audio files within a database management system.
Improve collaboration and decision-making while ensuring data quality and compliance. Learn more about data catalogs here.
Data cleansing is the process of detecting and correcting or removing inaccurate, incomplete, or irrelevant data.
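A small sketch of that process in plain Python; the records and validation rules are invented for illustration. It normalizes values, drops invalid or incomplete rows, and removes duplicates:

```python
# Toy input records for this sketch; real pipelines would read from files or tables.
raw_records = [
    {"email": " Alice@Example.com ", "age": "34"},
    {"email": "alice@example.com",   "age": "34"},   # duplicate after normalization
    {"email": "not-an-email",        "age": "29"},   # invalid email: dropped
    {"email": "bob@example.com",     "age": ""},     # incomplete: dropped
]

def cleanse(records):
    seen, cleaned = set(), []
    for rec in records:
        email = rec["email"].strip().lower()            # normalize whitespace and case
        if "@" not in email or not rec["age"].strip():  # drop invalid or incomplete rows
            continue
        if email in seen:                               # drop duplicates
            continue
        seen.add(email)
        cleaned.append({"email": email, "age": int(rec["age"])})
    return cleaned

print(cleanse(raw_records))
# -> [{'email': 'alice@example.com', 'age': 34}]
```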
A data cube, also known as a multi-dimensional cube or a hypercube, is a data structure that allows for efficient querying and analysis of data.
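A toy two-dimensional cube can be sketched in a few lines of Python; the dimensions (region, product) and the sales figures are invented for illustration. Each cell holds an aggregate, and an 'ALL' value along each dimension holds the roll-ups:

```python
from collections import defaultdict
from itertools import product as cartesian

# Invented sales facts: (region, product, amount).
facts = [
    ("north", "widget", 10),
    ("north", "gadget", 5),
    ("south", "widget", 7),
]

# Aggregate the measure for every (region, product) cell, including
# the 'ALL' roll-up along each dimension.
cube = defaultdict(int)
for region, item, amount in facts:
    for r, p in cartesian((region, "ALL"), (item, "ALL")):
        cube[(r, p)] += amount

print(cube[("north", "ALL")])   # north across all products -> 15
print(cube[("ALL", "widget")])  # widgets across all regions -> 17
print(cube[("ALL", "ALL")])     # grand total -> 22
```

Precomputing all cells this way is what makes slice-and-dice queries fast: every roll-up is a dictionary lookup rather than a scan.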
Unlock the full value of your data with data discovery. Discover, understand, and analyze your data to make better decisions and solve business problems.
A data fabric is a unified and integrated data management framework that enables organizations to manage data seamlessly across various data sources, locations, and formats.
Data Governance is the overall management of the availability, usability, integrity, and security of data used within an organization.
Learn about data integration, its benefits, and how it streamlines decision-making by consolidating diverse datasets for effective analysis and reporting.
A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale.
A data lakehouse is an architecture that combines the low-cost, flexible storage of a data lake with the data management and transaction capabilities of a data warehouse.
Data lineage is the process of tracking data as it moves through different systems and stages of its lifecycle.
Unlock the full value of data with data marts. Tailor your data to specific departments and subject areas to improve decision-making and drive business growth.
Data mesh is a relatively new concept in the field of data architecture that emphasizes the importance of decentralizing data ownership and management.
While both data mesh and data lake are popular concepts in modern data architecture, they are distinct in their approach and purpose.
Explore the world of data migration tools, their key features, and best practices to ensure successful data transfers between systems.
Explore data modeling, its importance, and how it helps organizations manage data effectively, optimize performance, and drive decision-making.
Learn about data processing: its types, importance, and methods. Discover how it can help optimize business operations and make better decisions.
Data quality refers to the overall fitness and usefulness of data for a specific purpose or application.
Data transformation converts data to a new format or structure for analysis or integration.
A data warehouse is a centralized repository that is designed to store and manage large amounts of data from various sources.
Dive into data warehousing – a centralized repository for storing, managing, and analyzing data from diverse sources. Enhance decision-making today!
Data wrangling is the process of cleaning, transforming, and integrating data to make it suitable for analysis, leading to better decision-making.
DataOps architecture refers to the framework and practices used to manage and optimize data pipelines in a way that supports agile development.
DataOps is a collaborative approach to data management that aims to improve the efficiency and effectiveness of data pipelines.
DevOps and DataOps share similarities in terms of their principles and practices, but they also have differences in their focus, processes, and outcomes.
Delta Lake is a storage layer that sits on top of existing data lakes, giving data engineers and scientists capabilities such as ACID transactions, schema enforcement, and time travel.
Delta Lake Merge is a versatile operation that lets users upsert data: inserting, updating, and deleting rows in a target table from a source dataset in a single atomic statement.
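Delta Lake itself expresses this as a Spark SQL `MERGE INTO` statement; as a stand-in, the same upsert semantics can be sketched with SQLite's `ON CONFLICT` clause from Python. The table name and rows here are invented for illustration:

```python
import sqlite3

# Merge-style upsert sketch: existing rows matching on the key are updated,
# unmatched source rows are inserted. (Delta Lake uses MERGE INTO for this.)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany("INSERT INTO target VALUES (?, ?)", [(1, "old"), (2, "keep")])

source = [(1, "updated"), (3, "new")]  # rows arriving from another system
conn.executemany(
    """INSERT INTO target (id, value) VALUES (?, ?)
       ON CONFLICT(id) DO UPDATE SET value = excluded.value""",
    source,
)
conn.commit()
print(conn.execute("SELECT * FROM target ORDER BY id").fetchall())
# -> [(1, 'updated'), (2, 'keep'), (3, 'new')]
```

Row 1 is updated, row 2 is untouched, and row 3 is inserted, which mirrors the WHEN MATCHED / WHEN NOT MATCHED branches of a MERGE statement.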
DevOps is a software development methodology emphasizing collaboration and communication between development and operations teams.
Explore the principles and techniques of Domain-Driven Design. Discover how it helps tackle software complexity and aligns it with business needs.
Discover the power of Online Analytical Processing (OLAP) for data analysis. Gain insights by leveraging multidimensional data modeling and analysis techniques.
Open data is data that is freely available for anyone to use, reuse, and redistribute without any legal, technological, or financial restrictions.
Some commonly used open-source tools for building a data mesh architecture include Apache Kafka and Apache Spark, among others.
This page provides an overview of the benefits of a semantic layer, common use cases, and best practices for building and maintaining a semantic layer.
Learn about stored procedures, their purpose, and their benefits in managing complex database operations, improving maintainability and performance.
Learn the pros and cons of structured and unstructured data and how they are stored in data lakes and data warehouses for analysis.