Data Mastery Hub: Term Resource for Data Professionals

Whether you're a newcomer to the world of big data and data lakes or an experienced pro looking to expand your knowledge, the Dremio Wiki provides insights and guidance for all your data-related needs. Dive in and unlock the power of your data today!

A


Delta Lake

ACID Transaction

Learn about the fundamental properties of ACID transactions and how they guarantee reliability and data integrity in databases.
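
As a minimal sketch of these properties, the following Python snippet uses the built-in sqlite3 module to move an amount between two purely illustrative accounts: both updates commit together or a failure rolls them back.

    import sqlite3

    conn = sqlite3.connect("bank.db")
    conn.execute("CREATE TABLE IF NOT EXISTS accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT OR IGNORE INTO accounts VALUES ('alice', 100), ('bob', 50)")
    conn.commit()

    try:
        # Atomicity: both updates succeed together or not at all.
        conn.execute("UPDATE accounts SET balance = balance - 30 WHERE name = 'alice'")
        conn.execute("UPDATE accounts SET balance = balance + 30 WHERE name = 'bob'")
        conn.commit()    # Durability: the change survives once committed.
    except Exception:
        conn.rollback()  # Consistency: a failure leaves the data unchanged.
    finally:
        conn.close()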

Apache

Apache Beam

Apache Beam is an open-source platform for processing big data that provides a unified programming model and can run on different execution engines.
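
A minimal sketch of a Beam pipeline with the Python SDK (apache-beam), run on the default local DirectRunner; the input values are purely illustrative.

    import apache_beam as beam

    with beam.Pipeline() as pipeline:
        (
            pipeline
            | "Create" >> beam.Create(["alpha", "beta", "gamma"])  # bounded in-memory source
            | "Upper"  >> beam.Map(str.upper)                      # element-wise transform
            | "Print"  >> beam.Map(print)                          # simple sink for the sketch
        )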

Apache

Apache Camel

Apache Camel is an open-source framework for enterprise integration patterns. It provides a set of predefined components for integrating diverse systems and data sources.

Apache

Apache CXF

Apache CXF is an open-source, fully featured web service framework. It provides an efficient, reliable and flexible architecture for creating and consuming SOAP and RESTful web services.

Apache

Apache Flink

Apache Flink is an open-source data processing framework for building real-time and batch processing pipelines.
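
A minimal sketch using the PyFlink DataStream API (apache-flink), executed in a local mini-cluster; the input values and job name are illustrative.

    from pyflink.datastream import StreamExecutionEnvironment

    env = StreamExecutionEnvironment.get_execution_environment()
    env.from_collection(["alpha", "beta", "gamma"]) \
       .map(lambda s: s.upper()) \
       .print()
    env.execute("uppercase_job")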

Apache

Apache HBase

Apache HBase is an open-source, column-oriented, distributed database built on top of Apache Hadoop and designed to store and manage massive amounts of unstructured data.
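
A minimal sketch using the third-party happybase Python client over HBase's Thrift gateway; the host, table name, and column family here are assumptions for illustration.

    import happybase

    connection = happybase.Connection("localhost")   # assumes a running HBase Thrift server
    table = connection.table("events")               # assumes an 'events' table with family 'cf'
    table.put(b"row-1", {b"cf:status": b"ok"})       # write one cell
    print(table.row(b"row-1"))                       # read the row back
    connection.close()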

Apache

Apache Hive

Apache Hive is a data warehouse technology that facilitates querying and managing large datasets stored in distributed storage systems like Hadoop.

Data lakehouse

Apache Hudi

Apache Hudi (Hadoop upserts, deletes, and incrementals) is an open-source data management framework designed for big data workloads.

Apache

Apache Knox

Apache Knox is a security layer for your Hadoop ecosystem, providing a single point of authentication and security.

Apache

Apache Lucene

Apache Lucene is a powerful, open-source information retrieval library that provides easy-to-use and scalable search capabilities to applications.

Apache

Apache NiFi

Apache NiFi is a tool for data integration and flow management, with a graphical user interface, large processor library, and data processing capabilities.

Uncategorized

Apache Pig

Learn about Apache Pig: a high-level platform for data processing and ETL workflows. Features include Pig Latin, UDFs, scalability, and interoperability.

Apache

Apache ServiceMix

Apache ServiceMix is an open-source integration container that provides a lightweight and flexible integration framework. It is built on top of Apache Karaf and Apache Camel.

Apache

Apache Solr

Apache Solr is a fast and reliable search engine platform that offers a wide range of features like faceted search, hit highlighting, and more.
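
A minimal sketch that queries Solr's HTTP search API from Python with the requests library; the host, core name, and query here are assumptions.

    import requests

    resp = requests.get(
        "http://localhost:8983/solr/products/select",       # assumed core named 'products'
        params={"q": "name:laptop", "rows": 5, "wt": "json"},
    )
    for doc in resp.json()["response"]["docs"]:
        print(doc)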

Apache

Apache Spark

Apache Spark is an open-source distributed computing system that can handle large amounts of data processing tasks. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
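
A minimal sketch using the PySpark DataFrame API; the sample rows are purely illustrative.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("example").getOrCreate()

    df = spark.createDataFrame(
        [("alice", 34), ("bob", 29), ("carol", 41)],
        ["name", "age"],
    )
    df.filter(df.age > 30).groupBy().avg("age").show()  # average age of people over 30

    spark.stop()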

D


Data lakehouse

Data Catalog

A data catalog is an organized inventory of an organization's data assets that improves collaboration and decision-making while ensuring data quality and compliance.

Data Engineering

Data Cleansing

Data cleansing is the process of detecting and correcting or removing inaccurate, incomplete, or irrelevant data.
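
A minimal sketch of common cleansing steps in pandas; the sample records are illustrative.

    import pandas as pd

    df = pd.DataFrame({
        "email": ["a@x.com", "a@x.com", None, "b@x.com"],
        "age":   [29, 29, 41, -5],
    })

    df = df.drop_duplicates()           # remove exact duplicate rows
    df = df.dropna(subset=["email"])    # drop rows missing a required field
    df = df[df["age"].between(0, 120)]  # filter out implausible values
    print(df)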

Data lakehouse

Data Cube

A data cube, also known as a multi-dimensional cube or a hypercube, is a data structure that allows for efficient querying and analysis of data.
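
A minimal sketch of a small two-dimensional cube built with a pandas pivot table; the sales data is illustrative.

    import pandas as pd

    sales = pd.DataFrame({
        "region":  ["east", "east", "west", "west"],
        "product": ["a", "b", "a", "b"],
        "amount":  [100, 150, 120, 90],
    })

    # Aggregate the 'amount' measure across the region and product dimensions;
    # margins=True adds per-dimension subtotals and a grand total.
    cube = sales.pivot_table(index="region", columns="product",
                             values="amount", aggfunc="sum", margins=True)
    print(cube)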

Data lakehouse

Data Discovery

Unlock the full value of your data with data discovery. Discover, understand, and analyze your data to make better decisions and solve business problems.

Data Fabric

Data Fabric

A data fabric is a unified and integrated data management framework that enables organizations to manage data seamlessly across various data sources, locations, and formats.

Data lakehouse

Data Governance

Data Governance is the overall management of the availability, usability, integrity, and security of data used within an organization.

Uncategorized

Data Integration

Learn about data integration, its benefits, and how it streamlines decision-making by consolidating diverse datasets for effective analysis and reporting.

Data lakehouse

Data Lake

A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale.

Data lakehouse

Data Lakehouse

A data lakehouse is an architecture that combines the flexible, low-cost storage of a data lake with the data management and ACID transaction capabilities of a data warehouse.

Data lakehouse

Data Lineage

Data lineage is the process of tracking data as it moves through different systems and stages of its lifecycle.

Uncategorized

Data Mart

Unlock the full value of data with data marts. Tailor your data to specific departments and subject areas to improve decision-making and drive business growth.

Data Mesh

Data Mesh Architecture

Data mesh is a relatively new concept in the field of data architecture that emphasizes the importance of decentralizing data ownership and management.

Data Mesh

Data Mesh vs. Data Lake

While both data mesh and data lake are popular concepts in modern data architecture, they are distinct in their approach and purpose.

Uncategorized

Data Migration Tools

Explore the world of data migration tools, their key features, and best practices to ensure successful data transfers between systems.

Uncategorized

Data Modeling

Explore data modeling, its importance, and how it helps organizations manage data effectively, optimize performance, and drive decision-making.

Data Engineering

Data Processing

Learn about data processing: its types, importance, and methods. Discover how it can help optimize business operations and make better decisions.

Data lakehouse

Data Quality

Data quality refers to the overall fitness and usefulness of data for a specific purpose or application.

Data Transformation

Data Transformation

Data transformation converts data to a new format or structure for analysis or integration.
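
A minimal sketch of a transformation step in pandas; the column names and formats are illustrative.

    import pandas as pd

    raw = pd.DataFrame({"order_date": ["2023-01-05", "2023-02-10"],
                        "amount_usd": ["1,200", "850"]})

    transformed = raw.assign(
        order_date=pd.to_datetime(raw["order_date"]),                     # string -> datetime
        amount_usd=raw["amount_usd"].str.replace(",", "").astype(float),  # string -> numeric
    )
    print(transformed.dtypes)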

Data lakehouse

Data Warehouse

A data warehouse is a centralized repository that is designed to store and manage large amounts of data from various sources.

Uncategorized

Data Warehousing

Dive into data warehousing – a centralized repository for storing, managing, and analyzing data from diverse sources. Enhance decision-making today!

Data Wrangling

Data Wrangling

Data wrangling is the process of cleaning, transforming, and integrating data to make it suitable for analysis, leading to better decision-making.

DataOps

DataOps Architecture

DataOps architecture refers to the framework and practices used to manage and optimize data pipelines in a way that supports agile development.

DataOps

DataOps Best Practices

DataOps is a collaborative approach to data management that aims to improve the efficiency and effectiveness of data pipelines.

DataOps

DataOps vs. DevOps

DevOps and DataOps share similarities in terms of their principles and practices, but they also have differences in their focus, processes, and outcomes.

Data lakehouse

Delta Lake

Delta Lake is an open-source storage layer that sits on top of existing data lakes, adding capabilities such as ACID transactions, schema enforcement, and time travel for data engineers and scientists.

Delta Lake

Delta Lake Merge

Delta Lake Merge is an operation that upserts data into a Delta table, applying inserts, updates, and deletes from a source dataset in a single atomic statement.
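
A minimal sketch of a merge (upsert) using the delta-spark Python API; the paths, table layout, and join key ('id') are assumptions.

    from delta.tables import DeltaTable
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()       # assumes a Spark session configured for Delta Lake

    target = DeltaTable.forPath(spark, "/data/customers")
    updates = spark.read.parquet("/data/customer_updates")

    (target.alias("t")
        .merge(updates.alias("u"), "t.id = u.id")
        .whenMatchedUpdateAll()      # update rows that already exist
        .whenNotMatchedInsertAll()   # insert rows that are new
        .execute())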

DataOps

DevOps

DevOps is a software development methodology emphasizing collaboration and communication between development and operations teams.

Uncategorized

Domain-Driven Design

Explore the principles and techniques of Domain-Driven Design. Discover how it helps tackle software complexity and aligns it with business needs.
