Data Mastery Hub: Term Resource for Data Professionals
Whether you're a newcomer to the world of big data and data lakes or an experienced pro looking to expand your knowledge, the Dremio Wiki provides insights and guidance for all your data-related needs. Dive in and unlock the power of your data today!
A
Delta Lake
ACID Transaction
Learn about the fundamental properties of ACID transactions (atomicity, consistency, isolation, and durability) and how they guarantee reliability and data integrity in databases.
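As a minimal sketch of the atomicity property, the example below uses Python's built-in sqlite3 module with an in-memory database. The accounts table, the transfer helper, and the sample balances are hypothetical and chosen only for illustration. Either both updates of a transfer are committed, or neither is.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction; undo everything on failure."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # This debit violates the CHECK constraint when funds are insufficient.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # credit is applied, debit fails, both are undone
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

The failed transfer leaves no partial credit on the receiving account, which is the behavior atomicity is meant to guarantee.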
Apache
Apache Beam
Apache Beam is an open-source platform for processing big data that provides a unified programming model and can run on different execution engines.
Apache
Apache Camel
Apache Camel is an open-source framework for enterprise integration patterns. It provides a set of predefined components for integrating diverse systems and data sources.
Apache
Apache CXF
Apache CXF is an open-source, fully featured web service framework. It provides an efficient, reliable and flexible architecture for creating and consuming SOAP and RESTful web services.
Apache
Apache Flink
Apache Flink is an open-source data processing framework for building real-time and batch processing pipelines.
Apache
Apache HBase
Apache HBase is an open-source, column-oriented, distributed database designed to store and manage massive amounts of unstructured data. It is built on top of Apache Hadoop.
Apache
Apache Hive
Apache Hive is a data warehouse technology that facilitates querying and managing large datasets stored in distributed storage systems like Hadoop.
Data lakehouse
Apache Hudi
Apache Hudi (Hadoop upserts, deletes, and incrementals) is an open-source data management framework designed for big data workloads.
Apache
Apache Knox
Apache Knox is a security layer for your Hadoop ecosystem, providing a single point of authentication and security.
Apache
Apache Lucene
Apache Lucene is a powerful, open-source information retrieval library that provides easy-to-use and scalable search capabilities to applications.
Apache
Apache NiFi
Apache NiFi is a tool for data integration and flow management, with a graphical user interface, large processor library, and data processing capabilities.
Uncategorized
Apache Pig
Learn about Apache Pig: a high-level platform for data processing and ETL workflows. Features include Pig Latin, UDFs, scalability, and interoperability.
Apache
Apache ServiceMix
Apache ServiceMix is an open-source integration container that provides a lightweight and flexible integration framework. It is built on top of Apache Karaf and Apache Camel.
Apache
Apache Solr
Apache Solr is a fast and reliable search engine platform that offers a wide range of features like faceted search, hit highlighting, and more.
Apache
Apache Spark
Apache Spark is an open-source distributed computing system for large-scale data processing. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
B
Uncategorized
Big Data Analytics
Discover the power of big data analytics – gain insights, improve decision-making, and increase efficiency for businesses today.
Data Lake
BLOBs
BLOBs are a powerful data type for storing large volumes of binary data, such as images, videos, and audio files within a database management system.
D
Data lakehouse
Data Catalog
Improve collaboration and decision-making while ensuring data quality and compliance. Learn more about data catalogs here.
Data Engineering
Data Cleansing
Data cleansing is the process of detecting and correcting or removing inaccurate, incomplete, or irrelevant data.
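The definition above can be sketched in a few lines of plain Python. The field names (name, email), the validity rule (an email must contain "@"), and the sample records are assumptions made for illustration; real cleansing rules depend on the dataset.

```python
def cleanse(records):
    """Return cleaned copies of `records`, dropping incomplete or duplicate rows."""
    seen = set()
    cleaned = []
    for row in records:
        # Correct: normalize whitespace and letter case.
        email = (row.get("email") or "").strip().lower()
        name = (row.get("name") or "").strip()
        # Remove: rows that are incomplete or clearly invalid.
        if not name or "@" not in email:
            continue
        # Remove: duplicates, detected on the normalized email.
        if email in seen:
            continue
        seen.add(email)
        cleaned.append({"name": name.title(), "email": email})
    return cleaned

raw = [
    {"name": " ada lovelace ", "email": "ADA@example.com"},
    {"name": "Ada Lovelace", "email": "ada@example.com "},
    {"name": "", "email": "nobody@example.com"},
    {"name": "Grace Hopper", "email": None},
    {"name": "grace hopper", "email": "grace@example.com"},
]
result = cleanse(raw)
print(result)
```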
Data lakehouse
Data Cube
A data cube, also known as a multi-dimensional cube or a hypercube, is a data structure that allows for efficient querying and analysis of data.
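One way to picture a data cube is as a table of pre-aggregated totals for every combination of dimension values, where any dimension may also be rolled up to "all". The sketch below is a toy illustration under that assumption. The build_cube helper, the ALL marker, and the sales rows are hypothetical, and production OLAP engines use far more compact structures.

```python
from collections import defaultdict
from itertools import product

ALL = "*"  # marker meaning "rolled up over every value of this dimension"

def build_cube(rows, dimensions, measure):
    """Sum `measure` for every combination of concrete and rolled-up dimension values."""
    cube = defaultdict(int)
    for row in rows:
        values = [row[d] for d in dimensions]
        # Each dimension is either kept at its value or rolled up to ALL.
        for key in product(*[(v, ALL) for v in values]):
            cube[key] += row[measure]
    return dict(cube)

sales = [
    {"region": "EU", "product": "A", "year": 2023, "amount": 10},
    {"region": "EU", "product": "B", "year": 2023, "amount": 5},
    {"region": "US", "product": "A", "year": 2024, "amount": 7},
]
cube = build_cube(sales, ["region", "product", "year"], "amount")

print(cube[("EU", ALL, ALL)])   # total for region EU
print(cube[(ALL, "A", ALL)])    # total for product A
print(cube[(ALL, ALL, ALL)])    # grand total
```

Because the totals are computed once up front, each query becomes a single dictionary lookup, which is the efficiency the definition refers to.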
Data lakehouse
Data Discovery
Unlock the full value of your data with data discovery. Discover, understand, and analyze your data to make better decisions and solve business problems.
Data Fabric
Data Fabric
A data fabric is a unified and integrated data management framework that enables organizations to manage data seamlessly across various data sources, locations, and formats.
Data lakehouse
Data Governance
Data Governance is the overall management of the availability, usability, integrity, and security of data used within an organization.
Uncategorized
Data Integration
Learn about data integration, its benefits, and how it streamlines decision-making by consolidating diverse datasets for effective analysis and reporting.
Data lakehouse
Data Lake
A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale.
Data lakehouse
Data Lakehouse
A data lakehouse is a data architecture that combines the low-cost, flexible storage of a data lake for structured and unstructured data with the data management and query capabilities of a data warehouse.
Data lakehouse
Data Lineage
Data lineage is the process of tracking the data as it moves through different systems and stages of its lifecycle.
Uncategorized
Data Mart
Unlock the full value of data with data marts. Tailor your data to specific departments and subject areas to improve decision-making and drive business growth.
Data Mesh
Data Mesh Architecture
Data mesh is a relatively new concept in the field of data architecture that emphasizes the importance of decentralizing data ownership and management.
Data Mesh
Data Mesh vs. Data Lake
While both data mesh and data lake are popular concepts in modern data architecture, they are distinct in their approach and purpose.
Uncategorized
Data Migration Tools
Explore the world of data migration tools, their key features, and best practices to ensure successful data transfers between systems.
Uncategorized
Data Modeling
Explore data modeling, its importance, and how it helps organizations manage data effectively, optimize performance, and drive decision-making.
Data Engineering
Data Processing
Learn about data processing: its types, importance, and methods. Discover how it can help optimize business operations and make better decisions.
Data lakehouse
Data Quality
Data quality refers to the overall fitness and usefulness of data for a specific purpose or application.
Data Transformation
Data Transformation
Data transformation converts data to a new format or structure for analysis or integration.
Data lakehouse
Data Warehouse
A data warehouse is a centralized repository that is designed to store and manage large amounts of data from various sources.
Uncategorized
Data Warehousing
Dive into data warehousing – a centralized repository for storing, managing, and analyzing data from diverse sources. Enhance decision-making today!
Data Wrangling
Data Wrangling
Data wrangling is the process of cleaning, transforming, and integrating data to make it suitable for analysis, leading to better decision-making.
DataOps
DataOps Architecture
DataOps architecture refers to the framework and practices used to manage and optimize data pipelines in a way that supports agile development.
DataOps
DataOps Best Practices
DataOps is a collaborative approach to data management that aims to improve the efficiency and effectiveness of data pipelines.
DataOps
DataOps vs. DevOps
DevOps and DataOps share similarities in terms of their principles and practices, but they also have differences in their focus, processes, and outcomes.
Data lakehouse
Delta Lake
Delta Lake is an open-source storage layer that sits on top of existing data lakes, giving data engineers and scientists capabilities such as ACID transactions, schema enforcement, and data versioning.
Delta Lake
Delta Lake Merge
Delta Lake Merge is an operation that combines source data into a Delta table quickly, efficiently, and reliably, updating the rows that match and inserting the rows that do not.
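A merge of this kind is commonly understood as an upsert: source rows that match a target row on a key update it, and source rows with no match are inserted. The sketch below shows only those semantics with plain Python dictionaries. It is not the Delta Lake API, and the merge helper, the id key, and the sample rows are hypothetical.

```python
def merge(target, source, key):
    """Upsert `source` rows into `target` rows, matching on `key`."""
    merged = {row[key]: dict(row) for row in target}
    for row in source:
        if row[key] in merged:
            merged[row[key]].update(row)   # matched: update the existing row
        else:
            merged[row[key]] = dict(row)   # not matched: insert a new row
    return sorted(merged.values(), key=lambda r: r[key])

target = [{"id": 1, "city": "Oslo"}, {"id": 2, "city": "Rome"}]
source = [{"id": 2, "city": "Milan"}, {"id": 3, "city": "Lima"}]
result = merge(target, source, "id")
print(result)
```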
DataOps
DevOps
DevOps is a software development methodology emphasizing collaboration and communication between development and operations teams.
Uncategorized
Domain-Driven Design
Explore the principles and techniques of Domain-Driven Design. Discover how it helps tackle software complexity and aligns it with business needs.
O
Uncategorized
Online Analytical Processing (OLAP)
Discover the power of Online Analytical Processing (OLAP) for data analysis. Gain insights by leveraging multidimensional data modeling and analysis techniques.
Data lakehouse
Open Data
Open data is data that is freely available for anyone to use, reuse, and redistribute without any legal, technological, or financial restrictions. It can be stored in a data lake alongside an organization's other data.
Data Mesh
Open Source Data Mesh
Some commonly used open-source tools for building a data mesh architecture include Apache Kafka and Apache Spark, among others.
S
Data lakehouse
Semantic Layer
This page provides an overview of the benefits of a semantic layer, common use cases, and best practices for building and maintaining a semantic layer.
Uncategorized
Stored Procedures
Learn about stored procedures, their purpose, and their benefits in managing complex database operations, improving maintainability and performance.
Data Engineering
Structured vs. Unstructured Data
Learn the pros and cons of structured and unstructured data and how they are stored in data lakes and data warehouses for analysis.