Data Mastery Hub: Term Resource for Data Professionals
Whether you're a newcomer to the world of big data and data lakes or an experienced pro looking to expand your knowledge, the Dremio Wiki provides insights and guidance for all your data-related needs. Dive in and unlock the power of your data today!
Data Management
Data Rollback
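Data Rollback is a feature that allows businesses to revert their data to a previous state, for example to recover from an erroneous write or a failed pipeline run, so that downstream processing and analytics work from a known-good version.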
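A minimal sketch of the idea, assuming an in-memory store that keeps every committed version as a separate copy. The `VersionedTable` class and its methods are hypothetical and are not a Dremio or table-format API. Rolling back makes an earlier version current again without deleting later history.

```python
import copy


class VersionedTable:
    """Hypothetical in-memory table that records every committed version."""

    def __init__(self, rows):
        # Version 0 is the initial state; each commit appends a new version.
        self._versions = [copy.deepcopy(rows)]
        self._current = 0

    @property
    def rows(self):
        return self._versions[self._current]

    def commit(self, new_rows):
        # Store a deep copy so later mutation of new_rows cannot alter history.
        self._versions.append(copy.deepcopy(new_rows))
        self._current = len(self._versions) - 1
        return self._current

    def rollback(self, version):
        # Point "current" at an earlier version; later versions are kept.
        if not 0 <= version < len(self._versions):
            raise ValueError(f"unknown version {version}")
        self._current = version


table = VersionedTable([{"id": 1, "amount": 100}])
table.commit([{"id": 1, "amount": -999}])  # a bad write
table.rollback(0)
print(table.rows)  # [{'id': 1, 'amount': 100}]
```

Production table formats and databases track versions through metadata rather than full copies, but the user-facing effect of a rollback is the same: reads see the earlier state.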
Network Infrastructure
Data Routing
Data Routing is the process of directing data flows to the appropriate systems for processing and analytics.
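Rule-based routing can be sketched as an ordered list of predicates, each paired with a destination, where the first matching rule wins. The rules, the destination names (`analytics`, `billing`, `archive`), and the `type` field are hypothetical examples chosen for illustration.

```python
from typing import Callable

# Hypothetical routing rules: (predicate, destination name), checked in order.
Rule = tuple[Callable[[dict], bool], str]

RULES: list[Rule] = [
    (lambda r: r.get("type") == "click", "analytics"),
    (lambda r: r.get("type") == "payment", "billing"),
]
DEFAULT_DESTINATION = "archive"


def route(record: dict, rules: list[Rule] = RULES) -> str:
    """Return the destination for a record; the first matching rule wins."""
    for predicate, destination in rules:
        if predicate(record):
            return destination
    return DEFAULT_DESTINATION


def route_all(records: list[dict]) -> dict[str, list[dict]]:
    """Group records into per-destination batches."""
    batches: dict[str, list[dict]] = {}
    for record in records:
        batches.setdefault(route(record), []).append(record)
    return batches
```

For example, `route_all([{"type": "click"}, {"type": "signup"}])` sends the first record to `analytics` and the second to the default `archive` destination. A catch-all default keeps unmatched records from being silently dropped.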
Data Management
Data Sampling
Data Sampling is a technique used to select a subset of data from a larger dataset to perform analysis, processing, or testing.
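One common sampling technique for data that arrives as a stream of unknown length is reservoir sampling (Algorithm R), which keeps a uniform random sample of `k` items in a single pass. The sketch below is one possible implementation; the optional `seed` parameter is added only to make results reproducible.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Uniform random sample of k items from an iterable of unknown length."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep item i with probability k / (i + 1), replacing a random slot.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir
```

For a dataset that already fits in memory, `random.sample(population, k)` from the standard library does the same job more directly.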
Data Management
Data Schema Evolution
Data Schema Evolution is the process of modifying the structure of a database or data warehouse to accommodate changes in data requirements.
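Databases and table formats implement schema evolution in their own ways. The sketch below only illustrates the core idea of reading records written under an older schema through a newer one. The two schema versions, the rename from `name` to `full_name`, and the `country` default are all hypothetical.

```python
# Hypothetical schema v2: renames "name" to "full_name" and adds "country".
SCHEMA_V2 = {"id": None, "full_name": None, "country": "unknown"}  # column -> default
RENAMES_V1_TO_V2 = {"name": "full_name"}


def upgrade_record(record: dict) -> dict:
    """Project a record written under schema v1 onto schema v2."""
    # Apply column renames first.
    renamed = {RENAMES_V1_TO_V2.get(col, col): value for col, value in record.items()}
    # Keep only v2 columns, filling new columns with their defaults;
    # columns that no longer exist in v2 are dropped.
    return {col: renamed.get(col, default) for col, default in SCHEMA_V2.items()}
```

With this approach, `upgrade_record({"id": 1, "name": "Ada"})` yields `{"id": 1, "full_name": "Ada", "country": "unknown"}`, so old data stays readable without being rewritten.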
Machine Learning
Data Science
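Data Science is an interdisciplinary field that applies statistics, programming, and domain knowledge to extract insights from data. A core step in many data science workflows is feature engineering: selecting, manipulating, and transforming raw data into features used in machine learning algorithms to improve model accuracy on unseen data.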
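Two common feature-engineering transformations are min-max scaling for numeric columns and one-hot encoding for categorical ones. The sketch below applies both to raw rows; the `age` and `plan` fields are hypothetical examples.

```python
def min_max_scale(values: list[float]) -> list[float]:
    """Rescale numeric values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values: list[str]) -> list[list[int]]:
    """Encode categorical values as indicator vectors (categories in sorted order)."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


def build_features(rows: list[dict]) -> list[list[float]]:
    """Turn raw rows with 'age' and 'plan' fields into numeric feature vectors."""
    ages = min_max_scale([row["age"] for row in rows])
    plans = one_hot([row["plan"] for row in rows])
    return [[age, *plan] for age, plan in zip(ages, plans)]
```

In a real pipeline, the scaling bounds and category list would be learned from the training data only and then reused on new data, so that evaluation on unseen data stays honest.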
Data Management
Data Scrubbing
Data Scrubbing is the process of identifying and correcting or removing errors, inconsistencies, and inaccuracies in datasets to ensure data quality and reliability.
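A small scrubbing pass usually combines normalization, validation, and de-duplication. The sketch below assumes customer records with hypothetical `name` and `email` fields and uses a deliberately simple email pattern; real validation rules depend on the data.

```python
import re

# Deliberately simple pattern: something@something.something, no whitespace.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def scrub(records: list[dict]) -> list[dict]:
    """Normalize fields, drop invalid rows, and remove duplicates by email."""
    cleaned: list[dict] = []
    seen: set[str] = set()
    for record in records:
        email = (record.get("email") or "").strip().lower()
        name = " ".join((record.get("name") or "").split())  # collapse whitespace
        if not EMAIL_RE.match(email):
            continue  # drop rows whose email is missing or malformed
        if email in seen:
            continue  # drop duplicate customers
        seen.add(email)
        cleaned.append({"name": name.title(), "email": email})
    return cleaned
```

Normalizing before de-duplicating matters: `" ADA@Example.com "` and `"ada@example.com"` only collapse into one record after trimming and lowercasing.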
Data Security
Data Security and Governance Policies
Data Security and Governance Policies are a set of guidelines and practices implemented by organizations to protect their data assets, ensure compliance with regulations, and facilitate effective data processing and analytics.
Data Security
Data Security and Privacy
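Data Security and Privacy refers to protecting data from unauthorized access, alteration, and misuse, and to ensuring that personal information is collected and used in line with regulations. Its benefits, its challenges, and how it integrates with the data lakehouse environments used by data scientists are all key considerations.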
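Two common privacy techniques are pseudonymization, which replaces an identifier with a consistent token, and masking, which hides part of a value. The sketch below uses a keyed hash (HMAC-SHA256) for pseudonymization. The field names are hypothetical, and `SECRET_KEY` is a placeholder; in practice the key would come from a secrets manager, never from source code.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256, hex encoded).

    The same input and key always produce the same token, so joins still work.
    Without the key, an attacker cannot recompute tokens from guessed inputs,
    unlike with a plain unsalted hash.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_email(email: str) -> str:
    """Show only the first character of the local part of an email address."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}"


SECRET_KEY = b"replace-with-a-managed-secret"  # placeholder only
record = {"user_id": "u-1029", "email": "ada@example.com"}
safe = {
    "user_token": pseudonymize(record["user_id"], SECRET_KEY),
    "email": mask_email(record["email"]),
}
```

Pseudonymized data can still count as personal data under some regulations, because whoever holds the key can re-link the tokens.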
Data Management
Data Segregation
Data Segregation is the practice of organizing and separating data based on its attributes or characteristics to optimize data processing and analytics.
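At its simplest, segregation groups records by one attribute so that each group can be stored, secured, or processed separately. The sketch below groups by a hypothetical `region` field; any attribute, such as a sensitivity level, would work the same way.

```python
from collections import defaultdict


def segregate(records: list[dict], key: str) -> dict:
    """Split records into separate groups based on the value of one attribute."""
    groups: defaultdict = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    return dict(groups)


orders = [
    {"id": 1, "region": "eu"},
    {"id": 2, "region": "us"},
    {"id": 3, "region": "eu"},
]
by_region = segregate(orders, "region")
```

Here `by_region["eu"]` holds orders 1 and 3, which could, for example, be written to storage that stays within the EU.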
Data Engineering
Data Serialization
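Data Serialization is the process of converting structured or semi-structured data, such as in-memory records or objects, into a format like JSON, XML, or a binary encoding such as Avro or Protocol Buffers, so that it can be stored or transmitted and later reconstructed through deserialization.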
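A JSON round trip shows both directions of serialization. JSON has no date type, so the sketch stores dates as ISO-8601 strings and parses them back on read. The record layout and the `signup` field are hypothetical.

```python
import json
from datetime import date


def _encode(obj):
    # json cannot encode dates natively, so store them as ISO-8601 strings.
    if isinstance(obj, date):
        return obj.isoformat()
    raise TypeError(f"cannot serialize {type(obj).__name__}")


def serialize(record: dict) -> str:
    """Convert a record to a JSON string with keys in a stable order."""
    return json.dumps(record, default=_encode, sort_keys=True)


def deserialize(text: str) -> dict:
    """Reconstruct a record; the 'signup' field is parsed back into a date."""
    record = json.loads(text)
    record["signup"] = date.fromisoformat(record["signup"])
    return record


original = {"id": 7, "signup": date(2024, 1, 15), "tags": ["beta"]}
payload = serialize(original)
print(payload)  # {"id": 7, "signup": "2024-01-15", "tags": ["beta"]}
```

Because the date-to-string mapping is lost in the JSON text itself, the reader has to know the schema to restore types; self-describing binary formats such as Avro carry that information alongside the data.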
Data Storage
Data Sharding
Data Sharding is a technique for horizontally partitioning large datasets into smaller, more manageable parts, called shards, which are typically distributed across multiple servers or nodes.
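A common way to assign records to shards is to hash a shard key and take the result modulo the number of shards. The sketch below uses SHA-256 because Python's built-in `hash()` is randomized per process for strings, which would make assignments differ between runs. The `user_id` shard key is a hypothetical example.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a shard key to a shard number using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shard_records(records: list[dict], key_field: str, num_shards: int) -> list[list[dict]]:
    """Distribute records into num_shards lists based on their shard key."""
    shards: list[list[dict]] = [[] for _ in range(num_shards)]
    for record in records:
        shards[shard_for(str(record[key_field]), num_shards)].append(record)
    return shards


users = [{"user_id": i} for i in range(100)]
shards = shard_records(users, "user_id", 4)
```

A known drawback of plain modulo hashing is that changing the number of shards reassigns most keys; schemes such as consistent hashing reduce how much data has to move.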
Data Management
Data Silos
Data Silos is a term used to describe isolated repositories of data within an organization that are not easily accessible or interoperable with other systems.

Data Management
Data Skew
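Data Skew is an uneven distribution of data, for example across partitions or key values, in which a few partitions or keys hold far more records than the rest. Because the tasks handling the largest partitions take the longest, skew can slow down data processing and analytics.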
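One simple way to quantify skew is to divide the largest partition size by the mean partition size, where 1.0 means perfectly balanced. The metric choice and the sample data below are illustrative assumptions; the example partitions hypothetical records by country, with one key per partition.

```python
from collections import Counter


def skew_ratio(partition_sizes: list[int]) -> float:
    """Largest partition size divided by the mean size (1.0 = perfectly balanced)."""
    mean = sum(partition_sizes) / len(partition_sizes)
    return max(partition_sizes) / mean if mean else 1.0


# Hypothetical records partitioned by country: one key dominates.
countries = ["us"] * 900 + ["de"] * 50 + ["fr"] * 30 + ["jp"] * 20
sizes = list(Counter(countries).values())  # one partition per distinct key
print(sizes)                        # [900, 50, 30, 20]
print(round(skew_ratio(sizes), 2))  # 3.6
```

In this example, the task handling the `us` partition does 3.6 times the average amount of work, so it finishes last and holds up the whole stage.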
Data Analysis
Data Skewness
Data Skewness is a measure of the asymmetry of a data distribution around its mean. In distributed computing, the term is also used for an imbalance in the distribution of data across partitions or nodes.
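In data analysis, skewness is usually measured statistically. A minimal sketch of the Fisher-Pearson coefficient of skewness, g1 = m3 / m2^(3/2), where m2 and m3 are the second and third central moments, is shown below. Statistical libraries may apply an extra bias correction for small samples, which this version does not.

```python
def skewness(values: list[float]) -> float:
    """Fisher-Pearson coefficient of skewness: g1 = m3 / m2 ** 1.5.

    Positive values indicate a longer right tail, negative values a longer
    left tail, and values near 0 a roughly symmetric distribution.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # all values equal: no spread, treat as symmetric
    return m3 / m2 ** 1.5
```

For example, `skewness([1, 2, 3])` is 0.0 for a symmetric sample, while `skewness([1, 1, 1, 10])` is positive because the single large value stretches the right tail.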
Data Management
Data Snapshot
Data Snapshot is a technology that allows businesses to capture and store a static copy of their data at a specific point in time.
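A snapshot can be sketched as a deep copy of the data taken at a recorded time, exposed read-only so it stays static. The sketch below assumes flat dictionary rows; `take_snapshot` is a hypothetical helper, and `MappingProxyType` only prevents changes at the top level of each row.

```python
import copy
from datetime import datetime, timezone
from types import MappingProxyType


def take_snapshot(rows: list[dict]) -> dict:
    """Capture a static, point-in-time copy of the rows with a UTC timestamp."""
    return {
        "taken_at": datetime.now(timezone.utc),
        # Deep copy so later changes to the live data do not leak into the
        # snapshot; MappingProxyType makes each copied row read-only.
        "rows": tuple(MappingProxyType(copy.deepcopy(r)) for r in rows),
    }


live = [{"id": 1, "qty": 5}]
snap = take_snapshot(live)
live[0]["qty"] = 0  # the live data changes after the snapshot
```

After the update, `snap["rows"][0]["qty"]` is still 5. Storage systems typically avoid full copies by using copy-on-write or versioned metadata, but the observable behavior is the same.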