Data Mastery Hub: Term Resource for Data Professionals
Whether you're a newcomer to the world of big data and data lakes or an experienced pro looking to expand your knowledge, the Dremio Wiki provides insights and guidance for all your data-related needs. Dive in and unlock the power of your data today!
Apache
Apache Mahout
Apache Mahout is a library of scalable machine learning algorithms, covering tasks such as clustering, classification, and collaborative filtering, for businesses that work with big data.
Apache MapReduce
Apache MapReduce is a software framework for distributed processing of large datasets on commodity hardware.
Apache Mesos
Apache Mesos is a distributed systems kernel that abstracts CPU, memory, storage, and other resources, providing efficient resource isolation and sharing across distributed applications.
Apache MRUnit
Apache MRUnit is a Java framework for unit testing MapReduce jobs, providing a simple and effective way to test data processing and analytics workflows.
Apache NiFi
Apache NiFi is a tool for data integration and flow management that offers a graphical user interface, a large processor library, and data processing capabilities.
Apache NiFi MiNiFi
Apache NiFi MiNiFi is an edge-based data collection and processing agent used to manage and transmit data from remote devices to a central location.
Apache Oozie
Apache Oozie is a workflow scheduler system designed to manage Apache Hadoop jobs.
Apache Parquet
Apache Parquet is a high-performance columnar storage format that enables efficient processing of large datasets and faster query execution.
Apache Phoenix
Apache Phoenix is a high-performance relational database layer over Apache HBase that enables you to run SQL queries on data in the Apache Hadoop ecosystem.
Apache Pig
Apache Pig is a high-level platform for data processing and ETL workflows on Apache Hadoop. Its features include the Pig Latin scripting language, user-defined functions (UDFs), scalability, and interoperability.
Apache Pulsar
Apache Pulsar is a distributed pub-sub messaging system that offers a unified messaging platform for streaming data.
Apache Ranger
Apache Ranger is a security framework that provides centralized security policy administration for big data platforms.
Apache S4
Apache S4 is a distributed stream computing platform that simplifies real-time data processing and analytics.
Apache Samza
Apache Samza is a distributed stream processing framework, ideal for real-time data processing and analytics.
Apache Sentry
Apache Sentry provides role-based access control (RBAC) and data authorization capabilities to processing frameworks like Apache Hadoop, Apache Spark, and Apache Impala.