Data Mastery Hub: Term Resource for Data Professionals

Whether you're a newcomer to the world of big data and data lakes or an experienced pro looking to expand your knowledge, the Dremio Wiki provides insights and guidance for all your data-related needs. Dive in and unlock the power of your data today!

Apache

Apache Mahout

Apache Mahout is a machine learning library that provides data processing and analytics features for businesses that use big data.

Apache MapReduce

Apache MapReduce is a software framework for distributed processing of large datasets on commodity hardware.
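As a rough illustration of the programming model behind that definition, here is a minimal single-process sketch of the map, shuffle, and reduce phases applied to a word count. It does not use the Hadoop API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and chosen only for this example. A real MapReduce job would run these phases in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence on a list of text records."""
    pairs = []
    for doc in documents:
        pairs.extend(map_phase(doc))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "big lakes"]))
```

Because each map call looks at one record and each reduce call looks at one key, the work can in principle be split across machines without the functions needing to know about each other.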

Apache Mesos

Apache Mesos is a distributed systems kernel that abstracts CPU, memory, storage, and other resources, providing efficient resource isolation and sharing across distributed applications.

Apache MRUnit

Apache MRUnit is a Java framework for unit testing MapReduce jobs, providing a simple and effective way to test data processing and analytics workflows.

Apache NiFi

Apache NiFi is a tool for data integration and flow management, with a graphical user interface, large processor library, and data processing capabilities.

Apache NiFi MiNiFi

Apache NiFi MiNiFi is an edge-based data collection and processing agent used to manage and transmit data from remote devices to a central location.

Apache Oozie

Apache Oozie is a workflow scheduler system designed to manage Apache Hadoop jobs.

Apache Parquet

Apache Parquet is a high-performance columnar storage format that enables efficient processing of large datasets and faster query executions.

Apache Phoenix

Apache Phoenix is a high-performance relational database layer over Apache HBase that enables you to run SQL queries on data in the Apache Hadoop ecosystem.

Apache Pig

Apache Pig is a high-level platform for data processing and ETL workflows. Features include the Pig Latin scripting language, user-defined functions (UDFs), scalability, and interoperability.

Apache Pulsar

Apache Pulsar is a distributed pub-sub messaging system that offers a unified messaging platform for streaming data.

Apache Ranger

Apache Ranger is a security framework that provides centralized security policy administration for big data platforms.

Apache S4

Apache S4 is a distributed computing platform that simplifies real-time data processing and analytics.

Apache Samza

Apache Samza is a distributed stream processing framework, ideal for real-time data processing and analytics.

Apache Sentry

Apache Sentry provides role-based access control (RBAC) and data authorization capabilities to processing frameworks like Apache Hadoop, Apache Spark, and Apache Impala.
