Data Mastery Hub: Term Resource for Data Professionals
Whether you're a newcomer to the world of big data and data lakes or an experienced pro looking to expand your knowledge, the Dremio Wiki provides insights and guidance for all your data-related needs. Dive in and unlock the power of your data today!
Apache
Apache HBase
Apache HBase is an open-source, column-oriented, distributed database built on top of Apache Hadoop and designed to store and manage massive amounts of unstructured data.
Apache
Apache HCatalog
Apache HCatalog is a metadata and table management system for Hadoop, facilitating the movement of multi-format data between Hadoop and other data processing platforms.
Apache
Apache Hive
Apache Hive is a data warehouse technology for querying and managing large datasets stored in distributed storage systems such as Hadoop.
Apache
Apache HTrace
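Apache HTrace is a tracing framework for distributed systems such as HDFS and HBase. It records how requests propagate across nodes, which helps teams diagnose performance problems in data processing and analytics workloads.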
Apache
Apache Hudi
Apache Hudi (Hadoop upserts, deletes, and incrementals) is an open-source data management framework designed for big data workloads.
Apache
Apache Ignite
Apache Ignite is an in-memory computing platform that can serve as a distributed compute grid, a data caching layer, and an SQL database.
Apache
Apache Impala
Apache Impala is an open-source massively parallel processing (MPP) SQL query engine for data stored in Apache Hadoop-based data lakes.
Apache
Apache Jena
Apache Jena is a Java-based framework for building semantic web and linked data applications. It provides an RDF API, a SPARQL query engine, and rule-based reasoning over RDF data.
Apache
Apache Kafka
Apache Kafka is a highly scalable, fault-tolerant distributed messaging system that organizations use to manage large volumes of real-time data.
Apache
Apache Knox
Apache Knox is a security layer for your Hadoop ecosystem, providing a single point of authentication and security.
Apache
Apache Knox Gateway
Apache Knox Gateway is an API Gateway for interacting with Hadoop clusters. It helps secure and simplify data access for enterprises.
Apache
Apache Kudu
Apache Kudu is a columnar storage manager for Hadoop that enables real-time analytics and data processing.
Apache
Apache Kylin
Apache Kylin is a big data analytics and processing engine, supporting large-scale data warehousing with sub-second query latency.
Apache
Apache Livy
Apache Livy is a RESTful web service that lets data scientists and developers submit jobs to Apache Spark and interact with Spark sessions over HTTP, without direct access to the cluster.
Apache
Apache Lucene
Apache Lucene is a powerful, open-source information retrieval library that provides easy-to-use and scalable search capabilities to applications.