Hortonworks Data Platform

What is Hortonworks Data Platform?

Hortonworks Data Platform (HDP) is an open-source platform designed to manage, process, and analyze large data sets (big data). It delivers a robust platform for multi-workload data processing, spanning batch, interactive, and real-time methods, supported by security, governance, and operations capabilities.

History

HDP was developed and first released by Hortonworks Inc. in June 2012. Hortonworks, originally a spin-off from Yahoo, aimed to broaden the adoption of Apache Hadoop, a popular big data processing framework. Hortonworks merged with Cloudera in January 2019, and the final release of the platform, HDP 3.1.5, followed in late 2019.

Functionality and Features

  • Comprehensive Data Processing: HDP supports multiple data processing paradigms, including batch processing, interactive querying, and real-time analytics (see the interactive-query sketch after this list).
  • Data Governance: HDP includes robust tools for data governance and security, ensuring data is handled properly and securely.
  • Scalability: HDP is highly scalable and can accommodate growing data volumes with ease.
  • Open Source: HDP is completely open source, giving businesses the freedom to modify and extend the platform as needed.
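
To make the interactive-querying point concrete, here is a minimal sketch that submits a SQL query to HiveServer2 through the standard Hive JDBC driver bundled with HDP. The host, port, database, table, and user are hypothetical placeholders, not details from this article.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryExample {
    public static void main(String[] args) throws Exception {
        // Register the Hive JDBC driver (requires hive-jdbc on the classpath).
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // Hypothetical HiveServer2 endpoint; substitute your own host and database.
        String url = "jdbc:hive2://hiveserver.example.com:10000/default";

        try (Connection conn = DriverManager.getConnection(url, "analyst", "");
             Statement stmt = conn.createStatement();
             // Hypothetical table, used purely for illustration.
             ResultSet rs = stmt.executeQuery(
                     "SELECT page, COUNT(*) AS hits FROM web_logs GROUP BY page")) {
            while (rs.next()) {
                System.out.printf("%s\t%d%n", rs.getString("page"), rs.getLong("hits"));
            }
        }
    }
}
```

The same connection string serves both ad hoc interactive queries and scheduled batch jobs, which illustrates the multi-workload idea.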

Architecture

The HDP architecture includes the Hadoop Distributed File System (HDFS) for data storage, YARN for resource management, and components for different data processing methods such as MapReduce, Hive, HBase, and Storm. It provides a shared storage and compute layer built on commodity hardware.
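
As a hedged illustration of that shared storage layer, the sketch below writes a small file to HDFS through Hadoop's standard FileSystem API. The path is a hypothetical example, and the client picks up the cluster address from the usual core-site.xml/hdfs-site.xml configuration files.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        // Reads fs.defaultFS (the NameNode address) from core-site.xml on the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical landing path; any engine in the stack (MapReduce, Hive,
        // HBase) can read this file back through the same shared namespace.
        Path path = new Path("/data/landing/events.txt");
        try (FSDataOutputStream out = fs.create(path, /* overwrite */ true)) {
            out.writeBytes("example event record\n");
        }
        System.out.println("Wrote " + path + " ("
                + fs.getFileStatus(path).getLen() + " bytes)");
    }
}
```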

Benefits and Use Cases

HDP is suitable for many use cases, including data discovery, data warehousing optimization, and advanced analytics. It offers businesses the opportunity to harness the value in big data and draw insights from both structured and unstructured data. It's known for its robustness, scalability, and flexibility in handling a variety of workloads.

Challenges and Limitations

Like any technology, HDP has its limitations. Though powerful, it can be complex to set up and manage. The platform requires significant resources to run effectively and can be daunting for businesses with smaller IT teams or less technical expertise.

Integration with Data Lakehouse

HDP can be used as the underlying platform for a data lakehouse. The flexibility, multi-faceted data processing capabilities, and robustness of HDP make it an excellent choice for organizations implementing a data lakehouse architecture.

Security Aspects

HDP includes built-in security features, such as Kerberos for authentication, Apache Ranger for authorization, and Apache Knox for gateway services. Additionally, data encryption at rest and in transit ensures data is safeguarded at all stages.
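
For instance, on a Kerberized HDP cluster a client application typically authenticates from a keytab before it can reach HDFS or Hive. Below is a minimal sketch using Hadoop's UserGroupInformation API; the principal name and keytab path are hypothetical and would come from your KDC administrator.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosLoginExample {
    public static void main(String[] args) throws Exception {
        // Tell the Hadoop client libraries that the cluster requires Kerberos.
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // Hypothetical service principal and keytab location.
        UserGroupInformation.loginUserFromKeytab(
                "etl-service@EXAMPLE.COM", "/etc/security/keytabs/etl.keytab");

        System.out.println("Authenticated as: "
                + UserGroupInformation.getCurrentUser().getUserName());
    }
}
```

Once authenticated this way, subsequent HDFS or Hive calls from the process carry the Kerberos identity, which authorization layers such as Ranger evaluate against their policies.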

Performance

HDP delivers high performance on commodity hardware, though performance varies with the workload and the hardware configuration.

Frequently Asked Questions

  • What is Hortonworks Data Platform? Hortonworks Data Platform is an open-source platform that provides comprehensive data processing functionality, from batch processing to real-time analytics.
  • Who developed HDP? HDP was developed by Hortonworks Inc., a company that spun off from Yahoo.
  • What are the benefits of HDP? HDP offers robust, flexible, and scalable solutions for big data processing. It also includes powerful security and governance tools.
  • How does HDP integrate with a data lakehouse? HDP can be used as the underlying platform for a data lakehouse, thanks to its flexibility and various data processing capabilities.
  • What security measures does HDP have? HDP includes Kerberos for authentication, Apache Ranger for authorization, Apache Knox for gateway services, and encryption for data protection.

Glossary

  • Apache Hadoop: An open-source software framework for distributed storage and processing of big data using the MapReduce programming model.
  • Apache Ranger: A framework designed to enable, monitor and manage comprehensive data security across the Hadoop platform.
  • Data Lakehouse: An architecture that combines the best elements of data lakes and data warehouses in one platform.
  • Kerberos: A network authentication protocol designed to provide strong authentication for client/server applications.
  • Hadoop Distributed File System (HDFS): A distributed file system designed to run on commodity hardware, providing fault-tolerant, high-throughput storage for large data sets.