11 minute read · July 9, 2024

Why a Cyber Lakehouse? | Dremio & VAST Data: Transforming Cybersecurity

Mark Shainman · Principal Product Marketing Manager

Over the years, cybersecurity capabilities have evolved from single-point solutions to comprehensive cyber data platforms utilizing advanced analytic-based technologies. With the exponential growth in the volume, variety, and complexity of cyber-relevant data, cybersecurity professionals must leverage cutting-edge data platform technologies to address their needs effectively and economically. In today’s digital age, virtually all data holds cybersecurity significance, necessitating a new approach to handling the vast amounts of data that need to be analyzed.

The Emergence of Data Lakehouses
A data lakehouse is an architectural approach that combines the performance, functionality, and governance features of a data warehouse with the scalability, flexibility, and cost-efficiency of a data lake. Data lakehouses have become the preferred way to store, manage, and analyze vast amounts of structured, unstructured, and event-based data (such as logs), enabling efficient data processing and analytics. Scale-out computing elements of the lakehouse allow organizations to meet performance demands without exceeding budgetary constraints related to fixed computing resources. The data lakehouse model mitigates data silos, especially with cyber data, reduces the need for complex data movements, and allows organizations to perform real-time analytics on diverse data types.

Next-Generation Cyber Lakehouse
A next-generation cyber lakehouse equips cybersecurity professionals with a robust technology platform to collect, store, analyze, manage, alert, and act on their cybersecurity data at a massive scale in a hybrid cloud environment. Here’s what an effective cyber lakehouse should offer:

Data Ingestion
An effective cyber lakehouse must support seamless ingestion of large volumes of data from various sources, including agents, packet capture (PCAP), network, security, observability, and event pipelines. This capability extends to data located both on-premises and in the cloud, ensuring flexible and comprehensive data integration. Ingestion must sustain high volume and velocity so that collection is lossless, and it must scale to meet evolving data demands.

Data Curation
The system should be able to store data in multiple formats and transform it on demand, including any preprocessing required for known analytical needs. It should write raw data efficiently to scale-out storage and then transform it cost-effectively as needed. This capability ensures appropriate data processing for cyber analytics requirements.
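The "store raw, transform on demand" pattern described above can be sketched in a few lines of Python. This is only an illustration, not how Dremio or VAST implement it: the sample log lines, the field names (`ts`, `src`, `action`, `ok`), and the helper names are all hypothetical.

```python
import json

# Hypothetical raw events, kept exactly as they arrived (one JSON document per line).
RAW_LINES = [
    '{"ts": "2024-07-09T12:00:00Z", "src": "10.0.0.5", "action": "login", "ok": false}',
    '{"ts": "2024-07-09T12:00:02Z", "src": "10.0.0.5", "action": "login", "ok": false}',
    'not valid json',
    '{"ts": "2024-07-09T12:00:09Z", "src": "10.0.0.7", "action": "login", "ok": true}',
]


def parse_on_demand(lines):
    """Lazily parse raw lines at read time; the stored lines are never modified."""
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines here; the raw copy is still retained
        if isinstance(event, dict):
            yield event


def failed_logins(lines):
    """One possible transformation, applied only when an analyst asks for it."""
    return [
        e for e in parse_on_demand(lines)
        if e.get("action") == "login" and e.get("ok") is False
    ]


if __name__ == "__main__":
    for event in failed_logins(RAW_LINES):
        print(event["ts"], event["src"])
```

Because the raw lines stay untouched, a different transformation can be applied later without re-collecting the data.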

Data Optimization
Dynamic optimization of data layout, summarization, and lifecycle management is crucial to meet continuously evolving analytics and compliance requirements.

Data Analysis
The ability to conduct analytics on both structured and unstructured data at scale for alerting, reporting, and root cause analysis is key. It must deliver:

  • Fast and efficient querying of cyber data.
  • Effective summarizations and aggregations.
  • Quick identification of specific data points within large datasets.
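As a rough illustration of the last two requirements, the sketch below aggregates traffic per source and builds a simple lookup index in plain Python. The events, the field names (`src_ip`, `dst_port`, `bytes`), and both helper functions are hypothetical; a real lakehouse would do this in its query engine rather than in application code.

```python
from collections import Counter

# Hypothetical network events used only for illustration.
EVENTS = [
    {"src_ip": "10.0.0.5", "dst_port": 22, "bytes": 1200},
    {"src_ip": "10.0.0.5", "dst_port": 22, "bytes": 800},
    {"src_ip": "10.0.0.7", "dst_port": 443, "bytes": 5000},
    {"src_ip": "10.0.0.9", "dst_port": 22, "bytes": 300},
]


def bytes_by_source(events):
    """Summarization: total bytes sent per source address."""
    totals = Counter()
    for event in events:
        totals[event["src_ip"]] += event["bytes"]
    return totals


def build_index(events, key):
    """Point lookup: group events by one field so a value can be found without a full scan."""
    index = {}
    for event in events:
        index.setdefault(event[key], []).append(event)
    return index


if __name__ == "__main__":
    print(bytes_by_source(EVENTS).most_common(1))
    print(len(build_index(EVENTS, "dst_port")[22]))
```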

Connectivity to Third-Party Systems
Connecting other analytics systems through a commonly used language, such as SQL, eliminates time-consuming and costly data movement, data duplication, and redundant analytics. This ensures data availability for Security Information and Event Management (SIEM); Security Orchestration, Automation, and Response (SOAR); Extended Detection and Response (XDR); and other security solutions.
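To make the "one copy of the data, many SQL consumers" idea concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module purely as a stand-in for a SQL endpoint. The table `auth_events`, its columns, and the two query functions are hypothetical and are not part of any Dremio or VAST interface.

```python
import sqlite3

# In-memory SQLite database standing in for a shared SQL endpoint (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE auth_events (ts TEXT, user_name TEXT, src_ip TEXT, success INTEGER)"
)
conn.executemany(
    "INSERT INTO auth_events VALUES (?, ?, ?, ?)",
    [
        ("2024-07-09T12:00:00Z", "alice", "10.0.0.5", 0),
        ("2024-07-09T12:00:01Z", "alice", "10.0.0.5", 0),
        ("2024-07-09T12:00:02Z", "bob", "10.0.0.5", 0),
        ("2024-07-09T12:00:05Z", "bob", "10.0.0.7", 1),
    ],
)


def alerting_query(connection, threshold):
    """What an alerting tool might ask: sources with at least `threshold` failed logins."""
    return connection.execute(
        "SELECT src_ip, COUNT(*) AS failures FROM auth_events "
        "WHERE success = 0 GROUP BY src_ip HAVING COUNT(*) >= ? "
        "ORDER BY failures DESC",
        (threshold,),
    ).fetchall()


def dashboard_query(connection):
    """What a dashboard might ask of the same table: login attempts per user."""
    return connection.execute(
        "SELECT user_name, COUNT(*) FROM auth_events "
        "GROUP BY user_name ORDER BY user_name"
    ).fetchall()


if __name__ == "__main__":
    print(alerting_query(conn, 3))
    print(dashboard_query(conn))
```

Both consumers read the same table through SQL, so neither needs its own exported copy of the events.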

Data Sharing
Due to regulatory and practical concerns, the ability to share cyber data on demand with internal and external entities is increasingly important. A cyber lakehouse supports direct access to and sharing of data, as well as large-volume exports for sharing, without overloading compute resources or complicating access for internal SIEM and analytics systems.

VAST Data and Dremio
Dremio and VAST Data deliver a powerful cybersecurity solution compatible with any SQL-compliant SIEM, analytics engine, alerting system, or visualization tool. This combined platform leverages the strengths of each component to provide real-time and continuous cyber insights. Systems such as Splunk, QRadar, Grafana, and Elastic seamlessly integrate with the joint solution. The Cyber Lakehouse is designed to address modern cybersecurity challenges by combining real-time data ingestion, high-performance analytics, and seamless integration with SIEM systems.

Real-Time Data Ingestion and Curation
The VAST Data Platform is an advanced, AI-driven solution that integrates storage and compute functionalities. It efficiently manages massive volumes of both structured and unstructured data at exabyte scale, ensuring high levels of performance and scalability. Leveraging its Disaggregated Shared Everything (DASE) architecture, VAST Data employs Storage Class Memory (SCM) as a write buffer in front of flash storage, enabling scalable, high-performance data ingestion. This architecture is ideal for cyber applications that require immediate insights and actions.

Dremio Unified Lakehouse Platform
Dremio enables high-speed, automated data summarization at scale, facilitating the easy application of complex data transformations such as filtering, sorting, aggregating, joining, and casting. These transformations can be quickly built as Dremio Views—governed virtual datasets with layered transformations—allowing for straightforward structured data access and analysis once the data is ingested and stored in VAST.

Data Optimization and Governance
Dremio's Enterprise Data Catalog delivers robust data optimization and governance to support efficient and reliable cyber operations. The platform leverages the Apache Iceberg table format, enabling dynamic schema and partition evolution without disrupting ongoing processes. This ensures organizations can adapt their data structures seamlessly as needs evolve. Additionally, the platform provides comprehensive governance capabilities with built-in version control and metadata management, enhancing data reliability and traceability across hybrid and multi-cloud environments.

The VAST Data Platform ensures data integrity and security with advanced features like Attribute-Based Access Control (ABAC), granular access control, and intelligent threat detection. By leveraging ABAC, the platform provides detailed access control policies that consider user attributes, environmental conditions, and resource attributes, ensuring that only authorized users can access sensitive data.
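The three attribute groups named above (user, environment, resource) can be combined into an access decision along the following lines. This is a generic ABAC sketch, not the VAST policy engine or its syntax; the attribute names (`team`, `clearance`, `owner_team`, `sensitivity`, `network`) and the rule itself are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    user: dict         # attributes of the person or service asking
    resource: dict     # attributes of the data being requested
    environment: dict  # conditions at request time


def is_allowed(req):
    """Hypothetical policy: same team, sufficient clearance, and a trusted network."""
    team = req.user.get("team")
    same_team = team is not None and team == req.resource.get("owner_team")
    # A resource with no sensitivity attribute is treated as maximally sensitive (deny by default).
    cleared = req.user.get("clearance", 0) >= req.resource.get("sensitivity", float("inf"))
    trusted_network = req.environment.get("network") == "corporate"
    return same_team and cleared and trusted_network


if __name__ == "__main__":
    pcap = {"owner_team": "soc", "sensitivity": 3}
    analyst = {"team": "soc", "clearance": 3}
    print(is_allowed(Request(analyst, pcap, {"network": "corporate"})))
    print(is_allowed(Request(analyst, pcap, {"network": "public-wifi"})))
```

Unlike a role check, the decision changes when any single attribute changes, for example when the same analyst connects from an untrusted network.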

Dremio Reflections
Dremio Reflections are optimized relational caches: pre-computed representations of data that significantly accelerate query performance. By automatically leveraging these Reflections during query execution, Dremio reduces query response times to sub-second levels, enabling real-time analytics and efficient use of system resources. In the cyber lakehouse, Reflections can also map and transform event data collected by the VAST Data Platform into the Open Cybersecurity Schema Framework (OCSF) format, enhancing data query efficiency and performance. OCSF is a vendor-agnostic core security schema that simplifies data ingestion and normalization, making it easier for security teams to analyze and respond to threats. These transformations are saved as Dremio Views, ensuring consistent data formatting and optimization for analysis.
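The kind of normalization OCSF is meant to enable can be sketched as a simple field mapping. In the joint solution this mapping would be expressed as a Dremio View in SQL; the Python below only illustrates the idea. The raw field names (`ts`, `user`, `src`, `ok`) are hypothetical, and the OCSF identifiers and attribute names reflect one reading of the Authentication event class, so they should be verified against the published schema before use.

```python
from datetime import datetime, timezone

# Assumed OCSF identifiers for the Authentication class; verify against the published schema.
OCSF_AUTHENTICATION_CLASS_UID = 3002
OCSF_ACTIVITY_LOGON = 1
OCSF_STATUS_SUCCESS = 1
OCSF_STATUS_FAILURE = 2


def to_ocsf_like(raw):
    """Map one hypothetical raw login event to an OCSF-style record."""
    ts = datetime.strptime(raw["ts"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return {
        "class_uid": OCSF_AUTHENTICATION_CLASS_UID,
        "activity_id": OCSF_ACTIVITY_LOGON,
        "time": int(ts.timestamp() * 1000),  # epoch milliseconds
        "user": {"name": raw["user"]},
        "src_endpoint": {"ip": raw["src"]},
        "status_id": OCSF_STATUS_SUCCESS if raw["ok"] else OCSF_STATUS_FAILURE,
    }


if __name__ == "__main__":
    raw_event = {"ts": "2024-07-09T12:00:00Z", "user": "alice", "src": "10.0.0.5", "ok": False}
    print(to_ocsf_like(raw_event))
```

Once every source is mapped to the same shape, a single query or detection rule can cover events from all of them.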

Data Analysis
Dremio’s intelligent query engine provides sub-second analytics performance on cyber data, enabling live and interactive queries on cloud and on-premises data through advanced query acceleration and optimization techniques, such as Reflections and C3 Columnar Cloud Cache. This, paired with the VAST DataBase’s columnar storage, supports high-performance querying and filtering, essential for large-scale data analytics. The VAST DataBase accommodates both transactional and analytical workloads, combining the capabilities of a traditional database, data warehouse, and data lake.

Connectivity to Third-Party Systems
Dremio provides unified data access, enabling data visualization, federation, and interactive analytics across all organizational datasets. Its extensive connector ecosystem supports numerous integrations with diverse data sources, including object storage, metastores, and databases, both in the cloud and on-premises. Moreover, coupled with Dremio’s robust semantic layer, cyber users can seamlessly access and analyze data from multiple sources as a unified dataset. This integration allows for comprehensive insights and real-time cyber analytics without the need for complex data movement or transformations.

Enhanced Security Posture
The combined solution enhances proactive threat detection capabilities by leveraging VAST’s real-time data ingestion and Dremio’s high-performance analytics to identify and mitigate threats before significant damage can occur. Integration with Splunk provides continuous monitoring and real-time visibility into the organization’s security posture, enabling swift detection and response to potential threats.

Scalability and Flexibility
Both Dremio and VAST are designed to scale with the growing needs of modern organizations. Dremio’s cloud-native architecture allows for elastic scaling of compute and storage resources, while VAST’s platform supports exabyte-scale data management. The combined solution supports various data types, making it suitable for diverse use cases beyond cybersecurity, including real-time fraud detection, customer analytics, and IoT analytics.

Cost-Effective Scaling
Decoupling the compute layer from the storage layer enables organizations to use Dremio as a high-performance SQL query engine integrated with VAST Data’s scalable and cost-effective data storage platform. This approach offers several advantages over traditional SIEM solutions:

  • Cost Optimization: By separating compute and storage, organizations can scale these components independently, optimizing costs for specific workloads.
  • Flexibility and Scalability: The decoupled architecture adapts to changing requirements without being locked into a monolithic SIEM solution.
  • Performance and Efficiency: Dremio’s high-performance query engine handles complex queries across large datasets efficiently.
  • Open Architecture: Both Dremio and VAST Data embrace open standards and interoperability, fostering flexibility and future-proofing.

Conclusion
Dremio and VAST Data’s cyber lakehouse represents a paradigm shift in collecting, managing, and analyzing data for cybersecurity insights. It offers cybersecurity professionals a powerful, scalable, and cost-efficient solution for managing and analyzing massive volumes of structured and unstructured data. The cyber lakehouse provides users with real-time, interactive analytical insights into cyber data by leveraging features such as high-speed data ingestion, automated data summarization, and dynamic data optimization. The platform’s intelligent query engine and query acceleration technologies, Dremio Reflections and the C3 Columnar Cloud Cache, ensure sub-second query response times, making it ideal for immediate insights and actions in cyber applications. Moreover, the extensive connector ecosystem ensures seamless integration with SIEM systems. When coupled with a robust semantic layer, it facilitates seamless data access and visualization across diverse sources, ensuring comprehensive and unified cyber analytics. This integrated approach enhances threat detection and response capabilities and supports cost-effective and scalable data management, making the Dremio and VAST Data cyber lakehouse ideal for any organization facing today’s cybersecurity challenges.

Book a meeting today to explore whether a cyber lakehouse is the right solution for your use case.
