November 6, 2025

A Survey of Cybersecurity Data Infra and How to Simplify It with Graph Lakehouse

Modern cybersecurity systems rely heavily on complex data infrastructures to detect threats, analyze risks, and enforce policies at scale. However, these infrastructures are often fragmented, expensive to maintain, and difficult to evolve. This survey examines common practices and architectural patterns in cybersecurity data infrastructure, focusing on log pipelines, SIEM platforms, data lakes, and threat detection engines, and identifies their limitations in handling real-time, interconnected data. We highlight the challenges of achieving high performance, explainability, and scalability in traditional setups. To address these challenges, we propose a graph-based approach built on the Data Lakehouse architecture, integrating technologies such as Nessie, Dremio, Apache Iceberg, and PuppyGraph, a graph query engine optimized for Iceberg tables. By modeling cybersecurity data as a connected graph rather than as isolated logs or events, PuppyGraph enables more intuitive threat detection, faster investigation, and a streamlined architecture with reduced ETL complexity. We present real-world case studies from industry adopters to illustrate the simplification and performance improvements this shift enables.

Topics Covered

Apache Iceberg
Data Analytics
Data Lakehouse
Data Optimization
Modernization and Migration
Streaming Analytics
Use Cases

Sign up to watch all Subsurface 2025 sessions