December 10, 2025

COBOL to Cloud: Architecting Insurance Data Lakehouses for Legacy Modernization

Transforming 30-year-old insurance systems into modern data lakehouses presents unique challenges when processing $100M+ daily transactions. This presentation details how we at DXC Technology migrated a major insurance conglomerate from mainframe COBOL to a cloud-native lakehouse architecture. Our approach combined Apache Iceberg for versioned policy data, Apache Spark for distributed claims processing, and streaming ingestion for real-time updates—all while maintaining zero downtime during the two-year transformation.

The migration leveraged lakehouse capabilities to solve legacy system limitations. Using Iceberg’s schema evolution, we incrementally modernized data models without breaking downstream systems. Apache Arrow enabled efficient data exchange between legacy mainframes and cloud services, while Spark Structured Streaming processed 2 million daily policy updates. The results were transformative: policy issuance time dropped from 5 days to 4 hours, claims processing accelerated from 3 weeks to 2 days, and system availability rose to 99.9%.

Critical success factors included our CDC-based data synchronization maintaining consistency between legacy and lakehouse systems, Iceberg’s partition evolution supporting gradual migration of historical data spanning decades, and Apache Parquet optimization reducing storage costs by 80%. I’ll share specific techniques for extracting business logic from COBOL, implementing the strangler pattern with lakehouse architectures, and managing stakeholder expectations during transformation. Attendees will gain practical insights for modernizing legacy systems using open-source lakehouse technologies, ensuring data quality during migration, and achieving business continuity while fundamentally transforming enterprise data infrastructure.
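The CDC-based synchronization described above boils down to replaying ordered change events from the legacy system onto the lakehouse copy. The following toy sketch shows that apply step; the event shape and field names are illustrative assumptions, not the actual feed format.

```python
# Toy sketch of the CDC apply step: replay insert/update/delete events
# captured from the mainframe (in commit order) onto the lakehouse copy
# of the policy table, keeping both systems consistent during the
# strangler-pattern migration. Event schema is hypothetical.

def apply_cdc(snapshot, events):
    """Apply ordered CDC events to a snapshot keyed by policy_id."""
    state = dict(snapshot)  # don't mutate the caller's snapshot
    for event in events:
        op, key = event["op"], event["policy_id"]
        if op in ("insert", "update"):
            state[key] = event["after"]   # upsert the new row image
        elif op == "delete":
            state.pop(key, None)          # tolerate already-deleted keys
    return state

snapshot = {"P1": {"status": "active", "premium": 1200}}
events = [
    {"op": "update", "policy_id": "P1",
     "after": {"status": "lapsed", "premium": 1200}},
    {"op": "insert", "policy_id": "P2",
     "after": {"status": "active", "premium": 800}},
    {"op": "delete", "policy_id": "P1"},
]
synced = apply_cdc(snapshot, events)
```

In production this apply step typically becomes a MERGE INTO against the Iceberg table rather than an in-memory dictionary, but the ordering and upsert/delete semantics are the same.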

Topics Covered

Data Analytics
Data Lake
Lakehouse
Streaming Analytics
