CUSTOMER STORY

Genomics England Unlocks Life-Saving Research with Dremio’s Data Lakehouse Platform

280,000 participants with unified data access across multiple research programs

400 active research projects supported with automated, granular access controls

80 petabytes of genomic and medical data managed efficiently

The Customer

Genomics England is a pioneering UK life sciences organization that advances genomic medicine and research. Best known for the landmark 100,000 Genomes Project, it manages one of the world’s largest genomic and clinical data ecosystems, working with the NHS and global partners. The organization handles petabyte-scale datasets to enable breakthroughs in rare disease diagnosis, cancer treatment, and health insights. To translate this knowledge into improved patient outcomes, Genomics England relies on secure, high-performance data platforms for thousands of clinicians and researchers. 

The Challenge 

The National Genomic Research Library (NGRL) was built from multiple initiatives, resulting in separate data structures, standards, and participant cohorts. Because the data was historically released as separate tables, large-scale analyses were difficult and smaller cohorts were often overlooked. Researchers struggled to understand data context and relationships across programs, which spanned up to 80 years of medical history and included over 6,000 phenotypic data fields.

Operating in a highly secure and regulated environment also introduced complex authentication requirements. Genomics England needed to integrate Dremio with a wide range of third-party bioinformatics tools while preserving centralized identity and access management. External researchers required an abstracted authentication model to maintain strict security without limiting tool interoperability. 

Finally, access control at scale presented a challenge. With roughly 400 concurrent research projects, each governed by unique participant permissions, the organization had to enforce highly granular entitlements. Managing these rules at the table level created a poor user experience, forcing researchers into complicated joins across fragmented datasets, which slowed analysis and increased the risk of error.

The Solution 

Genomics England modernized its research platform by adopting Dremio’s data lakehouse architecture, using Apache Iceberg and virtual datasets as a unified and scalable foundation. Iceberg table snapshots allow data updates to be synchronized while preserving point-in-time references, a capability essential for research reproducibility over decades. This standardized structure simplifies schema enforcement and presents previously fragmented programs as a cohesive data model.
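The point-in-time behavior described above can be illustrated with a minimal Python sketch. It models a table's snapshot log as a plain list and resolves an "as of" timestamp to the latest snapshot committed at or before it, which is the general idea behind snapshot-based time travel. The `Snapshot` class, the `snapshot_as_of` helper, and the sample IDs and dates are hypothetical; this is not Iceberg's or Dremio's API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical, simplified stand-in for one entry in a table's snapshot log."""
    snapshot_id: int
    committed_at: datetime


def snapshot_as_of(log: Sequence[Snapshot], as_of: datetime) -> Optional[Snapshot]:
    """Return the latest snapshot committed at or before `as_of`, or None if there is none."""
    eligible = [s for s in log if s.committed_at <= as_of]
    return max(eligible, key=lambda s: s.committed_at, default=None)


# Illustrative snapshot log (made-up IDs and dates).
log = [
    Snapshot(101, datetime(2023, 1, 15)),
    Snapshot(102, datetime(2023, 6, 1)),
    Snapshot(103, datetime(2024, 2, 20)),
]

# A study pinned to the end of 2023 keeps resolving to the same snapshot,
# even after later snapshots are committed.
pinned = snapshot_as_of(log, datetime(2023, 12, 31))
```

Pinning an analysis to a snapshot rather than to "the current table" is what lets a result be reproduced later against exactly the same data.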

To strengthen metadata management, the organization implemented an automated cataloging workflow. AWS Glue provides raw metadata and quality reports, which Amazon DataZone’s business catalog enriches with SME-authored forms. Python pipelines then extract this curated information to automatically populate Dremio Wikis with business definitions, column descriptions, and quality metrics, providing a complete metadata layer within the researchers’ restricted environment. 
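As a rough illustration of the last step in that workflow, the sketch below turns curated metadata into a Markdown wiki body. It assumes the wiki content is supplied as Markdown text; the `render_wiki` function, the table and column names, and the metric values are hypothetical, and the calls that would fetch metadata from the catalog or publish the page to Dremio are deliberately left out.

```python
from typing import Mapping


def render_wiki(table: str, definition: str,
                columns: Mapping[str, str],
                quality: Mapping[str, float]) -> str:
    """Render a business definition, column descriptions, and quality metrics as Markdown."""
    lines = [f"# {table}", "", definition, "", "## Columns", ""]
    lines += [f"- `{name}`: {desc}" for name, desc in columns.items()]
    lines += ["", "## Quality metrics", ""]
    # Metrics are assumed to be fractions between 0 and 1 and are shown as percentages.
    lines += [f"- {metric}: {value:.1%}" for metric, value in quality.items()]
    return "\n".join(lines)


# Hypothetical curated metadata for one table.
wiki = render_wiki(
    table="participant",
    definition="One row per consented participant.",
    columns={
        "participant_id": "Pseudonymized participant identifier.",
        "programme": "Research program the participant was recruited through.",
    },
    quality={"completeness": 0.987},
)
```

Generating the page from the catalog, rather than writing it by hand, keeps the definitions researchers see in step with the curated source.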

Authentication was seamlessly integrated. Internal users authenticate via Okta using OIDC/SSO, while external researchers access bioinformatics tools through an OAuth 2.0 token exchange. This abstracts authentication for external users while retaining centralized control within Genomics England’s secure infrastructure. 
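For readers unfamiliar with the pattern, the sketch below builds the form-encoded body of a token exchange request using the parameter names defined in RFC 8693. Whether Genomics England's flow follows RFC 8693 exactly is an assumption, as are the `build_token_exchange_body` helper, the placeholder token, and the `dremio` audience value. No request is sent; the token endpoint and client authentication are out of scope.

```python
from urllib.parse import parse_qs, urlencode

# Identifiers defined by RFC 8693 (OAuth 2.0 Token Exchange).
TOKEN_EXCHANGE_GRANT = "urn:ietf:params:oauth:grant-type:token-exchange"
ACCESS_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:access_token"


def build_token_exchange_body(subject_token: str, audience: str) -> str:
    """Build the application/x-www-form-urlencoded body for a token exchange request."""
    params = {
        "grant_type": TOKEN_EXCHANGE_GRANT,
        "subject_token": subject_token,            # token the caller already holds
        "subject_token_type": ACCESS_TOKEN_TYPE,
        "requested_token_type": ACCESS_TOKEN_TYPE,
        "audience": audience,                      # hypothetical target service name
    }
    return urlencode(params)


# Placeholder values for illustration only.
body = build_token_exchange_body("external-idp-token", "dremio")
decoded = parse_qs(body)
```

The external tool only ever handles the token it is given; the exchange for a platform-scoped token happens centrally, which is what keeps identity control in one place.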

Fine-grained access control is enforced through Dremio’s native integration with AWS Lake Formation, enabling row-level permissions at scale without performance trade-offs. Researchers simply query data; Dremio applies Lake Formation policies in real time, returning only the approved participant records. The setup required minimal configuration and operates transparently, delivering strong governance with no reduction in usability. 
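Conceptually, row-level enforcement means the same query returns a different set of rows depending on the project it runs under. The sketch below shows that idea in plain Python. It is only an illustration: on the platform the filtering is applied by the query engine from Lake Formation policies, not by client code, and the `visible_rows` function, project IDs, and participant IDs here are hypothetical.

```python
from typing import Dict, List, Mapping, Sequence, Set

Row = Dict[str, object]


def visible_rows(rows: Sequence[Row], project_id: str,
                 approved: Mapping[str, Set[str]]) -> List[Row]:
    """Return only the rows whose participant is approved for the given project."""
    allowed = approved.get(project_id, set())  # unknown projects see nothing
    return [row for row in rows if row["participant_id"] in allowed]


# One unified participant table (made-up data).
participants: List[Row] = [
    {"participant_id": "P1", "programme": "rare_disease"},
    {"participant_id": "P2", "programme": "cancer"},
    {"participant_id": "P3", "programme": "rare_disease"},
]

# Per-project entitlements (made-up data).
approved = {
    "project_a": {"P1", "P3"},
    "project_b": {"P2"},
}

rows_for_a = visible_rows(participants, "project_a", approved)
rows_for_b = visible_rows(participants, "project_b", approved)
```

Because entitlements are applied per row, every project can query the same unified table instead of receiving its own copy or joining across fragments.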

Results 

The adoption of Dremio’s lakehouse platform has dramatically improved research efficiency at Genomics England. Researchers now work from unified participant tables that eliminate the need for complex joins and ensure smaller cohorts are no longer overlooked. With a rich metadata catalog providing immediate context for more than 6,000 phenotypic fields, time-to-insight has accelerated significantly across research teams. 

Operational agility has also increased. Virtual datasets allow rapid iteration on data models without heavy ETL pipelines or additional storage overhead, and schema updates automatically cascade to all dependent views. As a result, multiple representations, such as FHIR and OMOP, can be generated from the same Iceberg tables while maintaining consistency and reducing maintenance effort.

Governance has strengthened through automated, real-time access controls that adjust to project status changes and participant consent updates. With approximately 400 concurrent projects, each with unique permission profiles, Dremio and Lake Formation ensure that all data access remains tightly governed, fully auditable, and compliant with stringent healthcare regulations. 

The platform has proven highly scalable, supporting more than 80 petabytes of genomic and clinical data with capacity for continued growth. Genomics England is now evaluating the integration of its variant store into Dremio to further streamline the researcher experience and reduce the number of specialized tools required. Looking ahead, the organization plans to adopt Dremio’s workload management for power users, introduce AI-generated queries to support non-technical researchers, and use query-level metrics to drive data quality improvements, all in support of accelerating life-saving genomic discoveries.
