Data access restrictions, retention, and encryption at rest are fundamental security controls for data privacy and compliance. This talk shows how we built and use open source Parquet’s finer-grained encryption feature to support all three controls in a unified way.
In particular, we will focus on the technical challenges of designing and applying encryption to large-scale data in a secure, reliable, and efficient manner. Those challenges include multiple access routes, performance overhead, handling of access-denied cases, reliability, large volumes of historical data, and automatic onboarding.
We will also share our experience and recommended practices for managing the system in production at scale.
Xinli Shang is a Tech Lead Manager on the Uber Big Data Infra team and VP of Apache Parquet (PMC Chair). He leads the Apache Parquet community and contributes to several other communities, such as Presto and Trino. He also leads several initiatives on data formats for storage efficiency, security, and performance. He is passionate about tuning large-scale services for performance, throughput, and reliability.
Mohammad Islam is a Sr. Staff Engineer at Uber. He co-leads the Data cost efficiency effort, and also leads Data security and compliance efforts. He is also an Apache Oozie and Tez PMC member.