Compression, Dedupe and Encryption Conundrums in Cloud Data Lakes

Cloud data lake footprints are measured in exabytes and growing exponentially, and companies pay billions of dollars to store and retrieve data. In this talk, we will cover some of the space and time optimizations that have historically been applied to on-premises file storage, and how they can be applied to objects stored in cloud data lakes.

Deduplication and compression are techniques that have traditionally been used to reduce the amount of storage used by applications. Data encryption is table stakes for any remote storage offering, and today cloud providers support both client-side and server-side encryption.

Combining compression, encryption and deduplication for object stores in the cloud is challenging due to the nature of overwrites and versioning, but the right strategy can save an organization millions of dollars. We will cover some strategies for employing these techniques, depending on whether an organization prefers client-side or server-side encryption, and discuss online and offline deduplication of objects.

Companies such as Box and Netflix employ a subset of these techniques to reduce their cloud footprint and provide agility in their cloud operations.
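To make the interaction between deduplication, compression and client-side encryption concrete, here is a minimal sketch of one possible ordering: hash the plaintext for deduplication, then compress, then encrypt. It is not taken from the talk and is not how Box or Netflix necessarily do it. `DedupStore`, `toy_encrypt` and `toy_decrypt` are hypothetical names, and the cipher is a deliberately insecure stand-in built from the standard library so the example runs without extra dependencies; a real system would use an authenticated cipher from a vetted library.

```python
import hashlib
import os
import zlib


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream: SHA-256 over key + nonce + counter. NOT secure, illustration only.
    stream = bytearray()
    counter = 0
    while len(stream) < length:
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(stream[:length])


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    # A fresh random nonce per call means identical inputs yield different ciphertexts,
    # which is why deduplicating on ciphertext does not work.
    nonce = os.urandom(16)
    stream = _keystream(key, nonce, len(data))
    return nonce + bytes(a ^ b for a, b in zip(data, stream))


def toy_decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


class DedupStore:
    """Hypothetical in-memory object store: dedupe on plaintext digest, compress, encrypt."""

    def __init__(self, key: bytes) -> None:
        self.key = key
        self.blobs: dict[str, bytes] = {}  # plaintext digest -> compressed, encrypted bytes
        self.names: dict[str, str] = {}    # object name -> plaintext digest

    def put(self, name: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blobs:
            # Compress before encrypting: encrypted bytes look random and barely compress.
            self.blobs[digest] = toy_encrypt(self.key, zlib.compress(data))
        self.names[name] = digest
        return digest

    def get(self, name: str) -> bytes:
        blob = self.blobs[self.names[name]]
        return zlib.decompress(toy_decrypt(self.key, blob))


if __name__ == "__main__":
    store = DedupStore(os.urandom(32))
    payload = b"2024-01-01 INFO request served\n" * 1000
    store.put("logs/a.log", payload)
    store.put("logs/b.log", payload)  # same content, second name
    print("stored blobs:", len(store.blobs))
    print("round trip ok:", store.get("logs/b.log") == payload)
```

In this sketch the second `put` adds only a name-to-digest mapping, so identical objects are stored once. Keying deduplication on a plaintext digest does reveal which objects are identical to anyone who can see the digests, which is one of the trade-offs an organization would have to weigh when choosing between client-side and server-side encryption.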

Speakers

Tejas Chopra

Tejas Chopra is a Senior Software Engineer on the Data Storage Platform team at Netflix, where he is responsible for architecting storage solutions to support Netflix Studios and the Netflix Streaming Platform. Prior to Netflix, Tejas designed and implemented the storage infrastructure at Box, Inc. to support a cloud content management platform that scales to petabytes of storage and millions of users. Tejas has worked on distributed file systems and backend architectures in both on-premises and cloud environments at several startups over his career. He is an international keynote speaker and periodically conducts seminars on software development and cloud computing. He holds a master’s degree in Electrical & Computer Engineering from Carnegie Mellon University, with a specialization in Computer Systems.
