Dremio Blog

5 minute read · October 16, 2020

Your Path to the Cloud Data Lake – Navigating the Thorny Path of Migration

Kevin Petrie Eckerson Group

Let’s say you weighed the pros and cons, and decided to journey to the cloud data lake. Furthermore, you opted for a Cloud Data Lake Engine (CDLE) to run your Business Intelligence (BI) workloads once you get there. Designed and built well, the CDLE delivers performance and efficiency breakthroughs by applying interactive SQL query methods and a consolidated semantic layer to cloud-native object storage.

So far so good. Now how do you move your data?

Cloud migrations present a circuitous and thorny path in the best of circumstances. This blog introduces guidelines for architects and data engineers to plan and execute successful migrations. Given the complexity of the topic, one should treat this as a primer rather than a comprehensive guide. The common themes: think ahead, execute in phases, and adapt your plan based on lessons learned. Above all, remain vigilant on the journey to your data lake in the cloud. Wolves lurk.

Pack Your Bag

Architects and data engineers should start their migration planning process by creating an inventory of all their BI use cases, then profiling the workloads, data sets and business requirements for each. They should identify the use cases that have modest latency and throughput needs, and rely on relatively small volumes of data. Ideally these are departmental use cases that do not directly impact revenue. Plan to migrate those workloads and datasets first.

By starting with a lighter pack, you improve the odds you will get there. Data teams can learn the basic dos/don’ts of migration, and test target performance, without posing significant risks to the business. Higher complexity workloads and datasets can wait until after the first migration proves successful.
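To make the selection step concrete, here is a minimal Python sketch of ranking an inventory so that the lowest-risk workloads migrate first. The `Workload` fields, the workload names and the sort order are illustrative assumptions, not part of any Dremio tooling; a real profile would carry measured latency and throughput figures.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # All fields are hypothetical profile attributes chosen for illustration.
    name: str
    data_gb: float           # volume of data the workload relies on
    latency_sensitive: bool  # True if users need fast query responses
    revenue_impacting: bool  # True if an outage directly affects revenue


def migration_order(workloads):
    """Sort workloads so the lowest-risk candidates come first."""
    # Tuples compare element by element, and False sorts before True,
    # so non-revenue, latency-tolerant, small workloads lead the list.
    return sorted(
        workloads,
        key=lambda w: (w.revenue_impacting, w.latency_sensitive, w.data_gb),
    )


inventory = [
    Workload("finance_close_dashboard", 800.0, True, True),
    Workload("hr_headcount_report", 5.0, False, False),
    Workload("marketing_campaign_stats", 40.0, False, False),
]

ordered = [w.name for w in migration_order(inventory)]
print(ordered)
```

With this made-up inventory, the two small departmental reports come out ahead of the revenue-impacting finance dashboard, which waits for a later phase.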

Map Your Course

Data teams also need to define their migration approach. They might be able to simply “re-host,” or “lift and shift,” their existing application or workload to the cloud service provider's Infrastructure as a Service (IaaS) offering – Amazon S3, Azure Data Lake Storage (ADLS), etc. They can replicate their data, schema, metadata, and if necessary their ETL scripts, from source to target, without significant changes.

Look for help from your CDLE tool as well. Dremio, for example, can simplify things by transferring the semantic layer – that abstracted business view of all the interdependent tables, columns and schemas – with no changes needed.

Some migrations grow more complicated. For example, to integrate with specific Platform as a Service (PaaS) offerings on the cloud target, you might need to rewrite ETL scripts or the BI application itself. You would also need to change interfaces for your data to support new Software as a Service (SaaS) applications on the cloud.

As a final planning consideration, evaluate and select your migration tool. Options include homegrown ELT or ETL scripts, or change data capture (CDC) tools such as Fivetran or Qlik. CDC reduces the WAN bandwidth required for ongoing updates from source to target by eliminating the need for repeated batch copies.
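A rough back-of-the-envelope comparison shows why CDC eases the load on the WAN. The Python sketch below assumes a 1,000 GB dataset, a nightly full batch copy, and a 2% daily change rate; all three numbers are invented for illustration, and the calculation ignores compression and change-log overhead.

```python
def batch_copy_gb(dataset_gb, copies):
    """Total data sent when the full dataset is re-copied each time."""
    return dataset_gb * copies


def cdc_gb(dataset_gb, daily_change_rate, days):
    """One initial load, then only the changed data on each following day."""
    return dataset_gb + dataset_gb * daily_change_rate * days


dataset_gb = 1000.0  # assumed dataset size
days = 30            # assumed observation window

batch_total = batch_copy_gb(dataset_gb, copies=days)
cdc_total = cdc_gb(dataset_gb, daily_change_rate=0.02, days=days)

print(f"Nightly batch copies: {batch_total:,.0f} GB over {days} days")
print(f"Initial load + CDC:   {cdc_total:,.0f} GB over {days} days")
```

Under these assumptions the repeated batch copies move many times more data than the initial load plus incremental changes. The gap narrows as the change rate rises, so it is worth plugging in your own figures.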

Start Moving

Well-planned migrations should not pose major risks to operations or analytics activities, for two primary reasons. First, data teams should schedule the migration during slow business hours, when higher latency or lower throughput will not disrupt BI analysts or business managers. By allotting ample time, they can monitor and remediate migration issues with less risk of intruding on working hours.

Second, tools now support zero-downtime migrations in many cases. For example, the Dremio semantic layer can abstract the virtual datasets being queried from the underlying physical datasets and storage. Even as the location of the physical data changes during migrations, queries continue against the same virtual dataset without disruption. In addition, CDC tools maintain uptime by capturing the incremental updates made to the source while the initial load transfer is under way, then replicating them to the target. Once the load transfer and updates are complete, you can re-point your query application to the fully-synchronized target – with no downtime.
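The two mechanisms above can be sketched in a few lines of Python. This is a toy model under stated assumptions: plain dictionaries stand in for the source and target stores, `Catalog` is a hypothetical stand-in for a semantic layer, and the change log is hand-written. It does not represent Dremio's API or any CDC product.

```python
class Catalog:
    """Hypothetical semantic layer: maps a virtual name to a physical store."""

    def __init__(self):
        self._location = {}

    def point(self, virtual_name, store):
        self._location[virtual_name] = store

    def query(self, virtual_name):
        # Callers only ever use the virtual name, never the physical store.
        return sorted(self._location[virtual_name].items())


on_prem = {1: "alice", 2: "bob"}  # source store
cloud = {}                        # target store
catalog = Catalog()
catalog.point("sales.customers", on_prem)

# 1. Initial load: bulk copy a snapshot of the source.
cloud.update(dict(on_prem))

# 2. Writes that reach the source during the load are captured as changes.
change_log = [(3, "carol"), (2, "robert")]
for key, value in change_log:
    on_prem[key] = value

before = catalog.query("sales.customers")

# 3. Replay the captured changes on the target until it has caught up.
for key, value in change_log:
    cloud[key] = value

# 4. Cutover: repoint the virtual dataset; the query itself does not change.
catalog.point("sales.customers", cloud)
after = catalog.query("sales.customers")

print(before == after)
```

The query text never changes across the cutover, which is the property that lets analysts keep working while the physical data moves.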

Blaze the Next Trail

Once data teams complete that first departmental migration, they can assess what they learned. Perhaps WAN transfer throughput fell short of expectations, or latency proved lower than expected on the target S3 platform. Data teams can adjust their future migration schedules or SLAs accordingly. With the post-mortem complete, you can plan your migration of higher-volume, more complex, more mission-critical BI workloads. Be sure to work closely with the BI analysts who consume those workloads to meet their needs without disruption. Then it is time to embark on your next journey to the data lake in the cloud.
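As one example of the post-mortem adjustment described above, the following Python sketch re-estimates a transfer window from the throughput actually observed. The 2,000 GB volume and both throughput figures are assumed values, and the formula uses decimal units and ignores protocol overhead.

```python
def transfer_hours(data_gb, throughput_mbps):
    """Hours needed to move data_gb at a sustained rate in megabits/second."""
    megabits = data_gb * 8 * 1000  # 1 GB = 8,000 megabits (decimal units)
    return megabits / throughput_mbps / 3600


# Assumed figures: the plan expected 1,000 Mbps; the first migration saw 400.
planned = transfer_hours(2000, throughput_mbps=1000)
observed = transfer_hours(2000, throughput_mbps=400)

print(f"Planned window:  {planned:.1f} hours")
print(f"Observed window: {observed:.1f} hours")
```

If the observed window no longer fits inside slow business hours, that is a signal to split the next migration into smaller batches or to revisit the schedule.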

About the author

Kevin Petrie

VP of Research at Eckerson Group

Kevin’s passion is to decipher what technology means to business leaders and practitioners. He has invested 25 years in technology, as an industry analyst, writer, instructor, product marketer, and services…
