Serverless Cloud Data Lake with Spark for Serving Weather Data

The Weather Company (TWC) collects weather data across the globe at a rate of 34 million records per hour, and the TWC History on Demand (HoD) application serves that historical weather data to users via an API, averaging 600,000 requests per day. Users increasingly consume large quantities of historical data to train analytics models, and they require efficient asynchronous APIs in addition to the existing synchronous ones, which use Elasticsearch.

This session presents TWC’s architecture, which uses a serverless cloud data lake running on top of Apache Spark, and shows how it enables a highly elastic and economical way of serving weather history data. We will explain our concept of data skipping indexes, which boosts performance by orders of magnitude compared to an out-of-the-box Spark setup and significantly reduces cost. This enables TWC HoD to triple weather data coverage from land only to the entire globe while reducing costs by an order of magnitude.

We will also review serverless cloud data lake architecture in general and elaborate on the composition of serverless building blocks such as serverless storage, serverless ETL, serverless SQL, and serverless data pipeline orchestration. In addition, we will review a set of major enhancements, including built-in geospatial and time series functions and a built-in multi-tenant Hive Metastore.

Finally, we will highlight how TWC was able to adopt the serverless cloud data lake platform for new applications by rolling out a brand-new global data collection pipeline and data lake for COVID-19 data in just a few weeks.
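
The abstract does not describe how the data skipping indexes are built, so the following is only a minimal sketch of the general idea, assuming a simple min/max index over latitude and longitude per data file. The `FileStats` class, the `files_to_scan` function, and the file names are hypothetical and are not taken from TWC's system. The point is that a query for a small region can rule out most files from their metadata alone, before any data is read.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Hypothetical index entry: coordinate min/max for one data file."""
    path: str
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float


def files_to_scan(index, lat_lo, lat_hi, lon_lo, lon_hi):
    """Return paths of files whose bounding box overlaps the query box.

    Files that cannot contain matching rows are skipped without being read.
    """
    return [
        f.path
        for f in index
        if f.lat_max >= lat_lo and f.lat_min <= lat_hi
        and f.lon_max >= lon_lo and f.lon_min <= lon_hi
    ]


# Illustrative index over four made-up files that together cover the globe.
index = [
    FileStats("part-0000.parquet", -90.0, -30.0, -180.0, 180.0),
    FileStats("part-0001.parquet", -30.0, 30.0, -180.0, 0.0),
    FileStats("part-0002.parquet", -30.0, 30.0, 0.0, 180.0),
    FileStats("part-0003.parquet", 30.0, 90.0, -180.0, 180.0),
]

# Query box: latitude 40..50, longitude -80..-70.
selected = files_to_scan(index, 40.0, 50.0, -80.0, -70.0)
print(selected)
```

In this toy layout, only one of the four files overlaps the query box, so the other three would never be read. With many files and selective queries, that kind of pruning is what makes large savings in time and cost plausible.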

Topics Covered

Dremio Subsurface for Apache Spark
Dremio Subsurface: Advanced Storage Solutions