July 13, 2021

Migrating to Parquet – The Veraset Story

Veraset is a data-as-a-service (DaaS) company that delivers petabytes of geospatial data to customers across a variety of industries. We build and manage a central data lake that houses years of data, and we operationalize that data to solve our customers’ problems. I recently gave a talk about the specifics of file formats at Spark+AI Summit 2020 that generated a lot of questions about my company’s migration from CSV to Apache Parquet. As CTO of a DaaS company, I saw firsthand how drastically this migration affected all of our customers. This session will drill into the operational burden of transforming the storage format in an ecosystem and its impact on the business.

Topics Covered

CSV

Dremio Subsurface for Apache Parquet

Speakers

Vinoo Ganesh

Vinoo Ganesh is Chief Technology Officer at Veraset, a data-as-a-service startup focused on understanding the world from a geospatial perspective. Vinoo previously managed the compute team at Palantir Technologies, which was tasked with managing Spark and its interaction with HDFS, S3, Parquet, YARN, and Kubernetes across the company. Most recently, this team was closely involved in pushing forward a number of open source Spark initiatives, including a DataSource V2 implementation and the external shuffle service.
