Migrating to Parquet – The Veraset Story

Veraset is a data-as-a-service (DaaS) company that delivers petabytes of geospatial data to customers across a variety of industries. We build and manage a central data lake that houses years of data, and we operationalize that data to solve our customers’ problems. At Spark+AI Summit 2020 I gave a talk on the specifics of file formats, and it generated a lot of questions about my company’s migration from CSV to Apache Parquet. As CTO of a DaaS company, I saw firsthand how dramatically this migration affected all of our customers. This session examines the operational burden of changing the storage format across an entire data ecosystem, and the impact of that change on the business.

Topics Covered

Apache Parquet


Vinoo Ganesh

Vinoo Ganesh is Chief Technology Officer at Veraset, a data-as-a-service startup focused on understanding the world from a geospatial perspective. Vinoo previously managed the compute team at Palantir Technologies, which was responsible for Spark and its interaction with HDFS, S3, Parquet, YARN, and Kubernetes across the company. Most recently, that team was closely involved in pushing forward several open source Spark initiatives, including a DataSource V2 implementation and the external shuffle service.
