FactSet, a leading provider of content in financial services, is focused on continuously improving our data pipelines and data-fetch APIs. Most pipeline frameworks, such as Flink and Spark, require writing code against their APIs to define a pipeline, and to cover the breadth of content we offer, various departments have written custom ETL code to add value at different stages of the content-enrichment process. To standardize and simplify this common workflow, we created a configuration-file-based utility that retains the granular control we need while encapsulating data movements and flows in centralized config files, reducing or eliminating the disparate custom ETL scripts. This case study will examine why we chose to leverage Golang and Apache Arrow to mix new data with our legacy sources and existing stack as we modernize our fetch code paths, and will discuss the other technologies we leveraged in order to do so.
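To make the configuration-driven idea concrete, here is a minimal Go sketch of the pattern: a centralized config file names the transform stages, and the engine wires them together at runtime instead of each team writing bespoke ETL code. The config schema, stage names, and registry here are illustrative assumptions, not FactSet's actual format or API.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// PipelineConfig is a hypothetical config schema: a named pipeline
// whose stages are applied to each record in order.
type PipelineConfig struct {
	Name   string   `json:"name"`
	Stages []string `json:"stages"`
}

// registry maps stage names found in the config file to transform
// functions, so new pipelines need only a config change, not new code.
var registry = map[string]func(string) string{
	"trim":  strings.TrimSpace,
	"upper": strings.ToUpper,
}

// runPipeline applies the configured stages to a single record.
func runPipeline(cfg PipelineConfig, record string) (string, error) {
	for _, stage := range cfg.Stages {
		fn, ok := registry[stage]
		if !ok {
			return "", fmt.Errorf("unknown stage %q in pipeline %q", stage, cfg.Name)
		}
		record = fn(record)
	}
	return record, nil
}

func main() {
	// In practice the config would live in a centralized file;
	// it is inlined here to keep the sketch self-contained.
	raw := []byte(`{"name": "enrich-symbols", "stages": ["trim", "upper"]}`)

	var cfg PipelineConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		panic(err)
	}

	out, err := runPipeline(cfg, "  ibm us equity ")
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // IBM US EQUITY
}
```

A production version of this idea would operate on Apache Arrow record batches rather than single strings, letting stages exchange columnar data without copies, but the registry-plus-config structure is the same.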
Hailing from the faraway land of Brentwood, NY, and currently residing in the rolling hills of Connecticut, Matt Topol has always been passionate about software. After graduating from Brooklyn Polytechnic (now NYU-Poly), he joined FactSet Research Systems, Inc. in 2009 to develop financial software. In the time since, Matt has worked in infrastructure and application development, has led development teams, and has architected large-scale distributed systems for processing analytics on financial data. Matt is a committer on the Apache Arrow repository, frequently enhancing the Golang library and helping to grow the Arrow community. Recently, Matt wrote the first and only book on Apache Arrow, “In-Memory Analytics with Apache Arrow,” and joined Voltron Data in order to work on the Apache Arrow libraries full-time and grow the Arrow Golang community.
In his spare time, Matt likes to bash his head against a keyboard, develop/run delightfully demented games of fantasy for his victims–er–friends, and share his knowledge with anyone interested who’ll listen to his rants.