6 minute read · April 19, 2022

How Spice AI and Dremio Are Making Data Accessible in Web3

Mark Lyons · Vice President of Product Management, Dremio

We all know the next evolutionary phase of the internet will bring extraordinary opportunities. Web3 phenomena like cryptocurrencies, NFTs, smart contracts, and the metaverse are poised to disrupt traditional spheres from finance to entertainment, potentially ushering in a new era of decentralized ownership, free of traditional gatekeepers. These technologies are based on cryptographically secured ledgers, or blockchains (like the ones used by Bitcoin and Ethereum), and take a decentralized, open approach to data. Whereas in Web2, data was owned by a handful of powerful companies (think Google, Facebook, and so on), data in the Web3 era is open and trusted, powered by shared ledgers that aren’t owned by any one entity.

We know from the Web2 world that data is essential for delivering personalized, engaging experiences. The same holds for Web3, which at its foundation is about data. But getting data out of blockchains is very hard. Companies and developers who figure out ways to discover, query, and use blockchain data will have a tremendous advantage over the next few years. For that reason, we’re excited to partner with the startup Spice AI, which recently came out of private beta and uses Dremio to accelerate queries on massive amounts of blockchain data.

Making It Easy to Add AI to Applications with Blockchain Data

Spice is focused on making it easy to develop smart applications for Web3. Right now, adding AI and machine learning (ML) to applications and having them learn is still much harder than it should be.   

Spice makes it easier for developers to add AI to applications with its open source solution. It is also building a data platform that developers can use as the foundation for their applications. One reason for the huge gap in applying AI in software is that AI and ML are incredibly data hungry. As Spice CEO Luke Kim puts it, there’s a “cold start” problem in AI development. Spice’s data platform leverages the vast amount of blockchain data and makes it available for training models so developers can avoid a cold start.

Use Cases for a Web3 Data Platform

Spice sees several use cases for its Web3 data platform. For instance, in the Web2 world, recommendation systems are hugely successful. Today, if you go to Netflix, you’re served up a personalized recommendation on what to watch next. E-commerce sites like Amazon offer tailored suggestions based on your previous purchases. 

But recommendation systems don’t yet exist in the Web3 world. If you go to the NFT marketplace OpenSea and connect your wallet for the first time, you aren’t served the personalized NFT recommendations that your wallet history could support. The reason? Until now, it’s been too hard and too slow to run queries against what might be 10 terabytes or more of blockchain data.

In addition to recommendations for marketplaces, Spice is working on a number of other use cases, including:

  • Clustering addresses and observing crypto trading behavior for fin-tech and hedge funds (for example, observing behavior across Bitcoin and Ethereum blockchains)
  • Wallet-to-wallet messaging and Web3-native marketing
  • Authenticity services for NFTs

How Spice AI Uses Dremio to Accelerate Data Access on an Open Lakehouse Platform

To handle massive quantities of blockchain data, Spice combines multiple data techniques, including Apache Parquet files stored in Azure Data Lake Storage (ADLS) for long-term storage and time-series databases for real-time access. The founders’ long-standing experience with Azure and Microsoft (both Kim and CTO Phillip LeBlanc) led them to start with ADLS. The company intends to be available on Amazon S3 and Google Cloud Storage in the future.

The company archives data from blockchain nodes and extracts, transforms, and loads (ETLs) it into a database, using blockchain-tailored processing pipelines and time-series databases. It then uses a variety of processes to coalesce that data and consolidate it into bigger Parquet files. Dremio enables Spice to query these very large datasets and get subsecond responses.
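
To make the consolidation step more concrete, here is a minimal sketch of how many small files might be grouped into batches that are each rewritten as one bigger Parquet file. It is an illustration only, not Spice’s actual pipeline: the `SmallFile` type, the `plan_compaction` function, the file names, and the 128 MB target are all assumptions made for this example, and the sketch only plans the batches rather than reading or writing Parquet.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SmallFile:
    """A small file written by one ingestion run (hypothetical type)."""
    path: str
    size_bytes: int


def plan_compaction(files: Iterable[SmallFile], target_bytes: int) -> List[List[SmallFile]]:
    """Group files, in input order, into batches of at most target_bytes.

    Each batch would then be rewritten as one larger Parquet file.
    A single file bigger than the target gets a batch of its own.
    """
    if target_bytes <= 0:
        raise ValueError("target_bytes must be positive")
    batches: List[List[SmallFile]] = []
    current: List[SmallFile] = []
    current_size = 0
    for f in files:
        # Close the current batch if adding this file would exceed the target.
        if current and current_size + f.size_bytes > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(f)
        current_size += f.size_bytes
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    MB = 1024 * 1024
    # Made-up file names and sizes, ordered by block range.
    small_files = [
        SmallFile("blocks_000000_000999.parquet", 40 * MB),
        SmallFile("blocks_001000_001999.parquet", 50 * MB),
        SmallFile("blocks_002000_002999.parquet", 60 * MB),
        SmallFile("blocks_003000_003999.parquet", 30 * MB),
    ]
    for i, batch in enumerate(plan_compaction(small_files, target_bytes=128 * MB)):
        print(i, [f.path for f in batch])
```

Keeping the files in input order means each consolidated file covers a contiguous range of blocks, which is one reasonable choice for time-ordered data. Fewer, larger files generally mean less per-file overhead for a query engine to pay.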

For Spice, choosing Dremio’s open lakehouse platform was a matter of the company’s open source DNA as well as technology like Reflections and Apache Arrow. Says Kim, “Building upon a foundation of Arrow and Arrow Flight creates a unified platform across AI and data.” Another factor is Dremio’s ability to unify data systems including Parquet in storage, traditional RDBMS, and NoSQL database systems. Because the nature of blockchain data is unique, one size does not fit all. Dremio allows Spice to stitch all these systems together. 

Using Reflections, the Spice team can get real-time insights that aren’t available otherwise. “In half a second we can get blockchain data,” says Kim. “We’re significantly faster on Dremio architecture and have the ability to join across blockchains.”

Get started with Spice for Web3 data at https://spice.xyz/. Sign up for the Dremio open lakehouse platform at https://www.dremio.com/get-started/.
