Data streamer Confluent has updated its Tableflow software to give AI and analytics tools access to real-time operational data in data warehouses and lakes.
The Tableflow updates build on the Confluent-Databricks partnership announced in February to add bi-directional integration between Tableflow, Delta Lake, and the Databricks Unity Catalog.
Specifically, support for Apache Iceberg is now generally available, and an Early Access Program for Delta Lake support is open. Teams running production workloads “can now instantly represent Apache Kafka topics as Iceberg tables to feed any data warehouse, data lake, or analytics engine for real-time or batch processing use cases.”
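For a sense of what that looks like in practice, here is a minimal sketch of reading such a table from Apache Spark, assuming Tableflow exposes its Iceberg tables through an Iceberg REST catalog endpoint; the endpoint URL, credentials, and topic/table names below are placeholders rather than Confluent-documented values.

```python
# A minimal sketch: reading a Tableflow-materialized Iceberg table from Spark.
# The endpoint, credentials, and table name are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("tableflow-iceberg-read")
    # Iceberg runtime for Spark 3.5 / Scala 2.12 (version is illustrative)
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0")
    # Register a Spark catalog backed by an Iceberg REST catalog
    .config("spark.sql.catalog.tableflow", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.tableflow.type", "rest")
    .config("spark.sql.catalog.tableflow.uri", "https://example-tableflow-endpoint/iceberg")
    .config("spark.sql.catalog.tableflow.credential", "API_KEY:API_SECRET")
    .getOrCreate()
)

# Each Kafka topic materialized by Tableflow appears as an ordinary Iceberg
# table; "clickstream.orders" is a hypothetical namespace.table name.
orders = spark.table("tableflow.clickstream.orders")
orders.groupBy("status").count().show()
```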
Tableflow also offers enhanced data storage flexibility and integrations with catalog providers, such as the AWS Glue Data Catalog and the Snowflake Open Catalog (a managed service for Apache Polaris). The Apache Iceberg and Delta Lake support enables real-time and batch processing and helps ensure real-time data consistency across applications.
Confluent’s Chief Product Officer, Shaun Clowes, stated: “With Tableflow, we’re bringing our expertise of connecting operational data to the analytical world. Now, data scientists and data engineers have access to a single, real-time source of truth across the enterprise, making it possible to build and scale the next generation of AI-driven applications.”
Confluent cites the IDC FutureScape: Worldwide Digital Infrastructure 2025 Predictions report, which states: “By 2027, after suffering multiple AI project failures, 70 percent of IT teams will return to basics and focus on AI-ready data infrastructure platforms.” The company claims AI projects are failing because old development methods cannot keep pace with new consumer expectations.
Those old development methods, Confluent says, relate to an IDC finding: “Many IT organizations rely on scores of data silos and a dozen or more different copies of data. These silos and redundant data stores can be a major impediment to effective AI model development.”
The lesson Confluent draws is that instead of many silos, you need a unified system that can know “the current status of a business and its customers and take action automatically.” Business operational data needs to reach analytics and AI systems in real time. Confluent says: “For example, an AI agent for inventory management should be able to identify if a particular item is trending, immediately notify manufacturers of the increased demand, and provide an accurate delivery estimate for customers.”
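As an illustration only, the trend-detection loop of such an agent might look like the following bare-bones sketch, using the open-source confluent-kafka Python client; the topic name, event schema, threshold, and notify_manufacturer stub are all hypothetical, and a production agent would use time-windowed counts rather than a running total.

```python
# A minimal sketch of the inventory-agent idea. Broker address, topic name,
# event schema, and threshold are hypothetical placeholders.
import json
from collections import Counter
from confluent_kafka import Consumer

def notify_manufacturer(item_id: str, count: int) -> None:
    # Stand-in for a real downstream call (email, API, another topic).
    print(f"Demand spike: {item_id} ordered {count} times")

consumer = Consumer({
    "bootstrap.servers": "broker:9092",   # placeholder
    "group.id": "inventory-agent",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["orders"])            # hypothetical topic

counts = Counter()
TREND_THRESHOLD = 100                     # hypothetical cutoff

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        event = json.loads(msg.value())   # assumes JSON-encoded order events
        item = event["item_id"]
        counts[item] += 1
        if counts[item] == TREND_THRESHOLD:
            notify_manufacturer(item, counts[item])
finally:
    consumer.close()
```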
Tableflow, Confluent declares, simplifies the integration between operational data and analytical systems because it continuously updates the tables used for analytics and AI with the same data coming from business applications connected to Confluent Cloud. Confluent says this matters because AI’s power depends on the quality of the data feeding it.
The Delta Lake support is AI-relevant as well, since the format is “used alongside many popular AI engines and tools.”
Of course, having both real-time and batch data available through Iceberg and Delta Lake tables in Databricks and other data warehouses and lakes is not enough for AI large language model processing; the data needs to be tokenized and vectorized first.
Confluent is potentially ready for this, with its Create Embeddings action, a no-code feature “to generate vector embeddings in real time, from any model, to any vector database, across any cloud platform.”
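To make the vectorization step concrete, here is a minimal sketch using the open-source sentence-transformers library rather than Confluent’s no-code feature; the model choice and input texts are illustrative only.

```python
# A minimal sketch of vectorization: turning text records into embeddings
# that a vector database can index. Model and inputs are illustrative.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
descriptions = [
    "Wireless headphones, noise cancelling",
    "Stainless steel water bottle, 750 ml",
]
vectors = model.encode(descriptions)  # numpy array, shape (2, 384)
print(vectors.shape)
```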
Users can bring their own storage to Tableflow, using any storage bucket. Tableflow’s Iceberg tables can be read by analytical engines such as Amazon Athena, EMR, and Redshift, and by other data lakes and warehouses, including Snowflake, Dremio, Imply, Onehouse, and Starburst.
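As a rough sketch of that access path, the following queries a Tableflow-materialized table from Amazon Athena via boto3, assuming the table has been registered in the AWS Glue Data Catalog through the integration mentioned earlier; the database, table, region, and bucket names are placeholders.

```python
# A minimal sketch: querying a Tableflow-materialized Iceberg table from
# Athena, assuming it is registered in Glue. Names are placeholders.
import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

run = athena.start_query_execution(
    QueryString="SELECT status, COUNT(*) FROM orders GROUP BY status",
    QueryExecutionContext={"Database": "tableflow_db"},               # placeholder
    ResultConfiguration={"OutputLocation": "s3://my-results-bucket/athena/"},
)
qid = run["QueryExecutionId"]

# Poll until the query finishes, then fetch results.
while True:
    status = athena.get_query_execution(QueryExecutionId=qid)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    results = athena.get_query_results(QueryExecutionId=qid)
    for row in results["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])
```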
Read the full story via Blocks & Files.