Databricks fires back at Snowflake with SQL-based AI document parsing
Thursday, November 13, 2025, 12:27 PM, from InfoWorld
Databricks and Snowflake are at it again, and the battleground is now SQL-based document parsing.
In an intensifying race to dominate enterprise AI workloads with agent-driven automation, Databricks has added SQL-based AI parsing capabilities to its Agent Bricks framework, just days after Snowflake introduced a similar ability inside its Intelligence platform.

The new abilities from Snowflake and Databricks are designed to help enterprises analyze unstructured data, preferably using agent-automated SQL, backed by their respective existing technologies, such as Snowflake’s Cortex AISQL and Databricks’ AI Functions.

Querying unstructured data with relatively simple, automated methods, rather than building and running costly ETL pipelines, is a critical cog in the goal that cloud data warehouses like Snowflake and Databricks share: helping enterprises reduce cost and complexity by enabling unified queries across structured and unstructured data. Traditional warehouses lack this capability because they are designed for analyzing structured data.

This goal is currently in sync with enterprises’ demand, said Mansi Gupta, practice director at Everest Group: “In today’s cost-conscious environment, enterprises want to leverage massive, complex datasets without driving up spend.”

Additionally, the ability to query structured and unstructured data simultaneously typically helps enterprises generate more accurate insights and accelerate decision-making.

What is Databricks’ new AI document parsing capability?

Databricks’ new capability, “ai_parse_document”, which is in public preview, is a new addition to Agent Bricks’ AI Functions, a subset of Databricks’ AI Functions targeted at helping enterprises create autonomous agents for specific use cases.

When invoked in an agent workflow via Agent Bricks, ai_parse_document parses an entire document, not just text, although it is currently limited to formats such as PDF, JPG/JPEG, PNG, DOC/DOCX, and PPT/PPTX.
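As a rough illustration of what calling the function from SQL might look like, the sketch below builds a query string in Python and pre-filters files by the formats named above. It is a minimal sketch, not Databricks’ documented usage: the helper names (`is_supported`, `build_parse_query`), the example volume path, and the pairing of `ai_parse_document` with `READ_FILES(..., format => 'binaryFile')` and its `content` column are assumptions to verify against the Databricks documentation.

```python
from pathlib import PurePosixPath

# File extensions for the formats the article names as currently supported.
# The article says "formats such as", so this set may be incomplete.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".jpg", ".jpeg", ".png", ".doc", ".docx", ".ppt", ".pptx",
}


def is_supported(path: str) -> bool:
    """Return True if the file extension is one of the formats listed above."""
    return PurePosixPath(path).suffix.lower() in SUPPORTED_EXTENSIONS


def build_parse_query(volume_path: str) -> str:
    """Build a SQL statement that parses every file under volume_path.

    Assumption: files are loaded as binary rows via READ_FILES and the raw
    bytes in the `content` column are passed to ai_parse_document.
    """
    # Reject quotes so the path cannot break out of the SQL string literal.
    if "'" in volume_path:
        raise ValueError("volume_path must not contain single quotes")
    return (
        "SELECT path, ai_parse_document(content) AS parsed\n"
        f"FROM READ_FILES('{volume_path}', format => 'binaryFile')"
    )


if __name__ == "__main__":
    # Hypothetical Unity Catalog volume path, used only for illustration.
    print(build_parse_query("/Volumes/demo/default/contracts/"))
```

The generated statement would then be run in a Databricks SQL editor or passed to a Spark session; the sketch stops at building the string so it can be read and checked without a workspace.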
“ai_parse_document captures tables, figures, and diagrams with AI-generated descriptions and spatial metadata, storing results in Unity Catalog. Your documents now behave like tables — searchable through vector search and actionable in Agent Bricks workflows,” Databricks’ Mosaic Research team wrote in a blog post.

Before the introduction of the feature, Databricks users had to rely on various approaches, such as OCR, regular expressions, and custom ETL scripts, to normalize unstructured text, said Charlie Dai, vice president and principal analyst at Forrester.

“With ai_parse, parsing becomes declarative and model-driven, reducing engineering overhead,” Dai added.

Enterprises will also be able to extend document parsing to as many documents as required through an integration with Spark Declarative Pipelines, including the ability to parse documents automatically as they arrive, Databricks said.

“Large-scale, incremental document processing… allows seamless ingestion, retry logic, change detection, and orchestration of new documents arriving daily. This is invaluable for production AI, compliance, and business reporting, where data freshness and reliability are essential,” said Pareekh Jain, analyst at Jain Consulting.

Snowflake vs Databricks

Databricks’ ai_parse_document is, at least to some extent, similar to Snowflake’s recently showcased Agentic Document Analytics offering, which is being marketed as a complementary approach to current RAG practices that lets enterprises query thousands of documents in one go using data agents.

Snowflake’s Agentic Document Analytics combines the abilities of Snowflake’s existing Cortex AISQL functions, such as AI_PARSE_DOCUMENT, AI_EXTRACT, AI_FILTER, and AI_AGG, in the Intelligence platform to parse documents and analyze the contents, according to Baris Gultekin, vice president of AI at Snowflake.
Comparing Snowflake’s existing AI_PARSE_DOCUMENT function, which was introduced about a year ago, to Agentic Document Analytics, Gultekin pointed out that while the parse function itself strengthens data quality for RAG by providing accurate retrieval context, Agentic Document Analytics enables quantitative and temporal analysis across those parsed results.

According to analysts, Databricks’ and Snowflake’s offerings would help enterprises cut down the complexity of the workflows required to analyze unstructured data, especially documents.

Historically, enterprises have had to build complex, slow, brittle OCR pipelines to bring data from documents, such as PDFs, into an AI workflow. That effort culminated in RAG, which enabled semantic search over parsed text but still struggled with nuanced document structures like tables, said Bradley Shimmin, practice lead of data, analytics, and infrastructure at The Futurum Group.

To handle documents with tables, enterprises often chained additional LLM calls to extract and reconstruct tables as JSON, which was effective but risky due to hallucinations, Shimmin said. He added that instead of stitching together OCR, RAG, and custom extraction logic, Databricks’ ai_parse_document collapses the entire workflow into a single declarative SQL statement.

Databricks’ pitch for price performance

Databricks claims that its ai_parse_document function offers better price performance than similar functions from rivals, as well as vision language models.

“Price performance matters a lot. As an industry, we’re still figuring out how to optimize complex, agentic AI workflows, particularly in terms of how they manage context and memory assets over time. But even for basic data ingestion routines, this kind of effort can make a big difference, especially for enterprises that need to process millions, or even billions, of documents,” Shimmin said.
However, he warned that enterprises should do their own benchmarking tests and not just rely on Databricks’ claims. Databricks’ pitch on price performance might give it an edge over Snowflake when it comes to enterprise customers, Shimmin said. “In a market where the two leaders have very similar top-line messaging, these kinds of cost savings for foundational workloads can make for a very compelling argument.”
https://www.infoworld.com/article/4089186/databricks-fires-back-at-snowflake-with-sql-based-ai-docum...