Agentic AI won’t wait for your data architecture to catch up

Monday, June 23, 2025, 05:53 PM, from InfoWorld
A decade ago, the cloud ignited a massive replatforming of application and server infrastructure. Open-source technologies like Docker and Kubernetes transformed software velocity and operational flexibility, launching a new era.

But it didn’t happen overnight. Enterprises had to adapt to shifting foundations, talent gaps, and an open-source ecosystem evolving faster than most teams could absorb.

Today, agentic AI is catalyzing a similarly profound replatforming. This shift centers on real-time data interaction, where success is measured in milliseconds, not minutes. What’s at stake is your company’s ability to thrive in new marketplaces shaped by intelligent systems.

To navigate this transition, here are key considerations for preparing your data infrastructure for agentic AI.

The AI data layer must serve polyglot, multi-persona teams

Traditional data platforms, which primarily served SQL analysts and data engineers, are no longer sufficient. Today’s AI landscape demands real-time access for a vastly expanded audience: machine learning engineers, developers, product teams, and, crucially, automated agents, all of which need to work with data in tools like Python, Java, and SQL.

Much as Docker and Kubernetes revolutionized cloud-native application development, Apache Iceberg has become the foundational open-source technology for this modern AI data infrastructure. Iceberg provides a transactional table format with support for schema evolution, time travel, and high-concurrency access.
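
To make that concrete, here is a minimal sketch using PyIceberg, the project’s Python client, showing schema evolution and snapshot-based time travel. The catalog name and the sales.orders table are hypothetical stand-ins for your environment, and it assumes a recent PyIceberg release with write support.

from pyiceberg.catalog import load_catalog
from pyiceberg.types import StringType

# Load a configured catalog (e.g., from ~/.pyiceberg.yaml) and an existing
# table. The catalog name "default" and table "sales.orders" are hypothetical.
catalog = load_catalog("default")
table = catalog.load_table("sales.orders")

# Schema evolution: add an optional column transactionally, without
# rewriting existing data files.
with table.update_schema() as update:
    update.add_column("agent_id", StringType(), doc="agent that last touched the row")

# Time travel: read the table as of its earliest recorded snapshot, giving
# an agent a consistent point-in-time view.
earliest = table.history()[0].snapshot_id
rows = table.scan(snapshot_id=earliest).to_arrow()
print(rows.num_rows)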

Combined with a powerful and scalable serverless data platform, this enables real-time dataflows for unpredictable, agent-driven workloads with strict latency needs.

Together, these technologies enable fluid collaboration across diverse roles and systems. They empower intelligent agents to move beyond mere observation, allowing them to act safely and quickly within dynamic data environments.

Your biggest challenge? “Day two” operations

The greatest challenge in building data infrastructure for agentic AI lies not in technology selection, but in operationalizing it effectively. It’s not about choosing the perfect table format or stream processor; it’s about making those components reliable, cost-efficient, and secure under high-stakes workloads marked by constant interaction and unpredictable triggers.

Common challenges include:

Lineage and compliance: Tracking data origins, managing changes, and supporting deletion for regulations like GDPR are complex and crucial (a deletion sketch follows below).

Resource efficiency: Without smart provisioning, GPU and TPU costs can quickly escalate. Managed cloud offerings for OSS products help by abstracting compute management.

Access control and security: Misconfigured permissions present a significant risk; overly broad access can easily expose critical data (see the deny-by-default sketch after this list).

Discovery and context: Even with tools like Iceberg, teams struggle to find the metadata needed for just-in-time dataset access.

Ease of use: Managing modern data tools can burden teams with unnecessary complexity. Simplifying workflows for developers, analysts, and agents is essential to keep productivity high and barriers low.
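
On the access-control point, here is a hedged, deny-by-default sketch in Python. The agent IDs, table names, and read_table helper are all hypothetical; in production this check belongs in the catalog or IAM layer rather than in application code.

from dataclasses import dataclass, field

@dataclass
class AgentPolicy:
    agent_id: str
    allowed_tables: set[str] = field(default_factory=set)

# Hypothetical per-agent allowlists; a real system would manage these
# centrally, not inline.
POLICIES = {
    "pricing-agent": AgentPolicy("pricing-agent", {"sales.orders", "sales.prices"}),
}

def read_table(agent_id: str, table_name: str) -> str:
    # Deny by default: any table not explicitly granted raises an error.
    policy = POLICIES.get(agent_id)
    if policy is None or table_name not in policy.allowed_tables:
        raise PermissionError(f"{agent_id} may not read {table_name}")
    return f"scanning {table_name} for {agent_id}"  # actual scan would go here

print(read_table("pricing-agent", "sales.orders"))   # allowed
# read_table("pricing-agent", "hr.salaries")         # raises PermissionError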

Without robust operational readiness, even the best-architected platforms will struggle under the constant pressure of agentic AI’s decision loops.
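
And on the lineage and compliance item, a minimal sketch of honoring a GDPR-style erasure request against an Iceberg table, assuming PyIceberg 0.7 or later (which added row-level deletes) and the same hypothetical sales.orders table:

from pyiceberg.catalog import load_catalog

catalog = load_catalog("default")
table = catalog.load_table("sales.orders")

# Row-level erasure: Iceberg commits a new snapshot rather than mutating
# files in place, so the delete stays transactional and auditable.
table.delete(delete_filter="customer_id == 'user-123'")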

The right balance between open source and cloud partners

Complex infrastructure is now driven by open-source innovation, nowhere more so than in the data layer, where open-source communities often pioneer solutions for advanced use cases whose operational demands far exceed the capacity of most data teams.

The biggest gaps arise when scaling open-source tools for high-volume ingestion, streaming joins, and just-in-time compute. Most organizations struggle with fragile pipelines, escalating costs, and legacy systems ill-suited to agentic AI’s real-time demands.

These are all areas where cloud providers with significant operational depth deliver critical value.

The goal is to combine open standards with cloud infrastructure that automates the most arduous tasks, from data lineage to resource provisioning. By building on open standards, organizations can effectively mitigate vendor lock-in. At the same time, partnering with cloud providers who actively contribute to these ecosystems and offer essential operational guardrails in their services enables faster deployment and greater reliability. This approach is superior to building fragile, ad-hoc pipelines or depending on opaque proprietary platforms.

For example, Google Cloud’s Iceberg integration in BigQuery combines open formats with a highly scalable, real-time metadata layer, offering high-throughput streaming, automated table management, performance optimizations, and integrations with Vertex AI for agentic applications.
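
As a rough sketch of what this looks like in practice, the snippet below issues BigQuery DDL for a table for Apache Iceberg through the Python client. The project, dataset, connection, and bucket names are hypothetical, and the exact OPTIONS should be verified against Google Cloud’s current documentation.

from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # hypothetical project

# DDL for a BigQuery table for Apache Iceberg; the connection and bucket
# names are placeholders.
ddl = """
CREATE TABLE `my-project.analytics.events` (
  event_id STRING,
  ts TIMESTAMP,
  payload STRING
)
WITH CONNECTION `my-project.us.my-connection`
OPTIONS (
  file_format = 'PARQUET',
  table_format = 'ICEBERG',
  storage_uri = 'gs://my-bucket/iceberg/events'
)
"""

client.query(ddl).result()  # blocks until the DDL job completes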

Ultimately, your goal is to accelerate innovation while mitigating the inherent risks of managing complex data infrastructure alone.

The agentic AI skills gap is real

Even the largest companies are grappling with a shortage of talent to design, secure, and operate AI-ready data platforms. The most acute hiring challenge isn’t just data engineering; it’s real-time systems engineering at scale.

Agentic AI amplifies operational demands and pace of change. It requires platforms that support dynamic collaboration, robust governance, and instantaneous interaction. These systems must simplify operations without compromising reliability.

Agentic AI marketplaces may prove even more disruptive than the Internet. If your data architecture isn’t built for real-time, open, and scalable use, the time to act is now. Learn more about advanced Apache Iceberg and data lakehouse capabilities here:
https://www.infoworld.com/article/4011064/agentic-ai-wont-wait-for-your-data-architecture-to-catch-u...
