
Getting the enterprise data layer unstuck for AI

Wednesday, November 26, 2025, 10:00 AM, from InfoWorld
When Miqdad Jaffer, product lead at OpenAI, challenged the illusion of ‘just add AI’ earlier this summer, he drew a line between teams crushed by hype and those who turn artificial intelligence into a lasting advantage. “The most durable and defensible moat in AI is proprietary data,” Jaffer wrote. “If your product generates unique structured data every time it is used, you pull further ahead in ways competitors cannot copy or buy.”

But that future is unevenly distributed across hyperscalers, early-stage startups, and mature enterprises. OpenAI can spend billions and field armies of data infrastructure engineers. Startups have a fraction of those resources but enjoy a clean slate, building data pipelines for AI without the burden of legacy systems.

It is mature enterprises that face the biggest lift in getting their data layer unstuck for AI. They may already own that coveted proprietary data, but they also carry decades of legacy architecture and technical debt.

Every major computing wave has rewritten the data layer. Service-oriented architecture standardized system interfaces. Business intelligence and data warehousing structured analytics. Big data platforms handled scale, and streaming moved data in real time. Each shift changed how developers modeled and connected information.

AI is now pushing enterprises to rewrite the data layer again, this time around meaning, trust, and interoperability. So where to focus?

Connect what you already have

For most enterprises, the problem is not scarcity of data but lack of connection. Systems have multiplied faster than they have been integrated, each with its own schema, logic, and history. Relational models freeze early design choices, making it easier to create new applications than evolve old ones. Traditional formats store structure but not meaning. Today’s legacy debt at the data infrastructure layer looks like so many silos that hold information but hide context.

“Most companies are in this state, where you have just built new application after application, each with its own database, typically relational,” said Philip Rathle, CTO at Neo4j. “Relational databases have data models that cannot easily be evolved once they are in place. Over time it becomes simpler to build a new application than to change an existing model, which creates more sprawl.”

Once enterprises shift to graphs, their data starts to look the way the world actually behaves. “Once an enterprise has started using graphs, they see their whole world of knowledge can be represented as a graph,” Rathle said. “The world shows up as networks, hierarchies, and journeys, so why force that into tables?”

The ability to model meaning has moved graphs from niche technology to necessity. “The same organizations that started with graphs for recommendation engines or fraud detection are realizing the greater opportunity is connecting knowledge itself,” Rathle said. “The rise of AI has created broader awareness that graphs are a foundation for AI and for more accurate and transparent reasoning.”

Dave McComb, CEO of Semantic Arts, explains why this shift matters. “An ontology or a semantic database is about what things actually mean, and how things that sound similar are distinct in a way a machine can interpret,” he said. “Ambiguity is genAI’s kryptonite.” Without the clarity a knowledge graph provides, AI hallucinates faster, not smarter.

By layering graph-based connections on top of existing systems, enterprises can modernize incrementally. “If you already have structure across different parts of the business, you can connect them without rewriting everything,” Rathle said. “Over time that connected layer becomes the foundation for a knowledge graph that allows AI to understand the enterprise.”
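To make that incremental approach concrete, here is a minimal sketch of what a connecting layer can look like in practice, using Neo4j’s Python driver to merge records from two hypothetical silos, a CRM and a billing system, into one graph. The connection details, labels, and field names are illustrative assumptions, not a prescribed schema.

```python
from neo4j import GraphDatabase

# Hypothetical connection details; replace with your own deployment.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Records exported from two existing silos (placeholder data for illustration).
crm_customers = [{"crm_id": "C-1001", "name": "Acme Corp"}]
billing_accounts = [{"account_id": "A-77", "crm_id": "C-1001", "plan": "enterprise"}]

# MERGE is idempotent: rerunning the load updates nodes instead of duplicating them.
merge_customer = """
MERGE (c:Customer {crmId: $crm_id})
SET c.name = $name
"""

# Link each billing account to the customer it belongs to across the two silos.
link_account = """
MATCH (c:Customer {crmId: $crm_id})
MERGE (a:Account {accountId: $account_id})
SET a.plan = $plan
MERGE (c)-[:OWNS]->(a)
"""

with driver.session() as session:
    for row in crm_customers:
        session.run(merge_customer, **row)
    for row in billing_accounts:
        session.run(link_account, **row)

driver.close()
```

Because the merge is idempotent, the same exports can be replayed as the source systems change, which is what lets the connected layer grow alongside the silos it links rather than replacing them.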

Reclaim control of proprietary data

Even with better data modeling, enterprises face a deeper question of ownership. The convenience of multitenant software has blurred the boundaries of control. In the AI era the risk is not only exposure. It is the possibility that proprietary data can be learned from, and the competitive value inside it lost.

“Before AI, somebody else was storing your data, and if they got infiltrated they were a honeypot,” said Grant Miller, CEO at Replicated. “Now the data is not only being stored, it is being learned from. So it becomes part of training sets.” Once that happens, reclaiming your advantage may be impossible.

Miller argues that the answer is to bring AI to the data rather than sending data out into the world. “By removing access from the vendor, instead of sending the data to thousands of different vendors, you bring the applications to where the data already resides,” he said. “You get a major advantage.”

From his vantage point, Miller sees enterprises moving toward a four-tier model that aligns deployment with data sensitivity. “Fully managed software for low-risk use, VPC-based deployments for more control, self-hosted environments for sensitive systems, and air-gapped setups for the highest levels of sovereignty… That is the pattern we see enterprises follow,” Miller said. “It is about aligning architecture to data sensitivity, not convenience.”
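The point is easy to express in code. The sketch below is a hypothetical policy helper rather than anything from Replicated’s product: it maps illustrative sensitivity labels onto Miller’s four tiers and defaults to the most restrictive tier when a dataset’s classification is unknown.

```python
from enum import Enum

class DeploymentTier(Enum):
    """The four tiers Miller describes, ordered from least to most control."""
    FULLY_MANAGED = 1   # vendor-hosted SaaS for low-risk use
    VPC = 2             # vendor software running inside the customer's VPC
    SELF_HOSTED = 3     # customer-operated, inside the enterprise network
    AIR_GAPPED = 4      # no outbound connectivity; highest sovereignty

# Illustrative sensitivity labels; real classifications vary by enterprise.
SENSITIVITY_TO_TIER = {
    "public": DeploymentTier.FULLY_MANAGED,
    "internal": DeploymentTier.VPC,
    "confidential": DeploymentTier.SELF_HOSTED,
    "regulated": DeploymentTier.AIR_GAPPED,
}

def required_tier(sensitivity: str) -> DeploymentTier:
    """Pick a deployment tier from a dataset's sensitivity label,
    falling back to the most restrictive tier when the label is unknown."""
    return SENSITIVITY_TO_TIER.get(sensitivity, DeploymentTier.AIR_GAPPED)

print(required_tier("confidential"))  # DeploymentTier.SELF_HOSTED
```

Defaulting to the most restrictive tier is the conservative reading of Miller’s framing: when in doubt, keep the data where it already lives.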

Miller believes we’re seeing a cultural turn. Enterprises that once prized cloud abstraction are reevaluating what control means when data itself has become intellectual property. “If you hand it over to a vendor, you have no idea how it is used,” Miller said. “Not your weights, not your AI.”

Build trust through shared operations

Even when data remains under enterprise control, trust breaks down if operations are fragmented. AI only works when the people who manage data, run systems, and interpret results share the same picture. Shared visibility creates explainability, accountability, and real governance.

Brian Gruttadauria, CTO of hybrid cloud at HPE, said his teams are seeing disciplines converge after years of separation. “There has to be a union between the database operations teams who understand how data is consumed, the data engineers who manage the pipeline, and the subject matter experts who are asking the abstract AI questions,” Gruttadauria said. “They all need to work together to deliver the outcome.”

Across the industry, this shift is reshaping automation. What once involved simple AIOps scripts is evolving into coordinated action between agents that communicate through protocols like the Model Context Protocol (MCP) and Agent2Agent (A2A). “In the past you opened 10 tickets and involved 10 teams,” Gruttadauria said. “Now the networking agent, storage agent, and orchestration agent can communicate and make those calls directly.”
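What “make those calls directly” looks like at the wire level is a structured message rather than a ticket. MCP is built on JSON-RPC 2.0, and the sketch below shows the general shape of a tool invocation an orchestration agent might send to a storage agent; the tool name and arguments are invented for illustration and are not part of the protocol itself.

```python
import json

# A minimal sketch of an MCP-style tool invocation (JSON-RPC 2.0), assuming a
# hypothetical "provision_volume" tool exposed by a storage agent.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "provision_volume",
        "arguments": {"size_gb": 500, "tier": "ssd", "requested_by": "orchestration-agent"},
    },
}

# In a real deployment the orchestration agent would send this over an MCP
# transport (stdio or HTTP) instead of opening a ticket with the storage team.
print(json.dumps(request, indent=2))
```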

For Gruttadauria, this is the operating model for trustworthy AI: shared visibility, shared reasoning, and shared responsibility.

Stabilize APIs and evolve the stack

The final hurdle lies in the stack itself. The AI ecosystem is still a patchwork of models, frameworks, and protocols, and no enterprise can afford a full rebuild.

Anthony Annunziata, director of AI and open innovation at IBM, said the goal is not a universal stack but stability at the interaction points. “I do not think a one stack solution is what we need or can produce,” he said. “But we can standardize the APIs used to build applications and the protocols that let them communicate.”

These communication layers are moving fastest. Protocols like MCP and A2A are giving tools and agents a consistent way to exchange context and coordinate actions. They are becoming the connective tissue that allows heterogeneous systems to behave coherently.

At the data layer, standards are also maturing. Cypher, the graph query language from Neo4j, has now been formalized into ISO’s new GQL standard, the first new database language standard since SQL nearly forty years ago. “Access to the graph is done through GQL,” Rathle said. “That is the interoperability layer enterprises can rely on.”
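In practice, that interoperability layer is just a query. Continuing the illustrative Customer/Account schema from the earlier sketch, the snippet below uses the Neo4j Python driver to run a GQL-style pattern match across the connected silos; the labels and properties are assumptions carried over from that example.

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# A GQL-style pattern match over the illustrative schema: which plan does
# each customer own once the silos are connected?
query = """
MATCH (c:Customer)-[:OWNS]->(a:Account)
RETURN c.name AS customer, a.plan AS plan
"""

with driver.session() as session:
    for record in session.run(query):
        print(record["customer"], record["plan"])

driver.close()
```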

Both Rathle and Annunziata point to how graph, relational, and vector systems can share context in ways that integrate with different models and orchestration frameworks. The aim is not unification, but dependable, open connection.

Dave McComb warns that this decision will shape the next decade. “If you build around open standards you have freedom,” he said. “If you build around proprietary APIs, you have already chosen your next migration.”

Transforming data into meaning and knowledge

Every major data evolution has battled the same problems: too many systems, too many silos, too many versions of truth. But AI raises the stakes. The challenge now is not only integrating data, but integrating meaning. When context is missing, AI does not merely degrade. It hallucinates. And when proprietary data is learned from without control, the competitive advantage inside that data can disappear for good.

“The real Holy Grail of an AI enterprise is not having functionality restricted to the knowledge inside a given silo, but the knowledge of the entire enterprise,” said Philip Rathle. Achieving that requires shared understanding across systems, not only shared access.

Dave McComb is blunt about what happens if enterprises get this wrong. “The legacy systems of tomorrow are being written today,” he said.

The cost of fragmentation is no longer inefficiency, but misaligned models, lost advantage, and AI systems that cannot be trusted. The future will belong to companies that understand the value of connected data and that structure their data to enable AI to derive meaning and knowledge.
https://www.infoworld.com/article/4094124/getting-the-enterprise-data-layer-unstuck-for-ai.html
