Anatomy of an AI agent knowledge base

Monday November 24, 2025. 10:00 AM , from InfoWorld

AI agent fervor has permeated the software development world. But, we’re no longer talking about a singular, all-knowing AI. Rather, emerging agentic workflows rely on multiple specialized agents working together.

So-called “agentic AI” has a strong business case, but it raises a big unanswered question: How should agents talk to each other, retain memory, and share knowledge?

That’s where a shared agentic knowledge base comes in. A knowledge base for AI agents is like a meta system prompt that all agents can access. “Think of it as a way to fine-tune the agent,” says Christian Posta, global field CTO at Solo.io, a provider of cloud operations software.

As agents multiply and interconnected behaviors grow more complex, a shared knowledge base (or knowledge graph) keeps them aligned.

“An internal knowledge base is essential for coordinating multiple AI agents,” says James Urquhart, field CTO and technology evangelist at Kamiwaza AI, maker of a distributed AI orchestration platform. “When agents specialize in different roles, they must share context, memory, and observations to act effectively as a collective.”

Designed well, a knowledge base ensures agents have access to up-to-date and comprehensive organizational knowledge. Ultimately, this improves the consistency, accuracy, responsiveness, and governance of agentic responses and actions.

The benefits are clear. But what actually goes into such a repository? Below, we’ll look at the core content within an AI agent knowledge base, explore implementation approaches and retrieval methods, and consider the bottlenecks.

What an agentic knowledge base contains

A knowledge base for AI agents can hold many things: documentation, policies, style guides, sample code, workflows, compliance rules, and more. “A knowledge base for AI agents contains the full spectrum of a company’s operational reality,” says Igor Beninca, data science manager at Indicium, a data and AI services firm.

Because enterprise data varies widely, a knowledge base will combine structured, semi-structured, and unstructured data. It should span everything from static rules to dynamic chat conversations. Really, any data that could be vectorized for training AI is fair game. That said, some common content types shine through for AI agent use cases.

Procedures and policies

Most knowledge bases include procedures and policies for agents to follow, such as style guides, coding conventions, and compliance rules. They might also document escalation paths, defining how to respond to user inquiries.

“The content mirrors what you’d find in a senior employee’s mental toolkit, but structured for machine consumption,” says AJ Sunder, chief information and chief product officer at Responsive, a provider of AI-powered response management software.

Structured data

Structured data, often formatted in JSON, YAML, or CSV, includes databases, sample code, API documentation, schemas, and service-level agreements. A specific example is a machine-readable product table that lists prices, packages, or configurations.

“A good knowledge base would look a bit like Wikipedia—a structured data catalog that is easily searchable,” says Ankit Jain, CEO and co-founder of Aviator, a provider of developer workflow automation tools.

Semi-structured data includes internal wikis, workflow guides, and detailed runbooks. Another tactic is to capture data relationships using custom field mappings, which are schemas that specify how internal data is mapped to external fields, so agents can interpret these relationships.

Unstructured data

Next up is unstructured data. This includes text and media such as images, audio, PDFs, or video. Meeting notes, recordings, and diagrams that visualize decision-making are common examples. Text-based cues, or broadly defined relationships between concepts, can also supply helpful directions.

“Successful knowledge bases include ‘negative examples,’ what not to say or do, and contextual decision trees that help agents navigate edge cases,” says Responsive’s Sunder.

Memory and relationships

Lastly, persistent memory helps agents retain context across sessions. Access to past prompts, customer interactions, or support tickets helps continuity and improves decision-making, because it enables agents to recognize patterns. But importantly, most experts agree you should make explicit connections between data, instead of just storing raw data chunks.

Sunder cites service-level agreements (SLAs) as an example. Instead of stating “Our SLA is 24 hours,” a richer model would specify, “Our SLA applies to enterprise customers, except during maintenance windows, unless escalated by account managers.”

Implementing the knowledge base

At the core of an agentic knowledge base are two main components: an object store and a vector database for embeddings. Whereas a vector database is essential for semantic search, an object store checks multiple boxes for AI workloads: massive scalability without performance bottlenecks, rich metadata for each object, and immutability for auditability and compliance.

Beyond these fundamentals, organizations don’t necessarily need to buy new SaaS applications or infrastructure. The better option is to extend what you already have. “The pragmatic approach is to build a layer on top of existing systems, with the right connectors to make data accessible to agents,” says Rotem Weiss, founder and CEO of Tavily, maker of a real-time search engine for large language models (LLMs).

Still, unifying multiple data sources may require an abstraction layer. “The most effective strategy is to create an abstraction layer that exposes data from various sources to agents via APIs,” says Indicium’s Beninca. “This allows businesses to leverage existing knowledge management systems like Confluence, tap into data warehouses for real-time structured information, and integrate vector databases for semantic search.”

Others agree that knowledge bases don’t need to be built from scratch, but maintenance challenges remain. “Most of the existing knowledge bases can be retrofitted to support AI agents,” says Aviator’s Jain. He adds that maintaining a knowledge base is a lot harder than creating one. To solve this, agents themselves should capture new information and keep the knowledge base up-to-date.

Given the technical nuances, experts suggest starting small and expanding on early successes. “Try to focus on measured proof-of-concept projects where unique organizational knowledge and data can be curated and surfaced to agents via tools,” says Greg Jennings, VP of engineering, AI, at Anaconda, the provider of a platform that helps organizations build secure AI with open source.

Connecting to the knowledge base

Now comes actually connecting to the data, which is more complex than you might think, given that there are many schools of thought for data retrieval in AI.

The consensus is that agent knowledge bases benefit from a multi-modal retrieval strategy: vector search finds semantically similar concepts, graph traversal identifies relationships between data, and keyword search pinpoints exact matches.

“AI agents generally connect to knowledge bases through APIs or retrieval-augmented generation (RAG) pipelines,” says Neeraj Abhyankar, VP, data and AI at R Systems, a software engineering consultancy. He adds that Model Context Protocol (MCP) will likely play a role as the leading standard for how agents access tools and data.

Others agree that MCP changes the game by standardizing agentic connections. “Instead of building custom integrations for each knowledge source, agents can plug into any MCP-compatible system,” says Sunder, noting this could even allow agents to communicate across organizational boundaries.

Beyond these methods, Solo.io’s Posta suggests a concept he calls “RAG on the wire,” in which LLM calls are intercepted by an agent gateway that performs a RAG-style look-up. “This way, the guidelines or conventions are enforced regardless of who’s calling.”

Additional retrieval techniques are emerging, including hierarchical search, which narrows broad queries into precise ones, and GraphRAG, which represents knowledge as a graph. ”In my opinion, agents will make GraphRAG more popular,” says Keith Pijanowski, AI solutions engineer at MinIO, provider of an open-source object storage server.

“GraphRAG provides agents with ‘multi-node’ knowledge, showing how knowledge is related to other knowledge,” says Pijanowski. “This more accurately represents the real world, enabling agents to perform more complex reasoning and actions. Standard RAG relies on a flat document structure.”

No ‘one-size-fits-all’

Some best practices for AI agent knowledge bases are materializing across industries. These are mainly around technical execution: version control, retrieval strategies, memory of past chats, access controls, prompt chaining, embedding, and data-refresh processes.

Still, while infrastructure and design patterns may be transferable, each knowledge base will inevitably reflect an organization’s custom domain logic and workflows. As Indicium’s Beninca says, “customization is not an optional extra—it is a fundamental requirement for achieving a positive return on investment.”

Responsive’s Sunder agrees that knowledge bases are not one-size-fits-all. “The infrastructure patterns are emerging, but the ontologies remain highly specialized,” Sunder says. “I am not seeing convergence yet. Every industry has its own conceptual vocabulary and regulatory requirements.”

The data and intended use cases will be highly industry-dependent. “Vertical customization is non-negotiable,” says R Systems’s Abhyankar, who notes that healthcare will need HIPAA-aware schemas, while agents in retail may prioritize inventory logic.

Each organization’s data moat, and in turn its knowledge base, will mirror its unique business logic.

“Everyone’s using similar vector databases, embedding models, and search technologies,” says Aviator’s Jain. “However, the knowledge schemas, validation rules, and business logic remain highly customized. The ‘how’ is becoming standardized while the ‘what’ stays wildly different.”

Keep the knowledge fresh

According to Microsoft’s 2025 Work Trend Index, 46% of business leaders say their companies are already using agents to automate workflows or processes, with a growing share exploring multi-agent systems as well. As consultancies like Deloitte double down on multi-agent approaches, the momentum is expected to continue.

Software engineering offers a clear example of how agents accelerate existing processes. Over 90% of developers now use AI coding tools, saving an average of 3.6 hours per week, according to DX’s AI-Assisted Engineering Q4 Impact Report, which analyzed data from nearly 60,000 developers. Yet despite faster throughput, code quality remains inconsistent, underscoring the need for stronger baselines and shared context among AI agents.

The same need for shared understanding is true for agents assisting end users in other contexts. But the key here is ongoing maintenance, because “shared understanding” could become “shared misconception” really fast.

Since organizational knowledge is always evolving, updating the system to keep data fresh, without duplicating knowledge or breaking agentic behavior, will be the major hurdle. As Aviator’s Jain says, ”the biggest challenge is maintaining the knowledge base, more specifically, maintaining the quality and freshness of the data.”

Sunder agrees. “Freshness, or lack thereof, is the silent killer of AI knowledge systems.”