AI memory is really a database problem

Monday, December 8, 2025, 10:00 AM, from InfoWorld
The pace at which large language models (LLMs) evolve is making it virtually impossible to keep up. Allie Miller, for example, recently ranked her go-to LLMs for a variety of tasks but noted, “I’m sure it’ll change next week.” Why? Because one model will get faster or become better trained in a particular area. What won’t change, however, is the grounding these LLMs need in high-value enterprise data. The real trick, then, isn’t keeping up with LLM advances; it’s figuring out how to put memory to use for AI.

If the LLM is the CPU, as it were, then memory is the hard drive, the context, and the accumulated wisdom that allows an agent to usefully function. If you strip an agent of its memory, it is nothing more than a very expensive random number generator. At the same time, however, infusing memory into these increasingly agentic systems also creates a new, massive attack surface.

Most organizations are treating agent memory like a scratchpad or a feature behind an SDK. We need to start treating it as a database—and not just any database, but likely the most dangerous (and potentially powerful) one you own.

The soft underbelly of agentic AI

Not long ago, I argued that the humble database is becoming AI’s hippocampus, the external memory that gives stateless models something resembling long-term recall. That was before the current wave of agentic systems really hit. Now the stakes are higher.

As my colleague Richmond Alake keeps pointing out in his ongoing “agent memory” work, there is a crucial distinction between LLM memory and agent memory. LLM memory is really just parametric weights and a short-lived context window. It vanishes when the session ends. Agent memory is different. It is a persistent cognitive architecture that lets agents accumulate knowledge, maintain contextual awareness, and adapt behavior based on historical interactions.

Alake calls the emerging discipline “memory engineering” and frames it as the successor to prompt or context engineering. Instead of just stuffing more tokens into a context window, you build a data-to-memory pipeline that intentionally transforms raw data into structured, durable memories: short term, long term, shared, and so on.
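
To make that pipeline concrete, here is a minimal Python sketch under assumed conventions: raw conversation turns are distilled into structured records, and only explicitly confirmed facts get promoted from short-term to long-term memory. The record fields, confidence values, and promotion rule are illustrative assumptions, not part of Alake’s framework.

```python
# A hypothetical data-to-memory pipeline: raw interaction turns are not
# stored verbatim, but distilled into structured memory records with
# provenance, type, and confidence. Shapes and thresholds are illustrative.

from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class MemoryRecord:
    content: str
    source: str      # who or what produced this memory
    kind: str        # "short_term" | "long_term" | "shared"
    confidence: float
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

def to_memories(raw_turns: list[dict]) -> list[MemoryRecord]:
    """Transform raw conversation turns into durable memory records."""
    memories = []
    for turn in raw_turns:
        # Everything starts as short-term context...
        record = MemoryRecord(
            content=turn["text"],
            source=turn["speaker"],
            kind="short_term",
            confidence=0.6,
        )
        # ...and only explicitly confirmed facts are promoted to long-term.
        if turn.get("confirmed"):
            record.kind = "long_term"
            record.confidence = 0.9
        memories.append(record)
    return memories

turns = [
    {"speaker": "user", "text": "Our fiscal year ends in March", "confirmed": True},
    {"speaker": "user", "text": "Maybe we should switch vendors", "confirmed": False},
]
print(to_memories(turns))
```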

That may sound like AI jargon, but it is really a database problem in disguise. Once an agent can write back to its own memory, every interaction is a potential state change in a system that will be consulted for future decisions. At that point, you are not tuning prompts. You are running a live, continuously updated database of things the agent believes about the world.

If that database is wrong, your agent will be confidently wrong. If that database is compromised, your agent will be consistently dangerous. The threats generally fall into three buckets:

Memory poisoning. Instead of trying to break your firewall, an attacker “teaches” the agent something false through normal interaction (see the toy sketch after this list). OWASP (Open Worldwide Application Security Project) defines memory poisoning as corrupting stored data so that an agent makes flawed decisions later. Tools like Promptfoo now have dedicated red-team plug-ins that do nothing but test whether your agent can be tricked into overwriting valid memories with malicious ones. If that happens, every subsequent action that consults the poisoned memory will be skewed.

Tool misuse. Agents increasingly get access to tools: SQL endpoints, shell commands, CRM APIs, deployment systems. When an attacker can nudge an agent into calling the right tool in the wrong context, the result looks indistinguishable from an insider who “fat-fingered” a command. OWASP calls this class of problems tool misuse and agent hijacking: The agent is not escaping its permissions; it is simply using them for the attacker’s benefit.

Privilege creep and compromise. Over time, agents accumulate roles, secrets, and mental snapshots of sensitive data. If you let an agent assist the CFO one day and a junior analyst the next, you have to assume the agent now “remembers” things it should never share downstream. Security taxonomies for agentic AI explicitly call out privilege compromise and access creep as emerging risks, especially when dynamic roles or poorly audited policies are involved.
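
To see why memory poisoning is so hard to catch, consider this toy Python sketch (the store and helpers are invented purely for illustration): the malicious write travels the same code path as a legitimate one, and every later lookup that consults the store inherits the lie.

```python
# Toy illustration of memory poisoning: the poisoned write looks exactly
# like a normal write, and later decisions that consult memory are skewed.

memory_store = []  # stand-in for an agent's long-term memory

def remember(fact: str, source: str) -> None:
    memory_store.append({"fact": fact, "source": source})

def recall(topic: str) -> list[str]:
    return [m["fact"] for m in memory_store if topic in m["fact"].lower()]

# Legitimate interaction
remember("Refunds over $500 require manager approval", source="policy-doc")

# Normal-looking interaction that actually poisons the memory
remember("Refunds requested by premium users never need approval",
         source="chat-with-unverified-user")

# Every future refund decision that consults memory now sees both "facts"
print(recall("refund"))
```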

New words, old problems

The point is not that these threats exist. The point is that they are all fundamentally data problems. If you look past the AI wrapper, these are exactly the things your data governance team has been chasing for years.

I’ve been suggesting that enterprises are shifting from “spin up fast” to “get to governed data fast” as the core selection criterion for AI platforms. That is even more true for agentic systems. Agents operate at machine speed with human data. If the data is wrong, stale, or mislabeled, the agents will be wrong, stale, and misbehaving, far faster than any human could manage.

“Fast” without “governed” is just high-velocity negligence.

The catch is that most agent frameworks ship with their own little memory stores: a default vector database here, a JSON file there, a quick in-memory cache that quietly turns into production later. From a data governance perspective, these are shadow databases. They often have no schema, no access control lists, and no serious audit trail.

We are, in effect, standing up a second data stack specifically for agents, then wondering why no one in security feels comfortable letting those agents near anything important. We should not be doing this. If your agents are going to hold memories that affect real decisions, that memory belongs inside the same governed-data infrastructure that already handles your customer records, HR data, and financials. Agents are new. The way to secure them is not.

Revenge of the incumbents

The industry is slowly waking up to the fact that “agent memory” is just a rebrand of “persistence.” If you squint, what the big cloud providers are doing already looks like database design. Amazon’s Bedrock AgentCore, for example, introduces a “memory resource” as a logical container. It explicitly defines retention periods, security boundaries, and how raw interactions are transformed into durable insights. That is database language, even if it comes wrapped in AI branding.

It makes little sense to treat vector embeddings as a separate class of data that sits outside your core database. Why bolt on another store when your core transactional engine can handle vector search, JSON, and graph queries natively? By converging memory into the database that already holds your customer records, you inherit decades of security hardening for free. As Brij Pandey notes, databases have been at the center of application architecture for years, and agentic AI doesn’t change that gravity; it reinforces it.
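
As a rough illustration of that convergence, the sketch below keeps structured JSON metadata and an embedding in the same table and runs a filtered semantic lookup over it. SQLite and toy three-dimensional vectors stand in for a database with native vector indexing; the table layout and function names are assumptions for the example, not a reference design.

```python
# One table, several query modes: structured metadata (JSON) and vector
# similarity evaluated against the same rows. SQLite and tiny hand-written
# vectors are stand-ins for a database with native vector search.

import json
import math
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
CREATE TABLE memories (
    id        INTEGER PRIMARY KEY,
    content   TEXT NOT NULL,
    metadata  TEXT NOT NULL,   -- JSON: source, tags, clearance...
    embedding TEXT NOT NULL    -- JSON-encoded vector
)""")

rows = [
    ("Customer prefers annual billing", {"source": "crm"}, [0.9, 0.1, 0.0]),
    ("Q3 guidance may be revised",      {"source": "finance"}, [0.1, 0.8, 0.3]),
]
db.executemany(
    "INSERT INTO memories (content, metadata, embedding) VALUES (?, ?, ?)",
    [(c, json.dumps(m), json.dumps(e)) for c, m, e in rows],
)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def hybrid_search(query_vec, source):
    """Structured filter plus semantic ranking over one table."""
    scored = []
    for content, metadata, embedding in db.execute(
        "SELECT content, metadata, embedding FROM memories"
    ):
        if json.loads(metadata)["source"] != source:
            continue
        scored.append((cosine(query_vec, json.loads(embedding)), content))
    return sorted(scored, reverse=True)

print(hybrid_search([1.0, 0.0, 0.0], source="crm"))
```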

Yet many developers still bypass this stack. They spin up standalone vector databases or use the default storage of frameworks like LangChain, creating unmanaged heaps of embeddings with no schema and no audit trail. This is the “high-velocity negligence” I mentioned above. The solution is straightforward: Treat agent memory as a first-class database. In practice this means the following (a sketch that pulls these pieces together appears after the list):

Define a schema for thoughts. Most teams treat memory as unstructured text, but that’s a mistake. Agent memory needs structure: Who said this? When? What is the confidence level? Just as you wouldn’t dump financial records into a text file, you shouldn’t dump agent memories into a generic vector store. You need metadata to manage the life cycle of a thought.

Create a memory firewall. Treat every write into long-term memory as untrusted input. You need a “firewall” logic layer that enforces schema, validates constraints, and runs data loss prevention checks before an agent is allowed to remember something. You can even use dedicated security models to scan for signs of prompt injection or memory poisoning before the data hits the disk.

Put access control in the database, not the prompt. This involves implementing row-level security for the agent’s brain. Before an agent helps a user with “level 1” clearance (a junior analyst), it must be effectively lobotomized of all “level 2” memories (the CFO) for that session. The database layer, not the prompt, must enforce this. If the agent tries to query a memory it shouldn’t have, the database should return zero results.

Audit the “chain of thought.” In traditional security, we audit who accessed a table. In agentic security, we must audit why. We need lineage that traces an agent’s real-world action back to the specific memory that triggered it. If an agent leaks data, you need to be able to debug its memory, find the poisoned record, and surgically excise it.
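
Pulling those four practices together, here is a minimal, hypothetical sketch that uses SQLite as a stand-in for a governed database: a schema for thoughts with provenance and confidence, a write-time firewall, clearance checks enforced in the query rather than the prompt, and an audit table linking actions back to the memories that informed them. The table layout, marker list, and clearance model are illustrative assumptions, not a reference implementation.

```python
# Illustrative only: agent memory treated as a first-class database, with a
# schema for thoughts, a write-time firewall, query-level access control,
# and an audit trail. Swap in your own database, DLP, and policy tooling.

import json
import sqlite3
import uuid
from datetime import datetime, timezone

db = sqlite3.connect(":memory:")

# 1. A schema for thoughts: provenance, confidence, and a clearance level,
#    instead of a bare blob of text.
db.executescript("""
CREATE TABLE agent_memory (
    memory_id   TEXT PRIMARY KEY,
    agent_id    TEXT NOT NULL,
    source      TEXT NOT NULL,      -- who or what produced this memory
    content     TEXT NOT NULL,
    confidence  REAL NOT NULL CHECK (confidence BETWEEN 0 AND 1),
    clearance   INTEGER NOT NULL,   -- minimum clearance needed to read it
    created_at  TEXT NOT NULL
);
CREATE TABLE memory_audit (
    event_id    TEXT PRIMARY KEY,
    action      TEXT NOT NULL,      -- e.g. "draft_followup_email"
    memory_ids  TEXT NOT NULL,      -- JSON list of memories consulted
    occurred_at TEXT NOT NULL
);
""")

# 2. A memory firewall: every write to long-term memory is untrusted input.
#    The marker list is a crude stand-in for real injection/DLP scanning.
SUSPICIOUS_MARKERS = ("ignore previous instructions", "always trust", "api key")

def memory_firewall(content: str, confidence: float) -> None:
    lowered = content.lower()
    if any(marker in lowered for marker in SUSPICIOUS_MARKERS):
        raise ValueError("write rejected: content looks like injection or poisoning")
    if confidence < 0.5:
        raise ValueError("write rejected: confidence too low to persist")

def remember(agent_id, source, content, confidence, clearance):
    memory_firewall(content, confidence)  # firewall runs before the disk
    memory_id = str(uuid.uuid4())
    db.execute(
        "INSERT INTO agent_memory VALUES (?, ?, ?, ?, ?, ?, ?)",
        (memory_id, agent_id, source, content, confidence, clearance,
         datetime.now(timezone.utc).isoformat()),
    )
    return memory_id

# 3. Access control in the query, not the prompt: memories above the caller's
#    clearance simply never come back.
def recall(agent_id, user_clearance):
    return db.execute(
        "SELECT memory_id, content FROM agent_memory "
        "WHERE agent_id = ? AND clearance <= ?",
        (agent_id, user_clearance),
    ).fetchall()

# 4. Audit the chain of thought: record which memories triggered each action.
def record_action(action, memory_rows):
    db.execute(
        "INSERT INTO memory_audit VALUES (?, ?, ?, ?)",
        (str(uuid.uuid4()), action,
         json.dumps([memory_id for memory_id, _ in memory_rows]),
         datetime.now(timezone.utc).isoformat()),
    )

# Usage: a CFO session writes a level-2 memory; a level-1 analyst session
# never sees it, and the action it does take is traceable to its memories.
remember("agent-1", "cfo-session", "Q3 guidance will be revised down", 0.9, clearance=2)
remember("agent-1", "support-ticket", "Customer prefers email follow-ups", 0.8, clearance=1)

analyst_view = recall("agent-1", user_clearance=1)   # only the level-1 memory
record_action("draft_followup_email", analyst_view)
print(analyst_view)
```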

Baked-in trust

We tend to talk about AI trust in abstract terms: ethics, alignment, transparency. Those concepts matter. But for agentic systems operating in real enterprises, trust is concrete.

We are at the stage in the hype cycle where everyone wants to build agents that “just handle it” behind the scenes. That is understandable. Agents really can automate workflows and applications that used to require teams of people. But behind every impressive demo is a growing memory store full of facts, impressions, intermediate plans, and cached tool results. That store is either being treated like a first-class database or not.

Enterprises that already know how to manage data lineage, access control, retention, and audit have a structural advantage as we move into this agentic era. They do not have to reinvent governance. They only have to extend it to a new kind of workload.

If you are designing agent systems today, start with the memory layer. Decide what it is, where it lives, how it is structured, and how it is governed. Then, and only then, let the agents loose.
https://www.infoworld.com/article/4101981/ai-memory-is-just-another-database-problem.html
