Building AI agents the safe way

Monday December 22, 2025. 10:00 AM , from InfoWorld

If you want to know what is actually happening in generative AI—not what vendors pretend in their press releases, but what developers are actually building—Datasette founder Simon Willison gets you pretty close to ground truth. As Willison has been cataloguing for years on his blog, we keep making the same key mistake building with AI as we did in the web 2.0 era: We treat data and instructions as if they are the same thing. That mistake used to give us SQL injection. Now it gives us prompt injection, data exfiltration, and agents that happily (confidently!) do the wrong thing at scale.

Based on Willison’s field notes, here is why your AI agent strategy is probably a security nightmare, and how to fix it with some boring, necessary engineering.

Prompt injection is the new SQL injection

Willison wrote about a talk he gave in October on running Claude Code “dangerously.“ It’s a perfect case study in why agents are both thrilling and terrifying. He describes the productivity boost of “YOLO mode,“ then pivots to why you should fear it: Prompt injection remains “an incredibly common vulnerability.” Indeed, prompt injection is the SQL injection of our day. Willison has been banging the drum on what he calls the lethal trifecta of agent vulnerability. If your system has these three things, you are exposed:

Access to private data (email, docs, customer records)

Access to untrusted content (the web, incoming emails, logs)

The ability to act on that data (sending emails, executing code)

This is not theoretical. It’s not even exotic. If your agent can read a file, scrape a web page, open a ticket, send an email, call a webhook, or push a commit, you have created an automation system that is vulnerable to instruction injection through any untrusted input channel. You can call it “prompt injection“ or “indirect prompt injection“ or “confused deputy.“ The name doesn’t matter. The shape does.

This is where using AI to detect AI attacks starts to look like magical thinking. The security community has been warning for a year that many proposed defenses fail under adaptive attack. One June 2025 paper puts it bluntly: When researchers tune attacks to the defense, they bypass a pile of recent approaches with success rates above 90% in many cases.

In other words, we’re currently building autonomous systems that are essentially confused deputies waiting to happen. The enterprise fix isn‘t better prompts. It‘s network isolation. It‘s sandboxing. It‘s assuming the model is already compromised.

It is, in short, the same old security we used to focus on before AI distracted us from proper hygiene.

Context as a bug, not a feature

There is a lazy assumption in developer circles that more context is better. We cheer when Google (Gemini) or Anthropic (Claude) announces a two-million token window because it means we can stuff an entire codebase into the prompt.

Awesome, right? Well, no.

As I’ve written before, context is not magic memory; it’s a dependency. Every token you add to the context window increases the surface area for confusion, hallucination, and injection attacks. Willison notes that context is not free; it is a vector for poisoning.

The emerging best practice is better architecture, not bigger prompts. Think scoped tools, contexts that are small and explicit, isolated workspaces, and persistent state that lives somewhere designed for persistent state. Context discipline, in other words, means we build systems that aggressively prune what the model sees. In this way, we treat tokens as necessary but dangerous to store in bulk.

Memory is a database problem (again)

Willison calls this “context offloading,” and it’s similar to an argument I keep making: AI memory is just data engineering. For Willison, context offloading is the process of moving state out of the unpredictable prompt and into durable storage. Too many teams are doing this via “vibes,” throwing JSON blobs into a vector store and calling it memory. Notice what happens when we combine these threads:

Willison says context is not free, so you must offload state.

Offloading state means you are building a memory store (often a vector store, sometimes a hybrid store, sometimes a relational database with embeddings and metadata).

That store becomes both the agent’s brain and the attacker’s prize.

Most teams are currently bolting memory onto agents the way early web apps bolted SQL onto forms: quickly, optimistically, and with roughly the same level of input sanitization (not much). That is why I keep insisting memory is just another database problem. Databases have decades of scar tissue, such as least privilege, row-level access controls, auditing, encryption, retention policies, backup and restore, data provenance, and governance.

Agents need the same scar tissue.

Also, remember that memory is not just “What did we talk about last time?” It is identity, permissions, workflow state, tool traces, and a durable record of what the system did and why. As I noted recently, if you can’t replay the memory state to debug why your agent hallucinated, you don’t have a system; you have a casino.

Making ‘vibes’ pay

Willison is often caricatured as an AI optimist because he genuinely loves using these tools to write code. But he distinguishes between “vibe coding“ (letting the AI write scripts and hoping they work) and “vibe engineering.” The difference? Engineering.

In his “JustHTML” project, Willison didn‘t just let the LLM spit out code. He wrapped the AI in a harness of tests, benchmarks, and constraints. He used the AI to generate the implementation, but he used code to verify the behavior.

This tracks with the recent METR study that showed that developers using AI tools often took longer to complete tasks because they spent so much time debugging the AI‘s mistakes. This is, in part, because of the phenomenon I’ve called out where AI-driven code is “almost right,“ and that “almost“ takes a disproportionate amount of time to fix.

The takeaway for the enterprise is clear: AI doesn‘t replace the loop of “write, test, debug.” It just accelerates the “write” part, which means you need to double down on the “test” part.

The boring path forward

The easy days of “wrap an API and ship it” are over, if they were ever real at all. We are moving from the demo phase to the industrial phase of AI, which means that developers need to focus on evals (unit tests, etc.) as the real work. According to Hamel Husain, you should be spending 60% of your time on evaluations. Developers also need to spend much more time getting their architecture right and not simply honing their prompting skills.

The irony is that the “most pressing issues“ in genAI are not new. They’re old. We’re relearning software engineering and security fundamentals in a world where the compiler occasionally makes things up, your code can be socially engineered through a markdown file, and your application’s “state” is a bag of tokens.

So, yes, AI models are magical. But if you want to use them in the enterprise without inadvertently exposing your customer database, you need to stop treating them like magic and start treating them like untrusted, potentially destructive components.

Because, as Willison argues, there’s no “vibe engineering” without serious, boring, actual engineering.