AI agents are unlike any technology ever

Friday, November 22, 2024, 12:00 PM, from ComputerWorld

The agents are coming, and they represent a fundamental shift in the role artificial intelligence plays in businesses, governments, and our lives.

The biggest news in agentic AI happened this month when we learned that OpenAI’s agent, Operator, is expected to launch in January.

OpenAI Operator will function as a personal assistant that can take multi-step actions on its own. We can expect Operator to be put to work writing code, booking travel, and managing daily schedules. It will do all this by using the applications already installed on your PC and by using cloud services.

It joins Anthropic, which recently unveiled a feature for its AI models called “Computer Use.” This allows Claude 3.5 Sonnet to perform complex tasks on computers autonomously. The AI can now move the mouse, click on specific areas, and type commands to complete intricate tasks without constant human intervention.

We don’t know exactly how these tools will work or even whether they’ll work. Both are in what you might call “beta,” aimed mainly at developers and early adopters.

But what they represent is the coming age of agentic AI.

What are AI agents?

A great way to understand agents is to compare them with something we’ve all used before: AI chatbots like ChatGPT.

Existing, popular LLM-based chatbots are designed around the assumption that the user wants, expects, and will receive text output—words and numbers. No matter what the user types into the prompt, the tool is ready to respond with letters from the alphabet and numbers from the numeric system. The chatbot tries to make the output useful, of course. But no matter what, it’s designed for text in, text out.

Agentic AI is different. An agent doesn’t dive straight away into the training data to find words to string together. Instead, it stops to understand the user’s objective and comes up with the component parts to achieve that goal for the user. It plans. And then it executes that plan, usually by reaching out and using other software and cloud services.

AI agents have three abilities that ordinary AI chatbots don’t:

1. Reasoning: At the core of an AI agent is an LLM responsible for planning and reasoning. The LLM breaks down complex problems, creates plans to solve them, and gives reasons for each step of the process.

2. Acting: AI agents have the ability to interact with external programs. These software tools can include web searches, database queries, calculators, code execution, or other AI models. The LLM determines when and how to use these tools to solve problems.

3. Memory Access: Agents can access a “memory” of what has happened before, which includes both the internal logs of the agent’s thought process and the history of conversations with users. This allows for more personalized and context-aware interactions.
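To make the “memory” idea concrete, here is a minimal Python sketch. It assumes memory is nothing more than two lists, one for the agent’s internal reasoning log and one for the conversation history. The class name `AgentMemory`, its methods, and the sample restaurant request are all hypothetical and chosen only for illustration; real agent frameworks store and retrieve memory in more elaborate ways.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    """Hypothetical memory store: reasoning log plus conversation history."""

    thoughts: list[str] = field(default_factory=list)
    conversation: list[tuple[str, str]] = field(default_factory=list)

    def log_thought(self, thought: str) -> None:
        # Record one step of the agent's internal reasoning.
        self.thoughts.append(thought)

    def log_message(self, speaker: str, text: str) -> None:
        # Record one turn of the conversation as (speaker, text).
        self.conversation.append((speaker, text))

    def context(self, last_n: int = 3) -> str:
        # Build the text a planner LLM could be shown before its next step:
        # the most recent messages, followed by the most recent thoughts.
        lines = [f"{who}: {text}" for who, text in self.conversation[-last_n:]]
        lines += [f"(thought) {t}" for t in self.thoughts[-last_n:]]
        return "\n".join(lines)


# Made-up example conversation.
memory = AgentMemory()
memory.log_message("user", "Book me a table for two on Friday.")
memory.log_thought("Need the user's preferred cuisine before searching.")
memory.log_message("agent", "Any cuisine preference?")
memory.log_message("user", "Italian, like last time.")
print(memory.context())
```

Because earlier turns and earlier reasoning are both available when the next step is planned, the agent can stay context-aware instead of treating each request as if it were the first.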

Here’s a step-by-step look at how AI agents work:

1. The user types or speaks something to the agent.

2. The LLM creates a plan to satisfy the user’s request.

3. The agent tries to execute the plan, potentially using external tools.

4. The LLM looks at the result and decides if the user’s objective has been met. If not, it starts over and tries again, repeating this process until the LLM is satisfied.

5. Once satisfied, the LLM delivers the results to the user.
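The steps above can be sketched as a simple loop. In this Python sketch, the functions `make_plan`, `execute`, and `is_satisfied` are hypothetical stand-ins for calls to an LLM and to external tools, and the `max_attempts` cap is an added assumption (the description above simply repeats until the LLM is satisfied). The stub evaluator accepts the second attempt only so that the retry path is visible.

```python
from __future__ import annotations


def make_plan(request: str, attempt: int) -> list[str]:
    # Stand-in for the LLM planner: returns a list of step descriptions.
    return [f"search for '{request}'", f"summarize results (attempt {attempt})"]


def execute(step: str) -> str:
    # Stand-in for running one step; a real agent would call a tool here.
    return f"done: {step}"


def is_satisfied(request: str, results: list[str], attempt: int) -> bool:
    # Stand-in for the LLM judging the results; accepts the second attempt.
    return attempt >= 2


def run_agent(request: str, max_attempts: int = 5) -> list[str]:
    # Plan, execute, evaluate; start over if the objective is not met.
    for attempt in range(1, max_attempts + 1):
        plan = make_plan(request, attempt)
        results = [execute(step) for step in plan]
        if is_satisfied(request, results, attempt):
            return results  # deliver the results to the user
    raise RuntimeError(f"objective not met after {max_attempts} attempts")


for line in run_agent("concert tickets"):
    print(line)
```

The cap on attempts is a design choice rather than part of the process described above: without it, an agent whose evaluator is never satisfied would loop forever.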

Why AI agents are so different from any other software

“Reasoning” and “acting” (often implemented using the ReAct framework, short for Reasoning and Acting) are key differences between AI chatbots and AI agents. But what’s really different is the “acting” part.

If the main agent LLM decides that it needs more information, some kind of calculation, or something else outside the scope of the LLM itself, it can choose to solve its problem using web searches, database queries, calculations, code execution, APIs, and specialized programs. It can even choose to use other AI models or chatbots.
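One common way to picture this is a registry of tools the agent can pick from. The Python sketch below assumes two toy tools, a calculator and a database lookup backed by a made-up placeholder record, plus a keyword rule in `choose_tool` that stands in for the LLM’s decision. All of the names are hypothetical; in a real agent, the model itself would choose the tool and its input.

```python
from __future__ import annotations

import operator
from typing import Callable


def calculator(expression: str) -> str:
    # Evaluate "a op b" for + - * / without using eval().
    ops = {"+": operator.add, "-": operator.sub,
           "*": operator.mul, "/": operator.truediv}
    left, op, right = expression.split()
    return str(ops[op](float(left), float(right)))


def database_query(key: str) -> str:
    # Placeholder data for illustration only; not a real database.
    fake_db = {"concert_date": "2025-03-14"}
    return fake_db.get(key, "not found")


# The registry: tool name -> callable taking and returning text.
TOOLS: dict[str, Callable[[str], str]] = {
    "calculator": calculator,
    "database": database_query,
}


def choose_tool(need: str) -> tuple[str, str]:
    # Stand-in for the LLM's decision: arithmetic goes to the calculator,
    # everything else is treated as a database key.
    if any(symbol in need for symbol in "+-*/"):
        return "calculator", need
    return "database", need


def act(need: str) -> str:
    # Look up the chosen tool in the registry and run it.
    name, tool_input = choose_tool(need)
    return TOOLS[name](tool_input)


print(act("12 * 4"))        # prints 48.0
print(act("concert_date"))  # prints 2025-03-14
```

Adding a capability here means adding one entry to `TOOLS`; the rest of the agent stays the same.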

Do you see the paradigm shift?

Since the dawn of computing, the users who used software were human beings. With agents, for the first time ever, the software is also a user who uses software.

Many of the software tools agents use are regular websites and applications designed for people. Agents will look at your screen, use your mouse to point and click, switch between windows and applications, open a browser on your desktop, and surf the web. In fact, all these abilities exist in Anthropic’s “Computer Use” feature. Other tools that agents can access are designed exclusively for agent use.

Because agents can access software tools, they’re more useful, modular, and adaptable. Instead of training an LLM from scratch, or cobbling together some automation process, you can instead provide the tools the agent needs and just let the LLM figure out how to achieve the task at hand.

They’re also designed to handle complex problem-solving and work more autonomously.

The oversized impact of the coming age of agents

When futurists and technology prognosticators talk about the likely impact of AI over the next decade, they’re mostly talking about agents.

AI agents will take over many business tasks that are already automated. More importantly, they will enable the automation of all kinds of work now done by employees, who will be looking to offload mundane, repetitive, and complicated tasks to agents.

Agents will also give rise to new jobs, roles, and specialties related to managing, training, and monitoring agentic systems. They will add another specialty to the cybersecurity field, which will need agents to defend against cyber attackers who are also using agents.

As I’ve been saying for many years, I believe augmented reality AI glasses will grow so big they’ll replace the smartphone for most people. Agentic AI will make that possible.

In fact, AI smart glasses and AI agents were made for each other. Using streaming video from the glasses’ camera as part of the multimodal input (other inputs being sound, spoken interaction, and more), AI agents will constantly work for the user through simple spoken requests.

One trivial and perfectly predictable example: You see a sign advertising a concert, look directly at it (enabling the camera in your glasses to capture that information), and tell your agent you’d like to attend. The agent will book the tickets, add the event to your calendar, invite your spouse, hire a babysitter, and arrange a self-driving car to pick you up and drop you off.

Like so many technologies, AI will both improve and degrade human capability. Some users will lean on agentic AI like a crutch to never have to learn new skills or knowledge, outsourcing self-improvement to their agent assistants. Other users will rely on agents to push their professional and personal educations into overdrive, learning about everything they encounter all the time.

The key takeaway here is that while agentic AI sounds like futuristic sci-fi, it’s happening in a big way starting next year.