Explained: How Salesforce Agentforce’s Atlas reasoning engine works to power AI agents

Monday, September 30, 2024, 12:44 PM, from InfoWorld

Salesforce’s recently launched Agentforce suite of low-code tools is betting on a human-like reasoning engine called Atlas that could be a game-changer in building AI agents.

In essence, the engine makes the AI agents created via Agentforce autonomous: it allows them to reason in a human-like way and take actions on their own, whereas the previous generation of Salesforce agents had to be configured with workflows before they could take actions.

AI agents can take over manual system management tasks from developers or enterprise architects without requiring any human intervention, allowing developers or architects to move to a supervisory role.

Giving an example at Dreamforce 2024 of how autonomous agents differ from their predecessors, Salesforce AI CEO Clara Shih said that the new agents were based on the premise of moving from content generation to letting generative AI systems take actions on their own.

“In the generative phase, you might ask a copilot to write an email for you to a customer. In the agentic phase, you can ask a harder question: ‘What should I do with all of my customers?’ Maybe it’s email, maybe it’s picking up the phone and calling, maybe it’s sending a text message,” Shih said. “That’s really what agents can do: They can take a higher-order question, break it down into a series of steps, and then execute each of those steps.”

Beyond chit-chat and the query detector

This automation is triggered and handled by the Atlas reasoning engine inside Agentforce.

The reasoning engine, which was developed by a group of researchers at Salesforce over nearly a year, combines multiple large language models (LLMs), large action models (LAMs), the Atlas retrieval augmented generation (RAG) module, REST APIs, and different data connectors to datasets or knowledge repositories.

“In essence, for any given query that comes in, the Atlas system uses between eight and 12 different types of language models that are specialized for that particular subtask,” said Phil Mui, SVP of Salesforce AI Research, who led the team developing Atlas.

These modules of the Atlas reasoning engine kick in once a user’s query gets past the Einstein Trust Layer, which checks the query for abusive content, Mui explained, adding that the engine’s first step is to determine whether the user input is valid or just chit-chat.

Salesforce defines anything that is out of scope for the agent to answer as chit-chat. Once the Chit-Chat Detector, essentially an underlying LLM, finds that the user is engaging in chit-chat, it sends the query back to the user with a typical corporate response, such as “I don’t know about it,” as deemed fit by the enterprise using the autonomous AI Agent.
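The gating step described above can be illustrated with a minimal sketch. It is not Salesforce’s implementation: the real detector is an LLM, whereas this stand-in uses simple keyword matching, and the names `IN_SCOPE_KEYWORDS`, `FALLBACK_RESPONSE`, and `chit_chat_gate` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical topics this agent may handle. A production detector would be
# an LLM classifier, not a keyword list.
IN_SCOPE_KEYWORDS = {"order", "refund", "invoice", "shipping"}

# Canned reply chosen by the enterprise for out-of-scope input.
FALLBACK_RESPONSE = "I don't know about that."


@dataclass
class GateResult:
    in_scope: bool
    response: Optional[str]  # set only when the query is rejected


def chit_chat_gate(query: str) -> GateResult:
    """Return whether the query is in scope, plus a fallback reply if not."""
    words = {w.strip(".,!?").lower() for w in query.split()}
    if words & IN_SCOPE_KEYWORDS:
        return GateResult(in_scope=True, response=None)
    return GateResult(in_scope=False, response=FALLBACK_RESPONSE)
```

Only queries for which `in_scope` is true would continue to the evaluation phase; everything else is answered with the fallback text.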

If the query passes the chit-chat detector, it enters what Salesforce calls the evaluation phase where the query passes through another LLM, dubbed the Query Evaluator, which determines if the reasoning engine has enough information to process the query.

The Query Evaluator activates the query expansion process inside the AI Agent, which is handled by another LLM. With the help of this secondary LLM, the evaluator then determines whether the user’s query can be answered given the data and information provided.

The query expansion process, according to Mui, consists of the AI Agent breaking the query down into chunks for processing.

If the evaluator determines that it cannot fulfill the query, the Agent goes back to the user to seek the additional information it needs.

“So, we call this phenomenon the agentic loop. It literally loops over, and the more powerful the reasoning engine, the fewer times you have to loop, and the faster you get to respond to the user,” Mui explained.
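As a rough illustration of the agentic loop Mui describes, the sketch below keeps asking the user for missing details until the evaluator is satisfied or a turn limit is reached. It rests on stated assumptions: the evaluator is reduced to a check for required fields, and `REQUIRED_FIELDS`, `missing_fields`, `agentic_loop`, and the `ask_user` callback are all hypothetical names, not part of Atlas.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical pieces of information the agent needs before it can act.
REQUIRED_FIELDS = ["order_id", "email"]


def missing_fields(context: Dict[str, str]) -> List[str]:
    """Stand-in for the Query Evaluator: list what is still unknown."""
    return [name for name in REQUIRED_FIELDS if not context.get(name)]


def agentic_loop(
    context: Dict[str, str],
    ask_user: Callable[[str], str],
    max_turns: int = 5,
) -> Tuple[Dict[str, str], int]:
    """Loop until the context is complete; return it with the turn count."""
    context = dict(context)  # do not mutate the caller's dictionary
    for turns in range(max_turns + 1):
        missing = missing_fields(context)
        if not missing:
            return context, turns
        if turns == max_turns:
            break
        # Go back to the user for one missing detail, then merge the
        # answer into the original context.
        context[missing[0]] = ask_user(missing[0])
    raise RuntimeError("could not gather enough information")
```

The returned turn count mirrors Mui’s point: a more capable evaluator needs fewer passes through the loop before the agent can respond.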

Later, when the user provides additional input, the Agent processes that input and the original query together in what Salesforce calls the context refinement phase.

The combination of the inputs allows the Atlas reasoning engine to plan the query execution and determine a response to the user query.

The ‘autonomous’ piece in Agentforce Agents

After the context refinement and query expansion processes, Atlas plans and executes the query using LAMs and APIGen, which help in function calling to generate a response to the query, Mui said.

According to experts, function calling can be defined as an AI development technique to help LLMs connect with external tools and APIs that are needed to execute a user request.

APIGen is an automated data generation pipeline designed to produce verifiable, high-quality datasets for function-calling applications. LAMs, by contrast, are smaller models inside an LLM, designed using the mixture of experts (MoE) architecture, that are typically used to invoke software functions via function calling to complete actions or tasks in the real world.
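Function calling in general can be sketched as follows: the model emits a structured request naming a tool and its arguments, and the runtime looks up and runs the matching function. This is a generic illustration, not Atlas’s or APIGen’s interface. The JSON shape, the `TOOLS` registry, and the `get_order_status` stub are assumptions made for the example.

```python
import json
from typing import Callable, Dict


def get_order_status(order_id: str) -> str:
    """Hypothetical tool; a real one would call a REST API or database."""
    return f"Order {order_id} is in transit"


# Registry mapping tool names the model may emit to Python callables.
TOOLS: Dict[str, Callable[..., str]] = {
    "get_order_status": get_order_status,
}


def dispatch(tool_call_json: str) -> str:
    """Parse a model-emitted tool call and execute the matching function."""
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call["arguments"])
```

A model output such as `{"name": "get_order_status", "arguments": {"order_id": "A1"}}` would be passed to `dispatch`, and the result handed back to the model or to the response stage.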

Additionally, as part of the query planning and execution process, Atlas uses three tools, namely a re-ranker, a refiner, and a response synthesizer, Mui explained.

As the names of the tools suggest, the re-ranker is used to rank or re-rank the context retrieved from different knowledge sources to answer the query, and the refiner refines that context after it has been re-ranked, akin to the context refinement phase.

The response synthesizer, on the other hand, is used to stitch together the different results to form a response for the user, preferably in natural language.
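The three tools can be pictured as stages in a small pipeline. The sketch below is a simplified stand-in under stated assumptions: relevance is scored by word overlap rather than by a model, and the function names `rerank`, `refine`, and `synthesize` are hypothetical.

```python
import re
from typing import List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rerank(query: str, passages: List[str]) -> List[Tuple[int, str]]:
    """Score each passage by word overlap with the query, best first."""
    q = tokenize(query)
    scored = [(len(q & tokenize(p)), p) for p in passages]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def refine(ranked: List[Tuple[int, str]], top_k: int = 2) -> List[str]:
    """Drop irrelevant and duplicate passages, then keep the top few."""
    kept: List[str] = []
    for score, passage in ranked:
        if score > 0 and passage not in kept:
            kept.append(passage)
    return kept[:top_k]


def synthesize(passages: List[str]) -> str:
    """Stitch the surviving passages into a single reply."""
    if not passages:
        return "I could not find relevant information."
    return " ".join(passages)
```

Calling `synthesize(refine(rerank(query, passages)))` runs the stages in the order the article describes: rank, refine, then stitch together a response.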

Once the final response is generated, it goes through the Quality Evaluator module, which checks the quality of the response before it is sent to the user.

Further, Mui pointed out that there are several other LLMs that the response goes through before being provided to the user.

“There are over five more language models used to produce responses to the query that the end-user will see,” the research head explained, adding that these models perform tasks such as toxicity detection, bias detection, prompt injection prevention, personally identifiable information (PII) de-masking, and PII masking.

“Three of these models are specifically assigned: one for toxicity detection, one for prompt injection detection, and a PII detection model that enables PII masking and de-masking.”
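PII masking and de-masking can be illustrated with a small round-trip example. This is a sketch only: Salesforce uses a detection model, while the code below assumes a simple regular expression that catches email addresses, and `mask_pii`, `unmask_pii`, and the `<EMAIL_n>` placeholder format are hypothetical.

```python
import re
from typing import Dict, Tuple

# Deliberately simple email pattern; real PII detection covers far more.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def mask_pii(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace each email with a placeholder and remember the mapping."""
    mapping: Dict[str, str] = {}

    def _substitute(match: "re.Match[str]") -> str:
        token = f"<EMAIL_{len(mapping) + 1}>"
        mapping[token] = match.group(0)
        return token

    return EMAIL_RE.sub(_substitute, text), mapping


def unmask_pii(text: str, mapping: Dict[str, str]) -> str:
    """Restore the original values in text that contains placeholders."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text
```

The masked text is what downstream language models would see; the mapping stays outside the model and is applied to the response just before it reaches the user.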