The magic of RAG is in the retrieval

Wednesday, August 14, 2024, 10:30 AM, from InfoWorld

The decades-long pursuit to capture, organize, and apply the collective knowledge within an enterprise has failed time and again because available software tools could not understand the noisy, unstructured data that makes up the vast majority of the enterprise knowledge base. Until now. Large language models (LLMs) that power generative AI tools excel at processing and understanding unstructured data, making them well suited to powering enterprise knowledge management systems.

To make this shift to generative AI work in the enterprise, a dominant architectural pattern has emerged: retrieval-augmented generation (RAG), combined with an “AI agents” approach. RAG introduces an information retrieval component to generative AI, allowing systems to access external data beyond an LLM’s training set and constrain outputs to this specific information. And by deploying a sequence of AI agents to perform specific tasks, teams can now automate entire complex, multi-stage knowledge workflows using RAG—tasks that previously could only be performed by humans.

Potential use cases for this approach are vast. Credit risk analysis, scientific research, legal analysis, and customer support are just some workflows that depend on proprietary or domain-specific data—where accuracy is a hard requirement, and hallucinations are a show-stopper.

Despite its recent emergence, RAG is already facing criticism, with some prematurely declaring it a failure. However, understanding RAG’s core function—enabling LLMs to access and summarize external data—reveals that such blanket dismissals likely stem from isolated implementation issues rather than fundamental flaws in the concept. Integrating generative AI with external, domain-specific data is a crucial requirement for enterprise AI applications. Meta, Google, Amazon, and Microsoft, as well as a number of AI startups, have successfully implemented RAG-based solutions at scale. OpenAI has also added RAG capabilities to its ChatGPT product.

Why RAG implementations fail

There isn’t a one-size-fits-all approach to RAG, however. Implementations vary, and there are a few key reasons why a RAG deployment might fail or fall short of expectations.

While LLMs attract more attention, the real magic of RAG is in the retrieval model and its upstream components. RAG deployments live and die by the quality of the source content and the retrieval model’s ability to filter a large data source down to useful data points before feeding them to an LLM. Therefore, most of the development effort should focus on optimizing the retrieval model and ensuring high-quality data.
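As a rough illustration of what “filtering down to useful data points” can mean, the sketch below scores documents against a query and keeps only the best matches above a relevance floor. It is a minimal sketch, not a production retriever: the bag-of-words cosine similarity stands in for whatever embedding model a real system would use, and the names `retrieve`, `top_k`, and `min_score` are assumptions made for this example, as are the sample documents.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split into alphanumeric tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two term-count vectors (Counters).
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, top_k=2, min_score=0.1):
    # Score every document, drop anything below the relevance floor,
    # then return at most top_k (score, document) pairs, best first.
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(d))), d) for d in documents]
    kept = [(s, d) for s, d in scored if s >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:top_k]


# Made-up sample documents, for illustration only.
docs = [
    "Quarterly earnings rose on higher vehicle deliveries.",
    "A celebrity was photographed buying a new car.",
    "Regulators opened a lawsuit over battery safety claims.",
]
results = retrieve("quarterly earnings report", docs)
for score, doc in results:
    print(f"{score:.2f}  {doc}")
```

The relevance floor matters as much as the ranking: with only `top_k`, the two unrelated documents would still be passed to the LLM whenever fewer relevant ones exist.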

The role of the LLM in a RAG system is to simply summarize the data from the retrieval model’s search results, with prompt engineering and fine-tuning to ensure the tone and style are appropriate for the specific workflow. All the leading LLMs on the market support these capabilities, and the differences between them are marginal when it comes to RAG. Choose an LLM quickly and focus on data and retrieval.

RAG failures primarily stem from insufficient attention to data access, quality, and retrieval processes. For instance, merely inputting large volumes of data into an LLM with an expansive context window is inadequate if the data is excessively noisy or irrelevant to the specific task. Poor outcomes can result from various factors: a lack of pertinent information in the source corpus, excessive noise, ineffective data processing, or the retrieval system’s inability to filter out irrelevant information. These issues lead to low-quality data being fed to the LLM for summarization, resulting in vague or junk responses.

It’s important to note that this isn’t a failure of the RAG concept itself. Rather, it’s a failure to construct an appropriate “R,” the retrieval model.

Consider a system with embedded Tesla data spanning the company’s history. Without efficient chunking and retrieval mechanisms, a financial analyst inquiring about earnings or a risk analyst searching for lawsuit information would receive a response generated from an overwhelming mix of irrelevant data. This data might include unrelated CEO news and celebrity purchases. The system would produce vague, incomplete, or even hallucinated responses, forcing users to waste valuable time manually sorting through the results to find the information they actually need and then validating its accuracy.
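Chunking can take many forms. One common starting point is a fixed-size sliding window with overlap, so that a sentence cut at a boundary still appears whole in a neighboring chunk. The sketch below assumes word-based windows; the function name `chunk_words` and its default sizes are illustrative choices, not recommendations from the article.

```python
def chunk_words(text, chunk_size=50, overlap=10):
    # Split text into overlapping windows of chunk_size words.
    # Consecutive chunks share `overlap` words.
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


# Twelve placeholder words: w0 ... w11.
sample = " ".join(f"w{i}" for i in range(12))
chunks = chunk_words(sample, chunk_size=5, overlap=2)
for chunk in chunks:
    print(chunk)
```

In practice, teams often chunk on sentence, paragraph, or section boundaries instead of raw word counts. The right choice depends on the source content.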

Agent-based RAG systems typically serve multiple workflows, and retrieval models and LLMs need to be tailored to each workflow’s unique requirements. For instance, financial analysts need earnings-focused output, while risk analysts require information on lawsuits and regulatory actions. Each workflow demands fine-tuned output adhering to specific lexicons and formats. While some LLM fine-tuning is necessary, success here primarily depends on data quality and on how effectively the retrieval model filters workflow-specific data points from the source data and feeds them to the LLM.
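One simple way to picture workflow-specific filtering is a metadata filter applied before any ranking takes place. The sketch below assumes each chunk was tagged with a topic label during ingestion. The `Chunk` type, the `WORKFLOW_TOPICS` mapping, and the sample corpus are all hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    topic: str  # hypothetical label assigned during ingestion


# Hypothetical mapping from workflow to the topics it cares about.
WORKFLOW_TOPICS = {
    "financial_analysis": {"earnings"},
    "risk_analysis": {"lawsuit", "regulatory"},
}


def select_for_workflow(chunks, workflow):
    # Keep only chunks whose topic is relevant to the given workflow.
    allowed = WORKFLOW_TOPICS[workflow]
    return [c for c in chunks if c.topic in allowed]


# Made-up corpus, for illustration only.
corpus = [
    Chunk("Q2 earnings beat guidance.", "earnings"),
    Chunk("Class-action lawsuit filed.", "lawsuit"),
    Chunk("Regulator opens safety inquiry.", "regulatory"),
    Chunk("CEO posts about a celebrity buyer.", "news"),
]
risk_context = select_for_workflow(corpus, "risk_analysis")
finance_context = select_for_workflow(corpus, "financial_analysis")
print([c.text for c in risk_context])
```

The same corpus yields different context for each workflow, and the unrelated news item reaches neither.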

Finally, a well-designed AI agents approach to the automation of complex knowledge workflows can help mitigate risks with RAG deployments by breaking down large use cases into discrete “jobs to be done,” making it easier to ensure relevance, context, and effective fine-tuning at each stage of the system.
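The “jobs to be done” idea can be sketched as a chain of small functions, each handling one stage and passing its result to the next. This is only a schematic: the job names, the shared `state` dictionary, and the company name “Acme” are invented for illustration, and the `draft` step is a stand-in for where a real system would call an LLM.

```python
def gather(state):
    # Job 1: collect documents that mention the company.
    docs = [d for d in state["corpus"] if state["company"] in d]
    return {**state, "documents": docs}


def filter_relevant(state):
    # Job 2: keep only documents relevant to the workflow's topic.
    relevant = [d for d in state["documents"] if state["topic"] in d.lower()]
    return {**state, "relevant": relevant}


def draft(state):
    # Job 3: placeholder for the LLM summarization step.
    return {**state, "draft": " ".join(state["relevant"])}


def run_pipeline(jobs, state):
    # Run each job in order, threading the state through.
    for job in jobs:
        state = job(state)
    return state


# Made-up inputs, for illustration only.
initial = {
    "company": "Acme",
    "topic": "lawsuit",
    "corpus": [
        "Acme earnings rose.",
        "Acme lawsuit filed in March.",
        "Other Corp lawsuit settled.",
    ],
}
final = run_pipeline([gather, filter_relevant, draft], initial)
print(final["draft"])
```

Because each job has a narrow input and output, its relevance and quality can be checked in isolation before the next stage runs.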

Three keys to a successful RAG deployment

Teams that have successfully implemented enterprise RAG systems have adhered to the same core principles:

Identification of data-intensive, strategically impactful use cases. The most suitable candidates for automation using RAG agents are highly repetitive workflows that require sifting through large volumes of unstructured data from disparate sources and that consume significant internal bandwidth. Such tasks are excessively time-consuming, require frequent execution, or both.

A focus on data quality and retrieval model effectiveness. Allocate approximately 90% of your effort to ensuring access to high-quality data, thoroughly cleaning and processing this data, rigorously assessing its relevance, prompt engineering, and fine-tuning the output to align precisely with specific workflows.

Measuring what matters. Establish concrete ROI measures that focus on the quality of output and time saved compared to the original task execution. No AI implementation can provide 100% automation. Instead, measure the efficiency gains by comparing the resources required for manual execution against the new process where RAG AI provides a first draft that then undergoes human review and finalization.
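The time-saved side of that comparison reduces to simple arithmetic. The sketch below assumes you have measured the minutes a task takes manually and the minutes needed to review and finalize a RAG-generated first draft. The function names and the sample figures are hypothetical, and output quality would need its own separate measure.

```python
def time_saved_fraction(manual_minutes, review_minutes):
    # Share of the original task time eliminated by the new process.
    if manual_minutes <= 0:
        raise ValueError("manual_minutes must be positive")
    return (manual_minutes - review_minutes) / manual_minutes


def hours_saved_per_period(manual_minutes, review_minutes, runs_per_period):
    # Total hours saved over a period, given how often the task runs.
    return (manual_minutes - review_minutes) * runs_per_period / 60


# Hypothetical figures: a 120-minute manual task, 30 minutes to review
# and finalize the first draft, run 20 times per month.
fraction = time_saved_fraction(120, 30)
hours = hours_saved_per_period(120, 30, 20)
print(f"{fraction:.0%} of task time saved, {hours:.1f} hours per month")
```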

Chandini Jain is the founder and CEO of Auquan, an AI innovator transforming the world’s unstructured data into actionable intelligence for financial services customers. Prior to founding Auquan, Jain spent 10 years in global finance, working as a trader at Optiver and Deutsche Bank. She is a recognized expert and speaker in the field of using AI for investment and ESG risk management. Jain holds a master’s degree in mechanical engineering/computational science from the University of Illinois at Urbana-Champaign and a B.Tech from IIT Kanpur. For more information on Auquan, visit www.auquan.com, and follow the company @auquan_ and on LinkedIn.

—

Generative AI Insights provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss the challenges and opportunities of generative artificial intelligence. The selection is wide-ranging, from technology deep dives to case studies to expert opinion, but also subjective, based on our judgment of which topics and treatments will best serve InfoWorld’s technically sophisticated audience. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Contact doug_dineley@foundryco.com.