Agentic RAG is getting a lot of attention these days as a practical way to reduce — or, depending on the audacity of the vendor, eliminate — hallucinations from generative AI (genAI) tools. Sadly, it might not decrease hallucinations — but it could open the door to other problems.
To be clear, there’s nothing inherently bad about agentic RAG (RAG stands for retrieval-augmented generation); it works fine for some users, but for others it’s underwhelming, expensive, and labor-intensive, and it doesn’t always deliver on its key promises.
Agentic RAG is designed to allow the integration of additional databases and data sources so a genAI algorithm has a broader range of information for its initial findings. But using AI to manage AI — in short, adding even more AI into the equation — doesn’t always produce better results.
I spoke to two genAI experts who should know: Alan Nichol, CTO at Rasa, and agentic specialist Sandi Besen.
“Agentic RAG is an unnecessary buzzword,” said Nichol. “It simply means adding a loop around your [large language models] and retrieval calls. The market is in a strange place where adding an additional ‘while’ loop or ‘if’ statement to code is touted as a new, game-changing method. State-of-the-art web agents only achieve a 25% success rate, a figure unacceptable in any software context.
“Companies and developers would be better off explicitly building some business logic in regular code,” he said. “They can use LLMs to convert user input into structured formats and paraphrase search results’ output, making it sound more natural.”
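Nichol's suggestion can be sketched in a few lines. This is a minimal, hypothetical illustration, not code from Rasa: the `llm()` stub stands in for any real model call, and the intent schema and routing rule are invented for the example. The point is that the control flow is explicit regular code, with the LLM used only to structure input and paraphrase output.

```python
def llm(prompt: str) -> str:
    """Placeholder for a real LLM API call (hypothetical)."""
    raise NotImplementedError

def parse_intent(user_input: str) -> dict:
    """An LLM would convert free text into a structured format here;
    this keyword check is a stand-in for that call."""
    text = user_input.lower()
    if "refund" in text:
        return {"intent": "refund", "query": user_input}
    return {"intent": "search", "query": user_input}

def handle(user_input: str, search) -> str:
    """Explicit business logic in regular code -- no agent loop."""
    intent = parse_intent(user_input)
    if intent["intent"] == "refund":
        return "Routing you to the refunds team."  # deterministic rule
    results = search(intent["query"])              # plain search engine
    if not results:
        return "Sorry, nothing found."
    # The LLM's only other job: paraphrase the top result naturally.
    return f"Here's what I found: {results[0]}"
```

The branching and routing are ordinary `if` statements that can be tested and debugged like any other code; the model never decides the control flow.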
Nichol argued that Agentic RAG is often the wrong approach for enterprise data analytics needs. “Agentic RAG is the wrong way to think of the problem,” he said. “Any good performing RAG is just a simple search engine on top of which you sprinkle some LLM magic.”
While that tactic can work, IT should stop thinking that “the way to solve this (hallucination) issue is to slap on one more LLM call,” Nichol said. “People are hoping that this kind of approach is going to magically solve the root problem.”
And just what is the root problem? Data quality.
Nichol said he often sees enterprises that have “built a bad retrieval system, because they haven’t cleaned up their data. It is boring and unsexy to clean up out-of-date information, such as versioning and dealing with data conflicts. Instead, they add seven more LLM calls to paper over all of the data issues they have. It’s just going to put a lot of work on the LLM and it is not going to do very well.
“It’s not going to solve your problem, but it is going to feel like it is.”
Besen, an applied AI researcher at IBM, argues that agentic RAG can indeed reduce hallucinations, but she agrees with Nichol that it might not always be the best approach for enterprises.
Besen cautions that adding complexity to a genAI deployment — something that’s already complex — can introduce unexpected issues.
“When you increase the (number) of agents, you inherently increase the variability of a solution,” she said. “However, with the proper architecture in place — meaning that the team of agents [is] constructed in an effective way — and adequate prompting, there should be a decreased chance of hallucinations because you can build in evaluation and reasoning. For instance, you can have one agent that retrieves the content and another that evaluates if the information retrieved is relevant to answer the original question. With traditional RAG, there was no natural language reasoning check on whether the information retrieved was relevant.”
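The retrieve-then-evaluate pattern Besen describes can be sketched as a two-step pipeline. This is an illustrative stub, not IBM's implementation: the keyword-overlap scoring stands in for what would, in a real system, be an LLM prompt asking whether a passage answers the question.

```python
def retrieve(question: str, corpus: list[str]) -> list[str]:
    """Retriever agent: return candidate passages.
    (Stubbed as word overlap; a real system would use a search index.)"""
    words = set(question.lower().split())
    return [doc for doc in corpus if words & set(doc.lower().split())]

def is_relevant(question: str, passage: str, threshold: float = 0.2) -> bool:
    """Evaluator agent: a real version would prompt an LLM, e.g.
    'Does this passage help answer the question? Answer yes/no.'"""
    q = set(question.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / max(len(q), 1) >= threshold

def agentic_rag(question: str, corpus: list[str]) -> list[str]:
    """Only passages that survive the relevance check reach generation --
    the check traditional RAG lacks."""
    candidates = retrieve(question, corpus)
    return [p for p in candidates if is_relevant(question, p)]
```

The design choice is the second step: irrelevant retrievals are filtered out before they can feed a hallucinated answer, at the cost of an extra model call per candidate.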
Like anything else in programming, this might or might not deliver the desired results. “There is a way to make it very successful and a way to mess it up. The trick is to scope our expectations to the abilities of the technology,” Besen said. “An agent’s ability is only as good as the language model behind it. The reasoning ability is dependent on the language model.”
That said, Besen stressed that — despite what some AI vendors claim — even the best deployment of agentic RAG will never make hallucinations disappear. “It is impossible to completely eliminate hallucinations at this time. But there could be a reduction in hallucinations.”
It’s up to IT executives to decide whether that uncertainty, and the risk of wrong answers from time to time, is something they can live with. “If you want the same outcome every time, don’t use genAI,” Besen said. As for accepting occasional hallucinations, Besen suggested IT leaders consider how they would react to an employee or contractor who made similar mistakes.
“Are you OK with having an employee who is not right 10% of the time?”