What is Retrieval-Augmented Generation (RAG)?
Retrieval-Augmented Generation (RAG) is an architecture that enhances LLM responses by retrieving relevant documents from an external knowledge base and including them in the model's context before generation.
WHY IT MATTERS
RAG solves one of the fundamental limitations of LLMs: their knowledge is frozen at training time. By retrieving relevant documents at inference time and injecting them into the prompt, RAG gives models access to current, domain-specific, and proprietary information.
The architecture is straightforward: the query is embedded into a vector, the nearest-neighbor documents are retrieved from a vector store by similarity search, and the retrieved text is added to the LLM's context. The model then generates a response grounded in the retrieved information.
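The pipeline above can be sketched in a few lines. This is a minimal illustration, not a production implementation: it uses a toy bag-of-words embedding and cosine similarity in place of a trained embedding model and a real vector store, and it stops at prompt construction rather than calling an LLM. All function and variable names here are hypothetical.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding" for illustration only;
    # real systems use a trained embedding model.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    # Rank documents by similarity to the query; a vector store
    # would do this with approximate nearest-neighbor search.
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, documents: list[str], k: int = 2) -> str:
    # Inject the retrieved text into the model's context
    # so the generation is grounded in it.
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

docs = [
    "Token ABC closed at $42 on June 1.",
    "Protocol XYZ upgraded to v2 last week.",
    "The model was trained on data up to 2023.",
]
print(build_prompt("What is the latest price of token ABC?", docs))
```

The grounded prompt, not the bare question, is what gets sent to the LLM; that is the entire mechanism by which RAG bypasses the training-time knowledge cutoff.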
For financial agents, RAG is crucial. An agent managing a portfolio needs current price data, recent news, and up-to-date protocol documentation — none of which exist in the model's training data.