The Essential RAG Book
Foundations of RAG Systems
TL;DR
Retrieval-Augmented Generation (RAG) systems couple information retrieval with generative language models. This chapter formalizes the probabilistic foundations and illustrates the interaction between retriever and generator components. Formally, a RAG system is expressed as P(y | x) = Σ_d P(y | x, d) · P(d | x), where x is the query, d represents retrieved documents, and y is the generated answer.
Key Takeaways
- Retrieval-Augmented Generation (RAG) systems couple information retrieval with generative language models.
- The retriever encodes both queries and documents into a shared vector space, selecting top-k contexts with maximum cosine similarity.
Retrieval-Augmented Generation (RAG) systems couple information retrieval with generative language models. This chapter formalizes the probabilistic foundations and illustrates the interaction between retriever and generator components. Formally, a RAG system is expressed as:

    P(y | x) = Σ_d P(y | x, d) · P(d | x)

where x is the query, d represents retrieved documents, and y is the generated answer.

    [User Query] → [Retriever] → [Generator]
                        ↓
                 Knowledge Source
Figure 2 - Standard RAG Pipeline.
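To make the marginalization P(y | x) = Σ_d P(y | x, d) · P(d | x) concrete, the following is a minimal sketch using only the Python standard library. The helper names (`softmax`, `marginalize`), the choice of a softmax to turn raw retrieval scores into P(d | x), and all numbers are assumptions made for illustration; they are not taken from the chapter.

```python
import math

def softmax(scores):
    """Normalize raw retrieval scores into a distribution P(d | x).

    Using a softmax here is an assumption of this sketch.
    """
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def marginalize(retrieval_scores, answer_probs_per_doc):
    """Compute P(y | x) = sum_d P(y | x, d) * P(d | x) for each answer y.

    retrieval_scores: one raw score per retrieved document d.
    answer_probs_per_doc: one dict per document, mapping y -> P(y | x, d).
    """
    p_doc = softmax(retrieval_scores)
    p_answer = {}
    for p_d, answer_probs in zip(p_doc, answer_probs_per_doc):
        for y, p_y_given_d in answer_probs.items():
            p_answer[y] = p_answer.get(y, 0.0) + p_y_given_d * p_d
    return p_doc, p_answer

# Hypothetical example: three retrieved documents, two candidate answers.
scores = [2.0, 1.0, 0.0]
generator_probs = [
    {"Paris": 0.9, "Lyon": 0.1},  # P(y | x, d1)
    {"Paris": 0.6, "Lyon": 0.4},  # P(y | x, d2)
    {"Paris": 0.2, "Lyon": 0.8},  # P(y | x, d3)
]

p_doc, p_answer = marginalize(scores, generator_probs)
print("P(d | x):", p_doc)
print("P(y | x):", p_answer)
```

Because each P(y | x, d) sums to one over y and P(d | x) sums to one over d, the marginal P(y | x) is itself a valid distribution; documents with higher retrieval probability contribute more to the final answer.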
The retriever encodes both queries and documents into a shared vector space and selects the top-k documents whose embeddings have the highest cosine similarity to the query embedding. The generator conditions its language-model decoding on these contexts. Fusion techniques such as late fusion and token fusion balance the retrieved context against the model's prior knowledge. Training typically minimizes the negative log-likelihood of the generated tokens, while the retriever is optimized through contrastive learning. RAG therefore unifies retrieval and generation under a probabilistic framework, allowing models to adapt to new information without full re-training.
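The retrieval step and its contrastive training signal can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the chapter's implementation: the embeddings are hand-written two-dimensional vectors standing in for encoder outputs, the function names (`cosine`, `top_k`, `contrastive_loss`) are hypothetical, and the loss is an InfoNCE-style objective chosen here as one common form of contrastive learning.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_k(query_vec, doc_vecs, k):
    """Return (doc_index, similarity) pairs for the k most similar documents."""
    scored = [(i, cosine(query_vec, d)) for i, d in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

def contrastive_loss(query_vec, pos_vec, neg_vecs, temperature=0.1):
    """InfoNCE-style loss (an assumption of this sketch).

    Negative log of the softmax probability assigned to the positive
    document among the positive and the negatives.
    """
    sims = [cosine(query_vec, pos_vec)]
    sims += [cosine(query_vec, n) for n in neg_vecs]
    logits = [s / temperature for s in sims]
    m = max(logits)  # log-sum-exp with max subtraction for stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[0]

# Hypothetical embeddings in a shared 2-D space.
query = [1.0, 0.0]
docs = [
    [0.9, 0.1],   # nearly parallel to the query
    [0.0, 1.0],   # orthogonal
    [0.7, 0.7],   # 45 degrees away
    [-1.0, 0.0],  # opposite direction
]

hits = top_k(query, docs, k=2)
print("top-2:", hits)

# Loss is small when the labeled positive is already the closest document,
# and large when the labeled positive is far from the query.
loss_good = contrastive_loss(query, docs[0], [docs[1], docs[3]])
loss_bad = contrastive_loss(query, docs[3], [docs[0], docs[1]])
print("loss (aligned positive):", loss_good)
print("loss (misaligned positive):", loss_bad)
```

Minimizing this loss pulls query and relevant-document embeddings together and pushes irrelevant ones apart, which is what makes cosine-similarity ranking meaningful at retrieval time. A production retriever would typically replace the exhaustive sort with an approximate nearest-neighbor index.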


