RAG Retrieval & Accuracy

Overview

Retrieval accuracy is where RAG systems succeed or fail. You can have perfect data integration, optimal chunking, and high-quality embeddings, but if the wrong documents are retrieved—or the right documents aren't found—your agents will provide poor answers. This section focuses on ensuring your RAG pipeline retrieves the most relevant, accurate, and complete information for every query.

Why Retrieval Accuracy Matters

Accurate retrieval ensures:

Correct answers - LLM receives the right context to answer questions
Complete information - All relevant facts are provided, not just fragments
Trustworthy responses - Answers grounded in real, cited sources
User confidence - Consistent, reliable performance builds trust

Poor retrieval leads to:

Wrong answers - Irrelevant context leads to incorrect responses
"I don't know" responses - Relevant information exists but isn't found
Hallucinations - LLM fills knowledge gaps with fabricated information
Inconsistent quality - Some queries work perfectly, others fail completely
User frustration - Unreliable agent → low adoption → wasted investment

Common Retrieval Challenges

Retrieval Failures

No relevant chunks retrieved - Known information not found
Wrong documents ranked highly - Irrelevant content returned first
Incomplete context assembly - Partial information missing key details
Multi-hop reasoning failure - Cannot connect related pieces of information

Source Quality

Outdated knowledge base - Stale information leads to wrong answers
Conflicting sources - Contradictory information in retrieved context
Source ranking issues - Lower-quality sources prioritized over authoritative ones
Knowledge base drift - Performance degrades as content changes

Query Understanding

Query-document mismatch - Query phrasing doesn't match how docs are written
Ambiguous query expansion - Query rewriting makes things worse
Context relevance decay - Initially good results become less relevant

The Retrieval Pipeline

Understanding the stages helps diagnose where failures occur:

User Query
    ↓
Query Enhancement (optional)
    ↓
Embedding / Vector Search
    ↓
Candidate Retrieval (top 20-100)
    ↓
Filtering (permissions, freshness)
    ↓
Reranking (top 5-10)
    ↓
Context Assembly
    ↓
LLM Generation

Common failure points:

Query Enhancement: Misinterprets intent, adds noise
Vector Search: Embedding doesn't capture query meaning
Candidate Retrieval: Relevant docs not in top candidates
Filtering: Over-aggressive filtering removes good results
Reranking: Wrong docs prioritized
Context Assembly: Important information left out or poorly ordered

Retrieval Strategies

Different approaches for different scenarios:

1. Dense Retrieval (Vector Search)

How it works: Embed query and documents, find nearest neighbors

Strengths:

Semantic understanding
Handles synonyms and paraphrasing
Can work across languages when the embedding model is multilingual

Weaknesses:

Poor with rare terms or exact matches
Opaque (hard to debug)
Sensitive to embedding quality

Best for: Natural language queries, conceptual questions
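
As a minimal sketch of the nearest-neighbor step, the snippet below ranks documents by cosine similarity to a query vector. The three-dimensional vectors, the document IDs, and the function names are illustrative assumptions; real embeddings come from an embedding model and are usually searched through a vector index rather than a full scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def dense_search(query_vec, doc_vecs, k=3):
    """Return the k (doc_id, score) pairs most similar to query_vec."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy "embeddings" (hypothetical); a real system would call an embedding model.
DOC_VECS = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "api-errors": [0.0, 0.2, 0.9],
}

results = dense_search([0.8, 0.2, 0.1], DOC_VECS, k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```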

2. Sparse Retrieval (Keyword Search)

How it works: TF-IDF, BM25, or keyword matching

Strengths:

Excellent for exact term matches
Fast and explainable
Works well with technical terms

Weaknesses:

No semantic understanding
Misses synonyms and variations
Language-specific (depends on tokenization and stemming rules)

Best for: Technical queries, product names, error codes
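
To show why keyword scoring handles exact terms such as error codes, here is a small BM25 scorer written from scratch. It is a sketch, not a production implementation: the whitespace tokenizer, the sample documents, and the defaults `k1=1.5` and `b=0.75` are assumptions chosen for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document in `docs` (id -> text) against `query` with BM25."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized.values()) / n_docs

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))

    scores = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores[doc_id] = score
    return scores

# Hypothetical knowledge-base snippets.
DOCS = {
    "kb-1": "error E1042 occurs when the upload token expires",
    "kb-2": "reset your password from the account settings page",
    "kb-3": "upload limits depend on your plan",
}

scores = bm25_scores("E1042 upload", DOCS)
best = max(scores, key=scores.get)
print(best, scores)
```

The document that contains the rare term `E1042` scores highest, while a document with no query terms scores zero, which is the explainability the list above refers to.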

3. Hybrid Retrieval

How it works: Combine dense + sparse, merge results

Strengths:

Best of both worlds
More robust across query types
Handles both semantic and exact matching

Weaknesses:

More complex to implement and tune
Need to balance weighting

Best for: Production RAG systems handling diverse queries
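
One common way to merge dense and sparse results is Reciprocal Rank Fusion (RRF), which combines ranks rather than raw scores and so avoids having to put the two score scales on a common footing. This guide does not prescribe a specific merge method, so treat the sketch below as one option; the constant `k=60` is a conventional default and the document IDs are made up.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one list, best first.

    Each document earns 1 / (k + rank) from every list it appears in.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

dense_ranking = ["a", "b", "c"]   # hypothetical vector-search order
sparse_ranking = ["c", "a", "d"]  # hypothetical keyword-search order

fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
print(fused)  # ['a', 'c', 'b', 'd']
```

Documents found by both retrievers ("a" and "c") rise above documents found by only one. If explicit weighting is needed, a per-list weight can multiply each `1 / (k + rank)` term.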

4. Multi-Stage Retrieval

How it works:

1. Broad retrieval (vector or keyword)
2. Precise reranking (cross-encoder)
3. Optional LLM-based final selection

Strengths:

High recall + high precision
Best accuracy for critical applications

Weaknesses:

Higher latency
Increased cost

Best for: High-stakes queries where accuracy is critical
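
The retrieve-then-rerank shape can be sketched as a function that takes the two stages as callables. Everything named here is hypothetical: `toy_retrieve` stands in for a vector or keyword search, and `toy_rerank_score` is a word-overlap stand-in for a cross-encoder, used only so the example runs without a model.

```python
def multi_stage_retrieve(query, retrieve, rerank_score, n_candidates=50, n_final=5):
    """Stage 1: broad candidate retrieval. Stage 2: precise reranking."""
    candidates = retrieve(query, n_candidates)
    scored = [(rerank_score(query, chunk), chunk) for chunk in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:n_final]]

CHUNKS = [
    "reset password via email link",
    "password rules require twelve characters",
    "email notifications can be disabled",
    "invoice download in billing",
]

def toy_retrieve(query, n):
    # Broad and cheap: keep any chunk sharing at least one word with the query.
    words = set(query.lower().split())
    return [c for c in CHUNKS if words & set(c.split())][:n]

def toy_rerank_score(query, chunk):
    # Stand-in for a cross-encoder: fraction of query words present in the chunk.
    words = set(query.lower().split())
    return len(words & set(chunk.split())) / len(words)

top = multi_stage_retrieve("reset password email", toy_retrieve, toy_rerank_score, n_final=2)
print(top)
```

Keeping the stages as separate callables makes it easy to log the candidate list before and after reranking, which helps when diagnosing which stage lost a relevant chunk.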

Best Practices

Query Enhancement

Query expansion - Add synonyms, related terms (but test carefully)
Query rewriting - Rephrase for better matching ("How do I..." → "Steps to...")
Spell correction - Fix typos before search
Intent detection - Route different query types differently

Retrieval Configuration

Tune candidate count - Retrieve enough (20-100) to ensure relevant docs are included
Set appropriate thresholds - Discard results that fall below a minimum similarity score
Use metadata filters - Narrow by date, source, topic when applicable
Implement fallbacks - If vector search fails, try keyword search
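
A similarity threshold and a keyword fallback can be combined in a few lines. This is a sketch under assumptions: both search functions are hypothetical callables returning `(doc_id, score)` pairs, and the threshold of 0.5 is a placeholder that would need tuning against your own embedding model.

```python
def retrieve_with_fallback(query, vector_search, keyword_search, min_score=0.5, k=20):
    """Use vector hits above min_score; fall back to keyword search if none qualify."""
    hits = [(doc_id, score) for doc_id, score in vector_search(query, k)
            if score >= min_score]
    if hits:
        return "vector", hits
    return "keyword", keyword_search(query, k)

# Toy search backends (hypothetical) returning fixed results.
def toy_vector_search(query, k):
    return [("kb-7", 0.31), ("kb-2", 0.22)][:k]

def toy_keyword_search(query, k):
    return [("kb-9", 4.2)][:k]

source, hits = retrieve_with_fallback("error E1042", toy_vector_search, toy_keyword_search)
print(source, hits)  # keyword [('kb-9', 4.2)]
```

Returning which path produced the results makes it possible to track how often the fallback fires.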

Reranking

Use cross-encoder models - More accurate than embedding similarity alone
Consider LLM reranking - Ask LLM to rank relevance (expensive but effective)
Diversity in results - Don't return 10 chunks from the same document
Recency boost - Favor newer content when appropriate
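
The diversity point can be enforced with a simple per-document cap applied after reranking. The chunk shape (a dict with a `doc_id` key) and the default cap of two chunks per document are assumptions for illustration.

```python
def diversify(ranked_chunks, max_per_doc=2, k=10):
    """Walk the ranked list best-first, keeping at most max_per_doc chunks per document."""
    kept = []
    per_doc = {}
    for chunk in ranked_chunks:
        doc_id = chunk["doc_id"]
        if per_doc.get(doc_id, 0) >= max_per_doc:
            continue  # this document already has its share of slots
        per_doc[doc_id] = per_doc.get(doc_id, 0) + 1
        kept.append(chunk)
        if len(kept) == k:
            break
    return kept

ranked = [
    {"doc_id": "a", "text": "a1"},
    {"doc_id": "a", "text": "a2"},
    {"doc_id": "a", "text": "a3"},
    {"doc_id": "b", "text": "b1"},
    {"doc_id": "a", "text": "a4"},
    {"doc_id": "c", "text": "c1"},
]

selected = diversify(ranked, max_per_doc=2, k=4)
print([c["text"] for c in selected])  # ['a1', 'a2', 'b1', 'c1']
```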

Context Assembly

Order matters - Put the most relevant context at the start and end of the prompt, where models tend to use it best (primacy/recency)
Deduplicate - Remove redundant chunks
Include metadata - Source, date, confidence helps LLM assess reliability
Stay under token limit - Truncate intelligently if needed
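
The sketch below combines three of these practices: deduplication, a token budget, and edge-weighted ordering. It assumes chunks arrive ranked best-first and uses a whitespace word count as a crude token estimate; in practice you would pass the tokenizer that matches your LLM as `count_tokens`.

```python
def assemble_context(ranked_chunks, max_tokens, count_tokens=lambda text: len(text.split())):
    """Deduplicate, fit to a token budget, then place the strongest chunks at the edges."""
    seen = set()
    kept = []
    used = 0
    for chunk in ranked_chunks:  # assumed to be ordered best first
        key = " ".join(chunk.lower().split())  # normalize case and whitespace
        if key in seen:
            continue
        cost = count_tokens(chunk)
        if used + cost > max_tokens:
            continue  # skip what does not fit; a smaller chunk later may still fit
        seen.add(key)
        kept.append(chunk)
        used += cost
    # Ranks 1, 3, 5, ... go to the front; ranks 2, 4, 6, ... go to the back in
    # reverse, so the best chunk is first and the second best is last.
    return kept[0::2] + kept[1::2][::-1]

chunks = [
    "alpha one",
    "Alpha  one",                  # duplicate after normalization
    "beta two three",
    "gamma",
    "delta four five six seven",   # too large for the remaining budget
]

context = assemble_context(chunks, max_tokens=6)
print(context)  # ['alpha one', 'gamma', 'beta two three']
```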

Continuous Improvement

Log all queries and retrievals - Build datasets for analysis
Track retrieval metrics - Precision, recall, MRR, NDCG
Collect user feedback - Thumbs up/down on answers
A/B test changes - Compare retrieval strategies empirically
Monitor edge cases - Focus on query types with high failure rates

Retrieval Metrics

Measure these to track accuracy:

Retrieval Quality

Precision@k - Of top k results, how many are relevant?
Recall@k - Of all relevant docs, what % are in top k?
MRR (Mean Reciprocal Rank) - Average, across queries, of 1/rank of the first relevant doc
NDCG (Normalized Discounted Cumulative Gain) - Quality of ranking
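
These four metrics are short enough to compute directly from logged rankings and a set of relevance labels. The function names and sample IDs below are illustrative. The NDCG variant shown uses linear gains with a log2 rank discount, which is one common formulation among several.

```python
import math

def precision_at_k(ranked, relevant, k):
    """Fraction of the top k results that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k

def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant docs that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for d in ranked[:k] if d in relevant) / len(relevant)

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant result, or 0.0 if none is found."""
    for rank, d in enumerate(ranked, start=1):
        if d in relevant:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(runs):
    """`runs` is a list of (ranked, relevant) pairs, one per query."""
    return sum(reciprocal_rank(ranked, relevant) for ranked, relevant in runs) / len(runs)

def ndcg_at_k(ranked, gains, k):
    """`gains` maps doc_id -> graded relevance; unlisted docs count as 0."""
    dcg = sum(gains.get(d, 0) / math.log2(i + 2) for i, d in enumerate(ranked[:k]))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}

print(precision_at_k(ranked, relevant, 2))   # 0.5
print(recall_at_k(ranked, relevant, 2))      # 0.5
print(mean_reciprocal_rank([(ranked, relevant), (["d1"], {"d1"})]))  # 0.75
print(round(ndcg_at_k(ranked, {"d1": 1, "d2": 1}, 4), 3))
```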

Answer Quality

Groundedness - % of response claims supported by retrieved context
Completeness - Does response fully answer the question?
Citation accuracy - Are citations correct and verifiable?
User satisfaction - Thumbs up/down, ratings

System Health

Zero-result queries - % of queries that return no results
Low-confidence retrievals - % below similarity threshold
Retrieval latency - P50, P95, P99 times
Cost per query - Embedding, vector search, reranking costs
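
Given a query log, the first three health metrics reduce to a few aggregations. The log entry shape (`scores` and `latency_ms` keys), the 0.5 threshold, and the nearest-rank percentile method are assumptions made for this sketch.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile for an integer p between 1 and 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def health_summary(log, min_score=0.5):
    """Summarize a list of per-query log entries."""
    n = len(log)
    zero = sum(1 for e in log if not e["scores"])
    low = sum(1 for e in log if e["scores"] and max(e["scores"]) < min_score)
    latencies = [e["latency_ms"] for e in log]
    return {
        "zero_result_rate": zero / n,
        "low_confidence_rate": low / n,
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
    }

# Hypothetical log entries: similarity scores of returned chunks plus latency.
LOG = [
    {"scores": [0.82, 0.61], "latency_ms": 120},
    {"scores": [], "latency_ms": 80},
    {"scores": [0.35, 0.30], "latency_ms": 300},
    {"scores": [0.74], "latency_ms": 95},
]

summary = health_summary(LOG)
print(summary)
```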

Advanced Techniques

Query Decomposition

Break complex questions into sub-queries:

User: "How did revenue change from Q1 to Q2, and what caused it?"

Decomposed:

"What was Q1 revenue?"
"What was Q2 revenue?"
"What factors affected Q2 revenue?"

Retrieve separately, synthesize answer from combined results.
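
The retrieve-separately step might look like the sketch below. The decomposition itself is typically done by an LLM and is not shown; the sub-queries are passed in directly. `toy_retrieve` and the chunk IDs are hypothetical stand-ins for a real search call.

```python
def retrieve_for_subqueries(sub_queries, retrieve, k=3):
    """Run each sub-query separately and merge results, dropping repeated chunks."""
    merged = []
    seen = set()
    for sub_query in sub_queries:
        for chunk_id in retrieve(sub_query, k):
            if chunk_id in seen:
                continue
            seen.add(chunk_id)
            # Keep the sub-query alongside the chunk so synthesis can cite it.
            merged.append((sub_query, chunk_id))
    return merged

# Toy index (hypothetical): sub-query -> ranked chunk IDs.
TOY_INDEX = {
    "What was Q1 revenue?": ["q1-summary", "annual-overview"],
    "What was Q2 revenue?": ["q2-summary", "annual-overview"],
    "What factors affected Q2 revenue?": ["q2-drivers", "q2-summary"],
}

def toy_retrieve(query, k):
    return TOY_INDEX.get(query, [])[:k]

merged = retrieve_for_subqueries(list(TOY_INDEX), toy_retrieve)
print([chunk_id for _, chunk_id in merged])
```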

Hypothetical Document Embeddings (HyDE)

Instead of embedding the question:

1. Use an LLM to generate a hypothetical answer
2. Embed the hypothetical answer
3. Search for documents similar to that answer

Works well when questions don't resemble documents.
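
The HyDE flow is a thin wrapper around three callables. In this sketch all three are toy stand-ins so the example runs offline: `fake_llm` returns a canned answer instead of calling a model, `embed` is a bag-of-words set instead of a dense vector, and `search` ranks by Jaccard overlap. None of these names come from a real library.

```python
def hyde_search(question, generate_answer, embed, search, k=5):
    """Embed a hypothetical answer instead of the question, then search with it."""
    hypothetical = generate_answer(question)
    return search(embed(hypothetical), k)

# --- toy stand-ins (assumptions, for illustration only) ---
DOCS = {
    "vpn-guide": "connect to the vpn by installing the client and signing in with sso",
    "expense-policy": "submit expenses within thirty days using the finance portal",
}

def embed(text):
    return frozenset(text.lower().split())

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def search(query_vec, k):
    ranked = sorted(DOCS, key=lambda d: jaccard(query_vec, embed(DOCS[d])), reverse=True)
    return ranked[:k]

def fake_llm(question):
    # A real system would prompt an LLM here; the answer may be imperfect,
    # it only needs to resemble how the documents are written.
    return "install the vpn client and sign in with sso to connect"

hits = hyde_search("how do i get on the corporate network from home",
                   fake_llm, embed, search, k=1)
print(hits)  # ['vpn-guide']
```

The question shares almost no vocabulary with the VPN guide, but the hypothetical answer does, which is the gap HyDE is meant to close.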

Multi-Vector Retrieval

Store multiple embeddings per document:

Summary embedding (for high-level matches)
Detailed embedding (for specific facts)
Question embeddings (for FAQ matching)

Retrieve using appropriate embedding type per query.
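
A minimal way to model this is an index whose rows carry a `kind` label, with search restricted to the kind that suits the query. The row layout, the two-dimensional vectors, and the kind names are assumptions for illustration; how a query gets routed to a kind (for example by intent detection) is left out.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Several embeddings per document, each tagged with its kind (toy data).
INDEX = [
    {"doc_id": "q2-report", "kind": "summary", "vec": [0.9, 0.1]},
    {"doc_id": "q2-report", "kind": "detail", "vec": [0.2, 0.8]},
    {"doc_id": "faq", "kind": "question", "vec": [0.6, 0.6]},
    {"doc_id": "handbook", "kind": "summary", "vec": [0.3, 0.7]},
]

def search_by_kind(query_vec, kind, k=3):
    """Rank only the embeddings of the requested kind."""
    rows = [row for row in INDEX if row["kind"] == kind]
    rows.sort(key=lambda row: cosine(query_vec, row["vec"]), reverse=True)
    return [row["doc_id"] for row in rows[:k]]

print(search_by_kind([1.0, 0.0], "summary"))  # ['q2-report', 'handbook']
print(search_by_kind([0.0, 1.0], "detail"))   # ['q2-report']
```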

Contextual Retrieval

Prepend each chunk with document context before embedding:

Original chunk: "Revenue increased 15%"

With context: "[Q2 2024 Financial Report] Revenue increased 15%"

This preserves context and improves retrieval accuracy.
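
In code, the prepending step can be a small helper that builds the text to embed while keeping the original chunk for display. The function name, the optional `section` argument, and the returned dict shape are hypothetical.

```python
def contextualize(chunk_text, doc_title, section=None):
    """Return the original chunk plus a context-prefixed version for embedding."""
    label = doc_title if section is None else f"{doc_title} > {section}"
    return {
        "text": chunk_text,                       # shown to the LLM and in citations
        "embed_text": f"[{label}] {chunk_text}",  # passed to the embedding model
    }

record = contextualize("Revenue increased 15%", "Q2 2024 Financial Report")
print(record["embed_text"])  # [Q2 2024 Financial Report] Revenue increased 15%
```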

Iterative Retrieval

1. Retrieve initial context
2. Generate preliminary answer
3. Identify gaps or follow-up questions
4. Retrieve additional context
5. Generate final answer

Useful for complex, multi-part questions.
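
The loop can be sketched as follows. The `find_gaps` callable stands in for the "generate preliminary answer" and "identify gaps" steps, which would normally be LLM calls; here it is a hard-coded toy. The knowledge base contents and the `max_rounds` cap are also assumptions.

```python
def iterative_retrieve(question, retrieve, find_gaps, max_rounds=3):
    """Retrieve, look for gaps, retrieve again; stop when no new follow-ups remain."""
    context = []
    asked = set()
    queries = [question]
    for _ in range(max_rounds):
        for query in queries:
            asked.add(query)
            for chunk in retrieve(query):
                if chunk not in context:
                    context.append(chunk)
        # Only follow up on questions that have not been asked yet.
        queries = [q for q in find_gaps(question, context) if q not in asked]
        if not queries:
            break
    return context

# Toy knowledge base and gap finder (hypothetical).
KB = {
    "revenue change": ["Revenue rose 15% in Q2"],
    "cause of revenue change": ["Growth came from the new enterprise tier"],
}

def toy_retrieve(query):
    return KB.get(query, [])

def toy_find_gaps(question, context):
    if any("came from" in chunk for chunk in context):
        return []
    return ["cause of revenue change"]

context = iterative_retrieve("revenue change", toy_retrieve, toy_find_gaps)
print(context)
```

The round cap and the `asked` set keep the loop from running indefinitely when the gap finder keeps proposing the same follow-up.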

Quick Diagnostics

Signs your retrieval needs improvement:

✗ Agents say "I don't know" when answer is in knowledge base
✗ Retrieved chunks don't seem relevant to the question
✗ Correct answer exists but not in top results
✗ Responses are vague or incomplete
✗ Similar queries get wildly different results
✗ Agents hallucinate despite having relevant data
✗ High similarity scores but poor answer quality

Signs your retrieval is working well:

✓ Retrieved chunks are clearly relevant to query
✓ Correct information consistently appears in top results
✓ Complete answers without hallucination
✓ Appropriate "I don't know" when info truly doesn't exist
✓ Good performance across diverse query types
✓ Citations are accurate and helpful
✓ Users report high satisfaction

Debugging Poor Retrieval

When retrieval fails, investigate systematically:

Step 1: Is the information in the knowledge base?

Search manually for the answer
Check if source document was ingested
Verify document processed and chunked correctly

Step 2: Are relevant chunks being retrieved?

Inspect top 10-20 candidate chunks
Check similarity scores
Look at chunk content vs query

Step 3: Is the query being understood correctly?

Review query embedding
Test query variations
Check query enhancement/rewriting

Step 4: Is reranking working?

Compare pre- and post-reranking results
Check reranker scores
Test different reranking strategies

Step 5: Is context assembled well?

Review final context sent to LLM
Check ordering and completeness
Verify token limits not exceeded
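
If each pipeline stage logs the chunk IDs it passes on, the five steps can be partly automated by finding the first stage where a known-good chunk disappears. The stage names and IDs below are hypothetical, and the trace format (an ordered list of `(stage, ids)` pairs) is an assumption about what your logging captures.

```python
def first_stage_missing(expected_id, stage_outputs):
    """Return the first stage whose output lacks expected_id, or None if it survives."""
    for stage, ids in stage_outputs:
        if expected_id not in ids:
            return stage
    return None

# One query's trace, in pipeline order (toy data).
trace = [
    ("knowledge_base", {"c1", "c2", "c3", "c4", "c5"}),
    ("candidate_retrieval", {"c1", "c2", "c4"}),
    ("filtering", {"c1", "c2", "c4"}),
    ("reranking", {"c1", "c4"}),
    ("context_assembly", {"c1"}),
]

print(first_stage_missing("c2", trace))  # reranking
print(first_stage_missing("c3", trace))  # candidate_retrieval
print(first_stage_missing("c1", trace))  # None
```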

Bottom line: Retrieval accuracy is the linchpin of RAG. If your retrieval is poor, no amount of prompt engineering or LLM sophistication will save you. Invest time in getting this right—it's the highest-leverage improvement you can make.

Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the ask query parameter:

GET /dev/rag-scenarios-and-solutions/accuracy.md?ask=<question>

The question should be specific, self-contained, and written in natural language. The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
