RAG Scenarios and Solutions
Context Relevance Decay
TL;DR
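Each chunk retrieved beyond the most relevant few tends to lower the average relevance of the context, so noise dilutes the signal and answer quality drops. Keep K small or adaptive, rerank a broad candidate set down to a few chunks, and compress lower-ranked chunks.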
Contents
- The Problem
- Deep Technical Analysis
- How to Solve
- Agent Instructions: Querying This Documentation
The Problem
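As K (the number of retrieved chunks) grows, each additional chunk tends to be less relevant than the ones before it, so the average relevance of the context falls. The low-relevance chunks dilute the signal with noise and reduce answer quality.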
Symptoms
- ❌ More chunks = worse answers
- ❌ Irrelevant context dilutes good context
- ❌ The LLM gets distracted by noise
- ❌ Responses get longer and less accurate
- ❌ High K (top-20) performs worse than low K (top-5)
Real-World Example
Query: "API rate limit"
K=5 (top 5 chunks):
→ All highly relevant (score 0.80-0.85)
→ Answer: "Rate limit is 1000 req/hour" ✓
K=20 (top 20 chunks):
→ Top 5: Highly relevant (0.80-0.85)
→ Chunks 6-10: Somewhat relevant (0.70-0.75)
→ Chunks 11-20: Marginally relevant (0.60-0.70)
With K=20:
→ LLM sees rate limits + pricing + authentication + errors + ...
→ Context diluted
→ Answer: "Rate limit depends on plan tier and may vary..." (vague)
Deep Technical Analysis
Signal-to-Noise Ratio
Retrieval Score Distribution:
Top-K chunks by score:
→ #1: 0.85 (very relevant)
→ #2: 0.83
→ #3: 0.80
→ #5: 0.75
→ #10: 0.65 (borderline)
→ #20: 0.50 (weak)
As K increases:
→ The high-relevance signal is diluted
→ Low-relevance noise is added
→ The LLM must filter out the noise, and sometimes fails
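To make the decay concrete, the sketch below computes the average score and the share of high-relevance chunks at K=5, 10, and 20. The scores are hypothetical, loosely shaped like the API rate limit example above (0.80-0.85, then 0.70-0.75, then 0.60-0.70), and `mean_relevance` and `high_relevance_share` are illustrative helpers, not part of any library.

```python
from statistics import mean

# Hypothetical similarity scores for one query, sorted best-first (not real data).
scores = [0.85, 0.83, 0.82, 0.81, 0.80,   # chunks 1-5
          0.75, 0.74, 0.72, 0.71, 0.70,   # chunks 6-10
          0.70, 0.68, 0.67, 0.66, 0.65,   # chunks 11-15
          0.64, 0.63, 0.62, 0.61, 0.60]   # chunks 16-20

def mean_relevance(scores: list[float], k: int) -> float:
    """Average score of the top-k chunks (scores assumed sorted descending)."""
    return mean(scores[:k])

def high_relevance_share(scores: list[float], k: int, threshold: float = 0.80) -> float:
    """Fraction of the top-k chunks scoring at or above the threshold."""
    top = scores[:k]
    return sum(s >= threshold for s in top) / len(top)

for k in (5, 10, 20):
    print(f"K={k:2d}  mean={mean_relevance(scores, k):.3f}  "
          f"share>=0.80={high_relevance_share(scores, k):.2f}")
```

The mean falls only moderately as K grows, but the share of strong chunks drops from all of them at K=5 to a quarter at K=20, which is the dilution the LLM has to cope with.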
Context Dilution:
LLM attention (simplified view):
→ Attention is spread across the whole context
→ With 20 chunks, each chunk receives less of it
→ Key information (chunk #2) may be overlooked
→ The model may be distracted by an irrelevant chunk (#18)
A smaller, focused context works better
Dynamic K Selection
Query Complexity Heuristic:
Simple factual: K=3-5
→ "What is X?"
→ Need precise answer
Comprehensive: K=10-15
→ "Explain how X works"
→ Need multiple perspectives
Very complex: K=15-20
→ "Compare X, Y, Z and recommend"
→ Need broad coverage
Adjust K based on query type
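A minimal sketch of the heuristic, assuming simple keyword matching is good enough to classify queries. The patterns and the `choose_k` helper are hypothetical; the returned values (5, 12, 20) are picked from within the ranges listed above. A production system might use a trained classifier or an LLM call instead.

```python
import re

# Hypothetical keyword patterns for each tier; tune them on your own query logs.
COMPLEX = re.compile(r"\b(compare|versus|recommend|trade-?offs?)\b", re.IGNORECASE)
COMPREHENSIVE = re.compile(r"\b(explain|how (?:does|do|is)|why|walk me through)\b",
                           re.IGNORECASE)

def choose_k(query: str) -> int:
    """Map a query to K using the tiers above (values chosen inside each range)."""
    if COMPLEX.search(query):
        return 20  # very complex: broad coverage (K=15-20)
    if COMPREHENSIVE.search(query):
        return 12  # comprehensive: multiple perspectives (K=10-15)
    return 5       # simple factual: precise answer (K=3-5)

for q in ("What is X?", "Explain how X works", "Compare X, Y, Z and recommend"):
    print(q, "->", choose_k(q))
```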
Score-Based Cutoff:
Instead of a fixed K:
→ Keep chunks only while their score stays above a threshold
Example (threshold 0.70):
→ Retrieve while score > 0.70
→ If only the top 3 exceed 0.70, use K=3
→ If the top 10 all exceed 0.70, use K=10
K adapts to result quality
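The cutoff rule can be sketched as follows, assuming results arrive sorted by score, best first, as `(chunk_id, score)` pairs. `score_cutoff` is a hypothetical helper; the `min_k` floor and `max_k` cap are assumptions added so the context is never empty and never unbounded.

```python
def score_cutoff(results: list[tuple[str, float]], threshold: float = 0.70,
                 min_k: int = 1, max_k: int = 20) -> list[tuple[str, float]]:
    """Keep best-first results while score > threshold, within [min_k, max_k]."""
    kept = []
    for i, (chunk_id, score) in enumerate(results[:max_k]):
        # Stop at the first chunk at or below the threshold, once min_k are kept.
        if score <= threshold and i >= min_k:
            break
        kept.append((chunk_id, score))
    return kept

results = [("a", 0.85), ("b", 0.83), ("c", 0.80), ("d", 0.65), ("e", 0.60)]
print(score_cutoff(results))  # [('a', 0.85), ('b', 0.83), ('c', 0.8)]
```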
Two-Stage Retrieval
Broad Then Narrow:
Stage 1: Retrieve K=50 (broad)
Stage 2: Rerank to top-5 (narrow)
Reranking:
→ Use a cross-encoder (slower, but more accurate than embedding similarity)
→ It scores each query-document pair jointly, capturing their interaction
→ Keep only the most relevant chunks
Final context: High-quality, compact
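A sketch of the two-stage flow with pluggable scoring functions. In practice stage 1 would be your vector or keyword search and stage 2 a cross-encoder (for example, the `CrossEncoder` class in the sentence-transformers library scores `(query, document)` pairs). Here both stages use hypothetical toy scorers (`word_overlap`, `phrase_bonus`) so the example runs offline; only the control flow is meant to carry over.

```python
from typing import Callable

# A scorer maps (query, document) to a relevance score; higher is better.
Scorer = Callable[[str, str], float]

def two_stage_retrieve(query: str, corpus: list[str], first_stage: Scorer,
                       rerank: Scorer, broad_k: int = 50, final_k: int = 5) -> list[str]:
    # Stage 1 (broad): cheap scorer over the whole corpus, keep broad_k candidates.
    candidates = sorted(corpus, key=lambda d: first_stage(query, d), reverse=True)[:broad_k]
    # Stage 2 (narrow): expensive scorer runs only on the candidates.
    reranked = sorted(candidates, key=lambda d: rerank(query, d), reverse=True)
    return reranked[:final_k]

# Hypothetical toy scorers so the sketch runs without any model.
def word_overlap(query: str, doc: str) -> float:
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / (len(q) or 1)

def phrase_bonus(query: str, doc: str) -> float:
    # Looks at query and document together and rewards the exact phrase,
    # a crude stand-in for a cross-encoder's joint scoring.
    return word_overlap(query, doc) + (1.0 if query.lower() in doc.lower() else 0.0)

corpus = [
    "the api rate limit is 1000 requests per hour",
    "rate plans and pricing tiers",
    "limit errors return 429",
    "authentication uses api keys",
]
print(two_stage_retrieve("api rate limit", corpus, word_overlap, phrase_bonus,
                         broad_k=3, final_k=1))
# ['the api rate limit is 1000 requests per hour']
```

Running the expensive scorer only on `broad_k` candidates is what keeps this affordable: cross-encoders must process every query-document pair, so scoring the whole corpus with one is usually impractical.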
Context Compression
Extractive Summarization:
For lower-ranked chunks (6-20):
→ Extract only the most relevant sentences
→ Discard filler
Example:
→ Chunk #15 (1000 tokens) → Extract 100 tokens
→ Preserves key info
→ Reduces dilution
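A minimal extractive compressor, assuming query-word overlap is an acceptable proxy for sentence relevance (embedding similarity or an LLM would usually do better). `compress_chunk` and its naive regex sentence splitter are hypothetical; it keeps the best-matching sentences in their original order so the extract still reads naturally.

```python
import re

def compress_chunk(query: str, chunk: str, max_sentences: int = 2) -> str:
    """Keep the sentences sharing the most words with the query, in original order."""
    # Naive split after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", chunk) if s.strip()]
    q_terms = set(re.findall(r"\w+", query.lower()))

    def overlap(sentence: str) -> int:
        return len(q_terms & set(re.findall(r"\w+", sentence.lower())))

    # Rank sentence indices by overlap, keep the best few with any overlap,
    # then restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: overlap(sentences[i]), reverse=True)
    keep = sorted(i for i in ranked[:max_sentences] if overlap(sentences[i]) > 0)
    return " ".join(sentences[i] for i in keep)

chunk = ("Our company was founded in 2010. The API rate limit is 1000 requests per hour. "
         "Contact sales for custom plans. Exceeding the rate limit returns HTTP 429.")
print(compress_chunk("API rate limit", chunk))
# The API rate limit is 1000 requests per hour. Exceeding the rate limit returns HTTP 429.
```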
How to Solve
Combine the techniques above:
- Use dynamic K based on query complexity (3-5 for simple queries, 10-15 for complex ones).
- Apply a score-based cutoff (retrieve while score > 0.70).
- Use two-stage retrieval (broad search, then rerank to the top 5).
- Compress lower-ranked chunks by extracting their key sentences.
- Monitor relevance decay by comparing answer accuracy at K=5 vs K=20.
- Prefer precision over recall for most queries.
See Context Optimization.
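Monitoring decay can start as small as the harness below, which runs the same questions at several K values and reports accuracy per K. `accuracy_by_k` is a hypothetical helper; it assumes you supply a curated evaluation set and a pipeline callable `answer_fn(question, k)`, and it grades with a crude case-insensitive substring match. The stub pipeline in the usage lines simply mimics the K=5 vs K=20 example from earlier.

```python
from typing import Callable

def accuracy_by_k(eval_set: list[tuple[str, str]],
                  answer_fn: Callable[[str, int], str],
                  ks: tuple[int, ...] = (5, 20)) -> dict[int, float]:
    """Return the fraction of questions answered correctly at each K."""
    results = {}
    for k in ks:
        # Crude grader: correct if the expected string appears in the answer.
        correct = sum(expected.lower() in answer_fn(question, k).lower()
                      for question, expected in eval_set)
        results[k] = correct / len(eval_set)
    return results

# Stub pipeline that reproduces the earlier example (hypothetical, for illustration).
def stub_pipeline(question: str, k: int) -> str:
    if k <= 5:
        return "Rate limit is 1000 req/hour"
    return "Rate limit depends on plan tier and may vary..."

eval_set = [("API rate limit", "1000 req/hour")]
print(accuracy_by_k(eval_set, stub_pipeline))  # {5: 1.0, 20: 0.0}
```

If accuracy at the larger K is consistently lower on your own evaluation set, that is the relevance decay described on this page, and a sign to lower K or tighten the cutoff.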
Agent Instructions: Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.
Perform an HTTP GET request on the current page URL with the ask query parameter:
GET /dev/rag-scenarios-and-solutions/accuracy/context-relevance-decay.md?ask=<question>
The question should be specific, self-contained, and written in natural language. The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.
Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
Last updated January 26, 2026


