RAG Scenarios and Solutions

Context Relevance Decay

As retrieved context grows (more chunks), relevance per-chunk decreases, diluting the signal with noise and reducing answer quality.

The Problem

Adding more retrieved chunks lowers the average relevance of the context: each additional chunk scores no higher than the ones before it, so noise grows faster than signal and answer quality drops.

Symptoms

  • ❌ More chunks = worse answers
  • ❌ Irrelevant context dilutes good context
  • ❌ LLM distracted by noise
  • ❌ Longer responses, less accurate
  • ❌ High K (top-20) worse than low K (top-5)

Real-World Example

Query: "API rate limit"

K=5 (top 5 chunks):
→ All highly relevant (score 0.80-0.85)
→ Answer: "Rate limit is 1000 req/hour" ✓

K=20 (top 20 chunks):
→ Top 5: Highly relevant (0.80-0.85)
→ Chunks 6-10: Somewhat relevant (0.70-0.75)
→ Chunks 11-20: Marginally relevant (0.60-0.70)

With K=20:
→ LLM sees rate limits + pricing + authentication + errors + ...
→ Context diluted
→ Answer: "Rate limit depends on plan tier and may vary..." (vague)

Deep Technical Analysis

Signal-to-Noise Ratio

Retrieval Score Distribution:

Top-K chunks by score:
→ #1: 0.85 (very relevant)
→ #2: 0.83
→ #3: 0.80
→ #5: 0.75
→ #10: 0.65 (borderline)
→ #20: 0.50 (weak)

As K increases:
→ Signal (high-relevance) diluted
→ Noise (low-relevance) added
→ LLM must filter, sometimes fails

Context Dilution:

LLM attention:
→ Is spread across every token in the context
→ With 20 chunks, each chunk receives a smaller share
→ Key info (chunk #2) may be overlooked
→ The model may be distracted by an irrelevant chunk (#18)

A smaller, focused context usually produces better answers
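
The dilution effect can be made concrete with a little arithmetic. The sketch below is illustrative only: the scores are hypothetical values shaped like the K=20 example earlier on this page, and `relevance_profile` and the 0.80 "strong" cutoff are names and numbers assumed for this example, not part of any library.

```python
from statistics import mean


def relevance_profile(scores, k, strong_at=0.80):
    """Return (mean score, fraction of strong chunks) for the top-k scores."""
    top = sorted(scores, reverse=True)[:k]
    strong = sum(1 for s in top if s >= strong_at)
    return mean(top), strong / len(top)


# Hypothetical retrieval scores: 5 strong, 5 moderate, 10 marginal chunks.
scores = (
    [0.85, 0.84, 0.83, 0.82, 0.80]
    + [0.75, 0.74, 0.73, 0.72, 0.70]
    + [0.69, 0.68, 0.67, 0.66, 0.65, 0.64, 0.63, 0.62, 0.61, 0.60]
)

for k in (5, 10, 20):
    avg, strong_fraction = relevance_profile(scores, k)
    print(f"K={k:2d}  mean score={avg:.3f}  strong fraction={strong_fraction:.2f}")
```

With these made-up scores, the share of strongly relevant chunks falls from all of them at K=5 to a quarter at K=20, while the mean score drops steadily.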

Dynamic K Selection

Query Complexity Heuristic:

Simple factual: K=3-5
→ "What is X?"
→ Need precise answer

Comprehensive: K=10-15
→ "Explain how X works"
→ Need multiple perspectives

Very complex: K=15-20
→ "Compare X, Y, Z and recommend"
→ Need broad coverage

Adjust K based on query type
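
One rough way to apply this heuristic is a keyword-based classifier. This is a minimal sketch under stated assumptions: the cue words, the function name `choose_k`, and the specific K values (picked from inside the ranges above) are all hypothetical and would need tuning against real query logs. A production system might use a trained classifier instead.

```python
import re

# Hypothetical cue words; these are assumptions, not a validated taxonomy.
COMPLEX_CUES = re.compile(r"\b(compare|versus|vs|recommend|trade-?offs?)\b", re.I)
COMPREHENSIVE_CUES = re.compile(r"\b(explain|how|why|describe|overview)\b", re.I)


def choose_k(query: str) -> int:
    """Pick a retrieval depth K from surface cues in the query."""
    if COMPLEX_CUES.search(query):
        return 20  # very complex: broad coverage (15-20)
    if COMPREHENSIVE_CUES.search(query):
        return 12  # comprehensive: multiple perspectives (10-15)
    return 5  # simple factual: precise answer (3-5)


for q in ("What is X?", "Explain how X works", "Compare X, Y, Z and recommend"):
    print(f"{q!r} -> K={choose_k(q)}")
```

The complex cues are checked first so that a query such as "Explain and compare X and Y" gets the larger K.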

Score-Based Cutoff:

Instead of fixed K:
→ Retrieve until score < threshold

Example:
→ Retrieve while score > 0.70
→ If only the top 3 score above 0.70, use K=3
→ If the top 10 all score above 0.70, use K=10

Adaptive to result quality
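
A score-based cutoff could look like the sketch below. It assumes the retriever returns `(chunk, score)` pairs with higher scores meaning more relevant. The `min_k` and `max_k` guards are additions for this example (so the context is never empty and never unbounded), and `select_by_score` is a hypothetical name.

```python
from typing import List, Tuple


def select_by_score(
    results: List[Tuple[str, float]],
    threshold: float = 0.70,
    min_k: int = 1,
    max_k: int = 20,
) -> List[Tuple[str, float]]:
    """Keep chunks while score > threshold, bounded by min_k and max_k."""
    ranked = sorted(results, key=lambda pair: pair[1], reverse=True)
    selected = []
    for chunk, score in ranked[:max_k]:
        # Stop at the first chunk at or below the threshold,
        # but only once the minimum number of chunks is reached.
        if score <= threshold and len(selected) >= min_k:
            break
        selected.append((chunk, score))
    return selected


results = [("a", 0.85), ("b", 0.80), ("c", 0.72), ("d", 0.65), ("e", 0.60)]
print(len(select_by_score(results)))  # effective K for this result set
```

Here the effective K is 3, because only three chunks score above 0.70. Note that absolute scores are not comparable across embedding models, so a threshold such as 0.70 generally has to be calibrated per model and corpus.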

Two-Stage Retrieval

Broad Then Narrow:

Stage 1: Retrieve K=50 (broad)
Stage 2: Rerank to top-5 (narrow)

Reranking:
→ Use a cross-encoder (more accurate than first-stage embedding similarity, but slower)
→ It scores the query and document together, capturing their interaction
→ Keep only the most relevant chunks

Final context: High-quality, compact
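
The broad-then-narrow flow can be sketched with pluggable scoring functions. Everything here is an assumption made for illustration: `two_stage_retrieve` is a hypothetical helper, `keyword_overlap` stands in for a fast first-stage retriever, and `phrase_aware` is a toy stand-in for a cross-encoder (a real system would call an actual reranking model at that point).

```python
from typing import Callable, List

Scorer = Callable[[str, str], float]


def two_stage_retrieve(
    query: str,
    corpus: List[str],
    first_stage: Scorer,
    reranker: Scorer,
    broad_k: int = 50,
    final_k: int = 5,
) -> List[str]:
    """Stage 1: cheap broad retrieval. Stage 2: rerank and keep the best."""
    candidates = sorted(corpus, key=lambda d: first_stage(query, d), reverse=True)
    candidates = candidates[:broad_k]
    reranked = sorted(candidates, key=lambda d: reranker(query, d), reverse=True)
    return reranked[:final_k]


def keyword_overlap(query: str, doc: str) -> float:
    """Toy first-stage score: number of shared lowercase tokens."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def phrase_aware(query: str, doc: str) -> float:
    """Toy reranker: token overlap plus a bonus for the exact query phrase."""
    bonus = 5.0 if query.lower() in doc.lower() else 0.0
    return keyword_overlap(query, doc) + bonus


corpus = [
    "The API rate limit is 1000 requests per hour.",
    "Pricing tiers limit API features by plan.",
    "Authentication errors and rate of retries.",
    "Release notes.",
]
top = two_stage_retrieve("API rate limit", corpus, keyword_overlap, phrase_aware,
                         broad_k=3, final_k=2)
print(top)
```

Keeping the scorers as parameters means the expensive reranker only runs on the `broad_k` candidates, not on the whole corpus.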

Context Compression

Extractive Summarization:

For lower-ranked chunks (6-20):
→ Extract only most relevant sentences
→ Discard filler

Example:
→ Chunk #15 (1000 tokens) → Extract 100 tokens
→ Preserves key info
→ Reduces dilution
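
A simple extractive compressor might keep only the sentences that share terms with the query. The sketch below is one possible approach, not a reference implementation: `compress_chunk` is a hypothetical name, the sentence splitter is a naive regex, term overlap is a crude relevance proxy, and the sample chunk text is invented for the demo.

```python
import re


def compress_chunk(query: str, chunk: str, max_sentences: int = 2) -> str:
    """Keep up to max_sentences sentences that overlap most with the query."""
    query_terms = set(re.findall(r"\w+", query.lower()))
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", chunk) if s.strip()]
    scored = []
    for index, sentence in enumerate(sentences):
        terms = set(re.findall(r"\w+", sentence.lower()))
        scored.append((len(query_terms & terms), index, sentence))
    # Highest overlap first; ties go to the earlier sentence.
    best = sorted(scored, key=lambda t: (-t[0], t[1]))[:max_sentences]
    # Drop sentences with no overlap, then restore document order.
    kept = sorted((t for t in best if t[0] > 0), key=lambda t: t[1])
    return " ".join(sentence for _, _, sentence in kept)


# Invented sample text for illustration.
chunk = (
    "Our platform launched in 2019. "
    "The API rate limit is 1000 requests per hour. "
    "Contact sales for enterprise plans. "
    "Exceeding the rate limit returns HTTP 429."
)
print(compress_chunk("API rate limit", chunk))
```

For the sample chunk, the two filler sentences are dropped and the two rate-limit sentences are kept in their original order.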

How to Solve

Combine the techniques above:

  • Use dynamic K based on query complexity (3-5 for simple factual, 10-15 for comprehensive, 15-20 for very complex)
  • Implement a score-based cutoff (retrieve while score > 0.70)
  • Apply two-stage retrieval (broad search, then rerank to top-5)
  • Compress lower-ranked chunks (extract key sentences)
  • Monitor relevance decay (compare answer accuracy at K=5 vs K=20)
  • Prefer precision over recall for most queries

See Context Optimization.
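
Monitoring relevance decay can start with a small evaluation loop that runs the same questions at different K values. This is a sketch under assumptions: `answer_fn` is a hypothetical callable wrapping your RAG pipeline, substring matching is a deliberately crude correctness check, and `fake_answer` only mimics the K=5 vs K=20 behaviour from the example on this page.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def accuracy_at_k(
    eval_set: List[Tuple[str, str]],
    answer_fn: Callable[[str, int], str],
    k: int,
) -> float:
    """Fraction of questions whose answer contains the expected string."""
    if not eval_set:
        raise ValueError("eval_set must not be empty")
    hits = sum(
        1 for question, expected in eval_set
        if expected.lower() in answer_fn(question, k).lower()
    )
    return hits / len(eval_set)


def compare_k(
    eval_set: List[Tuple[str, str]],
    answer_fn: Callable[[str, int], str],
    ks: Iterable[int] = (5, 20),
) -> Dict[int, float]:
    """Accuracy for each K, to reveal whether larger K hurts."""
    return {k: accuracy_at_k(eval_set, answer_fn, k) for k in ks}


def fake_answer(question: str, k: int) -> str:
    """Stand-in pipeline: precise with a small context, vague with a large one."""
    if k <= 5:
        return "Rate limit is 1000 req/hour"
    return "Rate limit depends on plan tier and may vary"


eval_set = [("API rate limit", "1000 req/hour")]
print(compare_k(eval_set, fake_answer))
```

If accuracy at K=20 is consistently lower than at K=5 on a representative question set, that is a sign the extra chunks are adding noise rather than coverage.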


Last updated January 26, 2026