RAG Scenarios and Solutions
Poor Semantic Search Results
Queries return irrelevant documents, miss obviously related content, or surface documents that don't semantically match the user's intent.
The Problem
Queries return irrelevant documents, miss obviously related content, or surface documents that don't semantically match the user's intent.
Symptoms
- ❌ Search for "authentication" returns "authorization" docs only
- ❌ Query "how to debug errors" returns pricing pages
- ❌ Exact keyword match ranks lower than unrelated docs
- ❌ Synonyms not recognized ("car" doesn't match "automobile")
- ❌ Users frustrated with search quality
Real-World Example
Query: "How do I reset my password?"
Top results returned:
1. "Account Security Best Practices" (score: 0.78)
2. "Password Requirements Policy" (score: 0.76)
3. "Two-Factor Authentication Setup" (score: 0.74)
Missing from results:
- "Password Reset Guide" (score: 0.68) ← Should be #1!
Problem: Semantic similarity favors "password" + "security"
over actual "password reset" procedure
Deep Technical Analysis
Embedding Space Limitations
Vector embeddings have inherent constraints:
Dimensionality and Information Loss:
Document: 2,000 tokens (rich information)
↓ Embedding model
Embedding: 1,536 dimensions (compressed)
Fixed-size output: every document, whatever its length, maps to the same 1,536 numbers
→ Information loss inevitable
→ Nuances collapsed
→ Subtle differences erased
Two documents:
1. "How to reset password"
2. "How to change password"
May have nearly identical embeddings
→ Both about password modification
→ Semantic difference ("reset" vs "change") lost
→ Retrieved interchangeably
The Polysemy Problem:
Word: "bank"
Meanings:
1. Financial institution
2. River bank
3. Blood bank
4. Memory bank
Static (non-contextual) word embedding for "bank":
→ Averages all contexts from training data
→ No single meaning dominant
→ Query "bank account" may retrieve river banks
Contextual embeddings help but don't eliminate the issue, especially for short queries that carry little context
Query-Document Mismatch
User queries differ from document language:
Vocabulary Gap:
User query: "My app crashed"
Document title: "Application Failure Troubleshooting"
Terms:
→ "app" vs "application"
→ "crashed" vs "failure"
Embeddings may not capture equivalence
→ Trained on formal text
→ User uses casual language
→ Semantic gap
Better if document also includes:
"If your app crashes or fails..."
→ Contains both formal and casual terms
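The same gap can also be narrowed on the query side by appending formal equivalents of casual terms before the query is embedded. The following is a minimal sketch under stated assumptions: `SYNONYMS` and `expand_query` are hypothetical names, and the mapping is a hand-written illustration, not a real thesaurus.

```python
import re

# Hypothetical, hand-maintained mapping from casual terms to formal equivalents.
SYNONYMS = {
    "app": ["application"],
    "crashed": ["failure", "crash"],
}

def expand_query(query: str) -> str:
    """Append formal equivalents of casual terms found in the query."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    extra = []
    for token in tokens:
        for synonym in SYNONYMS.get(token, []):
            if synonym not in tokens and synonym not in extra:
                extra.append(synonym)
    return query if not extra else f"{query} ({' '.join(extra)})"

print(expand_query("My app crashed"))  # My app crashed (application failure crash)
```

The expanded string is what would be passed to the embedding model. Whether this helps depends on the model and the corpus, so it is worth checking against real queries before adopting it.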
Question-Answer Asymmetry:
User query (question form):
"How do I authenticate with OAuth?"
Document content (declarative):
"OAuth authentication requires three steps: ..."
Query embedding: encodes interrogative structure
Document embedding: encodes declarative structure
Similarity may be lower than expected
→ Different sentence structures
→ Despite identical topic
Training Data Bias
Embedding models reflect training corpus:
Domain Specificity:
General embedding model trained on:
→ Wikipedia, books, web pages
→ Broad general knowledge
→ Limited technical depth
Company's specialized domain:
→ Proprietary API terms
→ Internal acronyms (TPS, GTM, CRM)
→ Product-specific jargon
Model doesn't "understand" domain terms
→ Splits them into generic subword pieces with little learned meaning
→ Poor retrieval for specialized queries
Temporal Bias:
Model trained in 2021:
→ "GPT" associated with "generative pre-trained"
→ Weak association with "chatbot"
Query in 2024: "GPT chatbot features"
→ Model's understanding outdated
→ Doesn't reflect current usage
→ Retrieval suboptimal
Cosine Similarity Limitations
Similarity metric has blind spots:
Magnitude vs Direction:
Cosine similarity: cos(θ) = A·B / (||A|| ||B||)
→ Measures angle, not magnitude
→ Two vectors same direction: high similarity
→ Regardless of vector length
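Document A: embedding magnitude 0.5
Document B: embedding magnitude 2.0
If same direction: cosine similarity = 1.0
→ Treats both equally
→ Any signal carried by vector length is ignored
Caveats:
→ Embedding magnitude is not a reliable measure of how much information a document contains
→ Many embedding models return unit-length vectors, so there is no magnitude left to compare
Alternative: Euclidean distance
→ Considers magnitude
→ On unit-length vectors it produces the same ranking as cosine similarity
→ Less common in RAG systems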
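A small numeric check makes the direction-versus-magnitude point concrete. This is a sketch with invented two-dimensional vectors (real embeddings have hundreds or thousands of dimensions); the helper names are illustrative.

```python
import math

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.hypot(*v)
    return [x / norm for x in v]

query = [0.6, 0.8]   # unit length
doc_a = [0.3, 0.4]   # same direction, magnitude 0.5
doc_b = [1.2, 1.6]   # same direction, magnitude 2.0

# Cosine similarity cannot tell the two documents apart.
print(cosine(query, doc_a), cosine(query, doc_b))

# Euclidean distance can, because it is sensitive to vector length.
print(math.dist(query, doc_a), math.dist(query, doc_b))

# After normalization the difference disappears again.
print(math.dist(normalize(doc_a), normalize(doc_b)))
```

For unit-length vectors, squared Euclidean distance equals 2 - 2·cos(θ), which is why switching metrics changes nothing when the embedding model already normalizes its output.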
The Hubness Problem:
In high-dimensional spaces:
→ Some vectors become "hubs"
→ Close to many other vectors
→ Retrieved disproportionately often
Example:
Document about "getting started":
→ Generic, broad topic
→ Embedding is central in vector space
→ Matches many queries
→ Always in top-10 results
→ Crowds out more specific, relevant docs
Mitigation: Hubness reduction algorithms
→ Rare in production RAG systems
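Before reaching for a reduction algorithm, it can help to measure whether hubs exist at all. One simple diagnostic is the k-occurrence count: how often each document appears in the top-k across a sample of queries. The sketch below uses invented scores, and `k_occurrence` is a hypothetical helper name.

```python
from collections import Counter

def k_occurrence(similarities, k):
    """Count how often each document appears in the top-k across queries.

    similarities: dict mapping query id -> dict of doc id -> similarity score.
    """
    counts = Counter()
    for scores in similarities.values():
        top_k = sorted(scores, key=scores.get, reverse=True)[:k]
        counts.update(top_k)
    return counts

# Toy scores, invented for illustration.
similarities = {
    "q_auth":    {"getting-started": 0.80, "auth-guide": 0.82, "billing": 0.40},
    "q_billing": {"getting-started": 0.79, "auth-guide": 0.35, "billing": 0.81},
    "q_deploy":  {"getting-started": 0.78, "auth-guide": 0.50, "billing": 0.45},
}

counts = k_occurrence(similarities, k=2)
print(counts.most_common())
```

A document whose count is far above the rest, such as "getting-started" here, is a hub candidate and may deserve a score penalty, a rewrite, or splitting into more specific pages.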
Negative Retrieval and Exclusions
Semantic search struggles with negation:
The NOT Problem:
Query: "authentication WITHOUT OAuth"
User wants: API key, JWT, Basic auth
Excludes: OAuth methods
Embedding of "authentication WITHOUT OAuth":
→ Still contains "OAuth" token
→ High similarity to OAuth docs
→ Returns exactly what user wanted to avoid
Semantic search is inclusive, not exclusive
→ Cannot filter out concepts
→ Treats "WITHOUT" as just another word
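A common workaround is to handle the exclusion outside the embedding: strip the negated term from the query, search on what remains, then drop results that mention the excluded term. The following is a rough sketch; the regular expression only covers simple "WITHOUT X" phrasing, substring matching is crude, and all names are hypothetical.

```python
import re

# Matches simple exclusion phrasing such as "without OAuth" or "excluding OAuth".
EXCLUSION = re.compile(r"\b(?:without|excluding|except)\s+(\w+)", re.IGNORECASE)

def split_exclusions(query):
    """Return (positive_query, excluded_terms)."""
    excluded = [term.lower() for term in EXCLUSION.findall(query)]
    positive = " ".join(EXCLUSION.sub(" ", query).split())
    return positive, excluded

def drop_excluded(docs, excluded):
    """Remove retrieved docs whose text mentions any excluded term."""
    return [
        doc for doc in docs
        if not any(term in doc["text"].lower() for term in excluded)
    ]

positive, excluded = split_exclusions("authentication WITHOUT OAuth")
print(positive, excluded)  # authentication ['oauth']

# Pretend these came back from a semantic search on the positive query.
retrieved = [
    {"title": "OAuth 2.0 Flow", "text": "Authenticate users with OAuth."},
    {"title": "API Keys", "text": "Authenticate requests with an API key."},
]
print([doc["title"] for doc in drop_excluded(retrieved, excluded)])  # ['API Keys']
```

If documents carry structured tags (for example an auth-method field), filtering on that metadata is likely to be more reliable than matching on text.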
Contrastive Queries:
Query: "Differences between REST and GraphQL"
Ideal results:
→ Comparison documents
→ "REST vs GraphQL" articles
Actual results:
→ Mix of REST docs and GraphQL docs
→ High similarity to both concepts
→ But no direct comparison
→ User must synthesize the comparison themselves
Retrieval K Parameter Tuning
Top-K selection affects quality:
Too Small K:
K=3 (retrieve top 3 documents)
Scenario:
→ Top 3 all about API authentication
→ Query also needs rate limiting info
→ Rate limit doc ranked #4
→ Excluded from context
→ LLM can't answer rate limit questions
Narrow context, incomplete coverage
Too Large K:
K=20 (retrieve top 20 documents)
Scenario:
→ Top 3 highly relevant
→ Ranks 4-20 marginally relevant
→ Fill context window
→ Dilute signal with noise
→ LLM distracted by irrelevant info
Broad context, reduced precision
Dynamic K:
Adaptive approach:
1. Retrieve top K=50 candidates
2. Apply similarity threshold (e.g., >0.75)
3. Return only above threshold
4. If <3 results: Lower threshold
5. If >15 results: Raise threshold
Adjusts K based on query specificity
→ Broad query: More results
→ Specific query: Fewer, higher-quality results
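The adaptive steps above can be sketched as follows. The thresholds and limits mirror the numbers in the text; the step size, floor, and ceiling are assumptions added so the adjustment terminates, and `dynamic_top_k` is a hypothetical name.

```python
def dynamic_top_k(scored, threshold=0.75, min_results=3, max_results=15,
                  step=0.05, floor=0.50, ceiling=0.95):
    """Select results by score threshold, relaxing or tightening it as needed.

    scored: list of (doc_id, score) pairs, e.g. the top 50 candidates.
    """
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)

    def above(limit):
        return [pair for pair in ranked if pair[1] > limit]

    selected = above(threshold)
    if len(selected) < min_results:
        # Too few results: lower the threshold step by step, down to the floor.
        while len(selected) < min_results and threshold > floor:
            threshold = max(floor, threshold - step)
            selected = above(threshold)
    elif len(selected) > max_results:
        # Too many results: raise the threshold step by step, up to the ceiling.
        while len(selected) > max_results and threshold < ceiling:
            threshold = min(ceiling, threshold + step)
            selected = above(threshold)
    return selected[:max_results]

specific = [("d1", 0.91), ("d2", 0.88), ("d3", 0.72), ("d4", 0.60), ("d5", 0.41)]
print([doc_id for doc_id, _ in dynamic_top_k(specific)])  # ['d1', 'd2', 'd3']
```

Absolute thresholds such as 0.75 are model-specific, since score distributions differ between embedding models, so these values would need calibrating on real queries.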
Reranking and Two-Stage Retrieval
Initial retrieval may need refinement:
The Speed-Accuracy Trade-off:
Stage 1: Fast embedding retrieval
→ Retrieve top 50 candidates
→ ~100ms latency
→ Decent recall, imperfect precision
Stage 2: Slow reranking
→ Cross-encoder model
→ Score each of 50 candidates
→ +500ms latency
→ Excellent precision
Total: 600ms
→ Better quality
→ Higher cost
→ User-noticeable delay
Skip reranking for speed?
→ 100ms latency
→ Lower quality
→ Users frustrated with bad results
Trade-off: Speed vs quality
Reranker Model Selection:
Options:
1. Same embedding model (redundant)
2. Different embedding model (limited gain)
3. Cross-encoder (best, but slowest)
4. LLM-based scoring (expensive)
Cross-encoder:
→ Processes query + document together
→ Outputs relevance score
→ More accurate than separate embeddings
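→ But: one forward pass per candidate at query time, O(n) for n candidates
→ Embedding retrieval: document vectors are precomputed; each query needs one embedding plus a nearest-neighbor lookup (typically approximate and sublinear)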
Reranking all 50: 50 forward passes
→ GPU/CPU intensive
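The two-stage flow can be sketched with pluggable scoring functions. Both scorers below are toy stand-ins chosen so the example runs without a model: token overlap plays the role of embedding similarity, and a phrase-match bonus plays the role of a cross-encoder. All names are hypothetical.

```python
def tokens(text):
    return set(text.lower().split())

def overlap_score(query, doc):
    """Cheap stand-in for embedding similarity: Jaccard overlap of tokens."""
    q, d = tokens(query), tokens(doc)
    return len(q & d) / len(q | d)

def phrase_score(query, doc):
    """Stand-in for a cross-encoder: scores the query and document together."""
    bonus = 1.0 if query.lower() in doc.lower() else 0.0
    return bonus + overlap_score(query, doc)

def retrieve_then_rerank(query, docs, first_stage, reranker, candidates=50, top_k=3):
    """Stage 1 shortlists cheaply; stage 2 rescores only the shortlist."""
    shortlist = sorted(docs, key=lambda d: first_stage(query, d), reverse=True)[:candidates]
    return sorted(shortlist, key=lambda d: reranker(query, d), reverse=True)[:top_k]

docs = [
    "password requirements policy",
    "reset password guide: how to reset password step by step",
    "password security",
    "pricing overview",
]
query = "reset password"

stage_one_top = max(docs, key=lambda d: overlap_score(query, d))
final = retrieve_then_rerank(query, docs, overlap_score, phrase_score, candidates=3, top_k=1)
print(stage_one_top)  # password security
print(final[0])       # reset password guide: how to reset password step by step
```

The expensive scorer runs only on the shortlist, so its cost is bounded by the `candidates` parameter rather than by corpus size.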
Metadata Filtering vs Semantic Search
Combining structured and unstructured queries:
Hybrid Queries:
User intent: "Recent API docs"
→ Semantic: "API documentation"
→ Filter: date > 2024-01-01
Two-stage approach:
1. Filter by metadata (date)
2. Semantic search within filtered set
Or:
1. Semantic search (get top 100)
2. Filter results by metadata
3. Return top-K after filtering
Different orderings yield different results
The Filter Cardinality Problem:
Scenario:
→ 100,000 total documents
→ User queries: "API docs in Python"
→ Metadata filter: language="Python"
→ Only 500 Python docs exist
Option A: Filter first
→ Search within 500 docs
→ Fast, but limited pool
→ May miss relevant non-Python docs that are instructive
Option B: Search first
→ Get top 100 from all 100K
→ Only 5 are Python docs
→ User wanted more Python examples
Optimal: Depends on user intent
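The effect of ordering is easy to reproduce on a toy corpus. In this sketch the documents, scores, and function names are invented for illustration; `scores` stands in for the similarity of each document to the query.

```python
def filter_then_search(docs, scores, predicate, k):
    """Option A: restrict the pool first, then rank by similarity."""
    pool = [d for d in docs if predicate(d)]
    return sorted(pool, key=lambda d: scores[d["id"]], reverse=True)[:k]

def search_then_filter(docs, scores, predicate, k, candidates):
    """Option B: take the top candidates overall, then apply the filter."""
    top = sorted(docs, key=lambda d: scores[d["id"]], reverse=True)[:candidates]
    return [d for d in top if predicate(d)][:k]

docs = [
    {"id": "a", "language": "Go"},
    {"id": "b", "language": "Python"},
    {"id": "c", "language": "Java"},
    {"id": "d", "language": "Go"},
    {"id": "e", "language": "Python"},
    {"id": "f", "language": "Python"},
]
scores = {"a": 0.90, "b": 0.85, "c": 0.80, "d": 0.75, "e": 0.70, "f": 0.65}

def is_python(doc):
    return doc["language"] == "Python"

option_a = [d["id"] for d in filter_then_search(docs, scores, is_python, k=2)]
option_b = [d["id"] for d in search_then_filter(docs, scores, is_python, k=2, candidates=3)]
print(option_a)  # ['b', 'e']
print(option_b)  # ['b']
```

Option B returns fewer results than requested because the filter runs after the candidate cut-off. Raising `candidates` narrows the difference at the cost of scoring more documents.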
Cold Start and New Documents
Recently added content underperforms:
The Freshness Problem:
New document added today:
→ Perfect answer to user query
→ But: Ranks #47 in results
Why?
→ No user interaction history
→ No click-through data
→ No implicit feedback
→ Pure semantic similarity
Older documents:
→ Have been refined based on user feedback
→ Keywords optimized
→ Slight advantage in ranking
New doc needs time to "prove" itself
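One way to offset this is a small freshness bonus that decays with document age. The sketch below assumes an exponential decay with a 30-day half-life and a weight of 0.1; both values and the function name are illustrative assumptions, not recommendations.

```python
from datetime import date

def recency_boosted(score, updated, today, half_life_days=30.0, weight=0.1):
    """Add a freshness bonus that halves every `half_life_days`."""
    age_days = max((today - updated).days, 0)
    freshness = 0.5 ** (age_days / half_life_days)  # 1.0 today, 0.5 after one half-life
    return score + weight * freshness

today = date(2026, 1, 26)  # fixed date so the example is reproducible
new_doc = recency_boosted(0.70, updated=today, today=today)
old_doc = recency_boosted(0.74, updated=date(2025, 4, 1), today=today)
print(round(new_doc, 4), round(old_doc, 4))
```

With these numbers the new document overtakes an older one that had a slightly higher raw similarity. The weight should stay small enough that freshness breaks near-ties without overriding relevance.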
Implicit Boosting:
Document clicked frequently:
→ Indicates user satisfaction
→ Should rank higher for similar queries
But pure semantic search:
→ No feedback loop
→ Each query independent
→ Ignores user behavior
Learning-to-rank systems:
→ Incorporate click-through rate
→ Adjust scores based on engagement
→ But: Complex infrastructure
→ Rare in simple RAG systems
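Short of a full learning-to-rank pipeline, a lightweight approximation is to blend similarity with a smoothed click-through rate. This is a sketch under assumptions: the prior CTR, prior weight, and blend factor `alpha` are invented values, and the function names are hypothetical.

```python
def smoothed_ctr(clicks, impressions, prior_ctr=0.1, prior_weight=20):
    """Click-through rate pulled toward a prior, so low-traffic docs get a neutral value."""
    return (clicks + prior_ctr * prior_weight) / (impressions + prior_weight)

def blended_score(similarity, clicks, impressions, alpha=0.8):
    """Weighted mix of semantic similarity and engagement."""
    return alpha * similarity + (1 - alpha) * smoothed_ctr(clicks, impressions)

popular = blended_score(0.70, clicks=80, impressions=180)
unseen = blended_score(0.72, clicks=0, impressions=0)
print(round(popular, 3), round(unseen, 3))  # 0.642 0.596
```

The smoothing matters for the cold-start case described earlier: a document with no impressions receives the prior CTR instead of zero, so it is not penalized simply for being new.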
How to Solve
Combine the following:
- Fine-tune embeddings on domain-specific data
- Implement two-stage retrieval with reranking
- Use hybrid search (semantic + keyword)
- Adjust K dynamically based on score distribution
- Add metadata filtering
- Boost recently updated documents
See Search Quality Optimization.
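For the hybrid search (semantic + keyword) item, one widely used way to merge the two result lists is reciprocal rank fusion, which needs only ranks and so avoids comparing scores on different scales. The sketch below is an illustration of that technique, not necessarily the method the linked guide uses; the document ids are invented, and k=60 is a commonly used constant.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one list, best first."""
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)

semantic = ["security-best-practices", "password-policy", "password-reset-guide"]
keyword = ["password-reset-guide", "password-policy", "faq"]

print(reciprocal_rank_fusion([semantic, keyword]))
```

In this toy case the reset guide, ranked third semantically but first by keyword match, moves to the top of the fused list, which is the behavior the password reset example earlier on this page calls for.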
Agent Instructions: Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.
Perform an HTTP GET request on the current page URL with the ask query parameter:
GET /dev/rag-scenarios-and-solutions/vectors/poor-search-results.md?ask=<question>
The question should be specific, self-contained, and written in natural language. The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.
Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
Last updated January 26, 2026


