RAG Scenarios and Solutions

Query-Document Mismatch

User queries are phrased differently from the document text, which causes an embedding mismatch and retrieval failure even when the query and document mean the same thing.

The Problem

The relevant document is in the corpus, but its wording differs enough from the query's wording that their embeddings score as dissimilar, so the document ranks too low to be retrieved.

Symptoms

  • ❌ User asks in casual language, docs formal
  • ❌ Synonym mismatch ("delete" vs "remove")
  • ❌ Question format vs statement format
  • ❌ Different terminology conventions
  • ❌ Domain jargon vs layperson terms

Real-World Example

Document: "To terminate your subscription, navigate to Account Settings
and select the 'Cancel Subscription' option."

User query: "How do I stop paying for this?"

Embedding mismatch:
→ Query: "stop paying"
→ Doc: "terminate subscription", "cancel"
→ Semantic gap
→ Low similarity score
→ Not retrieved

The document contains the answer but is not retrieved.

Deep Technical Analysis

Vocabulary Gap

Formal vs Casual:

Docs: "Authenticate using OAuth 2.0 authorization code flow"
Query: "How do I log in?"

Embedding distance:
→ "authenticate", "OAuth", "authorization" (technical)
→ "log in" (casual)
→ Semantic link weak in embedding space

Acronyms vs Full Terms:

Docs: "RBAC policy configuration"
Query: "How to set up role-based access control?"

Problem:
→ "RBAC" embedded differently from "role-based access control"
→ Should be same concept
→ But: Model may not know they're equivalent
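One low-cost mitigation is to normalize acronyms before embedding so that both forms appear in the text. A minimal sketch, assuming a hand-maintained glossary: the `ACRONYMS` table and `expand_acronyms` function are hypothetical names for illustration. Apply it to documents at indexing time and to queries at search time, so a query using either form shares tokens with the document.

```python
import re

# Hypothetical glossary for illustration; build yours from the terms your docs actually use.
ACRONYMS = {
    "RBAC": "role-based access control",
    "SSO": "single sign-on",
    "MFA": "multi-factor authentication",
}


def expand_acronyms(text: str, glossary: dict = ACRONYMS) -> str:
    """Append the full form after each known acronym: 'RBAC' -> 'RBAC (role-based access control)'."""
    if not glossary:
        return text
    # Whole-word, case-sensitive match, so lowercase words and substrings are left untouched.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, glossary)) + r")\b")
    return pattern.sub(lambda m: f"{m.group(1)} ({glossary[m.group(1)]})", text)


print(expand_acronyms("RBAC policy configuration"))
# RBAC (role-based access control) policy configuration
```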

Question vs Statement

Format Mismatch:

Doc: "The API rate limit is 1000 requests per hour"
(Statement format)

Query: "What is the API rate limit?"
(Question format)

Embeddings differ:
→ Question words ("what", "how", "when")
→ Not present in statement
→ Reduces similarity

Query Reformulation:

Solution: Rewrite query to statement
→ "What is X?" → "X is"
→ "How to do Y?" → "To do Y"

More likely to match document phrasing
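A rule-based rewriter is the simplest way to sketch this. The patterns below are illustrative assumptions that cover only the forms listed above; production systems often use an LLM to rewrite queries instead. `question_to_statement` is a hypothetical helper, not a library function.

```python
import re

# Illustrative rewrite rules (assumptions, not a complete grammar): first match wins.
RULES = [
    # "What is X?" -> "X is"
    (re.compile(r"^what (is|are) (.+?)\?*$", re.IGNORECASE), r"\2 \1"),
    # "How do I do Y?" -> "to do Y"
    (re.compile(r"^how (?:do|can|should) (?:i|we|you) (.+?)\?*$", re.IGNORECASE), r"to \1"),
    # "How to do Y?" -> "to do Y"
    (re.compile(r"^how to (.+?)\?*$", re.IGNORECASE), r"to \1"),
]


def question_to_statement(query: str) -> str:
    """Rewrite a question into statement form; return it unchanged if no rule matches."""
    q = query.strip()
    for pattern, template in RULES:
        if pattern.match(q):
            statement = pattern.sub(template, q)
            # Capitalize the first letter so the output reads like document prose.
            return statement[0].upper() + statement[1:]
    return q


for q in ["What is the API rate limit?", "How to set up role-based access control?"]:
    print(question_to_statement(q))
# The API rate limit is
# To set up role-based access control
```

Because rules like these can mangle queries they were not written for, a safer choice is to search with both the original and the rewritten query and merge the results.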

Query Expansion

Synonym Expansion:

Original query: "delete account"

Expand to:
→ "delete account"
→ "remove account"
→ "close account"
→ "cancel account"
→ "terminate account"

Embed all variations:
→ More likely to match doc phrasing
→ Retrieve if doc uses any synonym
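A minimal sketch of expansion plus retrieval, assuming a hypothetical `SYNONYMS` table. To stay runnable without a model, `toy_embed` is a bag-of-words stand-in for a real embedding model and `cosine` is the matching similarity; with a real model you would swap both for its embedding call and vector similarity. Each document is scored by its best match across all query variants.

```python
import math
import re
from collections import Counter

# Hypothetical synonym table; in practice build it from query logs or a domain thesaurus.
SYNONYMS = {"delete": ["remove", "close", "cancel", "terminate"]}


def expand_query(query: str, synonyms: dict = SYNONYMS) -> list:
    """Return the original query plus one variant per single-word synonym swap."""
    words = query.lower().split()
    variants = [query]
    for i, word in enumerate(words):
        for alt in synonyms.get(word, []):
            variants.append(" ".join(words[:i] + [alt] + words[i + 1:]))
    return variants


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search_expanded(query: str, docs: list, top_k: int = 3) -> list:
    """Score each doc by its best similarity to any query variant, highest first."""
    query_vecs = [toy_embed(v) for v in expand_query(query)]
    scored = []
    for doc in docs:
        doc_vec = toy_embed(doc)
        scored.append((max(cosine(q, doc_vec) for q in query_vecs), doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


docs = [
    "To terminate your account, open Account Settings.",
    "Delete old files from the trash folder.",
    "Billing happens monthly.",
]
for score, doc in search_expanded("delete account", docs):
    print(f"{score:.2f}  {doc}")
```

Taking the maximum over variants means a document only needs to match one phrasing to rank well; the cost is one extra embedding (and search) per variant, which is the slowdown listed under Cons.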

Pros/Cons:

Pros:
+ Better recall
+ Matches more docs

Cons:
- More embeddings = slower
- May retrieve less relevant documents (noise)
- Requires balancing precision against recall

Dense vs Sparse Representations

Hybrid Search:

Semantic (dense):
→ Captures meaning
→ Works for paraphrases

Keyword (sparse):
→ Exact term matches
→ Good for technical terms, IDs

Combine:
→ Semantic finds "log in" → "authenticate"
→ Keyword ensures "OAuth" → "OAuth"
→ Best of both
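One common way to combine the two result lists is Reciprocal Rank Fusion (RRF), sketched below with k = 60, a widely used default. It works on ranks rather than raw scores, so the semantic and keyword retrievers' score scales never need to be normalized against each other. The document IDs and rankings in the usage lines are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Fuse ranked lists of doc IDs: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Documents ranked well by both retrievers accumulate the highest fused scores.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical top-3 results for the query "How do I log in with OAuth?"
dense_ranking = ["doc_auth", "doc_login_faq", "doc_oauth"]   # semantic: paraphrase matches
keyword_ranking = ["doc_oauth", "doc_auth", "doc_tokens"]    # keyword: exact "OAuth" match

fused = reciprocal_rank_fusion([dense_ranking, keyword_ranking])
print([doc_id for doc_id, _ in fused])
# ['doc_auth', 'doc_oauth', 'doc_login_faq', 'doc_tokens']
```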

How to Solve

Combine several complementary techniques:

  • Query expansion with synonyms
  • Hybrid search (semantic + keyword)
  • Query rewriting (question → statement format)
  • Embeddings fine-tuned on query-document pairs from your domain
  • A query understanding layer that detects intent and reformulates the query
  • Cross-encoder reranking to bridge remaining vocabulary gaps

See Query Understanding.



Last updated January 26, 2026