RAG Scenarios and Solutions
Document Version Conflicts
Multiple versions of the same document coexist in the knowledge base, causing AI to cite outdated or conflicting information.
Key Takeaways
- The Problem
- Deep Technical Analysis
- How to Solve
- Agent Instructions: Querying This Documentation
The Problem
Multiple versions of the same document coexist in the knowledge base, causing AI to cite outdated or conflicting information.
Symptoms
- ❌ Old and new versions both retrieved
- ❌ Conflicting information in responses
- ❌ "v1" and "v2" docs both present
- ❌ Cannot determine which is current
- ❌ Stale info mixed with current
Real-World Example
Knowledge base contains:
→ "API_Guide_v1.0.pdf" (2022): Rate limit 100/hour
→ "API_Guide_v2.0.pdf" (2024): Rate limit 1000/hour
→ "API_Guide_v2.1.pdf" (2024): Rate limit 1500/hour
Query: "What's the API rate limit?"
Retrieved chunks from all three versions:
→ AI response: "Rate limit ranges from 100 to 1500 per hour depending on version."
The answer is confusing: it does not say which limit is current.
The user wants the latest value only.
Deep Technical Analysis
Version Tracking Challenges
No Version Metadata:
Common problem:
→ Documents ingested without version tracking
→ Metadata: {document_id: "api_guide"}
→ No version field
New version ingested:
→ Same document_id
→ Both coexist
→ Cannot distinguish
Version Detection:
Filename-based:
→ "guide_v1.pdf", "guide_v2.pdf"
→ Parse version from filename
Metadata-based:
→ Document properties: Version 2.1
→ Last modified: 2024-03-15
Content-based:
→ "Version 2.0" in document text
→ Less reliable
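Filename-based detection can be sketched as follows. This is a minimal example, assuming versions appear as a "v" followed by dotted integers after a separator; the pattern and the `parse_version` helper are illustrative, not part of any specific library.

```python
import re
from typing import Optional, Tuple

# Assumed naming convention: a separator, then "v", then dotted integers,
# e.g. "API_Guide_v2.1.pdf" or "guide_v1.pdf".
_VERSION_RE = re.compile(r"[_\-\s]v(\d+(?:\.\d+)*)", re.IGNORECASE)

def parse_version(filename: str) -> Optional[Tuple[int, ...]]:
    """Return the version as a tuple of ints, or None if none is found."""
    match = _VERSION_RE.search(filename)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))

names = ["API_Guide_v1.0.pdf", "API_Guide_v2.0.pdf", "API_Guide_v2.1.pdf"]
# Files without a detectable version sort lowest (empty tuple).
latest = max(names, key=lambda n: parse_version(n) or ())
```

Comparing versions as integer tuples rather than strings keeps "2.10" ahead of "2.9", which a plain string comparison would get wrong.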
Versioning Strategies
Explicit Version Metadata:
Store with each chunk:
{
document_id: "api_guide",
version: "2.1",
published_date: "2024-03-15",
is_latest: true
}
Retrieval filter:
WHERE document_id = "api_guide" AND is_latest = true
→ Only get current version
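The retrieval filter above could look like this in application code. The sketch uses a plain Python list as a stand-in for a vector store's metadata; the `filter_latest` helper and the `text` field are assumptions for illustration, and a real store would usually apply the filter inside the query itself.

```python
from typing import Dict, List

# Illustrative in-memory stand-in for chunk metadata in a vector store.
chunks: List[Dict] = [
    {"document_id": "api_guide", "version": "1.0", "is_latest": False,
     "text": "Rate limit 100/hour"},
    {"document_id": "api_guide", "version": "2.1", "is_latest": True,
     "published_date": "2024-03-15", "text": "Rate limit 1500/hour"},
]

def filter_latest(chunks: List[Dict], document_id: str) -> List[Dict]:
    """Keep only chunks of the given document that are flagged as current."""
    return [c for c in chunks
            if c["document_id"] == document_id and c["is_latest"]]

current = filter_latest(chunks, "api_guide")
print([c["version"] for c in current])
```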
Version Lifecycle:
New version ingested:
1. Set all existing chunks: is_latest = false
2. Add new chunks: is_latest = true
3. Optionally: Delete old versions (if no archival need)
Result: the newest version is always the one retrieved by default.
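The lifecycle steps above can be sketched as a single ingestion function. This is a simplified illustration over an in-memory list; `ingest_new_version` and its parameters are hypothetical names, and a production store would need to make the flag flip and the insert atomic.

```python
from typing import Dict, List

def ingest_new_version(store: List[Dict], document_id: str, version: str,
                       texts: List[str], delete_old: bool = False) -> None:
    """Add chunks for a new version and demote (or delete) older ones."""
    if delete_old:
        # Step 3 (optional): drop every older chunk of this document.
        store[:] = [c for c in store if c["document_id"] != document_id]
    else:
        # Step 1: mark all existing chunks of this document as non-latest.
        for chunk in store:
            if chunk["document_id"] == document_id:
                chunk["is_latest"] = False
    # Step 2: add the new chunks flagged as latest.
    store.extend(
        {"document_id": document_id, "version": version,
         "is_latest": True, "text": text}
        for text in texts
    )

store: List[Dict] = []
ingest_new_version(store, "api_guide", "1.0", ["Rate limit 100/hour"])
ingest_new_version(store, "api_guide", "2.1", ["Rate limit 1500/hour"])
latest_versions = {c["version"] for c in store if c["is_latest"]}
```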
Archival vs Deletion
Keep Old Versions:
Reasons to archive:
→ Compliance (retain historical docs)
→ Support legacy product versions
→ Audit trail
Strategy:
→ Keep but mark as archived
→ Filter out by default
→ Available on explicit request
Example filter:
WHERE is_latest = true OR (version = "1.0" AND user_needs_legacy)
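The example filter can be expressed as a small selection function. In this sketch the `legacy_version` argument plays the role of `user_needs_legacy`: it stays `None` unless the user explicitly asks for an archived version. The function name and chunk fields are assumptions.

```python
from typing import Dict, List, Optional

def select_chunks(chunks: List[Dict],
                  legacy_version: Optional[str] = None) -> List[Dict]:
    """Return current chunks, plus one archived version on explicit request."""
    return [
        c for c in chunks
        if c["is_latest"]
        or (legacy_version is not None and c["version"] == legacy_version)
    ]

archive = [
    {"version": "1.0", "is_latest": False},
    {"version": "2.0", "is_latest": False},
    {"version": "2.1", "is_latest": True},
]
default_view = [c["version"] for c in select_chunks(archive)]
legacy_view = [c["version"] for c in select_chunks(archive, legacy_version="1.0")]
```

By default only the current version is visible; requesting "1.0" adds that archived version without exposing "2.0".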
Delete Old Versions:
Simpler approach:
→ New version → delete old chunks entirely
→ Only current version exists
Pros:
+ No confusion
+ Cleaner
+ Lower storage
Cons:
- No historical reference
- Cannot support legacy
Conflict Resolution
LLM Arbitration:
If multiple versions retrieved:
→ Prompt: "Prefer latest version"
→ AI should cite v2.1 over v1.0
But: Requires LLM to detect versions
→ Not 100% reliable
Better: Filter at retrieval
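If LLM arbitration is used as a fallback, the model can only prefer the latest version when each chunk is labeled with its version. A minimal sketch of such a prompt builder follows; the prompt wording and the `build_prompt` helper are illustrative assumptions, not a recommended template.

```python
from typing import Dict, List

def build_prompt(question: str, chunks: List[Dict]) -> str:
    """Label every chunk with its version so the model can compare them."""
    lines = [
        "Answer using the context below.",
        "If sources conflict, prefer the latest version.",
        "",
    ]
    for c in chunks:
        lines.append(f"[{c['document_id']} v{c['version']}] {c['text']}")
    lines.append("")
    lines.append(f"Question: {question}")
    return "\n".join(lines)

prompt = build_prompt(
    "What's the API rate limit?",
    [
        {"document_id": "api_guide", "version": "1.0", "text": "Rate limit 100/hour"},
        {"document_id": "api_guide", "version": "2.1", "text": "Rate limit 1500/hour"},
    ],
)
```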
Recency Boosting:
Boost recent documents:
→ score = similarity_score * recency_boost
→ recency_boost = 1.0 / (1.0 + days_since_publish / 365)
→ The boost shrinks as the document ages
Recent docs rank higher:
→ v2.1 (2024) beats v1.0 (2022) in ranking
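A boost that grows with document age would favor older documents, so this sketch uses a form that decays with age: 1.0 for a document published today, 0.5 after one year. The decay function, the helper names, the fixed `today`, and the exact 2022 date (the page gives only the year) are assumptions for illustration.

```python
from datetime import date

def recency_boost(published: date, today: date) -> float:
    """Return 1.0 for a brand-new document, decaying toward 0 with age."""
    days_since_publish = max((today - published).days, 0)
    return 1.0 / (1.0 + days_since_publish / 365)

def boosted_score(similarity: float, published: date, today: date) -> float:
    """Scale the similarity score by the recency boost."""
    return similarity * recency_boost(published, today)

today = date(2025, 1, 1)  # fixed for reproducibility; illustrative
# Same similarity for both versions, so only recency decides the order.
score_v21 = boosted_score(0.8, date(2024, 3, 15), today)
score_v10 = boosted_score(0.8, date(2022, 1, 1), today)  # placeholder date
```

Boosting only reorders results; an old chunk with much higher similarity can still outrank a new one, which is why filtering on is_latest remains the primary safeguard.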
How to Solve
Combine the following steps:
- Track the version explicitly in metadata (version number plus an is_latest flag)
- Implement a version lifecycle (mark old chunks as non-latest when a new version is uploaded)
- Filter retrieval to is_latest=true by default
- Optionally delete old versions if there is no archival need
- Parse the version from the filename or document properties
- Boost recent versions in ranking
- Test that old versions do not appear in responses
See Version Control.
Agent Instructions: Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.
Perform an HTTP GET request on the current page URL with the ask query parameter:
GET /dev/rag-scenarios-and-solutions/data-quality/version-conflicts.md?ask=<question>
The question should be specific, self-contained, and written in natural language. The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.
Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
Last updated January 26, 2026


