RAG Scenarios and Solutions

Document Version Conflicts

Multiple versions of the same document coexist in the knowledge base, causing AI to cite outdated or conflicting information.

The Problem

When a new version of a document is ingested without retiring the old one, chunks from every version remain retrievable, and the AI cites outdated or conflicting information.

Symptoms

  • ❌ Old and new versions both retrieved
  • ❌ Conflicting information in responses
  • ❌ "v1" and "v2" docs both present
  • ❌ Cannot determine which is current
  • ❌ Stale info mixed with current

Real-World Example

Knowledge base contains:
→ "API_Guide_v1.0.pdf" (2022): Rate limit 100/hour
→ "API_Guide_v2.0.pdf" (2024): Rate limit 1000/hour
→ "API_Guide_v2.1.pdf" (2024): Rate limit 1500/hour

Query: "What's the API rate limit?"

Retrieved chunks from all three versions:
→ AI response: "Rate limit ranges from 100 to 1500 per hour
depending on version."

The answer is confusing: it never says which limit is current.
The user wants the latest value only (1500/hour, from v2.1).

Deep Technical Analysis

Version Tracking Challenges

No Version Metadata:

Common problem:
→ Documents ingested without version tracking
→ Metadata: {document_id: "api_guide"}
→ No version field

New version ingested:
→ Same document_id
→ Both coexist
→ Cannot distinguish

Version Detection:

Filename-based:
→ "guide_v1.pdf", "guide_v2.pdf"
→ Parse version from filename

Metadata-based:
→ Document properties: Version 2.1
→ Last modified: 2024-03-15

Content-based:
→ "Version 2.0" in document text
→ Less reliable: the text may also mention other versions (changelogs, migration notes)
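
Filename-based detection can be sketched as follows. This is a minimal example, not a definitive parser: the regular expression and the helper names parse_version and latest are assumptions, and the pattern only covers names such as "guide_v2.pdf" or "API_Guide_v2.1.pdf". Versions are compared as integer tuples so that 10 sorts above 9.

```python
import re
from typing import List, Optional, Tuple

# Assumed naming convention: a separator, then "v", then dot-separated integers.
_VERSION_RE = re.compile(r"[_\-\s]v(\d+(?:\.\d+)*)", re.IGNORECASE)


def parse_version(filename: str) -> Optional[Tuple[int, ...]]:
    """Return the version as a tuple of ints, or None if no version is found."""
    match = _VERSION_RE.search(filename)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def latest(filenames: List[str]) -> Optional[str]:
    """Pick the filename with the highest parsed version; unversioned names are skipped."""
    versioned = [(parse_version(name), name) for name in filenames]
    versioned = [(v, name) for v, name in versioned if v is not None]
    return max(versioned)[1] if versioned else None


files = ["API_Guide_v1.0.pdf", "API_Guide_v2.0.pdf", "API_Guide_v2.1.pdf"]
print(latest(files))
```

Files that do not follow the naming convention return None, so they need a fallback such as document properties.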

Versioning Strategies

Explicit Version Metadata:

Store with each chunk:
{
  document_id: "api_guide",
  version: "2.1",
  published_date: "2024-03-15",
  is_latest: true
}

Retrieval filter:
WHERE document_id = "api_guide" AND is_latest = true
→ Only get current version
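
A minimal in-memory sketch of the metadata and filter above, assuming chunks are held in a plain Python list. The Chunk dataclass and filter_current are hypothetical names; a real vector store would apply the same condition through its own metadata filter syntax.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    document_id: str
    version: str
    published_date: str  # ISO 8601 date string
    is_latest: bool
    text: str


def filter_current(chunks: List[Chunk], document_id: str) -> List[Chunk]:
    """In-memory equivalent of: WHERE document_id = ... AND is_latest = true."""
    return [c for c in chunks if c.document_id == document_id and c.is_latest]


chunks = [
    # The v1.0 date is assumed; the example in this page gives only the year 2022.
    Chunk("api_guide", "1.0", "2022-01-01", False, "Rate limit 100/hour"),
    Chunk("api_guide", "2.1", "2024-03-15", True, "Rate limit 1500/hour"),
]
current = filter_current(chunks, "api_guide")
print([c.version for c in current])
```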

Version Lifecycle:

New version ingested:
1. Set all existing chunks: is_latest = false
2. Add new chunks: is_latest = true
3. Optionally: Delete old versions (if no archival need)

Result: is_latest always points at the newest version, with no manual cleanup
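
The lifecycle steps above can be sketched as a single ingest function. This is an illustration under assumptions: the store is a plain list of dicts, and ingest_version and keep_history are hypothetical names. In a real database, the demote and insert steps would normally run in one transaction so that readers never see two current versions or none.

```python
from typing import Dict, List

Chunk = Dict[str, object]


def ingest_version(store: List[Chunk], document_id: str, version: str,
                   texts: List[str], keep_history: bool = True) -> None:
    """Add a new version of a document and retire the previous one."""
    if keep_history:
        # Step 1: demote every existing chunk of this document.
        for chunk in store:
            if chunk["document_id"] == document_id:
                chunk["is_latest"] = False
    else:
        # Step 3 (optional): drop old chunks entirely when no archive is needed.
        store[:] = [c for c in store if c["document_id"] != document_id]
    # Step 2: add the new chunks, flagged as current.
    for text in texts:
        store.append({"document_id": document_id, "version": version,
                      "is_latest": True, "text": text})


store: List[Chunk] = []
ingest_version(store, "api_guide", "1.0", ["Rate limit 100/hour"])
ingest_version(store, "api_guide", "2.1", ["Rate limit 1500/hour"])
print([c["version"] for c in store if c["is_latest"]])
```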

Archival vs Deletion

Keep Old Versions:

Reasons to archive:
→ Compliance (retain historical docs)
→ Support legacy product versions
→ Audit trail

Strategy:
→ Keep but mark as archived
→ Filter out by default
→ Available on explicit request

Example filter:
WHERE is_latest = true OR (version = "1.0" AND user_needs_legacy)
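
One way to express this default-plus-opt-in filter in code, assuming chunks are dicts with is_latest and version keys. The function name select_chunks and the legacy_version parameter are hypothetical; passing None corresponds to user_needs_legacy being false.

```python
from typing import Dict, List, Optional


def select_chunks(chunks: List[Dict[str, object]],
                  legacy_version: Optional[str] = None) -> List[Dict[str, object]]:
    """Return current chunks by default; add one archived version only on explicit request."""
    return [
        c for c in chunks
        if c["is_latest"]
        or (legacy_version is not None and c["version"] == legacy_version)
    ]


archive = [
    {"version": "1.0", "is_latest": False, "text": "Rate limit 100/hour"},
    {"version": "2.0", "is_latest": False, "text": "Rate limit 1000/hour"},
    {"version": "2.1", "is_latest": True, "text": "Rate limit 1500/hour"},
]
print([c["version"] for c in select_chunks(archive)])
print([c["version"] for c in select_chunks(archive, legacy_version="1.0")])
```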

Delete Old Versions:

Simpler approach:
→ New version → delete old chunks entirely
→ Only current version exists

Pros:
+ No confusion
+ Cleaner
+ Lower storage

Cons:
- No historical reference
- Cannot support legacy

Conflict Resolution

LLM Arbitration:

If multiple versions retrieved:
→ Prompt: "Prefer latest version"
→ AI should cite v2.1 over v1.0

But: Requires LLM to detect versions
→ Not 100% reliable

Better: Filter at retrieval
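
If arbitration by the LLM is used as a fallback, labeling each chunk with its version in the prompt removes the need for the model to detect versions on its own. The sketch below is one possible prompt layout, not a tested template; build_prompt and the exact instruction wording are assumptions.

```python
from typing import Dict, List


def build_prompt(question: str, chunks: List[Dict[str, str]]) -> str:
    """Label each chunk with its version and date, newest first."""
    # ISO 8601 date strings sort chronologically, so a string sort is enough.
    ordered = sorted(chunks, key=lambda c: c["published_date"], reverse=True)
    context = "\n\n".join(
        f"[version {c['version']}, published {c['published_date']}]\n{c['text']}"
        for c in ordered
    )
    return (
        "Answer using the context below. If sources disagree, prefer the one "
        "with the latest version and state which version you used.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


retrieved = [
    # The v1.0 date is assumed; the example in this page gives only the year 2022.
    {"version": "1.0", "published_date": "2022-01-01", "text": "Rate limit 100/hour"},
    {"version": "2.1", "published_date": "2024-03-15", "text": "Rate limit 1500/hour"},
]
prompt = build_prompt("What's the API rate limit?", retrieved)
print(prompt)
```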

Recency Boosting:

Boost recent documents:
→ score = similarity_score * recency_boost
→ recency_boost = 1.0 / (1.0 + days_since_publish / 365)
→ Boost is 1.0 on the publish date and decays as the document ages

Recent docs rank higher:
→ v2.1 (2024) beats v1.0 (2022) in ranking
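
A small sketch of recency boosting, assuming a boost that decays with age so that newer documents keep more of their similarity score. The decay form 1 / (1 + age / 365), the function names, the similarity values, and the example dates are all assumptions; any function that decreases with age would serve the same purpose.

```python
from datetime import date


def recency_boost(published: date, today: date, scale_days: float = 365.0) -> float:
    """Decaying boost in (0, 1]: 1.0 for a document published today, 0.5 at scale_days."""
    age_days = max((today - published).days, 0)
    return 1.0 / (1.0 + age_days / scale_days)


def boosted_score(similarity: float, published: date, today: date) -> float:
    """Combine semantic similarity with the recency boost."""
    return similarity * recency_boost(published, today)


today = date(2024, 6, 1)  # assumed "now" for the example
old = boosted_score(0.82, date(2022, 1, 1), today)   # v1.0; exact date assumed
new = boosted_score(0.80, date(2024, 3, 15), today)  # v2.1
print(new > old)
```

Here the older chunk has the slightly higher raw similarity, yet the newer one ranks first after boosting. Boosting only reorders results, so it complements the is_latest filter rather than replacing it.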

How to Solve

Combine the following:

  • Track version explicitly in metadata (version number + is_latest flag)
  • Implement a version lifecycle (mark old chunks as non-latest on new upload)
  • Filter retrieval to is_latest=true by default
  • Optionally delete old versions if there is no archival need
  • Parse version from the filename or document properties
  • Boost recent versions in ranking
  • Test that old versions don't appear in responses

See Version Control.
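
The testing step can start with an invariant check on the knowledge base itself: every document should have exactly one version flagged as latest. The sketch below assumes chunks are dicts with document_id, version, and is_latest keys; find_version_conflicts is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List


def find_version_conflicts(chunks: List[Dict[str, object]]) -> List[str]:
    """Return document_ids with zero or more than one version flagged is_latest."""
    latest_versions = defaultdict(set)
    seen = set()
    for c in chunks:
        seen.add(c["document_id"])
        if c["is_latest"]:
            latest_versions[c["document_id"]].add(c["version"])
    return sorted(d for d in seen if len(latest_versions[d]) != 1)


kb = [
    {"document_id": "api_guide", "version": "2.0", "is_latest": True},
    {"document_id": "api_guide", "version": "2.1", "is_latest": True},   # conflict
    {"document_id": "faq", "version": "1.0", "is_latest": False},
    {"document_id": "faq", "version": "1.1", "is_latest": True},
]
print(find_version_conflicts(kb))
```

Running a check like this after each ingest catches a missed demotion before it reaches responses.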


Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the ask query parameter:

GET /dev/rag-scenarios-and-solutions/data-quality/version-conflicts.md?ask=<question>

The question should be specific, self-contained, and written in natural language. The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.

Last updated January 26, 2026