RAG Scenarios and Solutions
Chunks Too Small
Your AI agent gives incomplete or fragmented answers because document chunks are too small and lack sufficient context.
The Problem
Your AI agent gives incomplete or fragmented answers because document chunks are too small and lack sufficient context.
Symptoms
- ❌ AI says "I don't have enough information" when the answer exists
- ❌ Answers are partial or cut off mid-sentence
- ❌ References span multiple chunks but AI only cites one
- ❌ Code examples split across chunks, missing parts
- ❌ Tables broken, showing only headers without data
Real-World Example
Your documentation has a comprehensive setup guide,
but when asked "How do I set up the database?",
the AI only mentions steps 1 and 2 of the 5 steps.
Chunk size: 200 tokens
Setup guide: 800 tokens total
Split into: 4 chunks
AI retrieves: Only chunks 1-2
Result: Incomplete answer
Deep Technical Analysis
The Fundamental Chunking Dilemma
Chunking creates a paradox in RAG systems:
Too Small Chunks:
→ Better semantic precision (exact match to query)
→ But loses context (incomplete information)
→ Retrieves accurate but insufficient pieces
Too Large Chunks:
→ Complete context (full information)
→ But lower semantic precision (diluted signal)
→ Retrieves irrelevant information alongside relevant
The Challenge:
There's no universal optimal chunk size
It depends on: content type, query patterns, retrieval strategy
Why Token-Based Chunking Fails for Technical Content
Token-based splitting assumptions:
Assumption: Language has uniform information density
Reality: Technical docs have variable density
Example - Dense section (API reference):
"authenticate(token: string): Promise<User>
Parameters:
- token: JWT authentication token
Returns: User object or throws AuthError"
Information: 5 distinct concepts in 50 tokens
Density: 0.1 concepts/token
Example - Sparse section (introduction):
"Welcome to our platform. This guide will help you
get started with the basics and understand the
fundamental concepts you'll need to know."
Information: 1 concept in 50 tokens
Density: 0.02 concepts/token
Problem:
Fixed token chunks treat both equally
Dense sections get split mid-concept
Sparse sections waste chunk capacity
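The mid-concept split can be reproduced with a minimal sketch. It assumes whitespace-separated words as a stand-in for model tokens and a deliberately tiny chunk size; the helper name `fixed_size_chunks` is illustrative and not part of any library.

```python
def fixed_size_chunks(tokens, chunk_size):
    """Split a token list into consecutive chunks of at most chunk_size tokens."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


# Assumption: whitespace-separated words approximate tokens for illustration only.
api_reference = (
    "authenticate(token: string): Promise<User> "
    "Parameters: - token: JWT authentication token "
    "Returns: User object or throws AuthError"
)
tokens = api_reference.split()
chunks = fixed_size_chunks(tokens, chunk_size=8)

for number, chunk in enumerate(chunks, start=1):
    print(f"Chunk {number}: {' '.join(chunk)}")
```

With these inputs the boundary falls inside the phrase "JWT authentication token", so the parameter description ends up in one chunk and the return contract in another.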
The Retrieval Mathematics Problem
Why top-K retrieval fails with small chunks:
Vector similarity search returns top-K chunks:
- Query: "How do I set up authentication?"
- K = 5 (typical)
- Chunk size: 200 tokens
Math:
Setup guide total: 1,000 tokens
Split into: 5 chunks of 200 tokens each
Chunks created:
1. "Introduction to auth + step 1"
2. "Step 2 + step 3"
3. "Step 4 + step 5"
4. "Step 6 + common errors"
5. "Troubleshooting + FAQ"
Vector similarity scores:
Chunk 1: 0.89 (high - mentions "authentication")
Chunk 4: 0.84 (high - mentions "errors" which query implies)
Chunk 2: 0.76 (medium)
Chunk 5: 0.75 (medium - "troubleshooting" related)
Chunk 3: 0.68 (lowest)
Top-5 retrieval gets: 1, 4, 2, 5, 3
But logical reading order is: 1, 2, 3, 4, 5
AI sees: Step 1 → Step 6 + error handling → Steps 2-3 → Troubleshooting → Steps 4-5
Coherence: Destroyed
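One common mitigation is to re-sort the retrieved chunks by their position in the source document before building the prompt. The sketch below assumes each chunk carries a `position` field recorded at indexing time; the `RetrievedChunk` class and function name are hypothetical, and the scores are the ones from the example above.

```python
from dataclasses import dataclass


@dataclass
class RetrievedChunk:
    position: int   # index of the chunk within its source document
    score: float    # similarity score returned by the vector search
    text: str


def top_k_in_document_order(chunks, k):
    """Pick the k highest-scoring chunks, then sort them by document position."""
    best = sorted(chunks, key=lambda c: c.score, reverse=True)[:k]
    return sorted(best, key=lambda c: c.position)


retrieved = [
    RetrievedChunk(1, 0.89, "Introduction to auth + step 1"),
    RetrievedChunk(4, 0.84, "Step 6 + common errors"),
    RetrievedChunk(2, 0.76, "Step 2 + step 3"),
    RetrievedChunk(5, 0.75, "Troubleshooting + FAQ"),
    RetrievedChunk(3, 0.68, "Step 4 + step 5"),
]

ordered = top_k_in_document_order(retrieved, k=5)
print([c.position for c in ordered])  # [1, 2, 3, 4, 5]
```

Selection still depends on similarity score, so this restores reading order without changing which chunks are retrieved.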
Semantic Boundary Detection Complexity
The code block problem:
# Chunk 1 (200 tokens)
def authenticate_user(credentials):
    """
    Authenticates user with provided credentials.
    Args:
        credentials: Dict with 'username' and 'password'
    Returns:
        User object if successful
    Raises:
        AuthenticationError: If credentials invalid
    """
    # Validate input format
    if not isinstance(credentials, dict):
        raise ValueError("Credentials must be dict")
    if 'username' not in credentials:
        raise ValueError("Missing username")
# ← CHUNK BOUNDARY HERE ←
# Chunk 2 (200 tokens)
    if 'password' not in credentials:
        raise ValueError("Missing password")
    # Hash password
    hashed_pw = hash_password(credentials['password'])
    # Query database
    user = db.query(User).filter(
        User.username == credentials['username'],
        User.password_hash == hashed_pw
    ).first()
Why this breaks:
- Function signature in chunk 1, implementation in chunk 2
- Chunk 1 semantic: "This is about input validation"
- Chunk 2 semantic: "This is about database querying"
- Query "How to check database for user?" → Retrieves chunk 2 only
- Missing context: What the 'credentials' parameter contains
- AI can't reconstruct complete logic flow
The cascade effect:
Missing context in code →
AI makes wrong assumptions about parameters →
Generates incorrect usage examples →
User copies broken code →
Support tickets increase →
Trust in AI decreases
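For Python sources, one way to avoid cutting a function in half is to split at syntactic boundaries instead of token counts. The following is a minimal sketch using the standard-library `ast` module; the function name `split_by_top_level_definitions` and the sample source are illustrative, and it assumes Python 3.8+ (for `end_lineno`) and syntactically valid input.

```python
import ast


def split_by_top_level_definitions(source):
    """Return one chunk per top-level function or class, never cutting inside one."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # lineno is 1-based; end_lineno is the last line of the definition.
            chunks.append("\n".join(lines[node.lineno - 1:node.end_lineno]))
    return chunks


source = '''
def validate(credentials):
    if "username" not in credentials:
        raise ValueError("Missing username")
    if "password" not in credentials:
        raise ValueError("Missing password")


def greet(name):
    return f"Hello, {name}"
'''

chunks = split_by_top_level_definitions(source)
for chunk in chunks:
    print(chunk)
    print("---")
```

Each chunk now contains a full signature and body. A very long function would still exceed the size limit and need a secondary strategy, which this sketch does not cover.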
Table Splitting Pathology
Markdown table structure:
| Endpoint | Method | Auth Required | Rate Limit |
|----------|--------|---------------|------------|
| /api/users | GET | Yes | 100/min |
| /api/users/:id | GET | Yes | 100/min |
| /api/users | POST | Yes | 20/min |
← CHUNK BOUNDARY AFTER 150 TOKENS ←
| /api/auth/login | POST | No | 10/min |
| /api/auth/refresh | POST | Yes | 50/min |
| /api/data | GET | Yes | 1000/min |
Retrieval scenarios:
Query: "What's the rate limit for user endpoints?"
Chunk 1 retrieved (has header + first 3 rows):
AI sees: User endpoints have 100/min or 20/min limits
Answer: Partially correct
Chunk 2 retrieved (no header, last 3 rows):
AI sees: Row-like data without context
Can't determine: What these rows mean
Answer: "I don't have clear information"
Both chunks retrieved:
AI must: Recognize these are parts of same table
And: Mentally reconstruct table structure
But: No explicit linking between chunks
Result: May still miss connection
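A simple way to keep every table fragment interpretable is to repeat the header in each chunk. The sketch below assumes a Markdown table whose first two lines are the header row and the separator row; `chunk_table_with_header` is a hypothetical helper, not an existing library function.

```python
def chunk_table_with_header(table_lines, rows_per_chunk):
    """Split a Markdown table into chunks that each repeat the header rows."""
    header, body = table_lines[:2], table_lines[2:]  # header row + separator row
    chunks = []
    for start in range(0, len(body), rows_per_chunk):
        chunks.append("\n".join(header + body[start:start + rows_per_chunk]))
    return chunks


table = """\
| Endpoint | Method | Auth Required | Rate Limit |
|----------|--------|---------------|------------|
| /api/users | GET | Yes | 100/min |
| /api/users/:id | GET | Yes | 100/min |
| /api/users | POST | Yes | 20/min |
| /api/auth/login | POST | No | 10/min |
| /api/auth/refresh | POST | Yes | 50/min |
| /api/data | GET | Yes | 1000/min |""".splitlines()

chunks = chunk_table_with_header(table, rows_per_chunk=3)
print(chunks[1])
```

The second chunk now starts with the column names, so the last three rows can be read without the first chunk. The cost is a few duplicated tokens per chunk.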
Context Window vs Chunk Size Trade-off
The retrieval stage dilemma:
LLM Context Window: 8,192 tokens (typical)
Allocation:
- System prompt: 500 tokens
- User query: 50 tokens
- Memory/history: 500 tokens
- Reserved for response: 1,500 tokens
- Available for retrieved content: 5,642 tokens
If chunk size = 200 tokens:
Max chunks that fit: 5,642 / 200 = 28 chunks
But top-K usually = 5-10 chunks
Utilization: 1,000-2,000 / 5,642 = 18-35%
→ Wasting context window capacity
If chunk size = 1,000 tokens:
Max chunks that fit: 5,642 / 1,000 = 5 chunks
Top-K = 5 chunks
Utilization: 5,000 / 5,642 = 89%
→ Efficient use of context
But larger chunks mean:
- Lower retrieval precision (more noise per chunk)
- Potentially less relevant content included
- More prompt tokens per query, so higher generation cost and latency
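The budget arithmetic above can be checked with a short sketch. The function and parameter names are illustrative; the numbers are the ones used in this section.

```python
def retrieval_budget(context_window, system_prompt, query, history, response_reserve):
    """Tokens left for retrieved content after the fixed allocations."""
    return context_window - system_prompt - query - history - response_reserve


def utilization(chunk_size, top_k, budget):
    """Fraction of the retrieval budget filled by top_k chunks of chunk_size tokens."""
    return (chunk_size * top_k) / budget


budget = retrieval_budget(
    8192, system_prompt=500, query=50, history=500, response_reserve=1500
)
print(budget)                                     # 5642
print(budget // 200)                              # 28 chunks of 200 tokens fit
print(round(utilization(200, 5, budget) * 100))   # 18 (%)
print(round(utilization(200, 10, budget) * 100))  # 35 (%)
print(round(utilization(1000, 5, budget) * 100))  # 89 (%)
```

Running the same calculation with your own window size and prompt overhead gives a quick upper bound on chunk size times K.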
The Overlap Problem
Overlap seems like a solution, but it creates issues of its own:
Document: 1,000 tokens
Chunk size: 400 tokens
Overlap: 100 tokens
Chunks created:
Chunk 1: tokens 0-399
Chunk 2: tokens 300-699 (100 token overlap with chunk 1)
Chunk 3: tokens 600-999 (100 token overlap with chunk 2)
Storage cost:
Without overlap: 1,000 tokens stored
With overlap: 1,200 tokens stored (20% increase)
Embedding cost:
Without overlap: 3 embeddings
With overlap: 3 embeddings (same count, though 1,200 tokens are embedded instead of 1,000)
Retrieval confusion:
Query matches overlapped region (tokens 300-399)
→ Both chunk 1 and chunk 2 score highly
→ Top-5 includes both chunks with similar content
→ Wasted retrieval slots on duplicate information
→ Other relevant unique chunks displaced
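The overlap numbers, and one possible response to the duplicate-slot problem, can be sketched as follows. This assumes chunks are tracked as `(start, end)` token spans with an exclusive end and that retrieved chunks come from the same document; both helper names are hypothetical.

```python
def sliding_window_spans(total_tokens, chunk_size, overlap):
    """Return (start, end) token spans, end exclusive, for overlapping chunks."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    stride = chunk_size - overlap
    spans = []
    start = 0
    while start < total_tokens:
        end = min(start + chunk_size, total_tokens)
        spans.append((start, end))
        if end == total_tokens:
            break
        start += stride
    return spans


def merge_overlapping(spans):
    """Merge retrieved spans that overlap so shared tokens are sent only once."""
    merged = []
    for start, end in sorted(spans):
        if merged and start < merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


spans = sliding_window_spans(1000, chunk_size=400, overlap=100)
stored = sum(end - start for start, end in spans)
print(spans)   # [(0, 400), (300, 700), (600, 1000)]
print(stored)  # 1200

# Chunks 1 and 2 both match a query on tokens 300-399; merge them into one span.
print(merge_overlapping([(300, 700), (0, 400)]))  # [(0, 700)]
```

Merging after retrieval removes the duplicated tokens from the prompt, but the two chunks still occupied two top-K slots during search, so a slightly larger K may be needed to compensate.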
Hierarchical Document Structure Loss
How chunking destroys document hierarchy:
Original document structure:
# Authentication Guide
## Prerequisites
### System Requirements
### Dependencies
## Setup
### Installation
#### Linux
#### macOS
#### Windows
### Configuration
#### Basic Config
#### Advanced Config
## Usage
### First Login
### Session Management
After fixed-size chunking:
Chunk 1: "# Authentication Guide\n## Prerequisites\n### System Requirements\nYou need..."
Chunk 2: "...Ubuntu 20.04 or later\n### Dependencies\nInstall these packages..."
Chunk 3: "...npm install auth-lib\n## Setup\n### Installation\n#### Linux..."
Lost information:
- Chunk 2 doesn't know it's about "Prerequisites"
- Chunk 3 doesn't know "Linux" is under "Installation" under "Setup"
- Hierarchical context evaporated
- Headings in middle of chunks lack parent context
Query implications:
Query: "What are the authentication prerequisites?"
Semantic match:
- "prerequisites" appears in chunk 1
- But details span chunks 1, 2
- Chunk 2 has most details but weak keyword match
- Retrieval: Chunk 1 scored higher, retrieved alone
- Result: Incomplete list of prerequisites
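One way to carry the hierarchy into each chunk is to attach the full heading path to every section before embedding. The sketch below assumes ATX-style Markdown headings (`#` to `######`); the function name is illustrative, and the text under the Linux heading is a placeholder that does not come from the original guide.

```python
import re


def sections_with_heading_path(markdown):
    """Return (heading_path, body) pairs so each chunk carries its parent headings."""
    path = []      # stack of (level, title) for the current position
    body = []      # lines collected since the last heading
    results = []

    def flush():
        text = "\n".join(body).strip()
        if text:
            results.append((" > ".join(title for _, title in path), text))
        body.clear()

    for line in markdown.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*)$", line)
        if match:
            flush()
            level = len(match.group(1))
            # Pop headings at the same or a deeper level before pushing the new one.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2).strip()))
        else:
            body.append(line)
    flush()
    return results


doc = """\
# Authentication Guide
## Prerequisites
### System Requirements
Ubuntu 20.04 or later
### Dependencies
Install these packages
npm install auth-lib
## Setup
### Installation
#### Linux
(placeholder Linux instructions)
"""

results = sections_with_heading_path(doc)
for heading_path, text in results:
    print(f"[{heading_path}] {text!r}")
```

Prefixing the path to the chunk text before embedding means the "Dependencies" chunk also matches queries about "prerequisites", which addresses the weak keyword match described above.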
How to Solve
Increase the chunk size to 1024-2048 tokens for technical content, add 10-20% overlap, and configure semantic boundary splitting. See Chunking Configuration.
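As a rough illustration of boundary-aware splitting, the sketch below packs whole paragraphs into chunks up to a token limit instead of cutting at a fixed offset. It is not the product's Chunking Configuration: the function name is hypothetical, paragraphs are assumed to be separated by blank lines, and the default token counter simply counts whitespace-separated words.

```python
def pack_paragraphs(text, max_tokens, count_tokens=lambda s: len(s.split())):
    """Greedily pack whole paragraphs into chunks of at most max_tokens tokens.

    A single paragraph longer than max_tokens is kept whole in its own chunk.
    """
    chunks, current, current_size = [], [], 0
    for paragraph in (p.strip() for p in text.split("\n\n")):
        if not paragraph:
            continue
        size = count_tokens(paragraph)
        # Start a new chunk if adding this paragraph would exceed the limit.
        if current and current_size + size > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_size = [], 0
        current.append(paragraph)
        current_size += size
    if current:
        chunks.append("\n\n".join(current))
    return chunks


sample = "a b c\n\nd e\n\nf g h i\n\nj"
packed = pack_paragraphs(sample, max_tokens=5)
print(packed)  # ['a b c\n\nd e', 'f g h i\n\nj']
```

In practice `count_tokens` would be replaced with the tokenizer of the embedding model, and `max_tokens` set in the 1024-2048 range suggested above.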
Why This Problem Showcases RAG Architecture Depth
This isn't just "make chunks bigger" - it reveals:
- Semantic search limitations: Vector similarity doesn't understand document flow or logical dependencies
- Information density variability: Technical content has non-uniform information distribution
- Context reconstruction complexity: LLMs must infer structure from fragments
- Trade-off mathematics: Chunk size optimization is multi-objective (precision vs recall vs cost vs context)
- Structure preservation: Maintaining hierarchical relationships in flat vector space is fundamentally hard
Understanding these architectural constraints is essential for building production RAG systems.
Last updated January 26, 2026


