The Essential RAG Book
Chunking Strategies
1. What Is Chunking
Chunking is the process of decomposing raw source data into retrievable, semantically coherent units that can be embedded, indexed, and supplied to an LLM as grounding context. In a RAG system, chunking defines:
- the unit of retrieval
- the scope of semantic grounding
- the upper bound on answer correctness
A chunk must be:
- small enough to retrieve precisely
- large enough to be semantically complete
- structured enough to be interpretable in isolation
Formally: A chunk is the minimal addressable unit of knowledge that preserves semantic coherence under retrieval and model constraints. Chunking is therefore an information architecture problem, not a text-splitting problem.
2. Why You Need to Chunk
2.1 Context window constraints
LLMs have bounded context windows, so they cannot ingest an entire corpus (and often not even a single long document) in one prompt. Chunking makes large corpora retrievable within those limits.
2.2 Retrieval precision
Vector search operates at the chunk level. Poor chunking causes:
- topic dilution
- partial answers
- retrieval of irrelevant context
2.3 Cost and latency
Smaller, well-scoped chunks:
- reduce token usage
- improve reranking efficiency
- lower inference cost
2.4 Answer grounding and trust
High-quality chunks are self-contained and explicitly scoped, which reduces hallucinations and improves citation accuracy. In practice, chunking quality is often a stronger predictor of RAG accuracy than the choice of embedding model.
3. Chunking Strategies
Each strategy below includes:
- Name
- Description
- Method (Step-by-step)
- When to Use
3.1 Fixed-Length Token Chunking
Description
Splits text into fixed-size token windows, optionally with overlap.
Method
- Step 1: Tokenize the document using the target model's tokenizer
- Step 2: Split tokens into chunks of size N
- Step 3: Optionally add M-token overlap between adjacent chunks
- Step 4: Embed and index each chunk independently
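The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: whitespace splitting stands in for the target model's tokenizer, and the names `fixed_length_chunks`, `size` (N), and `overlap` (M) are hypothetical. Step 4, embedding and indexing, is left out.

```python
from typing import List


def fixed_length_chunks(tokens: List[str], size: int, overlap: int = 0) -> List[List[str]]:
    """Split a token list into windows of `size` tokens; adjacent windows share `overlap` tokens."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


# Whitespace split stands in for the target model's tokenizer (assumption).
tokens = "a b c d e f g h i j".split()
for chunk in fixed_length_chunks(tokens, size=4, overlap=1):
    print(" ".join(chunk))
```

Each window advances by N - M tokens, so the last M tokens of one chunk reappear at the start of the next.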
When to Use
- As a baseline
- For quick prototypes
- When ingestion cost must be minimal
3.2 Sentence-Boundary Chunking
Description
Chunks are formed by grouping complete sentences up to a size threshold.
Method
- Step 1: Perform sentence segmentation
- Step 2: Accumulate sentences until the token limit is reached
- Step 3: Start a new chunk at the next sentence boundary
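A minimal sketch of greedy sentence packing, assuming a naive regex sentence splitter and whitespace token counts (both stand-ins for production components; `sentence_chunks` and `max_tokens` are hypothetical names):

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    # Naive segmentation (assumption): split after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def count_tokens(text: str) -> int:
    return len(text.split())  # whitespace stand-in for the model tokenizer


def sentence_chunks(text: str, max_tokens: int) -> List[str]:
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for sentence in split_sentences(text):
        n = count_tokens(sentence)
        # Close the current chunk if adding this sentence would exceed the limit.
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks


text = "RAG retrieves chunks. Chunks are embedded. Embeddings are indexed. The LLM answers."
for c in sentence_chunks(text, max_tokens=6):
    print(c)
```

A single sentence longer than `max_tokens` still becomes its own, oversized chunk, because this strategy never cuts inside a sentence.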
When to Use
- Short factual documents
- QA over well-written prose
- When grammatical integrity matters
3.3 Paragraph-Boundary Chunking
Description
Uses paragraph breaks as primary chunk boundaries.
Method
- Step 1: Detect paragraph separators
- Step 2: Group adjacent paragraphs until the size threshold is reached
- Step 3: Split any paragraph that alone exceeds the hard size limit
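The sketch below assumes blank lines separate paragraphs and counts whitespace-delimited words as tokens. It applies the hard limit (Step 3) before grouping (Step 2) so an oversized paragraph cannot swallow its neighbors; `paragraph_chunks` is a hypothetical name.

```python
import re
from typing import List


def count_tokens(text: str) -> int:
    return len(text.split())  # whitespace stand-in for the model tokenizer


def paragraph_chunks(text: str, max_tokens: int) -> List[str]:
    # Step 1: blank lines separate paragraphs (assumption about the input format).
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Step 3 (applied first): hard-split any paragraph that alone exceeds the limit.
    pieces: List[str] = []
    for p in paragraphs:
        words = p.split()
        if len(words) > max_tokens:
            pieces.extend(" ".join(words[i:i + max_tokens]) for i in range(0, len(words), max_tokens))
        else:
            pieces.append(p)
    # Step 2: greedily group adjacent pieces up to the threshold.
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for piece in pieces:
        n = count_tokens(piece)
        if current and current_len + n > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
        current.append(piece)
        current_len += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks


doc = "One two three.\n\nFour five.\n\nsix seven eight nine ten eleven"
print(paragraph_chunks(doc, max_tokens=5))
```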
When to Use
- Narrative documentation
- Blog-style internal docs
- Lightly structured content
3.4 Heading-Boundary Chunking
Description
Aligns chunks with document structure such as sections and subsections.
Method
- Step 1: Parse document structure (Markdown, HTML, DOCX)
- Step 2: Treat each heading scope as a chunk candidate
- Step 3: Subdivide only if size exceeds limits
- Step 4: Preserve heading hierarchy as metadata
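For Markdown input, heading scopes can be tracked with a stack of heading titles. This sketch assumes ATX-style `#` headings only (it would misread `#` lines inside fenced code) and uses word counts as the size limit; `heading_chunks` and the `heading_path` metadata key are hypothetical.

```python
import re
from typing import Dict, List

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def heading_chunks(markdown: str, max_tokens: int) -> List[Dict]:
    chunks: List[Dict] = []
    path: List[str] = []  # current heading hierarchy, e.g. ["Install", "Linux"]
    body: List[str] = []

    def flush() -> None:
        words = " ".join(body).split()
        # Step 3: subdivide a section only if it exceeds the size limit.
        for i in range(0, len(words), max_tokens):
            chunks.append({
                "heading_path": list(path),  # Step 4: hierarchy kept as metadata
                "text": " ".join(words[i:i + max_tokens]),
            })
        body.clear()

    for line in markdown.splitlines():
        m = HEADING.match(line)
        if m:
            flush()  # Step 2: each heading scope is one chunk candidate
            level = len(m.group(1))
            del path[level - 1:]  # drop headings at the same or a deeper level
            path.append(m.group(2).strip())
        else:
            body.append(line)
    flush()
    return chunks


md = "# Install\nRun the installer.\n## Linux\nUse apt to install.\n# Usage\nCall the API."
for chunk in heading_chunks(md, max_tokens=50):
    print(chunk["heading_path"], chunk["text"])
```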
When to Use
- Technical documentation
- Knowledge bases
- API and product docs
3.5 Semantic-Boundary Chunking
Description
Splits text at detected topic shifts using embedding similarity.
Method
- Step 1: Generate an embedding for each sentence, in document order
- Step 2: Compute similarity between adjacent sentences
- Step 3: Insert boundaries where similarity drops below a threshold
- Step 4: Enforce min/max size constraints
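The sketch below shows the boundary logic only. A real system would call an embedding model in Step 1; here a bag-of-words `Counter` with cosine similarity stands in for it (an assumption made so the example runs without dependencies), sizes are measured in sentences rather than tokens, and `semantic_chunks` is a hypothetical name.

```python
import math
from collections import Counter
from typing import List


def embed(sentence: str) -> Counter:
    # Toy stand-in for a real embedding model (assumption): bag-of-words counts.
    return Counter(w.strip(".,!?").lower() for w in sentence.split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def semantic_chunks(sentences: List[str], threshold: float,
                    min_sentences: int = 1, max_sentences: int = 8) -> List[List[str]]:
    if not sentences:
        return []
    vectors = [embed(s) for s in sentences]  # Step 1
    chunks: List[List[str]] = []
    current = [sentences[0]]
    for i in range(1, len(sentences)):
        sim = cosine(vectors[i - 1], vectors[i])  # Step 2
        topic_shift = sim < threshold  # Step 3
        # Step 4: cut on a topic shift only if the chunk is big enough; always cut when full.
        if (topic_shift and len(current) >= min_sentences) or len(current) >= max_sentences:
            chunks.append(current)
            current = []
        current.append(sentences[i])
    chunks.append(current)  # the final chunk may be smaller than min_sentences
    return chunks


s = ["Cats purr when cats are happy.", "Happy cats often purr loudly.",
     "Stock prices fell sharply today.", "Investors sold stock as prices fell."]
print(semantic_chunks(s, threshold=0.3))
```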
When to Use
- Long, dense explanations
- Mixed-topic documents
- When structure is weak or absent
3.6 Sentence-Window Context Chunking
Description
Indexes individual sentences and retrieves surrounding context dynamically.
Method
- Step 1: Index each sentence as a retrieval unit
- Step 2: On retrieval, expand each matched sentence to include its ±K neighboring sentences
- Step 3: Merge overlapping windows and stitch them together before passing them to the LLM
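Steps 2 and 3 can be sketched as interval merging over sentence indices. The sketch assumes the hit indices come from a vector search over the per-sentence index built in Step 1 (not shown); `expand_windows` and `stitch` are hypothetical names.

```python
from typing import List


def expand_windows(hit_indices: List[int], num_sentences: int, k: int) -> List[range]:
    """Expand each hit to [i-k, i+k], clamp to the document, and merge windows that
    overlap or touch so no sentence is sent to the LLM twice."""
    spans = sorted((max(0, i - k), min(num_sentences - 1, i + k)) for i in hit_indices)
    merged: List[List[int]] = []
    for start, end in spans:
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [range(s, e + 1) for s, e in merged]


def stitch(sentences: List[str], hit_indices: List[int], k: int) -> List[str]:
    return [" ".join(sentences[i] for i in window)
            for window in expand_windows(hit_indices, len(sentences), k)]


sentences = [f"S{i}." for i in range(10)]
print(stitch(sentences, hit_indices=[2, 4, 9], k=1))
```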
When to Use
- High-recall QA systems
- Troubleshooting and debugging
- Fine-grained factual queries
3.7 Parent-Child Hierarchical Chunking
Description
Maintains multiple granularities of the same content.
Method
- Step 1: Create fine-grained child chunks (e.g., 300-500 tokens)
- Step 2: Create coarse parent chunks (e.g., 1,000-1,500 tokens)
- Step 3: Embed both
- Step 4: Retrieve children, supply parents for grounding
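A minimal sketch of the parent-child mapping, assuming whitespace tokens and that child hits come from a vector search over child embeddings (not shown). Children are cut inside each parent so every child maps to exactly one parent; the sizes in the example are tiny for readability, not the ranges suggested above, and `build_hierarchy` and `parents_for` are hypothetical names.

```python
from typing import Dict, List, Tuple


def build_hierarchy(tokens: List[str], parent_size: int, child_size: int
                    ) -> Tuple[Dict[int, str], Dict[int, Tuple[int, str]]]:
    """Return parents {parent_id: text} and children {child_id: (parent_id, text)}."""
    parents: Dict[int, str] = {}
    children: Dict[int, Tuple[int, str]] = {}
    for p_id, p_start in enumerate(range(0, len(tokens), parent_size)):
        p_tokens = tokens[p_start:p_start + parent_size]
        parents[p_id] = " ".join(p_tokens)
        for c_start in range(0, len(p_tokens), child_size):
            child_id = len(children)
            children[child_id] = (p_id, " ".join(p_tokens[c_start:c_start + child_size]))
    return parents, children


def parents_for(child_hits: List[int], children: Dict[int, Tuple[int, str]],
                parents: Dict[int, str]) -> List[str]:
    """Map retrieved child ids to their parents, keeping first-hit order and dropping duplicates."""
    seen, result = set(), []
    for c_id in child_hits:
        p_id = children[c_id][0]
        if p_id not in seen:
            seen.add(p_id)
            result.append(parents[p_id])
    return result


tokens = [f"t{i}" for i in range(12)]
parents, children = build_hierarchy(tokens, parent_size=6, child_size=3)
print(parents_for([3, 2, 0], children, parents))
```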
When to Use
- Enterprise RAG systems
- Long-form documents
- High-accuracy requirements
3.8 Contextual-Header Augmented Chunking
Description
Injects structural metadata directly into chunk content.
Method
- Step 1: Compute full heading path
- Step 2: Prepend header context to chunk text
- Step 3: Embed augmented chunk
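Header augmentation is mostly string assembly. This sketch assumes the document title is the root of the heading path and uses ` > ` as a separator; both are illustrative choices, `augment_with_headers` is a hypothetical name, and the returned string is what would be passed to the embedding model in Step 3.

```python
from typing import List


def augment_with_headers(doc_title: str, heading_path: List[str], chunk_text: str) -> str:
    """Prepend the full heading path (Steps 1-2) to the chunk text."""
    # Step 1: full heading path, e.g. "Product X v2 > Installation > Linux".
    header = " > ".join([doc_title, *heading_path])
    # Step 2: prepend it so the embedding also encodes where the text lives.
    return f"{header}\n\n{chunk_text}"


# Identical body text from two versions of a (hypothetical) product.
v1 = augment_with_headers("Product X v1", ["Installation", "Linux"], "Run the installer.")
v2 = augment_with_headers("Product X v2", ["Installation", "Linux"], "Run the installer.")
print(v2)
```

Two chunks with identical body text from different product versions now produce different embedding inputs, which is what lets retrieval tell them apart.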
When to Use
- Similar documents with overlapping terminology
- Multi-product or multi-version corpora
3.9 Pre/Post Context-Buffered Chunking
Description
Adds summarized neighbor context to smooth over abrupt chunk boundaries.
Method
- Step 1: Identify preceding and following chunks
- Step 2: Generate concise summaries (≤50 tokens each)
- Step 3: Attach summaries to the core chunk
- Step 4: Embed the composed chunk
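The composition step can be sketched as follows. Generating the summaries normally requires an LLM; `truncate_summary`, which keeps the first words of the neighboring chunk, is a hypothetical stand-in so the example runs offline, and the `[Previous]`/`[Next]` labels and `buffered_chunks` name are illustrative.

```python
from typing import Callable, List


def truncate_summary(text: str, max_tokens: int = 50) -> str:
    # Stand-in for an LLM summarizer (assumption): keep the first max_tokens words.
    return " ".join(text.split()[:max_tokens])


def buffered_chunks(chunks: List[str],
                    summarize: Callable[[str, int], str] = truncate_summary,
                    max_summary_tokens: int = 50) -> List[str]:
    composed = []
    for i, core in enumerate(chunks):
        parts = []
        if i > 0:  # Step 1: preceding neighbor exists
            parts.append("[Previous] " + summarize(chunks[i - 1], max_summary_tokens))
        parts.append(core)
        if i + 1 < len(chunks):  # Step 1: following neighbor exists
            parts.append("[Next] " + summarize(chunks[i + 1], max_summary_tokens))
        composed.append("\n".join(parts))  # Step 3; Step 4 would embed this text
    return composed


chunks = ["Alpha one two three.", "Beta four five six.", "Gamma seven eight nine."]
for c in buffered_chunks(chunks, max_summary_tokens=2):
    print(c, end="\n---\n")
```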
When to Use
- Explanatory or tutorial content
- Documents with cross-references
- Policy and compliance docs
3.10 Question-Derived Chunking
Description
Chunks are indexed via generated questions rather than raw text.
Method
- Step 1: Generate likely user questions from text
- Step 2: Associate each question with its supporting span
- Step 3: Use questions as primary retrieval keys
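A minimal sketch of a question-keyed index. The questions are hand-written here in place of LLM generation (Step 1), and word-overlap (Jaccard) similarity stands in for vector search; `QuestionEntry` and `retrieve` are hypothetical names. Several questions may point at the same span, so results are deduplicated by span.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class QuestionEntry:
    question: str  # retrieval key (Step 3)
    span: str      # supporting text handed to the LLM (Step 2)


def words(text: str) -> Set[str]:
    return {w.strip("?.,!").lower() for w in text.split()}


def jaccard(a: Set[str], b: Set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0


def retrieve(query: str, index: List[QuestionEntry], top_k: int = 1) -> List[str]:
    q = words(query)
    # Score the query against generated questions, not against the raw span text.
    ranked = sorted(index, key=lambda e: jaccard(q, words(e.question)), reverse=True)
    results: List[str] = []
    for e in ranked:
        if e.span not in results:
            results.append(e.span)
        if len(results) == top_k:
            break
    return results


# Step 1 would normally call an LLM; these questions are hand-written for illustration.
index = [
    QuestionEntry("How do I reset my password?", "Passwords are reset from Settings > Security."),
    QuestionEntry("Where can I change my password?", "Passwords are reset from Settings > Security."),
    QuestionEntry("How do I delete my account?", "Account deletion is under Settings > Privacy."),
]
print(retrieve("how can I reset a password", index))
```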
When to Use
- Search-heavy RAG systems
- Diverse query phrasing
- Knowledge bases with broad audiences
3.11 Question-Anchored Chunking
Description
Chunk boundaries are defined by complete answers to explicit questions.
Method
- Step 1: Generate 3-8 core questions per section
- Step 2: Adjust chunk boundaries until answers are self-contained
- Step 3: Store questions as chunk metadata
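Step 2 can be sketched as covering each answer with the smallest contiguous run of sentences and merging runs that overlap, so no chunk boundary cuts through an answer. The mapping from each question to its answering sentences is assumed to come from an LLM or a reviewer; `question_anchored_chunks` is a hypothetical name.

```python
from typing import Dict, List, Set


def question_anchored_chunks(sentences: List[str],
                             answers: Dict[str, Set[int]]) -> List[Dict]:
    """answers maps each generated question to the sentence indices that answer it.
    Each chunk is the smallest contiguous span covering complete answers."""
    spans = sorted((min(idx), max(idx), q) for q, idx in answers.items() if idx)
    chunks: List[Dict] = []
    for start, end, question in spans:
        if chunks and start <= chunks[-1]["end"]:
            # Overlapping answers: widen the existing chunk instead of cutting through it.
            chunks[-1]["end"] = max(chunks[-1]["end"], end)
            chunks[-1]["questions"].append(question)  # Step 3: questions as metadata
        else:
            chunks.append({"start": start, "end": end, "questions": [question]})
    for c in chunks:
        c["text"] = " ".join(sentences[c["start"]:c["end"] + 1])
    return chunks


sentences = ["S0.", "S1.", "S2.", "S3.", "S4."]
answers = {"What is X?": {0, 1}, "How is X configured?": {1, 2}, "How do I remove X?": {4}}
for c in question_anchored_chunks(sentences, answers):
    print(c["text"], c["questions"])
```

Sentences not covered by any question (here `S3.`) are dropped in this sketch; a fuller version would attach them to the nearest chunk.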
When to Use
- FAQ-style systems
- Troubleshooting guides
- Support automation
3.12 Question-Anchored, Context-Buffered Chunking
Description
Extends Question-Anchored Chunking with contextual summaries.
Method
- Step 1: Create base semantic chunk
- Step 2: Generate pre- and post-context summaries
- Step 3: Generate grounded questions
- Step 4: Assemble the canonical chunk:
  - headers
  - pre-summary
  - base text
  - post-summary
  - questions
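The canonical layout from Step 4 can be captured in a small data class. The `CanonicalChunk` name, its field names, and the `Previous context:`/`Following context:`/`Q:` labels are assumptions for illustration; the summaries and questions would come from the generation steps above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CanonicalChunk:
    headers: List[str]   # heading path, e.g. ["Handbook", "Travel"]
    pre_summary: str     # Step 2: summary of the preceding chunk ("" if none)
    base_text: str       # Step 1: the base semantic chunk
    post_summary: str    # Step 2: summary of the following chunk ("" if none)
    questions: List[str] = field(default_factory=list)  # Step 3: grounded questions

    def render(self) -> str:
        """Step 4: headers, pre-summary, base text, post-summary, questions, in that order."""
        lines = [" > ".join(self.headers)]
        if self.pre_summary:
            lines.append(f"Previous context: {self.pre_summary}")
        lines.append(self.base_text)
        if self.post_summary:
            lines.append(f"Following context: {self.post_summary}")
        lines.extend(f"Q: {q}" for q in self.questions)
        return "\n".join(lines)


chunk = CanonicalChunk(
    headers=["Handbook", "Travel"],
    pre_summary="Who is eligible to travel.",
    base_text="Travel requests need manager approval.",
    post_summary="How to file expense claims.",
    questions=["Who approves travel requests?"],
)
print(chunk.render())
```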
When to Use
- High-stakes enterprise RAG
- Compliance and policy systems
- Knowledge agents with low hallucination tolerance
3.13 Dual-Index Question-Referenced Chunking
Description
Separates retrieval and grounding into distinct indices.
Method
- Step 1: Index questions independently
- Step 2: Index canonical chunks separately
- Step 3: Retrieve questions → map to chunks
- Step 4: Deduplicate and rerank
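The dual-index flow can be sketched with two plain data structures: a list of `(question, chunk_id)` pairs and a dict of canonical chunks keyed by ID. Jaccard word overlap stands in for vector search, reranking each chunk by its best-matching question is one simple choice among many, and `dual_index_retrieve` and its parameters are hypothetical names.

```python
from typing import Dict, List, Set, Tuple


def words(text: str) -> Set[str]:
    return {w.strip("?.,!").lower() for w in text.split()}


def jaccard(a: Set[str], b: Set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0


def dual_index_retrieve(query: str,
                        question_index: List[Tuple[str, str]],  # (question, chunk_id)
                        chunk_index: Dict[str, str],            # chunk_id -> canonical chunk
                        top_questions: int = 5, top_chunks: int = 2) -> List[str]:
    q = words(query)
    # Step 3a: search the question index only.
    hits = sorted(question_index, key=lambda e: jaccard(q, words(e[0])), reverse=True)[:top_questions]
    # Steps 3b and 4: map hits to chunk ids, deduplicate, and keep each chunk's best score.
    best: Dict[str, float] = {}
    for question, chunk_id in hits:
        best[chunk_id] = max(best.get(chunk_id, 0.0), jaccard(q, words(question)))
    ranked_ids = sorted(best, key=best.get, reverse=True)[:top_chunks]
    # Grounding text comes from the separate chunk index.
    return [chunk_index[cid] for cid in ranked_ids]


question_index = [
    ("How do I reset my password?", "c1"),
    ("Where can I change my password?", "c1"),
    ("How do I delete my account?", "c2"),
    ("What are the API rate limits?", "c3"),
]
chunk_index = {
    "c1": "Passwords are reset from Settings > Security.",
    "c2": "Account deletion is under Settings > Privacy.",
    "c3": "Rate limits are listed on the API reference page.",
}
print(dual_index_retrieve("how can I reset a password", question_index, chunk_index, top_questions=3))
```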
When to Use
- Large corpora
- Query-diverse environments
- Production-grade systems
4. Key Takeaways
- Chunking defines retrieval correctness
- Modern RAG systems combine multiple chunking strategies
- Question-centric and hierarchical approaches are widely used in production
- Chunking must be evaluated, not guessed
If your RAG system hallucinates, your chunk boundaries are usually the root cause.


