The Essential RAG Book

Chunking Strategies

1. What Is Chunking

Chunking is the process of decomposing raw source data into retrievable, semantically coherent units that can be embedded, indexed, and supplied to an LLM as grounding context. In a RAG system, chunking defines:

  • the unit of retrieval
  • the scope of semantic grounding
  • the upper bound on answer correctness

A chunk must be:

  • small enough to retrieve precisely
  • large enough to be semantically complete
  • structured enough to be interpretable in isolation

Formally: A chunk is the minimal addressable unit of knowledge that preserves semantic coherence under retrieval and model constraints. Chunking is therefore an information architecture problem, not a text-splitting problem.

2. Why You Need to Chunk

2.1 Context window constraints

LLMs cannot ingest an entire corpus, or even a single very large document, in one prompt. Chunking makes large corpora retrievable within bounded context windows.

2.2 Retrieval precision

Vector search operates at the chunk level. Poor chunking causes:

  • topic dilution
  • partial answers
  • retrieval of irrelevant context

2.3 Cost and latency

Smaller, well-scoped chunks:

  • reduce token usage
  • improve reranking efficiency
  • lower inference cost

2.4 Answer grounding and trust

High-quality chunks are self-contained and explicitly scoped, which reduces hallucinations and improves citation accuracy. In practice, chunking quality often has a larger effect on RAG accuracy than the choice of embedding model.

3. Chunking Strategies

Each strategy below includes:

  • Name
  • Description
  • Method (Step-by-step)
  • When to Use

3.1 Fixed-Length Token Chunking

Description

Splits text into fixed-size token windows, optionally with overlap.

Method

  • Step 1: Tokenize the document using the target model's tokenizer
  • Step 2: Split tokens into chunks of size N
  • Step 3: Optionally add M-token overlap between adjacent chunks
  • Step 4: Embed and index each chunk independently

When to Use

  • As a baseline
  • For quick prototypes
  • When ingestion cost must be minimal
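A minimal Python sketch of the method above. It assumes the document has already been tokenized; whitespace splitting stands in for the target model's tokenizer purely for illustration, and `fixed_length_chunks` is a hypothetical helper name. Step 4 (embedding and indexing) is omitted.

```python
from typing import List


def fixed_length_chunks(tokens: List[str], size: int, overlap: int = 0) -> List[List[str]]:
    """Split a token list into windows of `size` tokens; adjacent windows share `overlap` tokens."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap  # how far each window advances
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


# Whitespace splitting stands in for the target model's tokenizer (assumption).
tokens = "a b c d e f g h i j".split()
print(fixed_length_chunks(tokens, size=4, overlap=1))
# → [['a', 'b', 'c', 'd'], ['d', 'e', 'f', 'g'], ['g', 'h', 'i', 'j']]
```

The overlap (M in the method) means each boundary token appears in two chunks, so a fact straddling a boundary is less likely to be cut in half.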

3.2 Sentence-Boundary Chunking

Description

Chunks are formed by grouping complete sentences up to a size threshold.

Method

  • Step 1: Perform sentence segmentation
  • Step 2: Accumulate sentences until the token limit is reached
  • Step 3: Start a new chunk at sentence boundaries

When to Use

  • Short factual documents
  • QA over well-written prose
  • When grammatical integrity matters
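The accumulation loop can be sketched as follows. This is an illustrative sketch only: the regex sentence splitter and the whitespace word count are stand-ins (assumptions) for a proper sentence segmenter and the model's tokenizer, and the function names are hypothetical.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    # Naive segmenter (assumption): split after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def count_tokens(text: str) -> int:
    # Whitespace word count as a stand-in for the model tokenizer (assumption).
    return len(text.split())


def sentence_chunks(text: str, max_tokens: int) -> List[str]:
    """Group whole sentences into chunks of at most `max_tokens` (a single long sentence may exceed it)."""
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for sentence in split_sentences(text):
        n = count_tokens(sentence)
        # Close the chunk at a sentence boundary if this sentence would exceed the limit.
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks


print(sentence_chunks("One two three. Four five. Six seven eight nine.", max_tokens=5))
# → ['One two three. Four five.', 'Six seven eight nine.']
```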

3.3 Paragraph-Boundary Chunking

Description

Uses paragraph breaks as primary chunk boundaries.

Method

  • Step 1: Detect paragraph separators
  • Step 2: Group adjacent paragraphs until the size threshold is reached
  • Step 3: Split any paragraph that exceeds the hard size limit

When to Use

  • Narrative documentation
  • Blog-style internal docs
  • Lightly structured content
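A minimal sketch of paragraph-boundary chunking, assuming blank lines separate paragraphs and using a word count as a rough proxy for tokens (both assumptions). `paragraph_chunks` is a hypothetical name; oversized paragraphs are cut into fixed windows, which collapses their internal whitespace.

```python
import re
from typing import List


def paragraph_chunks(text: str, max_tokens: int) -> List[str]:
    """Group adjacent paragraphs up to `max_tokens`; split any paragraph larger than the limit."""
    # Blank lines are treated as paragraph separators (assumption about the input format).
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for para in paragraphs:
        words = para.split()  # word count approximates token count (assumption)
        if len(words) > max_tokens:
            # Hard limit: flush the pending group, then cut the oversized paragraph into windows.
            if current:
                chunks.append("\n\n".join(current))
                current, current_len = [], 0
            for i in range(0, len(words), max_tokens):
                chunks.append(" ".join(words[i:i + max_tokens]))
            continue
        if current and current_len + len(words) > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
        current.append(para)
        current_len += len(words)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

For example, `paragraph_chunks("a b c\n\nd e\n\nf g h i j k l", 5)` keeps the first two short paragraphs together and splits the seven-word paragraph into two pieces.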

3.4 Heading-Boundary Chunking

Description

Aligns chunks with document structure such as sections and subsections.

Method

  • Step 1: Parse document structure (Markdown, HTML, DOCX)
  • Step 2: Treat each heading scope as a chunk candidate
  • Step 3: Subdivide a section only if it exceeds the size limit
  • Step 4: Preserve heading hierarchy as metadata

When to Use

  • Technical documentation
  • Knowledge bases
  • API and product docs
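A sketch of heading-boundary chunking for Markdown input only (an assumption; HTML or DOCX would need a real parser). It tracks the heading hierarchy as metadata and subdivides a section by word count only when it is too large. The `Chunk` dataclass and `heading_chunks` are hypothetical names, and `#` lines inside fenced code blocks are not handled.

```python
import re
from dataclasses import dataclass, field
from typing import List

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")  # ATX-style Markdown headings


@dataclass
class Chunk:
    text: str
    heading_path: List[str] = field(default_factory=list)  # Step 4: hierarchy as metadata


def heading_chunks(markdown: str, max_tokens: int = 400) -> List[Chunk]:
    chunks: List[Chunk] = []
    path: List[str] = []  # current heading hierarchy, outermost first
    body: List[str] = []  # non-empty lines under the current heading

    def emit() -> None:
        words = " ".join(body).split()
        # Step 3: subdivide only when the section exceeds the size limit.
        for i in range(0, len(words), max_tokens):
            chunks.append(Chunk(" ".join(words[i:i + max_tokens]), list(path)))
        body.clear()

    for line in markdown.splitlines():
        m = HEADING.match(line)
        if m:
            emit()  # Step 2: close the previous heading scope
            level = len(m.group(1))
            del path[level - 1:]  # drop headings at this level or deeper
            path.append(m.group(2).strip())
        elif line.strip():
            body.append(line.strip())
    emit()
    return chunks


doc = "# Guide\nIntro text.\n## Install\nRun pip.\n## Usage\nCall it.\n# API\nEndpoints."
for chunk in heading_chunks(doc):
    print(chunk.heading_path, chunk.text)
```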

3.5 Semantic-Boundary Chunking

Description

Splits text at detected topic shifts using embedding similarity.

Method

  • Step 1: Generate embeddings for sequential sentences
  • Step 2: Compute similarity between adjacent sentences
  • Step 3: Insert boundaries where similarity drops below a threshold
  • Step 4: Enforce min/max size constraints

When to Use

  • Long, dense explanations
  • Mixed-topic documents
  • When structure is weak or absent
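The four steps can be sketched as below. To stay self-contained, this uses a toy bag-of-words vector in place of a real embedding model and measures the min/max constraints in sentences rather than tokens; both are simplifying assumptions, and the default threshold of 0.2 is arbitrary.

```python
import math
import re
from collections import Counter
from typing import List


def embed(sentence: str) -> Counter:
    # Toy bag-of-words vector standing in for a real embedding model (assumption).
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def semantic_chunks(sentences: List[str], threshold: float = 0.2,
                    min_sentences: int = 1, max_sentences: int = 8) -> List[List[str]]:
    """Start a new chunk where adjacent-sentence similarity falls below `threshold`."""
    if not sentences:
        return []
    vectors = [embed(s) for s in sentences]  # Step 1
    chunks: List[List[str]] = []
    current = [sentences[0]]
    for i in range(1, len(sentences)):
        similarity = cosine(vectors[i - 1], vectors[i])  # Step 2
        topic_shift = similarity < threshold and len(current) >= min_sentences  # Step 3
        if topic_shift or len(current) >= max_sentences:  # Step 4: size constraints
            chunks.append(current)
            current = []
        current.append(sentences[i])
    chunks.append(current)
    return chunks


sentences = ["The cat sat on the mat.", "The cat chased the mouse.",
             "Stocks fell sharply today.", "Stocks rebounded by the close."]
print(len(semantic_chunks(sentences, threshold=0.1)))  # → 2
```

With real embeddings the threshold is usually tuned per corpus; a fixed value like this one is only a starting point.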

3.6 Sentence-Window Context Chunking

Description

Indexes individual sentences and retrieves surrounding context dynamically.

Method

  • Step 1: Index each sentence as a retrieval unit
  • Step 2: On retrieval, expand each matched sentence to include ±K neighboring sentences
  • Step 3: Stitch the windows together, merging any overlaps, before passing them to the LLM

When to Use

  • High-recall QA systems
  • Troubleshooting and debugging
  • Fine-grained factual queries
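A sketch of the expansion and stitching steps, assuming the sentence-level retrieval has already returned the indices of matching sentences (the index itself is not shown). `expand_hits` and `window_bounds` are hypothetical names. Overlapping or adjacent windows are merged so no sentence is sent to the LLM twice; the output follows document order, not relevance order.

```python
from typing import List, Tuple


def window_bounds(hit: int, k: int, n: int) -> Tuple[int, int]:
    # Clamp the ±k window to the start and end of the document.
    return max(0, hit - k), min(n - 1, hit + k)


def expand_hits(sentences: List[str], hit_indices: List[int], k: int = 2) -> List[str]:
    """Expand each retrieved sentence index by ±k neighbors and merge overlapping windows."""
    n = len(sentences)
    windows = sorted(window_bounds(i, k, n) for i in set(hit_indices))
    merged: List[List[int]] = []
    for start, end in windows:
        if merged and start <= merged[-1][1] + 1:
            # Overlapping or adjacent window: extend the previous one instead of duplicating text.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [" ".join(sentences[s:e + 1]) for s, e in merged]


sentences = [f"S{i}." for i in range(10)]
print(expand_hits(sentences, [1, 3, 8], k=1))
# → ['S0. S1. S2. S3. S4.', 'S7. S8. S9.']
```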

3.7 Parent-Child Hierarchical Chunking

Description

Maintains multiple granularities of the same content.

Method

  • Step 1: Create fine-grained child chunks (e.g., 300-500 tokens)
  • Step 2: Create coarse parent chunks (e.g., 1,000-1,500 tokens)
  • Step 3: Embed both
  • Step 4: Retrieve children, supply parents for grounding

When to Use

  • Enterprise RAG systems
  • Long-form documents
  • High-accuracy requirements
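A sketch of the parent-child structure. Default sizes are picked from the example ranges in the method; the word-level split, the names `build_hierarchy` and `parents_for`, and the assumption that child hits come from some vector index are all illustrative. Embedding (Step 3) is omitted; the point is the child-to-parent mapping used in Step 4.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ChildChunk:
    child_id: int
    parent_id: int
    text: str


def build_hierarchy(tokens: List[str], parent_size: int = 1200,
                    child_size: int = 400) -> Tuple[Dict[int, str], List[ChildChunk]]:
    parents: Dict[int, str] = {}
    children: List[ChildChunk] = []
    for p_id, p_start in enumerate(range(0, len(tokens), parent_size)):
        parent_tokens = tokens[p_start:p_start + parent_size]
        parents[p_id] = " ".join(parent_tokens)
        # Children are carved out of their parent, so each child maps to exactly one parent.
        for c_start in range(0, len(parent_tokens), child_size):
            children.append(ChildChunk(len(children), p_id,
                                       " ".join(parent_tokens[c_start:c_start + child_size])))
    return parents, children


def parents_for(hit_children: List[ChildChunk], parents: Dict[int, str]) -> List[str]:
    """Retrieve on children, but hand the de-duplicated parents to the LLM for grounding."""
    seen, context = set(), []
    for child in hit_children:
        if child.parent_id not in seen:
            seen.add(child.parent_id)
            context.append(parents[child.parent_id])
    return context


tokens = [str(i) for i in range(10)]
parents, children = build_hierarchy(tokens, parent_size=6, child_size=3)
print(parents_for([children[1], children[0], children[3]], parents))
# → ['0 1 2 3 4 5', '6 7 8 9']
```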

3.8 Contextual-Header Augmented Chunking

Description

Injects structural metadata directly into chunk content.

Method

  • Step 1: Compute full heading path
  • Step 2: Prepend header context to chunk text
  • Step 3: Embed augmented chunk

When to Use

  • Similar documents with overlapping terminology
  • Multi-product or multi-version corpora
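A minimal sketch of the augmentation step, assuming the heading path (Step 1) is already known for each chunk. `augment_with_headers`, the " > " separator, and the product names are hypothetical; the augmented string is what would be passed to the embedding model in Step 3.

```python
from typing import List


def augment_with_headers(chunk_text: str, heading_path: List[str], doc_title: str = "") -> str:
    """Prepend the document title and full heading path to the chunk before embedding."""
    path = ([doc_title] if doc_title else []) + list(heading_path)
    header = " > ".join(path)  # the " > " separator is an illustrative choice
    return f"{header}\n\n{chunk_text}" if header else chunk_text


# Identical wording in two product manuals becomes distinguishable once headers are prepended.
# Product names are hypothetical.
body = "Set the request timeout to 30 seconds."
print(augment_with_headers(body, ["Configuration", "Networking"], doc_title="Product A v2 Admin Guide"))
print(augment_with_headers(body, ["Configuration", "Networking"], doc_title="Product B v1 Admin Guide"))
```

Without the prefix, both chunks would embed to the same vector, and retrieval could not tell the two products apart.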

3.9 Pre/Post Context-Buffered Chunking

Description

Adds summarized neighbor context to eliminate abrupt boundaries.

Method

  • Step 1: Identify preceding and following chunks
  • Step 2: Generate concise summaries (≤50 tokens each)
  • Step 3: Attach summaries to the core chunk
  • Step 4: Embed the composed chunk

When to Use

  • Explanatory or tutorial content
  • Documents with cross-references
  • Policy and compliance docs
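A sketch of composing buffered chunks. The `summarize` function here is a placeholder (assumption) that simply truncates to the first few words; a real pipeline would call an LLM to write the ≤50-token summaries. The "[Previous]" and "[Next]" labels are hypothetical.

```python
from typing import List


def summarize(text: str, max_tokens: int = 50) -> str:
    # Placeholder for an LLM summarization call (assumption): keeps the first max_tokens words.
    return " ".join(text.split()[:max_tokens])


def buffered_chunks(chunks: List[str], max_summary_tokens: int = 50) -> List[str]:
    """Wrap each core chunk with short summaries of its neighbors (Steps 1-3)."""
    composed = []
    for i, core in enumerate(chunks):
        parts = []
        if i > 0:  # preceding neighbor, if any
            parts.append("[Previous] " + summarize(chunks[i - 1], max_summary_tokens))
        parts.append(core)
        if i + 1 < len(chunks):  # following neighbor, if any
            parts.append("[Next] " + summarize(chunks[i + 1], max_summary_tokens))
        composed.append("\n".join(parts))
    return composed  # Step 4: embed these composed strings
```

For example, `buffered_chunks(["a b c", "d e", "f"], 2)[1]` yields `"[Previous] a b\nd e\n[Next] f"`: the middle chunk now carries a hint of what comes before and after it.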

3.10 Question-Derived Chunking

Description

Chunks are indexed via generated questions rather than raw text.

Method

  • Step 1: Generate likely user questions from text
  • Step 2: Associate each question with its supporting span
  • Step 3: Use questions as primary retrieval keys

When to Use

  • Search-heavy RAG systems
  • Diverse query phrasing
  • Knowledge bases with broad audiences
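A sketch of question-derived indexing. The question generator is passed in as a callable because it would normally be an LLM call; here it is replaced by hard-coded, hypothetical questions. Matching uses toy word overlap instead of embedding similarity (an assumption made to keep the example self-contained).

```python
import re
from typing import Callable, List, Tuple


def word_set(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_question_index(spans: List[str],
                         generate_questions: Callable[[str], List[str]]) -> List[Tuple[str, int]]:
    """Steps 1-2: pair every generated question with the id of the span that answers it."""
    return [(q, span_id) for span_id, span in enumerate(spans) for q in generate_questions(span)]


def retrieve(query: str, index: List[Tuple[str, int]], spans: List[str]) -> str:
    """Step 3: match the query against questions, then return the supporting span."""
    # Toy lexical overlap replaces embedding similarity here (assumption).
    query_words = word_set(query)
    _, span_id = max(index, key=lambda pair: len(query_words & word_set(pair[0])))
    return spans[span_id]


spans = ["Refunds are issued within 14 days of a return.",
         "Passwords must be rotated every 90 days."]
# Hard-coded stand-in for LLM-generated questions (hypothetical data).
fake_questions = {
    spans[0]: ["How long do refunds take?", "When will I get my money back?"],
    spans[1]: ["How often must I change my password?"],
}
index = build_question_index(spans, lambda span: fake_questions[span])
print(retrieve("how often do I change my password", index, spans))
# → Passwords must be rotated every 90 days.
```

The query shares almost no words with the password span itself, yet matches its generated question closely, which is the motivation for using questions as retrieval keys.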

3.11 Question-Anchored Chunking

Description

Chunk boundaries are defined by complete answers to explicit questions.

Method

  • Step 1: Generate 3-8 core questions per section
  • Step 2: Adjust chunk boundaries until answers are self-contained
  • Step 3: Store questions as chunk metadata

When to Use

  • FAQ-style systems
  • Troubleshooting guides
  • Support automation
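One way to sketch Step 2 is to grow each chunk paragraph by paragraph until every anchor question is answerable from the chunk alone. Both the question generator and the answerability check are passed in as callables, since in practice they would be LLM calls; the stand-ins below, the `max_paragraphs` cap, and all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AnchoredChunk:
    text: str
    questions: List[str] = field(default_factory=list)  # Step 3: stored as metadata


def question_anchored_chunks(paragraphs: List[str],
                             questions_for: Callable[[str], List[str]],
                             is_answered: Callable[[str, str], bool],
                             max_paragraphs: int = 5) -> List[AnchoredChunk]:
    chunks: List[AnchoredChunk] = []
    i = 0
    while i < len(paragraphs):
        text = paragraphs[i]
        questions = questions_for(text)  # Step 1: anchor questions for this span
        i += 1
        count = 1
        # Step 2: extend the boundary until every anchor question is answerable in isolation.
        while (i < len(paragraphs) and count < max_paragraphs
               and not all(is_answered(q, text) for q in questions)):
            text += "\n\n" + paragraphs[i]
            i += 1
            count += 1
        chunks.append(AnchoredChunk(text, questions))
    return chunks


paragraphs = ["To reset the device, hold the power button.",
              "Keep holding it for 10 seconds until the LED blinks.",
              "Billing is monthly."]
# Hypothetical stand-ins for an LLM question generator and an LLM answerability judge.
anchor_questions = {paragraphs[0]: ["How do I reset the device?"],
                    paragraphs[2]: ["How often am I billed?"]}
required_fact = {"How do I reset the device?": "10 seconds",
                 "How often am I billed?": "monthly"}
chunks = question_anchored_chunks(
    paragraphs,
    questions_for=lambda p: anchor_questions.get(p, []),
    is_answered=lambda q, text: required_fact[q] in text,
)
print(len(chunks))  # → 2
```

The reset instructions are split across two paragraphs, so the first chunk absorbs the second paragraph; a plain paragraph splitter would have left the answer incomplete.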

3.12 Question-Anchored, Context-Buffered Chunking

Description

Extends Question-Anchored Chunking with contextual summaries.

Method

  • Step 1: Create base semantic chunk
  • Step 2: Generate pre- and post-context summaries
  • Step 3: Generate grounded questions
  • Step 4: Assemble the canonical chunk:
      ◦ headers
      ◦ pre-summary
      ◦ base text
      ◦ post-summary
      ◦ questions

When to Use

  • High-stakes enterprise RAG
  • Compliance and policy systems
  • Knowledge agents with low hallucination tolerance
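A sketch of the canonical chunk as a data structure, serialized in the field order listed in Step 4. It assumes the summaries and questions were already produced by an LLM in Steps 2 and 3 and are passed in as plain strings; the class name and the "Section:", "Before:", etc. labels are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CanonicalChunk:
    headers: List[str]
    pre_summary: str
    base_text: str
    post_summary: str
    questions: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Serialize fields in Step 4 order; this string is what gets embedded."""
        sections = [
            ("Section", " > ".join(self.headers)),
            ("Before", self.pre_summary),
            ("Content", self.base_text),
            ("After", self.post_summary),
            ("Questions", "\n".join("- " + q for q in self.questions)),
        ]
        # Skip empty fields, e.g. no pre-summary on the first chunk of a document.
        return "\n\n".join(f"{label}:\n{value}" for label, value in sections if value)


chunk = CanonicalChunk(
    headers=["Security Policy", "Passwords"],
    pre_summary="",
    base_text="Passwords must be rotated every 90 days.",
    post_summary="Next section covers MFA enrollment.",
    questions=["How often must passwords be rotated?"],
)
print(chunk.render())
```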

3.13 Dual-Index Question-Referenced Chunking

Description

Separates retrieval and grounding into distinct indices.

Method

  • Step 1: Index questions independently
  • Step 2: Index canonical chunks separately
  • Step 3: Retrieve questions → map to chunks
  • Step 4: Deduplicate and rerank

When to Use

  • Large corpora
  • Query-diverse environments
  • Production-grade systems
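A sketch of the dual-index lookup. To keep it self-contained, both indices are plain Python structures and a toy Jaccard word overlap stands in for vector search and for the reranker (assumptions); a production system would use a vector store and a dedicated reranking model. All names and data are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple


def words(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(a: str, b: str) -> float:
    # Toy Jaccard similarity standing in for embedding search and a reranker (assumption).
    wa, wb = words(a), words(b)
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def dual_index_search(query: str,
                      question_index: List[Tuple[str, str]],  # Step 1: (question, chunk_id)
                      chunk_store: Dict[str, str],            # Step 2: chunk_id -> canonical chunk
                      top_k_questions: int = 5,
                      top_n_chunks: int = 3) -> List[str]:
    # Step 3: retrieve the best-matching questions, then map them to their chunks.
    scored = sorted(question_index, key=lambda qc: overlap(query, qc[0]), reverse=True)
    best_q_score: Dict[str, float] = defaultdict(float)
    for question, chunk_id in scored[:top_k_questions]:
        # Step 4a: deduplicate, keeping each chunk's best question score.
        best_q_score[chunk_id] = max(best_q_score[chunk_id], overlap(query, question))
    # Step 4b: rerank candidate chunks using their full grounding text as well.
    reranked = sorted(best_q_score,
                      key=lambda cid: best_q_score[cid] + overlap(query, chunk_store[cid]),
                      reverse=True)
    return reranked[:top_n_chunks]


chunk_store = {"refunds": "Refunds are issued within 14 days of a return.",
               "passwords": "Passwords must be rotated every 90 days."}
question_index = [("How long do refunds take?", "refunds"),
                  ("When do I get my money back?", "refunds"),
                  ("How often must I change my password?", "passwords"),
                  ("What is the password rotation period?", "passwords")]
print(dual_index_search("how often must I change my password", question_index,
                        chunk_store, top_k_questions=2))
# → ['passwords', 'refunds']
```

Keeping questions and chunks in separate indices lets many phrasings point at one canonical chunk without duplicating its text in the index.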

4. Key Takeaways

  • Chunking defines retrieval correctness
  • Modern RAG systems combine multiple chunking strategies
  • Question-centric and hierarchical approaches are widely used in production systems
  • Chunking must be evaluated, not guessed

If your RAG system hallucinates, inspect your chunk boundaries first; they are one of the most common root causes.
