RAG Scenarios and Solutions
Nested Lists Broken
Multi-level nested lists lose their hierarchical structure during chunking, making step-by-step procedures and hierarchical information incomprehensible.
Key Takeaways
- The Problem
- Deep Technical Analysis
- How to Solve
- Agent Instructions: Querying This Documentation
The Problem
Multi-level nested lists lose their hierarchical structure during chunking, making step-by-step procedures and hierarchical information incomprehensible.
Symptoms
- ❌ Sub-items separated from parents
- ❌ Indentation levels lost
- ❌ Numbered lists restart incorrectly
- ❌ Cannot determine item relationships
- ❌ Multi-step procedures broken
Real-World Example
Original nested list:
1. Configure API access
   a. Generate API key in dashboard
   b. Store key securely
      i. Use environment variables
      ii. Never commit to git
   c. Test connection
2. Set up webhooks
   - Create webhook endpoint
   - Configure URL in settings
   - Use HTTPS only
   - Add authentication header
Chunk boundary here ↓
Chunk 1:
1. Configure API access
   a. Generate API key in dashboard
   b. Store key securely
      i. Use environment variables
Chunk 2:
      ii. Never commit to git
   c. Test connection
2. Set up webhooks
   - Create webhook endpoint
Lost: "ii" disconnected from parent "b", context unclear
Deep Technical Analysis
List Hierarchy Representation
Nested lists have parent-child relationships:
Hierarchical Structure:
Level 0: Main items (1, 2, 3)
Level 1: Sub-items (a, b, c)
Level 2: Sub-sub-items (i, ii, iii)
Level 3: Sub-sub-sub-items (α, β, γ)
Relationships:
→ 1.b.ii belongs to 1.b which belongs to 1
→ Cannot understand "ii" without knowing parent context
The Context Loss Problem:
Chunk contains only:
"ii. Never commit to git"
Without parent context:
→ What does "ii" refer to?
→ "Never commit to git" - commit what?
→ Why is this important?
Full context needed:
"1. Configure API access > b. Store key securely > ii. Never commit to git"
Hierarchy provides meaning
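The full-context string above can be rebuilt for any orphaned item by walking backwards to its ancestors. The following is a minimal sketch, assuming the list has already been parsed into (level, text) pairs; the function name `context_path` and that input shape are illustrative assumptions, not part of any particular library.

```python
def context_path(items, index):
    """Return 'ancestor > ... > item' for items[index].

    items is a list of (level, text) pairs in document order,
    where level 0 is the outermost list (assumed input shape).
    """
    level, text = items[index]
    path = [text]
    # Walk backwards: the first earlier item with a smaller level is the parent.
    for prev_level, prev_text in reversed(items[:index]):
        if prev_level < level:
            path.append(prev_text)
            level = prev_level
            if level == 0:
                break
    return " > ".join(reversed(path))


items = [
    (0, "1. Configure API access"),
    (1, "a. Generate API key in dashboard"),
    (1, "b. Store key securely"),
    (2, "i. Use environment variables"),
    (2, "ii. Never commit to git"),
    (1, "c. Test connection"),
]
print(context_path(items, 4))
# 1. Configure API access > b. Store key securely > ii. Never commit to git
```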
Markdown Indentation Detection
Nested lists use indentation:
Format Variations:
2-space indent:
- Parent
  - Child
    - Grandchild
4-space indent:
- Parent
    - Child
        - Grandchild
Tab indent:
- Parent
	- Child
		- Grandchild
Detection Challenges:
Mixed indentation:
- Parent (0 spaces)
  - Child (2 spaces)
      - Grandchild (6 spaces) ← Inconsistent jump
Or:
- Parent
   - Child (3 spaces) ← Not multiple of 2
      - Grandchild (6 spaces)
Parser must:
→ Detect indent size (2 vs 4 vs tab)
→ Handle inconsistencies
→ Infer hierarchy from relative indents
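One way to meet these requirements without guessing the indent size is to compare each line only with the lines currently open above it. The sketch below is a heuristic under that assumption (the function name `indent_levels` and the default `tab_width` are illustrative); it is not a full CommonMark parser, which measures nesting against the parent item's content offset.

```python
def indent_levels(lines, tab_width=4):
    """Assign a nesting level to each list line from relative indentation."""
    stack = []   # indent widths of the currently open levels
    levels = []
    for line in lines:
        expanded = line.expandtabs(tab_width)
        width = len(expanded) - len(expanded.lstrip(" "))
        # Close every open level indented at least as far as this line.
        while stack and stack[-1] >= width:
            stack.pop()
        stack.append(width)
        levels.append(len(stack) - 1)
    return levels


# Inconsistent jumps (0, 2, 6) and odd widths (0, 3, 6) give the same levels.
print(indent_levels(["- Parent", "  - Child", "      - Grandchild"]))  # [0, 1, 2]
print(indent_levels(["- Parent", "   - Child", "      - Grandchild"]))  # [0, 1, 2]
```

Because only relative widths matter, 2-space, 4-space, and tab-indented lists all resolve to the same levels.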
The Whitespace Ambiguity:
Spaces vs tabs:
- Parent (space-indented)
	- Child (tab-indented)
    - Another child (space-indented)
Is "Another child" at same level as "Child"?
→ Depends on tab width (4 or 8 spaces?)
→ Visual appearance may differ from structure
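The ambiguity is easy to reproduce with Python's built-in `str.expandtabs`. This sketch assumes "Another child" is indented with four spaces, which the example above does not state; the helper name `leading_width` is illustrative.

```python
lines = ["- Parent", "\t- Child", "    - Another child"]


def leading_width(line, tab_width):
    """Width of the leading whitespace once tabs are expanded."""
    expanded = line.expandtabs(tab_width)
    return len(expanded) - len(expanded.lstrip(" "))


for tab_width in (4, 8):
    print(tab_width, [leading_width(line, tab_width) for line in lines])
# 4 [0, 4, 4]   -> "Another child" is a sibling of "Child"
# 8 [0, 8, 4]   -> "Another child" is indented less than "Child"
```

Normalizing tabs to a fixed width before any level detection makes the result deterministic, even if it cannot recover the author's intent.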
Numbered List State
Numbered lists have sequential state:
Numbering Types:
1. First
2. Second
3. Third
vs.
1. First
1. Second
1. Third ← Auto-renumbered by renderer
vs.
a. First
b. Second
c. Third
vs.
i. First
ii. Second
iii. Third
State Tracking:
Parser must track:
→ Current number/letter
→ Numbering scheme (1, a, i, I, A)
→ Nesting level
→ Reset points (when does numbering restart?)
Chunk boundary issues:
Chunk 1 ends at: "2. Second item"
Chunk 2 starts at: "3. Third item"
Chunk 2 isolated:
→ "Why does it start at 3?"
→ Missing items 1 and 2
→ Incomplete sequence
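A chunker can detect this situation by reading the first marker in a chunk and working out how many earlier siblings it implies. The sketch below is a rough heuristic with assumed names (`ordinal`, `missing_predecessors`); it handles decimal, single-letter, and small Roman markers only, and treats the ambiguous letters i, v, and x as Roman numerals.

```python
import re

MARKER = re.compile(r"^\s*(\d+|[a-zA-Z]+)[.)]\s+")
ROMAN = {"i": 1, "v": 5, "x": 10}


def roman_to_int(token):
    values = [ROMAN[c] for c in token]
    total = 0
    for cur, nxt in zip(values, values[1:] + [0]):
        total += -cur if cur < nxt else cur  # subtractive notation, e.g. iv
    return total


def ordinal(line):
    """Return the 1-based position encoded by a line's list marker, or None."""
    match = MARKER.match(line)
    if not match:
        return None
    token = match.group(1).lower()
    if token.isdigit():
        return int(token)
    if all(c in ROMAN for c in token):  # i, ii, iv, ... (i/v/x are ambiguous)
        return roman_to_int(token)
    if len(token) == 1:                 # a, b, c, ...
        return ord(token) - ord("a") + 1
    return None


def missing_predecessors(chunk):
    """Number of earlier siblings implied by the first marker in a chunk."""
    for line in chunk.splitlines():
        n = ordinal(line)
        if n is not None:
            return max(n - 1, 0)
    return 0


print(missing_predecessors("3. Third item\n4. Fourth item"))  # 2
```

A non-zero result is a signal to attach the missing context (or at least the parent heading) to the chunk's metadata.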
Mixed List Types
Lists can mix ordered and unordered:
Hybrid Structure:
1. Main step
   - Detail point
   - Another detail
2. Next main step
   a. Sub-step A
   b. Sub-step B
      - Implementation note
      - Another note
3. Final step
Type Transitions:
Complexity:
→ Ordered (1, 2, 3)
→ Unordered (-, -)
→ Ordered (a, b)
→ Unordered (-, -)
Each transition must be tracked:
→ What type is parent?
→ What type is current level?
→ Numbering vs bullets
Chunking must preserve:
"Step 2 > Sub-step b > Implementation note (bullet)"
Semantic Meaning of Lists
Different lists serve different purposes:
Procedural Steps:
1. Start the server
2. Configure settings
3. Test connection
Semantic requirement:
→ ORDER MATTERS
→ Must do step 1 before step 2
→ Cannot rearrange
If chunked separately:
→ Chunk with step 2 is incomplete
→ User doesn't know prerequisites
→ Procedure fails
Feature Lists:
Features:
- Fast performance
- Easy to use
- Highly scalable
Semantic requirement:
→ Order doesn't matter
→ Each item independent
→ Can present in any order
Chunking impact:
→ Less critical if split
→ But: Loses grouping under "Features"
Decision Trees:
If error occurs:
1. Check API key
   - Valid? Proceed to step 2
   - Invalid? Generate new key
2. Check rate limits
   - Within limits? Proceed to step 3
   - Exceeded? Wait and retry
3. Check server status
Semantic requirement:
→ Conditional logic
→ Decision branches
→ Cannot isolate one branch
Chunking breaks:
→ Decision tree structure
→ Conditional relationships
→ User can't follow logic
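The three list purposes above suggest different splitting rules. The following is a deliberately crude sketch of how a chunker might tell them apart; the function name `split_policy`, the rules, and the returned labels are all assumptions made for illustration, and real documents will need richer signals.

```python
import re

ORDERED = re.compile(r"^\d+[.)]\s+")


def split_policy(lines):
    """Heuristic: classify a list and say whether a chunker may split it."""
    top = [l for l in lines if l and not l[0].isspace()]
    nested = [l.strip() for l in lines if l and l[0].isspace()]
    # Assumption: question-style sub-items indicate conditional branches.
    if any("?" in l for l in nested):
        return "decision-tree: keep whole"
    # Assumption: an all-numbered top level indicates an ordered procedure.
    if top and all(ORDERED.match(l) for l in top):
        return "procedure: keep whole, preserve order"
    return "independent items: may split, repeat the heading"


print(split_policy(["1. Start the server", "2. Configure settings", "3. Test connection"]))
print(split_policy(["- Fast performance", "- Easy to use", "- Highly scalable"]))
print(split_policy(["1. Check API key", "   - Valid? Proceed to step 2"]))
```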
Multi-Paragraph List Items
List items can contain multiple paragraphs:
Complex Item Structure:
1. First step: Do something important

   This requires careful attention. Make sure you:
   - Check all prerequisites
   - Verify permissions
   - Back up data

   Once complete, move to step 2.

2. Second step: Do the next thing
Chunking Challenge:
How to keep together:
→ Item number/bullet
→ First paragraph
→ Sub-lists within item
→ Additional paragraphs
→ All belong to same list item
Naive chunker sees:
→ Line: "1. First step: Do something important"
→ Blank line
→ Paragraph: "This requires careful attention..."
→ Sub-list: "- Check all prerequisites"
May think these are separate blocks:
→ List item
→ Regular paragraph (not in list)
→ New list (not a sub-list)
Structure lost
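To avoid that misreading, a chunker can treat indented and blank lines as continuations of the open list item instead of as new blocks. This is a minimal sketch under simplifying assumptions: top-level items start in column 0 with a numeric or bullet marker, and the names `TOP_MARKER` and `group_items` are illustrative.

```python
import re

TOP_MARKER = re.compile(r"^(\d+[.)]|[-*+])\s+")


def group_items(lines):
    """Group lines into blocks, one block per top-level list item."""
    blocks, current = [], None
    for line in lines:
        if TOP_MARKER.match(line):
            if current is not None:
                blocks.append(current)
            current = [line]                      # a new top-level item starts
        elif current is not None and (line[:1] in (" ", "\t") or not line.strip()):
            current.append(line)                  # indented or blank: continuation
        else:
            if current is not None:
                blocks.append(current)
                current = None
            blocks.append([line])                 # ordinary text outside the list
    if current is not None:
        blocks.append(current)
    return blocks


doc = [
    "1. First step: Do something important",
    "",
    "   This requires careful attention. Make sure you:",
    "   - Check all prerequisites",
    "   - Verify permissions",
    "   - Back up data",
    "",
    "   Once complete, move to step 2.",
    "",
    "2. Second step: Do the next thing",
]
print(len(group_items(doc)))  # 2
```

Each returned block can then be passed to the chunker as an indivisible unit.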
List Continuation After Interruption
Lists can resume after other content:
Interrupted Lists:
1. First item
2. Second item
> Note: Pay attention to the following steps
3. Third item (continues list!)
4. Fourth item
Parser Challenge:
Is "3. Third item" a new list or continuation?
Markdown interpretation:
→ The unindented blockquote ends the first list
→ "3. Third item" opens a second list that starts at 3 and logically continues the first
HTML output:
<ol>
<li>First item</li>
<li>Second item</li>
</ol>
<blockquote>Note...</blockquote>
<ol start="3">
<li>Third item</li>
<li>Fourth item</li>
</ol>
Chunking implications:
→ Items 1-2 in one chunk
→ Items 3-4 in another chunk
→ Must track: This is list continuation, not new list
→ "start=3" attribute matters
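When a chunk holding only the later items is serialized, the first number can be carried forward the same way the HTML does. The sketch below assumes plain decimal markers and uses only the standard library; the function name `render_ol` is illustrative.

```python
import re
from html import escape

NUMBERED = re.compile(r"^(\d+)[.)]\s+(.*)$")


def render_ol(lines):
    """Render numbered lines as an <ol>, keeping the first number as `start`."""
    items = [m for m in (NUMBERED.match(line) for line in lines) if m]
    if not items:
        return ""
    start = int(items[0].group(1))
    attr = f' start="{start}"' if start != 1 else ""
    body = "\n".join(f"<li>{escape(m.group(2))}</li>" for m in items)
    return f"<ol{attr}>\n{body}\n</ol>"


print(render_ol(["3. Third item", "4. Fourth item"]))
```

The same `start` value can be stored as chunk metadata so that retrieval can tell a continuation from a fresh list.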
Nested List Flattening Strategies
Converting hierarchical lists to linear text:
Strategy 1: Flatten with Prefixes:
Original:
1. Config
   a. API key
   b. Endpoint
2. Deploy
Flattened:
"1. Config - a. API key"
"1. Config - b. Endpoint"
"2. Deploy"
Pros: Preserves context
Cons: Repetitive, verbose
Strategy 2: Breadcrumb Style:
"1. Config > a. API key"
"1. Config > b. Endpoint"
"2. Deploy"
Pros: Clear hierarchy, compact
Cons: Non-standard notation
Strategy 3: Natural Language:
"Step 1 (Config): Sub-step a (API key)"
"Step 1 (Config): Sub-step b (Endpoint)"
"Step 2: Deploy"
Pros: Readable, explicit
Cons: Very verbose, more tokens
Strategy 4: Indented Text:
"1. Config
   a. API key
   b. Endpoint
2. Deploy"
Pros: Preserves visual structure
Cons: Indentation may not render correctly in all contexts
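Strategies 1 to 3 differ only in how the ancestor chain is joined, so one function can produce all of them. This is a sketch assuming the list is already parsed into (level, marker, text) triples; `flatten`, `LABELS`, and the style names are illustrative, and the natural-language form writes a childless top-level item as "Step 2 (Deploy)" rather than "Step 2: Deploy".

```python
LABELS = ["Step", "Sub-step", "Sub-sub-step"]  # assumed wording per level


def flatten(items, style="breadcrumb"):
    """Flatten (level, marker, text) triples into one line per leaf item."""
    stack, out = [], []
    for i, (level, marker, text) in enumerate(items):
        del stack[level:]                 # drop siblings and deeper items
        stack.append((marker, text))
        has_children = i + 1 < len(items) and items[i + 1][0] > level
        if has_children:
            continue                      # parents appear only as prefixes
        if style == "natural":
            parts = [
                f"{LABELS[min(depth, len(LABELS) - 1)]} {m.rstrip('.')} ({t})"
                for depth, (m, t) in enumerate(stack)
            ]
            out.append(": ".join(parts))
        else:
            sep = " > " if style == "breadcrumb" else " - "
            out.append(sep.join(f"{m} {t}" for m, t in stack))
    return out


items = [(0, "1.", "Config"), (1, "a.", "API key"), (1, "b.", "Endpoint"), (0, "2.", "Deploy")]
print(flatten(items))
# ['1. Config > a. API key', '1. Config > b. Endpoint', '2. Deploy']
```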
How to Solve
Combine five techniques:
- Track the indent level of every list item.
- Keep each list item together with all of its children.
- Flatten nested lists with breadcrumb notation (parent > child).
- Preserve numbering continuity across chunk boundaries.
- Add semantic labels ("Step 1a", "Sub-item").
See List Structure Preservation.
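As an illustration of keeping list items with their children, the sketch below treats each top-level item and its descendants as an atomic unit and packs units greedily into chunks. The function name `chunk_list`, the (level, text) input shape, and the `max_chars` budget are assumptions; a subtree larger than the budget is still emitted whole here and would need breadcrumb flattening to be split further.

```python
def chunk_list(items, max_chars=200):
    """Pack whole top-level subtrees into chunks of roughly max_chars."""
    # 1. Split into subtrees rooted at level-0 items.
    subtrees = []
    for level, text in items:
        if level == 0 or not subtrees:
            subtrees.append([])
        subtrees[-1].append("  " * level + text)

    # 2. Greedily pack subtrees; never cut inside one.
    chunks, current, size = [], [], 0
    for tree in subtrees:
        tree_size = sum(len(line) + 1 for line in tree)
        if current and size + tree_size > max_chars:
            chunks.append("\n".join(current))
            current, size = [], 0
        current.extend(tree)
        size += tree_size
    if current:
        chunks.append("\n".join(current))
    return chunks


items = [
    (0, "1. Configure API access"),
    (1, "b. Store key securely"),
    (2, "i. Use environment variables"),
    (2, "ii. Never commit to git"),
    (0, "2. Set up webhooks"),
    (1, "- Create webhook endpoint"),
]
for chunk in chunk_list(items, max_chars=120):
    print(chunk, end="\n---\n")
```

With this budget the boundary falls between items 1 and 2, so "ii. Never commit to git" stays in the same chunk as its parents.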
Agent Instructions: Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.
Perform an HTTP GET request on the current page URL with the ask query parameter:
GET /dev/rag-scenarios-and-solutions/chunking/nested-lists.md?ask=<question>
The question should be specific, self-contained, and written in natural language. The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.
Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
Last updated January 26, 2026


