Cedar - Context-Aware

Cedar is a balanced RAG strategy that enhances retrieval accuracy by rewriting user queries based on conversation context and memory before searching the vector database.

Overview

Cedar improves upon standard RAG by understanding conversation context:

Analyzes conversation history (memory)
Rewrites ambiguous queries to be more explicit
Maintains context across multiple turns
Better handles follow-up questions

Performance: ~2-3 seconds
Ideal for: Conversational queries, follow-up questions, ambiguous phrasing

How Cedar Works

Processing Flow

User Query: "What about pricing?"
     ↓
[1] Analyze Conversation Memory
    → Previous: "Tell me about the Enterprise plan"
     ↓
[2] Rewrite Query with Context
    → Rewritten: "What is the pricing for the Enterprise plan?"
     ↓
[3] Embed Rewritten Query → Vector [0.15, -0.43, 0.76, ...]
     ↓
[4] Vector Search (Pinecone/TigrisDB)
     ↓
[5] Retrieve Top 5-10 Documents
     ↓
[6] Build Context from Documents
     ↓
[7] LLM Completion (with context + original query)
     ↓
Response: "The Enterprise plan costs $299/month..."
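The flow above can be sketched as a single function. This is a minimal illustration, not the platform's implementation: `cedar_answer` and the four callables (`rewrite`, `embed`, `search`, `complete`) are hypothetical names standing in for the rewriting model, the embedding model, the vector store, and the completion model.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]  # {"role": "user" | "assistant", "content": "..."}


def cedar_answer(
    query: str,
    memory: List[Message],
    rewrite: Callable[[str, List[Message]], str],
    embed: Callable[[str], List[float]],
    search: Callable[[List[float], int], List[str]],
    complete: Callable[[str, str], str],
    top_k: int = 5,
) -> str:
    """Run the seven steps in order; every callable is a stand-in (assumption)."""
    rewritten = rewrite(query, memory)   # steps 1-2: analyze memory, rewrite query
    vector = embed(rewritten)            # step 3: embed the rewritten query
    documents = search(vector, top_k)    # steps 4-5: vector search, top-k documents
    context = "\n\n".join(documents)     # step 6: build context from documents
    return complete(context, query)      # step 7: completion gets the original query
```

Note that retrieval uses the rewritten query while the final completion receives the original one, matching step [7] in the diagram.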

Technical Details

Step 1: Memory Analysis

Reviews last 3-5 conversation turns
Identifies context: entities, topics, intents
Determines if query needs clarification
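As a rough sketch of how the memory window might be selected, the helper below keeps the last few turns of a conversation. It assumes a turn starts at a user message and includes the replies that follow it, and that memory uses the `role`/`content` shape shown in the API example; `recent_turns` is a hypothetical name.

```python
from typing import Dict, List

Message = Dict[str, str]  # {"role": "user" | "assistant", "content": "..."}


def recent_turns(memory: List[Message], max_turns: int = 3) -> List[Message]:
    """Return the messages belonging to the last `max_turns` user turns."""
    if max_turns <= 0:
        return []
    users_seen = 0
    start = 0  # fall back to the whole history if it is shorter than the window
    for index in range(len(memory) - 1, -1, -1):
        if memory[index]["role"] == "user":
            users_seen += 1
            if users_seen == max_turns:
                start = index
                break
    return memory[start:]
```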

Step 2: Query Rewriting

Model: gpt-4o-mini (fast, cost-effective)
Prompt: "Rewrite this query based on conversation history"
Output: More explicit, self-contained query

Example Rewrites:

Original: "How do I do that?"
Context: Previous discussion about password reset
Rewritten: "How do I reset my password?"

Original: "What about the other option?"
Context: Comparing Pro vs Enterprise plans (Enterprise discussed most recently)
Rewritten: "What features are included in the Pro plan?"

Original: "Is it available?"
Context: Asking about SSO feature
Rewritten: "Is SSO (Single Sign-On) available in the platform?"
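One way the rewriting call could be prepared is sketched below. The function name, the exact prompt wording, and the two-message layout are assumptions for illustration; only the general instruction (rewrite the query using conversation history so it is self-contained) comes from this page.

```python
from typing import Dict, List

Message = Dict[str, str]

# Hypothetical wording; the production prompt is not documented here.
REWRITE_INSTRUCTIONS = (
    "Rewrite the latest query so it is self-contained, based on the "
    "conversation history. Add missing context without changing the meaning. "
    "If the query is already self-contained, return it unchanged."
)


def build_rewrite_messages(history: List[Message], query: str) -> List[Message]:
    """Build chat messages for the rewriting model from history and the new query."""
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in history)
    user_content = f"Conversation history:\n{transcript}\n\nLatest query: {query}"
    return [
        {"role": "system", "content": REWRITE_INSTRUCTIONS},
        {"role": "user", "content": user_content},
    ]
```

The resulting list can be passed to whichever chat-completion client is in use; the model's reply is then treated as the rewritten query.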

Steps 3-7: Standard RAG Flow

Same as Redwood after query rewriting.

Performance Characteristics

Latency Breakdown

Memory Analysis:       ~100ms
Query Rewriting:       ~500ms ← Additional cost vs Redwood
Query Embedding:       ~100ms
Vector Search:         ~200ms
Context Building:      ~50ms
LLM Completion:        ~800ms
Response Streaming:    ~200ms
────────────────────────────
Total:                 ~2.0s
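The breakdown can be checked with a few lines of arithmetic. The figures are the approximate values listed above; the dictionary keys are illustrative names only.

```python
# Approximate per-step latencies from the breakdown above, in milliseconds.
LATENCY_MS = {
    "memory_analysis": 100,
    "query_rewriting": 500,
    "query_embedding": 100,
    "vector_search": 200,
    "context_building": 50,
    "llm_completion": 800,
    "response_streaming": 200,
}

total_ms = sum(LATENCY_MS.values())
rewrite_share = LATENCY_MS["query_rewriting"] / total_ms

print(f"total: {total_ms} ms")                      # total: 1950 ms
print(f"rewriting share: {rewrite_share:.0%}")      # rewriting share: 26%
```

So the rewriting step accounts for roughly a quarter of the end-to-end time.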

Token Usage

Component	Tokens	Notes
Memory Context	100-300	Last 3-5 turns
Rewriting Prompt	50-100	Instructions + query
Rewritten Query	10-30	Output of rewriting
System Prompt	150-300	Agent instructions
Retrieved Context	800-1500	Top 5-10 documents
User Query	10-50	Original question
Response	150-400	Generated answer
Total	~1,300-2,700	Per request (sum of the ranges above; typically ~1,800-2,500)

Cost Implications

Per 1,000 Requests (GPT-3.5-turbo):

Embedding: ~$0.01
Query Rewriting: ~$0.10 ← Additional cost
LLM Completion: ~$0.40
Vector Search: ~$0.05
Total: ~$0.56 (+55% vs Redwood)
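The total and the relative increase follow directly from the line items. The sketch below uses the per-1,000-request figures above and the Redwood figure of $0.36/1k from the comparison table on this page; the function and constant names are illustrative.

```python
# Cost per 1,000 requests, in USD, from the list above.
CEDAR_COST_PER_1K = {
    "embedding": 0.01,
    "query_rewriting": 0.10,
    "llm_completion": 0.40,
    "vector_search": 0.05,
}
REDWOOD_COST_PER_1K = 0.36  # from the Redwood comparison table


def monthly_cost(requests_per_month: int, cost_per_1k: float) -> float:
    """Project a monthly cost in USD from a per-1,000-request price."""
    return round(requests_per_month / 1000 * cost_per_1k, 2)


cedar_per_1k = round(sum(CEDAR_COST_PER_1K.values()), 2)
increase = (cedar_per_1k - REDWOOD_COST_PER_1K) / REDWOOD_COST_PER_1K

print(cedar_per_1k)                        # 0.56
print(f"{increase:.1%}")                   # 55.6%
print(monthly_cost(50_000, cedar_per_1k))  # 28.0
```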

When to Use Cedar

✅ Ideal Use Cases

1. Conversational Support Bots

User: "What are your pricing plans?"
Agent: "We have Free, Pro, and Enterprise..."
User: "What's included in the middle one?" ✅
→ Cedar rewrites: "What's included in the Pro plan?"

2. Multi-Turn Consultations

User: "I need help with email integration"
Agent: "We support Gmail and Outlook..."
User: "How do I set up the second one?" ✅
→ Cedar rewrites: "How do I set up Outlook integration?"

3. Contextual Knowledge Queries

User: "Tell me about data sources"
Agent: "We support Google Drive, Confluence..."
User: "How often does it sync?" ✅
→ Cedar rewrites: "How often do data sources sync?"

4. Ambiguous Questions

User: "What's the difference?" ✅
→ Context: Comparing RAG strategies
→ Cedar rewrites: "What's the difference between Redwood and Cedar RAG strategies?"

❌ Not Ideal For

1. Clear, Self-Contained Questions

User: "What is your refund policy?" ❌
→ No rewriting needed, Redwood is faster
→ Cedar adds unnecessary latency

2. First-Time Queries (No Context)

User: "Hello, what can you help with?" ❌
→ No previous conversation to leverage
→ Redwood sufficient

3. Cost-Sensitive High-Volume

10,000+ queries/day with tight budget ❌
→ 55% higher cost than Redwood
→ Consider Redwood for majority, Cedar for subset

Configuration

Agent Settings

When using Cedar strategy:

{
  "strategyCode": "INFER_QUESTION",
  "topK": 5,                       // Number of documents to retrieve (typically 5-10)
  "temperature": 0.7,              // LLM creativity
  "maxTokens": 500,                // Response length
  "model": "gpt-4o",               // Or gpt-3.5-turbo
  "memoryTurns": 3,                // Conversation turns to remember
  "rewritingModel": "gpt-4o-mini"  // Fast model for rewriting
}

Optimization Tips

1. Tune Memory Length

Too few (1-2):  Misses context
Sweet spot (3-5): Good balance
Too many (10+):  Noise, slower

2. Rewriting Prompt Quality

Good Prompt:
"Rewrite the query to be self-contained based on conversation history.
Focus on entities, topics, and intents mentioned earlier."

Bad Prompt:
"Make query better"  ← Too vague

3. Hybrid Approach

# Use Cedar only when needed
if has_conversation_history and query_is_ambiguous:
    use_cedar()
else:
    use_redwood()  # Faster for clear questions
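The routing idea above can be made concrete with a simple heuristic. This sketch assumes a query is "ambiguous" when it is very short or contains a referential word such as "it" or "that"; both the word list and the four-word cutoff are illustrative choices, and a production system might use a classifier instead.

```python
import re
from typing import Dict, List

# Illustrative list of words that usually point back at earlier context.
REFERENTIAL_WORDS = {
    "it", "that", "this", "those", "these", "they", "them", "one", "other", "there",
}


def query_is_ambiguous(query: str, max_words: int = 4) -> bool:
    """Heuristic: very short queries or queries with referential words."""
    words = re.findall(r"[a-z']+", query.lower())
    return len(words) <= max_words or any(w in REFERENTIAL_WORDS for w in words)


def choose_strategy(query: str, memory: List[Dict[str, str]]) -> str:
    """Return "cedar" only when there is history and the query looks ambiguous."""
    if memory and query_is_ambiguous(query):
        return "cedar"
    return "redwood"


history = [{"role": "user", "content": "What are your pricing plans?"}]
print(choose_strategy("What's included in the middle one?", history))  # cedar
print(choose_strategy("What is your refund policy?", history))         # redwood
print(choose_strategy("What about pricing?", []))                      # redwood
```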

Comparison with Other Strategies

vs. Redwood (Standard)

Metric	Redwood	Cedar	Advantage
Speed	~1.2s	~2.0s	Redwood
Cost	$0.36/1k	$0.56/1k	Redwood
Conversational	❌ Poor	✅ Excellent	Cedar
Follow-ups	❌ Poor	✅ Excellent	Cedar
Clear queries	✅ Perfect	✅ Good	Redwood
Complexity	Low	Medium	Redwood

When to Switch:

Redwood → Cedar: When > 30% of queries are follow-ups
Cedar → Redwood: When speed is critical and questions are clear
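The 30% rule of thumb can be applied to logged traffic as follows. This is a sketch that assumes each logged query has already been labeled as a follow-up or not; `recommend_strategy` is a hypothetical helper.

```python
from typing import Sequence


def recommend_strategy(is_follow_up: Sequence[bool], threshold: float = 0.30) -> str:
    """Recommend Cedar when the share of follow-up queries exceeds the threshold."""
    if not is_follow_up:
        return "redwood"  # no data yet: stay on the faster default
    share = sum(is_follow_up) / len(is_follow_up)
    return "cedar" if share > threshold else "redwood"
```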

vs. Cypress (Advanced)

Metric	Cedar	Cypress	Advantage
Speed	~2.0s	~3.5s	Cedar
Cost	$0.56/1k	$0.90/1k	Cedar
Accuracy	Good	Excellent	Cypress
Reranking	❌ No	✅ Yes	Cypress
Query Expansion	❌ No	✅ Yes	Cypress
Tier-Based	❌ No	✅ Yes	Cypress

When to Switch:

Cedar → Cypress: When accuracy is more important than speed/cost
Cypress → Cedar: When speed matters and accuracy is good enough

Real-World Performance

Case Study: E-commerce Customer Support

Setup:

2,000 product FAQs
Average conversation: 3-4 turns
50,000 queries/month
70% are follow-up questions

Redwood Results (Before):

Average latency: 1.1s ✅
User satisfaction: 3.2/5 ❌
Common complaint: "Doesn't understand my follow-up"
Accuracy: 72%

Cedar Results (After):

Average latency: 2.1s (slower but acceptable)
User satisfaction: 4.4/5 ✅
Accuracy: 89% ✅
ROI: 37% fewer support escalations

Case Study: SaaS Documentation Bot

Setup:

5,000 documentation pages
Technical audience
Average query: "How to configure X"
Follow-ups: "What about Y setting?"

Cedar Performance:

Latency: 1.9s (p95)
Follow-up handling: 94% success rate
Cost: $0.53/1k requests
User feedback: "Finally understands context!"

Key Insight: Cedar excels when users naturally have multi-turn conversations, even if each individual query takes slightly longer.

Advanced Features

Memory Summarization

When conversation gets long, Cedar automatically summarizes:

Turns 1-10: Full context (tokens: 2000)
Turns 11+:  Summarized (tokens: 300)

Summary Example:
"User is asking about Enterprise plan features,
specifically interested in SSO, API access, and
white-labeling. Budget conscious."

Configuration:

{
  "memoryTurns": 5,           // Remember last 5 turns
  "summarizeAfter": 10,       // Summarize after 10 turns
  "maxMemoryTokens": 500      // Max memory context
}
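A possible reading of these three settings is sketched below: keep the last `memoryTurns` turns verbatim, replace older turns with a summary once the conversation exceeds `summarizeAfter` turns, and trim the oldest verbatim turns if the result exceeds `maxMemoryTokens`. The trimming policy, the four-characters-per-token estimate, and the injected `summarize` callable are all assumptions.

```python
from typing import Callable, List


def estimate_tokens(text: str) -> int:
    """Very rough token estimate (assumption: about 4 characters per token)."""
    return max(1, len(text) // 4)


def build_memory_context(
    turns: List[str],
    summarize: Callable[[List[str]], str],
    memory_turns: int = 5,
    summarize_after: int = 10,
    max_memory_tokens: int = 500,
) -> str:
    """Combine an optional summary of older turns with the most recent turns."""
    recent = turns[-memory_turns:] if memory_turns > 0 else []
    older = turns[: len(turns) - len(recent)]
    parts: List[str] = []
    has_summary = len(turns) > summarize_after and bool(older)
    if has_summary:
        parts.append("Summary: " + summarize(older))
    parts.extend(recent)
    # Drop the oldest verbatim turn until the context fits the token budget.
    while len(parts) > 1 and estimate_tokens("\n".join(parts)) > max_memory_tokens:
        parts.pop(1 if has_summary else 0)
    return "\n".join(parts)
```

In practice `summarize` would call a language model; passing it in as a parameter keeps the windowing logic testable on its own.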

Contextual Entity Tracking

Cedar tracks entities across turns:

Turn 1: "Tell me about the Enterprise plan"
→ Tracked: product=Enterprise plan

Turn 2: "What's the pricing?"
→ Rewritten: "What is the pricing for the Enterprise plan?"
→ Uses tracked entity

Turn 3: "How about for 100 users?"
→ Rewritten: "What is the pricing for the Enterprise plan for 100 users?"
→ Compounds context

Intent Preservation

Cedar maintains user intent:

User: "I need to integrate with Google"
→ Intent: Integration inquiry
→ Entity: Google

User: "How long does setup take?"
→ Rewritten: "How long does Google integration setup take?"
→ Preserves integration intent

Monitoring Cedar

Key Metrics to Track

1. Rewriting Quality

Metric: % queries actually rewritten
Target: 40-60% (not everything needs rewriting)
Alert:  > 80% (too aggressive) or < 20% (not working)

2. Context Relevance

Metric: % rewritten queries with improved results
Target: > 85%
Measure: Compare retrieval results before/after rewriting

3. Latency Impact

Metric: Average rewriting latency
Target: < 600ms
Alert:  > 1s (investigate)

4. User Satisfaction

Metric: Thumbs up/down on responses
Compare: Sessions with > 3 turns vs single turn
Target:  Multi-turn satisfaction > 85%
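The targets and alert thresholds for the first and third metrics can be encoded as small checks. The function names and status strings below are illustrative; the numeric thresholds are the ones listed above.

```python
def rewrite_rate_status(total_queries: int, rewritten_queries: int) -> str:
    """Classify the share of rewritten queries against the 40-60% target."""
    if total_queries <= 0:
        return "no data"
    rate = rewritten_queries / total_queries
    if rate > 0.80:
        return "alert: rewriting too aggressively"
    if rate < 0.20:
        return "alert: rewriting may not be working"
    if 0.40 <= rate <= 0.60:
        return "on target"
    return "outside target, below alert threshold"


def rewrite_latency_status(average_ms: float) -> str:
    """Classify average rewriting latency against the 600 ms target."""
    if average_ms > 1000:
        return "alert: investigate"
    if average_ms < 600:
        return "on target"
    return "above target"
```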

Common Issues

Rewriting Too Aggressively:

Symptom: Every query is rewritten
Cause:   Memory window too large
Fix:     Reduce memoryTurns to 3-4

Missing Context:

Symptom: Rewritten queries still ambiguous
Cause:   Important entities not tracked
Fix:     Improve entity extraction prompt

Slow Performance:

Symptom: > 3s response time
Cause:   Rewriting model too large
Fix:     Switch to gpt-4o-mini from gpt-4o

Best Practices

1. Memory Management

✅ Keep memory focused on relevant context
✅ Clear memory for new topics
✅ Summarize long conversations
❌ Don't include entire conversation verbatim

2. Rewriting Prompts

✅ Be specific about what to preserve
✅ Include examples of good rewrites
✅ Instruct to add context, not change meaning
❌ Don't make rewrites too verbose

3. Testing

✅ Test with real conversation flows
✅ Compare results: original vs rewritten query
✅ A/B test Cedar vs Redwood
❌ Don't test only single-turn queries

4. Monitoring

✅ Track rewriting effectiveness
✅ Monitor latency impact
✅ Measure user satisfaction
✅ Review failed rewrites
❌ Don't assume it's working without data

Migration Guide

From Redwood to Cedar

Step 1: Enable Cedar

agent.strategyCode = "INFER_QUESTION"
agent.save()

Step 2: Monitor Initial Performance

First 24 hours: Watch latency
Compare accuracy metrics
Check user feedback

Step 3: Tune Configuration

// Start conservative
let memoryTurns = 3;
const maxMemoryTokens = 300;

// Adjust based on performance
if (accuracy < target) memoryTurns++;
if (latencySeconds > 2.5) memoryTurns--;
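The adjustment rule above can be wrapped in a function so it is applied once per review period. This sketch adds lower and upper bounds (1 and 10) so the value cannot drift indefinitely; those bounds and the function name are assumptions, not documented limits.

```python
def tune_memory_turns(
    memory_turns: int,
    accuracy: float,
    target_accuracy: float,
    latency_seconds: float,
    max_latency_seconds: float = 2.5,
    lower: int = 1,
    upper: int = 10,
) -> int:
    """Apply one tuning step: widen the window for accuracy, shrink it for latency."""
    if accuracy < target_accuracy:
        memory_turns += 1
    if latency_seconds > max_latency_seconds:
        memory_turns -= 1
    return max(lower, min(upper, memory_turns))
```

When both conditions hold at once the two adjustments cancel, which signals that memory length alone will not fix the problem.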

Step 4: Gradual Rollout

Day 1-7:   10% traffic
Day 8-14:  50% traffic
Day 15+:   100% traffic (if metrics good)
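A staged rollout like this is often implemented by hashing a stable identifier into a bucket, so a given session always gets the same strategy and stays on Cedar as the percentage grows. The sketch below assumes the session ID is that identifier; the function names are hypothetical.

```python
import hashlib


def rollout_percent(day: int) -> int:
    """Traffic share for Cedar on a given rollout day, per the schedule above."""
    if day <= 7:
        return 10
    if day <= 14:
        return 50
    return 100


def routes_to_cedar(session_id: str, day: int) -> bool:
    """Deterministically assign a session to Cedar based on a hash bucket (0-99)."""
    digest = hashlib.sha256(session_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < rollout_percent(day)
```

The schedule assumes metrics stay healthy at each stage; if they do not, hold the percentage at its current value rather than advancing.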

From Cedar to Cypress

When to upgrade:

Accuracy is critical
Budget allows for higher costs
Willing to accept 3-4s latency
Need tier-based source control

Code Examples

Using Cedar via API

curl -X POST https://api.twig.so/api/chat \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "What about pricing?",
    "agentId": "agent-123",
    "strategyCode": "INFER_QUESTION",
    "sessionId": "session-456",
    "memory": [
      {
        "role": "user",
        "content": "Tell me about the Enterprise plan"
      },
      {
        "role": "assistant",
        "content": "The Enterprise plan includes..."
      }
    ]
  }'
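The same request can be built in Python with only the standard library. The endpoint, headers, and field names are taken from the curl example above; `build_chat_request` is a hypothetical helper, and the sketch stops short of sending the request.

```python
import json
import urllib.request
from typing import Dict, List

API_URL = "https://api.twig.so/api/chat"  # endpoint from the curl example


def build_chat_request(
    api_key: str,
    prompt: str,
    agent_id: str,
    session_id: str,
    memory: List[Dict[str, str]],
) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the curl call."""
    payload = {
        "prompt": prompt,
        "agentId": agent_id,
        "strategyCode": "INFER_QUESTION",
        "sessionId": session_id,
        "memory": memory,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends it; the response body can then be decoded with `json.loads`.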

Response Format

{
  "response": "The Enterprise plan costs $299/month for up to 100 users...",
  "rewrittenQuery": "What is the pricing for the Enterprise plan?",
  "sources": [
    {
      "title": "Enterprise Pricing",
      "url": "https://example.com/pricing",
      "relevance": 0.96
    }
  ],
  "metadata": {
    "strategy": "INFER_QUESTION",
    "latency": 2.1,
    "tokensUsed": 2134,
    "rewritingLatency": 0.48,
    "confidence": 0.94
  }
}
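A client might pull a few fields out of this response for logging or monitoring. The sketch below assumes `latency` and `rewritingLatency` are both expressed in seconds, which the sample values suggest but this page does not state; `summarize_response` is a hypothetical helper.

```python
import json
from typing import Any, Dict


def summarize_response(raw_response: str) -> Dict[str, Any]:
    """Extract the rewritten query, rewriting share of latency, and best source."""
    data = json.loads(raw_response)
    metadata = data["metadata"]
    sources = data.get("sources", [])
    total = metadata["latency"]
    return {
        "rewritten_query": data.get("rewrittenQuery"),
        # Fraction of total latency spent on rewriting (assumes matching units).
        "rewriting_share": metadata["rewritingLatency"] / total if total else None,
        "top_source": max(sources, key=lambda s: s["relevance"])["title"] if sources else None,
    }
```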

Next Steps

Cypress Strategy - Maximum accuracy with reranking
Redwood Strategy - Fastest option for clear queries
Performance Tuning
Cost Optimization

Key Takeaways