Cedar - Context-Aware

Cedar is a balanced RAG strategy that enhances retrieval accuracy by rewriting user queries based on conversation context and memory before searching the vector database.

Overview

Cedar improves upon standard RAG by understanding conversation context:

  • Analyzes conversation history (memory)
  • Rewrites ambiguous queries to be more explicit
  • Maintains context across multiple turns
  • Better handles follow-up questions

Performance: ~2-3 seconds
Ideal for: Conversational queries, follow-up questions, ambiguous phrasing

How Cedar Works

Processing Flow

User Query: "What about pricing?"
     ↓
[1] Analyze Conversation Memory
    → Previous: "Tell me about the Enterprise plan"
     ↓
[2] Rewrite Query with Context
    → Rewritten: "What is the pricing for the Enterprise plan?"
     ↓
[3] Embed Rewritten Query → Vector [0.15, -0.43, 0.76, ...]
     ↓
[4] Vector Search (Pinecone/TigrisDB)
     ↓
[5] Retrieve Top 5-10 Documents
     ↓
[6] Build Context from Documents
     ↓
[7] LLM Completion (with context + original query)
     ↓
Response: "The Enterprise plan costs $299/month..."

Technical Details

Step 1: Memory Analysis

  • Reviews last 3-5 conversation turns
  • Identifies context: entities, topics, intents
  • Determines if query needs clarification
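The memory window from Step 1 can be sketched as a small helper. This is only an illustration, not Cedar's actual implementation: the name `recent_turns` is hypothetical, the message shape (`role`/`content` dictionaries) is borrowed from the `memory` array in the API example, and a "turn" is assumed to start at each user message.

```python
from typing import Dict, List

# Assumed message shape: {"role": "user" | "assistant", "content": "..."}
Message = Dict[str, str]


def recent_turns(memory: List[Message], max_turns: int = 3) -> List[Message]:
    """Return the messages that belong to the last `max_turns` turns.

    Assumption: a turn begins at each user message and includes any
    assistant replies that follow it.
    """
    if max_turns <= 0:
        return []
    # Indices at which a new turn begins.
    starts = [i for i, m in enumerate(memory) if m["role"] == "user"]
    if len(starts) <= max_turns:
        return list(memory)
    return memory[starts[-max_turns]:]
```

With five user/assistant exchanges in memory and `max_turns=3`, the helper returns the last six messages, starting at the third user message.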

Step 2: Query Rewriting

  • Model: gpt-4o-mini (fast, cost-effective)
  • Prompt: "Rewrite this query based on conversation history"
  • Output: More explicit, self-contained query

Example Rewrites:

Original: "How do I do that?"
Context: Previous discussion about password reset
Rewritten: "How do I reset my password?"

Original: "What about the other option?"
Context: Comparing Pro vs Enterprise plans, with Enterprise just discussed
Rewritten: "What features are included in the Pro plan?"

Original: "Is it available?"
Context: Asking about SSO feature
Rewritten: "Is SSO (Single Sign-On) available in the platform?"
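One way to picture Step 2 is as a function that packs the recent history and the new query into chat-style messages for the rewriting model. The sketch below only assembles the messages and does not call any model. The function name, the message layout, and the second sentence of the instruction are assumptions; only the first instruction sentence comes from the prompt quoted above.

```python
from typing import Dict, List

# First sentence is the prompt quoted in Step 2; the second is an assumed
# addition that asks for the "explicit, self-contained" output described there.
REWRITE_INSTRUCTION = (
    "Rewrite this query based on conversation history. "
    "Return a single explicit, self-contained query."
)


def build_rewrite_messages(
    history: List[Dict[str, str]], query: str
) -> List[Dict[str, str]]:
    """Assemble chat-style messages for a rewriting model (hypothetical layout)."""
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in history)
    user_block = f"Conversation history:\n{transcript}\n\nQuery: {query}"
    return [
        {"role": "system", "content": REWRITE_INSTRUCTION},
        {"role": "user", "content": user_block},
    ]
```

Keeping prompt assembly separate from the model call makes it easy to log and review exactly what the rewriting model saw when a rewrite goes wrong.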

Steps 3-7: Standard RAG Flow

Same as Redwood once the query has been rewritten.

Performance Characteristics

Latency Breakdown

Memory Analysis:       ~100ms
Query Rewriting:       ~500ms ← Additional cost vs Redwood
Query Embedding:       ~100ms
Vector Search:         ~200ms
Context Building:      ~50ms
LLM Completion:        ~800ms
Response Streaming:    ~200ms
────────────────────────────
Total:                 ~2.0s
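The arithmetic behind the breakdown above can be checked in a few lines. The stage names used as dictionary keys are illustrative; the millisecond values are copied from the breakdown.

```python
# Stage latencies in milliseconds, copied from the breakdown above.
STAGES_MS = {
    "memory_analysis": 100,
    "query_rewriting": 500,
    "query_embedding": 100,
    "vector_search": 200,
    "context_building": 50,
    "llm_completion": 800,
    "response_streaming": 200,
}

total_ms = sum(STAGES_MS.values())
rewrite_share = STAGES_MS["query_rewriting"] / total_ms

print(f"total: {total_ms} ms")                  # total: 1950 ms
print(f"rewriting share: {rewrite_share:.0%}")  # rewriting share: 26%
```

The stages sum to 1,950 ms, which the breakdown rounds to ~2.0s, and query rewriting alone accounts for roughly a quarter of that.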

Token Usage

Component           Tokens         Notes
────────────────────────────────────────────────────
Memory Context      100-300        Last 3-5 turns
Rewriting Prompt    50-100         Instructions + query
Rewritten Query     10-30          Output of rewriting
System Prompt       150-300        Agent instructions
Retrieved Context   800-1500       Top 5-10 documents
User Query          10-50          Original question
Response            150-400        Generated answer
Total               ~1,800-2,500   Per request

Cost Implications

Per 1,000 Requests (GPT-3.5-turbo):

  • Embedding: ~$0.01
  • Query Rewriting: ~$0.10 ← Additional cost
  • LLM Completion: ~$0.40
  • Vector Search: ~$0.05
  • Total: ~$0.56 (+55% vs Redwood)
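To turn the per-1,000 figures above into a budget estimate, a short calculation is enough. The Redwood figure of $0.36 per 1,000 requests is taken from the comparison table in this document; the helper name `monthly_cost` and the 50,000 requests/month volume (borrowed from the e-commerce case study) are illustrative choices.

```python
# Per-1,000-request costs in USD, copied from the list above.
CEDAR_COSTS = {
    "embedding": 0.01,
    "query_rewriting": 0.10,
    "llm_completion": 0.40,
    "vector_search": 0.05,
}
REDWOOD_PER_1K = 0.36  # from the Redwood comparison table in this document

cedar_per_1k = round(sum(CEDAR_COSTS.values()), 2)
# Relative increase over Redwood: about 0.556, quoted above as +55%.
increase = (cedar_per_1k - REDWOOD_PER_1K) / REDWOOD_PER_1K


def monthly_cost(per_1k: float, requests_per_month: int) -> float:
    """Estimated monthly spend in USD for a given request volume."""
    return round(per_1k * requests_per_month / 1000, 2)


cedar_monthly = monthly_cost(cedar_per_1k, 50_000)
redwood_monthly = monthly_cost(REDWOOD_PER_1K, 50_000)
```

At that volume the estimate comes to $28 per month for Cedar against $18 for Redwood, so the relative increase matters more than the absolute amount until traffic grows considerably.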

When to Use Cedar

✅ Ideal Use Cases

1. Conversational Support Bots

User: "What are your pricing plans?"
Agent: "We have Free, Pro, and Enterprise..."
User: "What's included in the middle one?" ✅
→ Cedar rewrites: "What's included in the Pro plan?"

2. Multi-Turn Consultations

User: "I need help with email integration"
Agent: "We support Gmail and Outlook..."
User: "How do I set up the second one?" ✅
→ Cedar rewrites: "How do I set up Outlook integration?"

3. Contextual Knowledge Queries

User: "Tell me about data sources"
Agent: "We support Google Drive, Confluence..."
User: "How often does it sync?" ✅
→ Cedar rewrites: "How often do data sources sync?"

4. Ambiguous Questions

User: "What's the difference?" ✅
→ Context: Comparing RAG strategies
→ Cedar rewrites: "What's the difference between Redwood and Cedar RAG strategies?"

❌ Not Ideal For

1. Clear, Self-Contained Questions

User: "What is your refund policy?" ❌
→ No rewriting needed, Redwood is faster
→ Cedar adds unnecessary latency

2. First-Time Queries (No Context)

User: "Hello, what can you help with?" ❌
→ No previous conversation to leverage
→ Redwood sufficient

3. Cost-Sensitive High-Volume

10,000+ queries/day with tight budget ❌
→ 55% higher cost than Redwood
→ Consider Redwood for majority, Cedar for subset

Configuration

Agent Settings

When using Cedar strategy:

{
  "strategyCode": "INFER_QUESTION",
  "topK": 5,                       // Number of documents (typically 5-10)
  "temperature": 0.7,              // LLM creativity
  "maxTokens": 500,                // Response length
  "model": "gpt-4o",               // Or gpt-3.5-turbo
  "memoryTurns": 3,                // Conversation turns to remember
  "rewritingModel": "gpt-4o-mini"  // Fast model for rewriting
}

Optimization Tips

1. Tune Memory Length

Too few (1-2):  Misses context
Sweet spot (3-5): Good balance
Too many (10+):  Noise, slower

2. Rewriting Prompt Quality

Good Prompt:
"Rewrite the query to be self-contained based on conversation history.
Focus on entities, topics, and intents mentioned earlier."

Bad Prompt:
"Make query better"  ← Too vague

3. Hybrid Approach

# Use Cedar only when needed
if has_conversation_history and query_is_ambiguous:
    use_cedar()
else:
    use_redwood()  # Faster for clear questions
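The routing idea above can be made concrete with a cheap heuristic. Everything in this sketch is an assumption rather than Cedar's own logic: the referential word list, the four-word cutoff, and the function names are hypothetical, and a production router would likely need tuning against real traffic.

```python
import re
from typing import Dict, List

# Hypothetical heuristic: words that usually point back at earlier turns.
_REFERENTIAL = re.compile(
    r"\b(it|that|this|those|these|one|other|second|first|former|latter)\b",
    re.IGNORECASE,
)


def query_is_ambiguous(query: str, max_words: int = 4) -> bool:
    """Guess whether a query depends on earlier context.

    Very short queries and queries containing referential words are
    treated as ambiguous.
    """
    return len(query.split()) <= max_words or bool(_REFERENTIAL.search(query))


def choose_strategy(query: str, history: List[Dict[str, str]]) -> str:
    """Return "cedar" only when history exists and the query looks ambiguous."""
    if history and query_is_ambiguous(query):
        return "cedar"
    return "redwood"
```

Under this heuristic, "What's included in the middle one?" is routed to Cedar when history exists, while "What is your refund policy?" stays on Redwood.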

Comparison with Other Strategies

vs. Redwood (Standard)

Metric           Redwood      Cedar          Advantage
──────────────────────────────────────────────────────
Speed            ~1.2s        ~2.0s          Redwood
Cost             $0.36/1k     $0.56/1k       Redwood
Conversational   ❌ Poor      ✅ Excellent   Cedar
Follow-ups       ❌ Poor      ✅ Excellent   Cedar
Clear queries    ✅ Perfect   ✅ Good        Redwood
Complexity       Low          Medium         Redwood

When to Switch:

  • Redwood → Cedar: When > 30% of queries are follow-ups
  • Cedar → Redwood: When speed is critical and questions are clear

vs. Cypress (Advanced)

Metric            Cedar      Cypress     Advantage
──────────────────────────────────────────────────
Speed             ~2.0s      ~3.5s       Cedar
Cost              $0.56/1k   $0.90/1k    Cedar
Accuracy          Good       Excellent   Cypress
Reranking         ❌ No      ✅ Yes      Cypress
Query Expansion   ❌ No      ✅ Yes      Cypress
Tier-Based        ❌ No      ✅ Yes      Cypress

When to Switch:

  • Cedar → Cypress: When accuracy is more important than speed/cost
  • Cypress → Cedar: When speed matters and accuracy is good enough

Real-World Performance

Case Study: E-commerce Customer Support

Setup:

  • 2,000 product FAQs
  • Average conversation: 3-4 turns
  • 50,000 queries/month
  • 70% are follow-up questions

Redwood Results (Before):

  • Average latency: 1.1s ✅
  • User satisfaction: 3.2/5 ❌
  • Common complaint: "Doesn't understand my follow-up"
  • Accuracy: 72%

Cedar Results (After):

  • Average latency: 2.1s (slower but acceptable)
  • User satisfaction: 4.4/5 ✅
  • Accuracy: 89% ✅
  • ROI: 37% fewer support escalations

Case Study: SaaS Documentation Bot

Setup:

  • 5,000 documentation pages
  • Technical audience
  • Average query: "How to configure X"
  • Follow-ups: "What about Y setting?"

Cedar Performance:

  • Latency: 1.9s (p95)
  • Follow-up handling: 94% success rate
  • Cost: $0.53/1k requests
  • User feedback: "Finally understands context!"

Key Insight: Cedar excels when users naturally have multi-turn conversations, even if each individual query takes slightly longer.

Advanced Features

Memory Summarization

When a conversation gets long, Cedar automatically summarizes it:

Turns 1-10: Full context (tokens: 2000)
Turns 11+:  Summarized (tokens: 300)

Summary Example:
"User is asking about Enterprise plan features,
specifically interested in SSO, API access, and
white-labeling. Budget conscious."

Configuration:

{
  "memoryTurns": 5,           // Remember last 5 turns
  "summarizeAfter": 10,       // Summarize after 10 turns
  "maxMemoryTokens": 500      // Max memory context
}
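The document does not spell out how `memoryTurns`, `summarizeAfter`, and `maxMemoryTokens` interact, so the sketch below shows one plausible reading only: recent turns are kept verbatim, older turns are replaced by a summary once the conversation exceeds `summarizeAfter`, and the oldest parts are dropped until the result fits the token budget. The `summarize` callable stands in for an LLM call, and the whitespace-based token estimate is a rough assumption.

```python
from typing import Callable, List


def approx_tokens(text: str) -> int:
    """Rough token estimate (whitespace words); a real tokenizer will differ."""
    return len(text.split())


def build_memory_context(
    turns: List[str],
    summarize: Callable[[List[str]], str],
    memory_turns: int = 5,
    summarize_after: int = 10,
    max_memory_tokens: int = 500,
) -> str:
    """Keep recent turns verbatim and summarize older ones in long conversations."""
    recent = turns[-memory_turns:] if memory_turns > 0 else []
    older = turns[: len(turns) - len(recent)]
    parts: List[str] = []
    if len(turns) > summarize_after and older:
        parts.append("Summary: " + summarize(older))
    parts.extend(recent)
    # Drop the oldest parts until the estimate fits the budget.
    while len(parts) > 1 and approx_tokens("\n".join(parts)) > max_memory_tokens:
        parts.pop(0)
    return "\n".join(parts)
```

For a 12-turn conversation with the defaults, the result is one summary line covering turns 1-7 followed by turns 8-12 verbatim.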

Contextual Entity Tracking

Cedar tracks entities across turns:

Turn 1: "Tell me about the Enterprise plan"
→ Tracked: product=Enterprise plan

Turn 2: "What's the pricing?"
→ Rewritten: "What is the pricing for the Enterprise plan?"
→ Uses tracked entity

Turn 3: "How about for 100 users?"
→ Rewritten: "What is the pricing for the Enterprise plan for 100 users?"
→ Compounds context

Intent Preservation

Cedar maintains user intent:

User: "I need to integrate with Google"
→ Intent: Integration inquiry
→ Entity: Google

User: "How long does setup take?"
→ Rewritten: "How long does Google integration setup take?"
→ Preserves integration intent

Monitoring Cedar

Key Metrics to Track

1. Rewriting Quality

Metric: % queries actually rewritten
Target: 40-60% (not everything needs rewriting)
Alert:  > 80% (too aggressive) or < 20% (not working)

2. Context Relevance

Metric: % rewritten queries with improved results
Target: > 85%
Measure: Compare retrieval results before/after rewriting

3. Latency Impact

Metric: Average rewriting latency
Target: < 600ms
Alert:  > 1s (investigate)

4. User Satisfaction

Metric: Thumbs up/down on responses
Compare: Sessions with > 3 turns vs single turn
Target:  Multi-turn satisfaction > 85%
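The thresholds for rewriting rate and rewriting latency listed above translate directly into simple checks. The function names and the intermediate "watch" label (for values between the target band and the alert threshold) are assumptions added for illustration.

```python
def rewrite_rate_status(rewritten: int, total: int) -> str:
    """Classify the share of rewritten queries against the thresholds above."""
    if total <= 0:
        raise ValueError("total must be positive")
    rate = rewritten / total
    if rate > 0.80 or rate < 0.20:
        return "alert"      # too aggressive, or not working
    if 0.40 <= rate <= 0.60:
        return "on_target"
    return "watch"          # assumed label for the in-between range


def rewrite_latency_status(avg_ms: float) -> str:
    """Classify average rewriting latency (target < 600 ms, alert > 1 s)."""
    if avg_ms > 1000:
        return "alert"
    return "on_target" if avg_ms < 600 else "watch"
```

Running both checks over a rolling window (for example, the last hour of traffic) gives an early signal before user satisfaction scores move.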

Common Issues

Rewriting Too Aggressively:

Symptom: Every query is rewritten
Cause:   Memory window too large
Fix:     Reduce memoryTurns to 3-4

Missing Context:

Symptom: Rewritten queries still ambiguous
Cause:   Important entities not tracked
Fix:     Improve entity extraction prompt

Slow Performance:

Symptom: > 3s response time
Cause:   Rewriting model too large
Fix:     Switch from gpt-4o to gpt-4o-mini

Best Practices

1. Memory Management

✅ Keep memory focused on relevant context
✅ Clear memory for new topics
✅ Summarize long conversations
❌ Don't include entire conversation verbatim

2. Rewriting Prompts

✅ Be specific about what to preserve
✅ Include examples of good rewrites
✅ Instruct to add context, not change meaning
❌ Don't make rewrites too verbose

3. Testing

✅ Test with real conversation flows
✅ Compare results: original vs rewritten query
✅ A/B test Cedar vs Redwood
❌ Don't test only single-turn queries

4. Monitoring

✅ Track rewriting effectiveness
✅ Monitor latency impact
✅ Measure user satisfaction
✅ Review failed rewrites
❌ Don't assume it's working without data

Migration Guide

From Redwood to Cedar

Step 1: Enable Cedar

agent.strategyCode = "INFER_QUESTION"
agent.save()

Step 2: Monitor Initial Performance

  • First 24 hours: Watch latency
  • Compare accuracy metrics
  • Check user feedback

Step 3: Tune Configuration

// Start conservative
memoryTurns: 3
maxMemoryTokens: 300

// Adjust based on performance
if (accuracy < target) memoryTurns++
if (latency > 2.5s) memoryTurns--
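The tuning rule above can be written as a small function that is safe to run repeatedly. Two details here are assumptions not stated in the guide: latency takes priority when both conditions fire (otherwise the two adjustments cancel out), and the result is clamped to a range of 1 to 10 turns.

```python
def tune_memory_turns(
    memory_turns: int,
    accuracy: float,
    target_accuracy: float,
    latency_s: float,
    max_latency_s: float = 2.5,
    lowest: int = 1,
    highest: int = 10,
) -> int:
    """Suggest the next memoryTurns value from observed accuracy and latency."""
    if latency_s > max_latency_s:
        memory_turns -= 1   # too slow: shrink the memory window first
    elif accuracy < target_accuracy:
        memory_turns += 1   # fast enough but inaccurate: add context
    return max(lowest, min(highest, memory_turns))
```

Starting from the conservative value of 3, an accuracy shortfall at acceptable latency moves the setting to 4, while a latency above 2.5s moves it to 2.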

Step 4: Gradual Rollout

Day 1-7:   10% traffic
Day 8-14:  50% traffic
Day 15+:   100% traffic (if metrics good)
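A common way to implement a percentage rollout like the schedule above is to hash a stable identifier into a bucket, so each session keeps the same assignment as the percentage grows. This is a generic technique, not something the guide prescribes; the function names and the choice of the session ID as the hashing key are assumptions.

```python
import hashlib


def rollout_percent(day: int) -> int:
    """Traffic share for Cedar on a given rollout day, per the schedule above."""
    if day < 1:
        return 0
    if day <= 7:
        return 10
    if day <= 14:
        return 50
    return 100


def use_cedar(session_id: str, day: int) -> bool:
    """Deterministically assign a session to Cedar based on a hash bucket."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable value in 0..99
    return bucket < rollout_percent(day)
```

Because the bucket never changes for a given session ID, any session routed to Cedar at 10% stays on Cedar at 50% and 100%, which keeps before/after comparisons clean.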

From Cedar to Cypress

When to upgrade:

  • Accuracy is critical
  • Budget allows for higher costs
  • Willing to accept 3-4s latency
  • Need tier-based source control

Code Examples

Using Cedar via API

curl -X POST https://api.twig.so/api/chat \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "What about pricing?",
    "agentId": "agent-123",
    "strategyCode": "INFER_QUESTION",
    "sessionId": "session-456",
    "memory": [
      {
        "role": "user",
        "content": "Tell me about the Enterprise plan"
      },
      {
        "role": "assistant",
        "content": "The Enterprise plan includes..."
      }
    ]
  }'
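For callers working in Python, the same request can be built with the standard library. The endpoint, headers, and field names are copied from the curl example above; the function name `build_chat_request` is hypothetical, and the sketch stops short of sending the request so it can be inspected first.

```python
import json
import urllib.request
from typing import Dict, List

API_URL = "https://api.twig.so/api/chat"  # endpoint from the curl example


def build_chat_request(
    api_key: str,
    prompt: str,
    agent_id: str,
    session_id: str,
    memory: List[Dict[str, str]],
) -> urllib.request.Request:
    """Build (but do not send) a Cedar chat request."""
    payload = {
        "prompt": prompt,
        "agentId": agent_id,
        "strategyCode": "INFER_QUESTION",
        "sessionId": session_id,
        "memory": memory,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Sending performs a real network call:
# with urllib.request.urlopen(request, timeout=30) as resp:
#     body = json.load(resp)
```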

Response Format

{
  "response": "The Enterprise plan costs $299/month for up to 100 users...",
  "rewrittenQuery": "What is the pricing for the Enterprise plan?",
  "sources": [
    {
      "title": "Enterprise Pricing",
      "url": "https://example.com/pricing",
      "relevance": 0.96
    }
  ],
  "metadata": {
    "strategy": "INFER_QUESTION",
    "latency": 2.1,
    "tokensUsed": 2134,
    "rewritingLatency": 0.48,
    "confidence": 0.94
  }
}
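A response in the format above can be reduced to the fields most useful for monitoring. The sketch assumes `latency` and `rewritingLatency` are both expressed in seconds, which the sample values suggest but the document does not state; the function name and the returned keys are illustrative.

```python
import json
from typing import Any, Dict


def summarize_response(raw: str) -> Dict[str, Any]:
    """Extract the rewritten query, best source, and rewriting share of latency."""
    data = json.loads(raw)
    meta = data.get("metadata", {})
    latency = meta.get("latency")
    rewriting = meta.get("rewritingLatency")
    # Share of total latency spent on rewriting (assumes both are in seconds).
    share = rewriting / latency if latency and rewriting is not None else None
    top_source = max(
        data.get("sources", []),
        key=lambda s: s.get("relevance", 0.0),
        default=None,
    )
    return {
        "rewritten_query": data.get("rewrittenQuery"),
        "top_source": top_source,
        "rewriting_share": share,
    }
```

Logging the original prompt alongside `rewritten_query` makes it straightforward to review failed rewrites later.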

Last updated January 25, 2026