Cedar - Context-Aware
Cedar is a balanced RAG strategy that enhances retrieval accuracy by rewriting user queries based on conversation context and memory before searching the vector database.
Overview
Cedar improves upon standard RAG by understanding conversation context:
- Analyzes conversation history (memory)
- Rewrites ambiguous queries to be more explicit
- Maintains context across multiple turns
- Better handles follow-up questions
Performance: ~2-3 seconds
Ideal for: Conversational queries, follow-up questions, ambiguous phrasing
How Cedar Works
Processing Flow
User Query: "What about pricing?"
↓
[1] Analyze Conversation Memory
→ Previous: "Tell me about the Enterprise plan"
↓
[2] Rewrite Query with Context
→ Rewritten: "What is the pricing for the Enterprise plan?"
↓
[3] Embed Rewritten Query → Vector [0.15, -0.43, 0.76, ...]
↓
[4] Vector Search (Pinecone/TigrisDB)
↓
[5] Retrieve Top 5-10 Documents
↓
[6] Build Context from Documents
↓
[7] LLM Completion (with context + original query)
↓
Response: "The Enterprise plan costs $299/month..."
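The flow above can be sketched as a single function. This is a minimal illustration, not Cedar's actual implementation: `cedar_answer` and the four injected callables (`rewrite`, `embed`, `search`, `complete`) are hypothetical names, and the lambdas are stubs standing in for the rewriting model, the embedding model, the vector database, and the completion model.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def cedar_answer(
    query: str,
    history: List[Message],
    rewrite: Callable[[str, List[Message]], str],
    embed: Callable[[str], List[float]],
    search: Callable[[List[float], int], List[str]],
    complete: Callable[[str, str], str],
    top_k: int = 5,
) -> Dict[str, str]:
    """Run the seven steps in order; every component is injected (hypothetical API)."""
    rewritten = rewrite(query, history)   # steps 1-2: use memory to rewrite the query
    vector = embed(rewritten)             # step 3: embed the rewritten query
    documents = search(vector, top_k)     # steps 4-5: vector search, top-k documents
    context = "\n\n".join(documents)      # step 6: build context from documents
    answer = complete(context, query)     # step 7: completion sees the ORIGINAL query
    return {"rewrittenQuery": rewritten, "response": answer}


# Stub components so the sketch runs without any external service.
result = cedar_answer(
    query="What about pricing?",
    history=[{"role": "user", "content": "Tell me about the Enterprise plan"}],
    rewrite=lambda q, h: "What is the pricing for the Enterprise plan?",
    embed=lambda text: [0.15, -0.43, 0.76],
    search=lambda vec, k: ["Enterprise pricing doc", "Plan comparison doc"][:k],
    complete=lambda ctx, q: f"Answer to '{q}' using {len(ctx)} chars of context",
)
print(result["rewrittenQuery"])
```

Note that only retrieval uses the rewritten query; the completion step receives the retrieved context together with the user's original wording, as in step 7.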
Technical Details
Step 1: Memory Analysis
- Reviews last 3-5 conversation turns
- Identifies context: entities, topics, intents
- Determines if query needs clarification
Step 2: Query Rewriting
- Model: gpt-4o-mini (fast, cost-effective)
- Prompt: "Rewrite this query based on conversation history"
- Output: More explicit, self-contained query
Example Rewrites:
Original: "How do I do that?"
Context: Previous discussion about password reset
Rewritten: "How do I reset my password?"
Original: "What about the other option?"
Context: Comparing Pro vs Enterprise plans
Rewritten: "What features are included in the Pro plan?"
Original: "Is it available?"
Context: Asking about SSO feature
Rewritten: "Is SSO (Single Sign-On) available in the platform?"
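One way to assemble the input for the rewriting model is sketched below. The helper `build_rewrite_prompt` and the prompt layout are assumptions made for illustration; the instruction text is borrowed from the "Good Prompt" example in the Optimization Tips on this page, and the exact prompt Cedar sends is not documented here.

```python
from typing import Dict, List

# Instruction text taken from the "Good Prompt" example on this page.
REWRITE_INSTRUCTION = (
    "Rewrite the query to be self-contained based on conversation history. "
    "Focus on entities, topics, and intents mentioned earlier."
)


def build_rewrite_prompt(
    history: List[Dict[str, str]], query: str, memory_turns: int = 3
) -> str:
    """Assemble the text sent to the rewriting model (hypothetical helper)."""
    # One turn = a user message plus an assistant reply, so keep 2 * N messages.
    # The guard matters because history[-0:] would return the whole list.
    recent = history[-2 * memory_turns:] if memory_turns > 0 else []
    parts = [REWRITE_INSTRUCTION, "", "Conversation history:"]
    parts += [f"{m['role']}: {m['content']}" for m in recent]
    parts += ["", f"Query: {query}", "Rewritten query:"]
    return "\n".join(parts)


history = [
    {"role": "user", "content": "Tell me about the Enterprise plan"},
    {"role": "assistant", "content": "The Enterprise plan includes..."},
]
prompt = build_rewrite_prompt(history, "What about pricing?")
print(prompt)
```

The resulting string would be sent to the rewriting model (gpt-4o-mini in the configuration on this page); the model's reply is the self-contained query used for retrieval.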
Steps 3-7: Standard RAG Flow
Same as Redwood once the query has been rewritten.
Performance Characteristics
Latency Breakdown
Memory Analysis: ~100ms
Query Rewriting: ~500ms ← Additional cost vs Redwood
Query Embedding: ~100ms
Vector Search: ~200ms
Context Building: ~50ms
LLM Completion: ~800ms
Response Streaming: ~200ms
────────────────────────────
Total: ~2.0s
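As a quick check of the breakdown, the stage figures can be summed directly. The dictionary keys are illustrative names; the values are the approximate latencies listed above.

```python
# Stage latencies in milliseconds, copied from the breakdown above.
CEDAR_STAGES_MS = {
    "memory_analysis": 100,
    "query_rewriting": 500,
    "query_embedding": 100,
    "vector_search": 200,
    "context_building": 50,
    "llm_completion": 800,
    "response_streaming": 200,
}

total_ms = sum(CEDAR_STAGES_MS.values())
rewrite_share = CEDAR_STAGES_MS["query_rewriting"] / total_ms

print(f"total: {total_ms} ms")                  # 1950 ms, i.e. ~2.0s
print(f"rewriting share: {rewrite_share:.0%}")  # about 26% of the total
```

Query rewriting accounts for roughly a quarter of the end-to-end time, which is why the rewriting model's speed is the main tuning lever.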
Token Usage
| Component | Tokens | Notes |
|---|---|---|
| Memory Context | 100-300 | Last 3-5 turns |
| Rewriting Prompt | 50-100 | Instructions + query |
| Rewritten Query | 10-30 | Output of rewriting |
| System Prompt | 150-300 | Agent instructions |
| Retrieved Context | 800-1500 | Top 5-10 documents |
| User Query | 10-50 | Original question |
| Response | 150-400 | Generated answer |
| Total | ~1,800-2,500 | Typical per request; the component ranges sum to 1,270-2,680 |
Cost Implications
Per 1,000 Requests (GPT-3.5-turbo):
- Embedding: ~$0.01
- Query Rewriting: ~$0.10 ← Additional cost
- LLM Completion: ~$0.40
- Vector Search: ~$0.05
- Total: ~$0.56 (+55% vs Redwood)
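The arithmetic behind these figures can be reproduced as follows. The Redwood figure of $0.36 per 1,000 requests comes from the comparison table later on this page, and `monthly_cost` is a hypothetical helper added for illustration.

```python
# Cost per 1,000 requests in USD, copied from the list above.
CEDAR_COST_PER_1K = {
    "embedding": 0.01,
    "query_rewriting": 0.10,
    "llm_completion": 0.40,
    "vector_search": 0.05,
}
REDWOOD_COST_PER_1K = 0.36  # From the Redwood comparison table on this page.

cedar_total = round(sum(CEDAR_COST_PER_1K.values()), 2)
increase = (cedar_total - REDWOOD_COST_PER_1K) / REDWOOD_COST_PER_1K


def monthly_cost(queries_per_month: int, cost_per_1k: float) -> float:
    """Scale a per-1,000-request cost to a monthly query volume."""
    return round(queries_per_month / 1000 * cost_per_1k, 2)


print(f"Cedar: ${cedar_total:.2f} per 1k requests")
print(f"Increase vs Redwood: {increase:.1%}")
print(f"50,000 queries/month on Cedar: ${monthly_cost(50_000, cedar_total):.2f}")
print(f"50,000 queries/month on Redwood: ${monthly_cost(50_000, REDWOOD_COST_PER_1K):.2f}")
```

The unrounded increase is about 55.6%, which this page quotes as +55%. At the 50,000 queries/month volume used in the e-commerce case study, the difference works out to about $10 per month.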
When to Use Cedar
✅ Ideal Use Cases
1. Conversational Support Bots
User: "What are your pricing plans?"
Agent: "We have Free, Pro, and Enterprise..."
User: "What's included in the middle one?" ✅
→ Cedar rewrites: "What's included in the Pro plan?"
2. Multi-Turn Consultations
User: "I need help with email integration"
Agent: "We support Gmail and Outlook..."
User: "How do I set up the second one?" ✅
→ Cedar rewrites: "How do I set up Outlook integration?"
3. Contextual Knowledge Queries
User: "Tell me about data sources"
Agent: "We support Google Drive, Confluence..."
User: "How often does it sync?" ✅
→ Cedar rewrites: "How often do data sources sync?"
4. Ambiguous Questions
User: "What's the difference?" ✅
→ Context: Comparing RAG strategies
→ Cedar rewrites: "What's the difference between Redwood and Cedar RAG strategies?"
❌ Not Ideal For
1. Clear, Self-Contained Questions
User: "What is your refund policy?" ❌
→ No rewriting needed, Redwood is faster
→ Cedar adds unnecessary latency
2. First-Time Queries (No Context)
User: "Hello, what can you help with?" ❌
→ No previous conversation to leverage
→ Redwood sufficient
3. Cost-Sensitive High-Volume
10,000+ queries/day with tight budget ❌
→ 55% higher cost than Redwood
→ Consider Redwood for majority, Cedar for subset
Configuration
Agent Settings
When using the Cedar strategy:
{
  "strategyCode": "INFER_QUESTION",
  "topK": 5,                        // Number of documents to retrieve (typically 5-10)
  "temperature": 0.7,               // LLM creativity
  "maxTokens": 500,                 // Response length
  "model": "gpt-4o",                // Or gpt-3.5-turbo
  "memoryTurns": 3,                 // Conversation turns to remember
  "rewritingModel": "gpt-4o-mini"   // Fast model for rewriting
}
Optimization Tips
1. Tune Memory Length
Too few (1-2): Misses context
Sweet spot (3-5): Good balance
Too many (10+): Noise, slower
2. Rewriting Prompt Quality
Good Prompt:
"Rewrite the query to be self-contained based on conversation history.
Focus on entities, topics, and intents mentioned earlier."
Bad Prompt:
"Make query better" ← Too vague
3. Hybrid Approach
# Use Cedar only when needed
if has_conversation_history and query_is_ambiguous:
    use_cedar()
else:
    use_redwood()  # Faster for clear questions
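The routing check above leaves open how "ambiguous" is decided. The sketch below uses a simple heuristic as one possibility: a query counts as ambiguous if it is very short or contains a word that usually points back at earlier conversation. The word list, the `min_words` threshold, and both function names are assumptions for illustration, not part of Cedar.

```python
import re

# Words that usually refer back to something said earlier (illustrative list).
REFERENTIAL = re.compile(
    r"\b(it|that|this|those|these|they|them|one|other|same|first|second|middle)\b",
    re.IGNORECASE,
)


def query_is_ambiguous(query: str, min_words: int = 4) -> bool:
    """Heuristic: very short queries or back-references likely need rewriting."""
    return len(query.split()) < min_words or bool(REFERENTIAL.search(query))


def choose_strategy(query: str, history: list) -> str:
    """Route to Cedar only when there is history and the query looks ambiguous."""
    if history and query_is_ambiguous(query):
        return "cedar"
    return "redwood"  # Faster for clear or first-turn questions


history = [{"role": "user", "content": "Tell me about the Enterprise plan"}]
print(choose_strategy("What about pricing?", history))          # cedar
print(choose_strategy("What is your refund policy?", history))  # redwood
print(choose_strategy("How do I do that?", []))                 # redwood (no history)
```

A keyword heuristic like this is cheap but will misclassify some queries; a small classifier or the rewriting model itself could make the decision at the cost of extra latency.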
Comparison with Other Strategies
vs. Redwood (Standard)
| Metric | Redwood | Cedar | Advantage |
|---|---|---|---|
| Speed | ~1.2s | ~2.0s | Redwood |
| Cost | $0.36/1k | $0.56/1k | Redwood |
| Conversational | ❌ Poor | ✅ Excellent | Cedar |
| Follow-ups | ❌ Poor | ✅ Excellent | Cedar |
| Clear queries | ✅ Perfect | ✅ Good | Redwood |
| Complexity | Low | Medium | Redwood |
When to Switch:
- Redwood → Cedar: When > 30% of queries are follow-ups
- Cedar → Redwood: When speed is critical and questions are clear
vs. Cypress (Advanced)
| Metric | Cedar | Cypress | Advantage |
|---|---|---|---|
| Speed | ~2.0s | ~3.5s | Cedar |
| Cost | $0.56/1k | $0.90/1k | Cedar |
| Accuracy | Good | Excellent | Cypress |
| Reranking | ❌ No | ✅ Yes | Cypress |
| Query Expansion | ❌ No | ✅ Yes | Cypress |
| Tier-Based | ❌ No | ✅ Yes | Cypress |
When to Switch:
- Cedar → Cypress: When accuracy is more important than speed/cost
- Cypress → Cedar: When speed matters and accuracy is good enough
Real-World Performance
Case Study: E-commerce Customer Support
Setup:
- 2,000 product FAQs
- Average conversation: 3-4 turns
- 50,000 queries/month
- 70% are follow-up questions
Redwood Results (Before):
- Average latency: 1.1s ✅
- User satisfaction: 3.2/5 ❌
- Common complaint: "Doesn't understand my follow-up"
- Accuracy: 72%
Cedar Results (After):
- Average latency: 2.1s (slower but acceptable)
- User satisfaction: 4.4/5 ✅
- Accuracy: 89% ✅
- ROI: 37% fewer support escalations
Case Study: SaaS Documentation Bot
Setup:
- 5,000 documentation pages
- Technical audience
- Average query: "How to configure X"
- Follow-ups: "What about Y setting?"
Cedar Performance:
- Latency: 1.9s (p95)
- Follow-up handling: 94% success rate
- Cost: $0.53/1k requests
- User feedback: "Finally understands context!"
Key Insight: Cedar excels when users naturally have multi-turn conversations, even if each individual query takes slightly longer.
Advanced Features
Memory Summarization
When a conversation gets long, Cedar automatically summarizes older turns:
Turns 1-10: Full context (tokens: 2000)
Turns 11+: Summarized (tokens: 300)
Summary Example:
"User is asking about Enterprise plan features,
specifically interested in SSO, API access, and
white-labeling. Budget conscious."
Configuration:
{
  "memoryTurns": 5,        // Remember last 5 turns
  "summarizeAfter": 10,    // Summarize after 10 turns
  "maxMemoryTokens": 500   // Max memory context
}
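A sketch of how these three settings might interact is shown below. It reflects one reading of the configuration, not Cedar's documented behavior: the last `memory_turns` turns are kept verbatim, older turns are replaced by a summary once the conversation exceeds `summarize_after` turns, and the oldest verbatim turns are dropped if the result exceeds `max_memory_tokens`. The `summarize` callable stands in for an LLM call, and `approx_tokens` is a rough stand-in for a real tokenizer.

```python
from typing import Callable, List


def approx_tokens(text: str) -> int:
    # Rough rule of thumb (about 4 characters per token); use a real tokenizer in practice.
    return max(1, len(text) // 4)


def build_memory(
    turns: List[str],
    summarize: Callable[[List[str]], str],
    memory_turns: int = 5,
    summarize_after: int = 10,
    max_memory_tokens: int = 500,
) -> List[str]:
    """Return the memory entries to include in the rewriting prompt (hypothetical)."""
    if memory_turns < 1:
        raise ValueError("memory_turns must be at least 1")
    memory = list(turns[-memory_turns:])   # recent turns, kept verbatim
    older = turns[:-memory_turns]          # everything before the recent window
    first_turn = 0                         # index of the oldest verbatim turn
    if len(turns) > summarize_after and older:
        memory.insert(0, "Summary: " + summarize(older))
        first_turn = 1
    # Enforce the token budget by dropping the oldest verbatim turns first.
    while (
        len(memory) > first_turn + 1
        and sum(approx_tokens(m) for m in memory) > max_memory_tokens
    ):
        memory.pop(first_turn)
    return memory


turns = [f"turn {i}" for i in range(1, 13)]            # a 12-turn conversation
memory = build_memory(turns, lambda older: f"{len(older)} earlier turns")
print(memory)
```

With 12 turns and the defaults, the result is one summary entry covering turns 1-7 followed by turns 8-12 verbatim.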
Contextual Entity Tracking
Cedar tracks entities across turns:
Turn 1: "Tell me about the Enterprise plan"
→ Tracked: product=Enterprise plan
Turn 2: "What's the pricing?"
→ Rewritten: "What is the pricing for the Enterprise plan?"
→ Uses tracked entity
Turn 3: "How about for 100 users?"
→ Rewritten: "What is the pricing for the Enterprise plan for 100 users?"
→ Compounds context
Intent Preservation
Cedar maintains user intent:
User: "I need to integrate with Google"
→ Intent: Integration inquiry
→ Entity: Google
User: "How long does setup take?"
→ Rewritten: "How long does Google integration setup take?"
→ Preserves integration intent
Monitoring Cedar
Key Metrics to Track
1. Rewriting Quality
Metric: % queries actually rewritten
Target: 40-60% (not everything needs rewriting)
Alert: > 80% (too aggressive) or < 20% (not working)
2. Context Relevance
Metric: % rewritten queries with improved results
Target: > 85%
Measure: Compare retrieval results before/after rewriting
3. Latency Impact
Metric: Average rewriting latency
Target: < 600ms
Alert: > 1s (investigate)
4. User Satisfaction
Metric: Thumbs up/down on responses
Compare: Sessions with > 3 turns vs single turn
Target: Multi-turn satisfaction > 85%
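The thresholds for the first and third metrics can be encoded as small checks. The function names and status labels are hypothetical; the numeric thresholds are the targets and alert levels listed above.

```python
def rewrite_rate_status(rewritten: int, total: int) -> str:
    """Classify the share of queries that were actually rewritten."""
    if total == 0:
        return "no_data"
    rate = rewritten / total
    if rate > 0.80:
        return "alert_too_aggressive"
    if rate < 0.20:
        return "alert_not_working"
    if 0.40 <= rate <= 0.60:
        return "on_target"
    return "watch"  # Between an alert threshold and the target band


def rewriting_latency_status(avg_ms: float) -> str:
    """Classify average rewriting latency against the 600 ms target and 1 s alert."""
    if avg_ms > 1000:
        return "alert"
    if avg_ms < 600:
        return "on_target"
    return "watch"


print(rewrite_rate_status(rewritten=520, total=1000))  # on_target
print(rewrite_rate_status(rewritten=900, total=1000))  # alert_too_aggressive
print(rewriting_latency_status(480))                   # on_target
```

The "watch" state covers values that miss the target but have not crossed an alert threshold, for example a rewrite rate of 70%.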
Common Issues
Rewriting Too Aggressively:
Symptom: Every query is rewritten
Cause: Memory window too large
Fix: Reduce memoryTurns to 3-4
Missing Context:
Symptom: Rewritten queries still ambiguous
Cause: Important entities not tracked
Fix: Improve entity extraction prompt
Slow Performance:
Symptom: > 3s response time
Cause: Rewriting model too large
Fix: Switch to gpt-4o-mini from gpt-4o
Best Practices
1. Memory Management
- ✅ Keep memory focused on relevant context
- ✅ Clear memory for new topics
- ✅ Summarize long conversations
- ❌ Don't include entire conversation verbatim
2. Rewriting Prompts
- ✅ Be specific about what to preserve
- ✅ Include examples of good rewrites
- ✅ Instruct to add context, not change meaning
- ❌ Don't make rewrites too verbose
3. Testing
- ✅ Test with real conversation flows
- ✅ Compare results: original vs rewritten query
- ✅ A/B test Cedar vs Redwood
- ❌ Don't test only single-turn queries
4. Monitoring
- ✅ Track rewriting effectiveness
- ✅ Monitor latency impact
- ✅ Measure user satisfaction
- ✅ Review failed rewrites
- ❌ Don't assume it's working without data
Migration Guide
From Redwood to Cedar
Step 1: Enable Cedar
agent.strategyCode = "INFER_QUESTION"
agent.save()
Step 2: Monitor Initial Performance
- First 24 hours: Watch latency
- Compare accuracy metrics
- Check user feedback
Step 3: Tune Configuration
// Start conservative
memoryTurns: 3
maxMemoryTokens: 300
// Adjust based on performance
if (accuracy < target) memoryTurns++
if (latency > 2.5s) memoryTurns--
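The adjustment rule above can be written as a runnable function. This is a sketch: `tune_memory_turns` is a hypothetical helper, and the lower and upper bounds (1 and 10) are assumptions added so repeated adjustments cannot run away; the 2.5s latency limit is the one shown above.

```python
def tune_memory_turns(
    memory_turns: int,
    accuracy: float,
    target_accuracy: float,
    latency_s: float,
    max_latency_s: float = 2.5,
    lower: int = 1,    # Assumed floor
    upper: int = 10,   # Assumed ceiling
) -> int:
    """Apply one tuning step and return the new memoryTurns value."""
    if accuracy < target_accuracy:
        memory_turns += 1   # More context may improve rewrites
    if latency_s > max_latency_s:
        memory_turns -= 1   # Less context keeps prompts short
    return max(lower, min(upper, memory_turns))


print(tune_memory_turns(3, accuracy=0.80, target_accuracy=0.85, latency_s=2.0))  # 4
print(tune_memory_turns(3, accuracy=0.90, target_accuracy=0.85, latency_s=3.0))  # 2
```

When accuracy is below target and latency is above the limit at the same time, the two adjustments cancel out, which signals that memory length alone will not fix both problems.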
Step 4: Gradual Rollout
Day 1-7: 10% traffic
Day 8-14: 50% traffic
Day 15+: 100% traffic (if metrics good)
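A staged rollout like this is often implemented by hashing a stable identifier into a bucket, so that a given session stays on the same strategy for the whole stage. The sketch below assumes the session ID is that identifier; `rollout_percent` and `routes_to_cedar` are hypothetical names.

```python
import hashlib


def rollout_percent(day: int) -> int:
    """Traffic share for Cedar on a given rollout day, per the schedule above."""
    if day < 1:
        return 0
    if day <= 7:
        return 10
    if day <= 14:
        return 50
    return 100


def routes_to_cedar(session_id: str, day: int) -> bool:
    """Deterministically assign a session to Cedar based on a hash bucket (0-99)."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < rollout_percent(day)


sessions = [f"session-{i}" for i in range(1000)]
for day in (1, 8, 15):
    share = sum(routes_to_cedar(s, day) for s in sessions) / len(sessions)
    print(f"day {day}: {share:.0%} of sessions on Cedar")
```

Because the bucket depends only on the session ID, any session routed to Cedar at 10% stays on Cedar at 50% and 100%, which keeps before/after comparisons clean.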
From Cedar to Cypress
When to upgrade:
- Accuracy is critical
- Budget allows for higher costs
- Willing to accept 3-4s latency
- Need tier-based source control
Code Examples
Using Cedar via API
curl -X POST https://api.twig.so/api/chat \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "What about pricing?",
    "agentId": "agent-123",
    "strategyCode": "INFER_QUESTION",
    "sessionId": "session-456",
    "memory": [
      {
        "role": "user",
        "content": "Tell me about the Enterprise plan"
      },
      {
        "role": "assistant",
        "content": "The Enterprise plan includes..."
      }
    ]
  }'
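The same request can be built from Python using only the standard library. This sketch mirrors the endpoint, headers, and body fields of the curl example; `build_chat_request` is a hypothetical helper, and the request is constructed but not sent.

```python
import json
import urllib.request
from typing import Dict, List

API_URL = "https://api.twig.so/api/chat"  # Endpoint from the curl example above.


def build_chat_request(
    api_key: str,
    prompt: str,
    agent_id: str,
    session_id: str,
    memory: List[Dict[str, str]],
) -> urllib.request.Request:
    """Build (but do not send) a Cedar chat request."""
    payload = {
        "prompt": prompt,
        "agentId": agent_id,
        "strategyCode": "INFER_QUESTION",
        "sessionId": session_id,
        "memory": memory,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    api_key="YOUR_API_KEY",
    prompt="What about pricing?",
    agent_id="agent-123",
    session_id="session-456",
    memory=[
        {"role": "user", "content": "Tell me about the Enterprise plan"},
        {"role": "assistant", "content": "The Enterprise plan includes..."},
    ],
)
# To send it:
# with urllib.request.urlopen(req, timeout=30) as resp:
#     body = json.load(resp)
```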
Response Format
{
  "response": "The Enterprise plan costs $299/month for up to 100 users...",
  "rewrittenQuery": "What is the pricing for the Enterprise plan?",
  "sources": [
    {
      "title": "Enterprise Pricing",
      "url": "https://example.com/pricing",
      "relevance": 0.96
    }
  ],
  "metadata": {
    "strategy": "INFER_QUESTION",
    "latency": 2.1,
    "tokensUsed": 2134,
    "rewritingLatency": 0.48,
    "confidence": 0.94
  }
}
Next Steps
- Cypress Strategy - Maximum accuracy with reranking
- Redwood Strategy - Fastest option for clear queries
- Performance Tuning
- Cost Optimization
Last updated January 25, 2026


