Cypress - Advanced

Cypress is the most sophisticated RAG strategy, combining query expansion, tier-based source retrieval, and automatic reranking to deliver the highest accuracy.

Key Takeaways

  • Overview
  • How Cypress Works
  • Unique Features
  • Performance Characteristics
  • When to Use Cypress
  • Configuration

Overview

Cypress implements multiple optimization techniques:

  • Query Expansion: Adds synonyms and related terms for better semantic matching
  • Tier-Based Retrieval: Organizes sources by priority while treating all equally in reranking
  • Automatic Reranking: Uses a cross-encoder model to improve precision
  • Higher Retrieval Volume: Fetches more candidates (50 vs 10) before filtering

Performance: ~3-4 seconds

Ideal for: Complex queries requiring maximum accuracy, diverse terminology, high-stakes decisions

How Cypress Works

Processing Flow

User Query: "password reset"
     ↓
[1] Analyze Conversation Memory (if available)
     ↓
[2] Query Expansion for Vector Retrieval
    → "password reset, change password, recover account,
       reset credentials, account recovery, password recovery"
     ↓
[3] Embed Expanded Query → Vector
     ↓
[4] Tier 1 Retrieval (Official docs, high-priority)
    → Fetch top 50 results per source
     ↓
[5] Tier 2 Retrieval (Community content, secondary)
    → Fetch top 50 results per source
     ↓
[6] Combine All Results (up to 100 documents)
     ↓
[7] Automatic Reranking (bge-reranker-v2-m3)
    → Rerank to top 10 most relevant
     ↓
[8] Build Context from Top 10
     ↓
[9] Rewrite Prompt with Context (for LLM)
     ↓
[10] LLM Completion
     ↓
[11] Clean Response (remove artifacts)
     ↓
Response: "To reset your password..."

Unique Features

1. Query Expansion for Retrieval

Cypress expands queries before vector search:

Original Query:

"reset password"

Expanded Query:

"reset password, change password, recover account access, 
password reset process, account recovery, reset credentials,
forgotten password, password recovery, unlock account"

How it works:

  • Uses gpt-4o-mini for fast expansion
  • Adds synonyms and related terms
  • Includes alternative phrasings
  • Adds domain-specific terminology

Why it matters: Improves recall by matching documents that use different terminology than the user's query.
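The expansion step can be sketched as follows. This is a minimal illustration, not the Cypress implementation: `build_expansion_prompt`, `expand_query`, and the `complete` callable are hypothetical names, and the prompt wording is an assumption. In production, `complete` would wrap a call to gpt-4o-mini; here a stub stands in so the sketch runs offline.

```python
from typing import Callable, List


def build_expansion_prompt(query: str) -> str:
    """Build an instruction asking the model for alternative phrasings (assumed wording)."""
    return (
        "List synonyms, related terms, and alternative phrasings for the "
        "search query below as a single comma-separated line.\n"
        f"Query: {query}"
    )


def expand_query(query: str, complete: Callable[[str], str]) -> str:
    """Return the original query followed by de-duplicated expansion terms."""
    raw = complete(build_expansion_prompt(query))
    terms: List[str] = [query]
    seen = {query.lower()}
    for term in raw.split(","):
        term = term.strip()
        # Skip empty fragments and case-insensitive duplicates.
        if term and term.lower() not in seen:
            seen.add(term.lower())
            terms.append(term)
    return ", ".join(terms)


def fake_complete(prompt: str) -> str:
    """Stand-in for the LLM call; returns a fixed expansion."""
    return "change password, recover account, Reset Password, reset credentials"


expanded = expand_query("reset password", fake_complete)
print(expanded)
```

Keeping the original query first means the embedding still reflects what the user actually asked, with the extra terms widening the match.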

Example Impact:

User Query: "API authentication"

Without Expansion:
→ Matches: "API authentication" (exact)
→ Results: 5 documents

With Expansion:
→ "API authentication, API auth, API security, API keys,
   bearer tokens, OAuth, authentication methods"
→ Matches: All variations
→ Results: 50 documents → Reranked to best 10

2. Tier-Based Source Retrieval

Cypress organizes data sources into tiers:

Tier Structure:

Tier 1 (High Priority)
├─ Official Documentation
├─ Product Knowledge Base
├─ API Reference
└─ Admin-Approved Content

Tier 2 (Secondary)
├─ Community Content
├─ Blog Posts
├─ External References
└─ Supplementary Materials

Retrieval Process:

  1. Query Tier 1 sources (topK = 50 per source)
  2. Query Tier 2 sources (topK = 50 per source)
  3. Combine all results
  4. Rerank all together (both tiers treated equally)
  5. Top 10 most relevant selected

Important: Both tiers receive equal treatment in reranking. The tier organization is for source management, not quality weighting.
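The retrieval process above can be sketched as a loop over tiers that builds one combined candidate list. The `search` callable, the document shape, and `retrieve_by_tier` are assumptions for illustration; the real vector-store call is not shown in this page.

```python
from typing import Callable, Dict, List

Doc = Dict[str, object]
# Hypothetical search signature: (source_id, query, top_k) -> list of documents
SearchFn = Callable[[str, str, int], List[Doc]]


def retrieve_by_tier(
    query: str,
    sources: List[Dict[str, object]],
    search: SearchFn,
    top_k_per_source: int = 50,
) -> List[Doc]:
    """Query Tier 1 sources, then Tier 2, and return one combined candidate list."""
    combined: List[Doc] = []
    for tier in (1, 2):
        for source in sources:
            if source["tier"] != tier:
                continue
            for doc in search(str(source["id"]), query, top_k_per_source):
                # The tier is recorded as metadata only; it is not a score weight.
                combined.append({**doc, "tier": tier, "source": source["id"]})
    return combined


def fake_search(source_id: str, query: str, top_k: int) -> List[Doc]:
    """Stand-in for a vector search; returns three dummy hits per source."""
    hits = [{"id": f"{source_id}-{i}", "text": f"{query} result {i}"} for i in range(3)]
    return hits[:top_k]


sources = [
    {"id": "ds-2", "name": "Community Forums", "tier": 2},
    {"id": "ds-1", "name": "Official Docs", "tier": 1},
]
candidates = retrieve_by_tier("password reset", sources, fake_search)
print([(doc["source"], doc["tier"]) for doc in candidates])
```

The combined list is what gets handed to the reranker, which scores every candidate the same way regardless of its tier.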

Configuration:

{
  "dataSources": [
    {
      "id": "ds-1",
      "name": "Official Docs",
      "tier": 1  // High priority
    },
    {
      "id": "ds-2", 
      "name": "Community Forums",
      "tier": 2  // Secondary
    }
  ]
}

3. Automatic Reranking

After retrieval, Cypress reranks the combined candidates with a cross-encoder model:

Reranking Model: bge-reranker-v2-m3

  • Type: Cross-encoder (more accurate than vector similarity)
  • Input: Query + full document text
  • Output: Relevance score (0-1)
  • Method: Considers full semantic relationship

Vector Search vs Reranking:

Vector Search:
→ Fast approximate similarity
→ Based on embeddings only
→ Good recall

Reranking:
→ Slower but more accurate
→ Analyzes full query-document relationship
→ Excellent precision

Performance Impact:

Before Reranking:
Top 50 results, relevance: 0.65-0.85

After Reranking:
Top 10 results, relevance: 0.85-0.98
→ 20-30% improvement in precision
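The rerank-then-truncate step can be sketched like this. A real cross-encoder such as bge-reranker-v2-m3 reads the query and document together to produce its score; the `overlap_score` function below is a toy stand-in chosen so the example runs without a model, and `rerank` is a hypothetical helper name.

```python
from typing import Callable, List, Sequence, Tuple

# (query, document_text) -> relevance score in [0, 1]
ScoreFn = Callable[[str, str], float]


def rerank(query: str, docs: Sequence[str], score: ScoreFn, top_n: int = 10) -> List[Tuple[str, float]]:
    """Score every (query, document) pair and keep the top_n highest-scoring documents."""
    scored = [(doc, score(query, doc)) for doc in docs]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


def overlap_score(query: str, doc: str) -> float:
    """Toy scorer: fraction of query words that appear in the document."""
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    return len(query_words & doc_words) / len(query_words) if query_words else 0.0


docs = [
    "How to reset your password from the login page",
    "Billing and invoices overview",
    "Password policy and reset rules for admins",
]
top = rerank("reset password", docs, overlap_score, top_n=2)
for doc, relevance in top:
    print(f"{relevance:.2f}  {doc}")
```

Because the scorer is passed in as a function, swapping the toy scorer for a model-backed one does not change the selection logic.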

4. Higher Retrieval Volume

Cypress retrieves more candidates:

| Mode | topKPerSource | Total Retrieved | Final Output |
|------|---------------|-----------------|--------------|
| Standard | 50 | Up to 100 | Top 10 after rerank |
| Agentic | Agent.topK (default 5) | Variable | All reranked results |

Why more is better:

  • More candidates for reranking = better final selection
  • Captures edge cases and variations
  • Reduces chance of missing relevant content

5. Response Cleaning

Cypress includes specialized response cleaning:

Removes:

  • Original prompt text (if echoed)
  • Markdown code block markers
  • Extra whitespace
  • Formatting artifacts
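A cleaning pass along these lines could look like the sketch below. The rules shown (drop lines that echo the prompt, drop fence marker lines, collapse blank lines) are assumptions drawn from the list above, not the actual Cypress cleaner, and `clean_response` is a hypothetical name.

```python
import re

FENCE = "`" * 3  # a markdown code-fence marker, built here to keep the literal out of the source
FENCE_LINE = re.compile(r"^\s*" + re.escape(FENCE) + r"[\w-]*\s*$", re.MULTILINE)


def clean_response(raw: str, prompt: str = "") -> str:
    """Strip an echoed prompt, code-fence marker lines, and surplus whitespace."""
    text = raw
    if prompt:
        # Drop any line that repeats the user's prompt back.
        text = "\n".join(line for line in text.splitlines() if prompt not in line)
    # Remove fence marker lines but keep the content between them.
    text = FENCE_LINE.sub("", text)
    # Collapse runs of blank lines, then trim both ends.
    text = re.sub(r"\n\s*\n+", "\n\n", text)
    return text.strip()


raw_output = f'{FENCE}html\nUser asked: "What is pricing?"\nOur pricing plans are...\n{FENCE}'
print(clean_response(raw_output, prompt="What is pricing?"))
```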

Example:

LLM Output (raw):
```html
User asked: "What is pricing?"
Our pricing plans are...
```

Cleaned Output:
Our pricing plans are...


## Performance Characteristics

### Latency Breakdown

| Stage | Latency | Notes |
|-------|---------|-------|
| Memory Analysis | ~100ms | |
| Query Expansion | ~500ms | Unique to Cypress |
| Query Embedding | ~100ms | |
| Tier 1 Retrieval (50) | ~300ms | More than Redwood/Cedar |
| Tier 2 Retrieval (50) | ~300ms | Additional tier |
| Reranking (100→10) | ~500ms | Unique to Cypress |
| Context Building | ~50ms | |
| Prompt Rewriting | ~400ms | |
| LLM Completion | ~800ms | |
| Response Cleaning | ~50ms | |
| **Total** | **~3.1s** | |


### Token Usage

| Component | Tokens | Notes |
|-----------|--------|-------|
| Query Expansion | 50-100 | Expansion prompt |
| Memory Context | 100-300 | Conversation history |
| System Prompt | 150-300 | Agent instructions |
| Retrieved Context | 1200-2000 | Top 10 (higher quality) |
| User Query | 10-50 | Original question |
| Rewriting Prompt | 50-100 | Context-aware rewriting |
| Response | 150-400 | Generated answer |
| **Total** | **~2,200-3,000** | Typical per request; the component ranges above sum to about 1,700-3,250 |

### Cost Implications

**Per 1,000 Requests (GPT-4o):**
- Embedding: ~$0.01
- Query Expansion: ~$0.15
- Prompt Rewriting: ~$0.10
- LLM Completion: ~$0.50
- Vector Search: ~$0.08 (higher volume)
- Reranking: ~$0.06
- **Total**: ~$0.90 (about +80% vs Redwood at $0.50, +29% vs Cedar at $0.70)
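To project these figures onto a real workload, the per-component costs can be summed and scaled by request volume. The sketch below uses only the numbers listed above; `cost_per_1k` and `daily_cost` are hypothetical helper names.

```python
from typing import Dict

# Per-1,000-request cost components in USD, as listed above.
COST_PER_1K: Dict[str, float] = {
    "embedding": 0.01,
    "query_expansion": 0.15,
    "prompt_rewriting": 0.10,
    "llm_completion": 0.50,
    "vector_search": 0.08,
    "reranking": 0.06,
}


def cost_per_1k(components: Dict[str, float]) -> float:
    """Total cost per 1,000 requests, rounded to cents."""
    return round(sum(components.values()), 2)


def daily_cost(requests_per_day: int, components: Dict[str, float] = COST_PER_1K) -> float:
    """Projected daily spend for a given request volume."""
    return round(requests_per_day / 1000 * cost_per_1k(components), 2)


print(cost_per_1k(COST_PER_1K))
print(daily_cost(100_000))
```

At 100,000 requests per day this works out to roughly $90 per day, which is why high-volume deployments often route only a share of traffic through Cypress.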

## When to Use Cypress

### ✅ Ideal Use Cases

**1. Medical/Legal Q&A (High Accuracy Critical)**

Query: "contraindications for medication X"
→ Cannot afford mistakes
→ Diverse medical terminology
→ Need highest precision
✅ Use Cypress


**2. Complex Technical Documentation**

Query: "configure OAuth with SAML SSO"
→ Multiple concepts
→ Various terminology (OAuth 2.0, OAuth2, etc.)
→ Need comprehensive results
✅ Use Cypress


**3. Multi-Domain Knowledge Bases**

Sources:

  • Official API docs (Tier 1)
  • Engineering blog (Tier 2)
  • Community tutorials (Tier 2)

Query: "best practices for API rate limiting"
→ Benefits from tier organization
→ Reranking selects best across all sources
✅ Use Cypress

**4. Compliance-Sensitive Queries**

Query: "GDPR data retention requirements"
→ High-stakes information
→ Must be accurate and cited
→ Regulatory compliance
✅ Use Cypress


**5. Queries with Diverse Terminology**

Query: "machine learning model training"
Also needs to match:

  • "ML model development"
  • "training neural networks"
  • "model fine-tuning"

→ Query expansion helps significantly
✅ Use Cypress

### ❌ Not Ideal For

**1. Simple FAQ Queries**

Query: "What are your business hours?"
→ Straightforward question
→ No terminology variations
→ Redwood is about 2x faster
❌ Cypress is overkill


**2. High-Volume, Cost-Sensitive**

100,000+ queries/day, tight budget
→ Cypress costs about 1.8x as much as Redwood ($0.90 vs $0.50 per 1k requests)
→ Consider hybrid approach
❌ Use Cypress selectively


**3. Real-Time Requirements**

Need: < 1 second response
→ Cypress averages 3-4s
→ Too slow for real-time
❌ Use Redwood instead


## Configuration

### Agent Settings

```typescript
{
  "strategyCode": "CYPRESS",
  "topK": 5,                        // For agentic mode
  "topKPerSource": 50,              // Standard mode retrieval
  "temperature": 0.7,
  "maxTokens": 500,
  "model": "gpt-4o",
  "rewritingModel": "gpt-4o-mini",
  "enableQueryExpansion": true,
  "enableReranking": true,
  "rerankingModel": "bge-reranker-v2-m3",
  "tierBased": true
}
```

Data Source Tier Assignment

// Assign tiers to data sources
{
  "dataSources": [
    {
      "id": "official-docs",
      "tier": 1,              // Official documentation
      "priority": "HIGH"
    },
    {
      "id": "api-reference",
      "tier": 1,              // API docs
      "priority": "HIGH"
    },
    {
      "id": "blog-posts",
      "tier": 2,              // Supplementary content
      "priority": "MEDIUM"
    },
    {
      "id": "community-content",
      "tier": 2,              // User-generated
      "priority": "MEDIUM"
    }
  ]
}

Optimization Tips

1. Tune Retrieval Volume

Too low (20):    May miss relevant docs
Sweet spot (50): Good candidate pool
Too high (100):  Slower reranking, little added benefit

2. Query Expansion Quality

Good Expansion:
"Include synonyms, related terms, and common phrasings.
Focus on terminology variations used in documentation."

Bad Expansion:
"Expand the query"  ← Too vague

3. Tier Organization

Tier 1: Authoritative, frequently updated
Tier 2: Supplementary, less critical

Don't: Put everything in Tier 1
Do: Thoughtfully organize by importance

4. Reranking Threshold

Keep top 10 after reranking (default)
Consider top 5 for very high precision
Consider top 15 for comprehensive coverage

Comparison with Other Strategies

Complete Comparison Table

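| Feature | Redwood | Cedar | Cypress |
|---------|---------|-------|---------|
| **Performance** | | | |
| Average Latency | 1-2s | 2-3s | 3-4s |
| Cost per 1k (GPT-4o) | $0.50 | $0.70 | $0.90 |
| Token Usage | 1,500 | 2,000 | 2,500 |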
Accuracy
Simple Queries⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐
Complex Queries⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐
Terminology Variations⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐
Features
Query Rewriting
Query Expansion
Reranking
Tier-Based
Memory Support
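| Best For | Redwood | Cedar | Cypress |
|----------|---------|-------|---------|
| Clear questions | ✅ Best | ✅ Good | ⚠️ Overkill |
| Conversational | ❌ Poor | ✅ Best | ✅ Excellent |
| High accuracy | ❌ Adequate | ⚠️ Good | ✅ Best |
| High volume | ✅ Best | ⚠️ Good | ❌ Expensive |
| Complex terminology | ❌ Limited | ⚠️ Good | ✅ Best |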

Migration Paths

  • Redwood → Cedar: When conversational queries increase
  • Cedar → Cypress: When accuracy becomes critical
  • Cypress → Cedar: When cost/speed matters more than max accuracy

Real-World Performance

Case Study: Medical Knowledge Base

Setup:

  • 50,000 medical articles
  • Complex terminology (anatomical, pharmaceutical)
  • High accuracy requirements
  • Average query: "symptoms of X" or "treatment for Y"

Cedar Results (Before):

  • Latency: 2.3s ✅
  • Accuracy: 82% ⚠️
  • User complaints: "Missing alternative terms"
  • Cost: $0.68/1k

Cypress Results (After):

  • Latency: 3.6s (acceptable)
  • Accuracy: 96% ✅
  • User satisfaction: +41%
  • Cost: $0.89/1k
  • ROI: Worth it for medical accuracy

Key Insight: Query expansion captured synonyms and closely related terms (not all of them strict equivalents), such as:

  • "heart attack" → "myocardial infarction", "cardiac arrest", "MI"
  • "fever" → "pyrexia", "elevated temperature", "hyperthermia"

Case Study: Enterprise Software Documentation

Setup:

  • 10,000 technical docs
  • API references + guides + tutorials
  • 3 document tiers (official, community, archived)
  • Queries: Mix of simple and complex

Strategy Mix (Optimized):

Simple queries (40%):     Redwood    → 1.2s avg
Conversational (30%):     Cedar      → 2.1s avg
Complex/Critical (30%):   Cypress    → 3.5s avg

Blended Performance:
→ Average latency: 2.1s
→ Average cost: $0.65/1k
→ Overall accuracy: 91%

Key Insight: Hybrid approach optimizes for both cost and quality.

Advanced Configuration

Dynamic Strategy Selection

Automatically choose strategy based on query:

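# Placeholder heuristics (assumptions for illustration); replace with your own classifiers.
def is_simple(query):
    return len(query.split()) <= 6

def is_ambiguous(query):
    return any(word in {"it", "that", "this", "they"} for word in query.lower().split())

def is_complex(query):
    return len(query.split()) > 12

def requires_high_accuracy(query):
    return any(term in query.lower() for term in ("gdpr", "contraindication", "compliance"))

def select_strategy(query, conversation_history):
    # Complex or high-stakes: checked first so a short but critical query is not routed to Redwood
    if is_complex(query) or requires_high_accuracy(query):
        return "CYPRESS"

    # Conversational or follow-up
    if conversation_history and is_ambiguous(query):
        return "CEDAR"

    # Simple, clear query with no prior context
    if is_simple(query) and not conversation_history:
        return "REDWOOD"

    # Default
    return "CEDAR"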

Custom Reranking Parameters

{
  "reranking": {
    "model": "bge-reranker-v2-m3",
    "topN": 10,              // Results to keep
    "scoreThreshold": 0.7,   // Minimum relevance
    "truncation": "END",     // How to handle long docs
    "batchSize": 50          // Rerank in batches
  }
}
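One plausible reading of these parameters is sketched below. The semantics are assumptions, not confirmed behavior: `scoreThreshold` is treated as a minimum score applied before `topN`, `"END"` truncation as dropping the tail of an overlong document, and `batchSize` as the number of documents scored per reranker call. The helper names and the character-based `max_chars` limit are hypothetical.

```python
from typing import Any, Dict, Iterator, List, Sequence, Tuple

RERANK_CONFIG: Dict[str, Any] = {
    "model": "bge-reranker-v2-m3",
    "topN": 10,
    "scoreThreshold": 0.7,
    "truncation": "END",
    "batchSize": 50,
}


def truncate(text: str, max_chars: int, mode: str) -> str:
    """Assumed meaning of "END": drop the tail of an overlong document."""
    if mode == "END" and len(text) > max_chars:
        return text[:max_chars]
    return text


def batches(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def select(scored: List[Tuple[str, float]], config: Dict[str, Any]) -> List[Tuple[str, float]]:
    """Drop results below scoreThreshold, then keep the topN highest scores."""
    kept = [pair for pair in scored if pair[1] >= config["scoreThreshold"]]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[: config["topN"]]


scored = [("doc-a", 0.95), ("doc-b", 0.40), ("doc-c", 0.72)]
selected = select(scored, RERANK_CONFIG)
batch_sizes = [len(b) for b in batches([f"doc-{i}" for i in range(120)], RERANK_CONFIG["batchSize"])]
print(selected)
print(batch_sizes)
```

Under this reading, a strict threshold can return fewer than `topN` results, so the context builder should not assume exactly ten documents.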

Query Expansion Control

{
  "queryExpansion": {
    "enabled": true,
    "maxTerms": 15,          // Max expansion terms
    "includeSynonyms": true,
    "includeAbbreviations": true,
    "includeDomainTerms": true,
    "expansionPrompt": "..."  // Custom prompt
  }
}
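The sketch below shows one way these flags could gate expansion terms. It assumes each candidate term arrives tagged with a kind (`synonym`, `abbreviation`, or `domain`); that tagging, the `FLAG_FOR_KIND` mapping, and `apply_expansion_config` are hypothetical and only illustrate how `maxTerms` and the `include*` switches might interact.

```python
from typing import Any, Dict, List, Tuple

EXPANSION_CONFIG: Dict[str, Any] = {
    "enabled": True,
    "maxTerms": 15,
    "includeSynonyms": True,
    "includeAbbreviations": True,
    "includeDomainTerms": True,
}

# Maps a candidate's kind to the config flag that switches it on (kinds are hypothetical labels).
FLAG_FOR_KIND = {
    "synonym": "includeSynonyms",
    "abbreviation": "includeAbbreviations",
    "domain": "includeDomainTerms",
}


def apply_expansion_config(
    query: str, candidates: List[Tuple[str, str]], config: Dict[str, Any]
) -> List[str]:
    """Return the query plus the allowed expansion terms, capped at maxTerms."""
    if not config.get("enabled", True):
        return [query]
    allowed = [
        term for term, kind in candidates
        if config.get(FLAG_FOR_KIND.get(kind, ""), False)
    ]
    return [query] + allowed[: int(config["maxTerms"])]


candidates = [("API security", "synonym"), ("API auth", "abbreviation"), ("bearer tokens", "domain")]
strict = {**EXPANSION_CONFIG, "includeAbbreviations": False, "maxTerms": 1}
print(apply_expansion_config("API authentication", candidates, EXPANSION_CONFIG))
print(apply_expansion_config("API authentication", candidates, strict))
```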

Monitoring Cypress

Key Metrics

1. Reranking Effectiveness

Metric: Improvement in relevance scores post-reranking
Target: +20% improvement
Track:  Average score before vs after

2. Query Expansion Impact

Metric: % increase in retrieved candidates
Target: 2-3x more candidates than unexpanded
Track:  Results with vs without expansion

3. Tier Distribution

Metric: % of final results from Tier 1 vs Tier 2
Target: Varies by use case
Track:  Ensure both tiers contributing

4. Overall Performance

Latency:  < 4s (p95)
Accuracy: > 90% (user ratings)
Cost:     Within budget
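Two of these metrics can be computed directly from logged scores and result metadata, as in the sketch below. It assumes each final result carries a `tier` field and that relevance scores are logged before and after reranking; `tier_distribution` and `rerank_improvement` are hypothetical helper names, and the sample scores are made up for illustration.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List


def tier_distribution(sources: List[Dict]) -> Dict[int, float]:
    """Share of final results contributed by each tier."""
    counts = Counter(source["tier"] for source in sources)
    total = sum(counts.values())
    if not total:
        return {}
    return {tier: count / total for tier, count in sorted(counts.items())}


def rerank_improvement(before: List[float], after: List[float]) -> float:
    """Relative change in mean relevance score from pre- to post-reranking."""
    return (mean(after) - mean(before)) / mean(before)


final_sources = [{"tier": 1}, {"tier": 1}, {"tier": 2}, {"tier": 1}]
distribution = tier_distribution(final_sources)
improvement = rerank_improvement([0.65, 0.75, 0.85], [0.85, 0.90, 0.98])
print(distribution)
print(f"{improvement:.1%}")
```

An improvement that stays well below the +20% target, or a tier share that sits at 0% for long periods, is a prompt to revisit the reranking setup or the tier assignments.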

Common Issues

Very Slow Responses (> 5s):

Cause:   Retrieving too many documents
Fix:     Reduce topKPerSource to 30-40

Poor Reranking:

Cause:   Reranking model mismatch
Fix:     Verify model compatibility with your content type

High Costs:

Cause:   Query expansion generating too many tokens
Fix:     Limit maxTerms to 10-12

Best Practices

1. Tier Organization

✅ Tier 1: Official, verified content
✅ Tier 2: Supplementary, community content
✅ Review tier assignments quarterly
❌ Don't put everything in Tier 1

2. Query Expansion

✅ Focus on domain-specific terminology
✅ Include common abbreviations
✅ Test expansion quality
❌ Don't over-expand (diminishing returns)

3. Performance Monitoring

✅ Track reranking effectiveness
✅ Monitor latency by query type
✅ Review cost vs quality trade-offs
✅ A/B test against Cedar
❌ Don't assume it's working optimally

4. Hybrid Approach

✅ Use Cypress for critical queries
✅ Use Cedar for conversational
✅ Use Redwood for simple queries
✅ Route intelligently based on context
❌ Don't use one strategy for everything

Code Examples

Using Cypress via API

curl -X POST https://api.twig.so/api/chat \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "explain OAuth 2.0 flow",
    "agentId": "agent-123",
    "strategyCode": "CYPRESS",
    "sessionId": "session-456",
    "enableQueryExpansion": true,
    "enableReranking": true
  }'

Response Format

{
  "response": "OAuth 2.0 is an authorization framework...",
  "expandedQuery": "OAuth 2.0 flow, OAuth2 authorization, authentication flow, access token flow, authorization code grant",
  "sources": [
    {
      "title": "OAuth 2.0 Specification",
      "url": "https://example.com/oauth",
      "relevance": 0.97,
      "tier": 1,
      "rerankScore": 0.95
    }
  ],
  "metadata": {
    "strategy": "CYPRESS",
    "latency": 3.4,
    "tokensUsed": 2687,
    "retrievedCount": 87,
    "rerankedCount": 10,
    "expansionLatency": 0.52,
    "rerankingLatency": 0.48
  }
}
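The same call can be made from Python with the standard library. The sketch below builds the request from the curl example and extracts a few fields from a response shaped like the one above; `build_request` and `summarize` are hypothetical helpers, and the request is constructed but not sent.

```python
import json
import urllib.request

API_URL = "https://api.twig.so/api/chat"  # endpoint taken from the curl example


def build_request(prompt: str, agent_id: str, session_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {
        "prompt": prompt,
        "agentId": agent_id,
        "strategyCode": "CYPRESS",
        "sessionId": session_id,
        "enableQueryExpansion": True,
        "enableReranking": True,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def summarize(body: str) -> dict:
    """Pull the fields most useful for logging out of a response body."""
    data = json.loads(body)
    meta = data.get("metadata", {})
    sources = data.get("sources", [])
    return {
        "strategy": meta.get("strategy"),
        "latency_s": meta.get("latency"),
        "retrieved": meta.get("retrievedCount"),
        "kept": meta.get("rerankedCount"),
        "top_source": sources[0]["title"] if sources else None,
    }


request = build_request("explain OAuth 2.0 flow", "agent-123", "session-456", "YOUR_API_KEY")
# To send it: body = urllib.request.urlopen(request).read().decode("utf-8")

sample_body = json.dumps({
    "response": "OAuth 2.0 is an authorization framework...",
    "sources": [{"title": "OAuth 2.0 Specification", "tier": 1, "rerankScore": 0.95}],
    "metadata": {"strategy": "CYPRESS", "latency": 3.4, "retrievedCount": 87, "rerankedCount": 10},
})
summary = summarize(sample_body)
print(summary)
```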

Last updated January 25, 2026