Core Concepts & Terminology
Technical reference for RAG terminology and Twig implementation details
TL;DR
Technical reference for RAG terminology and Twig implementation details. RAG injects retrieved context into the LLM prompt before generation.
Key Takeaways
- RAG (Retrieval-Augmented Generation)
- Agent
- Data Source
- Vector Embedding
- Semantic Search
- Chunking
RAG (Retrieval-Augmented Generation)
RAG injects retrieved context into the LLM prompt before generation.
RAG Flow in Twig
- Query embedding: Convert user query to 1536-dim vector (OpenAI ada-002)
- Vector search: Query Pinecone index, return top-k chunks by cosine similarity (threshold: 0.7)
- Context injection: Insert chunks into LLM prompt between system prompt and user query
- LLM generation: The agent's configured model (e.g., GPT-4 via the OpenAI API) generates the response from the injected context
- Citation extraction: Parse response, match claims to source chunks by span overlap
Observable behavior: Responses cite specific documents. If retrieval fails (no chunks above threshold), agent responds "I don't have information about that".
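A minimal sketch of the context-injection and fallback steps, assuming retrieval has already produced scored chunks. `Chunk`, `build_prompt`, and the prompt layout (system prompt, then numbered context, then the question) are illustrative assumptions, not Twig's actual prompt template.

```python
from __future__ import annotations

from dataclasses import dataclass

FALLBACK = "I don't have information about that"


@dataclass
class Chunk:
    text: str
    score: float  # cosine similarity returned by vector search


def build_prompt(system_prompt: str, chunks: list[Chunk], query: str,
                 threshold: float = 0.7) -> str | None:
    """Hypothetical prompt builder: system prompt, then context, then query.

    Returns None when no chunk reaches the threshold, signalling that the
    caller should send the fallback reply instead of calling the LLM.
    """
    kept = [c for c in chunks if c.score >= threshold]
    if not kept:
        return None
    context = "\n\n".join(f"[{i}] {c.text}" for i, c in enumerate(kept, start=1))
    return f"{system_prompt}\n\nContext:\n{context}\n\nQuestion: {query}"


chunks = [
    Chunk("To reset your password, open Settings > Security.", 0.86),
    Chunk("Pizza delivery hours are 9am to 5pm.", 0.12),
]
prompt = build_prompt("Answer only from the context.", chunks, "How do I reset my password?")
print(prompt if prompt is not None else FALLBACK)
```

Numbering the kept chunks gives the LLM stable labels to cite, which the citation step can later map back to sources.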
Agent
An agent is a configuration record with these fields:
- agent_id: Unique identifier (format: agent_abc123)
- name: Display name
- system_prompt: Instructions prepended to every query
- data_source_ids: Array of data sources to query
- rag_strategy: redwood | cedar | cypress
- model: gpt-4 | gpt-3.5-turbo | claude-3-sonnet
- temperature: Float 0-2 (default: 0.7)
- max_tokens: Integer (default: 500)
Storage: PostgreSQL agents table
Observable behavior: Different agents querying the same data sources return different responses based on their system prompts and strategies.
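The field list above can be mirrored as a typed record with validation. This is a hypothetical client-side model (`AgentConfig` is not a Twig SDK class); the defaults for `temperature` and `max_tokens` and the allowed values for `rag_strategy` and `model` are copied from the list, and the `agent_` prefix check assumes every ID follows the documented format.

```python
from __future__ import annotations

from dataclasses import dataclass, field

RAG_STRATEGIES = {"redwood", "cedar", "cypress"}
MODELS = {"gpt-4", "gpt-3.5-turbo", "claude-3-sonnet"}


@dataclass
class AgentConfig:
    agent_id: str
    name: str
    system_prompt: str
    rag_strategy: str
    model: str
    data_source_ids: list[str] = field(default_factory=list)
    temperature: float = 0.7  # documented default
    max_tokens: int = 500     # documented default

    def __post_init__(self) -> None:
        # Reject values outside the documented ranges/enumerations.
        if not self.agent_id.startswith("agent_"):
            raise ValueError(f"agent_id should look like 'agent_abc123', got {self.agent_id!r}")
        if self.rag_strategy not in RAG_STRATEGIES:
            raise ValueError(f"unknown rag_strategy {self.rag_strategy!r}")
        if self.model not in MODELS:
            raise ValueError(f"unknown model {self.model!r}")
        if not 0.0 <= self.temperature <= 2.0:
            raise ValueError("temperature must be between 0 and 2")
        if self.max_tokens <= 0:
            raise ValueError("max_tokens must be positive")


# "ds_docs" is a made-up data source ID for illustration.
support = AgentConfig(agent_id="agent_abc123", name="Support Bot",
                      system_prompt="You answer support questions.",
                      rag_strategy="cedar", model="gpt-4",
                      data_source_ids=["ds_docs"])
print(support)
```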
Data Source
A data source is an ingestion job configuration:
- source_type: file | website | confluence | slack | google_drive, etc.
- connection_params: OAuth tokens, API keys, URLs
- sync_schedule: hourly | daily | weekly | manual
- filters: Include/exclude rules (e.g., file extensions, URL patterns)
Processing stages:
- Fetch (download documents)
- Parse (extract text)
- Chunk (split into 512-token segments with 50-token overlap)
- Embed (OpenAI ada-002)
- Index (upload vectors to Pinecone)
Status values: pending | processing | active | failed
Observable behavior: Data → [Source Name] shows the chunk count (e.g., "1,234 chunks indexed") and the last sync timestamp.
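One way to apply include/exclude rules, assuming they are glob patterns matched against a document path (Twig's real rule syntax is not specified on this page). `passes_filters` is a hypothetical helper; in this sketch exclusion takes precedence over inclusion.

```python
from fnmatch import fnmatch


def passes_filters(path: str, include: list, exclude: list) -> bool:
    """Hypothetical filter check using glob patterns.

    A path must match at least one include pattern (an empty include list
    allows everything) and must not match any exclude pattern.
    """
    if include and not any(fnmatch(path, pattern) for pattern in include):
        return False
    return not any(fnmatch(path, pattern) for pattern in exclude)


docs = ["guide/setup.md", "guide/setup.pdf", "drafts/todo.md", "logo.png"]
selected = [p for p in docs
            if passes_filters(p, include=["*.md", "*.pdf"], exclude=["drafts/*"])]
print(selected)  # ['guide/setup.md', 'guide/setup.pdf']
```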
Vector Embedding
A vector embedding is a 1536-dimensional float array representing text semantics.
Model: OpenAI text-embedding-ada-002
API: POST https://api.openai.com/v1/embeddings
Cost: $0.0001 per 1K tokens
Example:
Input: "reset password"
Output: [0.0123, -0.4567, 0.7890, ..., 0.2345] (1536 floats)
Distance metric: Cosine similarity (-1 to 1, higher = more similar)
Observable behavior:
- "reset password" and "change password" have cosine similarity ~0.85
- "reset password" and "pizza delivery" have cosine similarity ~0.10
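Cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below uses toy 2-dimensional vectors rather than real 1536-dimensional embeddings; OpenAI documents its embeddings as normalized to length 1, so for those vectors the dot product alone gives the same value.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between a and b: 1 = same direction, -1 = opposite."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))  # ≈ 0.7071
```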
Semantic Search
Vector search using cosine similarity between query embedding and chunk embeddings.
Algorithm:
- Embed query: q_vec = embed("reset my password")
- Query Pinecone: results = index.query(vector=q_vec, top_k=10, filter={"org_id": "org_123"})
- Pinecone returns chunks with similarity scores (cosine; higher = more similar)
- Discard chunks with score < 0.7 (configurable threshold)
Retrieval behavior:
- Query "How to reset password?" retrieves chunks containing "password recovery", "reset credentials", "forgot password"
- Does NOT require exact keyword match
- Fails if no chunks score above threshold
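The retrieval steps can be reproduced with a brute-force in-memory search, which is handy for checking threshold behavior without a Pinecone index. The chunk dictionaries, the `search` function, and the toy 2-dimensional vectors are hypothetical; a real vector index produces the same kind of ranking approximately and at scale.

```python
import math


def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def search(q_vec, chunks, org_id, top_k=10, threshold=0.7):
    """Hypothetical stand-in for a vector-index query.

    chunks: list of dicts with 'id', 'org_id', and 'vec'.
    Returns up to top_k (chunk_id, score) pairs, best first, score >= threshold.
    """
    candidates = [c for c in chunks if c["org_id"] == org_id]  # metadata filter
    scored = [(c["id"], _cosine(q_vec, c["vec"])) for c in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(cid, s) for cid, s in scored[:top_k] if s >= threshold]


chunks = [
    {"id": "chk_1", "org_id": "org_123", "vec": [1.0, 0.0]},
    {"id": "chk_2", "org_id": "org_123", "vec": [0.8, 0.6]},
    {"id": "chk_3", "org_id": "org_123", "vec": [0.0, 1.0]},  # unrelated: dropped by threshold
    {"id": "chk_4", "org_id": "org_999", "vec": [1.0, 0.0]},  # other org: dropped by filter
]
results = search([1.0, 0.0], chunks, org_id="org_123")
print(results)
```

An empty result list is the "no chunks above threshold" case that triggers the fallback reply.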
Chunking
Document splitting strategy:
- Chunk size: 512 tokens (default, configurable: 256-2048)
- Overlap: 50 tokens (default, configurable: 0-200)
- Splitting: Recursive by paragraph → sentence → token
Example:
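Document (1500 tokens; fixed 512-token windows, end-exclusive ranges):
├─ Chunk 1: tokens 0-512
├─ Chunk 2: tokens 462-974 (50-token overlap)
├─ Chunk 3: tokens 924-1436 (50-token overlap)
└─ Chunk 4: tokens 1386-1500 (final partial chunk)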
Rationale:
- Smaller chunks → more precise retrieval, but less context per chunk
- Larger chunks → more context, but lower precision
- Overlap → text near a boundary appears in both neighboring chunks, so a concept split across a boundary is less likely to be lost
Observable behavior: Data source shows "N chunks indexed" (e.g., 100-page PDF → ~400-600 chunks)
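A fixed-window version of the split, assuming token counts are already known. It ignores the paragraph and sentence boundaries that the recursive splitter respects, so real chunk edges will shift; `chunk_ranges` is a hypothetical helper. Ranges are end-exclusive, and consecutive windows start `size - overlap` = 462 tokens apart.

```python
def chunk_ranges(n_tokens: int, size: int = 512, overlap: int = 50) -> list:
    """Return end-exclusive (start, end) token ranges for a sliding window."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    step = size - overlap  # how far each window advances
    ranges = []
    start = 0
    while start < n_tokens:
        end = min(start + size, n_tokens)  # last window may be shorter
        ranges.append((start, end))
        if end == n_tokens:
            break
        start += step
    return ranges


print(chunk_ranges(1500))  # [(0, 512), (462, 974), (924, 1436), (1386, 1500)]
```

Under these assumptions a 1500-token document yields four chunks, the last one partial.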
Context Window
Maximum tokens the LLM processes in one request:
- GPT-3.5-turbo: 16,384 tokens (~12,000 words)
- GPT-4: 8,192 tokens (standard), 32,768 (extended), 128,000 (turbo)
- Claude 3.5 Sonnet: 200,000 tokens
Token allocation (typical query):
System prompt: 200 tokens
Retrieved chunks (10 chunks × 512 tokens): 5,120 tokens
Conversation history: 500 tokens
User query: 50 tokens
Reserved for response: 500 tokens
---
Total: 6,370 tokens (fits in GPT-4 8K)
Observable failure: If total exceeds limit, API returns error:
{"error": "context_length_exceeded", "max": 8192, "actual": 9500}
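The allocation arithmetic can be checked before sending a request. This sketch hard-codes the window sizes listed for GPT-3.5-turbo and standard GPT-4 and the typical allocations from the table; `prompt_budget`, `fits`, and `max_chunks` are hypothetical helpers.

```python
CONTEXT_LIMITS = {"gpt-3.5-turbo": 16_384, "gpt-4": 8_192}

# System prompt + conversation history + user query + reserved response.
FIXED_OVERHEAD = 200 + 500 + 50 + 500


def prompt_budget(n_chunks=10, chunk_tokens=512, overhead=FIXED_OVERHEAD):
    """Total tokens a request will need, including the reserved response."""
    return overhead + n_chunks * chunk_tokens


def fits(model, total_tokens):
    return total_tokens <= CONTEXT_LIMITS[model]


def max_chunks(model, chunk_tokens=512, overhead=FIXED_OVERHEAD):
    """Largest number of full chunks that still fits the model's window."""
    return (CONTEXT_LIMITS[model] - overhead) // chunk_tokens


print(prompt_budget(), fits("gpt-4", prompt_budget()))  # 6370 True
print(max_chunks("gpt-4"))  # 13
```

With the same overhead, at most 13 full 512-token chunks fit in GPT-4's 8K window; a 16-chunk request would trigger the context_length_exceeded error.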
Token
Text unit for LLM processing:
- 1 token ≈ 4 characters (English)
- 1 token ≈ 0.75 words (English)
Examples:
- "Hello world!" = 3 tokens
- "Retrieval-Augmented Generation" = 6 tokens
- "https://example.com" = 5 tokens
Pricing (OpenAI):
- GPT-4: $0.03/1K input tokens, $0.06/1K output tokens
- GPT-3.5-turbo: $0.001/1K input tokens, $0.002/1K output tokens
Observable behavior: Query cost displayed in Analytics (e.g., "$0.0042 per query")
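A rough per-query cost estimate using the ~4 characters per token rule and the OpenAI prices listed in this section. The character heuristic is only an approximation; exact counts require the model's tokenizer (for OpenAI models, the `tiktoken` library). `estimate_tokens` and `query_cost` are hypothetical helpers.

```python
# (input, output) USD per 1K tokens, as listed in this section.
PRICES_PER_1K = {
    "gpt-4": (0.03, 0.06),
    "gpt-3.5-turbo": (0.001, 0.002),
}


def estimate_tokens(text: str) -> int:
    """Approximate English token count (~4 characters per token)."""
    return max(1, round(len(text) / 4))


def query_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Input and output tokens are billed at different rates."""
    in_price, out_price = PRICES_PER_1K[model]
    return input_tokens / 1000 * in_price + output_tokens / 1000 * out_price


print(estimate_tokens("Hello world!"))  # 3
print(query_cost("gpt-4", 1000, 500))
print(query_cost("gpt-3.5-turbo", 1000, 500))
```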
Temperature
Controls randomness in LLM sampling:
- 0.0: Greedy (always picks the highest-probability token)
- 0.7: Balanced (default)
- 1.0: High variability
- 2.0: Maximum randomness
Observable behavior:
- Temperature 0.0: Same query returns the same or nearly the same response each time (hosted LLM APIs do not guarantee bit-identical output)
- Temperature 1.0: Same query returns different phrasing each time (content consistent)
Use cases:
- 0.0-0.3: Factual Q&A, documentation lookup
- 0.7-1.0: Creative writing, brainstorming
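Temperature rescales the model's token scores (logits) before sampling: each probability is exp(z_i / T) divided by the sum of exp(z_j / T) over all tokens. The toy example shows how probability mass spreads out as T grows; the logits are made up, and hosted APIs do not expose their exact sampling code.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Token probabilities after dividing logits by the temperature."""
    if temperature < 0:
        raise ValueError("temperature must be >= 0")
    if temperature == 0:
        # Greedy limit: all probability mass on the highest-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]
for t in (0.0, 0.7, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

Low temperatures concentrate probability on the top token, which is why 0.0-0.3 suits factual lookup.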
top_k
Number of chunks retrieved from vector DB:
- Redwood: top_k = 5-10
- Cedar: top_k = 10
- Cypress: top_k = 50 (pre-rerank) → 10 (post-rerank)
Configurable: Agent configuration → Advanced Settings → Top K (range: 1-100)
Tradeoff:
- Higher top_k → More context, slower retrieval, higher cost
- Lower top_k → Faster, cheaper, but may miss relevant chunks
Observable behavior: Sources panel shows exactly top_k chunks (or fewer if threshold filters some out)
Reranking
Two-stage retrieval: fast vector search → precise cross-encoder scoring.
Implementation (Cypress only):
- Vector search: Retrieve top_k=50 chunks (cosine similarity)
- Reranker API: Score all 50 chunks using bge-reranker-v2-m3 (cross-encoder)
- Select top 10 by reranker score
- Send to LLM
Reranker model: BAAI/bge-reranker-v2-m3
Latency added: ~200-500ms for 50 chunks
Observable behavior:
- Cypress "Sources Used" panel shows higher precision than Redwood
- Chunks may have different order than pure vector search would produce
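The two-stage flow can be sketched with the reranker passed in as a function. In production that function would call the cross-encoder (bge-reranker-v2-m3 per this page); here a word-overlap scorer stands in so the example runs without a model. `two_stage_retrieve`, `overlap_score`, and the sample chunks are hypothetical.

```python
from __future__ import annotations

from typing import Callable


def two_stage_retrieve(query: str,
                       candidates: list[tuple[str, str, float]],
                       rerank_score: Callable[[str, str], float],
                       first_stage_k: int = 50,
                       final_k: int = 10) -> list[tuple[str, str, float]]:
    """candidates: (chunk_id, text, vector_score) tuples."""
    # Stage 1: keep the best first_stage_k chunks by vector (cosine) score.
    stage1 = sorted(candidates, key=lambda c: c[2], reverse=True)[:first_stage_k]
    # Stage 2: re-score each (query, chunk text) pair and keep final_k.
    rescored = [(cid, text, rerank_score(query, text)) for cid, text, _ in stage1]
    rescored.sort(key=lambda c: c[2], reverse=True)
    return rescored[:final_k]


def overlap_score(query: str, text: str) -> float:
    """Toy stand-in for a cross-encoder: share of query words found in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


candidates = [
    ("chk_a", "pricing starts at 99 dollars", 0.91),
    ("chk_b", "how to reset your password", 0.88),
    ("chk_c", "password policy overview", 0.90),
]
top = two_stage_retrieve("reset password", candidates, overlap_score,
                         first_stage_k=3, final_k=2)
print([cid for cid, _, _ in top])  # ['chk_b', 'chk_c']
```

Vector search ranks chk_a first, but the second stage drops it, which is the kind of reordering the Sources panel can show.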
RAG Strategies
Redwood (Standard)
Algorithm:
- Embed user query
- Vector search (top_k=10)
- Filter by threshold (0.7)
- Inject into LLM prompt
Latency: 1-2s
Accuracy: 72% (internal eval)
Cost: ~$0.002 per query
Use when: Questions are clear, single-hop retrieval sufficient
Cedar (Context-Aware)
Algorithm:
- LLM rewrites query using conversation history
- Embed rewritten query
- Vector search (top_k=10)
- Filter by threshold (0.7)
- Inject into LLM prompt
Latency: 2-3s
Accuracy: 78% (internal eval)
Cost: ~$0.003 per query (extra LLM call for rewrite)
Use when: Multi-turn conversations, follow-up questions ("What about the other option?")
Observable behavior: Logs show "Rewritten query: [...]" in debug panel
Cypress (Advanced)
Algorithm:
- LLM generates 3 query variations
- Embed all 3 queries
- Vector search each (top_k=50 total, deduplicated)
- Rerank with cross-encoder → top 10
- Inject into LLM prompt
Latency: 3-5s
Accuracy: 85% (internal eval)
Cost: ~$0.006 per query
Use when: High accuracy required, complex queries, multi-document synthesis
Observable behavior: Sources panel shows "Retrieved via multi-query expansion"
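The multi-query merge step might look like the following. The page says results are deduplicated to 50 total but not how duplicate scores are combined; keeping each chunk's best score is an assumption, as are the function name and sample scores. The merged list would then go to the reranker.

```python
def merge_multi_query(result_lists, limit=50):
    """Combine per-query results, keeping each chunk's best score.

    result_lists: one list of (chunk_id, score) pairs per query variation.
    """
    best = {}
    for results in result_lists:
        for chunk_id, score in results:
            if score > best.get(chunk_id, float("-inf")):
                best[chunk_id] = score  # deduplicate, keep the highest score
    merged = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return merged[:limit]


variations = [
    [("chk_1", 0.82), ("chk_2", 0.78)],  # "How do I reset my password?"
    [("chk_2", 0.85), ("chk_3", 0.74)],  # "password recovery steps"
    [("chk_1", 0.80), ("chk_4", 0.71)],  # "forgot password"
]
merged = merge_multi_query(variations)
print(merged)
```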
Agentic Workflow
Multi-step reasoning with tool calling (requires Cypress strategy).
Tools available:
- search_knowledge_base(query): Recursive retrieval
- calculate(expression): Math evaluation
- call_api(endpoint, params): Custom API integration
Flow:
- LLM decides if tools needed (function calling)
- Execute tool, get result
- LLM synthesizes final response
Latency: +1-3s per tool call
Enable: Agent Configuration → Advanced → Agentic Mode (toggle)
Observable behavior: Response shows "Used tools: search_knowledge_base, calculate" in debug panel
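A sketch of the execute-tool step for one of the listed tools. The `calculate` implementation (an `ast`-based evaluator that accepts only numbers and + - * /), the `TOOLS` registry, and the `(name, kwargs)` call format are hypothetical; the other tools are omitted, and a real agent would receive tool calls from the LLM's function-calling output rather than a hard-coded list.

```python
import ast
import operator

_BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
            ast.Mult: operator.mul, ast.Div: operator.truediv}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def calculate(expression: str):
    """Evaluate arithmetic without eval(): anything but numbers and + - * / is rejected."""
    def ev(node):
        if (isinstance(node, ast.Constant) and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](ev(node.operand))
        raise ValueError(f"unsupported expression: {expression!r}")
    return ev(ast.parse(expression, mode="eval").body)


TOOLS = {"calculate": calculate}  # hypothetical registry


def run_tool_calls(tool_calls):
    """Execute (name, kwargs) pairs chosen by the LLM; return results in order."""
    results = []
    for name, kwargs in tool_calls:
        if name not in TOOLS:
            raise KeyError(f"unknown tool {name!r}")
        results.append(TOOLS[name](**kwargs))
    return results


print(run_tool_calls([("calculate", {"expression": "(99 - 19) * 12"})]))  # [960]
```

The results would then be passed back to the LLM for the final synthesis step.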
Session Memory
Conversation history stored per session.
Storage:
- Redis cache (key: session:{session_id}:history)
- Max 10 turns or 4K tokens (whichever is reached first)
- Retention: 30 days
Behavior:
- Follow-up questions use previous context (e.g., in "What about the other option?", the agent resolves "the other option" from earlier turns)
- Session ID in API request: {"session_id": "sess_abc123", "query": "..."}
- New session: Omit session_id and a new one is generated
Observable failure: If session expires (>30 days), follow-ups fail. Error: "Session not found"
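A sketch of enforcing the 10-turn / 4K-token cap by keeping the newest turns that fit. It assumes 4K means 4,096 tokens and uses a crude 4-characters-per-token count; `history_key`, `approx_tokens`, and `trim_history` are hypothetical helpers, and only the key format comes from this page.

```python
def history_key(session_id: str) -> str:
    """Redis key format documented on this page."""
    return f"session:{session_id}:history"


def approx_tokens(text: str) -> int:
    # Crude ~4 characters/token estimate; a real tokenizer would be more accurate.
    return max(1, len(text) // 4)


def trim_history(turns, max_turns=10, max_tokens=4096, count_tokens=approx_tokens):
    """Keep the newest (user, assistant) turns that fit both caps.

    turns is ordered oldest-first; the result keeps that order.
    """
    kept, used = [], 0
    for user, assistant in reversed(turns):
        cost = count_tokens(user) + count_tokens(assistant)
        if len(kept) == max_turns or used + cost > max_tokens:
            break  # whichever cap is reached first stops the walk
        kept.append((user, assistant))
        used += cost
    kept.reverse()
    return kept


turns = [(f"q{i}", f"a{i}") for i in range(15)]
print(history_key("sess_abc123"), [u for u, _ in trim_history(turns)])
```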
Interaction
A database record for each query-response pair.
Schema:
interactions (
id UUID PRIMARY KEY,
agent_id UUID,
session_id VARCHAR,
query TEXT,
response TEXT,
chunks_used JSONB,
latency_ms INT,
cost_usd DECIMAL,
feedback ENUM('positive', 'negative') NULL,  -- NULL = no feedback yet
created_at TIMESTAMP
)
Observable behavior: Inbox shows all interactions, filterable by agent/date/feedback
Citation
Source reference in response.
Format:
Answer text [1] more text [2].
Sources:
[1] Document Name, page 5 (chunk_id: chk_abc123)
[2] Another Doc, section 3 (chunk_id: chk_def456)
Extraction: Regex parsing of response to match numbered citations to chunks
Link behavior: Click citation → opens source document URL (if available) or shows chunk text in modal
Observable failure: If the LLM doesn't format citations correctly, they don't render as links and appear as plain text.
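A sketch of the regex step, assuming responses follow the format shown in this section (numbered markers in the body, then a "Sources:" block with one `(chunk_id: ...)` per line). The exact patterns Twig uses are not documented here; `extract_citations` and both regexes are hypothetical. Markers with no matching source line are dropped, which corresponds to citations rendering as plain text.

```python
from __future__ import annotations

import re

CITATION = re.compile(r"\[(\d+)\]")
SOURCE_LINE = re.compile(r"^\[(\d+)\]\s+(.+?)\s+\(chunk_id:\s*(\w+)\)\s*$", re.MULTILINE)


def extract_citations(response: str) -> dict[int, str]:
    """Map each citation number used in the answer body to its chunk_id."""
    body, _, sources = response.partition("\nSources:")
    chunk_ids = {int(n): cid for n, _title, cid in SOURCE_LINE.findall(sources)}
    cited = {int(n) for n in CITATION.findall(body)}
    return {n: chunk_ids[n] for n in sorted(cited) if n in chunk_ids}


response = """Answer text [1] more text [2].
Sources:
[1] Document Name, page 5 (chunk_id: chk_abc123)
[2] Another Doc, section 3 (chunk_id: chk_def456)"""
print(extract_citations(response))  # {1: 'chk_abc123', 2: 'chk_def456'}
```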
Knowledge Base (KB)
Human-curated article collection (separate from data sources).
Storage: PostgreSQL kb_articles table
Fields: title, content, tags, version, author, status (draft/published)
Generation flow:
- Inbox → Select interaction → Click "Generate KB Article"
- AI drafts article from interaction
- Human edits, approves
- Published to KB
Important: KB articles are NOT indexed for retrieval. They are for human reference only.
Observable behavior: KB section shows article list. Editing creates new version (version history tracked).
Inbox
Review queue for agent interactions.
Location: Review → Inbox
Filters:
- Agent
- Date range
- Feedback status (positive/negative/no feedback)
- Keyword search
Actions per interaction:
- View full query/response/sources
- Mark accurate/inaccurate (thumbs up/down)
- Edit response (creates KB article draft)
- Flag for review
Observable behavior: Counter shows unreviewed interactions (e.g., "245 pending")
Playground
Agent testing interface.
Location: Playground (top nav)
Features:
- Agent selector (dropdown)
- Query input
- Response display with citations
- Sources panel (right sidebar): shows chunks retrieved, similarity scores
- Debug panel (expandable): shows latency breakdown, token counts, cost
Use cases:
- Test before API integration
- Compare RAG strategies (switch in agent config, re-run same query)
- Debug retrieval (check which chunks returned)
Observable behavior: All queries logged to Inbox with tag "playground"
Evaluation (Evals)
Automated testing framework.
Location: Evaluation → Test Sets
Test set structure:
{
"name": "Product FAQ Eval",
"questions": [
{"query": "What is pricing?", "expected": "Starts at $99/mo"},
{"query": "Free trial?", "expected": "14 days"}
]
}
Metrics computed:
- Accuracy: LLM judges if response matches expected (0-1)
- Latency: p50, p95, p99 (milliseconds)
- Citation rate: % responses with sources
- Cost: Total USD for test set
Run: Test Sets → [Your Set] → Select agent → Run Eval
Observable behavior: Results table shows pass/fail per question, aggregate metrics. Historical runs tracked for regression detection.
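How the aggregate metrics could be computed from per-question results. The result field names (`judge_score`, `latency_ms`, `cited`, `cost_usd`) and the nearest-rank percentile method are assumptions; Twig's eval runner may use a different percentile definition.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    if not values:
        raise ValueError("percentile of an empty list")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # multiply first so integer inputs stay exact
    return ordered[max(rank, 1) - 1]


def summarize(results):
    """results: one dict per question (hypothetical field names)."""
    n = len(results)
    latencies = [r["latency_ms"] for r in results]
    return {
        "accuracy": sum(r["judge_score"] for r in results) / n,  # mean LLM-judge score, 0-1
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
        "citation_rate": sum(1 for r in results if r["cited"]) / n,
        "cost_usd": sum(r["cost_usd"] for r in results),
    }


results = [
    {"judge_score": 1.0, "latency_ms": 1200, "cited": True, "cost_usd": 0.002},
    {"judge_score": 0.0, "latency_ms": 1800, "cited": False, "cost_usd": 0.002},
]
print(summarize(results))
```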
Private Data Mode
Agent configuration that instructs the LLM to answer only from retrieved sources, not from its pretrained knowledge.
Enable: Agent Configuration → Privacy → Private Data Mode (toggle)
Behavior:
- System prompt includes: "ONLY use information from provided sources. Never use your training data."
- LLM still has base knowledge, but instructed to ignore it
Observable failure: If no relevant chunks are retrieved, the agent responds "I don't have information about that" instead of answering from its training data.
Limitations: This is not a technical constraint; it relies on the LLM following instructions. A fine-tuned model may follow the restriction more reliably, but neither approach guarantees that the model never draws on its training data.
Public Agent
Agent shared in Agent Hub (marketplace).
Enable: Agent → Settings → Publish to Hub
Visibility: Other organizations can:
- View agent name, description, example queries
- Install (creates copy in their org)
- Customize copy (can't modify original)
Data isolation: Data sources NOT shared, only agent configuration (prompts, RAG strategy, model)
Observable behavior: Agent Hub shows install count, ratings (1-5 stars)
Tier-Based Retrieval
Data source prioritization (Cypress only).
Configuration: Data Sources → [Source] → Tier (dropdown: 1 or 2)
Retrieval:
- Search tier 1 sources (top_k=30)
- Search tier 2 sources (top_k=20)
- Combine results (50 total)
- Rerank (top 10 final)
Use case: Prioritize official docs over community forums, but still include forums if official docs don't have answer
Observable behavior: Sources panel shows tier badge (T1 or T2) per chunk
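A sketch of the tiered candidate pool, with the vector search passed in as a function so the example runs without an index. `tiered_candidates`, `fake_search`, `FAKE_INDEX`, and the sample data are hypothetical; the 30/20 split comes from this page, and final ordering is left to the reranker, which is how a strong tier 2 forum hit can still reach the final 10.

```python
def tiered_candidates(search, query, tier_sources, tier1_k=30, tier2_k=20):
    """search(query, source_ids, top_k) -> list of (chunk_id, score).

    Returns up to tier1_k + tier2_k candidates tagged with their tier badge.
    """
    combined = []
    for tier, k in ((1, tier1_k), (2, tier2_k)):
        for chunk_id, score in search(query, tier_sources.get(tier, []), k):
            combined.append({"chunk_id": chunk_id, "score": score, "tier": f"T{tier}"})
    return combined  # handed to the reranker, which picks the final 10


FAKE_INDEX = {
    "ds_official": [("chk_o1", 0.81), ("chk_o2", 0.77)],
    "ds_forum": [("chk_f1", 0.90)],
}


def fake_search(query, source_ids, top_k):
    hits = [hit for sid in source_ids for hit in FAKE_INDEX.get(sid, [])]
    return sorted(hits, key=lambda h: h[1], reverse=True)[:top_k]


cands = tiered_candidates(fake_search, "reset password",
                          {1: ["ds_official"], 2: ["ds_forum"]})
print(cands)
```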
API Key
Authentication credential for REST API.
Generate: Settings → API Keys → Generate New Key
Format: twigsk_live_abc123def456... (prefix indicates env: twigsk_live_ or twigsk_test_)
Usage:
curl -H "Authorization: Bearer twigsk_live_abc123..." \
https://api.twig.so/v1/query
Permissions: Read (view data), Write (modify agents/data sources), Execute (run queries), Admin (all)
Rate limit: 100 req/min (Execute scope), 10 req/min (Write scope)
Rotation: Generate new key, update apps, delete old key (zero downtime)
Observable failure: Invalid key returns 401 Unauthorized with JSON: {"error": "Invalid API key"}
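A small client-side guard that checks the documented key prefixes and builds the `Authorization` header used in the curl example. `key_environment` and `auth_headers` are hypothetical helpers; the keys shown are placeholders. Catching a malformed key locally avoids a round trip that would only return 401.

```python
from __future__ import annotations


def key_environment(api_key: str) -> str:
    """Return 'live' or 'test' based on the documented key prefix."""
    for env in ("live", "test"):
        if api_key.startswith(f"twigsk_{env}_"):
            return env
    raise ValueError("not a Twig API key (expected twigsk_live_ or twigsk_test_ prefix)")


def auth_headers(api_key: str) -> dict[str, str]:
    key_environment(api_key)  # fail fast on malformed keys before any request
    return {"Authorization": f"Bearer {api_key}"}


print(key_environment("twigsk_test_abc123"), auth_headers("twigsk_test_abc123"))
```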
Next Steps
Authentication - API key management and SSO setup
Agent Configuration - Detailed agent settings
RAG Strategy Selection - When to use Redwood/Cedar/Cypress
Last updated January 26, 2026


