Webhook Delivery Failures
Webhooks from your data sources fail to reach Twig, arrive late, or get delivered multiple times, causing inconsistent knowledge base updates.
The Problem
Webhooks from your data sources fail to reach Twig, arrive late, or get delivered multiple times, causing inconsistent knowledge base updates.
Symptoms
- ❌ Real-time updates not working despite webhook setup
- ❌ "Webhook endpoint unreachable" errors
- ❌ Same document processed 5 times
- ❌ Updates arrive 20 minutes late
- ❌ Webhooks only work sometimes
Real-World Example
Confluence configured to send webhooks on page updates
User updates page at 10:00 AM
Expected: Immediate knowledge base update
Actual: No update
Webhook delivery attempts:
10:00:00 → 502 Bad Gateway
10:00:05 → Timeout (30s)
10:00:35 → Connection refused
10:05:00 → Retries exhausted, webhook dropped
Result: Update never synced, AI agent has stale data
Deep Technical Analysis
Webhook Delivery Guarantees
Webhooks are fundamentally unreliable:
HTTP Push Model:
Traditional API (pull):
→ Twig controls timing
→ Retries on failure
→ Guaranteed processing
Webhook (push):
→ Data source controls timing
→ Limited retries (5-10 attempts)
→ No guarantee of delivery
Failure Modes:
1. Network failure:
→ DNS resolution fails
→ Connection timeout
→ Packet loss
→ Result: Webhook never arrives
2. Server unavailable:
→ Twig server restarting
→ Load balancer down
→ 502/503 errors
→ Result: Source retries, then gives up
3. Request timeout:
→ Webhook delivered
→ Twig processing slow (>30s)
→ Source times out before response
→ Source thinks: delivery failed, retries
→ Result: Duplicate processing
4. Invalid response:
→ Twig returns 400 (bug in handler)
→ Source interprets as permanent failure
→ No retries
→ Result: Event lost
At-Least-Once vs Exactly-Once:
Most webhook providers guarantee:
→ At-least-once delivery
→ May deliver same event multiple times
→ Twig must handle idempotency
Alternative (rare):
→ Exactly-once delivery
→ Requires distributed transaction protocol
→ Most providers don't support this
Webhook Endpoint Requirements
Receiving webhooks requires public infrastructure:
Public Accessibility:
Webhook sender (e.g., Confluence):
→ Must reach Twig's webhook endpoint
→ Requires public IP/domain
→ HTTPS required (most sources mandate TLS)
→ Valid SSL certificate
Development challenges:
→ Local development: localhost not reachable
→ Need ngrok/localtunnel for testing
→ Firewall rules must allow inbound HTTPS
→ DNS must resolve correctly
Load Balancer Complications:
Architecture:
Internet → Load Balancer → Twig Servers (3 instances)
Webhook delivery to: https://api.twig.com/webhooks/confluence
Load balancer distributes requests:
→ 33% to server 1
→ 33% to server 2
→ 34% to server 3
If server 2 is down:
→ Load balancer may route webhook there
→ Connection fails
→ Source retries
→ May hit different server (server 1)
→ Successful delivery
But:
→ Source logs show: 1 failure, 1 success
→ Twig cannot assume the failed attempt was never processed (a server can fail mid-request)
→ Idempotency needed
Signature Verification and Security
Webhooks must verify authenticity to prevent attacks:
Unsigned Webhooks (insecure):
POST /webhooks/confluence
Body: { "page_id": 123, "action": "updated" }
Problem:
→ Anyone can POST this endpoint
→ Attacker sends fake webhook
→ Twig processes it as real
→ Malicious data injected into knowledge base
HMAC Signature Verification:
Confluence webhook:
Header: X-Hub-Signature: sha256=abc123...
Body: { "page_id": 123, "action": "updated" }
Verification:
1. Twig looks up secret for this data source
2. Computes: HMAC-SHA256(secret, request_body)
3. Compares with X-Hub-Signature
4. If match: authentic
5. If mismatch: reject (403 Forbidden)
Challenges:
→ Different sources use different header names
→ Different HMAC algorithms (SHA1, SHA256, SHA512)
→ Some use URL encoding, some don't
→ Secret rotation: old webhooks use old secret
→ Clock skew: timestamp-based signatures expire
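The verification steps above can be sketched in Python as follows. This is a minimal illustration, not Twig's actual handler: it assumes the header value has the form `sha256=<hex digest>` shown in the example, and the function names `verify_signature` and `status_for` are hypothetical.

```python
import hashlib
import hmac


def verify_signature(secret: bytes, body: bytes, header_value: str) -> bool:
    """Return True if header_value matches HMAC-SHA256(secret, body).

    Assumes a "sha256=<hex digest>" header format; other sources may use
    different header names, algorithms, or encodings.
    """
    prefix = "sha256="
    if not header_value.startswith(prefix):
        return False
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    received = header_value[len(prefix):]
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))


def status_for(secret: bytes, body: bytes, header_value: str) -> int:
    """Map the verification result to the status codes used in the steps above."""
    return 200 if verify_signature(secret, body, header_value) else 403
```

The signature is computed over the raw request bytes. Re-serializing a parsed JSON body before hashing can change whitespace or key order and cause a mismatch.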
Timestamp Validation:
Webhook header:
X-Slack-Request-Timestamp: 1642204800
Verification:
1. Extract timestamp
2. Current time: now()
3. Difference: abs(now - timestamp)
4. If difference > 5 minutes: reject (replay attack)
But:
→ Twig server clock off by 10 minutes
→ All webhooks rejected as "too old"
→ Need NTP sync
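A minimal sketch of the timestamp check, assuming the header carries Unix seconds as in the example above. The function name `is_fresh` and the `now` parameter (used to make the check testable) are illustrative.

```python
import time
from typing import Optional

MAX_SKEW_SECONDS = 5 * 60  # the 5-minute window from the steps above


def is_fresh(timestamp_header: str, now: Optional[float] = None) -> bool:
    """Return True if the timestamp is within the allowed window of `now`."""
    try:
        sent_at = int(timestamp_header)
    except (TypeError, ValueError):
        return False  # missing or malformed header: reject
    current = time.time() if now is None else now
    return abs(current - sent_at) <= MAX_SKEW_SECONDS
```

With a server clock that runs 10 minutes fast, `is_fresh("1642204800", now=1642204800 + 600)` returns `False` even for a webhook that was just sent, which is the failure described above.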
Idempotency and Duplicate Handling
Webhooks may be delivered multiple times:
The Duplicate Problem:
Scenario:
10:00:00 → Webhook delivered, Twig processing
10:00:25 → Processing not complete yet
10:00:30 → Source times out, assumes failure
10:00:35 → Source retries, sends same webhook again
10:00:40 → Twig finishes first processing, returns 200
10:00:42 → Twig processes second (duplicate) webhook
Result:
→ Same document processed twice
→ Duplicate embeddings in vector DB
→ Wasted compute and storage
Idempotency Key:
Best practice: webhook includes unique ID
{
"event_id": "evt_abc123", ← Idempotency key
"page_id": 456,
"action": "updated"
}
Twig handler:
1. Check if event_id already processed
2. If yes: return 200 (idempotent)
3. If no: process and record event_id
Requires:
→ Persistent storage of processed event IDs
→ TTL/expiry (can't store forever)
→ Typically keep for 24-48 hours
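The handler steps above might look like this in Python. It is a sketch under simplifying assumptions: the class name `ProcessedEvents` is hypothetical, and an in-memory dict stands in for the persistent storage a real deployment would need.

```python
import time
from typing import Callable, Dict, Optional

TTL_SECONDS = 24 * 60 * 60  # lower end of the 24-48 hour range above


class ProcessedEvents:
    """Record of processed event IDs with expiry (in-memory for illustration)."""

    def __init__(self, ttl: float = TTL_SECONDS) -> None:
        self._ttl = ttl
        self._seen: Dict[str, float] = {}  # event_id -> expiry time

    def handle(self, event: dict, process: Callable[[dict], None],
               now: Optional[float] = None) -> str:
        current = time.time() if now is None else now
        # Drop expired IDs so the store does not grow without bound.
        self._seen = {k: v for k, v in self._seen.items() if v > current}
        event_id = event["event_id"]
        if event_id in self._seen:
            return "duplicate"  # the HTTP response is still 200
        process(event)
        self._seen[event_id] = current + self._ttl
        return "processed"
```

This version records the ID only after processing succeeds, so it does not catch a duplicate that arrives while the first copy is still being processed. Handling that case needs an atomic check-and-set in the shared store.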
Content-Hash Idempotency:
Alternative: use content hash
1. Hash webhook body: hash(JSON.stringify(body))
2. Check if hash processed recently
3. If yes: skip (probable duplicate)
4. If no: process
Pros:
+ No need for explicit event_id field
+ Works with any webhook source
Cons:
- False positives (two distinct events with identical payloads, so the second is skipped)
- Hash collisions (rare but possible)
- Doesn't work if each delivery carries a fresh timestamp (retries hash differently)
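One way to compute a content hash that tolerates key reordering and a volatile field is sketched below. The function name `content_key` and the field name `timestamp` are assumptions for illustration; a real payload may name its volatile fields differently.

```python
import hashlib
import json


def content_key(body: dict, ignore: tuple = ("timestamp",)) -> str:
    """Hash a payload after dropping volatile top-level fields.

    The default ignored field, "timestamp", is a hypothetical name.
    """
    stable = {k: v for k, v in body.items() if k not in ignore}
    # sort_keys yields the same serialization regardless of key order.
    canonical = json.dumps(stable, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two deliveries of the same event that differ only in key order or in the ignored field produce the same key. Two genuinely different edits with identical remaining fields also do, which is the trade-off listed in the cons above.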
Ordering and Sequencing
Webhooks may arrive out of order:
The Race Condition:
User actions:
10:00:00 → Create page "Guide"
10:00:05 → Update page "Guide" (add content)
10:00:10 → Update page "Guide" (fix typo)
Webhook delivery:
10:00:01 → Created webhook sent
10:00:06 → Updated webhook sent
10:00:11 → Updated webhook sent
But network delays:
10:00:15 → "Updated" arrives first
10:00:18 → "Created" arrives second
10:00:19 → "Updated" arrives third
Twig processing:
→ Update page (but it doesn't exist yet!)
→ Create page (replaces update with empty version)
→ Update page (now correct, but lost first update)
Final state: Incomplete content
Sequence Number Solution:
Webhook payload:
{
"event_id": "evt_abc123",
"sequence": 456, ← Per-page counter
"page_id": 789,
"action": "updated"
}
Twig handler:
1. Check last_processed_sequence for page 789
2. If incoming sequence <= last_processed: skip (old event)
3. If incoming sequence > last_processed + 1: gap detected
→ Store for later
→ Wait for missing sequences
4. If sequence == last_processed + 1: process normally
Complexity:
→ Must track sequence per document
→ Gap handling (what if missing event never arrives?)
→ Not all webhook sources provide sequence numbers
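The handler logic above can be sketched as a small class that buffers events across gaps. It assumes each page has its own counter starting at 1, and it does not address the open question of a missing event that never arrives. The name `SequencedHandler` is hypothetical.

```python
from typing import Callable, Dict


class SequencedHandler:
    """Apply events per page in sequence order, buffering across gaps."""

    def __init__(self, apply: Callable[[dict], None]) -> None:
        self._apply = apply
        self._last: Dict[int, int] = {}                  # page_id -> last applied sequence
        self._pending: Dict[int, Dict[int, dict]] = {}   # page_id -> sequence -> event

    def handle(self, event: dict) -> str:
        page, seq = event["page_id"], event["sequence"]
        last = self._last.get(page, 0)
        if seq <= last:
            return "skipped"  # old or duplicate event
        buffered = self._pending.setdefault(page, {})
        buffered[seq] = event
        if seq > last + 1:
            return "buffered"  # gap detected: wait for the missing sequence
        # Apply this event plus any buffered successors that are now contiguous.
        while last + 1 in buffered:
            last += 1
            self._apply(buffered.pop(last))
        self._last[page] = last
        return "applied"
```

In the race condition described earlier, the first "Updated" event (sequence 2) is buffered when it arrives before "Created" (sequence 1), and both are applied in order once sequence 1 shows up.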
Retry Backoff and Thundering Herd
Transient failures trigger retries:
Exponential Backoff:
Webhook delivery attempts:
1. Immediate: POST /webhook
2. 5s later: POST /webhook (retry 1)
3. 25s later: POST /webhook (retry 2)
4. 125s later: POST /webhook (retry 3)
5. Give up
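But:
→ If Twig is down for 10 minutes, this schedule gives up after roughly 155 seconds, so events from early in the outage are dropped
→ Sources with longer retry schedules hold a backlog instead (e.g., 600 webhooks at 1 event/second)
→ When Twig comes back: the backlog and pending retries hit at once
→ Thundering herd problem
→ Twig overloaded, fails again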
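The schedule listed above (5s, 25s, 125s) grows by a factor of 5 per retry. A short sketch of that arithmetic, assuming each delay is measured from the previous attempt; the function name `retry_delays` is illustrative:

```python
from typing import List


def retry_delays(base: float = 5, factor: float = 5, retries: int = 3) -> List[float]:
    """Delay before each retry: base * factor**n for n = 0 .. retries - 1."""
    return [base * factor ** n for n in range(retries)]


delays = retry_delays()
total_window = sum(delays)  # seconds between the first attempt and the last retry
```

Under that assumption `delays` is `[5, 25, 125]` and `total_window` is 155 seconds, so an outage longer than about two and a half minutes outlasts every retry for events sent at its start.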
Rate Limiting on Receiver:
Twig must limit incoming webhook rate:
→ 100 webhooks/second max
→ If exceeded: return 429 (Too Many Requests)
→ Retry-After header: 60 seconds
But source may not respect Retry-After:
→ Retry immediately anyway
→ More 429 errors
→ Eventually give up
→ Events lost
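Receiver-side rate limiting is often implemented as a token bucket. The sketch below is one possible shape, not a description of Twig's implementation. It uses the 100 webhooks/second limit and 60-second Retry-After from the text; the names `TokenBucket` and `respond` are hypothetical.

```python
from typing import Dict, Tuple


class TokenBucket:
    """Allow up to `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = 0.0

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def respond(bucket: TokenBucket, now: float) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for one incoming webhook."""
    if bucket.allow(now):
        return 200, {}
    return 429, {"Retry-After": "60"}
```

Because a source may ignore Retry-After, a limiter like this protects the receiver but does not prevent event loss on its own.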
Long Processing and Timeout
Webhook processing must be fast:
The Timeout Problem:
Webhook arrives: page_updated event
Twig handler:
1. Verify signature (50ms)
2. Validate payload (10ms)
3. Fetch full page content from Confluence API (2s)
4. Chunk content (500ms)
5. Generate embeddings (5s)
6. Store in vector DB (1s)
Total: 8.56 seconds
But:
→ Webhook source timeout: 5 seconds
→ Source sees no response after 5s
→ Source marks as failed, retries
→ Twig finishes at 8.56s, returns 200 (to closed connection)
→ Result: Duplicate processing on retry
Async Processing Pattern:
Better approach:
1. Webhook arrives
2. Verify signature (50ms)
3. Validate payload (10ms)
4. Add to processing queue (Redis, SQS)
5. Return 202 Accepted immediately (total: 60ms)
Background worker:
→ Dequeue event
→ Process fully (fetch, chunk, embed)
→ Not bound by the source's timeout
Pros:
+ Fast webhook response
+ No duplicates caused by slow-handler timeouts (idempotency still needed for other retries)
+ Can handle bursts
Cons:
- More complex architecture
- Need queue infrastructure
- Harder debugging (async)
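The async pattern above can be sketched with Python's standard-library queue standing in for Redis or SQS. The function names and the `signature_ok` flag are illustrative; signature verification itself is assumed to happen before `receive` is called.

```python
import queue
import threading
from typing import Callable

events: queue.Queue = queue.Queue()  # stand-in for Redis/SQS


def receive(event: dict, signature_ok: bool) -> int:
    """Fast path: validate, enqueue, and answer without doing the heavy work."""
    if not signature_ok:
        return 403
    if "page_id" not in event:
        return 400
    events.put(event)
    return 202  # accepted; processing happens in the worker


def worker(process: Callable[[dict], None]) -> None:
    """Slow path: fetch, chunk, and embed outside the request cycle."""
    while True:
        event = events.get()
        try:
            if event is None:  # sentinel used only to stop this sketch's worker
                return
            process(event)
        finally:
            events.task_done()
```

A worker is started with `threading.Thread(target=worker, args=(process_fn,)).start()`, where `process_fn` is whatever function does the fetch, chunk, and embed work. An in-process queue loses queued events on restart, which is one reason the text points to external queue infrastructure.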
Failed Webhook Recovery
When webhooks are lost, fallback is needed:
Hybrid Sync Strategy:
Primary: Webhooks for real-time updates
Fallback: Periodic polling for missed events
Example:
→ Webhooks deliver changes within seconds (real-time)
→ Full sync every 6 hours (catch missed events)
But:
→ Polling finds changes already processed by webhooks
→ Duplicate detection needed
→ Wasted API calls
→ Complex coordination logic
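One way to keep the polling fallback from reprocessing what webhooks already handled is to compare versions before syncing. The sketch below assumes each page exposes a monotonically increasing version number and that the index records the version it last ingested. Both are assumptions for illustration, as is the function name `pages_to_resync`.

```python
from typing import Dict, List


def pages_to_resync(source_versions: Dict[int, int],
                    indexed_versions: Dict[int, int]) -> List[int]:
    """Return page IDs whose source version is newer than the indexed one."""
    stale = [page for page, version in source_versions.items()
             if indexed_versions.get(page, -1) < version]
    return sorted(stale)
```

With `source_versions = {1: 3, 2: 5, 3: 1}` and `indexed_versions = {1: 3, 2: 4}`, only pages 2 and 3 are re-synced: page 1 is already current, page 2 missed an update, and page 3 was never indexed.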
Dead Letter Queue:
Failed webhook handling:
1. Webhook processing fails (bug, bad data)
2. Retry 3 times
3. Still failing? Move to DLQ (Dead Letter Queue)
4. Alert engineers
5. Manual investigation
Prevents:
→ Blocking queue with poison messages
→ Infinite retry loops
But:
→ Requires manual intervention
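The dead letter queue flow above can be sketched as follows. A plain list stands in for the DLQ and a callback stands in for alerting; the names `process_with_dlq`, `dead_letters`, and `alert` are hypothetical.

```python
from typing import Callable, List, Tuple

MAX_RETRIES = 3  # "Retry 3 times" from the steps above


def process_with_dlq(event: dict, process: Callable[[dict], None],
                     dead_letters: List[Tuple[dict, str]],
                     alert: Callable[[str], None]) -> bool:
    """Try an event once plus MAX_RETRIES retries, then park it in the DLQ."""
    last_error = ""
    for _ in range(1 + MAX_RETRIES):
        try:
            process(event)
            return True
        except Exception as exc:  # broad on purpose: any failure counts as a failed attempt
            last_error = repr(exc)
    # Still failing: move it aside so it cannot block the main queue.
    dead_letters.append((event, last_error))
    alert(f"event moved to DLQ: {last_error}")
    return False
```

A production version would typically wait between attempts. The sketch retries immediately to stay short.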
How to Solve
Verify HMAC signatures, process webhooks asynchronously through a queue, store idempotency keys, add a periodic polling fallback, and rate-limit incoming requests to absorb retry storms. See Webhook Configuration.
Last updated January 26, 2026


