RAG Scenarios and Solutions
API Rate Limit Exhaustion
Your data source integration hits rate limits, causing slow syncs, partial data ingestion, or complete sync failures.
Key Takeaways
- The Problem
- Deep Technical Analysis
- How to Solve
- Agent Instructions: Querying This Documentation
The Problem
Your data source integration hits rate limits, causing slow syncs, partial data ingestion, or complete sync failures.
Symptoms
- ❌ "429 Too Many Requests" errors
- ❌ Sync takes 10x longer than expected
- ❌ Only syncs 500 docs before stopping
- ❌ Works fine for small datasets, fails for large
- ❌ Other integrations stop working during Twig sync
Real-World Example
Zendesk API rate limit: 700 requests/minute
Your help center: 2,000 articles
Twig sync attempts:
→ Enumerate all categories: 5 requests
→ List articles in each category: 50 requests
→ Fetch full content for each article: 2,000 requests
→ Total: 2,055 requests
At 700/min max: 3 minute minimum
But:
→ Twig makes 50 concurrent requests/sec (3,000/min)
→ After 700 requests: 429 rate limit
→ Twig backs off 60 seconds
→ Resumes, hits limit again
→ Sync takes 45 minutes instead of 3
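The arithmetic in this example can be checked with a short sketch. The function names and the 10% safety margin are illustrative assumptions, not part of Twig; the request counts and the 700 requests/minute limit come from the example above.

```python
import math

def min_sync_minutes(total_requests: int, limit_per_minute: int) -> int:
    """Lower bound on sync duration if requests are paced exactly at the limit."""
    return math.ceil(total_requests / limit_per_minute)

def request_interval_seconds(limit_per_minute: int, safety_margin: float = 0.9) -> float:
    """Delay between requests that keeps throughput slightly below the limit.

    safety_margin is an assumed tuning knob (0.9 = use 90% of the quota).
    """
    return 60.0 / (limit_per_minute * safety_margin)

# Categories + article listings + full article fetches, as in the example.
total = 5 + 50 + 2000
print(total)                         # 2055
print(min_sync_minutes(total, 700))  # 3
print(round(request_interval_seconds(700), 3))
```

Pacing requests at a steady interval like this avoids the burst, 429, 60-second backoff cycle that stretches the sync to 45 minutes.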
Deep Technical Analysis
Rate Limit Types and Granularity
APIs implement multiple layers of rate limiting:
Per-User Limits:
Limit: 100 requests/minute per user token
If Twig uses API token for user@company.com:
→ All Twig requests count against user@company.com's quota
→ If user also uses Zendesk app: shares same quota
→ Twig sync can block user's manual API access
Per-App Limits:
Limit: 10,000 requests/hour per OAuth app
If Twig has OAuth app ID: app_123:
→ All requests from Twig app_123 (all customers) share quota
→ 100 customers syncing simultaneously
→ Each uses 100 req/hr → 10,000 total
→ Quota exhausted, all customers affected
Per-Endpoint Limits:
Slack example:
→ conversations.list: 20 req/min (Tier 2)
→ conversations.history: 50 req/min (Tier 3)
→ users.info: 100 req/min (Tier 4)
Twig must:
→ Track quota per endpoint separately
→ Can't assume global rate limit
→ Complex quota bookkeeping
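Per-endpoint bookkeeping could look roughly like the sliding-window tracker below. This is a minimal single-process sketch; the class name `EndpointQuota` and the injectable `clock` are assumptions made for illustration, and the limits are the Slack figures quoted above.

```python
import time
from collections import defaultdict, deque

class EndpointQuota:
    """Tracks calls per endpoint over a sliding 60-second window (hypothetical helper)."""

    def __init__(self, limits_per_minute, clock=time.monotonic):
        self.limits = limits_per_minute      # e.g. {"users.info": 100}
        self.clock = clock                   # injectable so tests can control time
        self.calls = defaultdict(deque)      # endpoint -> timestamps of recent calls

    def try_acquire(self, endpoint: str) -> bool:
        """Return True and record the call if the endpoint still has quota."""
        now = self.clock()
        window = self.calls[endpoint]
        # Drop timestamps that have aged out of the 60-second window.
        while window and now - window[0] >= 60:
            window.popleft()
        if len(window) >= self.limits[endpoint]:  # KeyError for unknown endpoints
            return False
        window.append(now)
        return True

quota = EndpointQuota({
    "conversations.list": 20,
    "conversations.history": 50,
    "users.info": 100,
})
print(quota.try_acquire("conversations.list"))  # True
```

Each endpoint has its own window, so exhausting `conversations.list` does not block `conversations.history`.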
Burst vs Sustained:
Many APIs allow bursts:
Token bucket algorithm:
→ Bucket capacity: 100 tokens
→ Refill rate: 10 tokens/second
→ Each request: consumes 1 token
Burst behavior:
→ 0s: 100 tokens available
→ Make 100 requests instantly: ✓ allowed
→ 1s: 10 tokens refilled
→ Make 10 more: ✓ allowed
→ Attempt 11th: ✗ rate limited
Twig must understand:
→ Initial burst allowance
→ Sustained rate after burst
→ Bucket refill rate
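The burst behavior above can be reproduced with a small token bucket. This is a sketch of the general algorithm, not any provider's implementation; time is passed in explicitly (an assumption made so the example is deterministic).

```python
class TokenBucket:
    """Minimal token bucket: `capacity` tokens, refilled at `refill_per_second`."""

    def __init__(self, capacity: int, refill_per_second: float, now: float = 0.0):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)   # bucket starts full
        self.last = now

    def try_consume(self, now: float) -> bool:
        """Refill for elapsed time, then spend one token if available."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=100, refill_per_second=10)
burst = sum(bucket.try_consume(0.0) for _ in range(100))
print(burst)                    # 100: the initial burst is allowed
print(bucket.try_consume(0.0))  # False: bucket is empty
after_1s = sum(bucket.try_consume(1.0) for _ in range(11))
print(after_1s)                 # 10: only the refilled tokens succeed
```

A client that mirrors the provider's bucket locally can spend the burst up front and then settle at the refill rate instead of discovering the limit through 429 responses.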
Rate Limit Discovery and Headers
APIs communicate limits via HTTP headers:
Common Headers (a de facto convention, not an RFC standard):
HTTP/1.1 200 OK
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 742
X-RateLimit-Reset: 1642204800
But variations exist:
GitHub:
→ X-RateLimit-*
Stripe:
→ Stripe-RateLimit-*
Twitter:
→ x-rate-limit-* (different hyphenation; header names themselves are case-insensitive)
Google:
→ No headers (must infer from 429 errors)
Inconsistency:
→ Twig must handle all formats
→ Or: No headers available, must estimate
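One way to cope with the naming variations is to normalize header names and try a list of known prefixes. The sketch below is illustrative: `parse_rate_limit_headers`, `RateLimitInfo`, and the default prefix list are assumptions, and the prefix list would need extending per provider.

```python
from typing import Mapping, NamedTuple, Optional, Sequence

class RateLimitInfo(NamedTuple):
    limit: Optional[int]
    remaining: Optional[int]
    reset: Optional[str]   # kept raw: its meaning differs between providers

# Assumed prefixes for illustration (GitHub-style and Twitter-style naming).
DEFAULT_PREFIXES = ("x-ratelimit-", "x-rate-limit-")

def _to_int(value: Optional[str]) -> Optional[int]:
    try:
        return int(value) if value is not None else None
    except ValueError:
        return None

def parse_rate_limit_headers(
    headers: Mapping[str, str],
    prefixes: Sequence[str] = DEFAULT_PREFIXES,
) -> RateLimitInfo:
    """Extract limit/remaining/reset regardless of header casing or prefix."""
    lowered = {name.lower(): value for name, value in headers.items()}

    def find(suffix: str) -> Optional[str]:
        for prefix in prefixes:
            if prefix + suffix in lowered:
                return lowered[prefix + suffix]
        return None

    return RateLimitInfo(_to_int(find("limit")), _to_int(find("remaining")), find("reset"))

info = parse_rate_limit_headers({
    "X-RateLimit-Limit": "1000",
    "X-RateLimit-Remaining": "742",
    "X-RateLimit-Reset": "1642204800",
})
print(info.remaining)  # 742
```

When every field comes back as `None`, the provider sent no usable headers and the caller has to fall back to estimating from 429 responses.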
The Reset Timestamp Ambiguity:
X-RateLimit-Reset: 1642204800
Is this:
→ Unix timestamp (seconds since epoch)?
→ Milliseconds since epoch?
→ Seconds until reset (relative)?
→ ISO 8601 datetime string?
There is no binding spec. Common convention (e.g., GitHub): Unix timestamp in seconds
Reality: Providers differ; some send seconds until reset instead
Wrong interpretation:
→ Wait 1642204800 seconds (52 years!)
→ Or wait 1 second when should wait 3600
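A defensive parser can guess the format from the value's magnitude. The thresholds below are heuristics chosen for this sketch (assumptions, not part of any specification), and `seconds_until_reset` is a hypothetical helper name.

```python
from datetime import datetime, timezone

def seconds_until_reset(raw: str, now: float) -> float:
    """Best-effort conversion of a reset header value into a wait time in seconds.

    `now` is the current Unix time in seconds.
    """
    try:
        value = float(raw)
    except ValueError:
        # Not numeric: try ISO 8601. "Z" is rewritten for older Python versions.
        moment = datetime.fromisoformat(raw.strip().replace("Z", "+00:00"))
        if moment.tzinfo is None:
            moment = moment.replace(tzinfo=timezone.utc)  # assume UTC if unspecified
        return max(0.0, moment.timestamp() - now)
    if value >= 1e11:      # far too large for epoch seconds: treat as milliseconds
        value /= 1000.0
    if value >= 1e9:       # plausible epoch seconds (after 2001)
        return max(0.0, value - now)
    return max(0.0, value)  # small number: treat as seconds until reset

now = 1642201200.0  # one hour before the reset in the example above
print(seconds_until_reset("1642204800", now))            # 3600.0
print(seconds_until_reset("1642204800000", now))         # 3600.0
print(seconds_until_reset("3600", now))                  # 3600.0
print(seconds_until_reset("2022-01-15T00:00:00Z", now))  # 3600.0
```

Capping the result at a sane maximum (for example one hour) is a further safeguard against misread values.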
Distributed Rate Limiting
Multiple Twig servers must share quota:
Naive Approach (broken):
3 Twig servers:
→ Server A: tracks "450 requests made"
→ Server B: tracks "380 requests made"
→ Server C: tracks "410 requests made"
Total: 1,240 requests
But actual limit: 1,000 requests/min
Each server thinks it's under limit
All servers make requests
Actual total: Exceeds quota → 429 errors
Centralized Rate Limiter:
Redis-based solution:
→ All servers increment shared counter in Redis
→ INCR api:zendesk:requests:minute_12345
→ EXPIRE api:zendesk:requests:minute_12345 60 (set the TTL when the key is first created)
→ If counter > 1000: deny request
Pros:
+ Accurate shared state
+ Prevents over-limit requests
Cons:
- Redis adds a network round trip (roughly 1 ms) to every API request and can become a bottleneck
- Redis failure blocks all API calls
- Network overhead
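The fixed-window counter described above can be sketched as follows. To keep the example runnable without a Redis server, it uses a small in-memory stand-in (`InMemoryStore`, a hypothetical class) that offers only `incr` and `expire`; a real Redis client with the same two methods could be passed in instead.

```python
import time

class InMemoryStore:
    """Stand-in for Redis in this sketch: supports incr and expire only."""

    def __init__(self, clock=time.time):
        self.clock = clock
        self.data = {}  # key -> [value, expires_at or None]

    def incr(self, key):
        entry = self.data.get(key)
        if entry is None or (entry[1] is not None and entry[1] <= self.clock()):
            entry = [0, None]          # missing or expired key starts from zero
            self.data[key] = entry
        entry[0] += 1
        return entry[0]

    def expire(self, key, seconds):
        if key in self.data:
            self.data[key][1] = self.clock() + seconds

def allow_request(store, api, limit, clock=time.time, window=60):
    """Fixed-window limiter: one shared counter per API per window."""
    bucket = int(clock() // window)
    key = f"api:{api}:requests:minute_{bucket}"
    count = store.incr(key)        # increment first, then check
    if count == 1:
        store.expire(key, window)  # set the TTL once, when the key is created
    return count <= limit

store = InMemoryStore()
print(allow_request(store, "zendesk", limit=1000))  # True
```

Fixed windows are simple but allow up to twice the limit across a window boundary; a sliding window or token bucket smooths that at the cost of more state.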
The Race Condition:
Two servers simultaneously:
Server A:
1. GET counter → 999
2. Check: 999 < 1000 → OK
3. Make API call
4. INCR counter → 1000
Server B (concurrent):
1. GET counter → 999 (before Server A's INCR)
2. Check: 999 < 1000 → OK
3. Make API call
4. INCR counter → 1001
Both requests approved, but limit is 1000
Result: Quota exceeded by 1
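Solution: Atomic operation (INCR first, then check the returned value before making the API call)
→ INCR is atomic, so each server sees a distinct count; the server that receives 1001 skips the call
Caveat: Denied attempts still increment the counter, so use a Lua script (check and increment in one step) if the count must stay exact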
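The difference between "GET, check, call, INCR" and "INCR then check" can be demonstrated with threads standing in for servers. In this sketch a lock-protected counter plays the role of Redis's atomic INCR; `AtomicCounter` and `run_workers` are illustrative names.

```python
import threading

class AtomicCounter:
    """Lock-protected counter; incr returns the new value, like Redis INCR."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def incr(self) -> int:
        with self._lock:
            self._value += 1
            return self._value

def run_workers(limit: int, workers: int, attempts_each: int) -> int:
    """Each worker increments first and only 'calls the API' if its count is within the limit."""
    counter = AtomicCounter()
    calls_made = AtomicCounter()

    def worker():
        for _ in range(attempts_each):
            if counter.incr() <= limit:   # every caller gets a distinct count
                calls_made.incr()         # stands in for the real API call

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return calls_made.incr() - 1          # read the final total

print(run_workers(limit=1000, workers=8, attempts_each=200))  # 1000
```

Because the increment and the read happen in one atomic step, exactly `limit` calls go through no matter how the threads interleave.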
Retry and Backoff Strategies
Handling 429 responses requires exponential backoff:
Naive Retry (bad):
Request fails with 429:
→ Wait 1 second
→ Retry
→ 429 again
→ Wait 1 second
→ Retry...
→ Hammering API with retries
→ Makes rate limiting worse
Exponential Backoff (better):
Attempt 1: Immediate request → 429
→ Wait 1 second
Attempt 2: Retry → 429
→ Wait 2 seconds
Attempt 3: Retry → 429
→ Wait 4 seconds
Attempt 4: Retry → 429
→ Wait 8 seconds
Attempt 5: Retry → Success
Respects provider's recovery time
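The doubling schedule above can be generated with a one-line helper. The `cap` parameter is an added assumption (a ceiling so waits never grow unbounded); the base of 1 second and factor of 2 match the sequence shown.

```python
def backoff_delays(base: float = 1.0, factor: float = 2.0,
                   max_retries: int = 5, cap: float = 60.0) -> list:
    """Wait times (seconds) before each retry: base, base*factor, ... up to cap."""
    return [min(cap, base * factor ** attempt) for attempt in range(max_retries)]

delays = backoff_delays()
print(delays)       # [1.0, 2.0, 4.0, 8.0, 16.0]
print(sum(delays))  # 31.0
```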
Jitter to Prevent Thundering Herd:
Without jitter:
→ 100 requests hit limit at same time
→ All wait exactly 8 seconds
→ All retry at exact same moment
→ Thundering herd, 429 again
With jitter:
wait_time = 8 + random(0, 2)
→ Request A: waits 8.3s
→ Request B: waits 9.1s
→ Request C: waits 8.7s
→ Spread out retries, smoother traffic
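The jitter formula above maps directly to code. The second function shows "full jitter", a commonly used alternative that is not described in this page and is included only for comparison; both function names are illustrative.

```python
import random

def additive_jitter(base_delay: float, spread: float = 2.0, rng=random) -> float:
    """base_delay plus a random extra in [0, spread], as in the formula above."""
    return base_delay + rng.uniform(0, spread)

def full_jitter(base_delay: float, rng=random) -> float:
    """Alternative (assumption, not from this page): pick anywhere in [0, base_delay]."""
    return rng.uniform(0, base_delay)

rng = random.Random(42)  # seeded only to make the demo repeatable
waits = [round(additive_jitter(8, rng=rng), 1) for _ in range(3)]
print(waits)  # three different waits between 8 and 10 seconds
```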
The Give-Up Point:
How long to retry?
Option 1: Fixed attempts (e.g., 5 retries)
→ Total time: 1 + 2 + 4 + 8 + 16 = 31 seconds
→ If still failing: give up
Option 2: Fixed duration (e.g., 5 minutes)
→ Keep retrying until 5 min elapsed
→ Then fail
Option 3: Retry-After header
→ 429 response includes: Retry-After: 120
→ Wait at least 120 seconds (the value may also be an HTTP date)
→ One retry, then give up if still failing
Best: Respect Retry-After, fallback to exponential backoff
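The "Retry-After first, backoff as fallback" rule could be written roughly as below. `retry_delay` and its parameters are hypothetical; the sketch handles both forms the header can take (a number of seconds or an HTTP date).

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def retry_delay(retry_after, attempt, now=None, base=1.0, cap=300.0):
    """Seconds to wait before the next retry.

    retry_after: raw Retry-After header value, or None if absent.
    attempt: zero-based retry counter, used only for the backoff fallback.
    """
    if retry_after:
        value = retry_after.strip()
        if value.isdigit():                      # delay in seconds
            return float(value)
        try:
            when = parsedate_to_datetime(value)  # HTTP-date form
        except (TypeError, ValueError):
            when = None                          # unparseable: fall through
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            now = now or datetime.now(timezone.utc)
            return max(0.0, (when - now).total_seconds())
    return min(cap, base * 2 ** attempt)         # exponential backoff fallback

print(retry_delay("120", attempt=0))  # 120.0
print(retry_delay(None, attempt=3))   # 8.0
```

Adding jitter on top of the returned value keeps many clients from retrying at the same instant.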
Request Batching and Pagination
Reducing API calls through efficient patterns:
The Pagination Efficiency Problem:
List 5,000 articles:
Approach A (inefficient):
→ Page size: 10
→ 500 pages × 1 request each = 500 API calls
Approach B (efficient):
→ Page size: 100
→ 50 pages × 1 request each = 50 API calls
10x fewer requests, same data
But maximum page size limits:
Most APIs cap the page size:
→ Max page size: 1,000 (Google Drive files.list; the default is 100)
→ Max page size: 100 (Zendesk)
→ No way to request more per call
Twig must:
→ Use largest allowed page size
→ Minimize paginated requests
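The request counts above follow from a ceiling division, clamped to whatever maximum the provider allows. `pages_needed` is an illustrative helper, and the maximum of 100 is just the figure used in the example.

```python
import math

def pages_needed(total_items: int, requested_page_size: int, max_page_size: int) -> int:
    """Number of list requests needed, given the provider's page-size cap."""
    size = max(1, min(requested_page_size, max_page_size))
    return math.ceil(total_items / size)

print(pages_needed(5000, 10, max_page_size=100))    # 500
print(pages_needed(5000, 100, max_page_size=100))   # 50
print(pages_needed(5000, 1000, max_page_size=100))  # 50: the cap wins
```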
Batching Where Supported:
Some APIs allow batch requests:
GitHub GraphQL:
query {
repo1: repository(owner: "x", name: "y") { ... }
repo2: repository(owner: "a", name: "b") { ... }
repo3: repository(owner: "c", name: "d") { ... }
}
Single request fetches 3 repositories
→ 3x efficiency vs REST
But:
- Not all APIs support batching
- GraphQL has query complexity limits
- May still count as multiple units toward quota (GitHub's GraphQL API charges points per query)
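An aliased query like the one above can be assembled programmatically. This is a sketch: `build_batched_query` is a hypothetical helper, and the field selection is left to the caller because the example above elides it.

```python
import json

def build_batched_query(repos, fields: str) -> str:
    """Build one GraphQL query that fetches several repositories via aliases.

    repos: iterable of (owner, name) pairs.
    fields: selection set to request for each repository (caller-supplied).
    """
    parts = []
    for index, (owner, name) in enumerate(repos, start=1):
        # json.dumps produces a correctly quoted and escaped string literal.
        parts.append(
            f"  repo{index}: repository(owner: {json.dumps(owner)}, "
            f"name: {json.dumps(name)}) {{ {fields} }}"
        )
    return "query {\n" + "\n".join(parts) + "\n}"

print(build_batched_query([("x", "y"), ("a", "b"), ("c", "d")], fields="id"))
```

For untrusted input, GraphQL variables are a safer choice than string interpolation; the interpolation here is only to mirror the literal query shown above.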
Quota Pooling and Reservation
Advanced rate limit management:
The Multi-Data-Source Problem:
Customer has:
→ Data Source A: needs 500 req/min
→ Data Source B: needs 300 req/min
→ Data Source C: needs 400 req/min
Total: 1,200 req/min
But API quota: 1,000 req/min
All three sources can't sync simultaneously
Must coordinate:
→ Source A: 500 req/min
→ Source B: 300 req/min
→ Source C: 200 req/min (throttled)
→ After A finishes: C gets remaining quota
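The coordination above amounts to granting demands in order until the quota runs out. The sketch below assumes a simple first-come ordering (an assumption; the page does not say how sources are ranked), and `allocate_quota` is an illustrative name.

```python
def allocate_quota(demands: dict, quota: int) -> dict:
    """Grant each source its demand in order; later sources get what is left."""
    remaining = quota
    grants = {}
    for source, needed in demands.items():  # dicts preserve insertion order
        grants[source] = min(needed, remaining)
        remaining -= grants[source]
    return grants

print(allocate_quota({"A": 500, "B": 300, "C": 400}, quota=1000))
# {'A': 500, 'B': 300, 'C': 200}

# After A finishes, C is no longer throttled:
print(allocate_quota({"B": 300, "C": 400}, quota=1000))
# {'B': 300, 'C': 400}
```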
Priority-Based Scheduling:
Assign priorities:
1. Real-time updates (webhook-triggered): High priority
2. Incremental syncs (hourly): Medium priority
3. Full syncs (daily): Low priority
Quota allocation:
→ Reserve 30% for high priority
→ 50% for medium
→ 20% for low
If high priority needs more:
→ Borrow from medium/low
→ Ensure critical updates never blocked
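One possible reading of the reservation and borrowing rules is sketched below: every tier first receives up to its reserved share, unused quota is handed out in priority order, and only the top tier may then take quota away from lower tiers. The function name and this exact borrowing policy are assumptions made for illustration.

```python
def allocate_by_priority(quota: int, reserved_pct: dict, demands: dict) -> dict:
    """Allocate quota across priority tiers (dict order = highest priority first)."""
    tiers = list(reserved_pct)
    # Step 1: each tier gets its demand, capped at its reserved share.
    grants = {t: min(demands.get(t, 0), quota * reserved_pct[t] // 100) for t in tiers}
    spare = quota - sum(grants.values())
    # Step 2: unused quota is offered to tiers in priority order.
    for t in tiers:
        extra = min(spare, demands.get(t, 0) - grants[t])
        grants[t] += extra
        spare -= extra
    # Step 3: if the top tier is still short, it borrows from the lowest tiers first.
    top = tiers[0]
    shortfall = demands.get(top, 0) - grants[top]
    for t in reversed(tiers[1:]):
        taken = min(shortfall, grants[t])
        grants[t] -= taken
        grants[top] += taken
        shortfall -= taken
    return grants

shares = {"high": 30, "medium": 50, "low": 20}
print(allocate_by_priority(1000, shares, {"high": 450, "medium": 500, "low": 200}))
# {'high': 450, 'medium': 500, 'low': 50}
```

Here the high-priority tier needs 150 more than its 30% reservation, so the low-priority full sync is throttled to 50 requests/minute while incremental syncs keep their share.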
The Cold Start Problem:
New customer connects Zendesk:
→ 10,000 articles to sync (initial full sync)
→ At 700 req/min: 15 minute sync time
But:
→ Existing customers' incremental syncs running
→ Using 600/700 req/min quota
→ New customer gets 100 req/min remaining
→ Sync takes 100 minutes instead of 15
Solution:
→ Queue new customer's full sync for off-peak hours
→ Or: Throttle existing customers temporarily
→ Tradeoff: New customer waits longer
Cost vs Speed Tradeoffs
Respecting rate limits reduces sync speed:
The Impatience Problem:
User connects data source at 10:00 AM
Expects: Data ready in 5 minutes
Reality: Rate limits mean 45 minute sync
User perception:
→ "This is so slow!"
→ "Is it stuck?"
→ Cancels sync
→ Incomplete knowledge base
Paid Tiers for Higher Limits:
Many APIs offer:
→ Free tier: 100 req/min
→ Pro tier: 1,000 req/min
→ Enterprise tier: 10,000 req/min
Twig options:
1. Absorb cost: Upgrade all customers to Pro
→ Expensive at scale
2. Pass through cost: Customer pays for their API tier
→ Better user experience if they upgrade
→ But complexity: billing, provisioning
3. Optimize within free tier
→ Slower but free
→ Current approach for most
How to Solve
Implement exponential backoff with jitter, respect Retry-After headers, use Redis for distributed rate limiting, batch requests where possible, use the largest allowed page sizes, and reserve quota for high-priority updates. See Rate Limit Handling.
Agent Instructions: Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.
Perform an HTTP GET request on the current page URL with the ask query parameter:
GET /dev/rag-scenarios-and-solutions/data-integration/rate-limit-exhaustion.md?ask=<question>
The question should be specific, self-contained, and written in natural language. The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.
Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
Last updated January 26, 2026


