RAG Scenarios and Solutions

API Rate Limit Exhaustion

Your data source integration hits rate limits, causing slow syncs, partial data ingestion, or complete sync failures.

The Problem

Your data source integration hits rate limits, causing slow syncs, partial data ingestion, or complete sync failures.

Symptoms

  • ❌ "429 Too Many Requests" errors
  • ❌ Sync takes 10x longer than expected
  • ❌ Only syncs 500 docs before stopping
  • ❌ Works fine for small datasets, fails for large
  • ❌ Other integrations stop working during Twig sync

Real-World Example

Zendesk API rate limit: 700 requests/minute
Your help center: 2,000 articles

Twig sync attempts:
→ Enumerate all categories: 5 requests
→ List articles in each category: 50 requests
→ Fetch full content for each article: 2,000 requests
→ Total: 2,055 requests

At 700 requests/min: roughly 3 minutes minimum (2,055 ÷ 700 ≈ 2.9)

But:
→ Twig makes 50 concurrent requests/sec (3,000/min)
→ After 700 requests: 429 rate limit
→ Twig backs off 60 seconds
→ Resumes, hits limit again
→ Sync takes 45 minutes instead of 3
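
The arithmetic in this example can be checked with a small sketch. The function names are hypothetical and are not part of Twig; the numbers are the ones from the example above. The last helper shows the request spacing a client would need to stay at or under the limit rather than bursting past it.

```python
def total_requests(categories: int, list_calls: int, articles: int) -> int:
    """Sum the calls made in each phase of the sync."""
    return categories + list_calls + articles


def min_sync_minutes(requests: int, limit_per_minute: int) -> float:
    """Lower bound on sync duration when requests are paced exactly at the limit."""
    return requests / limit_per_minute


def paced_interval_seconds(limit_per_minute: int) -> float:
    """Delay between requests that keeps a client at or under the limit."""
    return 60.0 / limit_per_minute


requests_needed = total_requests(5, 50, 2000)
print(requests_needed)                                   # 2055
print(round(min_sync_minutes(requests_needed, 700), 2))  # 2.94
```

Pacing at about one request every 0.086 seconds avoids the 429 and 60-second backoff cycle entirely, at the cost of never using burst capacity.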

Deep Technical Analysis

Rate Limit Types and Granularity

APIs implement multiple layers of rate limiting:

Per-User Limits:

Limit: 100 requests/minute per user token

If Twig uses API token for user@company.com:
→ All Twig requests count against user@company.com's quota
→ If user also uses Zendesk app: shares same quota
→ Twig sync can block user's manual API access

Per-App Limits:

Limit: 10,000 requests/hour per OAuth app

If Twig has OAuth app ID: app_123:
→ All requests from Twig app_123 (all customers) share quota
→ 100 customers syncing simultaneously
→ Each uses 100 req/hr → 10,000 total
→ Quota exhausted, all customers affected

Per-Endpoint Limits:

Slack example:
→ conversations.list: 20 req/min (Tier 2)
→ conversations.history: 50 req/min (Tier 3)
→ users.info: 100 req/min (Tier 4)

Twig must:
→ Track quota per endpoint separately
→ Can't assume global rate limit
→ Complex quota bookkeeping
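
A minimal sketch of per-endpoint bookkeeping, assuming a simple fixed one-minute window per endpoint. The class name `EndpointQuotaTracker` and the injectable `clock` parameter are illustrative assumptions, not an existing Twig or Slack API; real providers may use sliding windows or token buckets instead.

```python
import time
from typing import Callable, Dict, Tuple


class EndpointQuotaTracker:
    """Fixed one-minute window counter kept separately for each endpoint."""

    def __init__(self, limits_per_minute: Dict[str, int],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limits = limits_per_minute
        self.clock = clock
        # endpoint -> (window index, requests used in that window)
        self.usage: Dict[str, Tuple[int, int]] = {}

    def try_acquire(self, endpoint: str) -> bool:
        """Return True and record the request if the endpoint has quota left."""
        window = int(self.clock() // 60)
        seen_window, used = self.usage.get(endpoint, (window, 0))
        if seen_window != window:
            used = 0  # a new minute started, so the count resets
        if used >= self.limits[endpoint]:
            return False
        self.usage[endpoint] = (window, used + 1)
        return True


tracker = EndpointQuotaTracker({
    "conversations.list": 20,
    "conversations.history": 50,
    "users.info": 100,
})
print(tracker.try_acquire("conversations.list"))  # True
```

Exhausting one endpoint's quota leaves the others unaffected, which is the behavior a single global counter cannot express.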

Burst vs Sustained:

Many APIs allow bursts:

Token bucket algorithm:
→ Bucket capacity: 100 tokens
→ Refill rate: 10 tokens/second
→ Each request: consumes 1 token

Burst behavior:
→ 0s: 100 tokens available
→ Make 100 requests instantly: ✓ allowed
→ 1s: 10 tokens refilled
→ Make 10 more: ✓ allowed
→ Attempt 11th: ✗ rate limited

Twig must understand:
→ Initial burst allowance
→ Sustained rate after burst
→ Bucket refill rate
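
The token bucket described above can be sketched in a few lines. This is a client-side model under stated assumptions: the class name `TokenBucket` and the injectable `clock` are hypothetical, and a provider's real bucket parameters are usually not published exactly.

```python
import time
from typing import Callable


class TokenBucket:
    """Bucket that starts full and refills continuously up to its capacity."""

    def __init__(self, capacity: float, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self.tokens = capacity
        self.last_refill = clock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last call, capped at capacity.
        now = self.clock()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last_refill = now

    def try_consume(self, cost: float = 1.0) -> bool:
        """Take `cost` tokens if available; return False if rate limited."""
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(capacity=100, refill_per_second=10)
print(bucket.try_consume())  # True
```

With a capacity of 100 and a refill rate of 10 per second, the first 100 calls succeed immediately and the sustained rate afterwards settles at 10 requests per second.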

Rate Limit Discovery and Headers

APIs communicate limits via HTTP headers:

Common Headers (a de facto convention, not defined by an RFC):

HTTP/1.1 200 OK
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 742
X-RateLimit-Reset: 1642204800

But variations exist:

GitHub:
→ X-RateLimit-*

Stripe:
→ Stripe-RateLimit-*

Twitter:
→ x-rate-limit-* (extra hyphen; header names are case-insensitive, so only the spelling differs)

Google:
→ No headers (must infer from 429 errors)

Inconsistency:
→ Twig must handle all formats
→ Or: No headers available, must estimate

The Reset Timestamp Ambiguity:

X-RateLimit-Reset: 1642204800

Is this:
→ Unix timestamp (seconds since epoch)?
→ Milliseconds since epoch?
→ Seconds until reset (relative)?
→ ISO 8601 datetime string?

Common convention (e.g., GitHub): Unix timestamp in seconds
Reality: No RFC defines this header, and providers differ

Wrong interpretation:
→ Wait 1642204800 seconds (52 years!)
→ Or wait 1 second when should wait 3600
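
One defensive way to handle this ambiguity is to normalize header names and classify the reset value by magnitude. The sketch below is a heuristic under stated assumptions: the function name `seconds_until_reset`, the header list, and the thresholds (`1e9` and `1e11`) are illustrative choices, not part of any specification. A per-provider configuration is more reliable where the format is documented.

```python
from typing import Mapping, Optional

# Header spellings taken from the examples above. Header names are
# case-insensitive, so they are compared in lowercase.
RESET_HEADER_NAMES = ("x-ratelimit-reset", "x-rate-limit-reset")


def seconds_until_reset(headers: Mapping[str, str], now: float) -> Optional[float]:
    """Best-effort wait time in seconds, or None if no usable header is present."""
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = next((lowered[n] for n in RESET_HEADER_NAMES if n in lowered), None)
    if raw is None:
        return None
    try:
        value = float(raw)
    except ValueError:
        return None  # e.g. a datetime string; needs provider-specific parsing
    if value > 1e11:   # assumption: too large for epoch seconds, so epoch milliseconds
        return max(0.0, value / 1000.0 - now)
    if value > 1e9:    # assumption: looks like epoch seconds
        return max(0.0, value - now)
    return max(0.0, value)  # assumption: small number means relative seconds


print(seconds_until_reset({"X-RateLimit-Reset": "1642204800"}, now=1642201200.0))  # 3600.0
```

Clamping the result to a sane maximum (for example one hour) is a reasonable extra guard against a misread value.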

Distributed Rate Limiting

Multiple Twig servers must share quota:

Naive Approach (broken):

3 Twig servers:
→ Server A: tracks "450 requests made"
→ Server B: tracks "380 requests made"
→ Server C: tracks "410 requests made"

Total: 1,240 requests
But actual limit: 1,000 requests/min

Each server thinks it's under limit
All servers make requests
Actual total: Exceeds quota → 429 errors

Centralized Rate Limiter:

Redis-based solution:
→ All servers increment shared counter in Redis
→ INCR api:zendesk:requests:minute_12345
→ EXPIRE api:zendesk:requests:minute_12345 60 (key expires after 60 seconds)
→ If counter > 1000: deny request

Pros:
+ Accurate shared state
+ Prevents over-limit requests

Cons:
- Redis becomes bottleneck (1ms latency per request)
- Redis failure blocks all API calls
- Network overhead

The Race Condition:

Two servers simultaneously:

Server A:
1. GET counter → 999
2. Check: 999 < 1000 → OK
3. Make API call
4. INCR counter → 1000

Server B (concurrent):
1. GET counter → 999 (before Server A's INCR)
2. Check: 999 < 1000 → OK
3. Make API call
4. INCR counter → 1001

Both requests approved, but limit is 1000
Result: Quota exceeded by 1

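Solution: INCR first (atomic), then check the returned value before making the API call
Caveat: Denied requests still increment the counter, so the stored count can overshoot the limit even though the actual traffic does not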
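
The race above comes from reading the counter and incrementing it in two separate steps. A minimal sketch of the increment-then-check pattern follows, using an in-memory store as a stand-in for Redis so the example runs without a server. `InMemoryCounterStore` and `try_acquire` are hypothetical names; the only property relied on is that `incr` is atomic, as Redis `INCR` is. The decision is made from the value the increment returns, before any API call is sent.

```python
import threading
from typing import Dict


class InMemoryCounterStore:
    """Stand-in for Redis: incr() is atomic, like the Redis INCR command."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._counts: Dict[str, int] = {}

    def incr(self, key: str) -> int:
        with self._lock:
            self._counts[key] = self._counts.get(key, 0) + 1
            return self._counts[key]


def try_acquire(store: InMemoryCounterStore, key: str, limit: int) -> bool:
    # Increment first, then decide from the returned value. There is no
    # separate GET, so two callers can never observe the same count.
    return store.incr(key) <= limit


store = InMemoryCounterStore()
key = "api:zendesk:requests:minute_12345"
print(try_acquire(store, key, limit=1000))  # True
```

Denied callers still bump the counter, so the stored value may exceed the limit, but the number of approved requests never does.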

Retry and Backoff Strategies

Handling 429 responses requires exponential backoff:

Naive Retry (bad):

Request fails with 429:
→ Wait 1 second
→ Retry
→ 429 again
→ Wait 1 second
→ Retry...
→ Hammering API with retries
→ Makes rate limiting worse

Exponential Backoff (better):

Attempt 1: Immediate request → 429
→ Wait 1 second
Attempt 2: Retry → 429
→ Wait 2 seconds
Attempt 3: Retry → 429
→ Wait 4 seconds
Attempt 4: Retry → 429
→ Wait 8 seconds
Attempt 5: Retry → Success

Respects provider's recovery time

Jitter to Prevent Thundering Herd:

Without jitter:
→ 100 requests hit limit at same time
→ All wait exactly 8 seconds
→ All retry at exact same moment
→ Thundering herd, 429 again

With jitter:
wait_time = 8 + random(0, 2)
→ Request A: waits 8.3s
→ Request B: waits 9.1s
→ Request C: waits 8.7s
→ Spread out retries, smoother traffic
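
The backoff schedule and jitter formula above can be sketched as a single function. The name `backoff_delay` and its parameters (`base`, `cap`, `max_jitter`, `rng`) are assumptions made for illustration; the additive jitter of up to 2 seconds mirrors `wait_time = 8 + random(0, 2)` from the text.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  max_jitter: float = 2.0,
                  rng: Optional[random.Random] = None) -> float:
    """Delay before retry number `attempt` (1-based).

    The exponential part is base * 2**(attempt - 1), capped at `cap`,
    with uniform jitter in [0, max_jitter] added on top.
    """
    source = rng if rng is not None else random
    exponential = min(cap, base * 2 ** (attempt - 1))
    return exponential + source.uniform(0.0, max_jitter)


for attempt in range(1, 5):
    # Delays fall in 1-3s, 2-4s, 4-6s, 8-10s respectively.
    print(attempt, round(backoff_delay(attempt), 2))
```

The cap keeps late attempts from waiting unboundedly long, and passing a seeded `random.Random` makes the delays reproducible in tests.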

The Give-Up Point:

How long to retry?

Option 1: Fixed attempts (e.g., 5 retries)
→ Total time: 1 + 2 + 4 + 8 + 16 = 31 seconds
→ If still failing: give up

Option 2: Fixed duration (e.g., 5 minutes)
→ Keep retrying until 5 min elapsed
→ Then fail

Option 3: Retry-After header
→ 429 response includes: Retry-After: 120 (the value may also be an HTTP date)
→ Wait at least 120 seconds
→ One retry, then give up if still failing

Best: Respect Retry-After, fallback to exponential backoff
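
A sketch of the "respect Retry-After, fall back to exponential backoff" rule. The helper names `parse_retry_after` and `next_wait` are hypothetical. The header is handled in both forms HTTP allows: a number of seconds or an HTTP date.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(value: Optional[str], now: datetime) -> Optional[float]:
    """Return seconds to wait, or None if the header is missing or unparseable."""
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)  # delay-seconds form, e.g. "120"
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    return max(0.0, (when - now).total_seconds())


def next_wait(retry_after: Optional[str], attempt: int, now: datetime) -> float:
    """Prefer the server's hint; otherwise use 1, 2, 4, 8... seconds."""
    hinted = parse_retry_after(retry_after, now)
    if hinted is not None:
        return hinted
    return float(2 ** (attempt - 1))


now = datetime(2022, 1, 15, 0, 0, 0, tzinfo=timezone.utc)
print(next_wait("120", attempt=1, now=now))  # 120.0
print(next_wait(None, attempt=4, now=now))   # 8.0
```

In practice the fallback delay would also carry jitter and a maximum attempt count, as described in the options above.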

Request Batching and Pagination

Reducing API calls through efficient patterns:

The Pagination Efficiency Problem:

List 5,000 articles:

Approach A (inefficient):
→ Page size: 10
→ 500 pages × 1 request each = 500 API calls

Approach B (efficient):
→ Page size: 100
→ 50 pages × 1 request each = 50 API calls

10x fewer requests, same data

But maximum page size limits:

APIs cap the page size, and the cap varies by provider and endpoint:
→ Google Drive files.list: default 100, up to 1,000
→ Zendesk list endpoints: up to 100
→ No way to request more per call

Twig must:
→ Use largest allowed page size
→ Minimize paginated requests
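
The request counts above follow from a ceiling division, sketched here with hypothetical helper names. `provider_max` stands for whatever cap the provider documents for the endpoint in question.

```python
import math


def pages_needed(total_items: int, page_size: int) -> int:
    """Number of list requests needed to page through all items."""
    return math.ceil(total_items / page_size)


def effective_page_size(requested: int, provider_max: int) -> int:
    """Clamp the requested page size to the provider's documented maximum."""
    return min(requested, provider_max)


print(pages_needed(5000, 10))   # 500
print(pages_needed(5000, 100))  # 50
```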

Batching Where Supported:

Some APIs allow batch requests:

GitHub GraphQL:
query {
  repo1: repository(owner: "x", name: "y") { ... }
  repo2: repository(owner: "a", name: "b") { ... }
  repo3: repository(owner: "c", name: "d") { ... }
}

Single request fetches 3 repositories
→ 3x efficiency vs REST

But:
- Not all APIs support batching
- GraphQL has query complexity limits
- May still count as multiple units toward quota (GitHub's GraphQL API, for example, charges points per query rather than one unit per request)

Quota Pooling and Reservation

Advanced rate limit management:

The Multi-Data-Source Problem:

Customer has:
→ Data Source A: needs 500 req/min
→ Data Source B: needs 300 req/min
→ Data Source C: needs 400 req/min
Total: 1,200 req/min

But API quota: 1,000 req/min

All three sources can't sync simultaneously
Must coordinate:
→ Source A: 500 req/min
→ Source B: 300 req/min
→ Source C: 200 req/min (throttled)
→ After A finishes: C gets remaining quota
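
The coordination above amounts to granting demand in order until the shared quota is spent. A minimal sketch, assuming sources are served in the order given; the function name `allocate_quota` is hypothetical, and a real scheduler might weight sources instead of serving them first come, first served.

```python
from typing import Dict


def allocate_quota(demands: Dict[str, int], quota: int) -> Dict[str, int]:
    """Grant each source its demand in order until the quota runs out."""
    remaining = quota
    granted: Dict[str, int] = {}
    for source, demand in demands.items():
        granted[source] = min(demand, remaining)
        remaining -= granted[source]
    return granted


print(allocate_quota({"A": 500, "B": 300, "C": 400}, 1000))
# {'A': 500, 'B': 300, 'C': 200}

# Once A finishes, C is no longer throttled.
print(allocate_quota({"B": 300, "C": 400}, 1000))
# {'B': 300, 'C': 400}
```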

Priority-Based Scheduling:

Assign priorities:
1. Real-time updates (webhook-triggered): High priority
2. Incremental syncs (hourly): Medium priority
3. Full syncs (daily): Low priority

Quota allocation:
→ Reserve 30% for high priority
→ 50% for medium
→ 20% for low

If high priority needs more:
→ Borrow from medium/low
→ Ensure critical updates never blocked
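
One possible reading of the reservation and borrowing rules above, as a sketch. The function name `allocate_by_priority`, the integer percentages, and the exact borrowing order (lowest priority gives up quota first) are assumptions; the text does not specify them.

```python
from typing import Dict, List


def allocate_by_priority(quota: int, reserved_percent: Dict[str, int],
                         demands: Dict[str, int], order: List[str]) -> Dict[str, int]:
    """Allocate quota across tiers; `order` lists tiers from highest to lowest priority."""
    # Step 1: each tier gets up to its reserved share.
    granted = {t: min(demands[t], quota * reserved_percent[t] // 100) for t in order}
    # Step 2: unused reservations go to unmet demand, highest priority first.
    spare = quota - sum(granted.values())
    for tier in order:
        extra = min(spare, demands[tier] - granted[tier])
        granted[tier] += extra
        spare -= extra
    # Step 3: if the top tier is still short, borrow from the lowest tiers first.
    top = order[0]
    for tier in reversed(order[1:]):
        shortfall = demands[top] - granted[top]
        taken = min(shortfall, granted[tier])
        granted[tier] -= taken
        granted[top] += taken
    return granted


print(allocate_by_priority(
    1000,
    {"high": 30, "medium": 50, "low": 20},
    {"high": 450, "medium": 500, "low": 200},
    ["high", "medium", "low"],
))
# {'high': 450, 'medium': 500, 'low': 50}
```

Here high-priority traffic needs 150 more than its 30% reservation, so it borrows that amount from the low tier while the medium tier keeps its full share.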

The Cold Start Problem:

New customer connects Zendesk:
→ 10,000 articles to sync (initial full sync)
→ At 700 req/min: about 15 minutes of sync time (10,000 ÷ 700 ≈ 14.3)

But:
→ Existing customers' incremental syncs running
→ Using 600/700 req/min quota
→ New customer gets 100 req/min remaining
→ Sync takes 100 minutes instead of 15

Solution:
→ Queue new customer's full sync for off-peak hours
→ Or: Throttle existing customers temporarily
→ Tradeoff: New customer waits longer

Cost vs Speed Tradeoffs

Respecting rate limits reduces sync speed:

The Impatience Problem:

User connects data source at 10:00 AM
Expects: Data ready in 5 minutes
Reality: Rate limits mean 45 minute sync

User perception:
→ "This is so slow!"
→ "Is it stuck?"
→ Cancels sync
→ Incomplete knowledge base

Paid Tiers for Higher Limits:

Many APIs offer:
→ Free tier: 100 req/min
→ Pro tier: 1,000 req/min
→ Enterprise tier: 10,000 req/min

Twig options:
1. Absorb cost: Upgrade all customers to Pro
   → Expensive at scale

2. Pass through cost: Customer pays for their API tier
   → Better user experience if they upgrade
   → But complexity: billing, provisioning

3. Optimize within free tier
   → Slower but free
   → Current approach for most

How to Solve

Combine the following:

  • Implement exponential backoff with jitter
  • Respect Retry-After headers
  • Use Redis for distributed rate limiting
  • Batch requests where possible
  • Use the largest allowed page sizes
  • Reserve quota for high-priority updates

See Rate Limit Handling.


Last updated January 26, 2026