Zendesk Integration Errors
Your Zendesk data source shows errors, articles don't sync, or only some help center content appears in your AI agent's knowledge base.
The Problem
Symptoms
- ❌ "Authentication Failed" despite valid API token
- ❌ Only 50 articles synced out of 500
- ❌ "Permission denied" for certain categories
- ❌ Sync works for English content but fails for other languages
- ❌ Ticket comments not appearing in knowledge base
Real-World Example
Your Zendesk has 3 help centers (English, Spanish, French)
with 450 articles total.
After connecting: Only 180 English articles sync.
Data Source Status: "Partial Sync - Access Denied"
Error: "Cannot access category: Internal KB (403 Forbidden)"
Deep Technical Analysis
Zendesk's Multi-Layered Content Structure
Zendesk isn't a simple document repository. It's a CMS with multiple content types and independent access controls:
Content Hierarchy:
Brand (e.g., support.company.com)
→ Help Center (English, Spanish, French)
→ Categories (Getting Started, Billing, Technical, etc.)
→ Sections (Sub-categories)
→ Articles (Individual docs)
→ Article Translations (same article, different language)
→ Article Attachments (images, PDFs)
→ Article Comments (if enabled)
Parallel Structure:
→ Internal Articles (agent-only)
→ Draft Articles (not published)
→ Archived Articles (hidden but exist)
The Access Control Problem:
Each layer has independent visibility settings:
Article Visibility Options:
1. "Everyone" - public, no auth required
2. "Signed-in users" - requires login
3. "Agents and admins" - internal only
4. "Agents in group X" - specific team only
Category Visibility:
→ Can override article visibility
→ "Internal KB" category → all articles inside are internal
→ API token must have agent permissions to access
Why This Causes Sync Failures:
When Twig connects with a standard API token:
API Call: GET /api/v2/help_center/articles.json
Zendesk filters response based on token permissions:
→ Public articles: ✓ returned
→ Signed-in user articles: ✓ returned (if token has auth)
→ Agent-only articles: ✗ omitted from the list response (requesting them directly fails, e.g. 403 Forbidden)
From Twig's perspective:
- "Internal Troubleshooting" category: doesn't appear in API
- "Agent Guidelines" section: missing entirely
- 270 articles invisible
From user's perspective:
- "Why isn't our internal KB in the agent?"
- "The sync is broken, only half my content is here"
API Token Permissions vs Actual Access
Zendesk has multiple authentication methods with different access levels:
API Token Types:
1. End User Token (OAuth)
→ Can only access public + signed-in content
→ Cannot see agent-only articles
→ Cannot access tickets
→ Limited to help center content
2. Agent API Token
→ Full agent access
→ Can see all articles (public + internal)
→ Can access tickets and comments
→ But... permission depends on agent's role
3. Admin API Token
→ Full access to everything
→ Can access archived content
→ Can modify content
→ Required for complete knowledge sync
The Permission Cascade:
API Token Belongs To: agent@company.com (Agent role)
Agent's Group Membership: "Support Tier 1"
Articles with visibility:
→ "Agents in Sales Engineering" → 403 (wrong group)
→ "Admins only" → 403 (not an admin)
→ "Everyone" → 200 OK
Result: Sync succeeds but missing 40% of articles
User thinks: "Integration is broken"
Reality: Token doesn't have sufficient permissions
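The permission cascade above can be checked before a sync starts. The following is a minimal sketch, not Twig's implementation. It assumes API-token Basic auth with the credential string `{email}/token:{api_token}`, and that the token owner's profile (for example from `GET /api/v2/users/me.json`) carries a `role` field. The function names `build_auth_header` and `sync_coverage` are hypothetical, and the coverage labels follow this page's simplified model of roles.

```python
import base64


def build_auth_header(email: str, api_token: str) -> dict:
    """Build the Basic auth header for API-token authentication.

    The credential string is "<email>/token:<api_token>". Leaving out the
    "/token" suffix is a common cause of "Authentication Failed" errors
    even when the token itself is valid.
    """
    raw = f"{email}/token:{api_token}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


def sync_coverage(user: dict) -> str:
    """Classify what a sync can expect to see for the token owner.

    `user` is the "user" object of a users/me response (assumed shape).
    """
    role = user.get("role")
    if role == "admin":
        return "full"
    if role == "agent":
        # Agents see internal content only where their groups allow it.
        return "partial"
    return "public-only"
```

Surfacing a "partial" or "public-only" result at connection time tells the user that missing articles are a permissions issue rather than a broken integration.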
Multi-Brand and Multi-Locale Complexity
Enterprise Zendesk accounts often have multiple brands and locales:
The Brand Problem:
Zendesk Account: company.zendesk.com
Brands:
1. support.company.com (main product support)
2. help.partnerportal.com (partner help center)
3. internal.company.com (internal KB)
Each brand has its own:
→ Help center
→ Categories and articles
→ API endpoints
→ Access controls
API Enumeration Challenge:
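Standard API call:
GET /api/v2/help_center/articles.json
Returns: Articles from the brand whose subdomain was called only (typically the default brand)
To get all brands:
1. GET /api/v2/brands.json → list all brands
2. For each brand with a help center, call that brand's own subdomain:
GET https://{brand_subdomain}.zendesk.com/api/v2/help_center/articles.json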
If Twig only queries default brand:
→ Missing 2 entire help centers
→ 600+ articles invisible
→ User sees "partial sync"
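Brand enumeration can be sketched as a pure function over the brands listing. This is a minimal illustration under two assumptions to verify against your account: each brand's help center is reached through that brand's own subdomain, and each brand record carries `subdomain` and `has_help_center` fields. The function name is hypothetical.

```python
def help_center_article_urls(brands_payload: dict) -> list[str]:
    """Return one article-list URL per brand that has a help center.

    `brands_payload` is the parsed JSON of a brands listing. The "subdomain"
    and "has_help_center" fields are assumed to be present on each brand.
    """
    urls = []
    for brand in brands_payload.get("brands", []):
        if not brand.get("has_help_center"):
            continue  # Nothing to sync for brands without a help center.
        urls.append(
            f"https://{brand['subdomain']}.zendesk.com"
            "/api/v2/help_center/articles.json"
        )
    return urls
```

Each returned URL is then paginated independently, so one brand failing with a permission error does not hide the others.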
The Locale Problem:
Article: "Getting Started Guide"
Translations:
→ en-US (English)
→ es (Spanish)
→ fr-FR (French)
→ de (German)
API representation:
{
"id": 12345,
"title": "Getting Started Guide",
"locale": "en-US",
"translations": [
{ "locale": "es", "id": 12346, "title": "Guía de Inicio" },
{ "locale": "fr-FR", "id": 12347, "title": "Guide de Démarrage" }
]
}
RAG Challenges:
1. Should we embed all translations separately? (4x storage cost)
2. Or embed English only? (non-English queries fail)
3. Or detect query language and retrieve matching locale? (complex)
4. Do users expect multilingual responses?
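Option 3 is less complex than it sounds if the query's language is already known. The sketch below is one possible fallback policy, not a description of Twig's behavior: exact locale match first, then any translation in the same base language, then a default. It assumes language detection happens elsewhere; `pick_locale` and its parameters are hypothetical names.

```python
def pick_locale(query_locale: str, available: list[str], default: str = "en-US") -> str:
    """Choose which translation to retrieve for a query.

    Order: exact match (case-insensitive), then same base language
    (e.g. "fr" matches "fr-FR"), then the default locale.
    """
    wanted = query_locale.lower()
    by_lower = {loc.lower(): loc for loc in available}
    if wanted in by_lower:
        return by_lower[wanted]
    base = wanted.split("-")[0]
    for loc in available:
        if loc.lower().split("-")[0] == base:
            return loc
    return default
```

With the four translations listed above, a query detected as `es-MX` would retrieve the `es` article, and a query in an unsupported language would fall back to `en-US`.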
Dynamic Content and Draft State Management
Zendesk articles have multiple states and versions:
Article States:
Draft:
→ Article exists but not published
→ Visible to agents in Zendesk UI
→ API: returned to agent-level tokens, marked with a draft flag
→ Should NOT be in production knowledge base
→ But often accidentally synced
Published:
→ Live, visible according to visibility settings
→ Should be in knowledge base
Archived:
→ Removed from help center
→ Still accessible via API (with direct ID)
→ API list doesn't include by default
→ Old URLs still work, causing confusion
The Draft Sync Problem:
Scenario:
Agent writes new article: "New Feature X - Coming Soon"
Status: Draft (not published yet)
Visibility: Agents only
Twig syncs with agent token:
→ API returns draft articles
→ Draft gets embedded in vector DB
→ Goes live in AI agent
User asks: "How do I use Feature X?"
AI Agent: "Feature X is available now! Here's how..."
Reality: Feature not launched yet, article was draft
Problem: Sync doesn't distinguish draft from published
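The draft leak can be closed with a filter applied before embedding. This is a minimal sketch that assumes each article object carries a boolean `draft` field; the function name is hypothetical.

```python
def publishable(articles: list[dict]) -> list[dict]:
    """Keep only articles that are safe to embed, i.e. not drafts.

    An article without a "draft" key is treated as published.
    """
    return [a for a in articles if not a.get("draft", False)]
```

Running the filter on every sync, not only the first, also removes an article from the index if it is later moved back to draft.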
API Pagination and Rate Limiting
Zendesk API has strict pagination and rate limits:
Rate Limits:
Standard Plan: 200 requests/minute
Professional: 400 requests/minute
Enterprise: 700 requests/minute
But:
→ Rate limit shared across ALL API consumers
→ If other integrations (Slack, Jira, etc.) are active
→ Twig's sync may get throttled
→ 429 errors force backoff
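Backoff on 429 can be sketched as a small delay calculator. It assumes the throttled response includes a `Retry-After` header giving a wait in seconds, and falls back to capped exponential backoff with jitter when the header is missing or unparseable. The function name and the 60-second cap are illustrative choices.

```python
import random


def backoff_seconds(status: int, headers: dict, attempt: int, cap: float = 60.0) -> float:
    """Return how long to wait before retrying, or 0.0 if no wait is needed.

    Honors a Retry-After header (in seconds) on 429 responses. Without a
    usable header, waits between 50% and 100% of min(cap, 2**attempt).
    """
    if status != 429:
        return 0.0
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return float(retry_after)
        except ValueError:
            pass  # Non-numeric header: fall through to exponential backoff.
    base = min(cap, 2.0 ** attempt)
    return base * random.uniform(0.5, 1.0)
```

The jitter matters because the limit is shared: several throttled integrations retrying at the same instant would trip it again.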
Pagination Complexity:
Articles API uses cursor-based pagination:
Request 1:
GET /api/v2/help_center/articles.json?page[size]=100
→ Returns 100 articles + page[after] cursor
Request 2:
GET /api/v2/help_center/articles.json?page[size]=100&page[after]=xyz
→ Returns next 100 articles + next cursor
For 500 articles:
→ 5 requests minimum
→ Each request costs 1 rate limit unit
→ Plus attachment downloads
→ Plus metadata enrichment
→ Total: 15-20 requests
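The request sequence above amounts to a loop that follows cursors until the API reports no more pages. In this sketch, `fetch_page` is a hypothetical callable that performs the HTTP request and returns the parsed JSON body; the `meta.has_more` and `meta.after_cursor` field names are assumptions about the response shape.

```python
from typing import Callable, Iterator, Optional


def iter_articles(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every article by following cursors until the last page.

    `fetch_page(cursor)` is called with None for the first page and with
    the previous page's cursor afterwards.
    """
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield from page.get("articles", [])
        meta = page.get("meta", {})
        if not meta.get("has_more"):
            return
        cursor = meta["after_cursor"]
```

Keeping the HTTP call behind `fetch_page` lets authentication and rate-limit handling live in one place while the loop stays easy to test.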
The Cursor Invalidation Problem:
Sync in Progress:
→ Retrieved 300 articles (3 pages)
→ On 4th API request, cursor: cursor_abc123
Meanwhile:
→ Agent publishes new article
→ Another agent archives old article
→ Article order changes
Next request with cursor_abc123:
→ Zendesk returns 400 Bad Request: "Cursor invalid"
→ Must restart sync from beginning
→ Previous 300 articles re-processed
→ Inefficient and slow
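A restart does not have to mean re-processing. The sketch below restarts listing from the first page when a cursor is rejected, but remembers which article IDs were already handled so the replay skips the expensive work. `CursorInvalid`, `fetch_page`, and `process` are hypothetical names, and the response field names are assumed.

```python
class CursorInvalid(Exception):
    """Raised by the (hypothetical) fetcher when the API rejects a cursor."""


def sync_with_restart(fetch_page, process, max_restarts: int = 3) -> int:
    """Walk all pages, restarting from the first page if a cursor is rejected.

    IDs of handled articles are remembered, so a restart re-lists pages but
    does not re-process articles. Returns the number of processed articles.
    """
    seen = set()
    restarts = 0
    cursor = None
    while True:
        try:
            page = fetch_page(cursor)
        except CursorInvalid:
            restarts += 1
            if restarts > max_restarts:
                raise
            cursor = None  # Start over; `seen` makes the replay cheap.
            continue
        for article in page["articles"]:
            if article["id"] not in seen:
                process(article)
                seen.add(article["id"])
        if not page["meta"]["has_more"]:
            return len(seen)
        cursor = page["meta"]["after_cursor"]
```

The replay still spends list requests against the rate limit, but embedding is usually the slower and costlier step, and that part is no longer repeated.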
Attachment and Inline Image Handling
Zendesk articles often contain images and attachments:
Inline Images:
<img src="https://company.zendesk.com/hc/article_attachments/12345/screenshot.png">
The Extraction Problem:
RAG Challenge:
1. Extract article HTML from API
2. Parse HTML, find <img> tags
3. Should we:
a) Download images and extract text via OCR?
b) Use alt text if available?
c) Ignore images entirely?
d) Include image URLs as metadata?
Each choice has trade-offs:
→ OCR: expensive, slow, often inaccurate for screenshots
→ Alt text: often missing or generic ("image.png")
→ Ignore: lose important visual information
→ Metadata only: can't answer "how do I configure X?" if answer is in screenshot
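Options (b) and (d) can be combined cheaply. The sketch below uses only the Python standard library to pull visible text out of article HTML, substitute alt text where an image has one, and collect image URLs as metadata. It is an illustration of the trade-off, not Twig's extraction pipeline; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class ArticleText(HTMLParser):
    """Collect visible text, substituting alt text for <img> tags and
    recording image URLs as metadata."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        if attr.get("src"):
            self.image_urls.append(attr["src"])
        alt = (attr.get("alt") or "").strip()
        if alt:
            self.parts.append(f"[image: {alt}]")

    def handle_data(self, data):
        if data.strip():
            self.parts.append(data.strip())


def extract(html: str):
    """Return (text, image_urls) for one article body."""
    parser = ArticleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts), parser.image_urls
```

Images without alt text contribute nothing to the text but still appear in the URL list, so they can be routed to OCR later if that cost is justified.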
Attachment Complexity:
Article may have attachments:
→ PDFs (pricing sheets, user guides)
→ Excel files (data templates)
→ ZIP files (code samples)
API Response:
{
"attachments": [
{
"file_name": "setup_guide.pdf",
"content_url": "https://company.zendesk.com/...",
"content_type": "application/pdf"
}
]
}
Question: Should Twig:
1. Download and parse PDF? (extra processing)
2. Just index that "article has PDF attachment"?
3. Ignore attachments?
Most users expect the AI to answer questions from PDF content,
but the API doesn't provide parsed text, so that content requires separate extraction.
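One way to frame the three options is as a per-attachment routing decision keyed on `content_type`. The policy below (extract PDFs, index spreadsheets as metadata only, skip everything else) is an assumption chosen for illustration, not a recommendation from the source; the function and constant names are hypothetical.

```python
SPREADSHEET_TYPES = {
    "application/vnd.ms-excel",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}


def plan_attachment(attachment: dict) -> str:
    """Decide how to treat one attachment: "extract", "metadata", or "skip"."""
    content_type = (attachment.get("content_type") or "").lower()
    if content_type == "application/pdf":
        return "extract"   # Download and run a separate PDF text extractor.
    if content_type in SPREADSHEET_TYPES:
        return "metadata"  # Index only the file name and URL.
    return "skip"          # Archives and unknown types are ignored.
```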
Webhook vs Polling Sync Strategy
Zendesk offers webhooks but with limitations:
Polling Strategy (current):
Every 30 minutes:
1. GET /api/v2/help_center/articles.json
2. Compare updated_at timestamps
3. Re-process changed articles
4. Re-embed if content changed
Cons:
→ 30-minute lag for new content
→ Wastes API quota on unchanged articles
→ Inefficient for large help centers
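Steps 2 and 3 of the polling loop reduce to a timestamp comparison. This sketch assumes `updated_at` values are ISO 8601 strings with a trailing `Z` and that the previous run's timestamps are stored per article ID; the function name and the shape of `last_synced` are hypothetical.

```python
from datetime import datetime


def changed_since(articles: list[dict], last_synced: dict) -> list[dict]:
    """Return articles whose updated_at is newer than the stored timestamp.

    `last_synced` maps article id -> ISO 8601 timestamp from the previous
    run. Articles with no stored timestamp always count as changed.
    """
    def parse(ts: str) -> datetime:
        return datetime.fromisoformat(ts.replace("Z", "+00:00"))

    changed = []
    for article in articles:
        previous = last_synced.get(article["id"])
        if previous is None or parse(article["updated_at"]) > parse(previous):
            changed.append(article)
    return changed
```

Only the returned articles need re-processing and re-embedding, although the list requests themselves still consume API quota on every poll.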
Webhook Strategy (ideal):
Zendesk sends webhook on article changes:
→ article.published
→ article.updated
→ article.archived
Twig receives webhook immediately:
→ Process only changed article
→ Real-time knowledge updates
→ Efficient API usage
But:
→ Zendesk webhooks require admin setup
→ Not available on all plans
→ Webhook delivery not guaranteed (retry logic needed)
→ Must maintain webhook endpoint security
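Endpoint security usually starts with verifying that a request really came from Zendesk. The sketch below assumes the signature is `base64(HMAC-SHA256(timestamp + body))` keyed with the webhook's signing secret, with the timestamp and signature delivered in request headers. Treat the scheme as an assumption and confirm it against Zendesk's webhook documentation; the function name is hypothetical.

```python
import base64
import hashlib
import hmac


def verify_signature(secret: str, timestamp: str, body: bytes, signature: str) -> bool:
    """Check a webhook signature of the assumed form
    base64(HMAC-SHA256(timestamp + body)) using the signing secret."""
    digest = hmac.new(
        secret.encode("utf-8"),
        timestamp.encode("utf-8") + body,
        hashlib.sha256,
    ).digest()
    expected = base64.b64encode(digest)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(expected, signature.encode("utf-8"))
```

Requests that fail the check should be rejected before any article is fetched or re-embedded. Because delivery is not guaranteed, a periodic polling pass remains useful as a safety net.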
How to Solve
Use an admin API token, enable all brands and locales, filter out draft articles, and implement cursor retry logic. See Zendesk Integration for the setup guide.
Last updated January 26, 2026


