Add and configure data sources for ingestion
TL;DR
A data source is content that Twig indexes for retrieval. Supported types: documentation sites and help centers; file uploads (PDF, DOCX, TXT); Confluence spaces; Slack channels; Google Drive, SharePoint, and OneDrive folders; Zendesk articles.
Key Takeaways
- What is a Data Source
- Navigate to Data Sources
- Data Sources Screen
- Add a New Data Source
- How to Verify
- Common Mistakes
What is a Data Source
A data source is content that Twig indexes for retrieval. Supported types:
- Documentation sites and help centers
- File uploads (PDF, DOCX, TXT)
- Confluence spaces
- Slack channels
- Google Drive, SharePoint, OneDrive folders
- Zendesk articles
Processing flow: Fetch documents → Parse text → Chunk (512 tokens) → Embed (OpenAI ada-002) → Index (Pinecone)
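To make the chunking step concrete, here is a minimal sketch of fixed-size chunking at 512 tokens. It assumes whitespace-separated words as a stand-in for tokens, because this page does not say which tokenizer Twig uses; `chunk_tokens` and `chunk_text` are illustrative names, not part of Twig.

```python
from typing import List

CHUNK_SIZE = 512  # tokens per chunk, as stated in the processing flow


def chunk_tokens(tokens: List[str], size: int = CHUNK_SIZE) -> List[List[str]]:
    """Split a token list into consecutive chunks of at most `size` tokens."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def chunk_text(text: str, size: int = CHUNK_SIZE) -> List[str]:
    """Chunk raw text into strings of at most `size` tokens each."""
    # Assumption: whitespace splitting approximates the real tokenizer,
    # which is not documented on this page.
    words = text.split()
    return [" ".join(chunk) for chunk in chunk_tokens(words, size)]
```

Under this approximation, a 1,200-word document yields three chunks (512, 512, and 176 words). Actual chunk counts in Twig may differ, since real tokenizers rarely map one word to one token.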
Navigate to Data Sources
- Click Data in left navigation
- Or: Home → Data Sources → Manage Data Sources
Expected view: List of existing data sources with status (Active, Processing, Failed)
Data Sources Screen
Columns displayed:
- Name: Data source name
- Type: WEBSITE, FILE, CONFLUENCE, SLACK, etc.
- Status: Active (green), Processing (yellow), Failed (red)
- Chunks Indexed: Count (e.g., "1,234 chunks")
- Last Sync: Timestamp (e.g., "2 hours ago")
- Actions: Process, Edit, Delete buttons
Status meanings:
- Active: Ingestion complete, available for retrieval
- Processing: Currently chunking/embedding/indexing
- Failed: Error during processing (click for error log)
Add a New Data Source
Click Add Data Source button (top right)
Supported Types
| Type | Input | Max Size | Processing Time |
|---|---|---|---|
| Website Sitemap | Sitemap.xml URL | 10,000 pages | 5-30 min |
| Website Crawler | Base URL | 10,000 pages | 10-60 min |
| File Upload | PDF, DOCX, TXT | 50MB per file | 1-5 min |
| Zip Upload | .zip with documents | 200MB | 5-20 min |
| Confluence Space | OAuth connection | Unlimited | 10-60 min |
| Slack Workspace | OAuth connection | Last 90 days | 10-30 min |
| Google Drive | OAuth connection | Unlimited | 10-60 min |
Website Sitemap
- Select Website Sitemap from modal
- Enter sitemap URL: https://example.com/sitemap.xml
- Click Add
- Status changes: "Pending" → "Processing" → "Active"
Expected result: Pages crawled count displayed (e.g., "250 pages → 1,200 chunks")
Common errors:
- "Sitemap not found (404)" → Verify URL is accessible
- "Rate limit exceeded" → Wait 1 hour; the crawler resumes automatically
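Before adding a sitemap, it can help to confirm locally that the URL responds and to see how many pages it lists. The sketch below uses only the Python standard library and is a local pre-check, not a Twig API. The helper names are hypothetical, and the 10,000 limit is taken from the Supported Types table.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET
from typing import List

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_PAGES = 10_000  # page limit from the Supported Types table


def extract_page_urls(sitemap_xml: bytes) -> List[str]:
    """Return every <loc> value found in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    # Note: a sitemap index also uses <loc>, but there each entry points
    # to a child sitemap rather than a page.
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]


def check_sitemap(url: str, timeout: float = 10.0) -> int:
    """Fetch a sitemap and return its URL count, raising on HTTP errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read()
    except urllib.error.HTTPError as err:
        # A 404 here corresponds to the "Sitemap not found (404)" error.
        raise RuntimeError(f"Sitemap request failed with HTTP {err.code}") from err
    urls = extract_page_urls(body)
    if len(urls) > MAX_PAGES:
        raise RuntimeError(f"Sitemap lists {len(urls)} URLs, above {MAX_PAGES}")
    return len(urls)
```

Calling `check_sitemap("https://example.com/sitemap.xml")` returns the number of listed URLs or raises with the HTTP status. Gzipped sitemaps are not handled in this sketch.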
File Upload
- Select File Upload
- Click Choose Files or drag-and-drop
- Select files: PDF, DOCX, TXT (max 50MB each)
- Click Upload
Expected result: Each file shows progress bar → "Processing" → "Active"
Supported formats:
- PDF: Text-based (not scanned images)
- DOCX: Microsoft Word 2007+
- TXT: UTF-8 encoding
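The format and size rules above can be checked locally before uploading. This is a minimal sketch, assuming "50MB" means 50 × 1024 × 1024 bytes (the page does not define the unit precisely); `precheck_file` is a hypothetical helper, not a Twig function. It does not detect scanned-image PDFs.

```python
from pathlib import Path
from typing import List

MAX_FILE_BYTES = 50 * 1024 * 1024  # assumption: 50MB interpreted as 50 MiB
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt"}


def precheck_file(path: Path) -> List[str]:
    """Return a list of problems that would likely block ingestion."""
    problems = []
    ext = path.suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if path.stat().st_size > MAX_FILE_BYTES:
        problems.append("file exceeds 50MB")
    elif ext == ".txt":
        # TXT files must be UTF-8; try a strict decode of the whole file.
        try:
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            problems.append("TXT file is not valid UTF-8")
    return problems
```

An empty list means the file passes these local checks. It does not guarantee that processing will succeed.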
Confluence Space
- Select Confluence
- Click Connect to Confluence
- Authorize in Confluence OAuth screen
- Select spaces to index (checkboxes)
- Click Import
Expected result: Space count and page count displayed during processing
Permissions required: Confluence read access for selected spaces
Zip Upload
- Select Zip Upload
- Upload .zip file (max 200MB)
- Twig extracts and processes each file
Expected result: Shows file count (e.g., "50 files extracted → 200 chunks indexed")
Constraints:
- Zip must contain only supported file types (PDF, DOCX, TXT)
- Nested folders supported (files flattened during extraction)
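Both constraints can be checked on the archive before upload. The sketch below lists entries with unsupported extensions and, because nested folders are flattened, flags file names that appear in more than one folder. How Twig resolves such name collisions is not stated on this page, so treat the second check as a precaution. Both function names are hypothetical.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath
from typing import List

SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt"}


def unsupported_entries(zip_file) -> List[str]:
    """List archive members whose extension is not PDF, DOCX, or TXT."""
    with zipfile.ZipFile(zip_file) as archive:
        return [
            info.filename
            for info in archive.infolist()
            if not info.is_dir()
            and PurePosixPath(info.filename).suffix.lower() not in SUPPORTED_EXTENSIONS
        ]


def duplicate_flattened_names(zip_file) -> List[str]:
    """List base names that occur more than once once folders are ignored."""
    with zipfile.ZipFile(zip_file) as archive:
        names = [
            PurePosixPath(info.filename).name
            for info in archive.infolist()
            if not info.is_dir()
        ]
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)
```

Both functions accept a path or an open binary file object. The 200MB archive limit is not checked here; compare the file's size on disk against it separately.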
How to Verify
- Data Sources list shows status "Active" (green)
- Chunks count > 0 (e.g., "450 chunks")
- Last sync timestamp recent (e.g., "5 minutes ago")
- In Playground, query the agent and check that the "Sources Used" panel shows chunks from this data source
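The first three checks can be expressed as a small function over values read from the Data Sources list. This is an illustrative sketch only: `DataSourceRow` is a hypothetical record mirroring the screen's columns, not a Twig API type, and the 24-hour threshold for "recent" is an assumption, since this page does not define one.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class DataSourceRow:
    # Hypothetical record mirroring the Data Sources columns.
    name: str
    status: str
    chunks_indexed: int
    last_sync: datetime


def verification_problems(
    row: DataSourceRow,
    now: datetime,
    max_age: timedelta = timedelta(hours=24),  # assumed meaning of "recent"
) -> List[str]:
    """Return the verification checks that this row fails."""
    problems = []
    if row.status != "Active":
        problems.append(f"status is {row.status}, expected Active")
    if row.chunks_indexed <= 0:
        problems.append("no chunks indexed")
    if now - row.last_sync > max_age:
        problems.append("last sync is not recent")
    return problems


row = DataSourceRow(
    name="Docs site",
    status="Active",
    chunks_indexed=450,
    last_sync=datetime(2026, 1, 26, 11, 55, tzinfo=timezone.utc),
)
print(verification_problems(row, now=datetime(2026, 1, 26, 12, 0, tzinfo=timezone.utc)))
```

The Playground check still has to be done by hand, since it depends on what the agent actually retrieves.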
Common Mistakes
Symptom: Status stuck at "Processing" for >30 minutes
Cause: Processing worker stalled or large dataset
Fix: Refresh page. If still processing after 1 hour, contact support with data source ID.
Symptom: Status "Failed" with error message
Cause: Invalid URL, authentication failure, or unsupported file format
Fix: Click data source name → Logs tab → check error message. Common fixes:
- "401 Unauthorized" → Reconnect OAuth (Edit → Reconnect)
- "Unsupported format" → Convert file to PDF/DOCX
- "URL not accessible" → Verify URL works in browser
When This Doesn't Apply
This guide covers standard data source types. For custom integrations (APIs, databases), contact support@twig.so.
Last updated January 26, 2026


