Add and configure data sources for ingestion

TL;DR

A data source is content that Twig indexes for retrieval. Supported types: documentation sites and help centers; file uploads (PDF, DOCX, TXT); Confluence spaces; Slack channels; Google Drive, SharePoint, and OneDrive folders; and Zendesk articles.

Key Takeaways

  • What is a Data Source
  • Navigate to Data Sources
  • Data Sources Screen
  • Add a New Data Source
  • How to Verify
  • Common Mistakes

What is a Data Source

A data source is content that Twig indexes for retrieval. Supported types:

  • Documentation sites and help centers
  • File uploads (PDF, DOCX, TXT)
  • Confluence spaces
  • Slack channels
  • Google Drive, SharePoint, OneDrive folders
  • Zendesk articles

Processing flow: Fetch documents → Parse text → Chunk (512 tokens) → Embed (OpenAI ada-002) → Index (Pinecone)
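
The chunking step in the flow above can be sketched as follows. This is a minimal illustration, not Twig's implementation: whitespace splitting stands in for the real tokenizer (an assumption), and the function names are hypothetical. Only the 512-token chunk size comes from this page.

```python
from typing import List

CHUNK_SIZE = 512  # tokens per chunk, as stated in the processing flow


def chunk_tokens(tokens: List[str], size: int = CHUNK_SIZE) -> List[List[str]]:
    """Split a token list into consecutive chunks of at most `size` tokens."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def chunk_text(text: str, size: int = CHUNK_SIZE) -> List[str]:
    """Chunk plain text. Assumption: whitespace-separated words act as tokens."""
    tokens = text.split()
    return [" ".join(chunk) for chunk in chunk_tokens(tokens, size)]
```

Under these assumptions, a 1,200-token document yields three chunks: two full chunks of 512 tokens and a final chunk holding the remainder.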

Navigate to Data Sources

  1. Click Data in the left navigation
  2. Or: Home → Data Sources → Manage Data Sources

Expected view: List of existing data sources with status (Active, Processing, Failed)

Data Sources Screen

Columns displayed:

  • Name: Data source name
  • Type: WEBSITE, FILE, CONFLUENCE, SLACK, etc.
  • Status: Active (green), Processing (yellow), Failed (red)
  • Chunks Indexed: Count (e.g., "1,234 chunks")
  • Last Sync: Timestamp (e.g., "2 hours ago")
  • Actions: Process, Edit, Delete buttons

Status meanings:

  • Active: Ingestion complete, available for retrieval
  • Processing: Currently chunking/embedding/indexing
  • Failed: Error during processing (click for error log)

Add a New Data Source

Click the Add Data Source button (top right).

Supported Types

| Type | Input | Max Size | Processing Time |
| --- | --- | --- | --- |
| Website Sitemap | Sitemap.xml URL | 10,000 pages | 5-30 min |
| Website Crawler | Base URL | 10,000 pages | 10-60 min |
| File Upload | PDF, DOCX, TXT | 50MB per file | 1-5 min |
| Zip Upload | .zip with documents | 200MB | 5-20 min |
| Confluence Space | OAuth connection | Unlimited | 10-60 min |
| Slack Workspace | OAuth connection | Last 90 days | 10-30 min |
| Google Drive | OAuth connection | Unlimited | 10-60 min |

Website Sitemap

  1. Select Website Sitemap from modal
  2. Enter sitemap URL: https://example.com/sitemap.xml
  3. Click Add
  4. Status changes: "Pending" → "Processing" → "Active"

Expected result: Pages crawled count displayed (e.g., "250 pages → 1,200 chunks")

Common errors:

  • "Sitemap not found (404)" → Verify URL is accessible
  • "Rate limit exceeded" → Wait 1 hour, crawler resumes automatically
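
Before adding a sitemap, it can help to check locally that it parses and stays within the 10,000-page limit. The sketch below assumes a standard sitemaps.org `urlset` file (not a sitemap index) that has already been downloaded as text; the function names are illustrative and not part of Twig.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
MAX_PAGES = 10_000  # Website Sitemap limit listed on this page


def sitemap_urls(xml_text: str) -> List[str]:
    """Return the page URLs listed in a sitemap.xml document."""
    root = ET.fromstring(xml_text)
    locs = root.findall("sm:url/sm:loc", SITEMAP_NS)
    return [loc.text.strip() for loc in locs if loc.text]


def within_page_limit(xml_text: str, limit: int = MAX_PAGES) -> bool:
    """True if the sitemap lists no more than `limit` pages."""
    return len(sitemap_urls(xml_text)) <= limit


SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/docs/intro</loc></url>
  <url><loc>https://example.com/docs/setup</loc></url>
</urlset>"""
```

A sitemap that fails to parse here, or lists no URLs, is likely to fail ingestion as well, so it is worth fixing before clicking Add.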

File Upload

  1. Select File Upload
  2. Click Choose Files or drag-and-drop
  3. Select files: PDF, DOCX, TXT (max 50MB each)
  4. Click Upload

Expected result: Each file shows progress bar → "Processing" → "Active"

Supported formats:

  • PDF: Text-based (not scanned images)
  • DOCX: Microsoft Word 2007+
  • TXT: UTF-8 encoding
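
The file constraints above can be checked before uploading. This is a rough local pre-check under stated assumptions: 1MB is taken as 1024 × 1024 bytes, the helper name is hypothetical, and it cannot tell a text-based PDF from a scanned one.

```python
from pathlib import Path
from typing import List

MAX_FILE_BYTES = 50 * 1024 * 1024  # 50MB per file; assumes 1MB = 1024 * 1024 bytes
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt"}


def upload_problems(name: str, data: bytes) -> List[str]:
    """Return reasons this file would likely be rejected (empty list if none found)."""
    problems = []
    ext = Path(name).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if len(data) > MAX_FILE_BYTES:
        problems.append("file exceeds 50MB")
    if ext == ".txt":
        # TXT files must be UTF-8 encoded.
        try:
            data.decode("utf-8")
        except UnicodeDecodeError:
            problems.append("TXT file is not valid UTF-8")
    return problems
```

An empty result only means none of these three checks failed; the upload can still fail for other reasons.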

Confluence Space

  1. Select Confluence
  2. Click Connect to Confluence
  3. Authorize in Confluence OAuth screen
  4. Select spaces to index (checkboxes)
  5. Click Import

Expected result: Space count and page count displayed during processing

Permissions required: Confluence read access for selected spaces

Zip Upload

  1. Select Zip Upload
  2. Upload .zip file (max 200MB)
  3. Twig extracts and processes each file

Expected result: Shows file count (e.g., "50 files extracted → 200 chunks indexed")

Constraints:

  • Zip must contain only supported file types (PDF, DOCX, TXT)
  • Nested folders supported (files flattened during extraction)
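
The zip constraints can be checked locally with Python's standard `zipfile` module. The sketch below lists which entries are supported and what their flattened names would be. It is an illustration only: the function name is hypothetical, 1MB is assumed to be 1024 × 1024 bytes, and how Twig handles two files that flatten to the same name is not covered on this page.

```python
import io
import zipfile
from pathlib import PurePosixPath
from typing import List, Tuple

MAX_ZIP_BYTES = 200 * 1024 * 1024  # 200MB; assumes 1MB = 1024 * 1024 bytes
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt"}


def inspect_zip(data: bytes) -> Tuple[List[str], List[str]]:
    """Return (flattened names of supported files, paths of unsupported entries)."""
    if len(data) > MAX_ZIP_BYTES:
        raise ValueError("zip exceeds 200MB")
    supported, unsupported = [], []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # folder entries carry no content
            path = PurePosixPath(info.filename)
            if path.suffix.lower() in SUPPORTED_EXTENSIONS:
                supported.append(path.name)  # drop folder components (flattening)
            else:
                unsupported.append(info.filename)
    return supported, unsupported
```

If the second list is not empty, remove or convert those files before uploading the archive.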

How to Verify

  1. Data Sources list shows status "Active" (green)
  2. Chunks count > 0 (e.g., "450 chunks")
  3. Last sync timestamp recent (e.g., "5 minutes ago")
  4. In Playground, query the agent and check that the "Sources Used" panel shows chunks from this data source
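
The first three checks can be expressed as a small function. This is a sketch only: `DataSourceRow` is a hypothetical record mirroring the Data Sources list columns, not a Twig API object, and the one-hour cutoff for a "recent" sync is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class DataSourceRow:
    # Hypothetical record mirroring the columns of the Data Sources list.
    name: str
    status: str
    chunks_indexed: int
    last_sync: datetime


def verification_failures(
    row: DataSourceRow,
    now: datetime,
    max_age: timedelta = timedelta(hours=1),  # assumed meaning of "recent"
) -> List[str]:
    """Return the verification checks that fail for this row (empty if all pass)."""
    failures = []
    if row.status != "Active":
        failures.append(f"status is {row.status!r}, expected 'Active'")
    if row.chunks_indexed <= 0:
        failures.append("no chunks indexed")
    if now - row.last_sync > max_age:
        failures.append("last sync is older than expected")
    return failures
```

The fourth check, confirming that retrieval returns chunks from the data source, still has to be done in Playground.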

Common Mistakes

Symptom: Status stuck at "Processing" for >30 minutes

Cause: Processing worker stalled or large dataset

Fix: Refresh page. If still processing after 1 hour, contact support with data source ID.
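
The thresholds above translate into a simple decision rule. The function name is hypothetical and the sketch only encodes the 30-minute and 1-hour cutoffs from this section.

```python
from datetime import timedelta


def stuck_processing_advice(elapsed: timedelta) -> str:
    """Suggest a next step for a data source that has been "Processing" for `elapsed`."""
    if elapsed > timedelta(hours=1):
        return "contact support with the data source ID"
    if elapsed > timedelta(minutes=30):
        return "refresh the page"
    return "keep waiting"
```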


Symptom: Status "Failed" with error message

Cause: Invalid URL, authentication failure, or unsupported file format

Fix: Click data source name → Logs tab → check error message. Common fixes:

  • "401 Unauthorized" → Reconnect OAuth (Edit → Reconnect)
  • "Unsupported format" → Convert file to PDF/DOCX
  • "URL not accessible" → Verify URL works in browser
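
When triaging many failed data sources, the fixes listed above can be kept in a lookup table. The sketch assumes the log text contains the quoted error strings verbatim, which may not hold for every log entry; `suggest_fix` is an illustrative helper, not a Twig feature.

```python
from typing import Optional

# Error text → fix, copied from the list above.
KNOWN_FIXES = {
    "401 Unauthorized": "Reconnect OAuth (Edit → Reconnect)",
    "Unsupported format": "Convert file to PDF/DOCX",
    "URL not accessible": "Verify URL works in browser",
}


def suggest_fix(log_message: str) -> Optional[str]:
    """Return the documented fix for a log message, or None if no known error matches."""
    lowered = log_message.lower()
    for error_text, fix in KNOWN_FIXES.items():
        if error_text.lower() in lowered:
            return fix
    return None
```

A `None` result means the error is not one of the common cases listed here, and the full log entry should be read or sent to support.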

When This Doesn't Apply

This guide covers standard data source types. For custom integrations (APIs, databases), contact support@twig.so.



Last updated January 26, 2026