Sitemap Integration
Import website content using a sitemap.xml file to efficiently index large documentation sites, blogs, and other structured web content.
Overview
| Property | Details |
|---|---|
| Type | Static |
| Refresh | Manual |
| Tier | 1 (All Plans) |
| Format | sitemap.xml file |
| Max URLs | Varies by plan |
When to Use Sitemap Connector
The Sitemap connector is ideal for:
- Large Documentation Sites - Efficiently import hundreds of pages
- Structured Content - Sites with well-organized sitemaps
- Static Site Generators - Jekyll, Hugo, Docusaurus, etc.
- Archived Content - One-time import of website snapshots
- Selective Imports - When you want specific URLs from a site
What is a Sitemap.xml?
A sitemap.xml file is a list of URLs on a website, typically used to help search engines discover and index pages. It looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://docs.example.com/getting-started</loc>
<lastmod>2024-01-15</lastmod>
</url>
<url>
<loc>https://docs.example.com/api-reference</loc>
<lastmod>2024-01-14</lastmod>
</url>
<url>
<loc>https://docs.example.com/tutorials</loc>
<lastmod>2024-01-10</lastmod>
</url>
</urlset>
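To make the structure concrete, the following is a minimal sketch of reading such a file with Python's standard library. The function name `list_sitemap_urls` is illustrative only and is not part of any Twig API; the sketch assumes the file uses the standard sitemap namespace shown above.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace used by the sitemap protocol (same value as the xmlns attribute above).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def list_sitemap_urls(xml_bytes: bytes) -> list[tuple[str, Optional[str]]]:
    """Return (loc, lastmod) pairs for every <url> entry; lastmod may be None."""
    root = ET.fromstring(xml_bytes)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = (url.findtext("sm:loc", default="", namespaces=SITEMAP_NS) or "").strip()
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=SITEMAP_NS)
        if loc:
            entries.append((loc, lastmod.strip() if lastmod else None))
    return entries
```

Reading a downloaded file in binary mode, for example `list_sitemap_urls(open("sitemap.xml", "rb").read())`, lets the parser honor the encoding named in the XML declaration.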
Finding a Website's Sitemap
Common Sitemap Locations
Most websites place their sitemap at:
https://example.com/sitemap.xml
https://example.com/sitemap_index.xml
https://example.com/sitemap1.xml
https://docs.example.com/sitemap.xml
Check robots.txt
Many sites list their sitemap in robots.txt:
https://example.com/robots.txt
Look for:
Sitemap: https://example.com/sitemap.xml
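If you have already saved the contents of robots.txt, the lookup can be scripted. This is a small sketch, assuming the file follows the common `Sitemap: <url>` convention; the helper name `sitemap_urls_from_robots` is hypothetical.

```python
def sitemap_urls_from_robots(robots_txt: str) -> list[str]:
    """Collect the URLs from every 'Sitemap:' line (the field name is case-insensitive)."""
    urls = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments. Assumption: sitemap URLs contain no '#' fragment.
        line = raw_line.split("#", 1)[0].strip()
        key, separator, value = line.partition(":")
        if separator and key.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls
```

A site may list several sitemaps, so the function returns a list rather than a single URL.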
Use Browser Tools
- Open the website in your browser
- Right-click → View Page Source
- Search (Ctrl+F / Cmd+F) for "sitemap"
- Look for <link rel="sitemap"> tags
Ask the Website Owner
If you can't find the sitemap, contact the site administrator. They can:
- Provide the sitemap URL
- Generate a sitemap if one doesn't exist
- Create a custom sitemap with specific pages
How to Add a Sitemap
Step 1: Download the Sitemap
- Navigate to the sitemap URL in your browser
- Right-click on the page
- Select "Save As" or "Save Page As"
- Save with filename: sitemap.xml
Alternative: Use command line:
curl https://docs.example.com/sitemap.xml -o sitemap.xml
or
wget https://docs.example.com/sitemap.xml
Step 2: Navigate to Data Sources
- Log in to your Twig AI account
- Click Data in the main navigation menu
- Click Add Data Source or the + button
Step 3: Select Sitemap Connector
- Choose Sitemap.xml from the list
- The connector shows: "Publicly accessible websites from a sitemap.xml file"
Step 4: Configure the Data Source
Basic Information
- Name (required): Descriptive name
- Example: "Documentation Sitemap", "Blog Sitemap", "Help Center Pages"
- Description (optional): Additional context
- Example: "Complete documentation site from sitemap dated 2024-01-15"
File Upload
- Click Choose File or drag-and-drop
- Select your downloaded sitemap.xml file
- Wait for upload to complete
Tags (Optional)
- Add organizational tags
- Examples: "documentation", "external", "sitemap"
Step 5: Save and Process
- Click Save or Create
- System will:
- Parse the sitemap file
- Fetch each URL listed
- Extract and index content
- Monitor processing status
Step 6: Verify Import
- Check record count (number of URLs processed)
- Verify status shows "END_PROCESS"
- Review process logs for any failed URLs
- Test with relevant questions
Creating Custom Sitemaps
If you need a sitemap for a specific subset of pages, you can create one manually.
Basic Sitemap Structure
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://example.com/page1</loc>
</url>
<url>
<loc>https://example.com/page2</loc>
</url>
<url>
<loc>https://example.com/page3</loc>
</url>
</urlset>
With Optional Metadata
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://example.com/important-page</loc>
<lastmod>2024-01-15</lastmod>
<priority>1.0</priority>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://example.com/other-page</loc>
<lastmod>2024-01-10</lastmod>
<priority>0.8</priority>
<changefreq>weekly</changefreq>
</url>
</urlset>
Optional Tags:
- <lastmod> - Last modified date (YYYY-MM-DD)
- <priority> - Importance (0.0 to 1.0)
- <changefreq> - Update frequency (daily, weekly, monthly)
Using Online Sitemap Generators
Several tools can generate sitemaps:
- Screaming Frog SEO Spider - Desktop app
- XML-Sitemaps.com - Online generator
- Sitemap Writer Pro - Desktop app
- Custom scripts - Python, Node.js, etc.
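As an illustration of the custom-script option, here is a minimal Python sketch that writes a sitemap from a list of URLs using only the standard library. The function name `build_sitemap` and the output filename in the usage note are assumptions for the example, not anything the connector requires.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: list[str]) -> bytes:
    """Build a minimal sitemap with one <url><loc> entry per URL."""
    # Register the namespace as the default so output uses xmlns="..." without a prefix.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in urls:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # ElementTree escapes characters such as '&' when serializing.
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
    ET.indent(urlset)  # pretty-printing; needs Python 3.9 or newer
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)
```

The result can be saved with `open("custom-sitemap.xml", "wb").write(build_sitemap(urls))` and then uploaded like any downloaded sitemap.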
Examples
Example 1: Documentation Site
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://docs.example.com/</loc>
</url>
<url>
<loc>https://docs.example.com/getting-started</loc>
</url>
<url>
<loc>https://docs.example.com/api-reference</loc>
</url>
<url>
<loc>https://docs.example.com/tutorials</loc>
</url>
<url>
<loc>https://docs.example.com/faq</loc>
</url>
</urlset>
Name: Product Documentation
Description: Complete product documentation from sitemap
File: docs-sitemap.xml (5 URLs)
Tags: documentation, product, public
Example 2: Blog Posts
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://blog.example.com/2024/how-to-get-started</loc>
</url>
<url>
<loc>https://blog.example.com/2024/advanced-tips</loc>
</url>
<url>
<loc>https://blog.example.com/2023/year-in-review</loc>
</url>
</urlset>
Name: Technical Blog Posts
Description: Selected technical blog posts
File: blog-sitemap.xml (3 URLs)
Tags: blog, technical, public
Example 3: Help Center
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://help.example.com/en/articles/account-setup</loc>
</url>
<url>
<loc>https://help.example.com/en/articles/billing-faq</loc>
</url>
<url>
<loc>https://help.example.com/en/articles/troubleshooting</loc>
</url>
<url>
<loc>https://help.example.com/en/articles/api-integration</loc>
</url>
</urlset>
Name: Help Center Articles
Description: Customer support articles in English
File: help-sitemap.xml (4 URLs)
Tags: support, help, customer-facing
Best Practices
1. Filter Sitemap Content
Before uploading, edit the sitemap to include only relevant pages:
Good:
<url><loc>https://example.com/docs/getting-started</loc></url>
<url><loc>https://example.com/docs/api-reference</loc></url>
<url><loc>https://example.com/docs/tutorials</loc></url>
Remove:
<url><loc>https://example.com/login</loc></url>
<url><loc>https://example.com/signup</loc></url>
<url><loc>https://example.com/checkout</loc></url>
<url><loc>https://example.com/privacy-policy</loc></url>
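For sitemaps too large to edit by hand, the same filtering can be scripted. The sketch below removes entries whose path matches an exclusion list; the list simply mirrors the "Remove" examples above, and the function name `drop_excluded_urls` is hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Assumption: these paths match the "Remove" examples; adjust for your own site.
EXCLUDED_PATHS = ("/login", "/signup", "/checkout", "/privacy-policy")


def drop_excluded_urls(xml_bytes: bytes, excluded=EXCLUDED_PATHS) -> bytes:
    """Return a copy of the sitemap without entries under any excluded path."""
    ET.register_namespace("", SITEMAP_NS)  # keep a plain xmlns on output
    root = ET.fromstring(xml_bytes)
    for entry in list(root.findall(f"{{{SITEMAP_NS}}}url")):
        loc = (entry.findtext(f"{{{SITEMAP_NS}}}loc") or "").strip()
        path = urlparse(loc).path.rstrip("/")
        # Match the path itself or anything nested below it, but not "/login-help".
        if any(path == p or path.startswith(p + "/") for p in excluded):
            root.remove(entry)
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)
```

Matching on the parsed path rather than on the raw string avoids dropping pages that merely mention an excluded word elsewhere in the URL.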
2. Keep Sitemaps Organized
Create separate data sources for different content types:
- docs-sitemap.xml - Documentation pages
- help-sitemap.xml - Support articles
- blog-sitemap.xml - Blog posts
3. Version Your Sitemaps
When re-importing, keep versions:
docs-sitemap-2024-01.xml
docs-sitemap-2024-02.xml
docs-sitemap-2024-03.xml
4. Validate Before Upload
Use sitemap validators:
- https://www.xml-sitemaps.com/validate-xml-sitemap.html
- https://www.websiteplanet.com/webtools/sitemap-validator/
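A quick local check can catch the most common problems before you reach for an online validator. The following sketch is not a full schema validation; it only checks well-formedness, the root element, and that each entry has an HTTP(S) `<loc>`. The function name `check_sitemap` is illustrative.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def check_sitemap(xml_bytes: bytes) -> list[str]:
    """Return a list of problems; an empty list means these basic checks passed."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]
    problems = []
    # A missing or wrong xmlns shows up here as an unexpected root tag.
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        problems.append(f"unexpected root element: {root.tag}")
    entries = root.findall(f"{{{SITEMAP_NS}}}url")
    if not entries:
        problems.append("no <url> entries found")
    for index, entry in enumerate(entries, start=1):
        loc = (entry.findtext(f"{{{SITEMAP_NS}}}loc") or "").strip()
        if not loc.startswith(("http://", "https://")):
            problems.append(f"entry {index}: missing or non-HTTP <loc>")
    return problems
```

A sitemap index file would be reported as having an unexpected root element, which is a useful hint that its child sitemaps need to be downloaded instead.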
5. Check URL Accessibility
Ensure all URLs in the sitemap are:
- Publicly accessible (no authentication required)
- Returning 200 status code (not 404 or redirects)
- Containing actual content (not empty pages)
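The first two checks can be approximated with a short script. This is a sketch under stated assumptions: it sends unauthenticated HEAD requests with Python's `urllib` (some servers reject HEAD, in which case a GET would be needed), and it does not inspect page content, so empty pages still need a manual look. The names `probe` and `find_problem_urls` are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Callable


def probe(url: str, timeout: float = 10.0) -> tuple[int, str]:
    """Return (status, final_url); status 0 means no HTTP response was received."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        # urlopen follows redirects, so geturl() reveals where the request ended up.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.geturl()
    except urllib.error.HTTPError as err:
        return err.code, url
    except OSError:  # covers URLError (DNS, refused connection) and timeouts
        return 0, url


def find_problem_urls(
    urls: list[str],
    probe_fn: Callable[[str], tuple[int, str]] = probe,
) -> dict[str, str]:
    """Map each URL that is not a plain 200 response to a short reason."""
    problems = {}
    for url in urls:
        status, final_url = probe_fn(url)
        if status == 0:
            problems[url] = "no response"
        elif status != 200:
            problems[url] = f"HTTP {status}"
        elif final_url != url:
            problems[url] = f"redirects to {final_url}"
    return problems
```

Passing the probe as a parameter keeps the reporting logic testable without network access and makes it easy to swap in a different HTTP client.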
Advantages Over Website Connector
| Feature | Sitemap | Website Crawler |
|---|---|---|
| Speed | Fast (only listed URLs) | Slower (discovers links) |
| Precision | Exact pages you want | May miss or include extra pages |
| Control | Full control over URLs | Limited by crawler settings |
| Resources | Less server load | More server requests |
| Freshness | Manual update needed | Can auto-refresh |
Use Sitemap when:
- You know exactly which pages to import
- Site has a complete, up-to-date sitemap
- You want a one-time import
- You need to minimize server load
Use Website Crawler when:
- You want automatic discovery
- Site structure changes frequently
- You want automatic updates
- You're not sure which pages exist
Handling Large Sitemaps
Sitemap Index Files
Large sites may use sitemap index files:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>https://example.com/sitemap1.xml</loc>
</sitemap>
<sitemap>
<loc>https://example.com/sitemap2.xml</loc>
</sitemap>
<sitemap>
<loc>https://example.com/sitemap3.xml</loc>
</sitemap>
</sitemapindex>
To import:
- Download each individual sitemap
- Create separate data sources for each, or
- Merge sitemaps into one file before uploading
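To find out which individual sitemaps need downloading, the index file can be read the same way as a regular sitemap. A minimal sketch, with `child_sitemaps` as an illustrative name:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def child_sitemaps(index_bytes: bytes) -> list[str]:
    """Return the <loc> of every <sitemap> entry in a sitemap index file."""
    root = ET.fromstring(index_bytes)
    if root.tag != f"{{{SITEMAP_NS}}}sitemapindex":
        raise ValueError("not a sitemap index file")
    path = f"{{{SITEMAP_NS}}}sitemap/{{{SITEMAP_NS}}}loc"
    return [loc.text.strip() for loc in root.findall(path) if loc.text]
```

Each returned URL can then be downloaded with the curl or wget commands shown in Step 1.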
Merging Multiple Sitemaps
Combine multiple sitemaps into one:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<!-- URLs from sitemap1.xml -->
<url><loc>https://example.com/page1</loc></url>
<url><loc>https://example.com/page2</loc></url>
<!-- URLs from sitemap2.xml -->
<url><loc>https://example.com/page3</loc></url>
<url><loc>https://example.com/page4</loc></url>
</urlset>
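Merging by hand becomes error-prone beyond a few files, so here is a sketch of doing it in Python. It keeps each entry's optional metadata and skips URLs that were already seen; the function name `merge_sitemaps` is an assumption for the example.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def merge_sitemaps(documents: list[bytes]) -> bytes:
    """Combine several sitemap files into one, keeping the first copy of each URL."""
    ET.register_namespace("", SITEMAP_NS)  # keep a plain xmlns on output
    merged = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    seen = set()
    for document in documents:
        root = ET.fromstring(document)
        for entry in root.findall(f"{{{SITEMAP_NS}}}url"):
            loc = (entry.findtext(f"{{{SITEMAP_NS}}}loc") or "").strip()
            if loc and loc not in seen:
                seen.add(loc)
                merged.append(entry)  # carries <lastmod> etc. along unchanged
    ET.indent(merged)  # pretty-printing; needs Python 3.9 or newer
    return ET.tostring(merged, encoding="UTF-8", xml_declaration=True)
```

Deduplicating on `<loc>` avoids fetching the same page twice when the source sitemaps overlap.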
Updating Content
Since Sitemap is a static connector, updates require re-import:
To Update:
- Download updated sitemap from website
- Edit your data source in Twig
- Upload the new sitemap file
- Save to reprocess all URLs
Automation Options:
- Schedule periodic manual updates
- Use Website connector for automatic updates
- Set up external scripts to notify you of sitemap changes
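One way to build such an external script is to compare the previously imported sitemap with a freshly downloaded one. The sketch below only computes the difference in listed URLs; how you are notified (email, chat, cron output) is left open, and `diff_sitemaps` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def sitemap_locs(xml_bytes: bytes) -> set[str]:
    """Return the set of <loc> values in a sitemap."""
    root = ET.fromstring(xml_bytes)
    return {
        loc.text.strip()
        for loc in root.iter(f"{{{SITEMAP_NS}}}loc")
        if loc.text and loc.text.strip()
    }


def diff_sitemaps(old: bytes, new: bytes) -> tuple[list[str], list[str]]:
    """Return (added, removed) URLs, each sorted, between two sitemap versions."""
    old_locs, new_locs = sitemap_locs(old), sitemap_locs(new)
    return sorted(new_locs - old_locs), sorted(old_locs - new_locs)
```

If both lists come back empty, the set of pages has not changed, although individual pages may still have been edited; comparing `<lastmod>` values would be needed to detect that.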
Troubleshooting
URLs Not Accessible
Problem: Some URLs fail to process
Solutions:
- Verify URLs are publicly accessible
- Check for authentication requirements
- Test URLs in incognito browser window
- Review process logs for specific error codes
Invalid Sitemap Format
Problem: Sitemap upload fails
Solutions:
- Validate XML syntax using online validator
- Check for proper XML declaration
- Ensure proper namespace declaration
- Verify file encoding is UTF-8
Empty Pages Imported
Problem: URLs processed but no content extracted
Solutions:
- Check if pages contain actual text content
- Verify pages aren't JavaScript-heavy SPAs
- Look for content behind login walls
- Test URL manually in browser
Partial Import
Problem: Only some URLs processed
Solutions:
- Check plan limits on number of URLs
- Review process logs for errors
- Verify failed URLs are accessible
- Split large sitemaps into multiple sources
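Splitting can also be scripted. The sketch below cuts a sitemap into files of at most `max_urls` entries each; the right value depends on your plan's limit, which this page does not state, so the number in the usage note is only a placeholder.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def split_sitemap(xml_bytes: bytes, max_urls: int) -> list[bytes]:
    """Split one sitemap into several, each holding at most max_urls entries."""
    if max_urls < 1:
        raise ValueError("max_urls must be at least 1")
    ET.register_namespace("", SITEMAP_NS)  # keep a plain xmlns on output
    entries = ET.fromstring(xml_bytes).findall(f"{{{SITEMAP_NS}}}url")
    parts = []
    for start in range(0, len(entries), max_urls):
        chunk = ET.Element(f"{{{SITEMAP_NS}}}urlset")
        chunk.extend(entries[start:start + max_urls])
        ET.indent(chunk)  # pretty-printing; needs Python 3.9 or newer
        parts.append(ET.tostring(chunk, encoding="UTF-8", xml_declaration=True))
    return parts
```

For example, `split_sitemap(data, 500)` would yield parts that can each be saved under its own filename and uploaded as a separate data source.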
Advanced Tips
1. Filtering with Text Editor
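Use find-and-replace in a text editor with regular-expression support to quickly filter sitemaps:
Remove URLs containing "blog" (this pattern works only when each <url> entry sits on a single line):
Find: .*<url>.*blog.*</url>.*\n
Replace: (empty)
Keep only "/docs/" URLs:
- Copy the entire sitemap
- Delete all <url> entries, keeping the XML declaration and the <urlset> wrapper
- Paste back only the <url> entries whose <loc> contains "/docs/"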
2. Combining Sitemaps from Different Sites
Create a consolidated sitemap:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<!-- Site 1 docs -->
<url><loc>https://docs.site1.com/guide</loc></url>
<!-- Site 2 docs -->
<url><loc>https://docs.site2.com/guide</loc></url>
<!-- Site 3 docs -->
<url><loc>https://docs.site3.com/guide</loc></url>
</urlset>
3. Priority-Based Import
Create multiple data sources based on priority:
High Priority (changes often, re-import frequently):
<url><loc>https://example.com/getting-started</loc></url>
<url><loc>https://example.com/pricing</loc></url>
Low Priority (rarely changes):
<url><loc>https://example.com/company-history</loc></url>
<url><loc>https://example.com/team</loc></url>
Next Steps
After importing from sitemap:
- Test knowledge coverage
- Create AI agents for specific content areas
- Monitor usage to see which pages are most referenced
- Plan periodic sitemap updates
Related Connectors
- Website - Automated web crawling with refresh
- Files - Upload HTML or PDF exports
- Confluence - For wiki-based documentation
- Google Drive - For cloud-hosted documentation
Last updated January 25, 2026


