Overview
riddle_map crawls a website starting from a given URL and returns every discoverable page. Use it to build sitemaps, plan crawls, or audit site structure.
How it works
Starting from your URL, the crawler follows internal links breadth-first, respecting robots.txt by default. You control the max page count and can filter URLs with include/exclude patterns.
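The include/exclude filtering can be pictured as a simple glob match against each discovered URL's path. Below is a minimal sketch of that logic; `matchesGlob` and `shouldFollow` are illustrative helpers, not part of the API, and the service's actual matching rules may differ:

```javascript
// Illustrative sketch of include/exclude glob filtering. Not the API's
// implementation — just one plausible interpretation of the patterns.
function matchesGlob(path, pattern) {
  // Turn a glob like "/blog/*" into a regex: "*" matches any characters.
  const regex = new RegExp(
    "^" + pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&").replace(/\*/g, ".*") + "$"
  );
  return regex.test(path);
}

function shouldFollow(url, include = [], exclude = []) {
  const path = new URL(url).pathname;
  if (exclude.some(p => matchesGlob(path, p))) return false;
  if (include.length === 0) return true; // no include list means follow everything
  return include.some(p => matchesGlob(path, p));
}

console.log(shouldFollow("https://example.com/blog/post", ["/blog/*"], ["/admin/*"])); // true
console.log(shouldFollow("https://example.com/admin/users", ["/blog/*"], ["/admin/*"])); // false
```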
Endpoint

POST /v1/map

```shell
curl -X POST "https://api.riddledc.com/v1/map" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "max_pages": 500}'
```

Parameters
| Parameter | Type | Description |
|---|---|---|
| url | string | Required. Starting URL to crawl from. |
| max_pages | number | Maximum pages to discover. Default: 500 |
| include_patterns | string[] | Only follow URLs matching these glob patterns, e.g. ["/blog/*"] |
| exclude_patterns | string[] | Skip URLs matching these glob patterns, e.g. ["/admin/*"] |
| respect_robots | boolean | Honor robots.txt directives. Default: true |
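A request body exercising every parameter at once (the values are illustrative):

```json
{
  "url": "https://example.com",
  "max_pages": 200,
  "include_patterns": ["/blog/*", "/docs/*"],
  "exclude_patterns": ["/admin/*"],
  "respect_robots": true
}
```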
Response
```json
{
  "urls": [
    "https://example.com",
    "https://example.com/about",
    "https://example.com/blog",
    "https://example.com/blog/first-post",
    "https://example.com/contact"
  ],
  "count": 5
}
```

Examples
JavaScript

```javascript
const response = await fetch("https://api.riddledc.com/v1/map", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${RIDDLE_API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    url: "https://example.com",
    max_pages: 500
  })
});

const { urls, count } = await response.json();
console.log(`Found ${count} pages`);
```

Filtered Crawl
```javascript
const { urls } = await fetch("https://api.riddledc.com/v1/map", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${RIDDLE_API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    url: "https://example.com",
    max_pages: 200,
    include_patterns: ["/blog/*", "/docs/*"],
    exclude_patterns: ["/admin/*", "/internal/*"]
  })
}).then(r => r.json());
```

Map Then Scrape
Combine with riddle_scrape to extract content from every discovered page:
```javascript
// 1. Discover all pages
const { urls } = await fetch("https://api.riddledc.com/v1/map", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${RIDDLE_API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({ url: "https://example.com", max_pages: 100 })
}).then(r => r.json());

// 2. Scrape each page
const results = await Promise.all(
  urls.map(url =>
    fetch("https://api.riddledc.com/v1/scrape", {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${RIDDLE_API_KEY}`,
        "Content-Type": "application/json"
      },
      body: JSON.stringify({ url })
    }).then(r => r.json())
  )
);
```

Use Cases
Sitemap Generation
Discover all pages and generate an XML sitemap automatically.
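For instance, the `urls` array from the response can be turned directly into a sitemap.xml string. A sketch (the escaping covers only the characters XML requires):

```javascript
// Sketch: build a sitemap.xml string from a list of discovered URLs.
function buildSitemap(urls) {
  const escapeXml = s =>
    s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
  const entries = urls
    .map(url => `  <url><loc>${escapeXml(url)}</loc></url>`)
    .join("\n");
  return `<?xml version="1.0" encoding="UTF-8"?>\n` +
    `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n` +
    entries + `\n</urlset>`;
}

const xml = buildSitemap(["https://example.com", "https://example.com/about"]);
console.log(xml);
```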
Link Auditing
Find all internal links to check for broken pages or orphaned content.
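One way to audit the discovered URLs: issue a HEAD request to each and collect the ones that fail or return a non-2xx status. A sketch under those assumptions (`auditLinks` and `isBroken` are hypothetical helpers, not part of the API):

```javascript
// Sketch: flag discovered URLs whose HEAD request fails or returns non-2xx.
const isBroken = status => status < 200 || status >= 300;

async function auditLinks(urls) {
  const checks = await Promise.all(
    urls.map(async url => {
      try {
        const res = await fetch(url, { method: "HEAD" });
        return { url, status: res.status };
      } catch {
        return { url, status: 0 }; // network error, DNS failure, etc.
      }
    })
  );
  return checks.filter(c => isBroken(c.status));
}
```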
Pre-Crawl Planning
Map the site first, then use riddle_crawl to extract content from targeted pages.
Site Structure Analysis
Understand URL patterns and information architecture at a glance.
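As an illustration, the URL list can be summarized by top-level path segment to get a quick picture of the site's sections (`sectionCounts` is a hypothetical helper, not part of the API):

```javascript
// Sketch: count discovered pages per top-level path segment.
function sectionCounts(urls) {
  const counts = {};
  for (const url of urls) {
    const segment = new URL(url).pathname.split("/")[1] || "(root)";
    counts[segment] = (counts[segment] || 0) + 1;
  }
  return counts;
}

console.log(sectionCounts([
  "https://example.com",
  "https://example.com/blog",
  "https://example.com/blog/first-post",
  "https://example.com/contact"
]));
// { "(root)": 1, blog: 2, contact: 1 }
```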