riddle_map

Discover all URLs on a website by crawling from a starting URL.

Overview

riddle_map crawls a website starting from a given URL and returns every discoverable page. Use it to build sitemaps, plan crawls, or audit site structure.

How it works

Starting from your URL, the crawler follows internal links breadth-first, respecting robots.txt by default. You control the max page count and can filter URLs with include/exclude patterns.
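The traversal described above can be pictured as a plain breadth-first search. A minimal sketch, assuming a `getLinks(url)` helper that stands in for fetching a page and extracting its internal links (the real service additionally consults robots.txt and applies your include/exclude patterns):

```javascript
// Breadth-first discovery sketch (illustration only, not the service's code).
// `getLinks` is a stand-in for fetching a page and extracting internal links.
function mapSite(startUrl, getLinks, maxPages = 500) {
  const seen = new Set([startUrl]);
  const queue = [startUrl];
  while (queue.length > 0 && seen.size < maxPages) {
    const url = queue.shift(); // FIFO queue => breadth-first order
    for (const link of getLinks(url)) {
      if (!seen.has(link) && seen.size < maxPages) {
        seen.add(link);
        queue.push(link);
      }
    }
  }
  return [...seen];
}
```

Because the queue is FIFO, pages closer to the starting URL are discovered first, which is why `max_pages` tends to keep top-level sections and drop deep leaves.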

Endpoint

POST /v1/map
curl -X POST "https://api.riddledc.com/v1/map" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "max_pages": 500}'

Parameters

Parameter         Type      Description
url               string    Required. Starting URL to crawl from.
max_pages         number    Maximum pages to discover. Default: 500
include_patterns  string[]  Only follow URLs matching these glob patterns. E.g. ["/blog/*"]
exclude_patterns  string[]  Skip URLs matching these glob patterns. E.g. ["/admin/*"]
respect_robots    boolean   Honor robots.txt directives. Default: true

Response

{
  "urls": [
    "https://example.com",
    "https://example.com/about",
    "https://example.com/blog",
    "https://example.com/blog/first-post",
    "https://example.com/contact"
  ],
  "count": 5
}

Examples

JavaScript

const response = await fetch("https://api.riddledc.com/v1/map", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${RIDDLE_API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    url: "https://example.com",
    max_pages: 500
  })
});

const { urls, count } = await response.json();
console.log(`Found ${count} pages`);

Filtered Crawl

const { urls } = await fetch("https://api.riddledc.com/v1/map", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${RIDDLE_API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    url: "https://example.com",
    max_pages: 200,
    include_patterns: ["/blog/*", "/docs/*"],
    exclude_patterns: ["/admin/*", "/internal/*"]
  })
}).then(r => r.json());

Map Then Scrape

Combine with riddle_scrape to extract content from every discovered page:

// 1. Discover all pages
const { urls } = await fetch("https://api.riddledc.com/v1/map", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${RIDDLE_API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({ url: "https://example.com", max_pages: 100 })
}).then(r => r.json());

// 2. Scrape each page
const results = await Promise.all(
  urls.map(url =>
    fetch("https://api.riddledc.com/v1/scrape", {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${RIDDLE_API_KEY}`,
        "Content-Type": "application/json"
      },
      body: JSON.stringify({ url })
    }).then(r => r.json())
  )
);

Use Cases

Sitemap Generation

Discover all pages and generate an XML sitemap automatically.
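A minimal sketch of turning the returned `urls` array into a sitemaps.org-style XML document (the escaping below covers only the characters XML requires in text content):

```javascript
// Build an XML sitemap string from a list of URLs.
function buildSitemap(urls) {
  const escapeXml = s =>
    s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
  const entries = urls
    .map(url => `  <url><loc>${escapeXml(url)}</loc></url>`)
    .join("\n");
  return `<?xml version="1.0" encoding="UTF-8"?>\n` +
    `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n` +
    `${entries}\n` +
    `</urlset>`;
}
```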

Link Auditing

Find all internal links to check for broken pages or orphaned content.
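One way to follow up on a map result is a status check over every discovered URL; a sketch using HEAD requests (some servers reject HEAD, in which case a GET fallback may be needed). The fetch function is injectable so the logic can be tested without a network:

```javascript
// Check each URL and collect those that do not return a 2xx status.
// `fetchFn` defaults to the global fetch but can be swapped out for testing.
async function findBrokenLinks(urls, fetchFn = fetch) {
  const broken = [];
  for (const url of urls) {
    try {
      const res = await fetchFn(url, { method: "HEAD" });
      if (!res.ok) broken.push({ url, status: res.status });
    } catch (err) {
      broken.push({ url, status: null }); // network error, DNS failure, etc.
    }
  }
  return broken;
}
```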

Pre-Crawl Planning

Map the site first, then use riddle_crawl to extract content from targeted pages.
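Mapping first lets you hand a trimmed URL list to the crawl step. A sketch of the selection, using a simplified glob where `*` matches any path suffix (the service's own pattern semantics may be richer than this):

```javascript
// Narrow a map result to the sections worth crawling.
// Simplified glob: "*" matches any run of characters in the path.
function selectPaths(urls, patterns) {
  const regexes = patterns.map(p =>
    new RegExp(
      "^" + p.replace(/[.+?^${}()|[\]\\]/g, "\\$&").replace(/\*/g, ".*") + "$"
    )
  );
  return urls.filter(url => {
    const path = new URL(url).pathname;
    return regexes.some(re => re.test(path));
  });
}
```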

Site Structure Analysis

Understand URL patterns and information architecture at a glance.
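A quick way to see that architecture is to tally discovered URLs by top-level path segment; a small sketch:

```javascript
// Count discovered URLs by their first path segment, e.g. "blog" or "docs".
function groupBySection(urls) {
  const counts = {};
  for (const url of urls) {
    const segment = new URL(url).pathname.split("/")[1] || "(root)";
    counts[segment] = (counts[segment] || 0) + 1;
  }
  return counts;
}
```

Running this over the example response above would show one root page alongside the about, blog, and contact sections.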