riddle_map

Discover all URLs on a website by crawling from a starting URL.

Overview

riddle_map crawls a website starting from a given URL and returns every discoverable page. Use it to build sitemaps, plan crawls, or audit site structure.

How it works

Starting from your URL, the crawler follows internal links breadth-first, respecting robots.txt by default. You control the max page count and can filter URLs with include/exclude patterns.
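The traversal described above can be pictured as a plain breadth-first search. A minimal sketch, assuming a `getLinks(url)` helper that stands in for fetching a page and extracting its internal links (the real service additionally consults robots.txt and applies your include/exclude patterns):

```javascript
// Breadth-first discovery sketch (illustration only, not the service's code).
// `getLinks` is a stand-in for fetching a page and extracting internal links.
function mapSite(startUrl, getLinks, maxPages = 500) {
  const seen = new Set([startUrl]);
  const queue = [startUrl];
  while (queue.length > 0 && seen.size < maxPages) {
    const url = queue.shift(); // FIFO queue => breadth-first order
    for (const link of getLinks(url)) {
      if (!seen.has(link) && seen.size < maxPages) {
        seen.add(link);
        queue.push(link);
      }
    }
  }
  return [...seen];
}
```

Because the queue is FIFO, pages closer to the starting URL are discovered first, which is why `max_pages` tends to keep top-level sections and drop deep leaves.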

Endpoint

POST /v1/map
curl -X POST "https://api.riddledc.com/v1/map" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "max_pages": 500}'

Parameters

Parameter         Type      Description
url               string    Required. Starting URL to crawl from.
max_pages         number    Maximum pages to discover. Default: 500
include_patterns  string[]  Only follow URLs matching these glob patterns. E.g. ["/blog/*"]
exclude_patterns  string[]  Skip URLs matching these glob patterns. E.g. ["/admin/*"]
respect_robots    boolean   Honor robots.txt directives. Default: true

Response

{
  "urls": [
    "https://example.com",
    "https://example.com/about",
    "https://example.com/blog",
    "https://example.com/blog/first-post",
    "https://example.com/contact"
  ],
  "count": 5
}

Examples

JavaScript

const response = await fetch("https://api.riddledc.com/v1/map", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${RIDDLE_API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    url: "https://example.com",
    max_pages: 500
  })
});

const { urls, count } = await response.json();
console.log(`Found ${count} pages`);

Filtered Crawl

const { urls } = await fetch("https://api.riddledc.com/v1/map", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${RIDDLE_API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    url: "https://example.com",
    max_pages: 200,
    include_patterns: ["/blog/*", "/docs/*"],
    exclude_patterns: ["/admin/*", "/internal/*"]
  })
}).then(r => r.json());

Map Then Scrape

Combine with riddle_scrape to extract content from every discovered page:

// 1. Discover all pages
const { urls } = await fetch("https://api.riddledc.com/v1/map", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${RIDDLE_API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({ url: "https://example.com", max_pages: 100 })
}).then(r => r.json());

// 2. Scrape each page
const results = await Promise.all(
  urls.map(url =>
    fetch("https://api.riddledc.com/v1/scrape", {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${RIDDLE_API_KEY}`,
        "Content-Type": "application/json"
      },
      body: JSON.stringify({ url })
    }).then(r => r.json())
  )
);

Use Cases

Sitemap Generation

Discover all pages and generate an XML sitemap automatically.
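A minimal sketch of turning the returned `urls` array into a sitemaps.org-style XML document (the escaping below covers only the characters XML requires in text content):

```javascript
// Build an XML sitemap string from a list of URLs.
function buildSitemap(urls) {
  const escapeXml = s =>
    s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
  const entries = urls
    .map(url => `  <url><loc>${escapeXml(url)}</loc></url>`)
    .join("\n");
  return `<?xml version="1.0" encoding="UTF-8"?>\n` +
    `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n` +
    `${entries}\n` +
    `</urlset>`;
}
```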

Link Auditing

Find all internal links to check for broken pages or orphaned content.
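One way to follow up on a map result is a status check over every discovered URL; a sketch using HEAD requests (some servers reject HEAD, in which case a GET fallback may be needed). The fetch function is injectable so the logic can be tested without a network:

```javascript
// Check each URL and collect those that do not return a 2xx status.
// `fetchFn` defaults to the global fetch but can be swapped out for testing.
async function findBrokenLinks(urls, fetchFn = fetch) {
  const broken = [];
  for (const url of urls) {
    try {
      const res = await fetchFn(url, { method: "HEAD" });
      if (!res.ok) broken.push({ url, status: res.status });
    } catch (err) {
      broken.push({ url, status: null }); // network error, DNS failure, etc.
    }
  }
  return broken;
}
```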

Pre-Crawl Planning

Map the site first, then use riddle_crawl to extract content from targeted pages.
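Mapping first lets you hand a trimmed URL list to the crawl step. A sketch of the selection, using a simplified glob where `*` matches any path suffix (the service's own pattern semantics may be richer than this):

```javascript
// Narrow a map result to the sections worth crawling.
// Simplified glob: "*" matches any run of characters in the path.
function selectPaths(urls, patterns) {
  const regexes = patterns.map(p =>
    new RegExp(
      "^" + p.replace(/[.+?^${}()|[\]\\]/g, "\\$&").replace(/\*/g, ".*") + "$"
    )
  );
  return urls.filter(url => {
    const path = new URL(url).pathname;
    return regexes.some(re => re.test(path));
  });
}
```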

Site Structure Analysis

Understand URL patterns and information architecture at a glance.
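A quick way to see that architecture is to tally discovered URLs by top-level path segment; a small sketch:

```javascript
// Count discovered URLs by their first path segment, e.g. "blog" or "docs".
function groupBySection(urls) {
  const counts = {};
  for (const url of urls) {
    const segment = new URL(url).pathname.split("/")[1] || "(root)";
    counts[segment] = (counts[segment] || 0) + 1;
  }
  return counts;
}
```

Running this over the example response above would show one root page alongside the about, blog, and contact sections.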