# riddle_map

Discover all URLs on a website by crawling from a starting URL.

Plain text for agents: https://riddledc.com/docs/map/markdown.md

## Overview

`riddle_map` crawls a website starting from a given URL and returns every discoverable page. Use it to build sitemaps, plan crawls, or audit site structure.

Starting from your URL, the crawler follows internal links breadth-first, respecting `robots.txt` by default. You control the max page count and can filter URLs with include and exclude patterns.

## Endpoint

POST /v1/map

```bash
curl -X POST "https://api.riddledc.com/v1/map" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "max_pages": 500}'
```

## Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `url` | string | Required. Starting URL to crawl from. |
| `max_pages` | number | Maximum pages to discover. Default: `500`. |
| `include_patterns` | string[] | Only follow URLs matching these glob patterns, such as `["/blog/*"]`. |
| `exclude_patterns` | string[] | Skip URLs matching these glob patterns, such as `["/admin/*"]`. |
| `respect_robots` | boolean | Honor `robots.txt` directives. Default: `true`. |

## Response

```json
{
  "urls": [
    "https://example.com",
    "https://example.com/about",
    "https://example.com/blog",
    "https://example.com/blog/first-post",
    "https://example.com/contact"
  ],
  "count": 5
}
```

## Examples

### JavaScript

```javascript
const response = await fetch("https://api.riddledc.com/v1/map", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${RIDDLE_API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    url: "https://example.com",
    max_pages: 500
  })
});

const { urls, count } = await response.json();
console.log(`Found ${count} pages`);
```

### Filtered Crawl

```javascript
const { urls } = await fetch("https://api.riddledc.com/v1/map", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${RIDDLE_API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    url: "https://example.com",
    max_pages: 200,
    include_patterns: ["/blog/*", "/docs/*"],
    exclude_patterns: ["/admin/*", "/internal/*"]
  })
}).then((response) => response.json());
```

## Use Cases

- Sitemap generation: discover all pages and generate an XML sitemap automatically.
- Link auditing: find internal links to check for broken pages or orphaned content.
- Pre-crawl planning: map the site first, then use `riddle_crawl` to extract content from targeted pages.
- Site structure analysis: understand URL patterns and information architecture at a glance.

