# riddle_scrape

Extract structured content from any URL. Returns title, description, markdown body, links, headings, and word count.

Plain text for agents: https://riddledc.com/docs/scrape/markdown.md

## Overview

`riddle_scrape` fetches a URL and extracts its content into a clean, structured format. The page is rendered with a real browser, so JavaScript-heavy sites work out of the box.

A single API call returns the page title, meta description, full body as Markdown, all links, the heading structure, and the word count.

## Endpoint

`POST /v1/scrape`

```bash
curl -X POST "https://api.riddledc.com/v1/scrape" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'
```

## Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `url` | string | Required. The URL to scrape. |
| `cookies` | object[] | Cookies to set before loading. Each object includes `name`, `value`, and `domain`. |
| `localStorage` | object | Key-value pairs injected into localStorage before page load. |
| `headers` | object | Custom HTTP headers sent with the request. |
| `stealth` | boolean | Use the Patchright engine to bypass bot detection. Default: `false`. |
| `proxy` | string | Proxy URL to route the request through. |

## Response

```json
{
  "title": "Example Domain",
  "description": "This domain is for use in illustrative examples.",
  "markdown": "# Example Domain\n\nThis domain is for use in...",
  "links": [
    { "href": "https://www.iana.org/domains/example", "text": "More information..." }
  ],
  "headings": [
    { "level": 1, "text": "Example Domain" }
  ],
  "word_count": 42
}
```
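
To illustrate how the `headings` and `links` arrays might be consumed, here is a minimal sketch that works on the sample response above. `outline` and `externalLinks` are hypothetical helper names; the sketch also assumes `href` values may be relative, which the sample does not confirm, so it resolves them against the page URL.

```javascript
// Sample data copied from the response shown above.
const sample = {
  title: "Example Domain",
  links: [
    { href: "https://www.iana.org/domains/example", text: "More information..." }
  ],
  headings: [{ level: 1, text: "Example Domain" }],
  word_count: 42
};

// Render headings as a text outline, indenting two spaces per level below 1.
function outline(headings) {
  return headings
    .map((h) => `${"  ".repeat(Math.max(0, h.level - 1))}${h.text}`)
    .join("\n");
}

// Keep only links whose host differs from the host of the scraped page.
function externalLinks(links, pageUrl) {
  const pageHost = new URL(pageUrl).host;
  return links.filter((link) => {
    try {
      // Resolve relative hrefs against the page URL (assumption: they can occur).
      return new URL(link.href, pageUrl).host !== pageHost;
    } catch {
      return false; // Skip hrefs that cannot be parsed as URLs.
    }
  });
}

console.log(outline(sample.headings));
console.log(externalLinks(sample.links, "https://example.com"));
```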

## Examples

### JavaScript

```javascript
// RIDDLE_API_KEY must hold your API key, for example read from an environment variable.
const response = await fetch("https://api.riddledc.com/v1/scrape", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${RIDDLE_API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({ url: "https://example.com" })
});

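// Stop early on a non-2xx status instead of parsing an error body as a result.
if (!response.ok) {
  throw new Error(`Scrape failed: HTTP ${response.status}`);
}

const data = await response.json();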
console.log(data.title);
console.log(data.word_count);
```

### With Authentication

```javascript
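// Set a session cookie, a localStorage token, and a custom header
// so the page loads as an authenticated user.
const data = await fetch("https://api.riddledc.com/v1/scrape", {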
  method: "POST",
  headers: {
    "Authorization": `Bearer ${RIDDLE_API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    url: "https://app.example.com/dashboard",
    cookies: [{ name: "session", value: "abc123", domain: "app.example.com" }],
    localStorage: { "auth_token": "eyJhbG..." },
    headers: { "X-Custom-Header": "value" }
  })
}).then((response) => response.json());
```

## Use Cases

- Content extraction: pull article text, blog posts, or documentation into your pipeline as clean Markdown.
- SEO analysis: extract titles, descriptions, headings, and word counts for SEO audits at scale.
- Feed building: scrape multiple pages and assemble structured feeds from any website.
- Link discovery: extract all links from a page for crawling, auditing, or graph building.
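
The feed-building case could look roughly like the sketch below. `scrape` and `buildFeed` are hypothetical names, the `RIDDLE_API_KEY` environment variable is an assumed convention, and the sketch assumes the API signals failures with a non-2xx status, which this page does not specify.

```javascript
// Hypothetical wrapper around POST /v1/scrape.
// Assumes a runtime with global fetch and the key in process.env.RIDDLE_API_KEY.
async function scrape(url) {
  const response = await fetch("https://api.riddledc.com/v1/scrape", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${process.env.RIDDLE_API_KEY}`,
      "Content-Type": "application/json"
    },
    body: JSON.stringify({ url })
  });
  if (!response.ok) {
    throw new Error(`Scrape failed for ${url}: HTTP ${response.status}`);
  }
  return response.json();
}

// Scrape every URL in parallel and keep the fields a feed entry needs.
// A failed page becomes an entry with an `error` field instead of aborting the batch.
async function buildFeed(urls, scrapeFn = scrape) {
  const results = await Promise.allSettled(urls.map((url) => scrapeFn(url)));
  return results.map((result, i) =>
    result.status === "fulfilled"
      ? {
          url: urls[i],
          title: result.value.title,
          summary: result.value.description,
          words: result.value.word_count
        }
      : { url: urls[i], error: String(result.reason) }
  );
}
```

Calling `buildFeed(["https://example.com"])` returns a promise for an array of entries. Passing the scrape function as a parameter makes it easy to swap in a stub for tests or to add rate limiting later.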

