riddle_scrape

Extract structured content from any URL. Returns title, description, markdown body, links, headings, and word count.

Overview

riddle_scrape fetches a URL and extracts its content into a clean, structured format. The page is rendered with a real browser, so JavaScript-heavy sites work out of the box.

What you get back

Page title, meta description, full body as Markdown, all links, heading structure, and word count, all in a single API call.

Endpoint

POST /v1/scrape
curl -X POST "https://api.riddledc.com/v1/scrape" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| url | string | Required. The URL to scrape. |
| cookies | object[] | Cookies to set before loading. Each: { name, value, domain } |
| localStorage | object | Key-value pairs injected into localStorage before page load. |
| headers | object | Custom HTTP headers sent with the request. |
| stealth | boolean | Use the Patchright engine to bypass bot detection. Default: false |
| proxy | string | Proxy URL to route the request through. |
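
As a minimal sketch of how the parameters above might be assembled on the client, the helper below checks that `url` is present and parseable, then forwards only the documented parameter names. `buildScrapeBody` and `ALLOWED_PARAMS` are hypothetical names introduced here for illustration; how the API itself treats unknown or invalid fields is not stated in this page.

```javascript
// Parameter names are taken from the table above; everything else is illustrative.
const ALLOWED_PARAMS = ["url", "cookies", "localStorage", "headers", "stealth", "proxy"];

// Hypothetical helper: returns a JSON string suitable for the POST body.
function buildScrapeBody(options) {
  if (typeof options.url !== "string" || options.url.length === 0) {
    throw new TypeError("url is required and must be a non-empty string");
  }
  // The URL constructor throws a TypeError if the value cannot be parsed.
  new URL(options.url);

  // Copy only the documented parameters that the caller actually set.
  const body = {};
  for (const key of ALLOWED_PARAMS) {
    if (options[key] !== undefined) {
      body[key] = options[key];
    }
  }
  return JSON.stringify(body);
}
```

The returned string can be passed as the `body` of the `fetch` call. Validating on the client is a design choice that surfaces typos before a request is spent; it does not replace whatever checks the API performs.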

Response

{
  "title": "Example Domain",
  "description": "This domain is for use in illustrative examples.",
  "markdown": "# Example Domain\n\nThis domain is for use in...",
  "links": [
    { "href": "https://www.iana.org/domains/example", "text": "More information..." }
  ],
  "headings": [
    { "level": 1, "text": "Example Domain" }
  ],
  "word_count": 42
}
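
To illustrate how the `headings` and `links` arrays might be consumed, here is a small sketch that prints an indented outline and a list of link targets. The `page` object repeats the sample response above; `formatOutline` is a hypothetical helper, not part of the API.

```javascript
// Sample data copied from the response above.
const page = {
  headings: [{ level: 1, text: "Example Domain" }],
  links: [{ href: "https://www.iana.org/domains/example", text: "More information..." }]
};

// Indent each heading by two spaces for every level below 1.
function formatOutline(headings) {
  return headings
    .map(h => "  ".repeat(Math.max(0, h.level - 1)) + h.text)
    .join("\n");
}

console.log(formatOutline(page.headings));
for (const link of page.links) {
  console.log(`${link.text} -> ${link.href}`);
}
```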

Examples

JavaScript

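// RIDDLE_API_KEY is assumed to hold your API key (for example, read from an environment variable).
const response = await fetch("https://api.riddledc.com/v1/scrape", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${RIDDLE_API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({ url: "https://example.com" })
});

// Fail loudly on a non-2xx status before trying to parse the body.
if (!response.ok) {
  throw new Error(`Scrape failed with HTTP ${response.status}`);
}

const data = await response.json();
console.log(data.title);      // "Example Domain"
console.log(data.word_count);  // 42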

With Authentication

const data = await fetch("https://api.riddledc.com/v1/scrape", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${RIDDLE_API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    url: "https://app.example.com/dashboard",
    cookies: [{ name: "session", value: "abc123", domain: "app.example.com" }],
    localStorage: { "auth_token": "eyJhbG..." },
    headers: { "X-Custom-Header": "value" }
  })
}).then(r => r.json());

Stealth Mode

const data = await fetch("https://api.riddledc.com/v1/scrape", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${RIDDLE_API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    url: "https://protected-site.com",
    stealth: true
  })
}).then(r => r.json());

Use Cases

Content Extraction

Pull article text, blog posts, or documentation into your pipeline as clean Markdown.

SEO Analysis

Extract titles, descriptions, headings, and word counts for SEO audits at scale.
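
A rough sketch of what such an audit could look like for a single scrape response follows. The thresholds in `DEFAULT_LIMITS` are illustrative assumptions, not values defined by the API or by any search engine, and `auditPage` is a hypothetical helper.

```javascript
// Illustrative thresholds only; adjust them to your own audit rules.
const DEFAULT_LIMITS = { maxTitleLength: 60, maxDescriptionLength: 160, minWordCount: 300 };

// Returns a list of human-readable issues for one scrape response.
function auditPage(page, limits = DEFAULT_LIMITS) {
  const issues = [];

  if (!page.title) {
    issues.push("missing title");
  } else if (page.title.length > limits.maxTitleLength) {
    issues.push("title too long");
  }

  if (!page.description) {
    issues.push("missing description");
  } else if (page.description.length > limits.maxDescriptionLength) {
    issues.push("description too long");
  }

  // Count level-1 headings from the `headings` array in the response.
  const h1Count = page.headings.filter(h => h.level === 1).length;
  if (h1Count !== 1) {
    issues.push(`expected one h1, found ${h1Count}`);
  }

  if (page.word_count < limits.minWordCount) {
    issues.push("thin content");
  }
  return issues;
}
```

Run against the sample response shown earlier, the only issue reported would be "thin content", because its `word_count` of 42 is below the assumed minimum.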

Feed Building

Scrape multiple pages and assemble structured feeds from any website.
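
One possible shape for this, as a sketch: scrape each URL in turn and map the response to a feed entry. The entry fields (`url`, `title`, `summary`, `wordCount`) are an assumed feed shape, and `scrape` is a caller-supplied async function (hypothetical) that posts a URL to /v1/scrape and resolves with the parsed JSON.

```javascript
// Map one scrape response to a feed entry. The entry shape is an assumption.
function toFeedItem(url, page) {
  return {
    url,
    title: page.title,
    summary: page.description,
    wordCount: page.word_count
  };
}

// `scrape` is a caller-supplied async function: (url) => parsed scrape response.
async function buildFeed(urls, scrape) {
  const items = [];
  // Process pages one at a time to keep request volume conservative.
  for (const url of urls) {
    try {
      items.push(toFeedItem(url, await scrape(url)));
    } catch (err) {
      // Skip pages that fail rather than aborting the whole feed.
      console.error(`Skipping ${url}: ${err.message}`);
    }
  }
  return items;
}
```

Passing the scrape function in as an argument keeps the feed logic independent of how requests are made, which also makes it easy to exercise with a stub.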

Link Discovery

Extract all links from a page for crawling, auditing, or graph building.
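
For crawling, a common next step is to keep only same-site links and drop duplicates. The sketch below does this with the standard `URL` class; `internalLinks` is a hypothetical helper, and it handles relative `href` values defensively since this page only shows absolute ones in the sample response.

```javascript
// Return the unique same-origin links from a scrape response's `links` array.
function internalLinks(pageUrl, links) {
  const origin = new URL(pageUrl).origin;
  const seen = new Set();

  for (const { href } of links) {
    let resolved;
    try {
      // Resolve relative hrefs against the page URL; absolute ones pass through.
      resolved = new URL(href, pageUrl);
    } catch {
      continue; // skip hrefs that cannot be parsed
    }
    resolved.hash = ""; // treat #fragment variants as the same page
    if (resolved.origin === origin) {
      seen.add(resolved.href);
    }
  }
  return [...seen];
}
```

The result can be fed back into further scrape calls to walk a site, or compared across runs for a link audit.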