riddle_scrape
Extract structured content from any URL. Returns title, description, markdown body, links, headings, and word count.
Overview
riddle_scrape fetches a URL and extracts its content into a clean, structured format. The page is rendered with a real browser, so JavaScript-heavy sites work out of the box.
What you get back
Page title, meta description, full body as Markdown, all links, heading structure, and word count — in a single API call.
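Because every field arrives in one JSON object, it can be worth checking the shape once before passing it downstream. The sketch below is a hypothetical guard, `isScrapeResult` (not part of any Riddle SDK). It uses the field names and types from the sample in the Response section. Whether the API ever omits a field or returns `null` (for example, for a page with no meta description) is not documented, so the guard assumes the strict shape shown there.

```javascript
// Hypothetical runtime guard for the documented /v1/scrape response shape.
// Assumes every field is present with the type shown in the sample response.
function isScrapeResult(data) {
  const isObj = (v) => v !== null && typeof v === "object";
  if (!isObj(data)) return false;
  return (
    typeof data.title === "string" &&
    typeof data.description === "string" &&
    typeof data.markdown === "string" &&
    // links: [{ href, text }]
    Array.isArray(data.links) &&
    data.links.every((l) => isObj(l) && typeof l.href === "string" && typeof l.text === "string") &&
    // headings: [{ level, text }]
    Array.isArray(data.headings) &&
    data.headings.every((h) => isObj(h) && Number.isInteger(h.level) && typeof h.text === "string") &&
    Number.isInteger(data.word_count)
  );
}
```

Call it right after `response.json()`, and treat `false` as an unexpected response rather than letting a missing field fail later.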
Endpoint
POST /v1/scrape

```bash
curl -X POST "https://api.riddledc.com/v1/scrape" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'
```

Parameters
| Parameter | Type | Description |
|---|---|---|
| url | string | Required. The URL to scrape. |
| cookies | object[] | Cookies to set before the page loads. Each cookie is an object `{ name, value, domain }`. |
| localStorage | object | Key-value pairs injected into `localStorage` before the page loads. |
| headers | object | Custom HTTP headers sent to the target site when loading the page. |
| stealth | boolean | Use the Patchright engine to bypass bot detection. Default: `false` |
| proxy | string | Proxy URL to route the request through. |
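Only `url` is required. All other parameters are optional. The sketch below is a hypothetical client-side helper, `buildScrapeBody` (not part of any Riddle SDK). It assembles the JSON request body from the parameters above, leaves out anything you did not pass, and checks the documented cookie shape before the request is sent. The API presumably validates input itself, so this helper only catches mistakes earlier.

```javascript
// Hypothetical helper: build a /v1/scrape request body from the documented
// parameters, omitting any option that was not supplied.
function buildScrapeBody({ url, cookies, localStorage, headers, stealth, proxy } = {}) {
  if (typeof url !== "string" || url.length === 0) {
    throw new TypeError("url is required");
  }
  new URL(url); // throws a TypeError if url is not a valid absolute URL

  const body = { url };
  if (cookies !== undefined) {
    // Each cookie must match the documented { name, value, domain } shape.
    for (const c of cookies) {
      if (!c || !c.name || c.value === undefined || !c.domain) {
        throw new TypeError("each cookie needs name, value and domain");
      }
    }
    body.cookies = cookies;
  }
  if (localStorage !== undefined) body.localStorage = localStorage;
  if (headers !== undefined) body.headers = headers;
  if (stealth !== undefined) body.stealth = Boolean(stealth);
  if (proxy !== undefined) body.proxy = proxy;
  return JSON.stringify(body);
}
```

To use it, pass the helper's result as the request body, for example `body: buildScrapeBody({ url: "https://example.com", stealth: true })`, in place of a hand-written `JSON.stringify(...)` call.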
Response
```json
{
  "title": "Example Domain",
  "description": "This domain is for use in illustrative examples.",
  "markdown": "# Example Domain\n\nThis domain is for use in...",
  "links": [
    { "href": "https://www.iana.org/domains/example", "text": "More information..." }
  ],
  "headings": [
    { "level": 1, "text": "Example Domain" }
  ],
  "word_count": 42
}
```

Examples
JavaScript
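```javascript
// Top-level await: run as an ES module (e.g. a .mjs file) in Node 18+, where fetch is built in.
// Read the API key from the environment rather than hard-coding it.
const RIDDLE_API_KEY = process.env.RIDDLE_API_KEY;

const response = await fetch("https://api.riddledc.com/v1/scrape", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${RIDDLE_API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({ url: "https://example.com" })
});
if (!response.ok) {
  throw new Error(`Scrape failed: HTTP ${response.status}`);
}
const data = await response.json();
console.log(data.title);      // "Example Domain"
console.log(data.word_count); // 42
```

With Authentication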
```javascript
const data = await fetch("https://api.riddledc.com/v1/scrape", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${RIDDLE_API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    url: "https://app.example.com/dashboard",
    // Placeholder session values: substitute the target site's real ones.
    cookies: [{ name: "session", value: "abc123", domain: "app.example.com" }],
    localStorage: { "auth_token": "eyJhbG..." },
    // Sent to the target site when loading the page, not to the Riddle API.
    headers: { "X-Custom-Header": "value" }
  })
}).then(r => r.json());
```

Stealth Mode
```javascript
const data = await fetch("https://api.riddledc.com/v1/scrape", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${RIDDLE_API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    url: "https://protected-site.com",
    stealth: true
  })
}).then(r => r.json());
```

Use Cases
Content Extraction
Pull article text, blog posts, or documentation into your pipeline as clean Markdown.
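A pipeline often needs metadata stored alongside the Markdown body. The sketch below is a hypothetical helper, `toMarkdownDocument`, that prepends YAML front matter built from the response fields. Front matter is an assumed target format that many static-site and docs tools read. It is not something the API produces.

```javascript
// Hypothetical helper: wrap a scrape result's Markdown in YAML front matter.
// JSON.stringify quotes the string values; JSON strings are also valid
// YAML double-quoted scalars, so titles containing ":" or quotes stay safe.
function toMarkdownDocument(data, sourceUrl) {
  const frontMatter = [
    "---",
    `title: ${JSON.stringify(data.title)}`,
    `description: ${JSON.stringify(data.description)}`,
    `source: ${JSON.stringify(sourceUrl)}`,
    `word_count: ${data.word_count}`,
    "---",
  ].join("\n");
  return `${frontMatter}\n\n${data.markdown.trim()}\n`;
}
```

The result can be written straight to a `.md` file, with one file per scraped URL.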
SEO Analysis
Extract titles, descriptions, headings, and word counts for SEO audits at scale.
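The response already contains the fields a basic on-page audit looks at. The sketch below is a hypothetical `auditSeo` function that returns a list of issues for one page. Its thresholds (title up to 60 characters, description of 50 to 160 characters, exactly one `h1`, at least 300 words) are common rules of thumb assumed for illustration. The Riddle API does not define them.

```javascript
// Hypothetical on-page SEO checks over a /v1/scrape response.
// All thresholds are illustrative rules of thumb; tune them to your own guidelines.
function auditSeo(data) {
  const issues = [];
  const title = data.title || "";
  const description = data.description || "";

  if (title.length === 0) issues.push("missing title");
  else if (title.length > 60) issues.push("title longer than 60 characters");

  if (description.length === 0) issues.push("missing meta description");
  else if (description.length < 50 || description.length > 160)
    issues.push("description outside 50-160 characters");

  // The headings array gives the page outline; count top-level headings.
  const h1Count = data.headings.filter((h) => h.level === 1).length;
  if (h1Count !== 1) issues.push(`expected one h1, found ${h1Count}`);

  if (data.word_count < 300) issues.push("thin content (< 300 words)");
  return issues;
}
```

Run against the sample response above, this flags the 48-character description and the 42-word count. Mapping `auditSeo` over many scrape results produces a site-wide report.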
Feed Building
Scrape multiple pages and assemble structured feeds from any website.
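When scraping many pages, it helps to cap the number of requests in flight and to keep one failed page from sinking the whole feed. The sketch below is a hypothetical `buildFeed`. The `scrape` argument is any async function that takes a URL and resolves to a `/v1/scrape` response, such as a wrapper around the `fetch` call in the Examples. The default concurrency of 3 is an arbitrary assumption, because the API's rate limits are not documented here.

```javascript
// Hypothetical feed builder: scrape URLs with a concurrency limit and map
// each response to a feed item. `scrape(url)` is injected so the sketch can
// be exercised without network access.
async function buildFeed(urls, scrape, concurrency = 3) {
  const items = new Array(urls.length);
  let next = 0; // shared cursor; safe because JS runs workers on one thread

  async function worker() {
    while (next < urls.length) {
      const i = next++;
      try {
        const data = await scrape(urls[i]);
        items[i] = {
          url: urls[i],
          title: data.title,
          summary: data.description,
          wordCount: data.word_count,
        };
      } catch (err) {
        // Record the failure and keep going with the remaining URLs.
        items[i] = { url: urls[i], error: String(err) };
      }
    }
  }

  const workerCount = Math.min(concurrency, urls.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return items; // same order as the input URLs
}
```

Items keep the input order, so the feed can be sorted or serialized (for example, as RSS or JSON Feed) afterwards.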
Link Discovery
Extract all links from a page for crawling, auditing, or graph building.
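For crawling, the `links` array usually needs normalizing before use. The sketch below is a hypothetical `classifyLinks` that resolves each `href` against the page URL, strips fragments, drops non-HTTP schemes such as `mailto:`, removes duplicates, and splits the results into same-host and external sets. The documentation does not say whether `href` values are always absolute, so the sketch handles relative ones defensively.

```javascript
// Hypothetical link normalizer for the `links` array of a scrape result.
function classifyLinks(pageUrl, links) {
  const base = new URL(pageUrl);
  const internal = new Set();
  const external = new Set();

  for (const { href } of links) {
    let url;
    try {
      url = new URL(href, base); // resolves relative hrefs against the page
    } catch {
      continue; // skip hrefs that cannot be parsed
    }
    if (url.protocol !== "http:" && url.protocol !== "https:") continue; // mailto:, javascript:, ...
    url.hash = ""; // "/about#team" and "/about" are the same page
    (url.host === base.host ? internal : external).add(url.href);
  }
  return { internal: [...internal], external: [...external] };
}
```

The `internal` list is a natural queue for the next round of `/v1/scrape` calls in a crawler, and `external` feeds link audits or graph building.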