
Your Agent's Browser

Stop managing browsers. Elevate your agents.

One API call. Many actions. Rich results.

Structured Steps API

Your agent generates JSON steps. We execute them and return rich results.

const response = await fetch("https://api.riddledc.com/v1/run", {
  method: "POST",
  headers: {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    steps: [
      { goto: "https://example.com/login" },
      { fill: { selector: "#email", value: "user@example.com" } },
      { fill: { selector: "#password", value: "secret" } },
      { click: "button[type=submit]" },
      { waitForUrl: "**/dashboard**" },
      { screenshot: "dashboard" }
    ]
  })
});

// Sync by default - results returned directly
const { status, screenshots, console: logs } = await response.json();
// screenshots.dashboard = "https://cdn.riddledc.com/..."

Rich results

Get back screenshots, console logs, network HAR, assertion results, and downloaded files—not just pixels.

No Playwright required

Your agent generates JSON, not code. No syntax errors, no script injection risks.
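Because the agent emits JSON, you can cheaply check its output before spending a browser job on it. The sketch below is a minimal validator for agent-generated steps. It only covers the five step shapes used in the first example (`goto`, `fill`, `click`, `waitForUrl`, `screenshot`), which are inferred from this page and not from an official schema. `parseAgentSteps` is a hypothetical helper. Assert steps carry an extra `onFail` key and would need their own rule.

```typescript
// Step shapes inferred from the examples on this page; the real schema may
// accept more step types and options than these.
type Step =
  | { goto: string }
  | { fill: { selector: string; value: string } }
  | { click: string }
  | { waitForUrl: string }
  | { screenshot: string };

const isObject = (v: unknown): v is Record<string, unknown> =>
  typeof v === "object" && v !== null && !Array.isArray(v);

const STRING_STEPS = ["goto", "click", "waitForUrl", "screenshot"];

// Hypothetical helper: returns the parsed steps, or throws naming the first bad step.
function parseAgentSteps(json: string): Step[] {
  const parsed: unknown = JSON.parse(json);
  if (!Array.isArray(parsed)) throw new Error("steps must be a JSON array");
  parsed.forEach((step, i) => {
    if (!isObject(step) || Object.keys(step).length !== 1) {
      throw new Error(`step ${i}: expected an object with exactly one key`);
    }
    const [kind, arg] = Object.entries(step)[0];
    const ok =
      kind === "fill"
        ? isObject(arg) && typeof arg.selector === "string" && typeof arg.value === "string"
        : STRING_STEPS.includes(kind) && typeof arg === "string";
    if (!ok) throw new Error(`step ${i}: invalid "${kind}" step`);
  });
  return parsed as Step[];
}
```

A rejected step can be fed back to the model as an error message, which is cheaper than discovering the problem after a browser run.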

Assert without vision

Use assert steps to check page state. Only screenshot when you need to.

Pack work into sessions

Each job has a 30s billing minimum, so do multiple navigations and screenshots in one call to get full value from it.
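One way to pack work is to turn a list of URLs into a single steps array, so one job visits every page. The sketch below reuses only the `goto` and `screenshot` step shapes from the first example. `packScreenshotJob` and the `page-N` naming scheme are hypothetical, and the packing pays off only if the whole run fits within one job's billed time.

```typescript
// Step shapes copied from the examples on this page.
type GotoStep = { goto: string };
type ScreenshotStep = { screenshot: string };

// Hypothetical helper: one job that visits every URL and names each
// screenshot page-0, page-1, ... so results can be matched back to URLs.
function packScreenshotJob(urls: string[]): (GotoStep | ScreenshotStep)[] {
  return urls.flatMap((url, i): (GotoStep | ScreenshotStep)[] => [
    { goto: url },
    { screenshot: `page-${i}` },
  ]);
}

// Request body for POST https://api.riddledc.com/v1/run
const body = JSON.stringify({
  steps: packScreenshotJob(["https://example.com", "https://example.com/pricing"]),
});
```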

The Problem with Vision Loops

The screenshot-per-action pattern is expensive and slow:

1. Screenshot the page
2. Send it to a vision LLM
3. Parse the action
4. Execute and repeat

Each browser call starts at ~$0.004, plus vision API costs. A 50-step task built on a naive one-call-per-step architecture costs $0.20+ in browser time alone (50 × $0.004). The fix: batch deterministic steps into single calls and screenshot only at decision points.
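The arithmetic behind that comparison fits in a few lines. This is a rough model under two loudly stated assumptions: every browser job bills exactly the $0.004 minimum, and each screenshot sent to a vision model costs a flat $0.01 (the low end of the $0.01-0.02 quoted further down this page). The batched case assumes the 50 steps fit into 5 jobs with one decision-point screenshot each.

```typescript
// Rough per-task cost model. Assumptions: every browser job bills the
// $0.004 minimum, and each screenshot sent to a vision model costs a flat
// per-image price (this page quotes $0.01-0.02).
interface Prices {
  perJob: number;
  perVisionImage: number;
}

interface Estimate {
  browser: number;
  vision: number;
  total: number;
}

// jobs: browser calls made; visionImages: screenshots sent to the LLM
function estimate(jobs: number, visionImages: number, p: Prices): Estimate {
  const browser = jobs * p.perJob;
  const vision = visionImages * p.perVisionImage;
  return { browser, vision, total: browser + vision };
}

const prices: Prices = { perJob: 0.004, perVisionImage: 0.01 };

// Naive: 50 steps, one browser job and one vision call per step.
const naive = estimate(50, 50, prices);
// Batched: the same 50 steps in 5 jobs, screenshots only at 5 decision points.
const batched = estimate(5, 5, prices);

console.log(naive.total.toFixed(2), batched.total.toFixed(2));
```

Under these assumptions the naive run comes to about $0.70 ($0.20 of it browser time, matching the figure above) against about $0.07 batched; the vision calls dominate either way, which is why cutting screenshots matters more than cutting jobs.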

What Slows Your Agent Down

The Login Loop

Every screenshot restarts the browser. Every restart loses cookies. Your agent wastes 30+ seconds re-authenticating for each action.

Chrome Memory Instability

Running headless Chrome locally eats RAM. Parallel agents crash. Memory leaks accumulate. Your agent becomes unreliable after a few hundred steps.

Infrastructure Complexity

Docker containers, browser pools, scaling—you want to build agent logic, not manage Chrome infrastructure.

Cost Accumulation

Vision API calls run $0.01-0.02 per image. Your browser costs pile on top. Browser costs should be noise, not a line item.

Assert Before You Screenshot

Check page state without burning vision API credits. Only screenshot when you need human-level understanding.

const response = await fetch("https://api.riddledc.com/v1/run", {
  method: "POST",
  headers: {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    steps: [
      { goto: "https://example.com/checkout" },
      // Check if we're on the right page without screenshotting
      { assert: [{ selectorExists: ".checkout-form" }], onFail: ["screenshot", "abort"] },
      // Fill the form
      { fill: { selector: "#card-number", value: "4242424242424242" } },
      { fill: { selector: "#expiry", value: "12/25" } },
      { click: "button.submit-payment" },
      // Assert success, only screenshot if something went wrong
      { assert: [{ urlIncludes: "/confirmation" }], onFail: ["screenshot", "abort"] },
      // Success! Screenshot the confirmation
      { screenshot: "confirmation" }
    ]
  })
});

// If all asserts pass: 1 screenshot
// If any assert fails: screenshot of failure state + abort
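If your agent builds many steps like these, a small helper keeps the "act, then verify" pairing consistent. The sketch below copies the `click`, `assert` and `onFail` shapes from the example above; `clickAndExpect` is a hypothetical helper, not part of any Riddle SDK, and the selectors are placeholders.

```typescript
// Step shapes copied from the assert example above.
type Assertion = { selectorExists: string } | { urlIncludes: string };
type GuardedAssert = { assert: Assertion[]; onFail: string[] };
type ClickStep = { click: string };

// Hypothetical helper: click, then verify the expected page state. On failure
// the run screenshots the broken state and aborts, so vision is only needed then.
function clickAndExpect(selector: string, expected: Assertion): [ClickStep, GuardedAssert] {
  return [
    { click: selector },
    { assert: [expected], onFail: ["screenshot", "abort"] },
  ];
}

const steps = [
  { goto: "https://example.com/cart" },
  ...clickAndExpect("button.checkout", { selectorExists: ".checkout-form" }),
  { screenshot: "checkout" },
];
```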

Or Just Grab a Screenshot

Don't need multi-step flows? Get a PNG in one call.

curl -X POST "https://api.riddledc.com/v1/run" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "sync": true}' \
  -o screenshot.png

# PNG bytes. 3-4 seconds. No polling.
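With `-o screenshot.png`, curl writes whatever body comes back, so a failed call would presumably leave an error message saved under a `.png` name. Before handing bytes to a vision model it is worth checking the standard eight-byte PNG signature. The check itself is fixed by the PNG format; `fetchPng` is a hypothetical wrapper that assumes the `url` + `sync` body from the curl example.

```typescript
// The eight-byte signature every PNG file starts with.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// True if the bytes look like a PNG rather than, say, a JSON error body.
function isPng(bytes: Uint8Array): boolean {
  return bytes.length >= PNG_SIGNATURE.length &&
    PNG_SIGNATURE.every((b, i) => bytes[i] === b);
}

// Hypothetical wrapper: fetch a screenshot and refuse to pass non-PNG
// responses on to a vision model.
async function fetchPng(url: string, apiKey: string): Promise<Uint8Array> {
  const res = await fetch("https://api.riddledc.com/v1/run", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify({ url, sync: true }),
  });
  const bytes = new Uint8Array(await res.arrayBuffer());
  if (!res.ok || !isPng(bytes)) {
    const preview = new TextDecoder().decode(bytes.slice(0, 200));
    throw new Error(`expected PNG, got HTTP ${res.status}: ${preview}`);
  }
  return bytes;
}
```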

Skip the Login Loop

Inject cookies or headers. Your agent authenticates once, then reuses the session for every screenshot until it expires.

# Pass session cookies - skip login entirely
curl -X POST "https://api.riddledc.com/v1/run" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://app.example.com/dashboard",
    "options": {
      "cookies": [
        {"name": "session_id", "value": "abc123", "domain": "app.example.com"}
      ]
    }
  }' -o dashboard.png

# Or use Bearer tokens for API-protected pages
curl -X POST "https://api.riddledc.com/v1/run" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://app.example.com/dashboard",
    "options": {
      "headers": {"Authorization": "Bearer YOUR_APP_TOKEN"}
    }
  }' -o dashboard.png

Your agent logs in once, extracts the session cookie, then passes it to every Riddle call. No more 30-second login loops burning tokens and time.

See the full auth guide for details.
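The "extract the session cookie" step usually means reading `Set-Cookie` headers from your own login request. The sketch below converts raw header values into the `{ name, value, domain }` shape used in the cookie examples above. `toRiddleCookies` is hypothetical; it keeps only the `Domain` attribute and falls back to a caller-supplied domain, so expiry, path and flags are dropped. In recent Node versions, `res.headers.getSetCookie()` returns the raw values as an array.

```typescript
// Cookie shape used in the Riddle examples on this page.
interface RiddleCookie {
  name: string;
  value: string;
  domain: string;
}

// Hypothetical helper: convert raw Set-Cookie header values into
// { name, value, domain } objects. Attributes other than Domain are ignored.
function toRiddleCookies(setCookie: string[], fallbackDomain: string): RiddleCookie[] {
  return setCookie.flatMap((header): RiddleCookie[] => {
    const [pair, ...attrs] = header.split(";").map((s) => s.trim());
    const eq = pair.indexOf("=");
    if (eq <= 0) return []; // skip malformed entries
    const domainAttr = attrs.find((a) => a.toLowerCase().startsWith("domain="));
    const domain = domainAttr ? domainAttr.slice("domain=".length) : fallbackDomain;
    // Split on the first "=" only, so values containing "=" survive intact.
    return [{ name: pair.slice(0, eq), value: pair.slice(eq + 1), domain }];
  });
}
```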

Batch for Sub-Penny Screenshots

Need multiple screenshots? One call, multiple URLs, one billing minimum.

# 5 screenshots, one API call, ~$0.0008 each
curl -X POST "https://api.riddledc.com/v1/run" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://example.com",
      "https://example.com/pricing",
      "https://example.com/docs",
      "https://example.com/about",
      "https://example.com/contact"
    ]
  }'

# Returns job_id, poll for screenshots
# Total cost: ~$0.004 for all 5
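A batch call returns a `job_id` instead of the screenshots, so the caller has to poll. This page doesn't show the job-status endpoint, so the sketch below leaves the status check to a caller-supplied function and only implements the waiting logic. `pollUntilDone` and its options are hypothetical.

```typescript
interface PollOptions {
  intervalMs: number; // delay between checks
  timeoutMs: number;  // give up after this long
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Generic poller: `check` returns undefined while the job is still running
// and a result once it is done. How `check` queries the job is up to the
// caller; no particular status endpoint is assumed here.
async function pollUntilDone<T>(
  check: () => Promise<T | undefined>,
  { intervalMs, timeoutMs }: PollOptions,
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const result = await check();
    if (result !== undefined) return result;
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`job not done after ${timeoutMs} ms`);
    }
    await sleep(intervalMs);
  }
}
```

In practice `check` would fetch the job by its `job_id` from whatever status endpoint the Riddle docs describe and return the screenshots once they are ready.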

Python Integration

No SDK needed. Standard HTTP requests.

import base64
import os

import openai
import requests

API_KEY = os.environ["RIDDLE_API_KEY"]

def screenshot_for_vision(url, api_key, cookies=None):
    """Take screenshot, return base64 for vision LLM."""
    body = {"url": url}
    if cookies:
        body["options"] = {"cookies": cookies}

    response = requests.post(
        "https://api.riddledc.com/v1/run",
        headers={"Authorization": f"Bearer {api_key}"},
        json=body,  # requests sets Content-Type: application/json
        timeout=60,
    )
    response.raise_for_status()  # don't base64-encode an error body
    return base64.b64encode(response.content).decode()

# Use with a vision-capable OpenAI model (reads OPENAI_API_KEY from the environment)
screenshot_b64 = screenshot_for_vision(
    "https://example.com",
    API_KEY,
    cookies=[{"name": "session", "value": "abc123", "domain": "example.com"}]
)

response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What actions are available on this page?"},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{screenshot_b64}"}}
        ]
    }]
)
print(response.choices[0].message.content)

TypeScript / Node.js Integration

For LangChainJS, Next.js, or any Node-based agent.

import OpenAI from "openai";

async function screenshotForVision(
  url: string,
  apiKey: string,
  cookies?: { name: string; value: string; domain: string }[]
): Promise<string> {
  const body: Record<string, unknown> = { url };
  if (cookies) body.options = { cookies };

  const response = await fetch("https://api.riddledc.com/v1/run", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
  // Fail loudly instead of base64-encoding an error body
  if (!response.ok) {
    throw new Error(`Riddle returned HTTP ${response.status}: ${await response.text()}`);
  }

  const buffer = await response.arrayBuffer();
  return Buffer.from(buffer).toString("base64");
}

// Use with the OpenAI SDK (reads OPENAI_API_KEY from the environment)
const openai = new OpenAI();

const screenshotB64 = await screenshotForVision(
  "https://example.com",
  process.env.RIDDLE_API_KEY!,
  [{ name: "session", value: "abc123", domain: "example.com" }]
);

const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{
    role: "user",
    content: [
      { type: "text", text: "What actions are available on this page?" },
      { type: "image_url", image_url: { url: `data:image/png;base64,${screenshotB64}` } }
    ]
  }]
});
console.log(response.choices[0].message.content);

Works With Your Stack

Browser-Use

Replace local Chrome with Riddle API calls. Same observe-think-act loop, no infrastructure.
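The observe-think-act loop stays the same when the browser is remote; what changes is that each "act" can be a whole batch of steps. The sketch below is a generic version of that loop, not Browser-Use's internals. `runAgent`, `AgentIO` and `Decision` are hypothetical names; in practice `observe` would be a screenshot call, `think` a vision LLM call, and `act` one steps-mode job.

```typescript
// Decision returned by the model at each decision point.
type Decision<S> = { done: true } | { done: false; steps: S[] };

interface AgentIO<S, O> {
  observe: () => Promise<O>;                       // e.g. one screenshot job
  think: (observation: O) => Promise<Decision<S>>; // e.g. a vision LLM call
  act: (steps: S[]) => Promise<void>;              // e.g. one batched steps job
}

// Observe-think-act loop where each "act" sends a whole batch of steps,
// so the model is consulted only at decision points, not after every click.
// Returns the number of batches executed before the model said "done".
async function runAgent<S, O>(io: AgentIO<S, O>, maxTurns = 10): Promise<number> {
  for (let turn = 0; turn < maxTurns; turn++) {
    const decision = await io.think(await io.observe());
    if (decision.done) return turn;
    await io.act(decision.steps);
  }
  throw new Error(`agent did not finish within ${maxTurns} turns`);
}
```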

LangChain

Custom tool wrapper for PlayWrightBrowserToolkit. Simpler than managing browser pools.

CrewAI

Screenshot tool for your crew's web research and verification tasks.

Custom Agents

Simple REST API. Works with any language, any framework. No SDK required.
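For a custom agent that uses function calling, the API can be exposed as a tool. The sketch below uses the OpenAI `tools` format (`type: "function"` with a JSON Schema `parameters` object). The tool name and description are illustrative, the parameters mirror the `url` and `options.cookies` fields from the examples on this page, and `toRiddleBody` is a hypothetical helper that maps the model's arguments to a request body.

```typescript
// An OpenAI-style function tool exposing a Riddle screenshot to a model.
const riddleScreenshotTool = {
  type: "function",
  function: {
    name: "riddle_screenshot",
    description: "Take a screenshot of a web page and return it as a PNG.",
    parameters: {
      type: "object",
      properties: {
        url: { type: "string", description: "Page to capture" },
        cookies: {
          type: "array",
          description: "Session cookies to inject before loading the page",
          items: {
            type: "object",
            properties: {
              name: { type: "string" },
              value: { type: "string" },
              domain: { type: "string" },
            },
            required: ["name", "value", "domain"],
          },
        },
      },
      required: ["url"],
    },
  },
} as const;

// Hypothetical helper: map the model's tool-call arguments (a JSON string)
// to the request body used in the examples on this page.
function toRiddleBody(argsJson: string): { url: string; options?: { cookies: unknown[] } } {
  const args = JSON.parse(argsJson) as { url: string; cookies?: unknown[] };
  return args.cookies?.length
    ? { url: args.url, options: { cookies: args.cookies } }
    : { url: args.url };
}
```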

Need More Control?

For multi-step workflows, use script mode. Navigate, click, fill forms, then screenshot.

# Login flow + screenshot in one call.
# jq builds the JSON body so the script's quotes and newlines are escaped.
SCRIPT=$(cat <<'EOF'
await page.goto("https://app.example.com/login");
await page.fill("input[name=email]", "user@example.com");
await page.fill("input[name=password]", "password123");
await page.click("button[type=submit]");
await page.waitForURL("**/dashboard");
await saveScreenshot("dashboard");
EOF
)

jq -n --arg script "$SCRIPT" '{script: $script}' |
  curl -X POST "https://api.riddledc.com/v1/run" \
    -H "Authorization: Bearer YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    --data-binary @-

Script mode runs full Playwright scripts server-side. Great for complex workflows, but most agents should use steps mode or url mode with cookie injection.
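From Node, the same request needs no jq: keep the script as ordinary multi-line source and let `JSON.stringify` escape its quotes and newlines. This is a sketch; `runScript` is a hypothetical helper, `saveScreenshot` is the helper used in the curl example above, and the JSON response handling assumes script mode answers in JSON like the steps example does.

```typescript
// Keep the Playwright script readable; JSON.stringify handles the escaping.
const script = `
await page.goto("https://app.example.com/login");
await page.fill("input[name=email]", "user@example.com");
await page.fill("input[name=password]", "password123");
await page.click("button[type=submit]");
await page.waitForURL("**/dashboard");
await saveScreenshot("dashboard");
`.trim();

const body = JSON.stringify({ script });

// Hypothetical helper: send the script-mode request and return the parsed response.
async function runScript(apiKey: string): Promise<unknown> {
  const res = await fetch("https://api.riddledc.com/v1/run", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body,
  });
  if (!res.ok) throw new Error(`Riddle returned HTTP ${res.status}`);
  return res.json();
}
```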

vs. Self-Hosted Chrome

| | Self-Hosted Puppeteer | Riddle API |
|---|---|---|
| Memory per agent | 500MB-2GB | 0 (API call) |
| Parallel agents | Limited by RAM | Unlimited |
| Setup time | Hours (Docker, deps) | Minutes (API key) |
| Session persistence | Complex pooling | Cookie injection |
| Cost per job | ~$0.001 + infra + your time | From $0.004 (<$0.001 per screenshot when batched) |

Pricing

From $0.004 per job (30s minimum, sync by default)

Under $0.001 per screenshot (multiple screenshots per job)

Browser costs should be ~5% of your LLM spend, not a major line item. No subscriptions. Pay for what you use.

Agent Guide

Building an agent? Get the complete technical reference—copy-paste ready for your agent's context.

Give Your Agent a Browser

Create an account and start making requests in minutes.