
Agent Guide

Technical reference for AI agents using the RiddleDC API. Copy the markdown to include in your agent's context.

The Vision Loop

Your workflow for web tasks:

1. Screenshot the page
2. Send to vision LLM
3. Decide what to do
4. Execute action
5. Screenshot to verify
6. Repeat
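
The loop above can be sketched as a small driver function. This is a minimal sketch, not RiddleDC's implementation: `capture`, `decide`, and `execute` are hypothetical callbacks you supply (for example, `capture` could submit a job to /v1/run and `decide` could call your vision LLM), and `maxIterations` is an assumed safety limit.

```javascript
// Sketch of the screenshot -> decide -> act -> verify loop.
// All three callbacks are hypothetical and supplied by the caller:
//   capture()          -> resolves to a screenshot (bytes, URL, anything)
//   decide(shot, goal) -> resolves to { done: true } or { done: false, action }
//   execute(action)    -> performs the action (e.g. submits a steps-mode job)
async function runVisionLoop({ capture, decide, execute, goal, maxIterations = 10 }) {
  const history = [];
  for (let i = 0; i < maxIterations; i++) {
    const shot = await capture();              // steps 1 and 5: screenshot
    const decision = await decide(shot, goal); // steps 2 and 3: vision LLM decides
    if (decision.done) {
      return { completed: true, iterations: i + 1, history };
    }
    await execute(decision.action);            // step 4: execute action
    history.push(decision.action);
  }
  // Give up rather than loop forever if the goal is never reached.
  return { completed: false, iterations: maxIterations, history };
}
```

The verification screenshot of one iteration doubles as the input screenshot of the next, so each pass through the loop needs only one capture.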

Four Input Modes

The unified /v1/run endpoint supports four input modes:

URL Mode - Simple screenshot

POST /v1/run
{"url": "https://example.com"}

Returns a job_id; poll for completion. Good for single-page captures.

URLs Mode - Batch screenshots

POST /v1/run
{"urls": ["https://example1.com", "https://example2.com"]}

Up to 10 URLs per batch. Cost-effective: the 30-second minimum is charged once for the whole batch.
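
If you have more than 10 URLs, you need to split them across several requests. A minimal sketch of that split follows; `toBatchRequests` is a hypothetical helper name, and only the 10-URL limit and the `{"urls": [...]}` body shape come from this guide.

```javascript
const MAX_URLS_PER_BATCH = 10; // limit stated in this guide

// Split a list of URLs into request bodies for URLs mode,
// each holding at most `batchSize` URLs.
function toBatchRequests(urls, batchSize = MAX_URLS_PER_BATCH) {
  if (!Number.isInteger(batchSize) || batchSize < 1 || batchSize > MAX_URLS_PER_BATCH) {
    throw new RangeError(`batchSize must be an integer from 1 to ${MAX_URLS_PER_BATCH}`);
  }
  const requests = [];
  for (let i = 0; i < urls.length; i += batchSize) {
    requests.push({ urls: urls.slice(i, i + batchSize) });
  }
  return requests;
}
```

Each returned object can be serialized with JSON.stringify and sent as the body of one POST /v1/run call.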

Steps Mode - JSON workflow (recommended for agents)

POST /v1/run
{"steps": [{"goto": "https://example.com"}, {"click": ".button"}, {"screenshot": "after-click"}]}

Structured JSON, no code strings. Easy for LLMs to generate.
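
Because steps are plain data, your agent can assemble them from values it already has instead of writing code strings. A small sketch, assuming only the three step types shown above (`goto`, `click`, `screenshot`); `clickAndCaptureSteps` is a hypothetical helper name.

```javascript
// Build a steps-mode body: open a page, click one element, capture the result.
// Only the step types shown in this guide are used.
function clickAndCaptureSteps(url, selector, label) {
  return {
    steps: [
      { goto: url },
      { click: selector },
      { screenshot: label },
    ],
  };
}

// Serialize for the POST /v1/run request body.
const stepsBody = JSON.stringify(
  clickAndCaptureSteps("https://example.com", ".button", "after-click")
);
```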

Script Mode - Full Playwright

POST /v1/run
{"script": "await page.goto('https://example.com');\nawait page.click('.button');\nawait saveScreenshot('after-click');"}

Full Playwright access. Good for complex logic with loops and conditionals.

Use steps mode for most agent tasks. Use script mode when you need loops or conditional logic.
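
One way to keep the four modes straight in client code is a single body builder. This sketch assumes a request carries exactly one of `url`, `urls`, `steps`, or `script`, which this guide implies but does not state; `buildRunBody` is a hypothetical helper. It also shows that JSON.stringify handles the newline escaping that script mode needs.

```javascript
const INPUT_MODES = ["url", "urls", "steps", "script"];

// Build the JSON text for a /v1/run request from exactly one input mode.
// Assumption: mixing modes in one request is not supported.
function buildRunBody(input) {
  const present = INPUT_MODES.filter((mode) => input[mode] !== undefined);
  if (present.length !== 1) {
    throw new Error(`expected exactly one of ${INPUT_MODES.join(", ")}; got ${present.length}`);
  }
  const mode = present[0];
  // JSON.stringify escapes newlines and quotes, so a multi-line
  // script becomes a single valid JSON string.
  return JSON.stringify({ [mode]: input[mode] });
}

const scriptBody = buildRunBody({
  script: "await page.goto('https://example.com');\nawait saveScreenshot('home');",
});
```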

Viewport Strategy

| Size | Use When |
|------|----------|
| 1920x1080 | Default. Maximum information density. Dashboards, data tables. |
| 800x600 | Testing responsive layouts |
| 375x667 | Mobile experience. Triggers mobile CSS breakpoints. |

{"url": "...", "options": {"viewport": {"width": 375, "height": 667}}}

Critical: fullPage defaults to true. Your screenshot height will expand to capture ALL content. Set fullPage: false if you want viewport-only.
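
A sketch of how you might wrap the viewport sizes and the fullPage setting in one helper. The preset names (`desktop`, `small`, `mobile`) and the `viewportOnly` flag are hypothetical; the sizes and the `options.viewport` / `options.fullPage` fields come from this guide.

```javascript
// Viewport sizes from the guide, under hypothetical preset names.
const VIEWPORTS = {
  desktop: { width: 1920, height: 1080 },
  small: { width: 800, height: 600 },
  mobile: { width: 375, height: 667 },
};

// Build a URL-mode request for a named viewport.
// fullPage is only sent when you opt out of the default (true).
function screenshotRequest(url, preset, { viewportOnly = false } = {}) {
  const viewport = VIEWPORTS[preset];
  if (!viewport) {
    throw new Error(`unknown viewport preset: ${preset}`);
  }
  const options = { viewport: { ...viewport } };
  if (viewportOnly) {
    options.fullPage = false;
  }
  return { url, options };
}
```

For example, `screenshotRequest("https://example.com", "mobile", { viewportOnly: true })` yields a request whose options hold a 375x667 viewport and `fullPage: false`.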

Zooming In on Details

If you can't read small text or need to focus on a specific element, use element screenshots or clip regions:

Steps Mode (recommended)

// Element screenshot - capture just a specific element
{"screenshot": {"label": "form-detail", "selector": "form.checkout"}}

// Clipped region - capture a specific area
{"screenshot": {"label": "hero", "clip": {"x": 0, "y": 0, "width": 1200, "height": 600}}}

// Viewport only (not full scrollable page)
{"screenshot": {"label": "above-fold", "fullPage": false}}

Script Mode (for complex logic)

await page.goto("https://example.com");
const element = await page.$("form");

// Option 1: Direct element screenshot (simplest)
await element.screenshot({path: "form.png"});

// Option 2: Clip with padding for context
const box = await element.boundingBox();
await page.screenshot({
  path: "form-with-context.png",
  clip: {
    x: box.x - 20,
    y: box.y - 20,
    width: box.width + 40,
    height: box.height + 40
  }
});

Both modes give you focused views. Steps mode is recommended since your agent outputs JSON, not code strings.
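
When you clip with padding, subtracting from `box.x` or `box.y` can produce negative coordinates for elements near the top-left corner, and Playwright's `boundingBox()` returns null when the element is not visible. A minimal sketch of a helper that handles both cases; `paddedClip` is a hypothetical name, and it does not clamp against the right or bottom edge of the page.

```javascript
// Compute a clip rectangle around a bounding box with padding,
// clamping the origin at (0, 0).
function paddedClip(box, padding = 20) {
  if (!box) {
    throw new Error("element has no bounding box (is it visible?)");
  }
  const x = Math.max(0, box.x - padding);
  const y = Math.max(0, box.y - padding);
  // Extend to the padded right/bottom edge, measured from the clamped origin.
  return {
    x,
    y,
    width: box.x + box.width + padding - x,
    height: box.y + box.height + padding - y,
  };
}
```

In script mode this would be used as `await page.screenshot({path: "form-with-context.png", clip: paddedClip(await element.boundingBox())})`.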

Debugging with console.json

All jobs capture browser console output by default. Every console.log(), JavaScript error, and network failure is saved.

Even simple screenshot jobs have logs. The job ID comes back in the X-Job-Id response header:

# Sync response includes X-Job-Id header
curl -D - "https://api.riddledc.com/v1/run" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}' -o screenshot.png

# X-Job-Id: job_abc123

# Check logs anytime:
curl "https://api.riddledc.com/v1/jobs/job_abc123/artifacts"

For script mode, add your own logging to trace execution:

console.log("Step 1: Navigating...");
await page.goto(url);

console.log("Step 2: Looking for button...");
const button = await page.$(".submit-btn");
console.log("Button found:", button ? "yes" : "no");

if (button) {
  console.log("Step 3: Clicking...");
  await page.click(".submit-btn");
}

await saveScreenshot("final-state");
console.log("Done!");

Debugging workflow: Screenshot looks wrong? Check console.json for JS errors. Page didn't load? Look for network failures. Selector not found? See what actually rendered.

Authentication for Protected Pages

Cookies (session-based auth)

{
  "url": "https://app.example.com/dashboard",
  "options": {
    "cookies": [
      {"name": "session", "value": "abc123", "domain": "app.example.com", "path": "/"}
    ]
  }
}

Headers (Bearer tokens, API keys)

{
  "url": "https://api.example.com/data",
  "options": {
    "headers": {
      "Authorization": "Bearer user-token-here"
    }
  }
}

localStorage (SPAs with token storage)

{
  "url": "https://spa.example.com",
  "options": {
    "localStorage": {
      "authToken": "eyJhbG..."
    }
  }
}

Tip: Cookie injection skips login flows entirely. This saves 10-30 seconds per job and reduces cost.
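
A sketch of building the cookie entry from the target URL, so the `domain` always matches the page you are capturing. `sessionCookieFor` and `withSessionCookie` are hypothetical helpers; the cookie fields (`name`, `value`, `domain`, `path`) are the ones shown in the cookies example above.

```javascript
// Derive a cookie entry for the host of the page being captured.
function sessionCookieFor(pageUrl, name, value) {
  const { hostname } = new URL(pageUrl); // throws on a malformed URL
  return { name, value, domain: hostname, path: "/" };
}

// Build a URL-mode request that injects one session cookie.
function withSessionCookie(pageUrl, name, value) {
  return {
    url: pageUrl,
    options: { cookies: [sessionCookieFor(pageUrl, name, value)] },
  };
}
```

Calling `withSessionCookie("https://app.example.com/dashboard", "session", "abc123")` reproduces the cookies example from this section.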

Common Gotchas

1. fullPage is true by default

// You request 375x667, you get 375x1800+
// Set fullPage: false if you want exact viewport
{"options": {"fullPage": false, "viewport": {"width": 375, "height": 667}}}

2. Both saveScreenshot() and page.screenshot() work

// Option 1: RiddleDC helper (recommended for clarity)
await saveScreenshot("my");

// Option 2: Standard Playwright - also works! Auto-saved to artifacts.
await page.screenshot({path: "my.png"});

3. Await your console.log values

// Wrong - logs "[object Promise]"
console.log("Title:", page.title());

// Right - logs actual title
console.log("Title:", await page.title());

4. Artifacts expire in 24 hours

Download what you need promptly. CDN URLs stop working after 24 hours.
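
If your agent stores artifact URLs for later use, it helps to track how long each one remains valid. A minimal sketch, assuming the 24-hour window starts when the job was created (this guide does not say exactly when the clock starts); `artifactTimeLeftMs` is a hypothetical helper.

```javascript
const ARTIFACT_TTL_MS = 24 * 60 * 60 * 1000; // 24 hours, per this guide

// Milliseconds until an artifact expires, or 0 if it already has.
// Assumption: the TTL is measured from `createdAt` (a Date).
function artifactTimeLeftMs(createdAt, now = new Date()) {
  return Math.max(0, createdAt.getTime() + ARTIFACT_TTL_MS - now.getTime());
}
```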

Cost Optimization

| Approach | Cost |
|----------|------|
| Single job | from $0.004 (30s minimum) |
| Job with 5 screenshots | ~$0.0008 per screenshot |
| Job with 10 screenshots | ~$0.0004 per screenshot |

The 30-second minimum means one job costs the same whether you take 1 screenshot or 10. Pack more into each job when possible.
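
The per-screenshot figures are simply the minimum job cost divided by the number of screenshots. A sketch of that arithmetic, assuming the job finishes within the 30-second minimum so it is billed at the $0.004 starting price; `costPerScreenshot` is a hypothetical helper.

```javascript
const MIN_JOB_COST_USD = 0.004; // "from $0.004" starting price for one job

// Cost per screenshot when `count` screenshots share one job.
// Assumption: the job stays within the 30-second minimum.
function costPerScreenshot(count, jobCost = MIN_JOB_COST_USD) {
  if (!Number.isInteger(count) || count < 1) {
    throw new RangeError("count must be a positive integer");
  }
  return jobCost / count;
}

// Formatted to four decimals to match the figures quoted in this guide.
const perShotAtFive = costPerScreenshot(5).toFixed(4);
const perShotAtTen = costPerScreenshot(10).toFixed(4);
```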

PDF Generation

Need a PDF instead of a screenshot? Use page.pdf():

await page.goto("https://example.com/report");
await page.pdf({
  path: "report.pdf",
  format: "A4",
  printBackground: true
});

PDFs are automatically saved to artifacts alongside screenshots.

Error Handling

When jobs fail, check:

1. console.json - Shows logs up to failure point + exception stack trace
2. error-screenshot-1.png - Visual state when error occurred
3. Job status - Contains error code and message

{
  "status": "failed",
  "error": {
    "code": "SCRIPT_ERROR",
    "message": "Timeout 30000ms exceeded waiting for selector '.nonexistent'"
  }
}

Error codes: TIMEOUT, SCRIPT_ERROR, NAVIGATION_FAILED, UNAUTHORIZED, INVALID_URL, RATE_LIMIT_EXCEEDED
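
An agent usually needs to turn an error code into a next action. The sketch below is one possible policy, not RiddleDC's recommendation: which codes are worth retrying is an assumption, and the action names (`retry`, `fix-auth`, `fix-request`, `inspect`) are hypothetical. Only the status shape and the error codes come from this guide.

```javascript
// Assumed policy: these failures may be transient, so retrying can help.
const RETRYABLE_CODES = new Set(["TIMEOUT", "NAVIGATION_FAILED", "RATE_LIMIT_EXCEEDED"]);
// Assumed policy: these point to a problem in the request itself.
const REQUEST_ERROR_CODES = new Set(["SCRIPT_ERROR", "INVALID_URL"]);

// Map a job status object to a suggested next action.
function nextActionForJob(job) {
  if (job.status !== "failed") return "none";
  const code = job.error ? job.error.code : undefined;
  if (RETRYABLE_CODES.has(code)) return "retry";
  if (code === "UNAUTHORIZED") return "fix-auth";
  if (REQUEST_ERROR_CODES.has(code)) return "fix-request";
  return "inspect"; // unknown code: fall back to console.json and the error screenshot
}
```

For the failed job shown above (SCRIPT_ERROR with a selector timeout), this returns `fix-request`, which fits the advice to check what actually rendered before changing the selector.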

Summary

  1. Use /v1/run for everything - url, urls, steps, or script mode
  2. Steps mode is ideal for agents - structured JSON, no code strings
  3. Use element screenshots to zoom in: {"screenshot": {"label": "x", "selector": ".element"}}
  4. fullPage=true is default - set false if you need viewport control
  5. Always add console.log() in script mode - it's your debugging lifeline
  6. Cookie injection skips login flows and saves money
  7. Batch multiple URLs with urls mode to reduce cost below $0.001/screenshot
  8. Check console.json first when things fail
  9. Sync mode is default - PNG bytes returned directly (28s max)

You now have eyes. Use them wisely.