# The Economics of "Chatty" Agents (And How to Fix Them)

Browser-based agents are incredible--but they're also really good at burning money and time if you structure them the wrong way.

Part I of II · [Part II: Your Agent Doesn't Need to See Every Step](/blog/batching-heuristics)

**Update:** All examples use `/v1/run`, which is sync by default--send a request, get results back directly. For AI agents, we recommend [steps mode](/docs#run-api) (JSON) instead of script mode (Playwright code).

Most people obsess over LLM token cost. That matters. But the other half of the system--the browser infrastructure, orchestration, screenshots, network hops--is usually where the bill quietly gets ugly.

This post is about fixing that.

Here we focus on the economics and architecture of batching work into a single browser session.
In [Part II](/blog/batching-heuristics) we'll talk about when it's safe to batch and how to handle failures mid-script.

## The "Vision Loop" That Burns Money and Time

The default pattern for browser agents right now looks something like:

- Take a screenshot
- Send it to the LLM
- Decide the next move
- Execute one click
- Repeat

Every tiny action gets its own screenshot and its own round-trip to your browser infrastructure.

It's safe and simple. It's also:

- **Slow** - you're paying the latency cost of network + orchestration on every click.
- **Expensive** - if you treat each step as a separate "interaction," you pay full overhead for each one.

If your infra charges per job and an agent treats every step as a separate job, costs multiply fast. 50 separate jobs at ~$0.004 each = $0.20 in browser costs for one workflow--before you even count LLM tokens.

That's the "chatty" part: the agent is constantly talking to your browser service when it could be quietly doing work.

## Rethinking the Unit of Work: Sessions, Not Clicks

The fix starts with changing the unit of work from:

"One action = one request"

to:

"One plan (many actions) = one request"

Instead of a loop of "screenshot -> think -> click -> screenshot -> …", you want:

- Look at the page once
- Plan a sequence of deterministic actions
- Execute that plan in one go
- Get back the evidence (screenshots, network events, DOM state) at the end

That's the architecture we built into the `/v1/run` endpoint at Riddle.

## One Plan, Many Actions

`/v1/run` lets you send a plan describing what the browser should do, instead of micromanaging every click from the outside.

Instead of this:

- Request 1: "Go to homepage and screenshot"
- Request 2: "Type in search query and screenshot"
- Request 3: "Click Search and screenshot"
- Request 4: "Wait for results and screenshot"

…you send one request with all the steps:

### Steps Mode (JSON, ideal for LLM-generated workflows)

```
{
  "steps": [
    {"goto": "https://example.com"},
    {"screenshot": "homepage"},
    {"fill": {"selector": "input[name='q']", "value": "riddledc"}},
    {"click": "button[type='submit']"},
    {"waitFor": "#results"},
    {"screenshot": "search-results"}
  ]
}
```
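If your agent assembles plans in TypeScript, typing the payload catches malformed steps before a request is ever sent. The following is a minimal sketch that assumes only the five step kinds shown in the JSON above; the real API may accept more. The `Step` type, `RunRequest` interface, and `buildSearchPlan` helper are illustrative names, not part of the Riddle API.

```typescript
// Step shapes mirror the JSON example above; this is an assumed subset.
type Step =
  | { goto: string }
  | { screenshot: string }
  | { fill: { selector: string; value: string } }
  | { click: string }
  | { waitFor: string };

interface RunRequest {
  steps: Step[];
}

// Hypothetical helper: builds the whole search workflow as one typed plan.
function buildSearchPlan(url: string, query: string): RunRequest {
  return {
    steps: [
      { goto: url },
      { screenshot: "homepage" },
      { fill: { selector: "input[name='q']", value: query } },
      { click: "button[type='submit']" },
      { waitFor: "#results" },
      { screenshot: "search-results" },
    ],
  };
}

// Serialized body for a single POST to /v1/run.
const requestBody = JSON.stringify(
  buildSearchPlan("https://example.com", "riddledc")
);
```

An LLM can emit the `steps` array directly; the union type then acts as a cheap validation layer on the agent side.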

### Script Mode (Playwright, for complex logic)

```
// All of this runs in a single API call
await page.goto("https://example.com");
await saveScreenshot("homepage");

await page.fill("input[name='q']", "riddledc");
await page.click("button[type='submit']");

await page.waitForSelector("#results", { timeout: 10000 });
await saveScreenshot("search-results");
```

Here's a complete TypeScript example showing how to call the API:

```
async function runSearchFlow() {
  // Playwright script that runs inside the remote browser session
  const script = `
    await page.goto("https://example.com");
    await saveScreenshot("homepage");

    await page.fill("input[name='q']", "riddledc");
    await page.click("button[type='submit']");

    await page.waitForSelector("#results", { timeout: 10000 });
    await saveScreenshot("search-results");
  `;

  // Submit the script job
  const submitRes = await fetch("https://api.riddledc.com/v1/run", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": `Bearer ${process.env.RIDDLEDC_API_KEY}`,
    },
    body: JSON.stringify({
      script,
      timeout_sec: 60,
    }),
  });

  // Fail fast on HTTP errors (bad key, malformed request, etc.)
  if (!submitRes.ok) {
    throw new Error(`Submit failed: HTTP ${submitRes.status}`);
  }

  const { job_id, status_url, billing } = await submitRes.json();
  console.log(`Job submitted: ${job_id}`);
  console.log(`Estimated cost: $${billing.estimated_cost_dollars}`);

  // Poll for completion
  let artifacts;
  while (true) {
    const statusRes = await fetch(`https://api.riddledc.com${status_url}`, {
      headers: { "Authorization": `Bearer ${process.env.RIDDLEDC_API_KEY}` },
    });
    const status = await statusRes.json();

    if (status.status === "completed") {
      // Fetch artifacts
      const artifactsRes = await fetch(
        `https://api.riddledc.com/v1/jobs/${job_id}/artifacts`,
        { headers: { "Authorization": `Bearer ${process.env.RIDDLEDC_API_KEY}` } }
      );
      artifacts = await artifactsRes.json();
      break;
    }

    if (status.status === "failed") {
      throw new Error(`Job failed: ${status.error?.message}`);
    }

    // Wait one second between polls
    await new Promise(r => setTimeout(r, 1000));
  }

  console.log("Artifacts:", artifacts);
  // artifacts.artifacts = [
  //   { name: "homepage.png", url: "https://cdn.riddledc.com/..." },
  //   { name: "search-results.png", url: "https://cdn.riddledc.com/..." },
  //   { name: "console.json", url: "https://cdn.riddledc.com/..." },
  //   { name: "network.har", url: "https://cdn.riddledc.com/..." }
  // ]
}
```

From the LLM's perspective, it still gets everything it needs:

- A "before" screenshot (homepage)
- An "after" screenshot (results page)
- Console logs for debugging
- Network HAR for request/response data

It just doesn't have to babysit every keypress.
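One way to hand that evidence to the model is to separate what it should look at (screenshots) from what is mostly useful for debugging (console logs, HAR). The sketch below assumes the response shape shown in the commented example above, a list of `{ name, url }` entries; the `groupArtifacts` helper and the `.png` naming rule are illustrative assumptions, not documented behavior.

```typescript
// Shape assumed from the commented example response above.
interface Artifact {
  name: string;
  url: string;
}

interface ArtifactsResponse {
  artifacts: Artifact[];
}

// Hypothetical helper: screenshots go to the LLM, everything else is
// kept aside for debugging.
function groupArtifacts(res: ArtifactsResponse): {
  screenshots: Artifact[];
  diagnostics: Artifact[];
} {
  const screenshots = res.artifacts.filter((a) => a.name.endsWith(".png"));
  const diagnostics = res.artifacts.filter((a) => !a.name.endsWith(".png"));
  return { screenshots, diagnostics };
}

// Sample data copied from the example response (URLs elided as in the original).
const sampleResponse: ArtifactsResponse = {
  artifacts: [
    { name: "homepage.png", url: "https://cdn.riddledc.com/..." },
    { name: "search-results.png", url: "https://cdn.riddledc.com/..." },
    { name: "console.json", url: "https://cdn.riddledc.com/..." },
    { name: "network.har", url: "https://cdn.riddledc.com/..." },
  ],
};

const grouped = groupArtifacts(sampleResponse);
```

With the sample data, `grouped.screenshots` holds the two PNGs and `grouped.diagnostics` holds the console log and the HAR file.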

## Our Pricing: Why Batching Wins

All of this matters because of how the economics work.

At Riddle, browser time costs **$0.50 per hour**, billed per second with a **30-second minimum** per job.

That means:

- Minimum job cost: ~$0.0042 (30s × $0.50/hr)
- ≈ $0.0083 per minute after that
- ≈ $0.25 for a 30-minute session
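The arithmetic behind those figures is easy to reproduce. Here is a minimal sketch of the billing rule as described above ($0.50 per hour, per-second billing, 30-second minimum). The function and constant names are illustrative, and rounding partial seconds up is an assumption rather than documented behavior.

```typescript
const HOURLY_RATE_DOLLARS = 0.5;   // $0.50 per browser-hour
const MINIMUM_BILLED_SECONDS = 30; // 30-second minimum per job

// Estimated browser cost for one job.
// Assumption: partial seconds are rounded up before billing.
function estimateJobCost(durationSeconds: number): number {
  const billedSeconds = Math.max(
    MINIMUM_BILLED_SECONDS,
    Math.ceil(durationSeconds)
  );
  return (billedSeconds * HOURLY_RATE_DOLLARS) / 3600;
}

const oneShortJob = estimateJobCost(5);         // billed as 30 seconds
const fiftyShortJobs = 50 * estimateJobCost(5); // a "chatty" 50-step workflow
const oneLongJob = estimateJobCost(30 * 60);    // a single 30-minute session
```

Under this model, fifty 5-second jobs come to about $0.21, nearly as much as one 30-minute session at $0.25.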

Two key implications:

### 1. Pack Work Into the 30-Second Minimum

If your agent makes one API call to take one screenshot and then stops, that job costs ~$0.004.

If that same agent makes one API call to run a workflow that performs 10 actions and 5 screenshots inside 30 seconds, it's still ~$0.004.

You've packed a small workflow into one job instead of paying for five separate single-screenshot jobs (roughly $0.02).

### 2. The Real Freedom Is After 30 Seconds

The more important nuance: you don't need to race the clock.

Because additional time is billed so cheaply, you can let the browser stay warm and grind through a complex workflow without feeling like every extra second is killing your margins.

**Example:**

A 30-minute session that navigates a deep menu structure, waits on slow pages, and extracts data across multiple tabs…

Total browser cost ≈ $0.25

That's basically a dedicated browser worker for pennies.

So the design goal shifts from "How do I keep this under 30 seconds?" to:

"How much useful work can I cram into one session, knowing that extra minutes are cheap?"

## Faster Agents, Not Just Cheaper Ones

Batching doesn't just save money--it saves time in wall-clock terms.

Every separate request in a traditional "vision loop" pays for:

- Network latency from your agent to your browser service
- Queueing / scheduling overhead on the infra side
- Browser-side orchestration and warm-up

By moving to **one plan, many actions**:

- You pay those costs once per workflow, not once per click.
- The browser stays warm and "in context" as it moves through your script.
- Your LLM waits for a single, rich result instead of 20 incremental updates.

That means:

- **Lower latency** per completed task
- **Higher throughput** for the same infra footprint
- **Less complexity** in your agent implementation (fewer moving parts, fewer retries)
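A back-of-the-envelope model makes the latency difference concrete. The sketch below treats network, queueing, and warm-up as a fixed overhead paid once per request. The numbers are purely illustrative, not measurements, and the model ignores LLM thinking time, which would widen the gap further.

```typescript
interface LatencyModel {
  perRequestOverheadMs: number; // network + queueing + warm-up, paid per request
  perActionMs: number;          // time the browser spends on the action itself
}

// Vision loop: every action is its own request, so overhead is paid each time.
function chattyLatencyMs(actions: number, m: LatencyModel): number {
  return actions * (m.perRequestOverheadMs + m.perActionMs);
}

// One plan, many actions: overhead is paid once for the whole workflow.
function batchedLatencyMs(actions: number, m: LatencyModel): number {
  return m.perRequestOverheadMs + actions * m.perActionMs;
}

// Hypothetical figures chosen only to show the shape of the difference.
const model: LatencyModel = { perRequestOverheadMs: 800, perActionMs: 200 };

const chattyTotal = chattyLatencyMs(20, model);   // 20 * (800 + 200) = 20000 ms
const batchedTotal = batchedLatencyMs(20, model); // 800 + 20 * 200 = 4800 ms
```

The per-action work is the same in both cases; only the number of times the fixed overhead is paid changes.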

## Fidelity Without Constant Babysitting

A natural fear is: "If I batch actions, doesn't my agent fly blind?"

It doesn't have to.

Because each job can return:

- Multiple screenshots (before, after, and anywhere you choose)
- Network HAR with all requests/responses
- Console logs with timestamps

…you maintain a high-fidelity view of what happened; you just compress it into fewer, richer checkpoints.

Your agent can still:

- Verify that the right form was filled
- Confirm that the correct results page loaded
- Inspect error states and UI changes

It just does that verification at meaningful boundaries, not after every impatient click.
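As a rough illustration of "meaningful boundaries," the sketch below adds a screenshot after every navigation or wait in a steps-mode plan, unless the plan already captures one there. The boundary rule is a simple assumption made for this example, not a recommendation from the API docs, and `withCheckpoints` is a hypothetical helper. The step shapes are the ones used in the steps-mode example earlier in this post.

```typescript
// Step shapes as used in the steps-mode example (assumed subset).
type PlanStep =
  | { goto: string }
  | { screenshot: string }
  | { fill: { selector: string; value: string } }
  | { click: string }
  | { waitFor: string };

// Hypothetical helper: insert a screenshot after each goto/waitFor
// that is not already followed by one.
function withCheckpoints(steps: PlanStep[]): PlanStep[] {
  const out: PlanStep[] = [];
  let checkpointCount = 0;

  steps.forEach((step, i) => {
    out.push(step);

    const isBoundary = "goto" in step || "waitFor" in step;
    const next = steps[i + 1];
    const alreadyCaptured = next !== undefined && "screenshot" in next;

    if (isBoundary && !alreadyCaptured) {
      checkpointCount += 1;
      out.push({ screenshot: `checkpoint-${checkpointCount}` });
    }
  });

  return out;
}

// Four actions in, two checkpoints added: one after goto, one after waitFor.
const checkpointedPlan = withCheckpoints([
  { goto: "https://example.com" },
  { fill: { selector: "input[name='q']", value: "riddledc" } },
  { click: "button[type='submit']" },
  { waitFor: "#results" },
]);
```

The fill and click steps run without a capture in between, so the agent reviews two checkpoints instead of four.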

In [Part II](/blog/batching-heuristics) we'll go deeper on this: when it's safe to batch steps, what heuristics tell you "I need to look again," and how to handle failures mid-batch without losing debuggability.

## Putting It All Together

If you're building browser agents today, there are three big takeaways:

- **Per-click mental models are expensive.** Stop thinking in "interactions." Think in sessions.
- **Batching is almost always a win.** Packing the first 30 seconds is great--but the real superpower is how cheap time is after that. Let the browser do real work.
- **One plan, many actions is the right default.** Your agent doesn't need to see every step. It needs good evidence at the right moments.

## Try It Yourself

Here's a single curl command to test the pattern:

```
curl -X POST "https://api.riddledc.com/v1/run" \
  -H "Authorization: Bearer $RIDDLEDC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "script": "await page.goto(\"https://example.com\"); await saveScreenshot(\"test\");",
    "timeout_sec": 30
  }'
```

Give your agent one plan instead of 50 tiny requests--and see how much money and time you stop burning.
