The Economics of "Chatty" Agents (And How to Fix Them)
Browser-based agents are incredible—but they're also really good at burning money and time if you structure them the wrong way.
Part I of II • Part II: Your Agent Doesn't Need to See Every Step
Update: All examples use /v1/run, which is sync by default—send a request, get results back directly. For AI agents, we recommend steps mode (JSON) instead of script mode (Playwright code).
Most people obsess over LLM token cost. That matters. But the other half of the system—the browser infrastructure, orchestration, screenshots, network hops—is usually where the bill quietly gets ugly.
This post is about fixing that.
Here we focus on the economics and architecture of batching work into a single browser session. In Part II we'll talk about when it's safe to batch and how to handle failures mid-script.
The "Vision Loop" That Burns Money and Time
The default pattern for browser agents right now looks something like:
- Take a screenshot
- Send it to the LLM
- Decide the next move
- Execute one click
- Repeat
Every tiny action gets its own screenshot and its own round-trip to your browser infrastructure.
It's safe and simple. It's also:
- Slow – you're paying the latency cost of network + orchestration on every click.
- Expensive – if you treat each step as a separate "interaction," you pay full overhead for each one.
If your infra charges per job and an agent treats every step as a separate job, costs multiply fast. 50 separate jobs at ~$0.004 each = $0.20 in browser costs for one workflow—before you even count LLM tokens.
That's the "chatty" part: the agent is constantly talking to your browser service when it could be quietly doing work.
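Concretely, the chatty pattern looks something like this sketch (the `takeScreenshot`, `askLLM`, and `executeAction` helpers here are hypothetical stand-ins, not a real API):

```typescript
// A sketch of the per-step "vision loop". Every iteration pays a full
// round-trip to the browser service: one for the screenshot, one for
// the action. The helpers below are stubs, not a real API.
type Action = { type: "click" | "done"; target?: string };

let roundTrips = 0;

async function takeScreenshot(): Promise<string> {
  roundTrips++; // request #1: fetch a screenshot from the browser service
  return "screenshot-bytes";
}

async function askLLM(_screenshot: string): Promise<Action> {
  // A real agent would send the screenshot to an LLM here; this stub
  // just stops after two clicks so the sketch is runnable.
  return roundTrips >= 5 ? { type: "done" } : { type: "click", target: "#next" };
}

async function executeAction(_action: Action): Promise<void> {
  roundTrips++; // request #2: execute a single click
}

async function visionLoop(): Promise<number> {
  roundTrips = 0;
  while (true) {
    const shot = await takeScreenshot(); // look
    const next = await askLLM(shot);     // think
    if (next.type === "done") break;
    await executeAction(next);           // act, then repeat
  }
  return roundTrips;
}
```

Even this toy task of two clicks costs five round trips; a 50-step workflow at ~$0.004 per job lands at the $0.20 figure above.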
Rethinking the Unit of Work: Sessions, Not Clicks
The fix starts with changing the unit of work from:
"One action = one request"
to:
"One plan (many actions) = one request"
Instead of a loop of "screenshot → think → click → screenshot → …", you want:
- Look at the page once
- Plan a sequence of deterministic actions
- Execute that plan in one go
- Get back the evidence (screenshots, network events, DOM state) at the end
That's the architecture we built into the /v1/run endpoint at Riddle.
One Plan, Many Actions
/v1/run lets you send a plan describing what the browser should do, instead of micromanaging every click from the outside.
Instead of this:
- Request 1: "Go to homepage and screenshot"
- Request 2: "Type in search query and screenshot"
- Request 3: "Click Search and screenshot"
- Request 4: "Wait for results and screenshot"
…you send one request with all the steps:
Steps Mode (JSON, ideal for LLM-generated workflows)
{
  "steps": [
    {"goto": "https://example.com"},
    {"screenshot": "homepage"},
    {"fill": {"selector": "input[name='q']", "value": "riddledc"}},
    {"click": "button[type='submit']"},
    {"waitFor": "#results"},
    {"screenshot": "search-results"}
  ]
}
Script Mode (Playwright, for complex logic)
// All of this runs in a single API call
await page.goto("https://example.com");
await saveScreenshot("homepage");
await page.fill("input[name='q']", "riddledc");
await page.click("button[type='submit']");
await page.waitForSelector("#results", { timeout: 10000 });
await saveScreenshot("search-results");
Here's a complete TypeScript example showing how to call the API, polling the job status until the run completes:
async function runSearchFlow() {
  const script = `
    await page.goto("https://example.com");
    await saveScreenshot("homepage");
    await page.fill("input[name='q']", "riddledc");
    await page.click("button[type='submit']");
    await page.waitForSelector("#results", { timeout: 10000 });
    await saveScreenshot("search-results");
  `;

  // Submit the script job
  const submitRes = await fetch("https://api.riddledc.com/v1/run", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": `Bearer ${process.env.RIDDLEDC_API_KEY}`,
    },
    body: JSON.stringify({
      script,
      timeout_sec: 60,
    }),
  });

  const { job_id, status_url, billing } = await submitRes.json();
  console.log(`Job submitted: ${job_id}`);
  console.log(`Estimated cost: $${billing.estimated_cost_dollars}`);

  // Poll for completion
  let artifacts;
  while (true) {
    const statusRes = await fetch(`https://api.riddledc.com${status_url}`, {
      headers: { "Authorization": `Bearer ${process.env.RIDDLEDC_API_KEY}` },
    });
    const status = await statusRes.json();

    if (status.status === "completed") {
      // Fetch artifacts
      const artifactsRes = await fetch(
        `https://api.riddledc.com/v1/jobs/${job_id}/artifacts`,
        { headers: { "Authorization": `Bearer ${process.env.RIDDLEDC_API_KEY}` } }
      );
      artifacts = await artifactsRes.json();
      break;
    }

    if (status.status === "failed") {
      throw new Error(`Job failed: ${status.error?.message}`);
    }

    await new Promise(r => setTimeout(r, 1000));
  }

  console.log("Artifacts:", artifacts);
  // artifacts.artifacts = [
  //   { name: "homepage.png", url: "https://cdn.riddledc.com/..." },
  //   { name: "search-results.png", url: "https://cdn.riddledc.com/..." },
  //   { name: "console.json", url: "https://cdn.riddledc.com/..." },
  //   { name: "network.har", url: "https://cdn.riddledc.com/..." }
  // ]
}
From the LLM's perspective, it still gets everything it needs:
- A "before" screenshot (homepage)
- An "after" screenshot (results page)
- Console logs for debugging
- Network HAR for request/response data
It just doesn't have to babysit every keypress.
Our Pricing: Why Batching Wins
All of this matters because of how the economics work.
At Riddle, browser time costs $0.50 per hour, billed per second with a 30-second minimum per job.
That means:
- Minimum job cost: ~$0.0042 (30s × $0.50/hr)
- ≈ $0.0083 per minute after that
- ≈ $0.25 for a 30-minute session
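Those numbers fall out of a simple billing formula. Here's a sketch of the pricing described above (per-second billing at $0.50/hour with a 30-second minimum), just to make the arithmetic concrete:

```typescript
// Browser time: $0.50/hour, billed per second, 30-second minimum per job.
const DOLLARS_PER_HOUR = 0.5;
const MIN_BILLABLE_SECONDS = 30;

function jobCost(durationSeconds: number): number {
  // Short jobs still bill the minimum; everything after is per-second.
  const billable = Math.max(durationSeconds, MIN_BILLABLE_SECONDS);
  return billable * (DOLLARS_PER_HOUR / 3600);
}

console.log(jobCost(5).toFixed(4));    // a 5s job still bills the 30s minimum: "0.0042"
console.log(jobCost(60).toFixed(4));   // one minute: "0.0083"
console.log(jobCost(1800).toFixed(2)); // a 30-minute session: "0.25"
```

Note that `jobCost(5)` and `jobCost(25)` are identical: any work you fit under the minimum is effectively free.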
Two key implications:
1. Pack Work Into the 30-Second Minimum
If your agent makes one API call to take one screenshot and then stops, that job costs ~$0.004.
If that same agent makes one API call to run a workflow that performs 10 actions and 5 screenshots inside 30 seconds, it's still ~$0.004.
You've turned one job into a small workflow instead of paying the per-job minimum for each of those steps separately.
2. The Real Freedom Is After 30 Seconds
The more important nuance: you don't need to race the clock.
Because additional time is billed so cheaply, you can let the browser stay warm and grind through a complex workflow without feeling like every extra second is killing your margins.
Example:
A 30-minute session that navigates a deep menu structure, waits on slow pages, and extracts data across multiple tabs…
Total browser cost ≈ $0.25
That's basically a dedicated browser worker for pennies.
So the design goal shifts from "How do I keep this under 30 seconds?" to:
"How much useful work can I cram into one session, knowing that extra minutes are cheap?"
Faster Agents, Not Just Cheaper Ones
Batching doesn't just save money—it saves time in wall-clock terms.
Every separate request in a traditional "vision loop" pays for:
- Network latency from your agent to your browser service
- Queueing / scheduling overhead on the infra side
- Browser-side orchestration and warm-up
By moving to one plan, many actions:
- You pay those costs once per workflow, not once per click.
- The browser stays warm and "in context" as it moves through your script.
- Your LLM waits for a single, rich result instead of 20 incremental updates.
That means:
- Lower latency per completed task
- Higher throughput for the same infra footprint
- Less complexity in your agent implementation (fewer moving parts, fewer retries)
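A back-of-the-envelope model makes the latency difference obvious. The overhead and action timings below are assumed, purely illustrative numbers:

```typescript
// Rough wall-clock model: fixed overhead per request vs. per workflow.
const OVERHEAD_MS = 800; // assumed: network + queueing + orchestration per request
const ACTION_MS = 300;   // assumed: time the browser spends executing one action

function chattyDurationMs(actions: number): number {
  // The vision loop pays the overhead on every single click.
  return actions * (OVERHEAD_MS + ACTION_MS);
}

function batchedDurationMs(actions: number): number {
  // One plan, many actions: the overhead is paid once per workflow.
  return OVERHEAD_MS + actions * ACTION_MS;
}

console.log(chattyDurationMs(20));  // 22000 ms
console.log(batchedDurationMs(20)); // 6800 ms
```

Under these assumptions a 20-action task runs more than 3x faster batched, and the gap widens as the workflow gets longer.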
Fidelity Without Constant Babysitting
A natural fear is: "If I batch actions, doesn't my agent fly blind?"
It doesn't have to.
Because each job can return:
- Multiple screenshots (before, after, and anywhere you choose)
- Network HAR with all requests/responses
- Console logs with timestamps
…you maintain a high-fidelity view of what happened; you just compress it into fewer, richer checkpoints.
Your agent can still:
- Verify that the right form was filled
- Confirm that the correct results page loaded
- Inspect error states and UI changes
It just does that verification at meaningful boundaries, not after every impatient click.
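That checkpoint-level verification can be as simple as diffing the artifacts a job returned against the screenshots the plan promised. A sketch, using the `{ name, url }` artifact shape from the TypeScript example above (the URLs here are illustrative):

```typescript
// Verify a batched run at meaningful boundaries: did every planned
// checkpoint actually come back as an artifact?
interface Artifact { name: string; url: string; }

function missingCheckpoints(artifacts: Artifact[], expected: string[]): string[] {
  const names = new Set(artifacts.map((a) => a.name));
  // An empty result means every planned screenshot was captured.
  return expected.filter((name) => !names.has(name));
}

const artifacts: Artifact[] = [
  { name: "homepage.png", url: "https://cdn.example.com/1" },
  { name: "search-results.png", url: "https://cdn.example.com/2" },
];

console.log(missingCheckpoints(artifacts, ["homepage.png", "search-results.png"])); // []
```

If the list comes back non-empty, that's the agent's cue to look again: re-screenshot, inspect the console log, or replan from the last good checkpoint.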
In Part II we'll go deeper on this: when it's safe to batch steps, what heuristics tell you "I need to look again," and how to handle failures mid-batch without losing debuggability.
Putting It All Together
If you're building browser agents today, there are three big takeaways:
- Per-click mental models are expensive. Stop thinking in "interactions." Think in sessions.
- Batching is almost always a win. Packing the first 30 seconds is great—but the real superpower is how cheap time is after that. Let the browser do real work.
- One plan, many actions is the right default. Your agent doesn't need to see every step. It needs good evidence at the right moments.
Try It Yourself
Here's a one-liner to test the pattern:
curl -X POST "https://api.riddledc.com/v1/run" \
  -H "Authorization: Bearer $RIDDLEDC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "script": "await page.goto(\"https://example.com\"); await saveScreenshot(\"test\");",
    "timeout_sec": 30
  }'
Give your agent one plan instead of 50 tiny requests—and see how much money and time you stop burning.