Your Agent Doesn't Need to See Every Step
Batching browser actions without flying blind
Part II of II • Part I: The Economics of "Chatty" Agents
In Part I, we talked about how agents can burn money and time by treating every click as a separate interaction with your browser infrastructure.
The shift we proposed:
From: "one action = one request"
To: "one plan (many actions) = one request"
We used Riddle's /v1/run endpoint, in steps or script mode, to show how a single job can run multiple actions and screenshots inside one session, instead of paying for a separate job per action.
This post is about the scary question that comes next:
When is it safe to batch? How do you keep your agent from flying blind, and what happens when a batched script fails halfway through?
The Default Pattern: The "Vision Loop"
Most browser agents default to the same loop:
- Screenshot
- LLM thinks
- Click
- Repeat
Every tiny step gets visual verification. It's the safest and simplest loop—but also the slowest and most expensive.
The contrarian take is: your agent doesn't need to see every step.
Most web flows have long deterministic stretches. If you click "Submit" on a form, you usually don't need five screenshots to know if it worked. The URL, DOM, and network responses already tell you a lot.
Instead of:
Screenshot → think → click → screenshot → think → click…
you can often do this:
- Screenshot once at the start
- Run a deterministic sequence "blind"
- Screenshot once at the end to verify
The trick is knowing where you can safely do that.
What Makes a Step "Batch-Safe"?
A sequence of actions is "batch-safe" when the world is predictable enough that you don't need constant LLM supervision.
1. The Path Is Deterministic
Good candidates:
- Filling a known form with fixed fields
- Clicking stable buttons/links with strong selectors
- Opening a menu and choosing a specific item
Poor candidates:
- Choosing options based on arbitrary page text
- Navigating content that's heavily A/B-tested or personalized
- Handling CAPTCHAs or login challenges
If the next step depends on free-form text the LLM has never seen, don't batch it. If it's "fill three fields and click Submit," you probably can.
2. Selectors Are Robust, Not Brittle
Robust:
- Data attributes: data-test="login-button"
- Stable IDs/classes agreed on with the product team
- Explicit test hooks already used in QA
Brittle:
- Matching on raw button text that changes ("Continue" → "Next")
- Very long CSS selectors that encode layout details
- "Nth child of nth child" vibes
The more fragile the selector, the more you'll want intermediate checkpoints.
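One way to act on this is a small lint pass over selectors before a script is sent. The TypeScript sketch below is illustrative only: `selectorWarnings` is a hypothetical helper, and its rules (including the "more than three combinators" threshold) are assumptions rather than any standard or Riddle feature.

```typescript
// Hypothetical helper: flags selector patterns that tend to break when the UI changes.
// The rules and the threshold are illustrative assumptions.
function selectorWarnings(selector: string): string[] {
  const warnings: string[] = [];

  // Positional pseudo-classes encode layout rather than intent.
  if (/:nth-(child|of-type)\(/.test(selector)) {
    warnings.push("positional");
  }

  // Text-based matching (Playwright-style `text=` or `:has-text()`) breaks on copy changes.
  if (/^text=|:has-text\(/.test(selector)) {
    warnings.push("text-match");
  }

  // Blank out attribute values and pseudo-class arguments so spaces or "~" inside
  // them are not mistaken for combinators, then count the combinators.
  const skeleton = selector
    .trim()
    .replace(/\[[^\]]*\]/g, "[]")
    .replace(/\([^)]*\)/g, "()");
  const combinators = skeleton.split(/\s*[>+~]\s*|\s+/).length - 1;
  if (combinators > 3) {
    warnings.push("long-chain");
  }

  return warnings;
}

// Usage: an empty array suggests the selector is a reasonable candidate for "blind" batching.
const loginButtonWarnings = selectorWarnings("[data-test='login-button']");
const layoutSelectorWarnings = selectorWarnings("div > ul > li:nth-child(2) > a > span");
```

A selector that returns any warnings is a hint to add a checkpoint right after the step that uses it.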
3. Success Is Observable Without Screenshots
- URL: e.g. /dashboard vs /login
- DOM: presence of a [data-test='dashboard-root'] element
- Network: a 200 OK from /api/dashboard with a known JSON shape
When you can say "this step worked if X is true," you can safely bake that into the script.
4. The Blast Radius of Failure Is Small
- You're in test or staging environments
- Actions are idempotent or easily reversible
- The worst case is "we have to try again," not "we bought 10,000 widgets"
In high-risk flows (checkout, irreversible mutations), you can still batch—but you'll want more frequent checkpoints and tighter checks.
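The four criteria can be folded into a rough decision rule. The sketch below is one possible encoding, not a prescribed algorithm: `SegmentProfile`, `Strategy`, and `chooseStrategy` are hypothetical names, and treating "success is not observable" as a reason to stay in the vision loop is an assumption layered on top of the text.

```typescript
// Hypothetical description of a segment, one flag per criterion above.
interface SegmentProfile {
  deterministic: boolean;      // 1. the path is deterministic
  robustSelectors: boolean;    // 2. selectors are robust
  observableSuccess: boolean;  // 3. success can be checked without a screenshot
  smallBlastRadius: boolean;   // 4. failure is cheap or reversible
}

type Strategy = "vision-loop" | "batch-with-checkpoints" | "batch";

function chooseStrategy(p: SegmentProfile): Strategy {
  // Unpredictable paths need LLM supervision. Assumption: if success cannot be
  // observed programmatically, there is nothing for a check to assert, so we
  // also stay in the vision loop.
  if (!p.deterministic || !p.observableSuccess) {
    return "vision-loop";
  }
  // Fragile selectors or high-risk actions can still be batched, but with
  // more frequent checkpoints and tighter checks.
  if (!p.robustSelectors || !p.smallBlastRadius) {
    return "batch-with-checkpoints";
  }
  return "batch";
}

// Usage: a staging login flow vs. a production checkout.
const stagingLogin = chooseStrategy({
  deterministic: true, robustSelectors: true, observableSuccess: true, smallBlastRadius: true,
});
const prodCheckout = chooseStrategy({
  deterministic: true, robustSelectors: true, observableSuccess: true, smallBlastRadius: false,
});
```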
Heuristics for "I Need to Look Again"
You don't want a screenshot after every click, but you do want the script to know when it's time to wake the LLM up.
Unexpected Navigation
- URL changes to /login, /error, /blocked, or another "danger" pattern
- Domain or origin changes (unusual redirects)
DOM Divergence
- A required selector doesn't appear within a timeout
- An error banner shows up ([role='alert'], .error-message)
Network Anomalies
- A key request returns 4xx/5xx
- The API returns an error shape instead of the expected payload
Timing Weirdness
- Spinner never disappears
- Page transition takes significantly longer than normal
When any of these triggers fire, the script should:
- Take a fresh screenshot
- Capture context (URL, key network responses, relevant DOM snippets)
- Stop the script and return control to the agent
The LLM gets rich evidence right when the world stops being predictable.
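The four trigger families above can be sketched as a single function that inspects what the script has observed and lists the reasons, if any, to hand control back. Everything here is an assumption for illustration: the `Observation` shape, the danger paths, and the "three times slower than typical" threshold are hypothetical and would need tuning per site.

```typescript
// Hypothetical summary of what the script observed after a mid-flow step.
interface Observation {
  url: string;
  expectedOrigin: string;                 // e.g. "https://app.example.com"
  missingSelectors: string[];             // required selectors that timed out
  errorBannerVisible: boolean;            // e.g. [role='alert'] matched
  responses: { url: string; status: number }[];
  elapsedMs: number;                      // how long this transition took
  typicalMs: number;                      // how long it usually takes
}

function reasonsToLookAgain(o: Observation, slowFactor = 3): string[] {
  const reasons: string[] = [];
  const origin = o.expectedOrigin;

  // Unexpected navigation: left the expected origin, or landed on a danger path.
  const sameOrigin = o.url === origin || o.url.indexOf(origin + "/") === 0;
  if (!sameOrigin) {
    reasons.push("origin-changed");
  } else {
    const path = o.url.slice(origin.length);
    const dangerPaths = ["/login", "/error", "/blocked"];
    if (dangerPaths.some((p) => path.indexOf(p) === 0)) {
      reasons.push("danger-url");
    }
  }

  // DOM divergence: something required is missing, or an error banner appeared.
  if (o.missingSelectors.length > 0 || o.errorBannerVisible) {
    reasons.push("dom-divergence");
  }

  // Network anomalies: any watched request came back 4xx/5xx.
  if (o.responses.some((r) => r.status >= 400)) {
    reasons.push("network-anomaly");
  }

  // Timing weirdness: much slower than normal (threshold is an assumption).
  if (o.elapsedMs > slowFactor * o.typicalMs) {
    reasons.push("slow-transition");
  }

  return reasons;
}
```

An empty result means "keep going blind"; anything else is the cue to screenshot, capture context, and stop.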
A Concrete Pattern: Checks and Early Aborts
Let's turn this into something you can actually use.
We'll extend the basic step types with checks—small blocks that assert "the world looks how we expect" and decide whether to keep going.
// Step types for structured scripts
type Step =
  | { goto: string }
  | { screenshot: string }
  | { fill: { selector: string; value: string } }
  | { click: string }
  | { waitFor: string; timeout?: number }
  | { check: Check[]; onFail?: ("screenshot" | "abort")[] }
  | { eval: string }; // escape hatch for raw Playwright

type Check =
  | { urlIncludes: string }
  | { urlExcludes: string }
  | { selectorExists: string; timeout?: number }
  | { selectorMissing: string }
  | { noHttpErrors: string }; // URL pattern to watch
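To make the semantics of a check concrete, here is a minimal sketch of how a runner might evaluate a list of checks. It is not Riddle's implementation: `PageState` and `firstFailedCheck` are hypothetical, the evaluator works on a static snapshot rather than a live page, and the `timeout` field is ignored for simplicity.

```typescript
// Copied from the step types above so the sketch is self-contained.
type Check =
  | { urlIncludes: string }
  | { urlExcludes: string }
  | { selectorExists: string; timeout?: number }
  | { selectorMissing: string }
  | { noHttpErrors: string };

// Hypothetical snapshot of page state; a real runner would query the live page.
interface PageState {
  url: string;
  presentSelectors: string[];                    // selectors that currently match
  responses: { url: string; status: number }[];  // network responses seen so far
}

// Returns the first check that fails, or null if every check passes.
function firstFailedCheck(checks: Check[], page: PageState): Check | null {
  for (const check of checks) {
    let ok: boolean;
    if ("urlIncludes" in check) {
      ok = page.url.indexOf(check.urlIncludes) !== -1;
    } else if ("urlExcludes" in check) {
      ok = page.url.indexOf(check.urlExcludes) === -1;
    } else if ("selectorExists" in check) {
      ok = page.presentSelectors.indexOf(check.selectorExists) !== -1;
    } else if ("selectorMissing" in check) {
      ok = page.presentSelectors.indexOf(check.selectorMissing) === -1;
    } else {
      // noHttpErrors: no watched request may have returned 4xx/5xx.
      const pattern = check.noHttpErrors;
      ok = !page.responses.some(
        (r) => r.url.indexOf(pattern) !== -1 && r.status >= 400
      );
    }
    if (!ok) {
      return check;
    }
  }
  return null;
}
```

Returning the failing check itself, rather than a boolean, gives the agent a precise statement of which expectation broke.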
Here's a script that only screenshots at the boundaries, but has a rich mid-batch check:
const steps = [
  // 1. Go to login & capture initial state
  { goto: "https://app.example.com/login" },
  { screenshot: "login-page" },

  // 2. Fill login form and submit (no extra screenshots)
  { fill: { selector: "input[name='email']", value: "demo@example.com" } },
  { fill: { selector: "input[name='password']", value: "secret" } },
  { click: "button[type='submit']" },

  // 3. Wait for dashboard
  { waitFor: "[data-test='dashboard-root']", timeout: 15000 },

  // 4. CHECK: did we actually land on the dashboard?
  {
    check: [
      { urlIncludes: "/dashboard" },
      { selectorExists: "[data-test='dashboard-root']" },
      { noHttpErrors: "/api/dashboard" }
    ],
    onFail: ["screenshot", "abort"]
  },

  // 5. If checks passed, continue "blind"
  { click: "[data-test='open-reports']" },
  { waitFor: "[data-test='report-row']", timeout: 10000 },

  // 6. Final screenshot
  { screenshot: "dashboard-reports" }
];

// API_KEY is assumed to be defined elsewhere (e.g. loaded from an environment variable)
const response = await fetch("https://api.riddledc.com/v1/run", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({ steps })
});

const result = await response.json();

if (result.status === "failed") {
  console.error("Failed at:", result.failedStep);
  console.error("Reason:", result.reason);
  // result.screenshots includes the failure screenshot
} else {
  console.log("Success:", result.screenshots);
}
What This Buys You
- Only two screenshots: login-page at the start, dashboard-reports at the end
- Middle is "blind but deterministic": fill, click, wait, click, wait, all inside the browser, not round-tripping through the LLM
- Checks keep you honest: if you hit /login instead of /dashboard, or the dashboard root never appears, or /api/dashboard returns 500, the script takes a screenshot, aborts, and returns a structured failure
From an agent's point of view, this is gold: a rich chunk of evidence at the exact moment things went sideways, not 20 tiny screenshots that all look the same.
How the Agent Should Think About Batched Scripts
- Plan in segments, not microscopic steps
  - "Segment 1: log in"
  - "Segment 2: open report"
  - "Segment 3: export CSV"
- For each segment, ask:
  - Is this path deterministic?
  - Do I have strong selectors and simple success checks?
  - What are reasonable failure heuristics?
- Emit a script for the whole segment, with:
  - One screenshot at the start
  - One screenshot at the end
  - Optional checks in the middle that can abort and screenshot on failure
- On failure, use the returned context to:
  - Diagnose why the checks failed
  - Adjust selectors, expectations, or the overall strategy
  - Try again, or fall back to a more cautious, vision-heavy approach
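The "on failure" branch of that list can be sketched as a small decision function the agent consults after each segment. This is a hedged illustration: `SegmentResult` mirrors the `status` and `reason` fields used in the example response earlier, while `NextMove`, `nextMove`, and the default of two batched attempts are hypothetical choices.

```typescript
// Minimal view of a segment result; `status` and `reason` follow the fields
// used in the earlier example, the rest of the response is ignored here.
interface SegmentResult {
  status: string;   // "failed" on failure; any other value is treated as success
  reason?: string;
}

type NextMove = "continue" | "retry-batched" | "fall-back-to-vision";

// Decide what to do after a segment ran. `batchedAttempts` counts how many
// times this segment has already been run as a batched script.
function nextMove(
  result: SegmentResult,
  batchedAttempts: number,
  maxBatchedAttempts = 2   // assumption: one retry with an adjusted script
): NextMove {
  if (result.status !== "failed") {
    return "continue";
  }
  // While attempts remain, the agent adjusts selectors or expectations
  // using the failure context, then re-emits the batched script.
  if (batchedAttempts < maxBatchedAttempts) {
    return "retry-batched";
  }
  // Out of attempts: the world is less predictable than assumed, so switch
  // to the cautious screenshot-per-step loop for this segment.
  return "fall-back-to-vision";
}

// Usage: first failure triggers a retry, the second a fallback.
const afterFirstFailure = nextMove({ status: "failed", reason: "selector timeout" }, 1);
const afterSecondFailure = nextMove({ status: "failed", reason: "selector timeout" }, 2);
```

Keeping the fallback per segment means one flaky step does not push the whole workflow back into the expensive loop.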
You're using expensive LLM tokens where the world is uncertain and cheap browser time where it isn't.
Putting It Together
Part I (Economics): Stop paying per click. Think in sessions. Pack as much work as you can into a single API call. Time is cheap after the first 30 seconds.
Part II (Safety & Design): Your agent doesn't need to see every step. Batch deterministic segments, add lightweight checks, and wake the LLM only when something looks off.
If you're building browser agents and you're tired of babysitting every mouse move, try structuring your next workflow as:
One screenshot → one plan → one API call → one rich result.
Let the browser handle the boring, deterministic work—so your agent can save its intelligence for the parts where the world really is uncertain.