Evidence Over Summaries
AI agents can change web apps quickly. Browser-facing changes need proof a reviewer can inspect.
The Trust Gap
A coding agent can edit a route, wire a control, fix a layout, or change a checkout flow. Then it can write a confident paragraph saying the work is done.
That paragraph is not enough. For browser-facing work, the reviewer needs to know what actually loaded, what the agent clicked, what changed on screen, whether the console stayed clean, and where the run got stuck if it failed.
The missing layer is not another test runner or another coding agent. It is a small evidence contract between agent output and human review.
What Riddle Proof Does
Riddle Proof turns a browser-visible change into a proof bundle. The workflow is deliberately plain:
- The agent states the browser behavior it needs to prove.
- Riddle runs the app through static preview, server preview, build preview, or a live URL.
- A Playwright-style proof script exercises the important route and user flow.
- Riddle captures screenshots, JSON artifacts, HAR, console output, and script results.
- The agent reports a compact verdict with the runtime phase trail.
The output is something a human can skim and another agent can parse. That is the important shape: evidence for people, structure for automation.
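The proof-script step above can be sketched in TypeScript. This is a hedged illustration, not Riddle's actual API: `PageLike` is a hypothetical slice of a Playwright-style page interface, and `proveCheckout`, the selectors, and the expected strings are invented to mirror the checkout example on this page.

```typescript
// Hypothetical shape of a proof script. A real script would drive a browser
// through Playwright's Page; PageLike is a stand-in so the flow reads
// self-contained here.
interface PageLike {
  goto(url: string): Promise<void>;
  textContent(selector: string): Promise<string | null>;
  isEnabled(selector: string): Promise<boolean>;
  click(selector: string): Promise<void>;
}

// Mirrors the "result" field of a proof bundle: a verdict plus named checks.
interface ProofResult {
  ok: boolean;
  checks: Record<string, string>;
}

async function proveCheckout(page: PageLike, baseUrl: string): Promise<ProofResult> {
  await page.goto(baseUrl + "/checkout");                            // checkout page loads
  const heading = (await page.textContent("h1")) ?? "";
  const submitEnabled = await page.isEnabled("button[type=submit]"); // submit stays enabled
  await page.click("button[type=submit]");
  const successMessage = (await page.textContent(".success")) ?? ""; // success message appears
  return {
    ok: heading === "Checkout" && submitEnabled && successMessage.length > 0,
    checks: { heading, successMessage },
  };
}
```

The point is the return shape, not the selectors: the script ends in a small, named verdict that slots directly into the bundle a reviewer or a follow-up agent reads.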
The Proof Bundle
A useful proof bundle says what was tested and gives the reviewer enough material to debug without rerunning the job. It should include the preview, the criteria, the verdict, the artifacts, and the latest phase.
```json
{
  "version": "riddle-proof.agent-change.v1",
  "status": "complete",
  "phase": "complete",
  "phase_updated_at": "2026-04-20T17:20:00.000Z",
  "phase_details": { "outputs": 5 },
  "preview": {
    "mode": "server-preview",
    "url": "https://preview.riddledc.com/s/sp_1234abcd/",
    "route": "/checkout"
  },
  "proof": {
    "ok": true,
    "assertions": [
      "checkout page loads",
      "submit button stays enabled",
      "success message appears",
      "no console errors"
    ]
  },
  "result": {
    "ok": true,
    "checks": {
      "heading": "Checkout",
      "successMessage": "Order received"
    }
  },
  "artifacts": [
    { "name": "checkout-before.png", "content_type": "image/png" },
    { "name": "checkout-after.png", "content_type": "image/png" },
    { "name": "console.json", "content_type": "application/json" },
    { "name": "network.har", "content_type": "application/json" }
  ],
  "script_error": null
}
```
When a run fails, the phase matters as much as the final status.
A timeout in waiting_for_readiness is a different problem from a failure in running_browser_proof.
The next agent should not have to guess whether it was still building, still starting, or already inside the browser flow.
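That triage rule is mechanical enough to sketch. The phase names below are the two named in the prose plus the `complete` value from the example bundle; the `triage` function and its messages are illustrative, not part of any Riddle interface.

```typescript
// Hedged sketch: route a follow-up agent by phase, not just by final status.
// Phase names come from the prose and the example bundle; the function and
// its messages are hypothetical.
type Phase = "waiting_for_readiness" | "running_browser_proof" | "complete";

function triage(status: "complete" | "failed", phase: Phase): string {
  if (status === "complete") return "inspect artifacts and verdict";
  if (phase === "waiting_for_readiness") {
    return "preview never became ready: check build and startup output first";
  }
  if (phase === "running_browser_proof") {
    return "app was up but the flow failed: read console.json, network.har, and the screenshots";
  }
  return "failed after the browser phase: inspect phase_details";
}
```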
What This Is Not
Riddle Proof does not guarantee that the whole product is correct. It does not replace unit tests, integration tests, visual review, QA, or human product judgment.
It proves selected browser behavior against stated criteria in a real preview. That narrower claim is the useful one. It turns "the agent says it works" into "the agent showed what it ran and what it captured."
Why Now
The primitives are already familiar: preview deployments, Playwright scripts, screenshots, traces, HAR files, console logs, and CI artifacts. The change is that agents now generate browser-facing code quickly enough that review needs a better handoff format.
A diff shows what changed. A summary says what the agent believes happened. A proof bundle shows what the browser did.
That is the product boundary for Riddle Proof: the evidence layer for AI-authored web changes.
Try The Workflow
Start with the Riddle Proof docs. Use Preview Modes to choose the right runtime shape, then make the proof criteria explicit before the agent runs the browser.
There is also a public example proof bundle from Riddle's own docs proof. It is intentionally ordinary JSON: target, criteria, verdict, artifacts, console summary, phase, and caveat.
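Because the bundle is ordinary JSON, a downstream agent can gate on it in a few lines. The check below is a sketch against the field names shown in the example bundle above; it is not an official schema.

```typescript
// Minimal structural check using the field names from the example bundle.
// Not an official schema; adjust to the bundle version you actually receive.
function looksLikeProofBundle(x: unknown): boolean {
  if (typeof x !== "object" || x === null) return false;
  const b = x as Record<string, unknown>;
  return (
    typeof b.version === "string" &&
    typeof b.status === "string" &&
    typeof b.phase === "string" &&
    typeof b.preview === "object" && b.preview !== null &&
    typeof b.proof === "object" && b.proof !== null &&
    typeof b.result === "object" && b.result !== null &&
    Array.isArray(b.artifacts)
  );
}
```

A consumer would run this before trusting `status`, so a malformed or truncated bundle fails loudly instead of reading as a pass.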