Riddle Proof
Evidence-backed workflows for agent-authored browser changes.
Plain text for agents: /docs/riddle-proof/markdown.md
What It Is
Riddle Proof is a workflow for turning an agent's code change into inspectable evidence. It runs the actual app in a browser, exercises the behavior that matters, captures artifacts, and records whether the change met the stated proof criteria.
Build output and unit tests are still important, but they do not prove that a user-facing browser flow works. Riddle Proof fills that gap with real previews, screenshots, JSON artifacts, console output, diagnostics, and ship gates.
The contract is agent-agnostic: use it from Codex, Claude Code, OpenClaw, CI, or any workflow that can run a browser proof script and preserve the resulting evidence.
When To Use It
Use Riddle Proof when correctness depends on the browser, a running server, generated assets, timing, layout, or an integration boundary.
- UI changes: routes, forms, responsive layouts, canvas, and visual state.
- Preview deploys: asset paths, routing, environment variables, and production-like serving.
- Rich browser apps: audio, animation, WebGL, generated media, and user gestures.
- Agent shipping: PRs where a human should see evidence instead of trusting a summary.
Do not use proof as a replacement for focused code tests. The practical stack is: unit tests for logic, build checks for packaging, Riddle Proof for the browser behavior the user will actually touch.
Proof Loop
assertions → preview → script → artifacts → ship gate
A useful proof run starts with explicit criteria: the route loads, a control changes state without resetting playback, a screenshot contains the expected panel, or an audio render stays below a clipping threshold.
The agent then runs the app, performs the user flow through Riddle, saves artifacts, and reports a compact result that can be read by a human, GitHub comment, Discord bot, CLI wrapper, or another integration.
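One way to keep criteria explicit is to name each one and evaluate it against the state the proof script observed. The sketch below is illustrative only: `evaluateCriteria` and the shape of `observed` are hypothetical and not part of the Riddle Proof package.

```javascript
// Hypothetical helper: turn named criteria into a compact verdict.
// Each criterion is a label plus a predicate over the observed state.
function evaluateCriteria(criteria, observed) {
  const checks = criteria.map(({ label, test }) => {
    let passed = false;
    let error = null;
    try {
      passed = Boolean(test(observed));
    } catch (err) {
      // A throwing predicate counts as a failed check, not a crashed run.
      error = String(err && err.message ? err.message : err);
    }
    return { label, passed, error };
  });
  // An empty criteria list never passes: a vague run should not look green.
  return { ok: checks.length > 0 && checks.every((c) => c.passed), checks };
}

// Illustrative state a proof script might collect from the page.
const observed = { status: 200, isPlaying: true, stepBefore: 4, stepAfter: 22 };

const verdict = evaluateCriteria(
  [
    { label: "route loads", test: (s) => s.status === 200 },
    {
      label: "playback stays running while mixer changes",
      test: (s) => s.isPlaying && s.stepAfter > s.stepBefore,
    },
  ],
  observed
);

console.log(JSON.stringify(verdict, null, 2));
```

Keeping the label next to the predicate means the same list can feed both the verdict and the human-readable report.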
From Agent Change To Proof Bundle
Riddle Proof is the evidence layer between an agent saying "I changed the app" and a human deciding whether the change is ready.
The agent should not only summarize the diff. It should show the route it tested, the preview it used, the exact proof criteria, the artifacts it captured, and the last runtime phase if anything got stuck.
agent edits code → static/server/build → browser script → verdict + artifacts
{
  "version": "riddle-proof.agent-change.v1",
  "status": "complete",
  "phase": "complete",
  "phase_updated_at": "2026-04-20T17:20:00.000Z",
  "phase_details": { "outputs": 5 },
  "preview": {
    "mode": "server-preview",
    "url": "https://preview.riddledc.com/s/sp_1234abcd/",
    "route": "/checkout"
  },
  "proof": {
    "ok": true,
    "script": "checkout-submit-smoke",
    "assertions": [
      "checkout page loads",
      "submit button stays enabled",
      "success message appears",
      "no console errors"
    ]
  },
  "result": {
    "ok": true,
    "checks": { "heading": "Checkout", "successMessage": "Order received" }
  },
  "artifacts": [
    { "name": "checkout-before.png", "content_type": "image/png" },
    { "name": "checkout-after.png", "content_type": "image/png" },
    { "name": "console.json", "content_type": "application/json" },
    { "name": "network.har", "content_type": "application/json" }
  ],
  "script_error": null
}
A proof bundle is not a guarantee that the whole product is correct. It is reviewable evidence that the selected browser-facing behavior was exercised in a real preview and passed the stated criteria.
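A bundle like the one above can be reduced to a few lines for a PR comment or chat message. This is a minimal sketch, assuming the field names shown in the example bundle; `summarizeBundle` is a hypothetical helper, not a packaged function.

```javascript
// Hypothetical formatter; field names follow the agent-change bundle above.
function summarizeBundle(bundle) {
  const proof = bundle.proof || {};
  const preview = bundle.preview || {};
  const lines = [
    `Riddle Proof: ${proof.ok === true ? "PASS" : "FAIL"} (${proof.script || "unnamed script"})`,
    `Preview: ${preview.mode || "unknown mode"} ${preview.url || "(no url)"} route ${preview.route || "/"}`,
    `Phase: ${bundle.phase || "unknown"}`,
    "Criteria:",
    ...(proof.assertions || []).map((text) => `- ${text}`),
    "Artifacts:",
    ...(bundle.artifacts || []).map((artifact) => `- ${artifact.name}`),
  ];
  // Surface the script error only when one was recorded.
  if (bundle.script_error) lines.push(`Script error: ${bundle.script_error}`);
  return lines.join("\n");
}

const exampleBundle = {
  phase: "complete",
  preview: {
    mode: "server-preview",
    url: "https://preview.riddledc.com/s/sp_1234abcd/",
    route: "/checkout",
  },
  proof: {
    ok: true,
    script: "checkout-submit-smoke",
    assertions: ["checkout page loads", "no console errors"],
  },
  artifacts: [{ name: "checkout-after.png" }, { name: "console.json" }],
  script_error: null,
};

console.log(summarizeBundle(exampleBundle));
```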
What Proof Changes
Proof changes the agent loop from "I made the edit" to "I exercised the edited behavior and saved evidence."
- Fewer blind handoffs: The proof bundle carries the URL, criteria, screenshots, console summary, and verdict into review.
- Better failure location: Phase metadata shows whether a job failed while building, waiting for readiness, running browser proof, or collecting artifacts.
- Real-device layout pressure: Viewport checks catch missing CSS, horizontal overflow, low contrast, and mobile clipping before a human sees the page.
- Unexpected coverage: A proof run pays rent when it catches the bug you meant to check and the nearby regressions you did not know to ask about yet.
- Reusable judgment: A human, GitHub check, Discord bot, or another agent can read the same compact proof object without scrolling terminal history.
The strongest proof runs are boring and concrete: name the behavior, open the preview, perform the action, capture artifacts, and fail loudly when the evidence does not match the claim.
Example Proof Bundle
Riddle uses the same workflow on its own docs. The readable example is at /examples/riddle-proof, and the raw agent-facing JSON is at /examples/riddle-proof/docs-live-proof-bundle.json.
That bundle records a live URL proof against this docs page, the preview docs, the docs index, and the launch blog post. It includes the verdict, route checks, screenshot artifacts, console summary, HAR artifact, duration, and caveat.
{
  "version": "riddle-proof.example-bundle.v1",
  "status": "completed",
  "phase": "complete",
  "target": {
    "mode": "live-url",
    "url": "https://riddledc.com/docs/riddle-proof"
  },
  "contract": {
    "summary": "Bring your agent; Riddle brings the proof.",
    "terms": [
      "Riddle Proof turns browser flows into proof receipts",
      "The proof loop is agent-agnostic",
      "The bundle is agent-proof evidence for review"
    ]
  },
  "proof": {
    "ok": true,
    "assertions": [
      "Riddle Proof docs load",
      "Proof receipts are explained",
      "Bring your agent; Riddle brings the proof.",
      "Preview docs explain preview modes",
      "Browser console is clean"
    ]
  },
  "artifacts": [
    { "name": "riddle-proof-docs.png", "role": "primary_screenshot" },
    { "name": "console.json", "role": "console_summary" },
    { "name": "network.har", "role": "network_trace" }
  ],
  "script_error": null
}
Use the structure, not the exact checklist
A checkout flow, canvas app, docs page, or audio tool will need different assertions. The stable part is the handoff: target, criteria, verdict, artifacts, phase, and caveat.
Preview Choice
Preview choice is part of the proof contract. Use the mode that matches the artifact you need to prove.
For static exports, build locally and proof the exported directory with static preview. For server apps, use server preview. For Dockerfile-based apps, use build preview. See Preview Modes for the full decision table.
# Static export or prebuilt static bundle
riddle-proof-loop riddle-preview-deploy ./out docs-merge-proof --framework static
# Then run a durable profile against the returned Preview URL
riddle-proof-loop run-profile --profile docs-proof.json --url https://preview.riddledc.com/s/ps_example/ --runner riddle
Use the packaged riddle-preview-deploy command, or its deployRiddleStaticPreview helper, rather than a one-off upload script. The packaged path records Preview metadata and handles transient publish recovery in the same base Riddle Proof loop used by local agents, CI, and wrappers.
Prefer a production-shaped preview for merge proof. It catches problems such as base paths, hashed assets, SPA routing, MIME types, missing static files, and server startup issues. Use a faster local or dev preview while iterating, then run the production-shaped preview before merge.
Bring your deploy host
Riddle Proof does not replace AWS Amplify, Vercel, Netlify, GitHub Pages, or a custom deploy pipeline. It sits after those systems: wait for a preview or live URL, run browser proof, and return evidence. This site uses AWS Amplify for production deploys and Riddle Proof for post-deploy browser verification.
Good proof targets
Exercise the exact route, selector, user gesture, and state transition that motivated the change. Save screenshots and JSON artifacts at the end, not only at page load.
Proof Packets
A proof packet is the durable record of what happened. It should be small enough to read quickly and specific enough to debug without rerunning the job.
{
  "version": "riddle-proof.capture-diagnostic.v1",
  "ok": true,
  "tool": "server-preview",
  "target": "https://preview.example.com/app",
  "timestamp": "2026-04-20T00:00:00.000Z",
  "assertions": [
    "route loads",
    "playback stays running while mixer changes",
    "screenshot includes trainer lane"
  ],
  "artifacts": [
    { "name": "after-mixer-change.png", "role": "after_proof" },
    { "name": "live-control-proof.json", "role": "diagnostic" }
  ],
  "diagnostics": [
    { "label": "currentStep", "before": 4, "after": 22 },
    { "label": "isPlaying", "value": true }
  ]
}
The exact fields vary by integration, but the shape should stay boring: version, target, criteria, outcome, artifacts, and compact diagnostics. Avoid proof output that only says "passed" without showing what was actually exercised.
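A wrapper can reject packets that only say "passed" by checking for the boring fields before it posts anything. The following is a sketch under the assumption that packets use the field names from the example above; `validateProofPacket` is a hypothetical name, and real integrations may require different fields.

```javascript
// Hypothetical shape check for a proof packet.
// It flags packets that report an outcome without showing what was exercised.
function validateProofPacket(packet) {
  const problems = [];
  if (packet === null || typeof packet !== "object") {
    return { valid: false, problems: ["packet is not an object"] };
  }
  if (typeof packet.version !== "string" || packet.version === "") {
    problems.push("missing version");
  }
  if (typeof packet.target !== "string" || packet.target === "") {
    problems.push("missing target");
  }
  if (typeof packet.ok !== "boolean") {
    problems.push("missing boolean ok");
  }
  if (!Array.isArray(packet.assertions) || packet.assertions.length === 0) {
    problems.push("no assertions: packet does not say what was exercised");
  }
  if (!Array.isArray(packet.artifacts) || packet.artifacts.length === 0) {
    problems.push("no artifacts");
  }
  return { valid: problems.length === 0, problems };
}

// A packet that only says "passed" is rejected with specific reasons.
console.log(validateProofPacket({ ok: true }));
```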
Live URL Proof
Use live URL proof when the app is already deployed and the question is whether public routes, links, copy, layout, or browser behavior work in production.
This is the simplest production smoke test: visit the live URL, assert the important text or selectors, save screenshots, save a compact JSON proof packet, and fail the script if any check is false.
await page.goto("https://example.com/docs", { waitUntil: "domcontentloaded" });
await page.waitForSelector("h1");
const bodyText = await page.locator("body").innerText();
const checks = {
  heading: await page.locator("h1").first().innerText(),
  hasInstallStep: bodyText.includes("npm install"),
  hasPreviewLink: bodyText.includes("Preview Modes")
};
const proof = {
  version: "riddle-proof.live-url.v1",
  target: "https://example.com/docs",
  ok: checks.hasInstallStep && checks.hasPreviewLink,
  checks
};
await saveScreenshot("docs-live", { fullPage: true });
await saveJson("live-url-proof", proof);
if (!proof.ok) throw new Error(JSON.stringify(checks));
return proof;
For script runs that return data, set options.returnResult: true or include result. Riddle returns JSON for that case, including the returned result and artifact names/URLs, instead of switching to a raw PNG response.
Minimal Live Proof Request
This is the smallest useful live URL proof shape: load a route, assert a user-visible condition, save a screenshot, return JSON, and ask Riddle for a JSON response.
curl https://api.riddledc.com/v1/run \
-H "Authorization: Bearer $RIDDLE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://riddledc.com/docs/riddle-proof",
"script": "await page.waitForSelector(\"h1\"); const heading = await page.locator(\"h1\").first().innerText(); const ok = heading.includes(\"Riddle Proof\"); await saveScreenshot(\"docs-live\", { fullPage: true }); await saveJson(\"docs-live-proof\", { ok, heading }); if (!ok) throw new Error(heading); return { ok, heading };",
"include": ["screenshots", "result", "console"],
"options": { "returnResult": true }
}'
Use placeholders in docs, scripts, and PR comments. Do not paste live API keys, cookies, private preview URLs, or application secrets into proof bundles.
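Hand-escaping the script string inside a curl payload is easy to get wrong. A sketch of building the same request body in JavaScript follows; the endpoint, headers, and body fields mirror the curl example above, while `buildRunRequest` and `sendProof` are hypothetical helper names. `sendProof` is defined but not called here, and it assumes Node 18+ for the global `fetch`.

```javascript
// The proof script as a readable template string (same steps as the curl example).
const proofScript = `
  await page.waitForSelector("h1");
  const heading = await page.locator("h1").first().innerText();
  const ok = heading.includes("Riddle Proof");
  await saveScreenshot("docs-live", { fullPage: true });
  await saveJson("docs-live-proof", { ok, heading });
  if (!ok) throw new Error(heading);
  return { ok, heading };
`;

// Hypothetical helper: assemble the request body shown in the curl example.
function buildRunRequest(url, script) {
  return {
    url,
    script: script.trim(),
    include: ["screenshots", "result", "console"],
    options: { returnResult: true },
  };
}

// Hypothetical sender, not invoked in this sketch.
// Reads the key from the environment so it never lands in source control.
async function sendProof(body) {
  const response = await fetch("https://api.riddledc.com/v1/run", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.RIDDLE_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body), // JSON.stringify handles the quote escaping
  });
  return response.json();
}

const requestBody = buildRunRequest("https://riddledc.com/docs/riddle-proof", proofScript);
console.log(JSON.stringify(requestBody, null, 2));
```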
Result Shape
A good proof result should be compact enough for an agent to paste into a PR comment, but complete enough to debug without rerunning the job.
{
  "ok": true,
  "response_mode": "json",
  "result": {
    "ok": true,
    "checks": { "heading": "Riddle Proof" }
  },
  "artifacts": [
    { "name": "docs-live.png", "url": "https://...", "content_type": "image/png" },
    { "name": "live-url-proof.json", "url": "https://...", "content_type": "application/json" }
  ],
  "artifacts_url": "https://..."
}
For preview jobs, wrappers should also surface phase, phase_updated_at, phase_details, progress, script_error, and outputs. Timeout reports should include the last known phase and progress.label so the next agent knows whether the job was still building, waiting for readiness, or inside browser proof. On failed server previews, surface server_log when present; it is usually the fastest way to diagnose startup and readiness failures.
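A timeout report along those lines might be assembled as follows. This is a sketch, assuming the job object carries the fields named above; `formatTimeoutReport` is a hypothetical helper, and the phase value and log lines in the usage example are invented for illustration.

```javascript
// Hypothetical formatter for a timed-out preview job.
// Uses only the fields named in the text: phase, phase_updated_at,
// progress.label, script_error, and server_log.
function formatTimeoutReport(job) {
  const phase = job.phase || "unknown";
  const label = job.progress && job.progress.label ? job.progress.label : "no progress label";
  const lines = [`Timed out in phase "${phase}" (${label})`];
  if (job.phase_updated_at) lines.push(`Phase last updated: ${job.phase_updated_at}`);
  if (job.script_error) lines.push(`Script error: ${job.script_error}`);
  if (job.server_log) {
    // Keep only the tail; the last lines usually hold the startup failure.
    const tail = String(job.server_log).trim().split("\n").slice(-5);
    lines.push("Server log tail:", ...tail.map((line) => `  ${line}`));
  }
  return lines.join("\n");
}

// Illustrative job state; the phase name and log content are made up.
console.log(
  formatTimeoutReport({
    phase: "waiting_for_ready",
    phase_updated_at: "2026-04-20T17:20:00.000Z",
    progress: { label: "waiting for port 3000" },
    server_log: "starting server\nError: listen EADDRINUSE",
  })
);
```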
App Contracts
Proof works best when the app exposes a small, intentional diagnostic contract instead of forcing the agent to infer everything from pixels.
For browser apps, that can be a window hook, stable DOM attributes, or a debug endpoint available only in preview/test contexts. The contract should expose user-visible state, not private secrets.
// Example browser proof hook
window.__RIDDLE_APP_PROOF__ = {
  version: "my-app.proof.v1",
  getState() {
    return { route: location.pathname, isPlaying, currentStep, selectedInstrument };
  },
  async captureDiagnostic() {
    return {
      version: "riddle-proof.capture-diagnostic.v1",
      ok: true,
      state: this.getState()
    };
  }
};
Keep the contract stable, redacted, and deliberately small. It should help the proof script confirm behavior, not become a second application API.
Audio And Rich Apps
Riddle Proof is useful for audio and visual apps, but it should be honest about what it can prove.
For audio, deterministic rendered-buffer checks can prove timing, clipping, silence, headroom, and state transitions. Live browser proof can confirm the UI route, playback controls, screenshots, and analyzer-visible activity. Neither one replaces listening when the question is taste, tone, or mix feel.
Authenticated Proof
Many useful browser changes live behind login. A proof workflow should be able to start from an authenticated browser state without asking the agent to replay a slow login flow every time.
Riddle accepts cookies, headers, and localStorage values for protected pages. For SPA auth, pass the same storage keys the app already uses, then run the proof against the protected route. Keep the token source outside source control and pass it through an environment variable, secret store, or local JSON file.
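One way to keep the token out of source control is to build the request at run time from a value supplied by the environment. The sketch below assumes the `options.localStorage` layout used in this section's request example; `buildAuthenticatedRequest` and `maskForLog` are hypothetical helpers, and the storage key is the placeholder from this section.

```javascript
// Hypothetical helper: inject a token into the localStorage option at run time.
function buildAuthenticatedRequest({ url, script, storageKey, token }) {
  if (!token) {
    // Fail early instead of running the proof against a logged-out page.
    throw new Error("Missing auth token: supply it from an env var or secret store");
  }
  return {
    url,
    script,
    include: ["screenshots", "result", "console"],
    options: { returnResult: true, localStorage: { [storageKey]: token } },
  };
}

// Hypothetical helper: a copy of the request that is safe to print.
function maskForLog(request) {
  const masked = Object.fromEntries(
    Object.keys(request.options.localStorage).map((key) => [key, "[REDACTED]"])
  );
  return { ...request, options: { ...request.options, localStorage: masked } };
}

const authRequest = buildAuthenticatedRequest({
  url: "https://app.example.com/dashboard",
  script: "await page.waitForSelector('h1'); await saveScreenshot('dashboard'); return { ok: true };",
  storageKey: "CognitoIdentityServiceProvider.CLIENT_ID.USER.idToken",
  // In practice this would come from something like process.env; placeholder here.
  token: "JWT_FROM_SECRET_STORE",
});

console.log(JSON.stringify(maskForLog(authRequest), null, 2));
```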
{
  "url": "https://app.example.com/dashboard",
  "script": "await page.waitForSelector('h1'); await saveScreenshot('dashboard'); return { ok: true };",
  "include": ["screenshots", "result", "console"],
  "options": {
    "returnResult": true,
    "localStorage": {
      "CognitoIdentityServiceProvider.CLIENT_ID.USER.idToken": "JWT_FROM_SECRET_STORE"
    }
  }
}
Fast Vs Full Proof
| Tier | Use | Typical Evidence |
|---|---|---|
| fast | Inner-loop iteration while the agent is still editing. | One route, one screenshot, one JSON state capture. |
| merge | Before PR merge or deploy. | Built server preview, key user flow, artifacts, diagnostics, and CI. |
| prod | After deploy or for live smoke checks. | Production URL, public route, post-deploy screenshot and state check. |
Fast proof keeps iteration speed high. Full proof should be slower and stricter because it is the evidence humans rely on when reviewing or merging.
Profile Text Semantics
Profile checks deliberately separate response-body contracts from rendered browser-text contracts. Use http_status when the contract belongs to the fetched response itself: status code, content type, byte size, or raw body fragments from markdown, JSON, YAML, robots, sitemap, or another machine-readable endpoint.
{
  "type": "http_status",
  "label": "agent markdown",
  "url": "https://example.com/docs/markdown.md",
  "expected_status": 200,
  "allowed_content_types": ["text/markdown"],
  "min_bytes": 1000,
  "body_contains": ["# API Documentation"]
}
The body_contains, body_patterns, body_not_contains, and body_not_patterns fields match the raw HTTP response body, not rendered browser text.
For JSON responses, use body_json_assertions to prove specific fields without matching the whole payload. Assertions can check nested paths such as status, checks[0].status, or environment_blocker.
{
  "type": "http_status",
  "label": "profile result json",
  "url": "https://example.com/riddle-proof-profile-result.json",
  "expected_status": 200,
  "allowed_content_types": ["application/json"],
  "body_json_assertions": [
    { "path": "status", "equals": "passed" },
    { "path": "environment_blocker", "equals": false },
    { "path": "checks", "type": "array" },
    { "path": "checks[0].status", "not_equals": "failed" },
    { "path": "artifacts.screenshots", "contains": "desktop" }
  ]
}
Supported JSON assertion fields include exists, equals, not_equals, contains, and type. Assertion evidence keeps scalar observed values inline, while large arrays and objects are compacted with observed_length, observed_key_count, and representative samples.
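To make the path and operator semantics concrete, here is a small evaluator for assertions of that shape. It is an illustration, not Riddle Proof's implementation: `getPath`, `typeOf`, and `checkAssertion` are hypothetical, equality is strict and intended for scalars, and `contains` is assumed to mean substring or array membership.

```javascript
// Resolve paths such as "status" or "checks[0].status" against parsed JSON.
function getPath(root, path) {
  const tokens = path.replace(/\[(\d+)\]/g, ".$1").split(".").filter(Boolean);
  let current = root;
  for (const token of tokens) {
    if (current === null || typeof current !== "object" || !(token in current)) {
      return { found: false, value: undefined };
    }
    current = current[token];
  }
  return { found: true, value: current };
}

// JSON-flavoured type names: arrays and null are reported separately.
function typeOf(value) {
  if (Array.isArray(value)) return "array";
  if (value === null) return "null";
  return typeof value;
}

// Evaluate one assertion and list which operators failed.
function checkAssertion(root, assertion) {
  const { found, value } = getPath(root, assertion.path);
  const failures = [];
  if ("exists" in assertion && found !== assertion.exists) failures.push("exists");
  if (!found && assertion.exists !== false) failures.push("path not found");
  if (found) {
    if ("equals" in assertion && value !== assertion.equals) failures.push("equals");
    if ("not_equals" in assertion && value === assertion.not_equals) failures.push("not_equals");
    if ("type" in assertion && typeOf(value) !== assertion.type) failures.push("type");
    if ("contains" in assertion) {
      const searchable = typeof value === "string" || Array.isArray(value);
      if (!searchable || !value.includes(assertion.contains)) failures.push("contains");
    }
  }
  return { path: assertion.path, passed: failures.length === 0, failures };
}

// Illustrative payload; treating artifacts.screenshots as an array is an assumption.
const payload = {
  status: "passed",
  environment_blocker: false,
  checks: [{ status: "passed" }],
  artifacts: { screenshots: ["desktop", "mobile"] },
};

const assertionResults = [
  { path: "status", equals: "passed" },
  { path: "environment_blocker", equals: false },
  { path: "checks", type: "array" },
  { path: "checks[0].status", not_equals: "failed" },
  { path: "artifacts.screenshots", contains: "desktop" },
].map((assertion) => checkAssertion(payload, assertion));

console.log(assertionResults);
```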
Use text_visible or selector_text_visible when CSS transforms, hydration, client rendering, hidden elements, or layout-specific copy should be judged exactly as the browser exposes it to users.
Profile Mode
Profile mode runs a durable proof contract against an existing Preview or production URL without requiring an implementation step. Use it for audits, regressions, release checks, and proof-of-change follow-ups that should stay on the same base Riddle Proof path.
A profile can prepare real browser state with setup_actions, mock integration boundaries with network_mocks, and then judge the final route, text, selectors, console, overflow, HTTP bodies, and embedded frames.
{
  "target": {
    "url": "https://preview.riddledc.com/s/pv_example/",
    "route": "/create",
    "network_mocks": [
      {
        "label": "save-fails-then-succeeds",
        "url": "**/api/save",
        "method": "POST",
        "repeat_responses": true,
        "responses": [
          { "status": 503, "json": { "error": { "message": "outage" } } },
          { "status": 200, "json": { "gameId": "saved-game" } }
        ],
        "request_body_contains": ["\"buildId\":\"build-123\""]
      },
      {
        "label": "slow-api",
        "url": "**/api/slow",
        "delay_ms": 2500,
        "json": { "ok": true }
      }
    ],
    "setup_actions": [
      { "type": "fill", "selector": "input", "value": "Proof target" },
      { "type": "click", "selector": "button", "text": "Save" },
      { "type": "screenshot", "label": "after-save" }
    ]
  },
  "checks": [
    { "type": "frame_text_visible", "selector": "iframe", "text": "Saved player" },
    { "type": "frame_url_equals", "selector": "iframe", "expected_url": "https://example.com/saved-game" }
  ]
}
Repeated mock responses make retry flows explicit. repeat_responses cycles configured responses per viewport, delay_ms simulates slow services, and request_body_contains records whether the browser sent the expected payload.
Use frame_text_visible, frame_url_equals, frame_url_matches, and frame_no_horizontal_overflow when the proof target is an embedded generated app, saved player, checkout frame, or similar iframe surface.
Profile Warnings
Profile runs can return nonblocking warnings when the contract is probably valid JSON but may not exercise the state you intended. The CLI also writes those warnings to summary.md under Profile Warnings so agents can fix weak profiles without digging through raw proof JSON.
One common warning is response shadowing inside network_mocks. When an earlier repeated response has broad request_body_contains fragments that are a subset of a later response, the earlier response can win first because Riddle Proof uses the first matching request-body response.
{
  "label": "save-fails-then-succeeds",
  "url": "**/api/save",
  "method": "POST",
  "repeat_responses": true,
  "responses": [
    { "status": 503, "request_body_contains": ["\"buildId\""] },
    { "status": 200, "request_body_contains": ["\"buildId\"", "\"retrySave\":true"] }
  ]
}
Tighten overlapping responses with more specific request_body_contains, request_body_not_contains, request_body_patterns, or request_body_not_patterns. Warnings do not fail the proof by themselves; they are evidence that the profile may be too ambiguous for a trustworthy audit.
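The shadowing problem is easier to see in code. The sketch below models first-match selection and a subset check over `request_body_contains` only; it is a simplified illustration, not Riddle Proof's matcher, and `firstMatchingResponse` and `findShadowedResponses` are hypothetical names. The `_not_contains` and pattern fields are ignored here.

```javascript
// True when every required fragment appears in the request body.
function matchesFragments(response, body) {
  return (response.request_body_contains || []).every((fragment) => body.includes(fragment));
}

// First-match selection: the earliest response whose fragments all match wins.
function firstMatchingResponse(responses, body) {
  return responses.find((response) => matchesFragments(response, body)) || null;
}

// Flag pairs where an earlier response's fragments are a subset of a later one's,
// so any body that matches the later response also matches the earlier one.
function findShadowedResponses(responses) {
  const warnings = [];
  for (let later = 1; later < responses.length; later++) {
    for (let earlier = 0; earlier < later; earlier++) {
      const broad = responses[earlier].request_body_contains || [];
      const specific = responses[later].request_body_contains || [];
      if (broad.length > 0 && broad.every((fragment) => specific.includes(fragment))) {
        warnings.push({ earlier, later });
      }
    }
  }
  return warnings;
}

// The responses from the warning example above.
const mockResponses = [
  { status: 503, request_body_contains: ['"buildId"'] },
  { status: 200, request_body_contains: ['"buildId"', '"retrySave":true'] },
];

// A retry request that was meant to hit the 200 response.
const retryBody = '{"buildId":"build-123","retrySave":true}';

console.log(firstMatchingResponse(mockResponses, retryBody).status);
console.log(findShadowedResponses(mockResponses));
```

Under this model the retry body still selects the 503 response, which is the situation the warning describes.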
Gotchas
- Proof criteria must be explicit. A vague proof run creates vague confidence.
- Selectors should be stable and intentional. Prefer test IDs, labels, roles, and user-visible text.
- Timeouts need to match the app. Long builds and remote previews should not inherit tiny local timeouts.
- Failure artifacts matter. Save the screenshot, console, route, and diagnostic state when proof fails.
- Do not leak secrets. Redact tokens, cookies, API keys, and private URLs from proof packets.
- Do not overclaim. Proof is evidence about behavior, not a substitute for human product judgment.
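For the secrets gotcha above, a redaction pass can run over every packet before it is saved or posted. This is a rough sketch with a hypothetical `redactPacket` helper: the key pattern and the bearer-token pattern are illustrative guesses, not a complete list, and a real project should tune them to its own field names.

```javascript
// Keys whose values are replaced wholesale. Illustrative, not exhaustive.
const SENSITIVE_KEY = /(token|cookie|authorization|api[_-]?key|secret|password)/i;

// Bearer credentials embedded in free-form strings.
const BEARER_VALUE = /Bearer\s+[A-Za-z0-9._~+\/-]+=*/g;

// Hypothetical helper: return a redacted deep copy of a proof packet.
function redactPacket(value) {
  if (Array.isArray(value)) {
    return value.map((item) => redactPacket(item));
  }
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value).map(([key, inner]) => [
        key,
        SENSITIVE_KEY.test(key) ? "[REDACTED]" : redactPacket(inner),
      ])
    );
  }
  if (typeof value === "string") {
    return value.replace(BEARER_VALUE, "Bearer [REDACTED]");
  }
  return value;
}

// Illustrative packet with invented secret values.
const unsafePacket = {
  ok: true,
  headers: { Authorization: "Bearer abc.def.ghi" },
  note: "request sent with Bearer abc123",
  checks: [{ idToken: "eyJhbGciOi", heading: "Checkout" }],
};

console.log(JSON.stringify(redactPacket(unsafePacket), null, 2));
```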
Packages
The reusable contracts live in @riddledc/riddle-proof. Agent-specific wrappers can build on those contracts without depending on another plugin runtime.
npm install @riddledc/riddle-proof
import { createRunResult, createRunState } from "@riddledc/riddle-proof";
import { runRiddleProof } from "@riddledc/riddle-proof/runner";
import { toRiddleProofRunParams } from "@riddledc/riddle-proof/openclaw";
The current package boundary standardizes run state, result shape, evidence bundles, proof assessment, ship metadata, adapter interfaces, and OpenClaw parameter normalization.