Riddle Proof

Evidence-backed workflows for agent-authored browser changes.

Plain text for agents: /docs/riddle-proof/markdown.md

What It Is

Riddle Proof is a workflow for turning an agent's code change into inspectable evidence. It runs the actual app in a browser, exercises the behavior that matters, captures artifacts, and records whether the change met the stated proof criteria.

Build output and unit tests are still important, but they do not prove that a user-facing browser flow works. Riddle Proof fills that gap with real previews, screenshots, JSON artifacts, console output, diagnostics, and ship gates.

Short version: bring your agent; Riddle brings the proof.

When To Use It

Use Riddle Proof when correctness depends on the browser, a running server, generated assets, timing, layout, or an integration boundary.

  • UI changes: routes, forms, responsive layouts, canvas, and visual state.
  • Preview deploys: asset paths, routing, environment variables, and production-like serving.
  • Rich browser apps: audio, animation, WebGL, generated media, and user gestures.
  • Agent shipping: PRs where a human should see evidence instead of trusting a summary.

Do not use proof as a replacement for focused code tests. The practical stack is: unit tests for logic, build checks for packaging, Riddle Proof for the browser behavior the user will actually touch.

Proof Loop

1. State Criteria (assertions) -> 2. Run App (preview) -> 3. Act Like User (script) -> 4. Capture (artifacts) -> 5. Judge (ship gate)

A useful proof run starts with explicit criteria: the route loads, a control changes state without resetting playback, a screenshot contains the expected panel, or an audio render stays below a clipping threshold.

The agent then runs the app, performs the user flow through Riddle, saves artifacts, and reports a compact result. A human can read that result directly, or a GitHub comment, Discord bot, CLI wrapper, or another integration can surface it.
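As a minimal sketch of what "explicit criteria" can look like in practice, the snippet below pairs each criterion with a predicate over state captured before and after the user flow. The names `criteria` and `judge`, the state fields, and the sample values are illustrative assumptions, not part of the Riddle Proof package.

```javascript
// Hypothetical criteria list: each entry pairs a human-readable label
// with a predicate over the state captured before and after the user flow.
const criteria = [
  { label: "route loads", check: (before, after) => after.route === "/app" },
  {
    label: "playback stays running while mixer changes",
    check: (before, after) => before.isPlaying && after.isPlaying,
  },
  {
    label: "step counter advances",
    check: (before, after) => after.currentStep > before.currentStep,
  },
];

// Evaluate every criterion and keep per-criterion results, so the report
// shows what was exercised instead of a single pass/fail flag.
function judge(criteriaList, before, after) {
  const results = criteriaList.map(({ label, check }) => ({
    label,
    passed: Boolean(check(before, after)),
  }));
  return { ok: results.every((r) => r.passed), results };
}

// Illustrative snapshots, as a proof script might capture them.
const before = { route: "/app", isPlaying: true, currentStep: 4 };
const after = { route: "/app", isPlaying: true, currentStep: 22 };
console.log(JSON.stringify(judge(criteria, before, after), null, 2));
```

Keeping the label next to the check means a failed run can name the exact criterion that broke.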

Preview Choice

Preview choice is part of the proof contract. Use the mode that matches the artifact you need to prove.

For static exports, build locally and run proof against the exported directory with static preview. For server apps, use server preview. For Dockerfile-based apps, use build preview. See Preview Modes for the full decision table.

Prefer a production-shaped preview for merge proof. It catches problems such as base paths, hashed assets, SPA routing, MIME types, missing static files, and server startup issues. Use a faster local or dev preview while iterating, then run the production-shaped preview before merge.
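The three cases named above can be sketched as a small helper. This is an illustration only: the input flags, the order of precedence, and the `static-preview` and `build-preview` strings are assumptions, and the Preview Modes table remains the authority.

```javascript
// Hypothetical helper: map what the repository ships to one of the preview
// modes named in the text. Flags, precedence, and return values are assumptions.
function choosePreview(app) {
  if (app.hasDockerfile) return "build-preview"; // Dockerfile-based apps
  if (app.hasServer) return "server-preview"; // apps that need a running server
  if (app.staticExportDir) return "static-preview"; // locally built static export
  throw new Error("No preview mode matches this app; state the target explicitly.");
}

console.log(choosePreview({ staticExportDir: "out" }));
console.log(choosePreview({ hasServer: true }));
```

Throwing on an unmatched app keeps the preview choice explicit, so a run never silently falls back to a mode that proves the wrong artifact.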

Good proof targets

Exercise the exact route, selector, user gesture, and state transition that motivated the change. Save screenshots and JSON artifacts at the end, not only at page load.

Proof Packets

A proof packet is the durable record of what happened. It should be small enough to read quickly and specific enough to debug without rerunning the job.

{
  "version": "riddle-proof.capture-diagnostic.v1",
  "ok": true,
  "tool": "server-preview",
  "target": "https://preview.example.com/app",
  "timestamp": "2026-04-20T00:00:00.000Z",
  "assertions": [
    "route loads",
    "playback stays running while mixer changes",
    "screenshot includes trainer lane"
  ],
  "artifacts": [
    { "name": "after-mixer-change.png", "role": "after_proof" },
    { "name": "live-control-proof.json", "role": "diagnostic" }
  ],
  "diagnostics": [
    { "label": "currentStep", "before": 4, "after": 22 },
    { "label": "isPlaying", "value": true }
  ]
}

The exact fields vary by integration, but the shape should stay boring: version, target, criteria, outcome, artifacts, and compact diagnostics. Avoid proof output that only says "passed" without showing what was actually exercised.
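One way to enforce that shape is a small check that runs before a packet is published. The sketch below is not part of the package: `checkPacket` is a hypothetical helper, and the set of required fields is an assumption based on the example packet above.

```javascript
// Hypothetical validator: returns a list of problems. An empty list means
// the packet has the minimum shape described in the text.
function checkPacket(packet) {
  const problems = [];
  for (const field of ["version", "target", "timestamp"]) {
    if (typeof packet[field] !== "string" || packet[field] === "") {
      problems.push(`missing ${field}`);
    }
  }
  if (typeof packet.ok !== "boolean") problems.push("missing ok");
  if (!Array.isArray(packet.assertions) || packet.assertions.length === 0) {
    problems.push("no assertions: packet does not say what was exercised");
  }
  if (!Array.isArray(packet.artifacts) || packet.artifacts.length === 0) {
    problems.push("no artifacts");
  }
  return problems;
}

// A bare "passed" result is rejected with specific reasons.
console.log(checkPacket({ ok: true }));
```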

App Contracts

Proof works best when the app exposes a small, intentional diagnostic contract instead of forcing the agent to infer everything from pixels.

For browser apps, that can be a window hook, stable DOM attributes, or a debug endpoint available only in preview/test contexts. The contract should expose user-visible state, not private secrets.

// Example browser proof hook
// isPlaying, currentStep, and selectedInstrument stand for the app's own state variables.
window.RIDDLE_APP_PROOF = {
  version: "my-app.proof.v1",
  getState() {
    return {
      route: location.pathname,
      isPlaying,
      currentStep,
      selectedInstrument
    };
  },
  async captureDiagnostic() {
    return {
      version: "riddle-proof.capture-diagnostic.v1",
      ok: true,
      state: this.getState()
    };
  }
};

Keep the contract stable, redacted, and deliberately small. It should help the proof script confirm behavior, not become a second application API.
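A proof script can turn two `getState()` snapshots into the compact diagnostics a packet carries. The helper below is a sketch under assumptions: `diffState` is a hypothetical name, and it only does a shallow comparison of top-level values.

```javascript
// Build compact diagnostics from two getState() snapshots.
// Changed keys become { label, before, after }; unchanged keys become { label, value }.
// Comparison is shallow (Object.is), so nested objects always count as changed
// unless they are the same reference.
function diffState(before, after) {
  const keys = new Set([...Object.keys(before), ...Object.keys(after)]);
  return [...keys].map((label) =>
    Object.is(before[label], after[label])
      ? { label, value: after[label] }
      : { label, before: before[label], after: after[label] }
  );
}

// Illustrative snapshots taken before and after a mixer change.
const beforeMixer = { isPlaying: true, currentStep: 4 };
const afterMixer = { isPlaying: true, currentStep: 22 };
console.log(diffState(beforeMixer, afterMixer));
```

Small, flat state keeps this diff readable, which is one more reason to keep the contract deliberately small.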

Audio And Rich Apps

Riddle Proof is useful for audio and visual apps, but it should be honest about what it can prove.

For audio, deterministic rendered-buffer checks can prove timing, clipping, silence, headroom, and state transitions. Live browser proof can confirm the UI route, playback controls, screenshots, and analyzer-visible activity. Neither one replaces listening when the question is taste, tone, or mix feel.

Good standard: prove that the app behaves correctly, then listen before claiming that it sounds good.
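A deterministic rendered-buffer check can be as simple as scanning the samples for peak level and energy. The sketch below assumes mono samples normalized to the range -1 to 1; `analyzeBuffer` and both thresholds are illustrative choices, not Riddle Proof defaults.

```javascript
// Scan a rendered buffer of normalized samples (-1..1) and report peak,
// RMS, clipping, silence, and headroom in dB below full scale.
function analyzeBuffer(samples, { clipThreshold = 0.99, silenceThreshold = 1e-4 } = {}) {
  let peak = 0;
  let sumSquares = 0;
  for (const s of samples) {
    const magnitude = Math.abs(s);
    if (magnitude > peak) peak = magnitude;
    sumSquares += s * s;
  }
  const rms = samples.length > 0 ? Math.sqrt(sumSquares / samples.length) : 0;
  return {
    peak,
    rms,
    clipped: peak >= clipThreshold, // at or above the threshold counts as clipping
    silent: peak < silenceThreshold, // nothing audible was rendered
    headroomDb: peak > 0 ? -20 * Math.log10(peak) : Infinity,
  };
}

// Illustrative input: ten cycles of a sine wave at half of full scale.
const tone = Float32Array.from({ length: 480 }, (_, i) => 0.5 * Math.sin((2 * Math.PI * i) / 48));
console.log(analyzeBuffer(tone));
```

Numbers like these belong in the packet's diagnostics. They show that a render is neither clipped nor silent, and they say nothing about whether it sounds good.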

Fast Vs Full Proof

Tier | Use | Typical Evidence
fast | Inner-loop iteration while the agent is still editing. | One route, one screenshot, one JSON state capture.
merge | Before PR merge or deploy. | Built server preview, key user flow, artifacts, diagnostics, and CI.
prod | After deploy or for live smoke checks. | Production URL, public route, post-deploy screenshot and state check.

Fast proof keeps iteration speed high. Full proof (the merge and prod tiers) should be slower and stricter because it is the evidence humans rely on when reviewing or merging.

Gotchas

  • Proof criteria must be explicit. A vague proof run creates vague confidence.
  • Selectors should be stable and intentional. Prefer test IDs, labels, roles, and user-visible text.
  • Timeouts need to match the app. Long builds and remote previews should not inherit tiny local timeouts.
  • Failure artifacts matter. Save the screenshot, console, route, and diagnostic state when proof fails.
  • Do not leak secrets. Redact tokens, cookies, API keys, and private URLs from proof packets.
  • Do not overclaim. Proof is evidence about behavior, not a substitute for human product judgment.
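The redaction rule can be applied mechanically before a packet leaves the runner. The sketch below is one possible approach, not a package feature: `redact` is a hypothetical helper and the key pattern is an assumption to tune per app. It matches on key names only, so secrets embedded in values, such as a private URL, still need separate handling.

```javascript
// Key names treated as sensitive. This pattern is an assumption; extend it per app.
const SENSITIVE_KEY = /token|cookie|secret|password|api[-_]?key|authorization/i;

// Return a deep copy of the value with every sensitive key's value replaced.
function redact(value) {
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value).map(([key, inner]) =>
        SENSITIVE_KEY.test(key) ? [key, "[REDACTED]"] : [key, redact(inner)]
      )
    );
  }
  return value; // primitives pass through unchanged
}

// Illustrative packet fragment with values that must not be published.
const rawPacket = {
  target: "https://preview.example.com/app",
  request: { headers: { Authorization: "Bearer abc123", cookie: "session=xyz" } },
  artifacts: [{ name: "state.json", apiKey: "k-123" }],
};
console.log(JSON.stringify(redact(rawPacket), null, 2));
```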

Packages

The reusable contracts live in @riddledc/riddle-proof. Agent-specific wrappers can build on those contracts without depending on another plugin runtime.

npm install @riddledc/riddle-proof

import { createRunResult, createRunState } from "@riddledc/riddle-proof";
import { runRiddleProof } from "@riddledc/riddle-proof/runner";
import { toRiddleProofRunParams } from "@riddledc/riddle-proof/openclaw";

The current package boundary standardizes run state, result shape, evidence bundles, proof assessment, ship metadata, adapter interfaces, and OpenClaw parameter normalization.

Repo: https://github.com/riddledc/integrations