# Riddle Proof

Evidence-backed workflows for agent-authored browser changes.

Plain text for agents: https://riddledc.com/docs/riddle-proof/markdown.md

## What It Is

Riddle Proof is a workflow for turning an agent's code change into inspectable evidence. It runs the actual app in a browser, exercises the behavior that matters, captures artifacts, and records whether the change met the stated proof criteria.

Build output and unit tests are still important, but they do not prove that a user-facing browser flow works. Riddle Proof fills that gap with real previews, screenshots, JSON artifacts, console output, diagnostics, and ship gates.

The contract is agent-agnostic: use it from Codex, Claude Code, OpenClaw, CI, or any workflow that can run a browser proof script and preserve the resulting evidence.

Short version: bring your agent; Riddle brings the proof.

## When To Use It

Use Riddle Proof when correctness depends on the browser, a running server, generated assets, timing, layout, or an integration boundary.

- UI changes: routes, forms, responsive layouts, canvas, and visual state.
- Preview deploys: asset paths, routing, environment variables, and production-like serving.
- Rich browser apps: audio, animation, WebGL, generated media, and user gestures.
- Agent shipping: PRs where a human should see evidence instead of trusting a summary.

Do not use proof as a replacement for focused code tests. The practical stack is: unit tests for logic, build checks for packaging, Riddle Proof for the browser behavior the user will actually touch.

## Proof Loop

1. State criteria: assertions.
2. Run app: preview.
3. Act like user: script.
4. Capture: artifacts.
5. Judge: ship gate.

A useful proof run starts with explicit criteria: the route loads, a control changes state without resetting playback, a screenshot contains the expected panel, or an audio render stays below a clipping threshold.

The agent then runs the app, performs the user flow through Riddle, saves artifacts, and reports a compact result that can be read by a human, GitHub comment, Discord bot, CLI wrapper, or another integration.
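The judging step of the loop can be sketched as a small pure function. This is a minimal illustration, not part of the Riddle API: `Criterion`, `Verdict`, `judgeProof`, and the `PlayerState` shape are hypothetical names chosen for the example.

```typescript
// Hypothetical shapes for illustration; not part of the Riddle API.
type Criterion<S> = {
  label: string;
  check: (state: S) => boolean;
};

type Verdict = {
  ok: boolean;
  passed: string[];
  failed: string[];
};

// Judge captured state against explicit criteria.
// An empty criteria list is rejected so a run cannot pass vacuously.
function judgeProof<S>(criteria: Criterion<S>[], state: S): Verdict {
  if (criteria.length === 0) {
    throw new Error("proof needs at least one explicit criterion");
  }
  const passed: string[] = [];
  const failed: string[] = [];
  for (const criterion of criteria) {
    (criterion.check(state) ? passed : failed).push(criterion.label);
  }
  return { ok: failed.length === 0, passed, failed };
}

// Example state a proof script might capture after acting like a user.
type PlayerState = { route: string; isPlaying: boolean; peak: number };

const playerCriteria: Criterion<PlayerState>[] = [
  { label: "route loads", check: (s) => s.route === "/mixer" },
  { label: "playback stays running while mixer changes", check: (s) => s.isPlaying },
  { label: "audio render stays below clipping threshold", check: (s) => s.peak < 1.0 },
];

const playerVerdict = judgeProof(playerCriteria, {
  route: "/mixer",
  isPlaying: true,
  peak: 0.82,
});
```

Keeping the labels next to the checks means the failed list doubles as the human-readable explanation in the final report.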

## From Agent Change To Proof Bundle

Riddle Proof is the evidence layer between an agent saying "I changed the app" and a human deciding whether the change is ready.

The agent should not only summarize the diff. It should show the route it tested, the preview it used, the exact proof criteria, the artifacts it captured, and the last runtime phase if anything got stuck.

1. Change: agent edits code.
2. Preview: static, server, or build preview.
3. Proof: browser script.
4. Bundle: verdict and artifacts.

```json
{
  "version": "riddle-proof.agent-change.v1",
  "status": "complete",
  "phase": "complete",
  "phase_updated_at": "2026-04-20T17:20:00.000Z",
  "phase_details": { "outputs": 5 },
  "preview": {
    "mode": "server-preview",
    "url": "https://preview.riddledc.com/s/sp_1234abcd/",
    "route": "/checkout"
  },
  "proof": {
    "ok": true,
    "script": "checkout-submit-smoke",
    "assertions": [
      "checkout page loads",
      "submit button stays enabled",
      "success message appears",
      "no console errors"
    ]
  },
  "result": {
    "ok": true,
    "checks": {
      "heading": "Checkout",
      "successMessage": "Order received"
    }
  },
  "artifacts": [
    { "name": "checkout-before.png", "content_type": "image/png" },
    { "name": "checkout-after.png", "content_type": "image/png" },
    { "name": "console.json", "content_type": "application/json" },
    { "name": "network.har", "content_type": "application/json" }
  ],
  "script_error": null
}
```

A proof bundle is not a guarantee that the whole product is correct. It is reviewable evidence that the selected browser-facing behavior was exercised in a real preview and passed the stated criteria.
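As one way to carry a bundle into review, the sketch below turns the fields shown above into a short plain-text comment. The `ProofBundle` type covers only a subset of the sample, `formatBundleComment` is a hypothetical helper, and treating `script_error` as a string when it is not `null` is an assumption.

```typescript
// Subset of the bundle fields shown above; a real bundle may carry more.
type ProofBundle = {
  phase: string;
  preview: { mode: string; url: string; route: string };
  proof: { ok: boolean; script: string; assertions: string[] };
  artifacts: { name: string; content_type: string }[];
  // The sample shows null; a string error message is an assumption.
  script_error: string | null;
};

// Render a compact, reviewable summary for a PR comment or chat message.
function formatBundleComment(bundle: ProofBundle): string {
  const passed = bundle.proof.ok && bundle.script_error === null;
  const lines: string[] = [
    `Riddle Proof: ${passed ? "PASS" : "FAIL"} (${bundle.proof.script})`,
    `Preview: ${bundle.preview.mode} at ${bundle.preview.url}, route ${bundle.preview.route}`,
    `Last phase: ${bundle.phase}`,
    "Criteria:",
  ];
  bundle.proof.assertions.forEach((assertion) => lines.push(`- ${assertion}`));
  lines.push(`Artifacts: ${bundle.artifacts.map((a) => a.name).join(", ")}`);
  if (bundle.script_error !== null) {
    lines.push(`Script error: ${bundle.script_error}`);
  }
  return lines.join("\n");
}

const checkoutBundle: ProofBundle = {
  phase: "complete",
  preview: {
    mode: "server-preview",
    url: "https://preview.riddledc.com/s/sp_1234abcd/",
    route: "/checkout",
  },
  proof: {
    ok: true,
    script: "checkout-submit-smoke",
    assertions: ["checkout page loads", "success message appears"],
  },
  artifacts: [
    { name: "checkout-after.png", content_type: "image/png" },
    { name: "console.json", content_type: "application/json" },
  ],
  script_error: null,
};

const checkoutComment = formatBundleComment(checkoutBundle);
```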

## What Proof Changes

Proof changes the agent loop from "I made the edit" to "I exercised the edited behavior and saved evidence."

- Fewer blind handoffs: the proof bundle carries the URL, criteria, screenshots, console summary, and verdict into review.
- Better failure location: phase metadata shows whether a job failed while building, waiting for readiness, running browser proof, or collecting artifacts.
- Real-device layout pressure: viewport checks catch missing CSS, horizontal overflow, low contrast, and mobile clipping before a human sees the page.
- Unexpected coverage: a proof run pays rent when it catches the bug you meant to check and the nearby regressions you did not know to ask about yet.
- Reusable judgment: a human, GitHub check, Discord bot, or another agent can read the same compact proof object without scrolling terminal history.

The strongest proof runs are boring and concrete: name the behavior, open the preview, perform the action, capture artifacts, and fail loudly when the evidence does not match the claim.

## Example Proof Bundle

Riddle uses the same workflow on its own docs. The readable example is at https://riddledc.com/examples/riddle-proof, and the raw agent-facing JSON is at https://riddledc.com/examples/riddle-proof/docs-live-proof-bundle.json.

That bundle records a live URL proof against this docs page, the preview docs, the docs index, and the launch blog post. It includes the verdict, route checks, screenshot artifacts, console summary, HAR artifact, duration, and caveat.

```json
{
  "version": "riddle-proof.example-bundle.v1",
  "status": "completed",
  "phase": "complete",
  "target": {
    "mode": "live-url",
    "url": "https://riddledc.com/docs/riddle-proof"
  },
  "proof": {
    "ok": true,
    "assertions": [
      "Riddle Proof docs load",
      "Preview docs explain preview modes",
      "Browser console is clean"
    ]
  },
  "artifacts": [
    { "name": "riddle-proof-docs.png", "role": "primary_screenshot" },
    { "name": "console.json", "role": "console_summary" },
    { "name": "network.har", "role": "network_trace" }
  ],
  "script_error": null
}
```

Use the structure, not the exact checklist. A checkout flow, canvas app, docs page, or audio tool will need different assertions. The stable part is the handoff: target, criteria, verdict, artifacts, phase, and caveat.

## Preview Choice

Preview choice is part of the proof contract. Use the mode that matches the artifact you need to prove.

For static exports, build locally and run the proof against the exported directory with static preview. For server apps, use server preview. For Dockerfile-based apps, use build preview. See [Preview Modes](/docs/preview) for the full decision table.
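The mapping above can be written down as a lookup so an agent does not pick a preview mode ad hoc. This is a sketch under assumptions: `"server-preview"` appears in the sample bundles, while `"static-preview"` and `"build-preview"` are mode strings assumed by analogy, and `AppKind`, `ProofStage`, and `planPreview` are hypothetical names.

```typescript
// Hypothetical classification of the app being proved.
type AppKind = "static-export" | "server-app" | "dockerfile-app";

// "server-preview" appears in the sample bundles; the other two strings are assumed.
type PreviewMode = "static-preview" | "server-preview" | "build-preview";

type ProofStage = "iterating" | "merge";

const PREVIEW_MODE_BY_KIND: Record<AppKind, PreviewMode> = {
  "static-export": "static-preview",
  "server-app": "server-preview",
  "dockerfile-app": "build-preview",
};

// Pick the preview mode and record whether the run must use a production-shaped build.
function planPreview(kind: AppKind, stage: ProofStage) {
  return {
    mode: PREVIEW_MODE_BY_KIND[kind],
    // Merge proof should exercise the built artifact, not a dev server.
    requireProductionBuild: stage === "merge",
  };
}

const mergePlan = planPreview("static-export", "merge");
```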

Prefer a production-shaped preview for merge proof. It catches problems such as base paths, hashed assets, SPA routing, MIME types, missing static files, and server startup issues. Use a faster local or dev preview while iterating, then run the production-shaped preview before merge.

Bring your deploy host. Riddle Proof does not replace AWS Amplify, Vercel, Netlify, GitHub Pages, or a custom deploy pipeline. It sits after those systems: wait for a preview or live URL, run browser proof, and return evidence. This site uses AWS Amplify for production deploys and Riddle Proof for post-deploy browser verification.

Good proof targets exercise the exact route, selector, user gesture, and state transition that motivated the change. Save screenshots and JSON artifacts at the end, not only at page load.

## Proof Packets

A proof packet is the durable record of what happened. It should be small enough to read quickly and specific enough to debug without rerunning the job.

```json
{
  "version": "riddle-proof.capture-diagnostic.v1",
  "ok": true,
  "tool": "server-preview",
  "target": "https://preview.example.com/app",
  "timestamp": "2026-04-20T00:00:00.000Z",
  "assertions": [
    "route loads",
    "playback stays running while mixer changes",
    "screenshot includes trainer lane"
  ],
  "artifacts": [
    { "name": "after-mixer-change.png", "role": "after_proof" },
    { "name": "live-control-proof.json", "role": "diagnostic" }
  ],
  "diagnostics": [
    { "label": "currentStep", "before": 4, "after": 22 },
    { "label": "isPlaying", "value": true }
  ]
}
```

The exact fields vary by integration, but the shape should stay boring: version, target, criteria, outcome, artifacts, and compact diagnostics. Avoid proof output that only says "passed" without showing what was actually exercised.
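A wrapper can enforce that shape before accepting a packet. The check below is a minimal sketch: `ProofPacket` mirrors the sample fields, and `findPacketProblems` is a hypothetical helper rather than something the package is known to export.

```typescript
// Fields mirror the sample packet; all optional so incomplete packets can be inspected.
type ProofPacket = {
  version?: string;
  ok?: boolean;
  target?: string;
  assertions?: string[];
  artifacts?: { name: string; role: string }[];
};

// Return a list of reasons the packet is not reviewable; an empty list means it has the expected shape.
function findPacketProblems(packet: ProofPacket): string[] {
  const problems: string[] = [];
  if (!packet.version) problems.push("missing version");
  if (!packet.target) problems.push("missing target");
  if (typeof packet.ok !== "boolean") problems.push("missing outcome (ok)");
  if (!packet.assertions || packet.assertions.length === 0) {
    problems.push("no criteria listed");
  }
  if (!packet.artifacts || packet.artifacts.length === 0) {
    problems.push("no artifacts captured");
  }
  return problems;
}

// A packet that only says "passed" is rejected with concrete reasons.
const bareProblems = findPacketProblems({ ok: true });
```

Returning reasons instead of a boolean keeps the rejection itself inspectable, in the same spirit as the packet.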

## Live URL Proof

Use live URL proof when the app is already deployed and the question is whether public routes, links, copy, layout, or browser behavior work in production.

This is the simplest production smoke test: visit the live URL, assert the important text or selectors, save screenshots, save a compact JSON proof packet, and fail the script if any check is false.

```js
// `page`, `saveScreenshot`, and `saveJson` are used here as globals of the Riddle script runtime.
await page.goto("https://example.com/docs", { waitUntil: "domcontentloaded" });
await page.waitForSelector("h1");

// Collect the user-visible facts the proof depends on.
const bodyText = await page.locator("body").innerText();
const checks = {
  heading: await page.locator("h1").first().innerText(),
  hasInstallStep: bodyText.includes("npm install"),
  hasPreviewLink: bodyText.includes("Preview Modes")
};

// The heading is recorded as evidence; the verdict depends on the two boolean checks.
const proof = {
  version: "riddle-proof.live-url.v1",
  target: "https://example.com/docs",
  ok: checks.hasInstallStep && checks.hasPreviewLink,
  checks
};

// Save artifacts before failing so a failed run still leaves evidence.
await saveScreenshot("docs-live", { fullPage: true });
await saveJson("live-url-proof", proof);
if (!proof.ok) throw new Error(JSON.stringify(checks));
return proof;
```

For script runs that return data, set `options.returnResult: true` or add `"result"` to the `include` array. In that case Riddle returns JSON, including the returned result and artifact names and URLs, instead of switching to a raw PNG response.

## Minimal Live Proof Request

This is the smallest useful live URL proof shape: load a route, assert a user-visible condition, save a screenshot, return JSON, and ask Riddle for a JSON response.

```bash
curl https://api.riddledc.com/v1/run \
  -H "Authorization: Bearer $RIDDLE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://riddledc.com/docs/riddle-proof",
    "script": "await page.waitForSelector(\"h1\"); const heading = await page.locator(\"h1\").first().innerText(); const ok = heading.includes(\"Riddle Proof\"); await saveScreenshot(\"docs-live\", { fullPage: true }); await saveJson(\"docs-live-proof\", { ok, heading }); if (!ok) throw new Error(heading); return { ok, heading };",
    "include": ["screenshots", "result", "console"],
    "options": { "returnResult": true }
  }'
```

Use placeholders in docs, scripts, and PR comments. Do not paste live API keys, cookies, private preview URLs, or application secrets into proof bundles.
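Hand-escaping a script inside a JSON string is easy to get wrong, so it can help to build the request body in code. The sketch below produces the same body as the curl example; `buildLiveProofRequest` is a hypothetical helper, and the body fields are the ones shown above.

```typescript
// Build the JSON body for a live URL proof; field names follow the curl example above.
function buildLiveProofRequest(url: string, expectedHeading: string) {
  // JSON.stringify yields a safely quoted string literal for embedding in the script.
  const expected = JSON.stringify(expectedHeading);
  const script = [
    'await page.waitForSelector("h1");',
    'const heading = await page.locator("h1").first().innerText();',
    `const ok = heading.includes(${expected});`,
    'await saveScreenshot("docs-live", { fullPage: true });',
    'await saveJson("docs-live-proof", { ok, heading });',
    "if (!ok) throw new Error(heading);",
    "return { ok, heading };",
  ].join(" ");

  return {
    url,
    script,
    include: ["screenshots", "result", "console"],
    options: { returnResult: true },
  };
}

const liveRequest = buildLiveProofRequest(
  "https://riddledc.com/docs/riddle-proof",
  "Riddle Proof",
);

// Serialize once; the API key stays in the Authorization header, never in the body.
const liveRequestBody = JSON.stringify(liveRequest);
```

Send `liveRequestBody` to the endpoint from the curl example with the same headers, reading the key from the environment rather than from source.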

## Result Shape

A good proof result should be compact enough for an agent to paste into a PR comment, but complete enough to debug without rerunning the job.

```json
{
  "ok": true,
  "response_mode": "json",
  "result": { "ok": true, "checks": { "heading": "Riddle Proof" } },
  "artifacts": [
    { "name": "docs-live.png", "url": "https://...", "content_type": "image/png" },
    { "name": "live-url-proof.json", "url": "https://...", "content_type": "application/json" }
  ],
  "artifacts_url": "https://..."
}
```

For preview jobs, wrappers should also surface `phase`, `phase_updated_at`, `phase_details`, `script_error`, and `outputs`. Timeout reports should include the last known phase so the next agent knows whether the job was still building, waiting for readiness, or inside browser proof.
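A timeout report along those lines might look like the sketch below. The field names come from the list above; the phase value `"building"` and the helper `describeTimeout` are hypothetical, since the exact phase strings depend on the integration.

```typescript
// Snapshot of the last job state a wrapper saw before giving up.
type PreviewJobSnapshot = {
  phase?: string;
  phase_updated_at?: string;
  script_error?: string | null;
};

// Describe a timeout so the next agent knows where the job was stuck.
function describeTimeout(job: PreviewJobSnapshot, nowIso: string): string {
  const phase = job.phase ?? "unknown";
  let message = `Timed out; last known phase: ${phase}`;
  if (job.phase_updated_at) {
    const stalledMs = Date.parse(nowIso) - Date.parse(job.phase_updated_at);
    if (!isNaN(stalledMs)) {
      message += ` (no phase change for ${Math.round(stalledMs / 1000)}s)`;
    }
  }
  if (job.script_error) {
    message += `; script error: ${job.script_error}`;
  }
  return message;
}

const timeoutReport = describeTimeout(
  { phase: "building", phase_updated_at: "2026-04-20T17:20:00.000Z" },
  "2026-04-20T17:21:30.000Z",
);
```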

## App Contracts

Proof works best when the app exposes a small, intentional diagnostic contract instead of forcing the agent to infer everything from pixels.

For browser apps, that can be a window hook, stable DOM attributes, or a debug endpoint available only in preview/test contexts. The contract should expose user-visible state, not private secrets.

```js
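// Example browser proof hook. `isPlaying`, `currentStep`, and `selectedInstrument`
// stand in for state the app already tracks.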
window.__RIDDLE_APP_PROOF__ = {
  version: "my-app.proof.v1",
  getState() {
    return {
      route: location.pathname,
      isPlaying,
      currentStep,
      selectedInstrument
    };
  },
  async captureDiagnostic() {
    return {
      version: "riddle-proof.capture-diagnostic.v1",
      ok: true,
      state: this.getState()
    };
  }
};
```

Keep the contract stable, redacted, and deliberately small. It should help the proof script confirm behavior, not become a second application API.
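On the proof side, a script can read that hook instead of scraping pixels. The sketch below assumes a Playwright-style `page.evaluate`; `PageLike` is a minimal stand-in type so the example is self-contained, and `readAppProofState` and `playbackSurvived` are hypothetical helpers.

```typescript
// State shape returned by the example hook's getState().
type AppProofState = {
  route: string;
  isPlaying: boolean;
  currentStep: number;
  selectedInstrument: string;
};

// Minimal stand-in for a Playwright-style page; only evaluate() is needed here.
type PageLike = {
  evaluate<T>(fn: () => T): Promise<T>;
};

// Read state through the hook, failing loudly if the app does not expose it.
async function readAppProofState(page: PageLike): Promise<AppProofState> {
  const state = await page.evaluate(() => {
    // In the browser, globalThis is the window object.
    const hook = (globalThis as any).__RIDDLE_APP_PROOF__;
    return hook ? (hook.getState() as AppProofState) : null;
  });
  if (state === null) {
    throw new Error("__RIDDLE_APP_PROOF__ is not exposed on this page");
  }
  return state;
}

// Playback survived a control change if it was playing before and after
// and the step counter moved in between.
function playbackSurvived(before: AppProofState, after: AppProofState): boolean {
  return before.isPlaying && after.isPlaying && after.currentStep !== before.currentStep;
}
```

A proof script would call `readAppProofState` before and after the user gesture, then save both snapshots as diagnostics alongside the verdict.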

## Audio And Rich Apps

Riddle Proof is useful for audio and visual apps, but it should be honest about what it can prove.

For audio, deterministic rendered-buffer checks can prove timing, clipping, silence, headroom, and state transitions. Live browser proof can confirm the UI route, playback controls, screenshots, and analyzer-visible activity. Neither one replaces listening when the question is taste, tone, or mix feel.

Good standard: prove that the app behaves correctly, then listen before claiming that it sounds good.
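A deterministic rendered-buffer check can be as small as the sketch below. It assumes the render is available as a `Float32Array` of samples in the range -1 to 1; `analyzeRenderedBuffer` and its default thresholds are illustrative choices, not values defined by Riddle.

```typescript
type BufferReport = {
  peak: number;           // largest absolute sample value
  headroomDb: number;     // distance from peak to full scale, in dB
  clippedSamples: number; // samples at or above the clip level
  isSilent: boolean;      // true when no sample exceeds the silence floor
};

// Measure clipping, silence, and headroom for one channel of rendered audio.
function analyzeRenderedBuffer(
  samples: Float32Array,
  clipLevel: number = 1.0,
  silenceFloor: number = 1e-4,
): BufferReport {
  let peak = 0;
  let clippedSamples = 0;
  samples.forEach((sample) => {
    const magnitude = Math.abs(sample);
    if (magnitude > peak) peak = magnitude;
    if (magnitude >= clipLevel) clippedSamples += 1;
  });
  return {
    peak,
    // Full scale is 1.0, so headroom is -20 * log10(peak); silence has unbounded headroom.
    headroomDb: peak > 0 ? -20 * Math.log10(peak) : Infinity,
    clippedSamples,
    isSilent: peak <= silenceFloor,
  };
}

const halfScaleReport = analyzeRenderedBuffer(new Float32Array([0, 0.5, -0.25]));
```

A proof script could save the report as a JSON artifact and fail when `clippedSamples` is nonzero or `isSilent` is true for a passage that should be audible.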

## Authenticated Proof

Many useful browser changes live behind login. A proof workflow should be able to start from an authenticated browser state without asking the agent to replay a slow login flow every time.

Riddle accepts cookies, headers, and `localStorage` values for protected pages. For SPA auth, pass the same storage keys the app already uses, then run the proof against the protected route. Keep the token source outside source control and pass it through an environment variable, secret store, or local JSON file.

```json
{
  "url": "https://app.example.com/dashboard",
  "script": "await page.waitForSelector('h1'); await saveScreenshot('dashboard'); return { ok: true };",
  "include": ["screenshots", "result", "console"],
  "options": {
    "returnResult": true,
    "localStorage": {
      "CognitoIdentityServiceProvider.CLIENT_ID.USER.idToken": "JWT_FROM_SECRET_STORE"
    }
  }
}
```

Auth safety: screenshots, result JSON, and console summaries are usually the right default. HAR/network evidence can include authorization headers or signed URLs, so treat it as explicit opt-in evidence for authenticated runs.
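One way to keep the token out of source is to build the request at run time from a value loaded elsewhere. The sketch below reuses the request fields from the example above; `buildAuthenticatedProofRequest` is a hypothetical helper, and how the token is loaded (environment variable, secret store) is left to the caller.

```typescript
type AuthProofParams = {
  url: string;
  script: string;
  storageKey: string; // the localStorage key the app already uses for its token
  token: string;      // loaded by the caller from a secret source, never hard-coded
};

// Build an authenticated proof request with conservative evidence defaults.
function buildAuthenticatedProofRequest(params: AuthProofParams) {
  if (params.token.trim() === "") {
    throw new Error("auth token is empty; load it from a secret store or environment variable");
  }
  return {
    url: params.url,
    script: params.script,
    // Screenshots, result JSON, and console only; network evidence stays opt-in.
    include: ["screenshots", "result", "console"],
    options: {
      returnResult: true,
      localStorage: { [params.storageKey]: params.token },
    },
  };
}

const dashboardRequest = buildAuthenticatedProofRequest({
  url: "https://app.example.com/dashboard",
  script: "await page.waitForSelector('h1'); await saveScreenshot('dashboard'); return { ok: true };",
  storageKey: "CognitoIdentityServiceProvider.CLIENT_ID.USER.idToken",
  token: "JWT_FROM_SECRET_STORE", // placeholder for the example
});
```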

## Fast Vs Full Proof

| Tier | Use | Typical Evidence |
| --- | --- | --- |
| fast | Inner-loop iteration while the agent is still editing. | One route, one screenshot, one JSON state capture. |
| merge | Before PR merge or deploy. | Built server preview, key user flow, artifacts, diagnostics, and CI. |
| prod | After deploy or for live smoke checks. | Production URL, public route, post-deploy screenshot and state check. |

Fast proof keeps iteration speed high. Full proof (the merge and prod tiers) should be slower and stricter because it is the evidence humans rely on when reviewing or merging.

## Gotchas

- Proof criteria must be explicit. A vague proof run creates vague confidence.
- Selectors should be stable and intentional. Prefer test IDs, labels, roles, and user-visible text.
- Timeouts need to match the app. Long builds and remote previews should not inherit tiny local timeouts.
- Failure artifacts matter. Save the screenshot, console, route, and diagnostic state when proof fails.
- Do not leak secrets. Redact tokens, cookies, API keys, and private URLs from proof packets.
- Do not overclaim. Proof is evidence about behavior, not a substitute for human product judgment.
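The redaction point can be enforced mechanically before a packet is saved or posted. The sketch below redacts by key name only; `redactSecrets` and the key pattern are illustrative assumptions, and key-based matching will not catch secrets embedded in values such as private or signed URLs.

```typescript
// Key names that usually hold credentials; extend this for your app.
const SENSITIVE_KEY = /token|cookie|authorization|api[_-]?key|secret|password/i;

// Return a deep copy with values under sensitive keys replaced by a marker.
function redactSecrets(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map((item) => redactSecrets(item));
  }
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const redacted: Record<string, unknown> = {};
    Object.keys(source).forEach((key) => {
      redacted[key] = SENSITIVE_KEY.test(key) ? "[REDACTED]" : redactSecrets(source[key]);
    });
    return redacted;
  }
  return value;
}

const rawPacket = {
  version: "riddle-proof.capture-diagnostic.v1",
  ok: false,
  target: "https://preview.example.com/app",
  request: {
    headers: { Authorization: "Bearer PLACEHOLDER" },
    localStorage: {
      "CognitoIdentityServiceProvider.CLIENT_ID.USER.idToken": "JWT_PLACEHOLDER",
    },
  },
  artifacts: [{ name: "failure.png", role: "after_proof" }],
};

const safePacket = redactSecrets(rawPacket);
```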

## Packages

The reusable contracts live in `@riddledc/riddle-proof`. Agent-specific wrappers can build on those contracts without depending on another plugin runtime.

```bash
npm install @riddledc/riddle-proof
```

```ts
import { createRunResult, createRunState } from "@riddledc/riddle-proof";
import { runRiddleProof } from "@riddledc/riddle-proof/runner";
import { toRiddleProofRunParams } from "@riddledc/riddle-proof/openclaw";
```

The current package boundary standardizes run state, result shape, evidence bundles, proof assessment, ship metadata, adapter interfaces, and OpenClaw parameter normalization.

Repo: https://github.com/riddledc/integrations