Good Catch Diary

Real bugs caught by cheap browser proof.

A running diary of browser-visible issues found by real Riddle Proof runs. Each entry now starts with a plain-English catch card, then keeps the technical ledger and artifacts underneath.

  • 87 curated catches from recent real browser proof runs.
  • <$0.10: Even a ten-minute proof run is under nine cents of browser time.
  • 0 source: Many findings can be detected from the outside with a URL and browser contract.
  • Manifest: Each catch links to a receipt with screenshots, logs, run metadata, and the sales lesson.

Three layers

The diary stays useful as a lab notebook, but each catch also gets a short translation layer before the detailed receipts.

  • Catch card: what happened in plain English and what it does not prove.
  • Ledger: bug, missed signal, proof lesson, and evidence trail.
  • Story layer: case studies can roll multiple approved cards into a product narrative.

The diary

card → ledger → artifacts
Neon made coverage receipts reusable instead of app-local
Riddle artifact · May 25, 2026 · package + app + deploy · LilArcade · Riddle Proof · proof packs · audio proof
Catch card

Riddle stopped Neon proof receipts from drifting into app glue.

The app had a reusable proof-pack helper available, but still carried local coverage summary code. The fix moved receipt formatting back to the pack and added a faster verified lane for local and live checks.

What Riddle caught
The ratchet exposed a framework boundary problem: proof receipts were passing, but the evidence language could drift because it was still app-local.
Why it matters
Reusable proof packs only pay off when the real app consumes them; otherwise every future agent has to maintain app-specific proof vocabulary.
Does not prove
It does not prove the mix sounds better, and the bounded live sample does not replace a full promotion batch when broad coverage is needed.
Technical ledger

Bug: The reusable audio exploration coverage helper existed in Riddle Proof packs, but LilArcade still carried its own coverage summarizer and Markdown formatter inside the app proof script. That meant the real target could drift from the reusable proof-pack receipt language even after the package shipped.

Why normal checks missed it: Nothing looked broken from a normal pass/fail perspective. The app built, the proof passed, and the deployed target was healthy. The issue was architectural: the ratchet was still using app-local proof glue where the reusable pack should own the evidence shape.

Proof lesson: A reusable proof pack is not truly reusable until the real app consumes it. The ratchet should make the fast local path cheap, keep deployment as a promotion gate, and use shared receipt language so future agents do not have to re-learn the same audio coverage vocabulary.

Evidence: Integrations PR #749 added reusable audio exploration coverage summaries to @riddledc/riddle-proof-packs and release PR #750 published version 0.8.0. LilArcade PR #531 consumed that package, removed 144 duplicated lines from the Neon deep-exploration script, passed local tests/build, merged, and deployed through Amplify job 709. LilArcade PR #532 then added a documented fast lane with test:neon, deep-explore-fast, and post-deploy-fast, passed GitHub CI, merged, and deployed through Amplify job 710. The final live post-deploy-fast run against https://lilarcade.com passed a 1 song / 1 part / 1 window coverage check with 0 findings and restoration OK, then passed current-target durable proof with 2 overrides and 0 findings.

Neon stopped "a little" from ranking the biggest cut
Riddle artifact · May 25, 2026 · production proof + deploy · LilArcade · Riddle Proof · claim translation · audio proof
Catch card

Riddle stopped an over-aggressive bass cut.

The user asked to turn the bass down "a little," but the proof loop initially ranked the biggest safe cut. Riddle caught that the candidate moved the right track in the right direction, but too far for the request.

What Riddle caught
The larger bass cuts failed candidate_magnitude_matches_requested_intent once "a little" became an explicit proof constraint.
Why it matters
Agents can over-optimize small creative requests into heavy-handed edits unless the proof constrains request scope before ranking candidates.
Does not prove
It does not prove bass -0.10 sounds better. It proves the candidate better matches the requested scope and preserved objective guardrails.
Technical ledger

Bug: The Neon ratchet loop could understand "turn the bass part down a little" as bass/down, but it had no objective receipt for the requested magnitude. The production packet therefore ranked the largest tested guardrail-preserving cut, bass -0.25, ahead of subtler candidates even though the user asked for "a little."

Why normal checks missed it: The run was not broken in the usual pass/fail sense. Fast mix health, mobile layout, playback sync, section-energy floors, clipping/headroom, low-level guardrails, and state restoration all passed. The problem only showed up when reading the claim translation: the packet proved target and direction, but not magnitude.

Proof lesson: Natural-language creative requests need claim constraints, not just metric ranking. For "a little," the proof should reject oversized candidates as claim-translation mismatches before review-order ranking can make the biggest movement look like the best next candidate.

Evidence: A live production preliminary candidate run against the deployed Neon target used the request "turn the bass part down a little" with bass/down constraints and recommended bass -0.25. LilArcade PR #527 added subtle magnitude inference, a default 0.12 max absolute delta, and a candidate_magnitude_matches_requested_intent receipt. After GitHub CI passed and Amplify job 705 deployed, the same production proof recommended bass -0.10, supported bass -0.10 and bass -0.05, and rejected bass -0.18 and bass -0.25 as claim_translation_mismatch on the magnitude receipt. The packet still says ranking is review_order_only and does not prove subjective mix quality.
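
The magnitude receipt described above can be sketched as a small check that runs before any ranking. This is a minimal illustration, not the LilArcade implementation: the receipt name, the 0.12 default, and the four bass deltas come from the entry, while `requestIsSubtle`, `checkMagnitude`, and the wording cues are assumptions made for the example.

```typescript
// Hypothetical shapes. Only the receipt name, the 0.12 default, and the
// candidate deltas are taken from the diary entry.
type MagnitudeVerdict = "supported" | "claim_translation_mismatch";

interface MagnitudeReceipt {
  receipt: "candidate_magnitude_matches_requested_intent";
  delta: number;
  maxAbsDelta: number;
  verdict: MagnitudeVerdict;
}

const SUBTLE_MAX_ABS_DELTA = 0.12; // default max absolute delta cited in the entry

// Assumed wording cues for a "subtle" request; the real inference may differ.
function requestIsSubtle(request: string): boolean {
  return /\b(a little|a bit|slightly)\b/i.test(request);
}

// Reject oversized candidates before review-order ranking ever sees them.
function checkMagnitude(
  request: string,
  delta: number,
  maxAbsDelta: number = SUBTLE_MAX_ABS_DELTA,
): MagnitudeReceipt {
  const oversized = requestIsSubtle(request) && Math.abs(delta) > maxAbsDelta;
  return {
    receipt: "candidate_magnitude_matches_requested_intent",
    delta,
    maxAbsDelta,
    verdict: oversized ? "claim_translation_mismatch" : "supported",
  };
}

const magnitudeReceipts = [-0.05, -0.1, -0.18, -0.25].map((delta) =>
  checkMagnitude("turn the bass part down a little", delta),
);
for (const r of magnitudeReceipts) {
  console.log(`bass ${r.delta}: ${r.verdict}`);
}
```

Under these assumptions the two smaller cuts stay supported and the two larger cuts are rejected, which mirrors the split reported in the evidence. The check says nothing about which supported candidate sounds better.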

Neon made approval surrogate evidence-backed
Riddle artifact · May 25, 2026 · package + app + production proof · LilArcade · Riddle Proof · approval surrogate · audio proof
Catch card

Neon made approval surrogate evidence-backed

Approval is part of the proof surface.

What Riddle caught
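The approval mode mixing_canon_surrogate appeared downstream of the approved-candidate flow, but the decision checklist behind it was not an independently reviewable proof artifact. Integrations PR #741 added createMixingCanonSurrogateReview to @riddledc/riddle-proof-packs/audio-mix-review to close that gap, and release PR #742 published @riddledc/riddle-proof-packs@0.7.0.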
Why it matters
This is the practical shape of creative proof: Riddle Proof can keep development moving with a conservative approval surrogate, but every step remains auditable and refuses to claim subjective mix quality.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The approved-candidate flow could carry mixing_canon_surrogate as an approval mode, but the approval decision itself was not yet a first-class proof artifact. That made it harder to inspect why Codex was allowed to stand in for a human during development iteration.

Why normal checks missed it: The batch could already prove a supported candidate, apply a one-candidate profile, and prepare a durable patch plan. A normal pass/fail run would miss the missing handoff evidence because the approval mode appeared downstream, while the decision checklist that produced it was not independently reviewable.

Proof lesson: Approval is part of the proof surface. If an agent applies a creative candidate to keep work moving, the approval surrogate needs its own artifact with conservative-delta checks, objective receipts, section-energy guardrails, state restoration, and an explicit proof/taste boundary.

Evidence: Integrations PR #741 added createMixingCanonSurrogateReview to @riddledc/riddle-proof-packs/audio-mix-review, then release PR #742 published @riddledc/riddle-proof-packs@0.7.0. LilArcade PR #525 inserted that review before approved-candidate application and deployed through Amplify job 703. A production proof selected guitar -0.05, set guitar from 0.6 to 0.55, passed 10 objective receipts, preserved section-energy guardrails, restored state, wrote a mixing-canon-surrogate-review artifact with failedChecks [], and produced a ready_for_durable_patch plan. LilArcade PR #526 then applied that durable override and deployed through Amplify job 704. A live current-target proof verified 2 active overrides, 0 findings, and the new guitar override active at 0.55 with no clipping, 2.47 dB headroom, and no low-level proof window.
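
One way to picture an approval surrogate as its own artifact is a checklist that records every failed check by name. The sketch below is not the API of createMixingCanonSurrogateReview; `reviewAsSurrogate`, its input fields, the check labels, and the 0.1 conservative-delta threshold are all assumptions chosen to match the checks listed in the proof lesson.

```typescript
// Hypothetical input shape for a surrogate approval decision.
interface SurrogateInput {
  delta: number; // e.g. -0.05 for guitar 0.6 -> 0.55
  maxConservativeDelta: number; // assumed threshold, not from the source
  receiptsPassed: number;
  receiptsTotal: number;
  sectionEnergyPreserved: boolean;
  stateRestored: boolean;
  claimsSubjectiveQuality: boolean; // must stay false: proof is not taste
}

interface SurrogateReview {
  approvalMode: "mixing_canon_surrogate";
  approved: boolean;
  failedChecks: string[];
}

function reviewAsSurrogate(input: SurrogateInput): SurrogateReview {
  // Each check is a [label, passed] pair so failures stay inspectable.
  const checks: Array<[string, boolean]> = [
    ["conservative_delta", Math.abs(input.delta) <= input.maxConservativeDelta],
    ["objective_receipts", input.receiptsPassed === input.receiptsTotal],
    ["section_energy_guardrails", input.sectionEnergyPreserved],
    ["state_restoration", input.stateRestored],
    ["proof_taste_boundary", !input.claimsSubjectiveQuality],
  ];
  const failedChecks = checks.filter(([, ok]) => !ok).map(([label]) => label);
  return {
    approvalMode: "mixing_canon_surrogate",
    approved: failedChecks.length === 0,
    failedChecks,
  };
}

const guitarReview = reviewAsSurrogate({
  delta: -0.05,
  maxConservativeDelta: 0.1,
  receiptsPassed: 10,
  receiptsTotal: 10,
  sectionEnergyPreserved: true,
  stateRestored: true,
  claimsSubjectiveQuality: false,
});
console.log(JSON.stringify(guitarReview));
```

Writing the review out as data is the point of the design: a reviewer can read `failedChecks` directly instead of inferring the decision from a downstream approval mode.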

Neon made audio heuristics browser-safe instead of copy-pasted
Riddle artifact · May 25, 2026 · package + app + deploy proof · LilArcade · Riddle Proof · package boundary · audio heuristics
Catch card

Neon made audio heuristics browser-safe instead of copy-pasted

Reusable proof packs need browser-safe subpaths when app contracts consume them in the runtime bundle.

What Riddle caught
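Importing the proof-pack root package from browser app code pulled Node-oriented chunks into the Vite bundle and failed the production build. Integrations PR #739 added @riddledc/riddle-proof-packs/audio-mix-heuristics as a pure browser-safe subpath, then release PR #740 published @riddledc/riddle-proof-packs@0.6.4 through trusted publishing.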
Why it matters
This is the kind of integration bug Riddle Proof should catch early: the proof code was correct in isolation, but the product boundary was wrong for a browser app.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The reusable audio heuristics layer existed in Riddle Proof packs, but the first LilArcade cleanup imported the proof-pack root package from browser app code. That pulled Node-oriented proof framework chunks into the Vite browser bundle and failed the production build on Node externals.

Why normal checks missed it: The focused proof-contract tests passed because they bundle the contract for Node. The issue only appeared when the actual browser production build tried to resolve the package graph. Without the build gate, this would have looked like a harmless dedupe and then broken deploy.

Proof lesson: Reusable proof packs need browser-safe subpaths when app contracts consume them in the runtime bundle. A reusable helper is not truly reusable until the import boundary matches the environment that will run it.

Evidence: Integrations PR #739 added @riddledc/riddle-proof-packs/audio-mix-heuristics as a pure browser-safe subpath, then release PR #740 published @riddledc/riddle-proof-packs@0.6.4 through trusted publishing. LilArcade PR #524 switched Neon to that subpath, removed 259 lines of duplicated local section-energy/loudness helper code, passed the browser build, passed 148 sequencer tests, produced a local built-app guitar-down review packet, deployed through Amplify job 702, and passed the production post-deploy preset with 6 songs, 19 parts, 22 windows, 2 durable overrides, and 0 findings.

Neon promoted a guitar-down candidate into a deployed override
Riddle artifact · May 25, 2026 · production proof + deploy · LilArcade · Riddle Proof · durable patch · current target
Catch card

Neon promoted a guitar-down candidate into a deployed override

For creative agent work, the ratchet needs a promotion loop, not just a recommendation loop.

What Riddle caught
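The earlier guitar-down packet was only a transient recommendation; nothing yet proved the candidate could reach source, deploy, and show up on the live target. A production promotion batch closed that gap for the natural request "turn the guitar part down a little" with explicit guitar/down constraints.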
Why it matters
Riddle Proof can manage the boring but critical handoff from supported candidate to deployed app state.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The earlier guitar-down packet was still only a transient recommendation. That is useful for review, but it does not prove the app can safely carry the candidate into source, deploy it, and verify that the deployed target is actually running the durable override.

Why normal checks missed it: A normal mix-change workflow can stop after "candidate looks supported" or after a source patch lands. The risky gap is the handoff between transient browser state, durable source data, production deploy, and current-target proof. Any of those layers could drift while the packet still looked persuasive.

Proof lesson: For creative agent work, the ratchet needs a promotion loop, not just a recommendation loop. A supported candidate should become durable only through an explicit approval boundary, generated patch plan, source edit, deploy, and post-deploy current-target audit.

Evidence: A production promotion batch ran the natural request "turn the guitar part down a little" with explicit guitar/down constraints. It passed fast mix health, mobile layout, playback sync, deep exploration over 2 songs / 4 parts / 4 windows, a narrowed claim-candidate loop, approval-surrogate application, durable patch planning, and a pre-patch current-target audit. LilArcade PR #522 then applied the durable override guitar: 0.6 and deployed through Amplify job 700. A post-deploy current-target proof passed with two active overrides, zero findings, chord 0.16 still active, guitar 0.6 active, and the guitar target reporting no clipping, headroom 2.31 dB, and no low-level proof window.
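
The promotion loop in the proof lesson can be read as an ordered list of gates where the first failure stops everything after it. The following is a rough sketch under that reading; the step names paraphrase the lesson, and `runPromotion`, `PromotionStep`, and the in-memory `promotionState` are hypothetical stand-ins for real source edits and deploys.

```typescript
type PromotionStepName =
  | "approval_boundary"
  | "patch_plan"
  | "source_edit"
  | "deploy"
  | "post_deploy_current_target_audit";

interface PromotionStep {
  step: PromotionStepName;
  run: () => boolean; // true means the gate passed
}

interface PromotionResult {
  promoted: boolean;
  completed: PromotionStepName[];
  stoppedAt: PromotionStepName | null;
}

// Run gates in order; the first failing gate stops the promotion.
function runPromotion(steps: PromotionStep[]): PromotionResult {
  const completed: PromotionStepName[] = [];
  for (const s of steps) {
    if (!s.run()) {
      return { promoted: false, completed, stoppedAt: s.step };
    }
    completed.push(s.step);
  }
  return { promoted: true, completed, stoppedAt: null };
}

// In-memory stand-ins for source data and the deployed target.
const promotionState: { sourceOverride: number | null; deployedOverride: number | null } = {
  sourceOverride: null,
  deployedOverride: null,
};
const plannedGuitarLevel = 0.6; // durable override level from the entry

const promotionResult = runPromotion([
  { step: "approval_boundary", run: () => true },
  { step: "patch_plan", run: () => plannedGuitarLevel > 0 },
  {
    step: "source_edit",
    run: () => {
      promotionState.sourceOverride = plannedGuitarLevel;
      return true;
    },
  },
  {
    step: "deploy",
    run: () => {
      promotionState.deployedOverride = promotionState.sourceOverride;
      return true;
    },
  },
  {
    step: "post_deploy_current_target_audit",
    // The audit compares the deployed target with the planned level.
    run: () => promotionState.deployedOverride === plannedGuitarLevel,
  },
]);
console.log(promotionResult.promoted, promotionResult.completed.join(" -> "));
```

Keeping the audit as the last gate means a source patch that lands but never reaches the live target is reported as a stopped promotion rather than a success.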

Neon stopped hiding tiny guitar energy deltas
Riddle artifact · May 25, 2026 · local + npm + production proof · LilArcade · Riddle Proof · audio heuristics · review packets
Catch card

Neon stopped hiding tiny guitar energy deltas

Review-packet formatting is part of the proof surface.

What Riddle caught
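The human-facing Markdown rounded very small nonzero guitar energy deltas to 0 even though the raw proof JSON recorded the movement. @riddledc/riddle-proof-packs 0.6.3 changed the shared packet formatter to preserve tiny nonzero audio values.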
Why it matters
Riddle Proof is not just collecting artifacts; it is improving the reliability of the human handoff.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The Neon ratchet packet had started tracking requested-instrument section energy, but the human-facing Markdown rounded some very small nonzero guitar energy deltas to 0. The raw proof JSON still had the movement, but the review packet made a real rendered change look less observable than it was.

Why normal checks missed it: The proof still passed and the candidate recommendation was reasonable, so a normal pass/fail check would not catch it. The issue only showed up when reading the human packet as a reviewer would: the packet had the right tracked-instrument column, but the formatting was too coarse for tiny audio-energy deltas.

Proof lesson: Review-packet formatting is part of the proof surface. If the proof asks a human to review measurable candidate movement, small nonzero values must stay visible instead of being rounded into apparent no-ops. Metrics still do not prove taste, but they must be precise enough to support review.

Evidence: @riddledc/riddle-proof-packs 0.6.3 changed the shared packet formatter to preserve tiny nonzero audio values. LilArcade PR #521 consumed that package and deployed with Amplify job 699. A fresh live Riddle Proof Playwright preliminary batch against the deployed Amplify branch passed fast mix health, mobile layout, playback sync, claim-candidate review, and human-review packet extraction. The packet recommended guitar -0.05, supported three candidates, rejected zero, restored state, preserved guardrails, and now shows tracked guitar energy deltas such as -0.000044, -0.000066, and target movement energy -0.000054 instead of flattening them to 0.
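
A formatter that keeps tiny nonzero values visible might look like the sketch below. It is not the formatter shipped in @riddledc/riddle-proof-packs 0.6.3; `formatAudioDelta`, the two-decimal default, and the two-significant-digit fallback are assumptions. Only the sample deltas come from the entry.

```typescript
// Format a delta for a human-facing packet without flattening small
// nonzero values to "0".
function formatAudioDelta(value: number, decimals: number = 2): string {
  if (!Number.isFinite(value)) return String(value);
  if (value === 0) return "0";
  const fixed = value.toFixed(decimals);
  // Normal case: the rounded value is still visibly nonzero.
  if (Number(fixed) !== 0) return fixed;
  // The value rounds to zero but is not zero: fall back to two
  // significant digits so the movement stays visible.
  return value.toPrecision(2);
}

for (const delta of [-0.05, -0.000044, -0.000066, -0.000054, 0]) {
  console.log(`${delta} -> ${formatAudioDelta(delta)}`);
}
```

With a plain `toFixed(2)`, the three energy deltas from the evidence would all print as `-0.00`, which is the apparent no-op the entry describes.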

Neon turned a natural mix request into proof-backed candidates
Riddle artifact · May 25, 2026 · local + production proof · LilArcade · Riddle Proof · claim candidates · audio heuristics
Catch card

Neon turned a natural mix request into proof-backed candidates

A creative proof loop gets more useful when the requested claim is part of the run contract.

What Riddle caught
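Trying a new musical request still meant editing profile JSON or leaning on hard-coded intent. LilArcade PR #519 added --ratchet-intent, --ratchet-target-tracks, and --ratchet-direction to the Neon ratchet batch CLI so the claim is a first-class run input.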
Why it matters
Riddle Proof is becoming a practical candidate operator: an agent can ask for a natural musical change, get bounded candidates with objective receipts, and hand off a compact review packet instead of a vague "sounds better" claim.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The Neon ratchet loop had become good at running a fixed chord-down claim, but trying a new musical request still meant editing profile JSON or leaning on hard-coded intent. That slowed the loop and made natural claim translation harder to prove.

Why normal checks missed it: The profile already supported claim candidates, receipts, section-energy comparisons, and review packets. The missing layer was ergonomic: the batch CLI could narrow focus tracks and iteration count, but it could not declare the actual claim text, target tracks, or direction as first-class run inputs.

Proof lesson: A creative proof loop gets more useful when the requested claim is part of the run contract. The proof should record the natural request, explicit target constraints, candidate actions, receipt verdicts, section-energy tables, and the listening-review boundary in one packet.

Evidence: LilArcade PR #519 added --ratchet-intent, --ratchet-target-tracks, and --ratchet-direction to the Neon ratchet batch CLI. A live preliminary run against the deployed Amplify branch used the request "turn the guitar part down a little" with explicit guitar/down constraints. The generated profile passed those constraints into runRatchetLoop, tested three guitar-down candidates, and returned candidate_ready_for_listening_review with guitar -0.05 recommended. All ten objective receipts passed for the recommended candidate, state was restored after the loop, and the packet included all-candidate section-energy tables. The packet still says ranking is review_order_only and does not prove subjective mix quality.
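
To illustrate what treating the claim as a run input could mean, here is a small argument parser for the three flags named above. The flag names come from the entry; the parser itself, the comma-separated track format, the allowed direction values, and the `RatchetClaim` shape are assumptions and may not match the real batch CLI.

```typescript
interface RatchetClaim {
  intent: string;
  targetTracks: string[];
  direction: "up" | "down";
}

// Collect "--ratchet-*" flag/value pairs and validate them into a claim.
function parseRatchetClaim(argv: string[]): RatchetClaim {
  const seen: Record<string, string> = {};
  for (let i = 0; i + 1 < argv.length; i += 1) {
    const flag = argv[i];
    const value = argv[i + 1];
    if (flag !== undefined && value !== undefined && flag.indexOf("--ratchet-") === 0) {
      seen[flag] = value;
      i += 1; // skip the value we just consumed
    }
  }
  const intent = seen["--ratchet-intent"];
  const tracks = seen["--ratchet-target-tracks"];
  const direction = seen["--ratchet-direction"];
  if (!intent || !tracks) {
    throw new Error("--ratchet-intent and --ratchet-target-tracks are required");
  }
  if (direction !== "up" && direction !== "down") {
    throw new Error("--ratchet-direction must be 'up' or 'down'");
  }
  const targetTracks = tracks
    .split(",") // assumed format: comma-separated track names
    .map((t) => t.trim())
    .filter((t) => t.length > 0);
  return { intent, targetTracks, direction };
}

const guitarClaim = parseRatchetClaim([
  "--ratchet-intent", "turn the guitar part down a little",
  "--ratchet-target-tracks", "guitar",
  "--ratchet-direction", "down",
]);
console.log(JSON.stringify(guitarClaim));
```

Failing loudly on a missing or unknown direction keeps the packet from recording a claim that the run never actually constrained.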

Neon refused to promote an off-target mix candidate
Riddle artifact · May 25, 2026 · local + production proof · LilArcade · Riddle Proof · intent guard · claim candidates
Catch card

Neon refused to promote an off-target mix candidate

Claim-candidate loops need target and direction receipts, not just mix-health receipts.

What Riddle caught
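A request to "turn the chord part down" could still surface a bass -0.18 candidate, because ranking alone never checked the claim target. LilArcade PR #517 added request-aware target inference and candidate-track/direction receipts.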
Why it matters
Riddle Proof caught a claim-translation bug in a creative loop.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The Neon ratchet could ask for "turn the chord part down" and still surface a bass -0.18 candidate because broad review-order ranking found bass objectively guardrail-preserving. That was useful exploration evidence, but it was not support for the requested chord-down claim.

Why normal checks missed it: The candidate packet already separated metrics from taste, so a normal review could see that ranking was only review_order_only. The mismatch was subtler: the recommended candidate matched the ranking metric but not the natural-language claim target. Only comparing requested_intent, candidate action, and preservation receipts exposed the claim-translation gap.

Proof lesson: Claim-candidate loops need target and direction receipts, not just mix-health receipts. A candidate can be measurable, reversible, and guardrail-preserving while still being the wrong candidate for the claim.

Evidence: LilArcade PR #517 added request-aware target inference and candidate-track/direction receipts. After deploy, a live Riddle Proof Playwright promotion run against the Amplify branch inferred targetTracks ["chord"] and direction "down" from the intent. The packet reported needs_followup with zero supported candidates and four rejected chord-down candidates. Even chord -0.05 failed required_instruments_preserved and section_energy_floors_preserved at the current durable chord 0.16 level. The approval-surrogate and durable patch steps were skipped because no supported review candidate existed.
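
A target and direction receipt can be as simple as comparing each candidate with the claim constraints before it is allowed to count as support. The sketch below assumes hypothetical names throughout (`ClaimConstraints`, `CandidateEdit`, `checkClaimTarget`, and the result fields); only the chord/down constraints and the bass -0.18 and chord -0.05 candidates come from the entry.

```typescript
interface ClaimConstraints {
  targetTracks: string[];
  direction: "up" | "down";
}

interface CandidateEdit {
  track: string;
  delta: number;
}

interface ClaimTargetReceipt {
  trackMatches: boolean;
  directionMatches: boolean;
  supportsClaim: boolean;
}

// A candidate supports the claim only if it edits a requested track
// and moves it in the requested direction.
function checkClaimTarget(claim: ClaimConstraints, candidate: CandidateEdit): ClaimTargetReceipt {
  const trackMatches = claim.targetTracks.indexOf(candidate.track) !== -1;
  const directionMatches = claim.direction === "down" ? candidate.delta < 0 : candidate.delta > 0;
  return { trackMatches, directionMatches, supportsClaim: trackMatches && directionMatches };
}

const chordDownClaim: ClaimConstraints = { targetTracks: ["chord"], direction: "down" };
const bassReceipt = checkClaimTarget(chordDownClaim, { track: "bass", delta: -0.18 });
const chordReceipt = checkClaimTarget(chordDownClaim, { track: "chord", delta: -0.05 });
console.log("bass -0.18 supports chord-down claim:", bassReceipt.supportsClaim);
console.log("chord -0.05 matches target and direction:", chordReceipt.supportsClaim);
```

This check covers target and direction only. In the run described above, chord -0.05 matched the claim but was still rejected by the separate preservation receipts.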

Neon section-energy guard rejected a disappearing chord cut
Riddle artifact · May 25, 2026 · local + production proof · LilArcade · Riddle Proof · audio heuristics · claim candidates
Catch card

Neon section-energy guard rejected a disappearing chord cut

Audio proof should make candidate rejection deterministic without pretending to automate taste.

What Riddle caught
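With the section-energy receipt from @riddledc/riddle-proof-packs 0.6.0 in place, a live Riddle Proof Playwright run against the deployed Amplify branch rejected the chord 0.16 -> 0.06 candidate: the required chord lane dropped to zero RMS, peak, and total energy in the Intro Bed window.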
Why it matters
Riddle Proof turned a fuzzy mixing concern into a deterministic candidate guard: it did not claim to know the better mix, but it caught a candidate that would make a required chord lane vanish and produced a compact review packet explaining why.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The Neon candidate loop needed a more inspectable reason to reject a small-looking chord cut. The control edit chord 0.16 -> 0.06 was just another bounded candidate, but the live section-energy receipt showed the required chord lane disappearing in the Intro Bed proof window.

Why normal checks missed it: A normal candidate table can say a candidate failed preservation, but it does not show whether the failure is a meaningful audio-lane disappearance or just a ranking preference. The new section-energy receipt compared baseline and candidate windows directly and exposed RMS, peak, and total-energy floors for the required chord lane.

Proof lesson: Audio proof should make candidate rejection deterministic without pretending to automate taste. Section-by-section energy floors and loudness-style deltas are useful review aids when they reject disappearing required lanes, preserve headroom guardrails, and keep the surviving candidates ranked for human listening review.

Evidence: After @riddledc/riddle-proof-packs 0.6.0 shipped and LilArcade synced the Neon profiles, a live Riddle Proof Playwright run against the deployed Amplify branch passed. The ratchet packet reported claim_candidate_supported with five supported candidates and one rejected candidate. The rejected chord -0.10 candidate failed required_instruments_preserved and section_energy_floors_preserved: in the Intro Bed window, chord moved from baseline RMS 0.0022, peak 0.0079, total energy 0.000001 to candidate RMS 0, peak 0, total energy 0. The remaining recommended review candidate, bass 0.62 -> 0.44, preserved required section energy floors and guardrails. The packet still says ranking is review_order_only and does not prove subjective mix quality.
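
The section-energy comparison can be sketched as a per-window floor check on RMS, peak, and total energy. The Intro Bed numbers below are the ones reported in the evidence; the function name, the result shape, and the 0.25 retained-ratio floor are assumptions, since the entry does not state the actual floor rule.

```typescript
interface WindowEnergy {
  rms: number;
  peak: number;
  totalEnergy: number;
}

interface EnergyFloorReceipt {
  receipt: "section_energy_floors_preserved";
  passed: boolean;
  failures: string[];
}

// A required lane fails the floor when any metric that was audible in the
// baseline drops below a fraction of its baseline value.
function checkSectionEnergyFloor(
  baseline: WindowEnergy,
  candidate: WindowEnergy,
  minRetainedRatio: number = 0.25, // assumed floor, not from the source
): EnergyFloorReceipt {
  const metrics: Array<keyof WindowEnergy> = ["rms", "peak", "totalEnergy"];
  const failures: string[] = [];
  for (const metric of metrics) {
    const base = baseline[metric];
    if (base > 0 && candidate[metric] < base * minRetainedRatio) {
      failures.push(metric);
    }
  }
  return { receipt: "section_energy_floors_preserved", passed: failures.length === 0, failures };
}

// Intro Bed window, chord lane, values from the evidence paragraph.
const introBedBaseline: WindowEnergy = { rms: 0.0022, peak: 0.0079, totalEnergy: 0.000001 };
const introBedChordCut: WindowEnergy = { rms: 0, peak: 0, totalEnergy: 0 };
const introBedReceipt = checkSectionEnergyFloor(introBedBaseline, introBedChordCut);
console.log(introBedReceipt.passed, introBedReceipt.failures.join(", "));
```

The receipt names which metrics fell through the floor, so a reviewer can tell a vanished lane apart from a ranking preference.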

Neon profile sync blessed a stale current target
Riddle artifact · May 25, 2026 · local + production proof · LilArcade · Riddle Proof · profile sync · current target
Catch card

Neon profile sync blessed a stale current target

Current-target profiles for app-owned durable state should be generated from the app state they claim to audit.

What Riddle caught
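The checked-in durable current-target profile still referenced the older chord 0.18 candidate after chord 0.16 became the active override, and the sync gate passed anyway. LilArcade PR #514 changed the Neon profile sync command so the durable current-target profile is instantiated from the active durable override file.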
Why it matters
Riddle Proof caught a source-of-truth drift bug in the proof machinery itself.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: After the chord 0.16 preservation candidate became the active durable override, the checked-in Neon durable current-target profile still referenced the older chord 0.18 candidate from the pack sample. The profile sync gate passed because it only compared local profiles to the published pack, not to the app source of truth.

Why normal checks missed it: The sync check was internally consistent: local files matched the reusable pack sample. The mismatch only appeared when comparing three evidence roles together: the active app override, the checked-in current-target profile, and the deployed browser proof target.

Proof lesson: Current-target profiles for app-owned durable state should be generated from the app state they claim to audit. Reusable pack samples are useful seeds, but they should not become stale source-of-truth evidence for a live app.

Evidence: LilArcade PR #514 changed the Neon profile sync command so the durable current-target profile is instantiated from the active durable override file. The regenerated profile now names monkberry-moon-delight-tab-chord-008-to-016-preservation-candidate, records generated_from_active_durable_override, expects chord 0.16, and keeps the current-target/taste boundary explicit. GitHub CI passed, Amplify job 692 deployed merge commit 1505ab8, and a post-deploy Riddle Proof Playwright run against the live branch URL passed with HTTP 200, visible 0.16X, contract/profile chord levels at 0.16, peak 0.7768, RMS 0.1116, clipping false, and zero fatal console events.
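
The three evidence roles (active app override, checked-in profile, observed browser level) can be compared directly. The sketch below is an illustration under assumed shapes: `DurableOverride`, `CurrentTargetProfile`, `profileFromActiveOverride`, `findCurrentTargetDrift`, and the `provenance` field name are hypothetical, and the stale profile id is a placeholder. The active override id, the 0.16 and 0.18 levels, and the generated_from_active_durable_override marker come from the entry.

```typescript
interface DurableOverride {
  id: string;
  track: string;
  level: number;
}

interface CurrentTargetProfile {
  overrideId: string;
  track: string;
  expectedLevel: number;
  provenance: string;
}

// Generate the profile from the app state it claims to audit.
function profileFromActiveOverride(active: DurableOverride): CurrentTargetProfile {
  return {
    overrideId: active.id,
    track: active.track,
    expectedLevel: active.level,
    provenance: "generated_from_active_durable_override",
  };
}

// Compare the three evidence roles and list every disagreement.
function findCurrentTargetDrift(
  active: DurableOverride,
  profile: CurrentTargetProfile,
  observedLevel: number,
): string[] {
  const differs = (a: number, b: number) => Math.abs(a - b) > 1e-9;
  const drift: string[] = [];
  if (profile.overrideId !== active.id) drift.push("profile names a different override id");
  if (differs(profile.expectedLevel, active.level)) drift.push("profile level differs from active override");
  if (differs(observedLevel, active.level)) drift.push("observed level differs from active override");
  return drift;
}

const activeChordOverride: DurableOverride = {
  id: "monkberry-moon-delight-tab-chord-008-to-016-preservation-candidate",
  track: "chord",
  level: 0.16,
};
// Placeholder id: the entry does not give the stale pack-sample id.
const stalePackProfile: CurrentTargetProfile = {
  overrideId: "older-pack-sample-candidate",
  track: "chord",
  expectedLevel: 0.18,
  provenance: "pack_sample",
};
console.log(findCurrentTargetDrift(activeChordOverride, stalePackProfile, 0.16));
console.log(findCurrentTargetDrift(activeChordOverride, profileFromActiveOverride(activeChordOverride), 0.16));
```

A sync gate that only compares local profiles with the published pack would pass both cases; the cross-role comparison is what surfaces the stale one.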

Neon candidate loop tested the wrong audio path
Riddle artifact · May 25, 2026 · local + production proof · LilArcade · Riddle Proof · claim candidates · audio guardrails
Catch card

Neon candidate loop tested the wrong audio path

Claim-candidate loops must use the same source-preparation and state-isolation preconditions as the guardrail proofs they depend on.

What Riddle caught
Before the fix, the production candidate batch failed deep exploration with two Monkberry intro findings: chord was required but inactive at chord 0.08, with no clipping or low-level window.
Why it matters
Riddle Proof caught a ratchet consistency bug: one proof path said the chord lane was missing, while another path was about to recommend cutting it further.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: After the Neon chord 0.08 durable override shipped, the deeper production sweep found the required Monkberry chord lane missing in intro windows. Raising the active chord floor to 0.16 fixed that deterministic guardrail, but the claim-candidate loop still recommended another chord -0.10 cut because it rendered candidates without the prepared sample-source path used by deep exploration. Candidate attempts also inherited prior candidate edits instead of starting from the original mix each time.

Why normal checks missed it: The individual proof surfaces were each plausible: current-target proof could verify the active override, and the candidate packet could produce a green recommendation. The bug only appeared when comparing evidence roles across the ratchet: deep exploration loaded the piano/sample path and rejected the low chord floor, while the review loop used a different render setup where the same cut looked preserved.

Proof lesson: Claim-candidate loops must use the same source-preparation and state-isolation preconditions as the guardrail proofs they depend on. Otherwise the packet can rank a candidate for listening review under easier conditions than the current-target audit or deep sweep.

Evidence: Before the fix, the production candidate batch failed deep exploration with two Monkberry intro findings: chord was required but inactive at chord 0.08, with no clipping or low-level window. A local isolated-candidate run then showed the review packet still recommended chord 0.16 -> 0.06 because the loop did not prepare audio sources. LilArcade PR #513 superseded the chord 0.08 override with chord 0.16, reset candidate tracks to original levels before every attempt, and made runRatchetLoop prepare audio sample sources before baseline and candidate renders by default. After deploy, the production current-target proof passed with active override monkberry-moon-delight-tab-chord-008-to-016-preservation-candidate, expected chord 0.16, zero findings, peak 0.7768, clipping false, and lowLevel false. The production candidate batch then passed with source preparation true, five supported candidates, chord -0.10 rejected for required_instruments_preserved, and bass -0.18 surfaced only as review-order guidance.
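
The state-isolation half of this fix amounts to starting every candidate attempt from a fresh copy of the original mix. A minimal sketch follows, assuming hypothetical names (`MixLevels`, `TrackEdit`, `renderIsolated`) and rounding levels to two decimals for readability; it does not model the audio source preparation step. The chord 0.16 and bass 0.62 levels and the listed deltas appear in the diary.

```typescript
type MixLevels = Record<string, number>;

interface TrackEdit {
  track: string;
  delta: number;
}

// Apply each candidate to a fresh copy of the original mix so no attempt
// inherits edits from an earlier attempt.
function renderIsolated(original: Readonly<MixLevels>, edits: TrackEdit[]): MixLevels[] {
  return edits.map((edit) => {
    const attempt: MixLevels = { ...original }; // reset to original levels
    const current = attempt[edit.track];
    if (current === undefined) {
      throw new Error(`unknown track: ${edit.track}`);
    }
    // Round to two decimals to keep mixer levels readable.
    attempt[edit.track] = Math.round((current + edit.delta) * 100) / 100;
    return attempt;
  });
}

const originalMix: MixLevels = { chord: 0.16, bass: 0.62 };
const isolatedAttempts = renderIsolated(originalMix, [
  { track: "chord", delta: -0.05 },
  { track: "chord", delta: -0.1 },
  { track: "bass", delta: -0.18 },
]);
for (const attempt of isolatedAttempts) {
  console.log(JSON.stringify(attempt));
}
```

Without the per-attempt copy, the second chord cut would start from the already lowered level, and the bass attempt would carry a chord edit nobody asked it to test.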

Neon current-target proof hid its own nested receipts
Riddle artifact · May 25, 2026 · local run + production proof · LilArcade · Riddle Proof · artifact handoff · audio guardrails
Catch card

Neon current-target proof hid its own nested receipts

Passing proof is not enough if the handoff cannot find the proof.

What Riddle caught
Before the fix, the post-apply current-target run for the new chord 0.08 override passed with current_target_ready, one active override, zero findings, peak 0.7817, headroom 2.14 dB, clipping false, and lowLevel false, but artifactIndex contained only 3 entries.
Why it matters
Riddle Proof found a handoff failure after the product proof passed: the app state was correct, but the evidence packet was too shallow for reliable review.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: After the Neon chord 0.08 durable override was applied locally, the current-target proof passed but the batch artifact index only listed the three top-level batch files. The actual per-override proof receipt, generated profile, console capture, DOM summary, and screenshots were nested under the durable-current-target step and missing from the reviewer-facing artifact index.

Why normal checks missed it: The proof itself was green, so a normal pass/fail check would have stopped there. The weakness only appeared when treating the batch summary as the handoff surface a reviewer or agent would use to inspect the evidence after promotion.

Proof lesson: Passing proof is not enough if the handoff cannot find the proof. Current-target audits need to index nested per-override receipts, screenshots, and aggregate summaries so a durable source change can be reviewed without spelunking through output directories.

Evidence: Before the fix, the post-apply current-target run for the new chord 0.08 override passed with current_target_ready, one active override, zero findings, peak 0.7817, headroom 2.14 dB, clipping false, and lowLevel false, but artifactIndex contained only 3 entries. LilArcade PR #512 added durable-current-target artifact indexing for aggregate summaries, nested generated profiles, proof/profile/console/DOM receipts, Markdown summaries, and nested screenshots. The production proof after deploy passed against https://lilarcade.com with the active override monkberry-moon-delight-tab-chord-018-to-008-approved-candidate, expected mixer level chord 0.08, zero findings, and 14 indexed artifacts.
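
Indexing nested receipts is essentially a recursive walk over the batch steps. The sketch below shows the idea with a hypothetical `ProofStep` tree and made-up file names; it is not the LilArcade indexer, and the artifact counts are illustrative rather than the 3 and 14 reported above.

```typescript
interface ProofStep {
  name: string;
  artifacts: string[];
  steps?: ProofStep[];
}

// Walk the step tree and return every artifact path, including those
// nested under child steps.
function indexArtifacts(step: ProofStep, prefix: string = ""): string[] {
  const here = prefix ? `${prefix}/${step.name}` : step.name;
  let paths = step.artifacts.map((file) => `${here}/${file}`);
  for (const child of step.steps || []) {
    paths = paths.concat(indexArtifacts(child, here));
  }
  return paths;
}

// File and step names below are hypothetical.
const batchTree: ProofStep = {
  name: "batch",
  artifacts: ["summary.json", "summary.md", "batch.log"],
  steps: [
    {
      name: "durable-current-target",
      artifacts: ["aggregate-summary.json"],
      steps: [
        {
          name: "override-chord",
          artifacts: ["proof.json", "profile.json", "console.json", "dom-summary.json", "screenshot.png"],
        },
      ],
    },
  ],
};

const shallowIndex = batchTree.artifacts.map((file) => `batch/${file}`);
const fullIndex = indexArtifacts(batchTree);
console.log(`top-level only: ${shallowIndex.length}, with nested receipts: ${fullIndex.length}`);
```

The shallow index is what a reviewer saw before the fix: a green batch with only its own top-level files listed.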

Neon review packets hid candidate evidence in raw JSON
Riddle artifact · May 25, 2026 · local run · LilArcade · Riddle Proof · human review · audio guardrails
Catch card

Neon review packets hid candidate evidence in raw JSON

Human-review packets are proof artifacts, not summaries after the fact.

What Riddle caught
The before packet from the preliminary Neon loop recommended chord -0.10 and counted two supported candidates, but had no Supported Candidates table.
Why it matters
Riddle Proof improved the handoff between objective browser proof and human creative judgment: the system still does not decide taste, but it gives reviewers the evidence needed to make the listening decision quickly.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The local Neon preliminary ratchet could produce a valid human-review packet, but the Markdown artifact only showed the recommendation and counts. Candidate actions, measured target movement, receipt pass/fail status, and ranking values were present in JSON but hidden from the reviewer-readable packet.

Why normal checks missed it: The proof run was green and the packet said the right proof/taste boundary. The weakness only showed up when using the artifact for its actual purpose: deciding whether a supported candidate is worth listening to without reading raw JSON.

Proof lesson: Human-review packets are proof artifacts, not summaries after the fact. Creative ratchets need to show supported and rejected candidates, target movement, receipt status, and ranking-as-review-order directly in Markdown while still refusing to call the mix automatically better.

Evidence: The before packet from the preliminary Neon loop recommended chord -0.10 and counted two supported candidates, but had no Supported Candidates table. The JSON contained the missing evidence. Integrations PR #729 added reusable candidate tables to @riddledc/riddle-proof-packs, #730 published 0.5.2 through trusted publishing, and LilArcade PR #510 consumed it. The final local run passed with preliminary_candidate_ready, chord -0.10 recommended, two supported candidates, zero rejected candidates, state restoration true, no permanent edit, no clipping or low-level windows, and a Markdown table showing target movement, pass (6) receipts, and review-order ranking.
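
One way to surface that JSON evidence in the reviewer-readable packet is to render it as a Markdown table. This is only a sketch: the field names (`action`, `movement`, `rank`, `passed`, `failed`) are hypothetical and are not the actual @riddledc/riddle-proof-packs schema.

```python
def candidate_table(candidates: list[dict]) -> str:
    """Render supported candidates as a Markdown table in review order."""
    rows = [
        "| Review order | Candidate | Target movement | Receipts |",
        "| --- | --- | --- | --- |",
    ]
    # Ranking only decides the order a human listens in; it is not a taste verdict.
    ordered = sorted(candidates, key=lambda c: c["rank"])
    for order, c in enumerate(ordered, start=1):
        if c["failed"] == 0:
            receipts = f"pass ({c['passed']})"
        else:
            receipts = f"fail ({c['failed']})"
        rows.append(f"| {order} | {c['action']} | {c['movement']} | {receipts} |")
    return "\n".join(rows)


# Illustrative input only; the movement text is a placeholder, not measured data.
print(candidate_table([
    {"action": "chord -0.10", "movement": "(measured delta)", "rank": 1, "passed": 6, "failed": 0},
]))
```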

Neon batch confused patch-plan identity with current target proof
Riddle artifact · May 24, 2026 · local run · LilArcade · Riddle Proof · audio guardrails · local ratchet
Catch card

Neon batch confused patch-plan identity with current target proof

A proof batch needs to distinguish planned durable edits from already-active current-target evidence.

What Riddle caught
Before the fix, the full local batch produced a durable patch plan for chord 0.18 -> 0.08 with override id monkberry-moon-delight-tab-chord-minus-01-approved-candidate, while the current-target proof used the same id for the active chord 0.18 override.
Why it matters
Riddle Proof caught a workflow-level ambiguity: the product state was healthy, but the proof handoff could mislabel a future edit as the same durable object as the current target.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The local Neon ratchet batch could show a new durable patch plan and a durable current-target proof with the same override id even though they referred to different mixer levels: the proposed future patch wanted chord 0.08 while the active current target still proved chord 0.18.

Why normal checks missed it: Each step looked reasonable alone. The approved-candidate proof produced a valid listening-review handoff, and the current-target proof correctly verified the already-active source override. The mismatch only appeared when the batch stitched the patch-plan and current-target roles together.

Proof lesson: A proof batch needs to distinguish planned durable edits from already-active current-target evidence. Repeated deltas are not stable identities for creative changes; durable handoff ids need absolute target evidence, and the batch should fail if one id names two different level states.

Evidence: Before the fix, the full local batch produced a durable patch plan for chord 0.18 -> 0.08 with override id monkberry-moon-delight-tab-chord-minus-01-approved-candidate, while the current-target proof used the same id for the active chord 0.18 override. LilArcade PR #505 changed generated ids to include absolute from/to levels and added a plan/current-target comparison receipt. The final focused browser batch passed with planned override id monkberry-moon-delight-tab-chord-018-to-008-approved-candidate, active current-target id monkberry-moon-delight-tab-chord-minus-01-approved-candidate, planComparison planned_override_not_applied_yet, one active override, zero findings, peak 0.7777, RMS 0.1112, clipping false, and lowLevel false.
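
The id scheme and the batch-level check can be sketched as follows. The format is inferred from the single id quoted above, and the helper names are hypothetical; the real generator in LilArcade PR #505 may differ.

```python
def level_token(level: float) -> str:
    """Format a mixer level as digits only: 0.18 -> '018', 0.08 -> '008'."""
    return f"{level:.2f}".replace(".", "")


def override_id(song_slug: str, lane: str, from_level: float, to_level: float) -> str:
    """Build an id from absolute from/to levels instead of a repeatable delta."""
    return (f"{song_slug}-{lane}-{level_token(from_level)}"
            f"-to-{level_token(to_level)}-approved-candidate")


def ids_naming_two_states(records: list[tuple[str, float]]) -> set[str]:
    """Return every id that is attached to more than one target level."""
    seen: dict[str, float] = {}
    conflicts: set[str] = set()
    for oid, level in records:
        if oid in seen and seen[oid] != level:
            conflicts.add(oid)
        seen.setdefault(oid, level)
    return conflicts


print(override_id("monkberry-moon-delight-tab", "chord", 0.18, 0.08))
```

A delta-based id such as `...-chord-minus-01-...` is the same string for 0.28 -> 0.18 and 0.18 -> 0.08, which is how one id ended up naming two level states; a batch can fail whenever `ids_naming_two_states` is non-empty.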

Neon durable mix proof could not prove profile-source agreement
Riddle artifact · May 24, 2026 · local + production run · LilArcade · Riddle Proof · app contract · audio guardrails
Catch card

Neon durable mix proof could not prove profile-source agreement

Durable creative edits need a current-target proof that checks both runtime state and source/profile descriptors.

What Riddle caught
The first production run of npm run proof:sequencer:durable-current-target failed with selectedSongMatches true, mixProfileMatches true, contractMatches true, visibleMatches true, actualLevel 0.18, visibleToken 0.18X, but profileMatches false because profileLevel was null.
Why it matters
Riddle Proof found a proof-contract weakness in a healthy-looking applied mix: the UI and live state were right, but the durable source/profile evidence was not strong enough yet.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The durable current-target proof could see the approved chord level in the live Neon app and visible UI, but the proof contract did not expose mixProfile.mixerLevels, so it could not prove the source/profile descriptor agreed with the running mixer state.

Why normal checks missed it: The production app looked correct: Monkberry Moon Delight (Tab) loaded, the mix profile id matched, the live contract level was chord 0.18, and the UI showed 0.18X. A weaker current-target smoke would have stopped there. The stricter durable override audit compared live state, visible text, and profile/source evidence together and found profileLevel null.

Proof lesson: Durable creative edits need a current-target proof that checks both runtime state and source/profile descriptors. The result still does not judge taste; it proves the applied override is observable, source-backed, and inside objective guardrails.

Evidence: The first production run of npm run proof:sequencer:durable-current-target failed with selectedSongMatches true, mixProfileMatches true, contractMatches true, visibleMatches true, actualLevel 0.18, visibleToken 0.18X, but profileMatches false because profileLevel was null. It also exposed a summary-parser blind spot: the runner preserved the returned check in profile-result.json under return_stored_to, while the top-level summary reported check null. LilArcade PR #501 added the durable-current-target command, exposed mixProfile.mixerLevels through proof state, mixer state, and offline metric receipts, and fixed the summary parser. After CI, merge, deploy notification, and a final production run, the proof passed with profileLevel 0.18, findingCount 0, peak 0.7777, RMS 0.1112, clipping false, and lowLevel false.
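
The stricter audit amounts to treating missing profile evidence as a failure rather than a skipped check. A minimal sketch, reusing the check names from the receipt; the function itself and the visible-token format are assumptions:

```python
from typing import Optional


def durable_current_target(expected: float, actual_level: float,
                           visible_token: str,
                           profile_level: Optional[float]) -> dict:
    """Compare runtime, visible, and profile evidence for one mixer level."""
    checks = {
        "contractMatches": actual_level == expected,
        # Assumes the UI renders levels as two decimals followed by "X".
        "visibleMatches": visible_token == f"{expected:.2f}X",
        # A null profile level fails the check instead of being ignored.
        "profileMatches": profile_level is not None and profile_level == expected,
    }
    checks["ok"] = all(checks.values())
    return checks


print(durable_current_target(0.18, 0.18, "0.18X", None))
```

The first production run corresponds to the `None` case: runtime and visible checks pass, `profileMatches` is false, and the overall result fails.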

Neon deep exploration found proof-window overclaim and hot presets
Riddle artifact · May 24, 2026 · local run · LilArcade · Riddle Proof · audio guardrails · local ratchet
Catch card

Neon deep exploration found proof-window overclaim and hot presets

One-piece-at-a-time ratchets are useful while shaping the contract, but once a round is clean the efficient move is a deeper local sweep before deploy.

What Riddle caught
A deeper local sweep across 6 proof-capable songs, 19 parts, and 23 proof windows found one declared proof window that overclaimed chord activity and five built-in presets that clipped.
Why it matters
Riddle Proof found both proof weakness and product weakness: a declared-window overclaim and objective clipping in rich audio presets.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The bounded Neon proof loop was clean, but a deeper all-current-song local sweep found one proof-window calibration overclaim and five additional built-in preset clipping regressions.

Why normal checks missed it: The default ratchet was intentionally bounded for speed, and single-song smoke proofs only exercised the current target. They could miss later Monkberry parts whose active lanes differed from the declared proof window, plus hot presets outside the first sampled set.

Proof lesson: One-piece-at-a-time ratchets are useful while shaping the contract, but once a round is clean the efficient move is a deeper local sweep before deploy. The deeper pass should batch deterministic guardrail failures while still saying nothing about subjective mix taste.

Evidence: The failing local sweep sampled 6 available songs, 6 proof-capable songs, 19 parts, and 23 proof windows. It found Monkberry Moon Delight (Tab) part 49-64 using a declared proof window that overclaimed chord activity, plus clipping in Yakety Yak (Dark) Drop and Resolve, Dark Progression A and B, and Midnight Protocol A. LilArcade then validated declared proof windows against per-part active lanes, lowered the hot built-in presets, and added npm run proof:sequencer:deep-explore as a repeatable command. The final command passed with 6 available songs, 6 proof-capable songs, 19 sampled parts, 22 sampled windows, 0 findings, and restoration ok.
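
Validating declared proof windows against per-part active lanes is a set comparison. A sketch under assumed data shapes (the `active_lanes` and `windows` keys and the window name are hypothetical):

```python
def window_overclaims(declared: set[str], active: set[str]) -> set[str]:
    """Lanes a proof window claims to exercise that the part does not play."""
    return declared - active


def validate_windows(parts: dict[str, dict]) -> list[str]:
    """Return one finding per window that declares an inactive lane."""
    findings = []
    for part, info in parts.items():
        active = set(info["active_lanes"])
        for window in info["windows"]:
            extra = window_overclaims(set(window["lanes"]), active)
            if extra:
                findings.append(
                    f"{part}: window {window['name']} overclaims {sorted(extra)}")
    return findings


# Illustrative part: the lane lists are made up, only the part label comes from the diary.
parts = {"49-64": {"active_lanes": ["drums", "bass"],
                   "windows": [{"name": "chord-proof", "lanes": ["chord", "bass"]}]}}
print(validate_windows(parts))
```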

Neon playback proof could pass without proving playback
Riddle artifact · May 24, 2026 · local run · LilArcade · Riddle Proof · interaction proof
Catch card

Neon playback proof could pass without proving playback

Interaction proofs need action-specific receipts.

What Riddle caught
The before receipt passed with 7 setup actions and no Stop text wait, while post-action contract evidence still showed isPlaying false and currentStep 0.
Why it matters
Riddle Proof caught a flaw in the proof, not just the product.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The Neon playback-sync proof pack profile could pass after clicking Play even when the captured app contract still reported playback stopped: post-action isPlaying false and trainer currentStep 0.

Why normal checks missed it: The route loaded, the Play button was visible, the click did not throw, the screenshot was captured, and the profile only asserted that post-playback evidence existed. It did not wait for the visible Stop state or assert that playback was actually running and advancing.

Proof lesson: Interaction proofs need action-specific receipts. For playback, the proof should pair a visible UI transition with live app-contract state, then assert the state changed in the claimed direction.

Evidence: The before receipt passed with 7 setup actions and no Stop text wait, while post-action contract evidence still showed isPlaying false and currentStep 0. Integrations PR #721 hardened the reusable pack, trusted publishing released @riddledc/riddle-proof-packs@0.4.8, and LilArcade PR #493 synced the published profile. The after receipt passed with 10 setup actions, waited for button.drum-play text Stop, captured isPlaying true, currentStep 2, and movedForward true. LilArcade validation also passed profile sync, 103 app tests, production build, mobile trainer layout proof, and deploy notification.
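
An action-specific playback receipt pairs the visible transition with contract state and asserts direction. A sketch using the contract field names from the evidence; the function and receipt keys other than `movedForward` are hypothetical:

```python
def playback_receipt(before: dict, after: dict, button_text: str) -> dict:
    """Require a visible Stop state, a playing contract, and forward motion."""
    receipt = {
        "stopVisible": button_text == "Stop",
        "isPlaying": after["isPlaying"] is True,
        "movedForward": after["currentStep"] > before["currentStep"],
    }
    receipt["ok"] = all(receipt.values())
    return receipt


before = {"isPlaying": False, "currentStep": 0}
print(playback_receipt(before, {"isPlaying": True, "currentStep": 2}, "Stop"))
```

The weak profile only checked that post-playback evidence existed, so a post-action state of `isPlaying` false and `currentStep` 0 still passed; under this shape it fails all three checks.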

Neon app profiles drifted from the reusable proof pack
Riddle artifact · May 24, 2026 · local run · LilArcade · Riddle Proof · proof packs
Catch card

Neon app profiles drifted from the reusable proof pack

Reusable proof packs need a synchronization gate in the target app.

What Riddle caught
LilArcade carried only five local Neon profiles while the published pack had nine: four profiles were absent and the five existing ones were stale.
Why it matters
Riddle Proof caught a workflow bug in the proof system itself: the reusable pack had advanced, but the app was no longer carrying the whole proof surface.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: LilArcade had only five local Neon Riddle Proof profiles while the published Neon Step Sequencer pack had nine, and the existing local files were missing newer pack checks, receipts, and metadata.

Why normal checks missed it: The app tests and individual proof runs could still pass because they only exercised whichever local profile files happened to exist. Nothing forced the app-local profiles to stay generated from the reusable pack, so coverage could quietly lag behind the framework work.

Proof lesson: Reusable proof packs need a synchronization gate in the target app. Otherwise the pack can improve while the app keeps running stale local profiles and loses coverage without an obvious failure.

Evidence: LilArcade PR #492 added a profile sync/check script, synced the full nine-profile Neon pack surface, and wired npm test through proof:sequencer:check-profiles. The catch found four absent local profiles: explore-songs-and-mixes, mobile-trainer-layout, playback-sync, and source-readiness. It also refreshed five stale profiles. Validation passed with 103 app tests, a production build, a local fast mix health proof with 7 checks and 9 setup actions, and a new source-readiness proof with 4 checks and all required source states idle.
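
A synchronization gate of this kind can be sketched as a comparison between pack profiles and app-local copies. This is an assumption about the approach, not the actual proof:sequencer:check-profiles script; profile contents are compared by hash and the stale profile name is made up.

```python
import hashlib


def digest(text: str) -> str:
    """Content hash used to detect a local copy that lags the pack."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def check_profiles(pack: dict[str, str], local: dict[str, str]) -> dict[str, list[str]]:
    """Report pack profiles that are missing locally or differ from the pack."""
    missing = sorted(name for name in pack if name not in local)
    stale = sorted(name for name in pack
                   if name in local and digest(local[name]) != digest(pack[name]))
    return {"missing": missing, "stale": stale, "ok": not missing and not stale}


pack = {"playback-sync": "v2", "source-readiness": "v1", "example-profile": "v3"}
local = {"example-profile": "v1"}
print(check_profiles(pack, local))
```

Wiring a check like this into `npm test` makes drift a test failure instead of silent coverage loss.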

Neon mix candidates needed a durable source handoff
Riddle artifact · May 24, 2026 · local run · LilArcade · Riddle Proof · human review
Catch card

Neon mix candidates needed a durable source handoff

A creative proof loop should separate three things: objective receipts, human or surrogate approval, and durable application.

What Riddle caught
Run 007 produced a human-review packet with status candidate_applied_for_listening_review: Monkberry Moon Delight (Tab), candidate chord -0.10, mixer action chord 0.38 -> 0.28, 6 supported candidates, 0 rejected candidates, approval mode mixing_canon_surrogate, approvedCandidateApplied true, candidateActionsAreTransient false, and ranking role review_order_only.
Why it matters
Riddle Proof did not pretend the chord reduction was artistically better.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The Neon ratchet loop could produce and apply a proof-backed mix candidate inside the running browser, but that was still not enough to make the change a durable, reviewable source edit.

Why normal checks missed it: A normal browser proof can stop once the app state says the approved candidate was applied. That misses the last-mile question: whether the accepted candidate became a narrow source-level patch, whether its approval basis stayed attached, and whether the current target still passes after the durable edit lands.

Proof lesson: A creative proof loop should separate three things: objective receipts, human or surrogate approval, and durable application. Riddle Proof can prove that a candidate changed a measurable mix control and stayed inside guardrails; the handoff should still say that musical taste needs listening review.

Evidence: Run 007 produced a human-review packet with status candidate_applied_for_listening_review: Monkberry Moon Delight (Tab), candidate chord -0.10, mixer action chord 0.38 -> 0.28, 6 supported candidates, 0 rejected candidates, approval mode mixing_canon_surrogate, approvedCandidateApplied true, candidateActionsAreTransient false, and ranking role review_order_only. LilArcade PR #490 added a targeted neon-approved-mix-overrides.json entry and a durable-patch plan CLI that refuses transient or unapproved packets. The follow-up local current-target proof passed with the browser contract reading chordLevel 0.28, mix peak 0.8303, RMS 0.1234, clipping false, lowLevel false, 6 active instruments, clean route/layout checks, and no fatal console events.
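
The refusal rule for the durable-patch plan can be sketched as a small gate. The packet flags are the ones named in the evidence; the `lane`, `from`, `to`, and `tasteClaim` keys and the function itself are hypothetical, not the real CLI.

```python
class PacketRefused(ValueError):
    """Raised when a review packet must not become a durable source edit."""


def plan_durable_patch(packet: dict) -> dict:
    """Turn an approved, non-transient packet into a narrow patch plan."""
    # Missing flags are treated as the unsafe value.
    if packet.get("candidateActionsAreTransient", True):
        raise PacketRefused("candidate actions are transient")
    if not packet.get("approvedCandidateApplied", False):
        raise PacketRefused("candidate was not approved and applied")
    return {
        "lane": packet["lane"],
        "from": packet["from"],
        "to": packet["to"],
        "approvalMode": packet["approvalMode"],  # approval basis stays attached
        "tasteClaim": None,  # the plan never asserts the mix sounds better
    }


packet = {"lane": "chord", "from": 0.38, "to": 0.28,
          "approvalMode": "mixing_canon_surrogate",
          "approvedCandidateApplied": True,
          "candidateActionsAreTransient": False}
print(plan_durable_patch(packet))
```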

Neon Step Sequencer had hidden clipping in built-in mixes
Riddle artifact · May 24, 2026 · local run · LilArcade · Riddle Proof · audio guardrails
Catch card

Neon Step Sequencer had hidden clipping in built-in mixes

Audio proof should separate objective guardrails from taste.

What Riddle caught
Run 005 first exposed proof and app-contract gaps while making the exploration sweep real: arbitrary song/part states needed tempo/bar-count normalization, and saved/song snapshots preserved rhythmSynthEnabled but not bass/chord/guitar lane flags.
Why it matters
Riddle Proof caught an objective audio regression in a running app without pretending to judge taste: it found clipping, explained the weak proof layers that had to be fixed first, and ended with inspectable before/final receipts.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: A bounded Neon Step Sequencer exploration sweep found objective clipping in built-in song presets even though the UI loaded, audio sources prepared, and the main Monkberry Tab proof windows were healthy.

Why normal checks missed it: A route smoke, a single selected-song proof, or a subjective listen pass could all miss this. The issue surfaced only after the proof contract swept multiple song/part states, normalized historical snapshots, rendered offline audio windows, and treated peak/headroom receipts as guardrails.

Proof lesson: Audio proof should separate objective guardrails from taste. Riddle Proof can prove that a running app prepared sources, rendered bounded audio windows, preserved lane activity, avoided clipping, and produced a confidence map. It should not claim that the mix is artistically better without human review.

Evidence: Run 005 first exposed proof and app-contract gaps while making the exploration sweep real: arbitrary song/part states needed tempo/bar-count normalization, and saved/song snapshots preserved rhythmSynthEnabled but not bass/chord/guitar lane flags. Once those were fixed, the sweep produced product findings: Yakety Yak (Dark) part 0 clipped at peak 1.0531, Yakety Yak (Dark) part 1 clipped at peak 1.0626, and Monkberry Moon Delight (Sheet) part 0 tripped the clipping guardrail. The final local run asserted __neonProof.exploration.ok === true, sampled 4 songs and 8 song/part entries, passed all 8 entries, produced 0 prioritized findings, kept final sampled peaks below threshold, and stayed clean for fatal console errors and horizontal overflow.
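
Turning per-entry peaks into prioritized findings can be sketched as a filter and sort. The threshold of 1.0 and the entry shape are assumptions; the two clipping peaks are the ones reported above, and the third entry is an invented healthy sample.

```python
def sweep_findings(entries: list[dict], threshold: float = 1.0) -> list[dict]:
    """Return entries whose peak reaches the threshold, hottest first."""
    hot = [e for e in entries if e["peak"] >= threshold]
    return sorted(hot, key=lambda e: e["peak"], reverse=True)


def exploration_ok(entries: list[dict], threshold: float = 1.0) -> bool:
    """A sweep is ok only when no sampled entry trips the guardrail."""
    return not sweep_findings(entries, threshold)


entries = [
    {"song": "Yakety Yak (Dark)", "part": 0, "peak": 1.0531},
    {"song": "Yakety Yak (Dark)", "part": 1, "peak": 1.0626},
    {"song": "Healthy example", "part": 0, "peak": 0.80},  # invented sample
]
for finding in sweep_findings(entries):
    print(finding["song"], finding["part"], finding["peak"])
```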

Ski Adventure touch input landed half a player width off
Riddle artifact · May 20, 2026 · < $0.01 · LilArcade · mobile input · gameplay geometry
Catch card

Ski Adventure touch input landed half a player width off

Input proofs should check geometry, not only movement.

What Riddle caught
Initial phone job job_7e220c74 proved live trees existed, then dragged on .game-area from ratio 0.5,0.82 to 0.8,0.82. The skier moved right but landed with alignmentError 14.846875, about half of the 30-wide player.
Why it matters
Riddle Proof caught a subtle mobile gameplay-feel bug that a clickability or movement smoke would likely miss: the player moved, but it did not land under the user's finger.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: Ski Adventure responded to touch movement, but the skier landed one half-width left of the finger because the touch handler subtracted half the player width before CSS translateX(-50%) applied the same offset visually.

Why normal checks missed it: A basic mobile smoke could prove the skier moved and the game stayed playable. The regression only appeared when the proof compared the drag target against the rendered player center after proving live obstacles existed.

Proof lesson: Input proofs should check geometry, not only movement. For touch games, prove live gameplay first, dispatch a real gesture, then compare intended target coordinates to the rendered actor center.

Evidence: Initial phone job job_7e220c74 proved live trees existed, then dragged on .game-area from ratio 0.5,0.82 to 0.8,0.82. The skier moved right but landed with targetX 264.8, visualCenterX 249.953125, playerX 249.962, playerWidth 30, and alignmentError 14.846875. LilArcade PR #462 changed the touch handler to use the existing setPlayerCenterX helper with the raw touch center. Final production matrix job job_f9b9b6ce passed across desktop, phone, iPad Mini, and iPad with phone alignment error 0.153125, tablet alignment error 0.353125, 0px overflow, and clean browser health.
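
The double offset and the geometry check can be sketched as below. It assumes the player is positioned by a CSS left value with translateX(-50%), so the rendered center sits at that left value; the function names are hypothetical.

```python
def rendered_center_x(css_left: float) -> float:
    """With translateX(-50%), the element's center sits at its CSS left value."""
    return css_left


def buggy_left(touch_x: float, player_width: float) -> float:
    """Pre-centers in the handler, so CSS then applies the same offset again."""
    return touch_x - player_width / 2


def fixed_left(touch_x: float, player_width: float) -> float:
    """Pass the raw touch center and let CSS do the centering once."""
    return touch_x


def alignment_error(target_x: float, visual_center_x: float) -> float:
    """Distance between the intended target and the rendered actor center."""
    return abs(target_x - visual_center_x)


print(alignment_error(264.8, rendered_center_x(buggy_left(264.8, 30))))
print(alignment_error(264.8, rendered_center_x(fixed_left(264.8, 30))))
```

The modeled bug predicts an error of half the player width (15), close to the measured 14.846875 from targetX 264.8 and visualCenterX 249.953125.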

Coin Clicker dashboard milestone ETA used wrong source of truth
Riddle artifact · May 20, 2026 · < $0.01 · LilArcade · dashboard math · state seeding
Catch card

Coin Clicker dashboard milestone ETA used wrong source of truth

Dashboards need source-of-truth proofs, not only visible-label proofs.

What Riddle caught
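Initial desktop job job_7e208368 seeded a saved Coin Clicker state with 100000 current coins, 250000 lifetime totalCoins, and spent upgrades. The dashboard picked the right 1.00M milestone but computed its ETA from spendable coins (~5h 40m) instead of lifetime totalCoins (~4h 44m).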
Why it matters
Riddle Proof caught a source-of-truth math bug in a healthy-looking dashboard by seeding realistic spent-state data and checking the arithmetic contract behind the displayed copy.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: Coin Clicker Math Dashboard selected the next total-earned milestone correctly, but computed the ETA from current spendable coins instead of lifetime totalCoins after the player had spent currency on upgrades.

Why normal checks missed it: The dashboard looked healthy: the route loaded, upgrades rendered, and the next milestone was the right 1.00M target. The bug only surfaced after seeding realistic prior-spend state and checking the arithmetic behind the ETA text.

Proof lesson: Dashboards need source-of-truth proofs, not only visible-label proofs. Seed realistic persisted state, prove the chosen milestone, and assert the displayed estimate uses the same metric as the milestone definition.

Evidence: Initial desktop job job_7e208368 seeded a saved Coin Clicker state with 100000 current coins, 250000 lifetime totalCoins, and spent upgrades. The dashboard chose the next total-earned milestone 1.00M but displayed "in ~5h 40m", which came from spendable coins, instead of the total-earned contract "in ~4h 44m". LilArcade PR #460 changed MathDashboard to subtract totalCoins, not spendable coins, from the milestone target when computing the total-earned ETA. Final production matrix job job_7e5005e1 passed across desktop, phone, iPad Mini, and iPad with "Next Milestone 1.00M in ~4h 44m", reset-cleared receipts, 0px overflow, and clean browser health.
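
The arithmetic behind the two labels can be reproduced with a small sketch. The income rate of 44 coins per second and the round-down-to-minutes formatting are assumptions chosen because they reproduce both diary labels; neither value appears in the receipts.

```python
def eta_label(target: float, progress: float, coins_per_second: float) -> str:
    """Format the time needed to close the gap between progress and target."""
    seconds = max(target - progress, 0) / coins_per_second
    minutes = int(seconds // 60)  # assumed: round down to whole minutes
    return f"~{minutes // 60}h {minutes % 60}m"


MILESTONE = 1_000_000
RATE = 44  # assumed coins per second, not taken from the receipts

print(eta_label(MILESTONE, 100_000, RATE))  # spendable coins: the buggy source
print(eta_label(MILESTONE, 250_000, RATE))  # lifetime totalCoins: the milestone's own metric
```

A total-earned milestone has to be measured against total-earned progress; using the spendable balance overstates the remaining gap by exactly the amount already spent on upgrades.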

Dashboard retry copy had no retry button
Riddle artifact · May 19, 2026 · < $0.01 · Riddle site · Dashboard · actionable recovery
Catch card

Dashboard retry copy had no retry button

Recovery copy should include a local recovery action when the user can retry without leaving the page.

What Riddle caught
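Initial production job job_246afd37 loaded /dashboard/ across desktop, phone, iPad Mini, and iPad with valid balance and API-key mocks while /v1/jobs?limit=10 returned HTTP 503. The unavailable copy told users to try again, but no Retry recent jobs button rendered in any viewport.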
Why it matters
Riddle Proof caught a practical recovery UX gap: the dashboard no longer lied about failed history, but it still left users without the local retry action the copy promised.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The authenticated Dashboard correctly stopped showing a failed recent-jobs load as an empty account, but the recovery copy told users to try again without rendering a Retry recent jobs button.

Why normal checks missed it: The page looked broadly healthy: auth worked, balance loaded, API keys loaded, the recent-jobs section showed an honest unavailable message, raw backend details stayed hidden, layout stayed stable, and browser health was clean. The issue only surfaced when the proof required the recovery state to be actionable, not merely truthful.

Proof lesson: Recovery copy should include a local recovery action when the user can retry without leaving the page. Honest error text is only half the contract for account dashboards.

Evidence: Initial production job job_246afd37 loaded /dashboard/ across desktop, phone, iPad Mini, and iPad with valid balance and API-key mocks while /v1/jobs?limit=10 returned HTTP 503. The proof showed "Recent jobs unavailable. Please try again." and healthy adjacent account data, but failed because .recent-jobs-section .jobs-retry-button and the "Retry recent jobs" label were absent in every viewport. Riddle-site PR #191 added a "Retry recent jobs" button, refactored recent-jobs loading into a retryable function, and kept balance/API-key data stable. Final production matrix job job_1b771150 passed 27 checks across all four viewports with the retry affordance visible. Final production click-through job job_8e924e2f clicked "Retry recent jobs" and proved stale recovery copy disappeared after job_v556_retry_recovered rendered.
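
The recovery contract can be modeled as a three-way view state in which the unavailable state always carries its own action. A sketch with a hypothetical view-model shape; only the copy strings come from the receipts.

```python
from typing import Optional


def recent_jobs_view(jobs: Optional[list], load_failed: bool) -> dict:
    """Pick exactly one of: unavailable (with retry), empty, or list."""
    if load_failed:
        # A failed load is never shown as an empty account.
        return {"state": "unavailable",
                "message": "Recent jobs unavailable. Please try again.",
                "actions": ["Retry recent jobs"]}
    if not jobs:
        return {"state": "empty", "message": "No jobs yet", "actions": []}
    return {"state": "list", "message": "", "actions": []}


print(recent_jobs_view(None, load_failed=True))
```

Keeping the retry action inside the same branch as the "try again" copy makes it hard to ship one without the other.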

Dashboard recent jobs failure looked like no jobs
Riddle artifact · May 19, 2026 · < $0.01 · Riddle site · Dashboard · account-state honesty
Catch card

Dashboard recent jobs failure looked like no jobs

List-load failures are not empty states.

What Riddle caught
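Initial production job job_ea46646e loaded /dashboard/ across desktop, phone, iPad Mini, and iPad with valid balance and API-key mocks while /v1/jobs?limit=10 returned HTTP 503. Recent Jobs rendered "No jobs yet" with no inline error, while balance and API-key data looked healthy.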
Why it matters
Riddle Proof caught an account-state lie in a Dashboard recent-jobs surface: a failed history load was shown as an empty account while other account data looked healthy.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The authenticated Dashboard rendered the empty-account recent-jobs state after the recent jobs endpoint failed, even though balance and API-key data loaded normally.

Why normal checks missed it: Most of the dashboard looked healthy: auth worked, balance loaded, API keys loaded, layout stayed stable, and the failed jobs request was just one section of a larger page. A shallow dashboard smoke could see useful account data and miss that a service outage was represented as no user activity.

Proof lesson: List-load failures are not empty states. Account dashboards should distinguish unavailable history from an empty account while preserving independent account data and keeping raw backend details out of the UI.

Evidence: Initial production job job_ea46646e loaded /dashboard/ across desktop, phone, iPad Mini, and iPad with valid balance and API-key mocks while /v1/jobs?limit=10 returned HTTP 503. The proof showed Browser Time Balance, 2h 15m, one active job, API Keys, and Riddle Proof v553 existing key, but Recent Jobs rendered "No jobs yet" and no .dashboard-inline-error. Riddle-site PR #189 added a separate recentJobsError state, rendered "Recent jobs unavailable. Please try again.", hid the empty state on load failure, and added a static guard. Final production job job_24bf862f passed 25 checks with the unavailable copy visible, "No jobs yet" absent, balance/API-key content still visible, backend message/code absent, 0px overflow, 0 warnings, and 0 fatal console/page errors.

Playground screenshot hid secondary terminal evidence
Riddle artifact · May 19, 2026 · < $0.01 · Riddle site · Playground · evidence honesty
Catch card

Playground screenshot hid secondary terminal evidence

No screenshots does not mean no proof evidence.

What Riddle caught
Initial production job job_96fb3328 loaded /playground/ across desktop, phone, iPad Mini, and iPad, submitted sync:true Screenshot requests 4/4 with console and HAR included, and received terminal completed_error JSON with job_rp551_sync_screenshot_secondary_only, one console log, one warning, one HAR entry, and no screenshot. The receipt never rendered: Screenshot mode kept polling for artifacts and hid the returned evidence.
Why it matters
Riddle Proof caught secondary evidence being hidden when a terminal Screenshot response returned no image but did return console and HAR artifacts.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The authenticated Playground Screenshot flow received a terminal completed_error JSON response with console and HAR evidence but no screenshot, then treated it like an async job receipt and kept waiting instead of rendering the returned evidence.

Why normal checks missed it: The request itself was correct: auth worked, Screenshot mode submitted sync:true with console and HAR included, the mock was hit exactly once per viewport, and the response carried a job ID plus useful secondary artifacts. A shallow API or click smoke could stop at the successful submission and miss that the visible receipt never appeared.

Proof lesson: No screenshots does not mean no proof evidence. Terminal receipts should render returned console, HAR, raw response, job ID, and service error evidence immediately, even when image evidence is absent.

Evidence: Initial production job job_96fb3328 loaded /playground/ across desktop, phone, iPad Mini, and iPad, submitted sync:true Screenshot requests 4/4 with console and HAR included, and received terminal completed_error JSON with job_rp551_sync_screenshot_secondary_only, one console log, one warning, one HAR entry, and no screenshot. The proof failed because .result-state never appeared; Screenshot mode interpreted the terminal JSON as an async job_id receipt and stayed in artifact polling, hiding the secondary evidence. Riddle-site PR #187 routed terminal Screenshot JSON through the shared result receipt path before async polling and added a static guard. Final production job job_80b21e0f passed 32 checks with "Error", the exact service message, the job ID, "partial results available", honest "No screenshots captured" copy, "Console Output", "Network HAR", "Raw Response", no "Success" or empty-evidence copy, 0px overflow, 0 warnings, and 0 fatal console/page errors.
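
The routing order described in that fix, terminal check first and polling second, can be sketched like this. The response field names and the "completed" status are assumptions; only completed_error and the section labels appear in the receipts.

```python
TERMINAL_STATUSES = {"completed", "completed_error"}  # "completed" is assumed


def route_screenshot_response(body: dict) -> str:
    """Decide how to handle a sync response: terminal receipts win over polling."""
    if body.get("status") in TERMINAL_STATUSES:
        return "render_receipt"
    if body.get("job_id"):
        return "poll_artifacts"
    return "render_error"


def evidence_sections(body: dict) -> list[str]:
    """List receipt sections, staying honest when there is no image evidence."""
    sections = ["Screenshots" if body.get("screenshots") else "No screenshots captured"]
    if body.get("console"):
        sections.append("Console Output")
    if body.get("har"):
        sections.append("Network HAR")
    sections.append("Raw Response")
    return sections


body = {"status": "completed_error",
        "job_id": "job_rp551_sync_screenshot_secondary_only",
        "screenshots": [], "console": ["log", "warning"], "har": ["entry"]}
print(route_screenshot_response(body), evidence_sections(body))
```

Checking only for a job ID sends this body to polling, which is the behavior the proof caught.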

Playground screenshot leaked malformed success body
Riddle artifact · May 19, 2026 · < $0.01 · Riddle site · Playground · recovery honesty
Catch card

Playground screenshot leaked malformed success body

Handled action recovery needs content proof, not only an error box.

What Riddle caught
Initial production job job_cccf7a47 loaded /playground/ across desktop, phone, iPad Mini, and iPad, submitted sync:true Screenshot requests 4/4 with console and HAR included, and mocked HTTP 200 application/json with a malformed body. The error state exposed parser text and the raw malformed body instead of the generic screenshot fallback.
Why it matters
Riddle Proof caught raw-response leakage in a malformed-success recovery that otherwise looked visibly handled and browser-clean.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The authenticated Playground Screenshot flow handled an HTTP 200 response with malformed JSON by exposing parser-specific failure text and the raw malformed response body to the user.

Why normal checks missed it: Most of the flow looked healthy: authentication worked, Screenshot mode submitted the right sync request, the error container appeared, mock counts matched, browser health was clean, and layout stayed stable. A shallow recovery check would see a handled error state and miss that the recovery copy leaked internal response details.

Proof lesson: Handled action recovery needs content proof, not only an error box. Successful HTTP status with malformed JSON should render a generic action-specific fallback while rejecting raw response text, parser text, contradictory success states, and browser noise in the same viewport matrix.

Evidence: Initial production job job_cccf7a47 loaded /playground/ across desktop, phone, iPad Mini, and iPad, submitted sync:true Screenshot requests 4/4 with console and HAR included, and mocked HTTP 200 application/json with a malformed body. The proof failed because "Failed to take screenshot. Please try again." was absent while "Failed to parse JSON response" and the raw body "{not valid playground screenshot success json" were visible in every viewport. Riddle-site PR #185 added fallback-aware sync JSON parsing for Screenshot, Workflow, and Script modes and static guards against raw JSON/parser leaks. Final production job job_2ec55527 passed 20 checks with the generic screenshot fallback visible, raw/parser/object/success text absent, the error state count correct, 0px overflow, 0 warnings, and 0 fatal console/page errors.
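
Fallback-aware parsing means the user-facing result never carries the raw body or the parser message. A sketch of the idea in Python, not the Riddle-site implementation; only the Screenshot fallback string is taken from the receipts, so the other modes are left out.

```python
import json

FALLBACKS = {"screenshot": "Failed to take screenshot. Please try again."}


def parse_sync_response(raw: str, mode: str) -> dict:
    """Parse a sync response body, falling back to generic action-specific copy."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Deliberately drop the raw body and parser text from the result.
        return {"ok": False, "message": FALLBACKS[mode]}
    if not isinstance(data, dict):
        # Valid JSON of the wrong shape gets the same generic fallback.
        return {"ok": False, "message": FALLBACKS[mode]}
    return {"ok": True, "data": data}


print(parse_sync_response("{not valid playground screenshot success json", "screenshot"))
```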

Playground timeout hid partial evidence
Riddle artifact · May 18, 2026 · < $0.01 · Riddle site · Playground · result honesty
Catch card

Playground timeout hid partial evidence

Terminal timeout receipts need the same artifact-honesty contract as terminal errors.

What Riddle caught
Initial production job job_d7a29899 loaded /playground/ across desktop, phone, iPad Mini, and iPad, submitted sync:true Workflow requests 4/4 with screenshots, console, and HAR included, and rendered Timed Out, the exact timeout message, screenshot evidence, console evidence, HAR evidence, and raw response debug. The receipt never labeled that evidence as partial results.
Why it matters
Riddle Proof caught incomplete evidence-status copy in a timeout receipt that otherwise looked healthy: the debug evidence was preserved, but the user was not told it was partial.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The authenticated Playground sync Workflow flow rendered a terminal timeout and preserved screenshot, console, HAR, raw response, and billing evidence, but failed to tell the user that the returned evidence was partial.

Why normal checks missed it: The receipt looked mostly healthy: the route loaded, auth worked, the Workflow request body was correct, Timed Out rendered, the timeout message rendered, artifact sections were populated, and browser health was clean. A shallow timeout check would stop there and miss that the evidence status copy was incomplete.

Proof lesson: Terminal timeout receipts need the same artifact-honesty contract as terminal errors. If screenshots, console logs, or HAR entries survive a timeout, the UI should label them as partial results while keeping success/error states and browser health honest.

Evidence: Initial production job job_d7a29899 loaded /playground/ across desktop, phone, iPad Mini, and iPad, submitted sync:true Workflow requests 4/4 with screenshots, console, and HAR included, and rendered Timed Out, the exact timeout message, screenshot evidence, console evidence, HAR evidence, and raw response debug. The proof failed because partial results available was absent in every viewport. Riddle-site PR #183 added timeout partial-results metadata, computed it from screenshots, console logs, or HAR evidence across sync and async Playground branches, and added static guards. Final production job job_5ef41407 passed 30 checks with Timed Out, the exact timeout message, partial results available, screenshot/console/HAR evidence, raw response debug, 0px overflow, 0 warnings, and 0 fatal console/page errors.

Playground sync terminal error looked successful Riddle artifact
May 18, 2026 · < $0.01 · Riddle site · Playground · result honesty
Catch card

Playground sync terminal error looked successful

Artifact preservation and result honesty are separate contracts.

What Riddle caught
Initial production job job_994d843b loaded /playground/ across desktop, phone, iPad Mini, and iPad, submitted sync:true Script requests 4/4, and rendered the returned screenshot, console, HAR, raw response, and billing evidence, but the Playground labeled the terminal completed_error run Success and omitted the service error message.
Why it matters
Riddle Proof caught a semantic receipt lie in a proof surface that otherwise looked artifact-rich: the debug evidence was there, but the user-facing result said the failed job succeeded.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The authenticated Playground sync Script flow preserved screenshot, console, HAR, raw response, and billing evidence from a terminal completed_error response, but labeled the run Success and hid the service error message.

Why normal checks missed it: The proof surface looked rich: the route loaded, the sync request body was correct, the screenshot appeared, console and HAR evidence expanded, and billing metadata rendered. A shallow artifact check would see plenty of useful evidence and miss that the top-level receipt contradicted the terminal backend status.

Proof lesson: Artifact preservation and result honesty are separate contracts. Proof profiles for sync terminal JSON should assert the visible status, service error copy, partial-results copy, success/error selectors, artifact sections, and browser health together.

Evidence: Initial production job job_994d843b loaded /playground/ across desktop, phone, iPad Mini, and iPad, submitted sync:true Script requests 4/4 with screenshots, console, and HAR included, and rendered the returned screenshot, console log/error, HAR request, raw response, and billing metadata. The proof failed because the Playground showed "Success", omitted "Synthetic v538 sync script failed after collecting partial evidence", omitted "partial results available", rendered no .error-warning, and kept .success-indicator visible. Riddle-site PR #181 derived sync JSON success, timeout, and error state from terminal status, treated completed_error and failed as errors, preserved partial evidence, and added static guard snippets. Final production job job_4bc68ef7 passed 30 checks across desktop, phone, iPad Mini, and iPad with "Error", the exact service message, "partial results available", screenshot/console/HAR evidence, raw response debug, 0px overflow, 0 warnings, and 0 fatal console/page errors.
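One way to picture "derive the receipt from terminal status" is the sketch below. The status names completed_error and failed come from the ledger; `timed_out`, `completed`, the `Receipt` type, and the choice to treat unknown statuses as errors are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# "completed_error" and "failed" are named in the ledger; the rest are assumed.
ERROR_STATUSES = {"completed_error", "failed"}
TIMEOUT_STATUSES = {"timed_out"}
SUCCESS_STATUSES = {"completed"}


@dataclass(frozen=True)
class Receipt:
    label: str
    message: Optional[str]
    show_success_indicator: bool


def derive_receipt(status: str, service_message: Optional[str] = None) -> Receipt:
    """Derive the visible receipt from terminal status alone, never from artifacts."""
    if status in SUCCESS_STATUSES:
        return Receipt("Success", None, True)
    if status in TIMEOUT_STATUSES:
        return Receipt("Timed Out", service_message, False)
    if status in ERROR_STATUSES:
        return Receipt("Error", service_message, False)
    # Assumption: unrecognized statuses are shown as errors, so a rich
    # artifact payload can never promote an unknown outcome to Success.
    return Receipt("Error", service_message or "Unknown job status.", False)


receipt = derive_receipt(
    "completed_error",
    "Synthetic v538 sync script failed after collecting partial evidence",
)
```

Because the function takes no artifact arguments at all, the presence of screenshots or HAR data cannot influence the label, which keeps the two contracts separate.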

Billing history failure looked like no transactions Riddle artifact
May 18, 2026 · < $0.01 · Riddle site · Billing · load recovery
Catch card

Billing history failure looked like no transactions

List-load recovery profiles should prove that a failed optional list is not rendered as an empty list.

What Riddle caught
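Initial corrected production job job_e99911a5 mocked /api/billing/history to return 503 while balance and auto-recharge loaded normally across desktop, phone, iPad Mini, and iPad. The page showed "No transactions yet." instead of "Transaction history unavailable. Please try again."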
Why it matters
Riddle Proof caught an account-state lie in a Billing surface: the page looked usable, but a failed data load was being reported as if the user had no transactions.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The authenticated Billing page handled balance and auto-recharge correctly when transaction history failed, but translated the failed history request into the empty-account state No transactions yet.

Why normal checks missed it: Most of the page looked healthy: the user was signed in, balance rendered, active job copy rendered, purchase controls loaded, auto-recharge loaded, and the history section stayed visible. The bug was a small but important state distinction inside one optional list.

Proof lesson: List-load recovery profiles should prove that a failed optional list is not rendered as an empty list. The contract needs explicit failure copy, empty-state absence, raw backend text absence, and browser/layout health in the same run.

Evidence: Initial corrected production job job_e99911a5 mocked /api/billing/history to return 503 while balance and auto-recharge loaded normally across desktop, phone, iPad Mini, and iPad. The proof failed because "Transaction history unavailable. Please try again." was missing, "No transactions yet." was visible, and .transaction-history .error-message was absent. Riddle-site PR #179 added a transaction-history-specific recovery state and removed the handled warning path. Final production job job_59ebe466 passed 25 checks with one visible history error, the empty state absent, raw backend text absent, expected resource console noise allowed, 0 warnings, and 0px overflow.
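The distinction this entry draws, a failed load versus a genuinely empty list, can be sketched as an explicit load state. The two copy strings are quoted from the evidence; `LoadState`, `HistoryResult`, `load_history`, and `history_copy` are hypothetical names for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

EMPTY_COPY = "No transactions yet."
FAILURE_COPY = "Transaction history unavailable. Please try again."


class LoadState(Enum):
    LOADED = "loaded"
    FAILED = "failed"


@dataclass
class HistoryResult:
    state: LoadState
    transactions: list[str] = field(default_factory=list)


def load_history(status: int, items: list[str]) -> HistoryResult:
    """Keep the failure as its own state instead of collapsing it to an empty list."""
    if status != 200:
        return HistoryResult(LoadState.FAILED)
    return HistoryResult(LoadState.LOADED, list(items))


def history_copy(result: HistoryResult) -> Optional[str]:
    """Return the placeholder copy to render, or None when rows should render."""
    if result.state is LoadState.FAILED:
        return FAILURE_COPY
    if not result.transactions:
        return EMPTY_COPY
    return None


failed_copy = history_copy(load_history(503, []))
```

The bug class appears when the fetch layer returns `[]` on failure; carrying the state alongside the list makes the empty-state branch unreachable for failed loads.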

Dashboard API-key transport failure logged fatal Riddle artifact
May 18, 2026 · < $0.01 · Riddle site · dashboard · network recovery
Catch card

Dashboard API-key transport failure logged fatal

Transport-failure profiles should distinguish expected browser resource noise from application-level fatal logging, then prove the visible recovery state and browser health together.

What Riddle caught
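Initial production job job_e27b8e47 used Riddle Proof abort mocks to make the API-key DELETE fail at the fetch layer across desktop, phone, iPad Mini, and iPad. The UI recovered visibly, but the app still logged an unallowed "Error revoking API key" console error.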
Why it matters
Riddle Proof caught operational-health debt in a transport failure that looked visibly handled: the UI recovered, but browser evidence still showed an app-level fatal error.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The Dashboard API-key revoke flow handled a fetch-level transport failure visibly, but still emitted an app-level fatal console error for the expected recovery path.

Why normal checks missed it: The visible UI looked safe: canceling the confirm dialog preserved the active key, accepting the dialog kept the key active after the failed request, and the user saw Failed to revoke API key. The bug was hidden in browser evidence: the app logged the handled TypeError as fatal while the mocked net::ERR_FAILED resource event was expected.

Proof lesson: Transport-failure profiles should distinguish expected browser resource noise from application-level fatal logging, then prove the visible recovery state and browser health together.

Evidence: Initial production job job_e27b8e47 used Riddle Proof abort mocks to make the API-key DELETE fail at the fetch layer across desktop, phone, iPad Mini, and iPad. The profile passed the cancel/accept dialog counts, preserved the active key row, showed Failed to revoke API key: Failed to fetch, and allowed the expected net::ERR_FAILED resource event, but failed on the unallowed Error revoking API key console error. After Riddle-site PR #177 removed the handled console.error, final production job job_ed5c02fc passed 28 checks with the same visible recovery and clean app/page health.
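The split between expected resource noise and app-level fatal logging might look like the following classifier. It is a sketch under assumptions: the `ConsoleEvent` shape and the "network" versus "app" source taxonomy are invented here, and the sample events only reuse the strings quoted in this entry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsoleEvent:
    level: str   # "error", "warning", or "log"
    source: str  # assumed taxonomy: "network" for resource events, "app" for console.* calls
    text: str


# Resource-level noise the mocked transport failure is expected to produce.
ALLOWED_RESOURCE_NOISE = ("net::ERR_FAILED",)


def unallowed_errors(events: list[ConsoleEvent]) -> list[ConsoleEvent]:
    """Return error events that are not explicitly allowed resource noise."""
    flagged = []
    for event in events:
        if event.level != "error":
            continue
        is_allowed_noise = event.source == "network" and any(
            pattern in event.text for pattern in ALLOWED_RESOURCE_NOISE
        )
        if not is_allowed_noise:
            flagged.append(event)
    return flagged


events = [
    ConsoleEvent("error", "network", "net::ERR_FAILED"),
    ConsoleEvent("error", "app", "Error revoking API key"),
]
flagged = unallowed_errors(events)
```

Scoping the allowlist to the network source matters: an app-level log that merely mentions net::ERR_FAILED would still be flagged.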

Playground Batch discarded secondary error artifacts Riddle artifact
May 18, 2026 · < $0.01 · Riddle site · Playground · artifacts
Catch card

Playground Batch discarded secondary error artifacts

Failure receipts should preserve all useful evidence, not only screenshots.

What Riddle caught
Initial production job job_6c3bcbf9 loaded /playground/ across desktop, phone, iPad Mini, and iPad, submitted Batch /v1/run 4/4 with sync:false, used the custom artifacts_url 4/4, and avoided the guessed /artifacts URL. The returned console.json and network.har artifacts were never fetched or rendered.
Why it matters
Riddle Proof caught evidence loss in a page that otherwise looked like it had handled the failure: the useful debug artifacts existed, but the UI hid them.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The authenticated Playground Batch result rendered a terminal error and no-screenshot guidance, but silently discarded returned console.json and network.har artifacts.

Why normal checks missed it: The route loaded, auth state worked, Batch submitted with sync:false, the service-provided artifacts_url was used, the backend error rendered, and the UI honestly said no screenshots were captured. A shallow check would see an error receipt and stop; the proof required secondary artifacts to be fetched and shown as partial evidence.

Proof lesson: Failure receipts should preserve all useful evidence, not only screenshots. Console and HAR artifacts are partial results too when a terminal job has no images.

Evidence: Initial production job job_6c3bcbf9 loaded /playground/ across desktop, phone, iPad Mini, and iPad, submitted Batch /v1/run 4/4 with sync:false, used the custom artifacts_url 4/4, and avoided the guessed /artifacts URL. The proof still failed because console.json and network.har were returned but never fetched, partial results available was absent, and no secondary evidence rendered. Riddle-site PR #175 changed Batch to fetch console/HAR artifacts from terminal artifact files, count screenshots, console, and HAR as partial evidence, and render the secondary evidence sections. Static Preview job job_be3524fc and final production job job_5b0cd240 passed with console/HAR fetched 4/4, partial results available visible, No screenshots captured still honest, secondary evidence expandable, clean app/page health, warning hygiene, and 0px overflow.

Docs code copy claimed success after clipboard denial Riddle artifact
May 18, 2026 · < $0.01 · Riddle site · Docs · clipboard
Catch card

Docs code copy claimed success after clipboard denial

Clipboard-copy controls need feedback honesty under browser permission restrictions.

What Riddle caught
Initial production job job_bc7d9340 loaded /docs/preview/ across desktop, phone, iPad Mini, and iPad, clicked the first code-block copy button, and saw "Copied!" even though the browser had denied the clipboard write and recorded a page error.
Why it matters
Riddle Proof caught feedback dishonesty in a public agent-facing docs control: the UI claimed success while the browser recorded a failed clipboard write.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The public Preview docs code-copy control visibly claimed Copied! after the browser denied clipboard write permission, while the page recorded an unhandled clipboard permission error.

Why normal checks missed it: The docs route loaded, the code block rendered, required docs links stayed present, and the clicked button changed to Copied!. A shallow visual or smoke check would treat that as success; the proof required browser page health after the interaction.

Proof lesson: Clipboard-copy controls need feedback honesty under browser permission restrictions. A visible success label is not enough when the browser API rejected the write.

Evidence: Initial production job job_bc7d9340 loaded /docs/preview/ across desktop, phone, iPad Mini, and iPad, clicked the first code-block copy button, and saw "Copied!" in every viewport. The proof still failed because page_error_count was 1 and the browser page error was "Failed to execute writeText on Clipboard: Write permission denied". Riddle-site PR #173 moved public copy flows onto a shared helper that reports failure honestly without throwing. Static Preview job job_81b738dd and final production job job_59dc499b passed the same docs copy contract with "Copied!" visible, page_error_count 0, clean fatal console evidence, required docs links preserved, and 0px overflow.

Redeem promo code leaked malformed success body Riddle artifact
May 18, 2026 · < $0.01 · Riddle site · Redeem · error handling
Catch card

Redeem promo code leaked malformed success body

Shared backend contracts need per-surface proof.

What Riddle caught
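Initial production job job_28a9fecb submitted the promo redeem request across desktop, phone, iPad Mini, and iPad with {"promo_code":"RP523-MALFORMED-SUCCESS"}. The page showed "Server error (200)" and raw malformed response text instead of "Failed to redeem promo code. Please try again."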
Why it matters
Riddle Proof caught sibling-surface contract drift: a backend recovery path fixed in Billing still leaked raw response text on public Redeem.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The public Redeem promo-code flow visibly recovered from a malformed 200 response, but showed Server error (200) and raw malformed response text to the user.

Why normal checks missed it: The route loaded, synthetic auth worked, the signed-in form stayed usable, the promo POST fired with the expected uppercase request body, and success UI stayed absent. A shallow check would see an error state and stop; the proof asserted the exact recovery text and raw-body absences.

Proof lesson: Shared backend contracts need per-surface proof. Billing had already been fixed for malformed promo responses, but the sibling Redeem route still leaked raw status/body text until it had its own browser contract.

Evidence: Initial production job job_28a9fecb submitted the promo redeem request across desktop, phone, iPad Mini, and iPad with {"promo_code":"RP523-MALFORMED-SUCCESS"}. The page preserved signed-in Redeem state, but failed because "Failed to redeem promo code. Please try again." was absent while "Server error (200)" and "not valid redeem success json" appeared in all four viewports. Riddle-site PR #171 changed Redeem malformed promo responses to a generic fallback and added static guards. Preview job job_00373d49 and final production job job_3210c98a passed with generic recovery visible, raw/status/parser/object/success text absent, preserved signed-in form state, clean app/page health, and 0px overflow.

Billing promo code leaked malformed success body Riddle artifact
May 17, 2026 · < $0.01 · Riddle site · Billing · error handling
Catch card

Billing promo code leaked malformed success body

Handled action recovery is a text contract, not just the presence of an error box.

What Riddle caught
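Initial production job job_794020bd submitted the promo redeem request across desktop, phone, iPad Mini, and iPad with {"promo_code":"RP521-MALFORMED-SUCCESS"}. The page showed "Server error (200)" and raw malformed response text instead of "Failed to redeem promo code. Please try again."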
Why it matters
Riddle Proof caught user-visible raw response leakage in a Billing action that otherwise looked like it had handled the failure.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The Billing promo-code redeem flow visibly recovered from a malformed 200 response, but showed Server error (200) and raw malformed response text to the user.

Why normal checks missed it: The Billing route loaded, authentication worked, the promo POST fired with the expected request body, and the surrounding balance, transaction history, and auto-recharge state stayed visible. A shallow check would see an error state and stop; the proof asserted the exact recovery text and raw-body absences.

Proof lesson: Handled action recovery is a text contract, not just the presence of an error box. Successful HTTP status with malformed body should produce generic recovery copy and never leak raw response text.

Evidence: Initial production job job_794020bd submitted the promo redeem request across desktop, phone, iPad Mini, and iPad with {"promo_code":"RP521-MALFORMED-SUCCESS"}. The page preserved Billing state, but failed because "Failed to redeem promo code. Please try again." was absent while "Server error (200)" and "not valid promo success json" appeared in all four viewports. Riddle-site PR #169 changed malformed promo responses to a generic fallback and added static guards. Preview job job_bfc43eef and final production job job_616c2533 passed with generic recovery visible, raw/status/parser/object/success text absent, preserved Billing state, clean app/page health, and 0px overflow.

Dashboard API key create logged handled parser failure Riddle artifact
May 17, 2026 · < $0.01 · Riddle site · Dashboard · console health
Catch card

Dashboard API key create logged handled parser failure

Handled action failures need browser-health proof even when the visible fallback is correct.

What Riddle caught
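Initial production job job_4a3386ea submitted the API-key create request across desktop, phone, iPad Mini, and iPad with the expected request body. The visible fallback was correct, but "Error creating API key: SyntaxError" appeared as a fatal console error.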
Why it matters
Riddle Proof caught hidden browser-health debt in a Dashboard action that looked correctly handled to the user.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The Dashboard API-key create flow visibly recovered from a malformed 200 response, but still logged the JSON parser failure as a fatal browser console error.

Why normal checks missed it: The Dashboard route loaded, surrounding account data stayed visible, the existing API key remained active, the create form showed a generic failure, and the success modal stayed closed. The failure only appeared because the proof treated browser console health as part of the handled action contract.

Proof lesson: Handled action failures need browser-health proof even when the visible fallback is correct. A 200 response with malformed body should be treated like a recovery path, not a fatal app error.

Evidence: Initial production job job_4a3386ea submitted the API-key create request across desktop, phone, iPad Mini, and iPad with the expected request body. The page showed Failed to create API key, kept the existing key active, avoided parser text and the success modal, but failed because Error creating API key: SyntaxError appeared as a fatal console error. Riddle-site PR #167 removed that handled console.error and added a static guard. Preview job job_e7070e36 and final production job job_55ca9d1d passed with one recovery error, no modal, clean browser evidence, and 0px overflow.

Playground optional artifacts leaked browser warnings Riddle artifact
May 17, 2026 · < $0.01 · Riddle site · Playground · artifact handling
Catch card

Playground optional artifacts leaked browser warnings

Optional evidence failures should degrade silently.

What Riddle caught
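Initial production job job_4fb7aedd loaded a failed Script result with a valid screenshot artifact and intentionally malformed optional secondary artifacts. The failed receipt stayed reviewable, but no_console_warnings caught a browser warning while fetching console.json.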
Why it matters
Riddle Proof caught an evidence-product bug: the failed result looked reviewable and healthy, but optional artifact parsing still leaked browser warning noise.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: A failed async Script result preserved screenshot evidence and rendered the correct error state, but malformed optional console.json leaked a browser console warning from the Playground artifact loader.

Why normal checks missed it: The page was usable, the route stayed stable, network and layout checks passed, and the failure receipt looked reviewable. The issue only appeared because the proof treated browser-health signals as part of the artifact contract.

Proof lesson: Optional evidence failures should degrade silently. Evidence UIs need browser-health checks, not screenshot-only review.

Evidence: Initial production job job_4fb7aedd loaded a failed Script result with a valid screenshot artifact and intentionally malformed optional secondary artifacts. The UI kept the failed receipt reviewable, but no_console_warnings caught a browser warning while fetching console.json. Riddle-site PR #160 made optional secondary artifact parsing silent and guarded it with static checks. Preview job job_7566934e and final production job job_a4ba8716 passed with the failed result still visible, screenshot evidence preserved, 0px overflow, 0 page errors, 0 fatal console errors, and 0 warnings.

Playground partial results were screenshot-biased Riddle artifact
May 17, 2026 · < $0.01 · Riddle site · Playground · artifact handling
Catch card

Playground partial results were screenshot-biased

Evidence products must avoid screenshot bias.

What Riddle caught
Initial production job job_f8acfe7d loaded a failed Script result with zero screenshots and valid console/HAR artifacts. The secondary evidence rendered, but the UI omitted "partial results available".
Why it matters
Riddle Proof caught screenshot bias in an evidence UI by proving that console and HAR artifacts need the same recovery weight as screenshots.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: A failed Script result with no screenshots but valid console and HAR artifacts rendered the secondary evidence, but omitted the partial results available receipt because the product counted only screenshots as partial evidence.

Why normal checks missed it: The route, failed state, console section, HAR section, layout, browser errors, and warning checks all passed. The semantic receipt check caught that console and HAR evidence were not treated as first-class partial results.

Proof lesson: Evidence products must avoid screenshot bias. Console logs and HAR records are first-class recovery artifacts, especially when screenshots are missing.

Evidence: Initial production job job_f8acfe7d loaded a failed Script result with zero screenshots and valid console/HAR artifacts. The useful secondary evidence rendered, but the UI omitted partial results available. Riddle-site PR #161 changed partial-result detection to count console and HAR entries, then added static guardrails against screenshot-only checks. Preview job job_1fe8aa7b and final production job job_adba0f31 passed with console/HAR evidence visible, partial results available rendered, no empty secondary sections, 0px overflow, 0 page errors, 0 fatal console errors, and 0 warnings.
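Counting every evidence type when deciding whether to show the partial-results receipt could be sketched like this. The `Evidence` container and `has_partial_results` are hypothetical; the sketch only encodes the rule stated above, that console and HAR entries count alongside screenshots.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    """Hypothetical container for what a terminal job returned."""
    screenshots: list[str] = field(default_factory=list)
    console_entries: list[str] = field(default_factory=list)
    har_entries: list[str] = field(default_factory=list)


def has_partial_results(run_failed: bool, evidence: Evidence) -> bool:
    """True when a failed or timed-out run still returned any evidence."""
    if not run_failed:
        return False
    # Screenshot-biased version of this check: bool(evidence.screenshots).
    return any(
        len(items) > 0
        for items in (
            evidence.screenshots,
            evidence.console_entries,
            evidence.har_entries,
        )
    )


no_screenshot_run = Evidence(
    console_entries=["console error"], har_entries=["GET /"]
)
show_partial = has_partial_results(True, no_screenshot_run)
```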

Dashboard API key modal copy crashed on clipboard denial Riddle artifact
May 17, 2026 · < $0.01 · Riddle site · Dashboard · security controls
Catch card

Dashboard API key modal copy crashed on clipboard denial

Credential controls need browser-permission-aware interaction proof.

What Riddle caught
Initial production job job_6599260b created the API key across desktop, phone, iPad Mini, and iPad, rendered API Key Created!, the one-time secret, and the curl usage snippet, then failed because modal Copy threw a clipboard permission page error and Copied never appeared.
Why it matters
Riddle Proof caught a real browser-permission failure in a credential-copy flow that looked healthy until the proof exercised the modal interaction.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The Dashboard one-time API-key modal called the browser clipboard API directly. In Riddle's browser run, clipboard writes were denied, the page threw a fatal browser error, and the modal never reached the expected Copied state.

Why normal checks missed it: The Dashboard loaded, mocked account data rendered, the API-key POST succeeded, the one-time key modal appeared, and the curl snippet was visible. The failure only appeared when the proof clicked Copy inside the browser permission model and treated modal feedback plus page errors as part of the contract.

Proof lesson: Credential controls need browser-permission-aware interaction proof. If the product asks users to copy a secret, the proof should click that exact control, require visible feedback, and fail on page errors across real browser environments.

Evidence: Initial production job job_6599260b created the API key across desktop, phone, iPad Mini, and iPad, rendered API Key Created!, the one-time secret, and the curl usage snippet, then failed because modal Copy threw a clipboard permission page error and Copied never appeared. Riddle-site PR #155 added guarded clipboard handling, textarea fallback, visible Copied state, manual-copy error recovery, and static guardrails. Preview job job_8cf778ad and final production job job_28c9b5b7 passed with exact create request-body receipts, visible Copied feedback, 0px overflow, 0 page errors, 0 fatal console errors, and 0 warnings.
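Guarded clipboard handling with a fallback can be modeled abstractly as trying each copy strategy in turn and reporting the outcome instead of throwing. In the browser the strategies would be the async clipboard API and a textarea-based fallback; in this sketch they are plain Python callables standing in for those, and the manual-copy message is invented for illustration.

```python
from typing import Callable, Sequence

CopyWriter = Callable[[str], None]


def copy_with_fallback(text: str, writers: Sequence[CopyWriter]) -> bool:
    """Try each writer in order; return True on the first success, never raise."""
    for write in writers:
        try:
            write(text)
            return True
        except Exception:
            # A denied or unavailable strategy is an expected outcome here,
            # so it is swallowed and the next strategy is tried.
            continue
    return False


def copy_feedback(succeeded: bool) -> str:
    """Feedback must match the real outcome; the failure copy is hypothetical."""
    return "Copied" if succeeded else "Copy failed. Select the key and copy it manually."


def denied_clipboard(_text: str) -> None:
    raise PermissionError("Write permission denied")


captured: list[str] = []
ok = copy_with_fallback("sk_example", [denied_clipboard, captured.append])
all_denied = copy_with_fallback("sk_example", [denied_clipboard])
```

The helper returns a boolean rather than a label so each surface can choose its own copy while sharing the same honesty guarantee.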

Dashboard MCP token copy crashed on clipboard denial Riddle artifact
May 17, 2026 · < $0.01 · Riddle site · Dashboard · security controls
Catch card

Dashboard MCP token copy crashed on clipboard denial

Security-sensitive controls need interaction proof, not just visual proof.

What Riddle caught
Initial production job job_0c54da5c proved the authenticated Dashboard loaded across desktop, phone, iPad Mini, and iPad, then failed because Copy token threw a clipboard permission page error.
Why it matters
This is authenticated product proof material: Riddle Proof caught a real browser-permission failure in a token-control flow that looked healthy until the proof exercised the interaction.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The Dashboard MCP Login Token card called the browser clipboard API directly. In Riddle's browser run, clipboard writes were denied, the page threw a fatal browser error, and the token copy flow never reached the expected masked Copied state.

Why normal checks missed it: The Dashboard loaded, mocked account data rendered, the token stayed masked, and the buttons looked right. The failure only appeared when the proof clicked Copy token inside the browser permission model and treated page errors plus token-redaction state as part of the contract.

Proof lesson: Security-sensitive controls need interaction proof, not just visual proof. Copy, reveal, hide, and redaction states should be checked as a state machine across browser environments.

Evidence: Initial production job job_0c54da5c proved the authenticated Dashboard loaded across desktop, phone, iPad Mini, and iPad, then failed because Copy token threw a clipboard permission page error. Riddle-site PR #153 added a guarded clipboard helper, textarea fallback, visible manual-copy guidance, static guardrails, and copy-state reset. Preview job job_27b77be1 and final production job job_49453ba2 passed with the token masked before copy, masked after copy, revealed only after explicit Reveal, hidden again after Hide, 0px overflow, 0 page errors, 0 fatal console errors, and 0 warnings.
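Checking copy, reveal, hide, and redaction as a state machine can be illustrated with a small model of the token card. `TokenCard`, its method names, the mask string, and the copy-state labels are all assumptions for this sketch; the invariant it encodes, that copying never reveals the token, comes from the final passing run described above.

```python
from dataclasses import dataclass
from typing import Callable

MASK = "*" * 12  # fixed length, so the mask does not leak the token length


@dataclass
class TokenCard:
    """Hypothetical model of a masked token card."""
    token: str
    revealed: bool = False
    copy_state: str = "idle"  # assumed labels: "idle", "copied", "manual"

    def display(self) -> str:
        return self.token if self.revealed else MASK

    def reveal(self) -> None:
        self.revealed = True

    def hide(self) -> None:
        self.revealed = False

    def copy(self, write: Callable[[str], None]) -> None:
        # Copying changes only copy_state; it must never flip `revealed`.
        try:
            write(self.token)
            self.copy_state = "copied"
        except Exception:
            self.copy_state = "manual"


card = TokenCard("tok_example")
before_copy = card.display()
card.copy(lambda text: None)
after_copy = card.display()
card.reveal()
after_reveal = card.display()
card.hide()
after_hide = card.display()
```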

Docs Markdown leaked code entities Riddle artifact
May 17, 2026 · < $0.01 · Riddle site · agent markdown · docs copy
Catch card

Docs Markdown leaked code entities

Agent-readable docs need their own proof surface.

What Riddle caught
Initial production job job_ced017b6 proved the rendered Riddle Proof docs were healthy but the markdown export still leaked escaped code entities.
Why it matters
This is a public agent-surface catch: Riddle Proof caught docs drift that a human browser review could miss because the bug lived in the markdown contract agents actually read.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The agent-facing Riddle Proof docs markdown export leaked HTML entity text into a code example, so the rendered docs were readable but the raw markdown that agents consume carried a stale escaped-code contract.

Why normal checks missed it: The human docs page loaded, the visible section looked correct, and normal route checks would not read the raw markdown body as an agent would. The failure only surfaced when the proof treated rendered docs and /docs/riddle-proof/markdown.md as separate public contracts.

Proof lesson: Agent-readable docs need their own proof surface. A rendered page can be correct while the markdown export that models and CLIs rely on is stale, escaped, or semantically different.

Evidence: Initial production job job_ced017b6 proved the rendered Riddle Proof docs were healthy but the markdown export still leaked escaped code entities. Riddle-site PR #151 fixed the markdown output and static guard. Static Preview job job_8a7bd738 and final production job job_764fcc77 passed with the agent markdown clean, route loaded, 0px overflow, 0 fatal errors, and 0 warnings.
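A static guard for the markdown export might scan fenced code blocks for HTML entity text. The sketch below is an assumption about how such a guard could work, not the guard from PR #151; the entity list and the sample document are illustrative, since the entry does not say which entities leaked.

```python
import re

FENCE = "`" * 3  # built programmatically so this example can itself be fenced

# Match the body of each fenced block: opening fence line, lazy body, closing fence.
FENCE_RE = re.compile(rf"^{FENCE}[^\n]*\n(.*?)^{FENCE}", re.DOTALL | re.MULTILINE)

# A small, assumed set of entities that indicate escaped code.
ENTITY_RE = re.compile(r"&(?:lt|gt|amp|quot|#39|#x27);")


def escaped_entities_in_code(markdown: str) -> list[str]:
    """Return HTML entities found inside fenced code blocks only."""
    found: list[str] = []
    for match in FENCE_RE.finditer(markdown):
        found.extend(ENTITY_RE.findall(match.group(1)))
    return found


leaky = f"Intro &amp; prose\n{FENCE}js\nif (a &lt; b &amp;&amp; c) run();\n{FENCE}\n"
clean = f"Intro\n{FENCE}js\nif (a < b && c) run();\n{FENCE}\n"

leaks = escaped_entities_in_code(leaky)
```

Entities in prose are ignored on purpose: only code that an agent would copy verbatim is treated as part of the contract here.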

Serverless page taught stale screenshot polling Riddle artifact
May 17, 2026 · < $0.01 · Riddle site · serverless docs · screenshot API
Catch card

Serverless page taught stale screenshot polling

API education pages need contract proof, not just route proof.

What Riddle caught
Initial production job job_5cf3a0a0 proved the Serverless page was alive but still described stale screenshot polling behavior instead of the current simple screenshot response.
Why it matters
This is public docs proof material: Riddle Proof caught stale integration guidance before it could keep teaching users and agents the wrong screenshot API shape.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The Serverless docs still taught an old screenshot polling response shape, even though the current API returns immediate screenshot evidence for the simple screenshot flow.

Why normal checks missed it: The page loaded, the surrounding copy looked plausible, and stale API snippets can survive normal visual checks. The proof failed because it searched for the exact current contract and rejected the older polling-oriented language.

Proof lesson: API education pages need contract proof, not just route proof. A stale snippet can be more damaging than a missing page because it teaches agents and users the wrong integration path.

Evidence: Initial production job job_5cf3a0a0 proved the Serverless page was alive but still described stale screenshot polling behavior instead of the current simple screenshot response. Riddle-site PR #150 updated the docs and static guard. Static Preview job job_fa59cd7a and final production job job_42300b18 passed with the current screenshot response language visible, stale polling copy absent, 0px overflow, 0 fatal errors, and 0 warnings.
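Contract proof for a docs page reduces to two lists: phrases that must appear and stale phrases that must not. The sketch below shows that shape. The `CopyContract` type is hypothetical, and the example phrases are placeholders, because the entry does not quote the actual current or stale copy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CopyContract:
    required: tuple[str, ...]
    forbidden: tuple[str, ...]


def check_copy(page_text: str, contract: CopyContract) -> list[str]:
    """Return human-readable failures; an empty list means the contract holds."""
    failures = []
    for phrase in contract.required:
        if phrase not in page_text:
            failures.append(f"missing required copy: {phrase}")
    for phrase in contract.forbidden:
        if phrase in page_text:
            failures.append(f"found forbidden copy: {phrase}")
    return failures


# Placeholder phrases for illustration only.
contract = CopyContract(
    required=("returns the screenshot in the response",),
    forbidden=("poll the job until",),
)

stale_page = "Submit a screenshot request, then poll the job until it completes."
current_page = "The simple flow returns the screenshot in the response."

stale_failures = check_copy(stale_page, contract)
current_failures = check_copy(current_page, contract)
```

A route check would pass both pages; only the forbidden-phrase list makes the stale one fail.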

Homepage taught stale screenshot JSON
May 17, 2026 · < $0.01 · Riddle site · homepage · screenshot API
Catch card

Homepage taught stale screenshot JSON

Homepage examples are integration docs.

What Riddle caught
Initial production job job_0a9292ea proved the homepage loaded cleanly but still carried stale screenshot JSON copy.
Why it matters
This is public conversion-surface proof: Riddle Proof caught stale API teaching copy on the homepage, where wrong examples shape a buyer or agent before they reach deeper docs.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The homepage still advertised a stale screenshot JSON response shape, so the top public entry point taught a contract that no longer matched the simple screenshot API.

Why normal checks missed it: The homepage was visually polished and route-healthy. A normal screenshot review would see the marketing section, not whether the JSON example matched the actual API response contract.

Proof lesson: Homepage examples are integration docs. The public first impression should be checked for exact API contract language, forbidden stale snippets, layout health, and browser noise together.

Evidence: Initial production job job_0a9292ea proved the homepage loaded cleanly but still carried stale screenshot JSON copy. Riddle-site PR #149 updated the homepage example and static guard. Static Preview job job_c1fb675b and final production job job_6d79c361 passed with the current simple screenshot response shape visible, stale JSON copy absent, 0px overflow, 0 fatal errors, and 0 warnings.

Preview guide taught stale URL shape
May 17, 2026 · < $0.01 · Riddle site · Preview · docs copy
Catch card

Preview guide taught stale URL shape

Preview docs are part of the deployment surface.

What Riddle caught
Initial production job job_e8beb4d5 proved the Preview Tools guide still contained stale Preview URL shape copy.
Why it matters
This is public integration-docs proof: Riddle Proof caught a stale Preview URL contract in the guide that agents use to create before/after proof environments.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The Preview Tools guide still taught the old subdomain-style Preview URL and response field names instead of the current /s/pv_* preview_url contract.

Why normal checks missed it: The guide route loaded and the examples looked credible. The failure was a stale integration contract embedded in code snippets, so it needed exact text checks for both required current snippets and forbidden legacy snippets.

Proof lesson: Preview docs are part of the deployment surface. If the docs teach stale URL shapes, agents can deploy correctly and then wire proof runs to the wrong URL field.

Evidence: Initial production job job_e8beb4d5 proved the Preview Tools guide still contained stale Preview URL shape copy. Riddle-site PR #148 updated the guide to use preview_url and https://preview.riddledc.com/s/pv_a1b2c3d4/. Static Preview job job_f1902246 and final production job job_ebf7a878 passed with the current Preview URL contract visible, legacy subdomain snippets absent, 0px overflow, 0 fatal errors, and 0 warnings.
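
The exact text checks in this catch (required current snippets plus forbidden legacy snippets) can be sketched as a small contract function. This is a minimal Python sketch under stated assumptions, not the Riddle Proof profile format: `check_text_contract` and `ContractResult` are hypothetical names, and the legacy subdomain string used in the usage note is an invented stand-in because the diary does not quote the stale snippet.

```python
from dataclasses import dataclass, field


@dataclass
class ContractResult:
    """Outcome of a required/forbidden snippet check (hypothetical shape)."""
    missing_required: list[str] = field(default_factory=list)
    present_forbidden: list[str] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        # The contract passes only when nothing required is missing
        # and nothing forbidden is still on the page.
        return not self.missing_required and not self.present_forbidden


def check_text_contract(
    page_text: str, required: list[str], forbidden: list[str]
) -> ContractResult:
    """Report required snippets that are absent and forbidden ones that remain."""
    return ContractResult(
        missing_required=[s for s in required if s not in page_text],
        present_forbidden=[s for s in forbidden if s in page_text],
    )
```

A guide that still shows only a subdomain-style URL would fail on both sides at once: `preview_url` and `/s/pv_` would be reported as missing, and the legacy host string would be reported as present. Checking both lists together is the point; a page can gain the new snippet while still carrying the old one.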

Playground async Workflow ignored artifact URLs
May 17, 2026 · < $0.01 · Riddle site · Playground · artifact evidence
Catch card

Playground async Workflow ignored artifact URLs

Agent-facing evidence UIs need artifact-contract checks per mode.

What Riddle caught
Initial production job job_d040676d proved the async Workflow flow across desktop, phone, iPad Mini, and iPad: /v1/run returned job_rp486_workflow_har_artifacts plus an explicit artifacts_url, but the UI never hit that custom URL. Instead it hit the forbidden guessed /artifacts URL 4/4 times and rendered "No screenshots captured" with no console or Network HAR evidence.
Why it matters
This is proof-driven-product material: Riddle Proof caught a mode-specific missing evidence branch inside Riddle Playground after the Workflow job itself already looked successful.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The public Playground accepted an async Workflow job response with a service-returned artifacts_url, but still polled the guessed default /artifacts URL and rendered no Workflow screenshot, console, or HAR evidence.

Why normal checks missed it: The visible job result looked superficially successful: the Workflow request submitted, the job ID appeared, and the page showed a Success result. The failure lived in the evidence branch: the UI ignored the exact artifact collection URL returned by the service, so the proof artifacts existed but never reached the review surface.

Proof lesson: Agent-facing evidence UIs need artifact-contract checks per mode. Script, Workflow, and Batch can share a product surface while accidentally carrying different artifact URL and payload assumptions.

Evidence: Initial production job job_d040676d proved the async Workflow flow across desktop, phone, iPad Mini, and iPad: /v1/run returned job_rp486_workflow_har_artifacts plus an explicit artifacts_url, but the UI never hit that custom URL. Instead it hit the forbidden guessed /artifacts URL 4/4 times and rendered "No screenshots captured" with no console or Network HAR evidence. Riddle-site PR #146 added shared artifact URL resolution, hydrated async Workflow screenshots from files[] or artifacts[], fetched Workflow console.json and network.har, and applied the same explicit-URL/files screenshot handling to Batch async. Static Preview job job_4c04dfe6 and final production job job_3fc1124e passed with screenshot label rp486-workflow-before, console log "rp486 workflow har console log", HAR request rp486-workflow-resource, custom artifact URL hits 4/4, forbidden fallback hits 0, 0px overflow, 0 fatal errors, and 0 warnings. Follow-up production job job_2b6ea363 proved the adjacent Batch artifact path.
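
The shared artifact URL resolution described in this ledger can be sketched as "prefer the service-returned URL, guess only as a last resort". This is a hedged Python illustration, not the Playground code: `resolve_artifacts_url`, the `job_id` field name, and the fallback path `/v1/jobs/<id>/artifacts` are assumptions made for the example; only `artifacts_url` and the idea of a guessed /artifacts URL come from the catch.

```python
from urllib.parse import urljoin


def resolve_artifacts_url(run_response: dict, api_base: str) -> str:
    """Return the URL a UI should poll for artifacts.

    An explicit `artifacts_url` from the service always wins. The guessed
    default is used only when the response does not carry one.
    """
    explicit = run_response.get("artifacts_url")
    if explicit:
        # urljoin leaves absolute URLs untouched and resolves relative ones.
        return urljoin(api_base, explicit)
    # Hypothetical default path and field name, used only as a fallback.
    return urljoin(api_base, f"/v1/jobs/{run_response['job_id']}/artifacts")
```

Routing Script, Workflow, and Batch through one resolver like this is what keeps a single mode from silently falling back to the guessed URL.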

Playground async Script ignored HAR artifacts
May 17, 2026 · < $0.01 · Riddle site · Playground · artifact evidence
Catch card

Playground async Script ignored HAR artifacts

Artifact-contract proof needs to check every evidence family, not just the first screenshot or console log.

What Riddle caught
Initial production job job_abb56468 proved the async Script flow across desktop, phone, iPad Mini, and iPad: /v1/run returned job_rp484_async_har_artifacts plus an explicit artifacts_url, the UI polled that custom endpoint 4/4 times, never hit the forbidden default /artifacts URL, rendered screenshot label rp484-har-before, and fetched console output "rp484 async har console log". The run still failed because the Network HAR section and the HAR request rp484-har-resource never appeared.
Why it matters
This is proof-driven-product material: Riddle Proof caught a missing evidence branch inside Riddle Playground after the async run already looked successful.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The public Playground rendered screenshot and console artifacts from an async Script files[] response, but ignored the network.har artifact in the same payload.

Why normal checks missed it: The visible run looked mostly healthy: the async job completed, the custom artifacts_url was used instead of a guessed /artifacts URL, the screenshot label appeared, and console output rendered. The missing branch was narrower: HAR evidence existed in the artifact payload but never became a Network HAR review section.

Proof lesson: Artifact-contract proof needs to check every evidence family, not just the first screenshot or console log. A result can look successful while one artifact type silently disappears from the review surface.

Evidence: Initial production job job_abb56468 proved the async Script flow across desktop, phone, iPad Mini, and iPad: /v1/run returned job_rp484_async_har_artifacts plus an explicit artifacts_url, the UI polled that custom endpoint 4/4 times, never hit the forbidden default /artifacts URL, rendered screenshot label rp484-har-before, and fetched console output "rp484 async har console log". It failed because the Network HAR section and the HAR request rp484-har-resource never appeared, and the same long async run exposed five unused CSS preload warnings. Riddle-site PR #144 hydrated async Script network.har artifacts from url or download_url, reused download_url support for console artifacts, and disabled Playground body-link prefetching. Static Preview job job_b69d0aa4 and final production job job_ad6fa952 passed with screenshot, console, HAR, 0px overflow, 0 fatal errors, and 0 warnings.
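
Checking every evidence family can be sketched as a grouping step over the artifact payload followed by a "which families are empty" check. This is a minimal Python sketch with assumptions: the entry field `name`, the image extensions, and the function names are hypothetical; `files[]`, `artifacts[]`, `url`, `download_url`, `console.json`, and `network.har` are the names mentioned in these ledgers.

```python
def evidence_families(payload: dict) -> dict[str, list[str]]:
    """Group artifact links by evidence family: screenshot, console, har."""
    entries = payload.get("files") or payload.get("artifacts") or []
    families: dict[str, list[str]] = {"screenshot": [], "console": [], "har": []}
    for entry in entries:
        name = entry.get("name", "")  # hypothetical field name
        link = entry.get("url") or entry.get("download_url")
        if not link:
            continue  # an entry without a link cannot be reviewed
        if name.endswith(".har"):
            families["har"].append(link)
        elif name == "console.json":
            families["console"].append(link)
        elif name.endswith((".png", ".jpg", ".jpeg")):
            families["screenshot"].append(link)
    return families


def missing_families(
    payload: dict, required: tuple[str, ...] = ("screenshot", "console", "har")
) -> list[str]:
    """Return the required evidence families that have no usable artifact."""
    found = evidence_families(payload)
    return [family for family in required if not found.get(family)]
```

A payload carrying only a screenshot and console.json would report `har` as missing, which is the shape of this catch: two families rendered, one silently dropped. The same check belongs on the rendered review surface, since the payload here did contain the HAR.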

Playground hid a single partial screenshot label
May 17, 2026 · < $0.01 · Riddle site · Playground · artifact evidence
Catch card

Playground hid a single partial screenshot label

A proof surface is not done when it merely stores evidence.

What Riddle caught
Initial production job job_188e7a69 passed the explicit artifacts_url, files[], download_url, completed_error, partial-results, console, no-fallback, overflow, fatal-console, and warning checks across desktop, phone, iPad Mini, and iPad, but failed because .screenshots-section did not visibly contain rp482-explicit-before-error.
Why it matters
This is proof-driven-product material: Riddle Proof caught an evidence-reviewability bug inside Riddle Playground itself after the underlying async artifact contract already worked.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The public Playground preserved a partial screenshot from an async Script run that ended completed_error, but hid the screenshot name when there was only one screenshot.

Why normal checks missed it: The hard artifact contract worked: /v1/run returned an explicit artifacts_url, the UI used that custom endpoint instead of a guessed /artifacts path, accepted files[] with screenshot download_url, showed the structured error, preserved partial results, fetched console output, and stayed clean on overflow, fatal errors, and warnings. The remaining failure was evidence reviewability: the screenshot existed, but its visible artifact label was missing.

Proof lesson: A proof surface is not done when it merely stores evidence. Failed or partial agent runs need reviewable artifact labels so a reader can connect screenshots, receipts, and console evidence without guessing.

Evidence: Initial production job job_188e7a69 passed the explicit artifacts_url, files[], download_url, completed_error, partial-results, console, no-fallback, overflow, fatal-console, and warning checks across desktop, phone, iPad Mini, and iPad, but failed because .screenshots-section did not visibly contain rp482-explicit-before-error. Riddle-site PR #142 always renders Playground screenshot names. Static Preview job job_ba0cfaf8 and final production job job_8d08662d passed 24 checks with the single partial screenshot label visible.

OpenClaw Moltbook article was referenced but unpublished
May 17, 2026 · < $0.01 · Riddle site · OpenClaw · blog route
Catch card

OpenClaw Moltbook article was referenced but unpublished

Public proof stories need route, markdown, sitemap, and placeholder checks together.

What Riddle caught
Production jobs job_9e485320 and job_15a4c2d6 failed /blog/openclaw-moltbook across desktop, phone, iPad Mini, and iPad: the route was missing, .blog-post was absent, expected article text was missing, and route/fatal evidence was captured.
Why it matters
This is a public proof-surface catch: Riddle Proof found a broken story route and draft-copy debt in the article explaining Riddle Proof/OpenClaw work, then proved the Preview and production fix with durable artifacts.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: Riddle public blog and agent-facing surfaces referred to the OpenClaw/Moltbook article, but /blog/openclaw-moltbook returned 404 and the unpublished source still carried draft placeholder copy.

Why normal checks missed it: The surrounding blog index, footer, sitemap, and existing article pages were healthy. The issue only surfaced when the proof treated the specific article route, its raw markdown export, and finished-copy markers as one public contract across the viewport matrix.

Proof lesson: Public proof stories need route, markdown, sitemap, and placeholder checks together. A referenced article is not real proof material until the rendered route and raw agent-facing markdown both load and carry finished copy.

Evidence: Production jobs job_9e485320 and job_15a4c2d6 failed /blog/openclaw-moltbook across desktop, phone, iPad Mini, and iPad: the route was missing, .blog-post was absent, expected article text was missing, and route/fatal evidence was captured. Riddle-site PR #140 published the route, generated /blog/openclaw-moltbook/markdown.md, added blog index and sitemap coverage, and replaced placeholder sections with finished copy. Preview job job_83a8fdde then caught a rendered-vs-markdown heading mismatch before ship; fixed Preview job job_561a87ac passed; Amplify job 122 deployed commit ee7f657; final production job job_a9b4b56e passed 14 checks with route 200, raw markdown 200, 0px overflow, 0 fatal errors, and 0 warnings.
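
The rendered-vs-markdown heading mismatch that the Preview job caught can be illustrated by extracting headings from both surfaces and diffing them. This is a rough Python sketch using only the standard library; the function names are hypothetical and it is not how Riddle Proof performs the comparison. It ignores Markdown edge cases such as `#` lines inside fenced code blocks.

```python
import re
from html.parser import HTMLParser

HEADING_TAG = re.compile(r"h[1-6]")


class HeadingCollector(HTMLParser):
    """Collect the text of h1 to h6 elements from rendered HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.headings: list[str] = []
        self._in_heading = False
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if HEADING_TAG.fullmatch(tag):
            self._in_heading = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_heading:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._in_heading and HEADING_TAG.fullmatch(tag):
            # Collapse internal whitespace so layout differences do not matter.
            self.headings.append(" ".join("".join(self._buffer).split()))
            self._in_heading = False


def html_headings(html: str) -> list[str]:
    collector = HeadingCollector()
    collector.feed(html)
    return collector.headings


def markdown_headings(markdown: str) -> list[str]:
    """Return ATX-style headings (lines starting with one to six # characters)."""
    return re.findall(r"^#{1,6}[ \t]+(.+?)[ \t]*$", markdown, flags=re.MULTILINE)


def heading_mismatch(html: str, markdown: str) -> dict[str, list[str]]:
    rendered, raw = html_headings(html), markdown_headings(markdown)
    return {
        "only_rendered": [h for h in rendered if h not in raw],
        "only_markdown": [h for h in raw if h not in rendered],
    }
```

Two empty lists mean the human page and the agent-facing markdown agree on structure; anything else is worth reading before ship.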

Good Catch Diary preloaded noisy route assets
May 17, 2026 · < $0.01 · Riddle site · Good Catch Diary · warning hygiene
Catch card

Good Catch Diary preloaded noisy route assets

Public proof-storytelling pages should be quiet enough to inspect.

What Riddle caught
Initial production job job_3034ef9f proved /proof/good-catches/ had 39 cards, healthy screenshots, healthy evidence anchors, clean llms/docs checks, and 0 fatal errors, but failed no_console_warnings with four unused Next CSS preload warnings.
Why it matters
This is a public proof-surface catch: Riddle Proof found browser warning debt on the page that sells Riddle Proof catches, then proved the fix through Preview and production with durable artifacts.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The human-facing Good Catch Diary rendered the right stories and artifact links, but the browser console still emitted unused CSS preload warnings from prefetched internal routes.

Why normal checks missed it: The route loaded cleanly, all 39 catch cards rendered, the newest Profile Warnings story was visible, 39 screenshot artifacts and 39 evidence links were healthy, /llms.txt and Riddle Proof markdown were healthy, overflow stayed at 0px, and fatal console/page errors were clean. The failure lived in warning-level browser noise that only the no_console_warnings contract treats as product evidence.

Proof lesson: Public proof-storytelling pages should be quiet enough to inspect. Warning hygiene catches performance and browser-noise debt on pages that otherwise look correct, and it keeps later proof runs from normalizing noisy evidence.

Evidence: Initial production job job_3034ef9f proved /proof/good-catches/ had 39 cards, healthy screenshots, healthy evidence anchors, clean llms/docs checks, and 0 fatal errors, but failed no_console_warnings with four unused Next CSS preload warnings. Riddle-site PR #133 disabled route prefetch on the diary internal links. Static Preview job job_f6aebd25 and final production job job_f448859c passed 14 checks across desktop, phone, iPad Mini, and iPad with 0 console warnings.

Profile Warnings docs lagged behind the shipped surface
May 17, 2026 · < $0.01 · Riddle site · Riddle Proof · Profile Warnings
Catch card

Profile Warnings docs lagged behind the shipped surface

The proof product needs proof for its own proof-authoring contract.

What Riddle caught
Initial production job job_6d37e766 proved /docs/riddle-proof/ was otherwise healthy while the rendered docs missed Profile Warnings and /docs/riddle-proof/markdown.md missed both Profile Warnings and the warnings result field.
Why it matters
This is a self-audit catch with durable artifacts: Riddle Proof caught docs drift for its own newly shipped warning surface and proved the human page, agent markdown, Preview, and production fix with the same profile.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: Riddle Proof had shipped nonblocking profile warnings for ambiguous network mock response selectors, but the public Riddle Proof docs and agent-facing markdown did not mention Profile Warnings or the warnings result field.

Why normal checks missed it: The docs page loaded cleanly, the existing Profile Mode section and network mock terms were visible, /docs/riddle-proof/markdown.md returned 200 text/markdown with nonzero bytes, overflow stayed at 0px, and fatal console/page errors were clean. The gap was semantic: the newly shipped warning surface was absent from both human and machine-consumable docs.

Proof lesson: The proof product needs proof for its own proof-authoring contract. When a package adds evidence, warnings, or profile semantics, rendered docs and raw agent-facing markdown should be audited together so agents do not rediscover shipped behavior from changelogs or dogfood notes.

Evidence: Initial production job job_6d37e766 proved /docs/riddle-proof/ was otherwise healthy while the rendered docs missed Profile Warnings and /docs/riddle-proof/markdown.md missed both Profile Warnings and the warnings result field. Riddle-site PR #131 added the rendered and markdown docs, sidebar link, static guard, and docs-proof checks. Static Preview ps_08e95368 passed as job_cc796a28, Amplify job 112 deployed commit 80b3780, and final production job job_ad1cc6ec passed 9 checks across desktop, phone, iPad Mini, and iPad.

Builder accepted a saved preview path as a fresh build
May 17, 2026 · < $0.05 · LilArcade · Builder · preview boundary
Catch card

Builder accepted a saved preview path as a fresh build

Preview URL safety is contextual.

What Riddle caught
Initial production job job_643e881d caught Builder rendering "Open in new tab" and "Save to Arcade" for the same-host saved preview URL, with the forbidden-saved-preview-path-v454 mock hit twice.
Why it matters
This is a real product-boundary catch: normal same-host URL checks were not enough, but browser proof caught the unsafe Builder preview path before it became a reusable save/open escape.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: LilArcade Builder accepted a same-host saved preview URL from /saved/riddle-proof-v454-sneaky-existing/index.html as if it were a fresh build artifact, leaving "Open in new tab" and "Save to Arcade" available instead of rejecting the stale saved-artifact path.

Why normal checks missed it: The unsafe URL used the trusted preview bucket host, auth and chat both worked, and the page stayed visually stable. The defect only surfaced when proof treated Builder preview context as stricter than saved-player context and required the forbidden saved-preview-path mock to stay at zero hits.

Proof lesson: Preview URL safety is contextual. A URL sanitizer that is correct for saved player pages can be too permissive for fresh Builder builds, so proof profiles should encode allowed artifact prefixes, forbidden network hits, and recovery behavior together.

Evidence: Initial production job job_643e881d caught Builder rendering "Open in new tab" and "Save to Arcade" for the same-host saved preview URL, with the forbidden-saved-preview-path-v454 mock hit twice. LilArcade PR #444 added a Builder-only preview URL gate. Static Preview job job_5e778318 and final production job job_bc56fa3c passed: the saved path was rejected, the forbidden mock stayed at 0 hits, and the recovery build could still be saved and played.
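
A context-aware preview gate of the kind this lesson describes might look like the sketch below. It is an illustration under assumptions, not the LilArcade fix: the host name, the `/builds/` prefix, the context labels, and the function name are all hypothetical. Only the `/saved/` prefix comes from the catch.

```python
import posixpath
from urllib.parse import urlparse

PREVIEW_HOST = "previews.example.test"  # hypothetical trusted preview host
FRESH_BUILD_PREFIX = "/builds/"         # hypothetical prefix for fresh builds
SAVED_PREFIX = "/saved/"                # saved-artifact prefix from the catch

# Each context lists the path prefixes it accepts. Builder is the stricter one.
ALLOWED_PREFIXES = {
    "builder": (FRESH_BUILD_PREFIX,),
    "saved_player": (SAVED_PREFIX,),
}


def is_allowed_preview(url: str, context: str) -> bool:
    """Accept a preview URL only if host and path prefix fit the given context."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or parsed.hostname != PREVIEW_HOST:
        return False
    # Normalize so "/builds/../saved/x" is judged by where it really points.
    # Percent-encoded traversal is not handled in this sketch.
    path = posixpath.normpath(parsed.path)
    return path.startswith(ALLOWED_PREFIXES.get(context, ()))
```

The same trusted host yields different answers per context, which is why a host-only sanitizer missed the bug. A proof profile would pair a gate like this with the forbidden-path mock staying at zero hits.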

Evidence Manifest preloaded noisy unused assets
May 16, 2026 · < $0.01 · Riddle site · Good Catch Diary · warning hygiene
Catch card

Evidence Manifest preloaded noisy unused assets

Warning hygiene deserves its own contract.

What Riddle caught
Initial production job job_07d46452 used the new no_console_warnings contract and caught 9 unused Next CSS preload warnings while the page otherwise passed.
Why it matters
This is a proof-system dogfood catch: a newly promoted base Riddle Proof warning contract immediately found real warning/performance debt on Riddle public proof material.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The public Good Catch Evidence Manifest looked healthy, but browser console evidence showed unused preload warnings from automatic route prefetch and eager below-the-fold proof screenshots.

Why normal checks missed it: The route loaded, all 36 manifest cards rendered, the Profile Mode docs catch and job IDs were visible, overflow stayed at 0px, and fatal console/page errors were clean. The issue lived in warning-level browser noise that the previous fatal-console contract intentionally ignored.

Proof lesson: Warning hygiene deserves its own contract. Nonfatal browser warnings can hide performance debt and make later proof runs noisy, so mature public evidence pages should be able to require zero unallowed warnings.

Evidence: Initial production job job_07d46452 used the new no_console_warnings contract and caught 9 unused Next CSS preload warnings while the page otherwise passed. Riddle-site PR #128 disabled shared navigation prefetch and lazy-loaded Good Catch screenshots. Fixed Preview job job_1a70de27 and final production job job_4339e21c passed 9 checks across desktop, phone, iPad Mini, and iPad with 0 console warnings.
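
A warning contract that requires "zero unallowed warnings" can be sketched as a filter over captured console messages. This is a minimal Python illustration, not the actual no_console_warnings implementation: the message shape (dicts with `type` and `text`) and the allowlist parameter are assumptions.

```python
def unallowed_warnings(console_messages: list[dict], allowlist: tuple[str, ...] = ()) -> list[dict]:
    """Return warning-level messages whose text matches no allowlisted substring."""
    return [
        message
        for message in console_messages
        if message.get("type") == "warning"
        and not any(allowed in message.get("text", "") for allowed in allowlist)
    ]


def no_console_warnings(console_messages: list[dict], allowlist: tuple[str, ...] = ()) -> bool:
    """Pass only when every warning is explicitly allowlisted."""
    return not unallowed_warnings(console_messages, allowlist)
```

Keeping this separate from the fatal-console check means a page can pass "no fatal errors" and still fail on warnings, which is exactly how the unused preload warnings surfaced here.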

Profile Mode docs lagged behind proof primitives
May 16, 2026 · < $0.01 · Riddle site · Riddle Proof · Profile Mode
Catch card

Profile Mode docs lagged behind proof primitives

The proof surface itself needs proof.

What Riddle caught
Initial production job job_bb0aa65a proved /docs/riddle-proof/ was healthy while both rendered docs and /docs/riddle-proof/markdown.md missed Profile Mode, network_mocks, repeat_responses, delay_ms, request_body_contains, setup_actions, frame_text_visible, and frame_url_equals.
Why it matters
This is a self-audit catch: Riddle Proof found that Riddle Proof docs had fallen behind the exact reusable profile primitives being used for real audits.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The public Riddle Proof docs explained profile text semantics, but they did not document the Profile Mode primitives that recent real proof runs were using: network mocks, repeated and delayed responses, request-body receipts, setup actions, and iframe checks.

Why normal checks missed it: The docs page loaded cleanly, the existing Profile Text Semantics section was visible, the raw markdown route returned 200, overflow stayed at 0px, and fatal console evidence was clean. The drift was semantic: the rendered and machine-consumable docs had not caught up with the reusable proof contract.

Proof lesson: The proof surface itself needs proof. When a package adds or relies on reusable audit primitives, public rendered docs and raw agent-facing markdown should be tested as product contracts.

Evidence: Initial production job job_bb0aa65a proved /docs/riddle-proof/ was healthy while both rendered docs and /docs/riddle-proof/markdown.md missed Profile Mode, network_mocks, repeat_responses, delay_ms, request_body_contains, setup_actions, frame_text_visible, and frame_url_equals. Riddle-site PR #126 added the Profile Mode section and regenerated markdown. Static Preview job job_88ad03aa and final production job job_22ee6a7c passed 14 checks across desktop, phone, iPad Mini, and iPad.

llms.txt hid the raw proof bundle
May 16, 2026 · < $0.01 · Riddle site · llms.txt · proof receipts
Catch card

llms.txt hid the raw proof bundle

Agent indexes should point to raw receipts, not only review pages.

What Riddle caught
Production job job_a5d4383b proved /llms.txt, /examples/riddle-proof/, and /examples/riddle-proof/docs-live-proof-bundle.json were healthy while llms.txt omitted the raw bundle URL.
Why it matters
This is an agent-discovery catch: the machine-consumable proof receipt was public and healthy, but the compact agent entrypoint hid it.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The public llms.txt agent index linked to the human proof example page, but it did not link directly to the raw machine-consumable proof bundle that agents should ingest.

Why normal checks missed it: The file itself returned 200 text/plain, the human proof example page returned 200, the raw JSON bundle returned 200 application/json, overflow stayed at 0px, and fatal console evidence was clean. The missing contract was discovery: agents had to infer the raw proof receipt from the human page.

Proof lesson: Agent indexes should point to raw receipts, not only review pages. If a product publishes machine-consumable proof artifacts, the compact discovery surface needs to expose them directly.

Evidence: Production job job_a5d4383b proved /llms.txt, /examples/riddle-proof/, and /examples/riddle-proof/docs-live-proof-bundle.json were healthy while llms.txt omitted the raw bundle URL. Riddle-site PR #124 added a "Raw proof bundle JSON" entry to llms.txt and ratcheted the static llms guard. Static Preview job job_df22fbc2 and final production job job_ceafae1b passed 6 checks across desktop, phone, iPad Mini, and iPad.

Proof example bundle drifted behind the agent-proof contract
May 16, 2026 · < $0.01 · Riddle site · proof receipts · agent-proof
Catch card

Proof example bundle drifted behind the agent-proof contract

Proof examples are product surfaces too.

What Riddle caught
Initial production job job_30609bc5 proved the page and seven artifact links were healthy while /examples/riddle-proof/docs-live-proof-bundle.json missed "proof receipts", "Bring your agent; Riddle brings the proof", and "agent-proof".
Why it matters
This is a machine-consumable proof-surface catch: the public page looked healthy, but Riddle Proof found the raw agent-facing bundle had drifted behind the product contract.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The public proof example page rendered cleanly and linked to healthy artifacts, but the raw JSON bundle that agents consume was stale and did not carry proof receipts, the "Bring your agent; Riddle brings the proof" promise, or the agent-proof contract.

Why normal checks missed it: The human page looked trustworthy: the route loaded, the proof example status was passed, all seven artifact links were healthy, overflow was 0px, and fatal console evidence was clean. The drift lived in the machine-consumable proof contract behind the page.

Proof lesson: Proof examples are product surfaces too. If agents are expected to consume a raw proof bundle, the proof should validate the raw JSON contract, not only the rendered page and artifact links.

Evidence: Initial production job job_30609bc5 proved the page and seven artifact links were healthy while /examples/riddle-proof/docs-live-proof-bundle.json missed "proof receipts", "Bring your agent; Riddle brings the proof", and "agent-proof". Riddle-site PR #122 refreshed the bundle and rendered the Proof Contract section. Static Preview job job_d91c3c67 and final production job job_002d95c1 passed 11 checks across desktop, phone, iPad Mini, and iPad.
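
Validating the raw JSON contract, rather than only the rendered page, can be sketched as a search for required terms across every string in the bundle. This is a hedged Python sketch: the function names are hypothetical, and the real bundle schema is not shown in this diary, so the check deliberately makes no assumption about which keys hold the terms.

```python
import json
from collections.abc import Iterator


def iter_strings(node: object) -> Iterator[str]:
    """Yield every string key and value found anywhere in parsed JSON."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for key, value in node.items():
            yield key
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)


def missing_bundle_terms(bundle_json: str, required_terms: list[str]) -> list[str]:
    """Return the required terms that appear in no key or string value."""
    strings = list(iter_strings(json.loads(bundle_json)))
    return [term for term in required_terms if not any(term in s for s in strings)]
```

Run against the stale bundle with the three terms from this catch, a check like this would return all three as missing even though the human page and its artifact links were healthy.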

Agent Guide omitted the proof loop
May 16, 2026 · < $0.01 · Riddle site · Agent Guide · Riddle Proof
Catch card

Agent Guide omitted the proof loop

Agent-facing docs should connect low-level browser control to the reusable proof loop.

What Riddle caught
Production job job_77aceb4b proved the rendered Agent Guide and /ai-agents/guide/markdown.md missed "Riddle Proof", "proof receipts", and "Bring your agent; Riddle brings the proof".
Why it matters
This is an agent-surface catch on Riddle itself: the docs were healthy as browser API docs, but proof found the missing bridge to the productized evidence loop agents should reuse.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The public Agent Guide explained raw browser screenshot and /v1/run mechanics, but it did not connect agents to the reusable Riddle Proof loop, proof receipts, or the "Bring your agent; Riddle brings the proof." contract.

Why normal checks missed it: The route loaded cleanly, the existing guide sections rendered, the raw markdown export returned 200, overflow stayed at 0px, and the neighboring Riddle Proof docs were healthy. The missing product contract was the bridge from browser primitives to durable proof workflow.

Proof lesson: Agent-facing docs should connect low-level browser control to the reusable proof loop. Otherwise every wrapper has to rediscover the same pattern instead of sharing one inspectable evidence contract.

Evidence: Production job job_77aceb4b proved the rendered Agent Guide and /ai-agents/guide/markdown.md missed "Riddle Proof", "proof receipts", and "Bring your agent; Riddle brings the proof". Riddle-site PR #119 added the Proof Loop section. The first static Preview job job_79a6afb6 then caught five React minified #418 page errors from invalid nested paragraph markup before production. Final Preview job job_e8b53136 and final production job job_5d94bf48 passed 11 checks across desktop, phone, iPad Mini, and iPad.

Riddle had no llms.txt agent index
May 16, 2026 · < $0.01 · Riddle site · llms.txt · agent discovery
Catch card

Riddle had no llms.txt agent index

Agent-facing product surfaces need an index, not just scattered docs.

What Riddle caught
Initial production job job_8fc84c72 proved /llms.txt returned 404 in desktop, phone, iPad Mini, and iPad while docs markdown, Riddle Proof markdown, Preview markdown, MCP markdown, OpenAPI YAML, and robots all stayed healthy.
Why it matters
This is an agent-discovery catch: the public docs were individually healthy, but browser proof found that agents had no compact starting point for the product surface.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: Riddle had markdown docs, Proof docs, Preview docs, MCP docs, OpenAPI YAML, and robots surfaces, but no public llms.txt entrypoint for agents to discover the set quickly.

Why normal checks missed it: The neighboring machine-readable surfaces were healthy and the rendered site worked. The missing contract was the compact agent index itself, which only surfaced when the proof treated agent-readable docs and discovery links as first-class product behavior.

Proof lesson: Agent-facing product surfaces need an index, not just scattered docs. A site can expose the right pieces individually while still making agents guess where to start.

Evidence: Initial production job job_8fc84c72 proved /llms.txt returned 404 in desktop, phone, iPad Mini, and iPad while docs markdown, Riddle Proof markdown, Preview markdown, MCP markdown, OpenAPI YAML, and robots all stayed healthy. Riddle-site PR #116 added public/llms.txt and a static guard. Final production job job_b0dc37de passed with 200 text/plain, required agent-readable docs links, 0px overflow, and no fatal errors.

Sitemap hid public Riddle routes from crawlers Riddle artifact
May 16, 2026 · < $0.01 · Riddle site · sitemap · agent discovery
Catch card

Sitemap hid public Riddle routes from crawlers

Agent-facing contracts include sitemap and discovery surfaces.

What Riddle caught
Production jobs job_f39d58a4 and job_06fb59fc proved sitemap.xml was missing docs/MCP and public-content routes while the target pages themselves loaded cleanly.
Why it matters
This is a machine-consumable discovery catch: the product looked fine to humans, but browser proof found that crawler and agent discovery was stale.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: Riddle public pages rendered correctly, but sitemap.xml omitted docs, MCP, blog, guide, proof, and Good Catch routes that crawlers and agents rely on for discovery.

Why normal checks missed it: The visible pages were healthy: docs and blog routes loaded, route text was present, responsive overflow stayed at 0px, and console/page evidence was clean. The defect lived in a machine-consumable discovery file, not in the rendered page.

Proof lesson: Agent-facing contracts include sitemap and discovery surfaces. A page can be perfectly healthy for a human visitor while still being invisible to crawlers, maps, and docs-ingestion workflows.

Evidence: Production jobs job_f39d58a4 and job_06fb59fc proved sitemap.xml was missing docs/MCP and public-content routes while the target pages themselves loaded cleanly. Riddle-site PRs #111 and #112 patched the sitemap, then PR #113 replaced manual edits with a generated sitemap guard. Final production job job_83c9c01c passed the route-coverage contract.
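A route-coverage contract of this kind can be sketched with the standard library alone. The example below assumes the sitemap follows the sitemaps.org 0.9 schema; the missing_sitemap_routes helper, the example.com origin, and the route list are hypothetical and do not reproduce the generated guard from PR #113.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org 0.9 protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def missing_sitemap_routes(sitemap_xml: str, origin: str, required_routes: list) -> list:
    """Return the required routes that the sitemap does not list."""
    root = ET.fromstring(sitemap_xml)
    # Normalize trailing slashes so "/docs" and "/docs/" compare equal.
    listed = {
        (loc.text or "").strip().rstrip("/")
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
    }
    base = origin.rstrip("/")
    return [
        route for route in required_routes
        if (base + route).rstrip("/") not in listed
    ]


# Hypothetical sitemap that lists the home page and docs but omits /mcp.
EXAMPLE_SITEMAP = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/</loc></url>"
    "<url><loc>https://example.com/docs</loc></url>"
    "</urlset>"
)
print(missing_sitemap_routes(EXAMPLE_SITEMAP, "https://example.com", ["/", "/docs", "/mcp"]))
```

Driving the check from a list of required routes, rather than from the pages that happen to load, is what lets it fail while the rendered site still looks healthy.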

Robots blocked agent markdown docs Riddle artifact
May 16, 2026 · < $0.01 · Riddle site · robots.txt · agent docs
Catch card

Robots blocked agent markdown docs

Agent-facing docs need both availability and crawlability.

What Riddle caught
Production job job_c268d7ce loaded robots.txt across desktop, phone, iPad Mini, and iPad and proved four stale markdown Disallow lines were present while the docs markdown endpoint returned text/markdown.
Why it matters
This is an agent-surface catch: the human docs existed, but crawler policy still discouraged the machine-readable versions that agents should consume.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: robots.txt returned 200, advertised the sitemap, and allowed the site generally, but it explicitly disallowed the raw markdown exports that Riddle presents as agent-consumable docs.

Why normal checks missed it: The docs markdown itself was fetchable as text/markdown and the robots file looked superficially healthy. The mismatch only surfaced because the proof asserted absence of the four stale markdown Disallow rules.

Proof lesson: Agent-facing docs need both availability and crawlability. A raw markdown endpoint can exist and still be discouraged by robots policy unless the proof treats robots.txt as part of the public contract.

Evidence: Production job job_c268d7ce loaded robots.txt across desktop, phone, iPad Mini, and iPad and proved four stale markdown Disallow lines were present while the docs markdown endpoint returned text/markdown. Riddle-site PR #114 removed the rules and added a static guard. Final production job job_f4674917 passed the same contract.
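One way to treat robots.txt as part of the public contract is to parse it and ask whether each agent-facing document is fetchable. The sketch below uses Python's urllib.robotparser; the blocked_agent_docs helper, the example.com origin, and the paths are hypothetical. Note one assumption about the parser: the standard-library implementation applies the first matching rule in file order, which can differ from crawlers that prefer the longest match, so treat the result as an approximation.

```python
from urllib.robotparser import RobotFileParser


def blocked_agent_docs(robots_txt: str, origin: str, doc_paths: list, agent: str = "*") -> list:
    """Return the doc paths that the given robots.txt text disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        path for path in doc_paths
        if not parser.can_fetch(agent, origin + path)
    ]


# Hypothetical stale policy: the markdown export is disallowed for all agents.
STALE_ROBOTS = "User-agent: *\nDisallow: /docs/markdown.md\n"
print(blocked_agent_docs(STALE_ROBOTS, "https://example.com", ["/docs/markdown.md", "/docs"]))
```

Pairing this with a fetch of the markdown endpoint itself covers both halves of the lesson: availability and crawlability.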

Builder saved link said home page Riddle artifact
May 16, 2026 · < $0.01 · LilArcade · Builder · UI semantics
Catch card

Builder saved link said home page

This is a semantic UI contract catch: if a link opens a saved game, the visible action should say Play saved game.

What Riddle caught
The failing production run job_b712a753 drove the Builder save workflow across desktop, phone, iPad Mini, and iPad and found "View on home page" present and "Play saved game" absent in every viewport.
Why it matters
This is a semantic UI contract catch: the product could save and open the game, but the browser proof caught that the call to action named the wrong destination.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: After a generated LilArcade game was saved, the success CTA linked to the saved player route but said View on home page.

Why normal checks missed it: The flow looked healthy: save worked, the saved-game link existed, the generated preview rendered, overflow stayed at 0px, and fatal console/page evidence was clean. The mismatch only surfaced because the proof treated CTA copy and destination as one browser-visible contract.

Proof lesson: This is a semantic UI contract catch: if a link opens a saved game, the visible action should say Play saved game. A working route can still leak trust when the words describe the wrong destination.

Evidence: The failing production run job_b712a753 drove the Builder save workflow across desktop, phone, iPad Mini, and iPad. It proved the link existed and the preview iframe rendered, but Play saved game was absent while View on home page was present in every viewport. Fixed Preview job_62113ccf and final production job_ff6140dd passed with Play saved game visible, View on home page absent, one saved-game link, clean overflow, and clean fatal-console evidence.

Playground Batch curl hid async mode Riddle artifact
May 16, 2026 · < $0.01 · Riddle site · Playground · generated code
Catch card

Playground Batch curl hid async mode

This is a generated-command contract catch: copy buttons and examples should preserve the same request semantics as the real UI action.

What Riddle caught
The failing production run job_3155f0c1 hit the Batch submit and artifacts mocks 4/4 times, and the captured request body included "sync":false, but the required visible text "sync": false was absent from .result-state in every viewport.
Why it matters
This is a generated-command contract catch: Riddle Proof found that the UI worked but the integration command taught a subtly wrong async API call.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The authenticated Playground Batch flow correctly sent sync:false in the real /v1/run request body, returned a durable job receipt, and rendered screenshot artifacts, but the generated and copyable curl command omitted "sync": false.

Why normal checks missed it: The product behavior worked: submit succeeded, artifacts rendered, job receipt was visible, loading cleared, layout stayed clean, and browser console/page evidence was healthy. The mismatch only surfaced because the proof treated the integration snippet as part of the product contract, not as passive docs.

Proof lesson: This is a generated-command contract catch: copy buttons and examples should preserve the same request semantics as the real UI action. A working result can still teach users the wrong API call.

Evidence: The failing production run job_3155f0c1 hit the Batch submit and artifacts mocks 4/4 times, and the captured request body included "sync":false, but the required visible text "sync": false was absent from .result-state in every viewport. The reusable profile seed is job_rp359_batch_async_curl. After the fix, production job job_c892e0c0 passed with the generated curl command and real request body aligned across desktop, phone, iPad Mini, and iPad.
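The comparison behind this catch can be sketched as a diff between the JSON body in the generated curl command and the request body the browser actually sent. The helpers below are hypothetical, the api.example.com host and the url field are placeholders, and the sketch assumes the snippet passes its JSON through -d, --data, or --data-raw.

```python
import json
import shlex

MISSING = "<missing>"


def curl_json_body(curl_command: str) -> dict:
    """Extract the JSON payload from a curl command's data flag."""
    # Join shell line continuations before tokenizing.
    tokens = shlex.split(curl_command.replace("\\\n", " "))
    for flag in ("-d", "--data", "--data-raw"):
        if flag in tokens:
            return json.loads(tokens[tokens.index(flag) + 1])
    raise ValueError("no JSON data flag found in curl command")


def snippet_drift(curl_command: str, request_body: dict) -> dict:
    """Map each differing key to (snippet value, real request value)."""
    snippet = curl_json_body(curl_command)
    return {
        key: (snippet.get(key, MISSING), request_body.get(key, MISSING))
        for key in sorted(set(snippet) | set(request_body))
        if snippet.get(key, MISSING) != request_body.get(key, MISSING)
    }


# Hypothetical snippet that omits the async flag the real request carried.
GENERATED_CURL = (
    "curl -X POST https://api.example.com/v1/run "
    "-d '{\"url\": \"https://example.com\"}'"
)
print(snippet_drift(GENERATED_CURL, {"url": "https://example.com", "sync": False}))
```

An empty result means the copyable command and the real UI action describe the same request.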

Playground async results hid the job receipt Riddle artifact
May 15, 2026 · < $0.01 · Riddle site · Playground · artifact handling
Catch card

Playground async results hid the job receipt

This is a receipt-traceability catch: artifact UIs should show the durable job id whenever they show async results.

What Riddle caught
The failing production run job_77bc1541 hit the Script submit and artifacts mocks 4/4 times, rendered screenshot and console evidence, and kept overflow at 0px, but failed because Job ID and job_rp356_script_receipt were absent from .result-state in every viewport.
Why it matters
This is a receipt-traceability catch: the artifact UI looked successful but omitted the identifier users, support, and agents need to connect the result back to a durable Riddle proof bundle.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The authenticated Playground async Script result rendered screenshot and console artifacts from a successful mocked job, but did not expose the durable job id on the visible result screen.

Why normal checks missed it: The submit request worked, artifact polling worked, the screenshot item rendered, console output rendered, loading cleared, layout stayed clean, and browser console/page evidence was healthy. The missing receipt only surfaced because the proof required the result UI to be traceable back to the exact Riddle job.

Proof lesson: This is a receipt-traceability catch: artifact UIs should show the durable job id whenever they show async results. Screenshots and logs are easier to trust when the user can connect them to the exact proof bundle.

Evidence: The failing production run job_77bc1541 hit the Script submit and artifacts mocks 4/4 times, rendered screenshot and console evidence, and kept overflow at 0px, but failed because Job ID and job_rp356_script_receipt were absent from .result-state in every viewport. After the fix, production job job_c747c2ec passed with the job receipt visible across desktop, phone, iPad Mini, and iPad.

Billing Stripe hydration failed invisibly Riddle artifact
May 15, 2026 · < $0.01 · Riddle site · billing · hydration
Catch card

Billing Stripe hydration failed invisibly

This is a screenshot-is-not-enough catch: a proof profile should pair visible business-state assertions with fatal/page-error evidence.

What Riddle caught
The failing production run job_0a9320d5 passed the recovered Billing state but failed no_fatal_console_errors with page_error_count 1 for Minified React error #418.
Why it matters
This is a screenshot-is-not-enough catch: the UI looked healthy after recovery, but the browser still recorded a React hydration failure that would be easy to miss in manual review.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The authenticated Billing page recovered from a forced transient balance-load failure and rendered the expected account state, but browser page-error evidence still captured Minified React error #418 from the Stripe Elements surface.

Why normal checks missed it: The visible page looked recovered: Billing & Credits rendered, the Retry flow worked, balance and transaction text appeared, and responsive layout stayed clean. The issue was only visible because the proof treated page errors as first-class evidence instead of trusting the final screenshot alone.

Proof lesson: This is a screenshot-is-not-enough catch: a proof profile should pair visible business-state assertions with fatal/page-error evidence. Hydration failures can be invisible in a happy-path screenshot while still making the page brittle.

Evidence: The failing production run job_0a9320d5 passed the recovered Billing state but failed no_fatal_console_errors with page_error_count 1 for Minified React error #418. After delaying Stripe Elements until client mount, production job job_a1a528af passed with page_error_count 0 across desktop, phone, iPad Mini, and iPad.

Playground Script failed jobs looked neutral Riddle artifact
May 15, 2026 · < $0.01 · Riddle site · Playground · terminal states
Catch card

Playground Script failed jobs looked neutral

Async artifact UIs should treat every terminal failure status as a first-class visible state and preserve the service error message plus partial artifact evidence.

What Riddle caught
The failing production run job_0144ef09 hit the Script submit and failed-artifacts mocks 4/4 times and showed one partial screenshot item, but failed because "Synthetic v347 script sandbox failed after preserving partial screenshot evidence", "partial results available", and .error-warning were absent in all four viewports.
Why it matters
This is an async artifact-status catch: the product preserved partial evidence but hid the terminal failure semantics users and support need.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The Playground async Script path received a terminal status: failed artifact response with a service error and partial screenshot, but rendered a neutral Result state without .error-warning, the backend message, or the partial results available warning.

Why normal checks missed it: The route loaded, auth setup worked, submit and artifacts mocks hit exactly once per viewport, one partial screenshot rendered, loading cleared, layout stayed clean, and final console/page evidence was clean. The issue only surfaced when the profile asserted exact terminal-failure UI semantics.

Proof lesson: Async artifact UIs should treat every terminal failure status as a first-class visible state and preserve the service error message plus partial artifact evidence.

Evidence: The failing production run job_0144ef09 hit the Script submit and failed-artifacts mocks 4/4 times and showed one partial screenshot item, but failed because "Synthetic v347 script sandbox failed after preserving partial screenshot evidence", "partial results available", and .error-warning were absent in all four viewports.

Dashboard terminal jobs leaked raw service statuses Riddle artifact
May 15, 2026 · < $0.01 · Riddle site · dashboard · status semantics
Catch card

Dashboard terminal jobs leaked raw service statuses

Account-state audits should verify service-contract translation, not just row presence.

What Riddle caught
The failing production run job_0c6e4f93 rendered job_rp345_timeout and job_rp345_error across four viewports, but failed because Timed Out and Failed were absent while Completed Timeout and Completed Error were visible everywhere.
Why it matters
This is a practical dashboard-audit catch: every row existed and the page was clean, but the product leaked backend vocabulary where users needed clear terminal-state meaning.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The authenticated dashboard Recent Jobs table rendered service terminal statuses as Completed Timeout and Completed Error instead of human labels Timed Out and Failed.

Why normal checks missed it: The dashboard route loaded, auth setup worked, balance data rendered, API keys rendered, recent job rows rendered, layout stayed clean, and final console/page evidence was clean. The issue only surfaced when the profile asserted exact status semantics for terminal service states.

Proof lesson: Account-state audits should verify service-contract translation, not just row presence. A table can be healthy, populated, and responsive while still mislabeling the business meaning of a job.

Evidence: The failing production run job_0c6e4f93 rendered job_rp345_timeout and job_rp345_error across four viewports, but failed because Timed Out and Failed were absent while Completed Timeout and Completed Error were visible everywhere.
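The translation this catch asks for can be sketched as an explicit label table with a generic fallback. The raw status completed_timeout appears elsewhere in this diary; completed_error is inferred from the rendered "Completed Error" label and is an assumption, as are the STATUS_LABELS table and the status_label helper.

```python
# Explicit translations for terminal service statuses (assumed raw values).
STATUS_LABELS = {
    "completed_timeout": "Timed Out",
    "completed_error": "Failed",
}


def status_label(raw_status: str) -> str:
    """Translate a raw service status into the label a user should see."""
    if raw_status in STATUS_LABELS:
        return STATUS_LABELS[raw_status]
    # Generic fallback: title-case the raw value. Without the table above,
    # this fallback alone would yield "Completed Timeout".
    return raw_status.replace("_", " ").title()


print(status_label("completed_timeout"))  # Timed Out
print(status_label("queued"))             # Queued
```

A proof profile can then assert the translated label for every terminal status rather than only checking that the row exists.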

Playground Script assumed artifacts_url Riddle artifact
May 15, 2026 · < $0.01 · Riddle site · Playground · artifact handling
Catch card

Playground Script assumed artifacts_url

Async artifact UIs should treat job_id as the stable contract and artifacts_url as an optional convenience.

What Riddle caught
The failing production run job_50bcafca hit the Script submit mock 4/4 times, but hit the required /v1/jobs/job_rp342_script/artifacts mock 0/4 times while the browser repeatedly tried https://api.riddledc.comundefined/.
Why it matters
This is an async-contract catch: the UI looked ready to run, but a valid response shape left users polling a malformed URL instead of seeing the artifact they requested.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The Playground async Script path accepted a job-id-only response, but then polled https://api.riddledc.comundefined/ instead of the standard /v1/jobs/{job_id}/artifacts endpoint.

Why normal checks missed it: The route loaded, auth setup worked, Script async controls were reachable, the submit request body was correct, and layout stayed clean. The issue only showed up when the proof required the artifacts endpoint hit count, screenshot result, loading-state cleanup, and final console health.

Proof lesson: Async artifact UIs should treat job_id as the stable contract and artifacts_url as an optional convenience. Every async mode should converge on the same job artifacts polling behavior.

Evidence: The failing production run job_50bcafca hit the Script submit mock 4/4 times, but hit the required /v1/jobs/job_rp342_script/artifacts mock 0/4 times while the browser repeatedly tried https://api.riddledc.comundefined/.
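A minimal sketch of the fallback this lesson describes, with job_id as the stable contract. The malformed URL above is consistent with an API base being joined to a missing artifacts_url, so the sketch assumes artifacts_url, when present, is either absolute or a path relative to the API base. The artifacts_poll_url helper is hypothetical.

```python
API_BASE = "https://api.riddledc.com"


def artifacts_poll_url(response: dict, api_base: str = API_BASE) -> str:
    """Choose the artifacts polling URL for an async job response."""
    artifacts_url = response.get("artifacts_url")
    if artifacts_url:
        # Optional convenience: honor it, resolving relative paths.
        if artifacts_url.startswith(("http://", "https://")):
            return artifacts_url
        return api_base + artifacts_url
    job_id = response.get("job_id")
    if not job_id:
        raise ValueError("async response has neither artifacts_url nor job_id")
    # Stable contract: derive the standard endpoint from the job id.
    return f"{api_base}/v1/jobs/{job_id}/artifacts"


print(artifacts_poll_url({"job_id": "job_rp342_script"}))
# https://api.riddledc.com/v1/jobs/job_rp342_script/artifacts
```

Raising when both fields are absent keeps a malformed response from turning into a silently malformed poll.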

Playground timeout hid the artifact reason Riddle artifact
May 15, 2026 · < $0.01 · Riddle site · Playground · artifact handling
Catch card

Playground timeout hid the artifact reason

Artifact UIs should preserve failure reasons, not just thumbnails.

What Riddle caught
The failing production run job_483da63f returned completed_timeout with "Synthetic v340 workflow timed out waiting for purchase-confirmation", kept rp340-timeout-first visible across four viewports, but failed because the timeout detail was absent everywhere.
Why it matters
This is an artifact-trust catch: the proof showed the product kept a useful receipt but hid the reason the receipt mattered, making timeout triage weaker than the underlying Riddle evidence.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The Playground async Workflow timeout path preserved a partial screenshot, but replaced the service timeout detail with generic "Workflow timed out after 120 seconds" copy.

Why normal checks missed it: The route loaded, auth setup worked, Workflow async controls were reachable, the submit and artifacts mocks hit exactly once per viewport, the timeout state rendered, the partial screenshot stayed visible, layout stayed clean, and final console/page evidence was clean. The issue was only visible when the proof asserted the exact timeout reason from the artifacts payload.

Proof lesson: Artifact UIs should preserve failure reasons, not just thumbnails. A partial screenshot is useful, but the user still needs the service-provided explanation for why the run stopped.

Evidence: The failing production run job_483da63f returned completed_timeout with "Synthetic v340 workflow timed out waiting for purchase-confirmation", kept rp340-timeout-first visible across four viewports, but failed because the timeout detail was absent everywhere.

Dashboard balance failure looked like zero credits Riddle artifact
May 15, 2026 · < $0.01 · Riddle site · dashboard · error handling
Catch card

Dashboard balance failure looked like zero credits

Dashboard proof should isolate partial backend failures: one widget can fail while the rest of the page stays healthy, and the user still needs the real reason.

What Riddle caught
The failing production run job_519cdc28 mocked GET /billing/balance as a structured 503, kept jobs and API keys visible across four viewports, but failed because "Synthetic v338 dashboard balance unavailable" was absent everywhere.
Why it matters
This is a support-quality catch: the page looked healthy enough to trust, but the balance widget silently converted a backend failure into an apparent zero-credit account.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The authenticated dashboard hid a structured balance-load backend failure and silently showed 0s / $0.00, making a dependency outage look like an empty account.

Why normal checks missed it: The dashboard route loaded, auth setup worked, recent jobs rendered, the API-key row rendered, layout stayed clean, and final console/page evidence was clean. The issue was only visible when the proof asserted the exact backend message from the failed balance dependency.

Proof lesson: Dashboard proof should isolate partial backend failures: one widget can fail while the rest of the page stays healthy, and the user still needs the real reason.

Evidence: The failing production run job_519cdc28 mocked GET /billing/balance as a structured 503, kept jobs and API keys visible across four viewports, but failed because "Synthetic v338 dashboard balance unavailable" was absent everywhere.
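The distinction the widget lost can be sketched as a view state that never defaults a failed load to zero. The BalanceView record, the balance_view helper, and the balance_usd and message field names are hypothetical; the real payload shape is not shown in this diary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BalanceView:
    """Hypothetical widget state: either an amount or an error, never both."""
    amount_text: Optional[str]
    error_text: Optional[str]


def balance_view(status: int, payload: dict) -> BalanceView:
    """Build the widget state from a balance response."""
    if status == 200 and "balance_usd" in payload:
        return BalanceView(f"${payload['balance_usd']:.2f}", None)
    # A failed load is not an empty account: surface the backend reason.
    message = payload.get("message") or f"Balance unavailable (HTTP {status})"
    return BalanceView(None, message)


print(balance_view(503, {"message": "Synthetic v338 dashboard balance unavailable"}))
print(balance_view(200, {"balance_usd": 0}))
```

Only a successful response with an explicit balance can produce "$0.00", so a dependency outage cannot masquerade as an empty account.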

Auto-recharge disable hid the backend error Riddle artifact
May 15, 2026 · < $0.01 · Riddle site · billing · error handling
Catch card

Auto-recharge disable hid the backend error

Settings rollback proof should verify both state integrity and message integrity.

What Riddle caught
The failing production run job_89d53b2f hit PUT /api/billing/auto-recharge four times with {"enabled":false}, left (ON) visible and (OFF) absent, but failed because "Synthetic v333 auto-recharge disable rejected" was absent while [object Object] rendered in all four viewports.
Why it matters
This is a billing-support catch: the account state stayed safe, but the UI hid the backend reason a user or support person would need to understand the failed settings change.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The billing page correctly rolled a failed auto-recharge disable attempt back to (ON), but rendered [object Object] instead of the backend human rejection message.

Why normal checks missed it: The route loaded, authenticated billing data rendered, the failed PUT fired exactly once per viewport, the inline error existed, and the visible toggle rollback was correct. The regression was only obvious when the proof asserted the exact backend message and object-placeholder absence.

Proof lesson: Settings rollback proof should verify both state integrity and message integrity. A rejected write can preserve the old setting while still hiding the reason the user needs.

Evidence: The failing production run job_89d53b2f hit PUT /api/billing/auto-recharge four times with {"enabled":false}, left (ON) visible and (OFF) absent, but failed because "Synthetic v333 auto-recharge disable rejected" was absent while [object Object] rendered in all four viewports.

Playground hid structured workflow errors Riddle artifact
May 15, 2026 · < $0.01 · Riddle site · Playground · error handling
Catch card

Playground hid structured workflow errors

Interactive API tools need fallback profiles for realistic structured errors, not just happy-path runs or generic error-element checks.

What Riddle caught
The failing production run job_6a27f3cd submitted a workflow payload with steps, sync false, and screenshot rp330-structured, then failed because "Synthetic v330 workflow validation rejected" was absent while [object Object] appeared in all four viewports.
Why it matters
This is a support-facing API-tool catch: the workflow failure was handled, but the UI hid the reason a user or support person would need to debug the request.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The Playground async Workflow path handled a structured validation failure but rendered the error as [object Object] instead of showing the backend human message.

Why normal checks missed it: The route loaded, auth setup worked, the Workflow and Async controls were reachable, the request body was correct, and the mocked 400 fired exactly once per viewport. The regression was only visible when the proof asserted the exact structured backend message.

Proof lesson: Interactive API tools need fallback profiles for realistic structured errors, not just happy-path runs or generic error-element checks.

Evidence: The failing production run job_6a27f3cd submitted a workflow payload with steps, sync false, and screenshot rp330-structured, then failed because "Synthetic v330 workflow validation rejected" was absent while [object Object] appeared in all four viewports.
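An [object Object] placeholder typically appears when a structured error is rendered as if it were a string. The sketch below shows one way to pull the human message out of a nested payload before display; the error_message helper and the message, detail, and error key names are assumptions, since the diary does not show the real error shape.

```python
def error_message(payload: object, fallback: str = "Request failed") -> str:
    """Find a human-readable message in a string or nested error payload."""
    if isinstance(payload, str) and payload.strip():
        return payload
    if isinstance(payload, dict):
        # Assumed key names; search nested objects depth-first.
        for key in ("message", "detail", "error"):
            if key in payload:
                found = error_message(payload[key], "")
                if found:
                    return found
    return fallback


# Hypothetical structured 400 body.
body = {"error": {"code": "validation_failed",
                  "message": "Synthetic v330 workflow validation rejected"}}
print(error_message(body))  # Synthetic v330 workflow validation rejected
```

A fallback profile can then assert both that the exact backend message is visible and that the object placeholder is absent.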

Payment-method setup hid the backend error Riddle artifact
May 15, 2026 · < $0.01 · Riddle site · billing · error handling
Catch card

Payment-method setup hid the backend error

Fallback profiles should assert the exact human message from structured backend errors, not just that some error element appears.

What Riddle caught
The failing production run job_2882931c clicked Save Payment Method in four viewports, hit the mocked setup failure four times, and failed because "Synthetic v328 payment method setup rejected" was absent everywhere.
Why it matters
This is a practical checkout/settings catch: the workflow failed safely, but the product hid the backend reason a user or support team would need.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The billing page handled a rejected payment-method setup request but replaced the backend human message with generic "Failed to create setup intent" copy.

Why normal checks missed it: The route loaded, the Stripe-backed form opened, the failed POST fired exactly once per viewport, the no-payment-method state remained, and an inline error rendered. Only the proof checked that the specific backend message survived to the user.

Proof lesson: Fallback profiles should assert the exact human message from structured backend errors, not just that some error element appears.

Evidence: The failing production run job_2882931c clicked Save Payment Method in four viewports, hit the mocked setup failure four times, and failed because "Synthetic v328 payment method setup rejected" was absent everywhere.

Handled API-key revoke failure still logged as fatal Riddle artifact
May 15, 2026 · < $0.01 · Riddle site · dashboard · console health
Catch card

Handled API-key revoke failure still logged as fatal

Negative-path proof should keep console/page health in scope after the visible UI looks right, because handled failures can still poison the browser evidence stream.

What Riddle caught
The failing production run job_64814348 accepted four revoke dialogs, hit the mocked DELETE four times, preserved the key row, showed "Synthetic v327 API key revoke rejected", and failed on the unallowed "Revoke API key failed" console error.
Why it matters
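This catch is useful because the user-visible fallback was already correct: only the console evidence showed that a handled, expected failure was still being logged as an error.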
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The dashboard visibly handled a rejected API-key revoke request, but still emitted an app-level console.error for the handled domain failure.

Why normal checks missed it: The visible fallback looked correct: the confirm dialog was accepted, the backend rejection message appeared, the active key row stayed present, and the revoked/empty states stayed absent. The bug was the hidden fatal console signal after an expected failure path.

Proof lesson: Negative-path proof should keep console/page health in scope after the visible UI looks right, because handled failures can still poison the browser evidence stream.

Evidence: The failing production run job_64814348 accepted four revoke dialogs, hit the mocked DELETE four times, preserved the key row, showed "Synthetic v327 API key revoke rejected", and failed on the unallowed "Revoke API key failed" console error.
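A console-health check of this kind can be sketched as an allow-list filter over captured console entries. The entry shape (level and text keys), the unallowed_console_errors helper, and the allow-list pattern are hypothetical; they only illustrate how a handled failure can still fail the proof.

```python
import re


def unallowed_console_errors(entries: list, allowed_patterns: list) -> list:
    """Return error-level console texts that match no allow-list pattern."""
    compiled = [re.compile(pattern) for pattern in allowed_patterns]
    return [
        entry["text"]
        for entry in entries
        if entry["level"] == "error"
        and not any(pattern.search(entry["text"]) for pattern in compiled)
    ]


# Hypothetical capture: the mocked network failure is expected and allowed,
# but the app-level error for the handled rejection is not.
captured = [
    {"level": "error", "text": "Failed to load resource: 403"},
    {"level": "error", "text": "Revoke API key failed"},
    {"level": "log", "text": "dashboard ready"},
]
print(unallowed_console_errors(captured, [r"^Failed to load resource"]))
```

Keeping the allow-list narrow means an expected network error can pass while any extra app-level error still surfaces.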

A structured API-key error crashed the dashboard Riddle artifact
May 15, 2026 · < $0.01 · Riddle site · dashboard · error handling
Catch card

A structured API-key error crashed the dashboard

Dashboard and settings profiles should include structured failure payloads, not only string errors, and should prove that existing data remains visible after failed writes.

What Riddle caught
The failing production run job_d622f658 submitted {"name":"Structured Error Key v324"}, then React error #31 removed the dashboard content while "Synthetic v324 API key rejected" stayed absent.
Why it matters
This is a strong authenticated-product catch: the request was right and the API failure was realistic, but the UI crashed because it rendered an object as a React child.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The authenticated dashboard tried to render a structured API-key validation error object directly, crashing the API Keys section instead of showing the human message.

Why normal checks missed it: The dashboard loaded, auth setup worked, existing keys rendered, and the create request body was correct. The bug only appeared when the mocked backend returned a realistic nested error payload.

Proof lesson: Dashboard and settings profiles should include structured failure payloads, not only string errors, and should prove that existing data remains visible after failed writes.

Evidence: The failing production run job_d622f658 submitted {"name":"Structured Error Key v324"}, then React error #31 removed the dashboard content while "Synthetic v324 API key rejected" stayed absent.

Auto-recharge stayed on after a failed save Riddle artifact
May 15, 2026 · < $0.01 · Riddle site · billing · settings integrity
Catch card

Auto-recharge stayed on after a failed save

Settings proof should verify rollback state after rejected saves, not only that an error message appears.

What Riddle caught
The failing production run job_3bb5a0cf hit the failed auto-recharge PUT four times, showed "Synthetic v322 auto-recharge rejected", but found (OFF) absent and (ON) still visible in every viewport.
Why it matters
This is a settings-integrity catch: a route and error toast can both be green while the UI lies about whether the account setting was actually saved.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The billing page showed a rejected auto-recharge save error but left the toggle label at (ON), advertising a setting that the backend had not persisted.

Why normal checks missed it: The page loaded, payment-method state rendered, the expected error appeared, and the mocked PUT request fired. The regression was the stale optimistic UI state after the failed write.

Proof lesson: Settings proof should verify rollback state after rejected saves, not only that an error message appears.

Evidence: The failing production run job_3bb5a0cf hit the failed auto-recharge PUT four times, showed "Synthetic v322 auto-recharge rejected", but found (OFF) absent and (ON) still visible in every viewport.
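The rollback behavior this proof checks can be sketched as an optimistic update that restores the previous value when the save is rejected. The AutoRechargeToggle class and the reject_save stand-in are hypothetical; only the (ON)/(OFF) labels and the error text come from the run above.

```python
from typing import Callable, Optional


class AutoRechargeToggle:
    """Hypothetical view-model for an on/off setting with optimistic updates."""

    def __init__(self, enabled: bool, save: Callable[[bool], None]) -> None:
        self.enabled = enabled
        self.error: Optional[str] = None
        self._save = save

    @property
    def label(self) -> str:
        return "(ON)" if self.enabled else "(OFF)"

    def set_enabled(self, desired: bool) -> None:
        previous = self.enabled
        self.enabled = desired  # optimistic update before the backend answers
        try:
            self._save(desired)
            self.error = None
        except Exception as exc:
            self.enabled = previous  # roll back to the persisted value
            self.error = str(exc)


def reject_save(_desired: bool) -> None:
    """Stand-in for a backend that rejects the write."""
    raise RuntimeError("Synthetic v322 auto-recharge rejected")


toggle = AutoRechargeToggle(enabled=False, save=reject_save)
toggle.set_enabled(True)
print(toggle.label, toggle.error)  # (OFF) Synthetic v322 auto-recharge rejected
```

The proof then needs two assertions after a rejected save: the error message is visible, and the label matches what the backend actually persisted.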

A failed dashboard job looked queued Riddle artifact
May 15, 2026 · < $0.01 · Riddle site · dashboard · status semantics
Catch card

A failed dashboard job looked queued

Authenticated dashboards need profile checks for negative and in-flight states, not only route health and happy-path data.

What Riddle caught
The failing production run job_6711719e showed job_rp317_failed on the dashboard while Failed was absent in every viewport; the same run also caught that the phone widened by 293px.
Why it matters
This is a product-quality dashboard catch: the page was alive and authenticated, but the business meaning of a failed browser job was wrong.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The authenticated Riddle site dashboard rendered a mocked failed job row as Queued, hiding the failed state from the recent-jobs table.

Why normal checks missed it: The dashboard route loaded, balance data rendered, API keys rendered, and auth storage was accepted. The issue only showed up when the proof asserted the exact status semantics of a non-happy job row.

Proof lesson: Authenticated dashboards need profile checks for negative and in-flight states, not only route health and happy-path data.

Evidence: The failing production run job_6711719e showed job_rp317_failed on the dashboard while Failed was absent in every viewport; the same run also caught that the page widened by 293px on the phone viewport.
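
A status-semantics check of this kind might look like the sketch below. The mapping and the `check_status_rows` helper are hypothetical; only the `Failed` and `Queued` labels and the `job_rp317_failed` row come from the entry, and the `running` label is an assumed in-flight state.

```python
# Assumed mapping from mocked API status to the label the dashboard should show.
EXPECTED_LABELS = {
    "failed": "Failed",
    "queued": "Queued",
    "running": "Running",  # assumption: not named in the entry
}


def check_status_rows(mocked_jobs: dict, rendered_rows: dict) -> list:
    """Compare each mocked job status against the label rendered for that row.

    mocked_jobs: {job_id: api_status}; rendered_rows: {job_id: label shown}.
    """
    failures = []
    for job_id, api_status in mocked_jobs.items():
        expected = EXPECTED_LABELS[api_status]
        shown = rendered_rows.get(job_id)
        if shown != expected:
            failures.append(
                f"{job_id}: expected {expected!r}, dashboard showed {shown!r}"
            )
    return failures


failures = check_status_rows(
    {"job_rp317_failed": "failed"},
    {"job_rp317_failed": "Queued"},
)
```

Route health would be green in this scenario; the only failing assertion is the exact label on the non-happy row.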

Authenticated nav overflowed on billing Riddle artifact
May 15, 2026 · < $0.01 · Riddle site · billing · responsive
Catch card

Authenticated nav overflowed on billing

Workflow proof should keep app-shell layout assertions active after auth setup, because the shell can break even when the page-level task succeeds.

What Riddle caught
The failing production run job_4eb1e278 reached the successful billing retry state but measured that the desktop nav overflowed by 16px.
Why it matters
This shows why browser proof should stay on after the business flow succeeds: the workflow was green, but the authenticated product shell was visibly broken.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The authenticated Riddle site billing page desktop nav overflowed after the signed-in email and Sign Out control were present.

Why normal checks missed it: The billing route reached the final promo-code success state, all mocked billing calls behaved correctly, and the workflow looked usable. The regression was in the authenticated shell around the workflow.

Proof lesson: Workflow proof should keep app-shell layout assertions active after auth setup, because the shell can break even when the page-level task succeeds.

Evidence: The failing production run job_4eb1e278 reached the successful billing retry state but measured that the desktop nav overflowed by 16px.

A malformed login token opened the builder Riddle artifact
May 14, 2026 · < $0.01 · auth · trust boundary · request proof
Catch card

A malformed login token opened the builder

Auth proof should assert both sides of the boundary: the login surface remains visible after malformed identity responses, and privileged UI stays absent.

What Riddle caught
The failing browser run shows the malformed login opening the builder while request-body assertions prove the mocked Cognito login path was the one exercised.
Why it matters
This is the kind of auth-boundary bug that looks fine in the browser until the proof asks whether the privileged UI opened from a valid token or just a friendly HTTP shape.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The builder treated a successful Cognito response with an empty AuthenticationResult as a real authenticated session and mounted the builder UI without a usable token.

Why normal checks missed it: The HTTP status was 200 and the screen changed to the authenticated builder. A smoke test that only checks the happy path would never inspect whether a valid token was actually present.

Proof lesson: Auth proof should assert both sides of the boundary: the login surface remains visible after malformed identity responses, and privileged UI stays absent.

Evidence: The failing browser run shows the malformed login opening the builder while request-body assertions prove the mocked Cognito login path was the one exercised.
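
Both sides of that boundary can be sketched in a few lines. The `AuthenticationResult`, `AccessToken`, and `IdToken` field names follow the Cognito `InitiateAuth` response shape; which tokens the builder actually requires is an assumption, and both helper functions are hypothetical.

```python
def has_usable_session(status_code: int, body: dict) -> bool:
    """True only when the response carries non-empty tokens, not merely HTTP 200."""
    if status_code != 200:
        return False
    result = body.get("AuthenticationResult") or {}
    # Assumption: the builder needs both of these tokens to be usable.
    required = ("AccessToken", "IdToken")
    return all(isinstance(result.get(k), str) and result.get(k) for k in required)


def check_boundary_after_malformed_login(login_visible: bool, builder_visible: bool) -> list:
    """Assert both sides: login surface stays, privileged UI stays absent."""
    failures = []
    if not login_visible:
        failures.append("login surface disappeared after malformed response")
    if builder_visible:
        failures.append("builder UI mounted without a usable token")
    return failures


malformed_ok = has_usable_session(200, {"AuthenticationResult": {}})
valid_ok = has_usable_session(
    200, {"AuthenticationResult": {"AccessToken": "a.b.c", "IdToken": "d.e.f"}}
)
observed = check_boundary_after_malformed_login(login_visible=False, builder_visible=True)
```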

Logout worked, until the delayed build came back Riddle artifact
May 14, 2026 · < $0.01 · async · session isolation · network delay
Catch card

Logout worked, until the delayed build came back

Async session proof needs controlled network delays, logout/relogin actions, and final absence checks for stale previews and save controls.

What Riddle caught
The failing run used a delayed build mock, then proved the fresh session still showed Open in new tab, Save to Arcade, and a preview iframe that should have been gone.
Why it matters
The browser proof makes race conditions reproducible: not by reading code, but by controlling timing and checking what the user sees after the stale response lands.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: A delayed build response was allowed to apply preview and save state after the user logged out and returned to a fresh builder session.

Why normal checks missed it: A normal logout check can pass if no in-flight request completes late. The bug only appears when the browser keeps the delayed network response alive across a session reset.

Proof lesson: Async session proof needs controlled network delays, logout/relogin actions, and final absence checks for stale previews and save controls.

Evidence: The failing run used a delayed build mock, then proved the fresh session still showed Open in new tab, Save to Arcade, and a preview iframe that should have been gone.
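
The race can be reproduced in miniature with controlled timing. The sketch below uses a session epoch counter as one possible guard; it is not the app's actual fix, and `BuilderSession` and its methods are hypothetical names. The final absence check is the `preview is None` assertion.

```python
import asyncio


class BuilderSession:
    """Toy builder state with an epoch that changes on every session reset."""

    def __init__(self):
        self.epoch = 0
        self.preview = None

    def reset(self):
        """Logout and relogin: bump the epoch so in-flight responses go stale."""
        self.epoch += 1
        self.preview = None

    async def build(self, delay: float, html: str):
        started_in = self.epoch
        await asyncio.sleep(delay)  # stands in for the delayed build mock
        if started_in != self.epoch:
            return  # response belongs to a previous session; drop it
        self.preview = html


async def scenario() -> BuilderSession:
    session = BuilderSession()
    task = asyncio.create_task(session.build(0.05, "<html>old</html>"))
    await asyncio.sleep(0.01)  # let the build start before logging out
    session.reset()
    await task  # the stale response lands after the reset
    return session


final = asyncio.run(scenario())
```

Removing the epoch comparison makes `final.preview` hold the old HTML, which mirrors the stale preview and save controls the failing run observed.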

Canceling save still leaked the draft Riddle artifact
May 14, 2026 · < $0.01 · request body · form state · builder
Catch card

Canceling save still leaked the draft

For builder flows, screenshot proof should be paired with request-body assertions so hidden stale form state cannot slip through.

What Riddle caught
The browser run reached the clean player route, but the captured save request still contained the canceled emoji and stale description.
Why it matters
This is the perfect “visually green, semantically wrong” catch: the browser reached the right page, but the network receipt proved the app submitted stale user data.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: After canceling a save form, stale optional name, emoji, and description fields leaked into a later save request even though the final player looked correct.

Why normal checks missed it: The route, iframe, layout, and console checks were all green. Only the captured /api/save request body showed the canceled draft values were still being submitted.

Proof lesson: For builder flows, screenshot proof should be paired with request-body assertions so hidden stale form state cannot slip through.

Evidence: The browser run reached the clean player route, but the captured save request still contained the canceled emoji and stale description.
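
A request-body assertion for this case could be as small as the following sketch. The `name`, `emoji`, and `description` fields come from the entry; the payload shape, sample values, and `leaked_draft_fields` helper are assumptions for illustration.

```python
import json


def leaked_draft_fields(request_body: str, canceled_draft: dict) -> dict:
    """Return canceled draft values that still appear in the captured save body."""
    payload = json.loads(request_body)
    return {
        key: value
        for key, value in canceled_draft.items()
        if value and payload.get(key) == value
    }


# Values the user typed and then canceled (hypothetical sample data).
canceled = {"name": "Old name", "emoji": "\U0001F3B2", "description": "stale text"}

# Body captured from the later save request (hypothetical shape).
captured = json.dumps(
    {"html": "<html></html>", "emoji": "\U0001F3B2", "description": "stale text"}
)

leaks = leaked_draft_fields(captured, canceled)
```

An empty result means the later save carried none of the canceled values; any key in the result names a field where stale form state slipped through.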

A rainbow flag was saved as a broken emoji Riddle artifact
May 14, 2026 · < $0.01 · unicode · request body · builder
Catch card

A rainbow flag was saved as a broken emoji

Browser proof can catch Unicode/data-boundary bugs by asserting exact request-body content, not just rendered page state.

What Riddle caught
The failing run captured save request bodies in all four viewports and proved they did not contain the full 🏳️‍🌈 value.
Why it matters
This is a crisp example of why proof receipts should include network payload evidence: the page looked fine, but the saved data was corrupt.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The builder emoji input truncated a valid compound emoji, saving the rainbow flag as the broken partial sequence 🏳️‍.

Why normal checks missed it: The builder preview, saved state, iframe, layout, and console checks all looked fine. The failure was inside the serialized save payload.

Proof lesson: Browser proof can catch Unicode/data-boundary bugs by asserting exact request-body content, not just rendered page state.

Evidence: The failing run captured save request bodies in all four viewports and proved they did not contain the full 🏳️‍🌈 value.
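
The rainbow flag is a four-code-point sequence, which is why an exact-content assertion matters. The sketch below shows the sequence and an equality check on a captured body; the `emoji` payload key and the helper name are assumptions, and the slice is only a way to produce the broken sequence, since the entry does not state how the truncation happened.

```python
import json

# White flag + variation selector-16 + zero-width joiner + rainbow.
RAINBOW_FLAG = "\U0001F3F3\uFE0F\u200D\U0001F308"


def body_has_exact_emoji(request_body: str, expected: str) -> bool:
    """Exact equality on the decoded value, not a prefix or 'looks rendered' check."""
    payload = json.loads(request_body)
    return payload.get("emoji") == expected


code_points = len(RAINBOW_FLAG)
utf16_units = len(RAINBOW_FLAG.encode("utf-16-le")) // 2

# Dropping the final code point leaves flag + selector + joiner.
truncated = RAINBOW_FLAG[:3]

good_body = json.dumps({"emoji": RAINBOW_FLAG})
bad_body = json.dumps({"emoji": truncated})

good_ok = body_has_exact_emoji(good_body, RAINBOW_FLAG)
bad_ok = body_has_exact_emoji(bad_body, RAINBOW_FLAG)
```

The truncated value is a strict prefix of the full one, so a check that only asks whether the body starts with a flag would pass; equality on the whole sequence is the safer assertion.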

The player ignored its own layout metadata Riddle artifact
May 14, 2026 · < $0.01 · layout · iframe · metadata
Catch card

The player ignored its own layout metadata

Layout proof should inspect embedded frame dimensions and metadata-driven rendering, not just route success or document scroll width.

What Riddle caught
The failing phone screenshot and proof receipt show a playable saved game with a too-wide iframe that escaped page-level overflow checks.
Why it matters
The route was technically healthy, but the user experience was broken.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: A saved game with safe wide-layout metadata rendered as a normal unscaled iframe, overflowing by 434px on phone and 56px on iPad Mini.

Why normal checks missed it: The route loaded, the iframe existed, frame text was visible, page-level overflow was 0px, and there were no console errors. Only iframe overflow checks exposed the user-visible layout break.

Proof lesson: Layout proof should inspect embedded frame dimensions and metadata-driven rendering, not just route success or document scroll width.

Evidence: The failing phone screenshot and proof receipt show a playable saved game with a too-wide iframe that escaped page-level overflow checks.
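
The gap between page-level and frame-level overflow can be shown with plain arithmetic. In this sketch the 434px figure comes from the entry, while the 390px phone viewport width, the 824px iframe width, and the helper names are assumptions chosen to reproduce it.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    """Horizontal bounds of an element, in CSS pixels."""
    left: float
    width: float

    @property
    def right(self) -> float:
        return self.left + self.width


def page_overflow_px(scroll_width: float, viewport_width: float) -> float:
    """Document-level check: how far the page scrolls past the viewport."""
    return max(0.0, scroll_width - viewport_width)


def frame_overflow_px(frame: Rect, viewport_width: float) -> float:
    """Element-level check: how far the iframe's right edge passes the viewport."""
    return max(0.0, frame.right - viewport_width)


PHONE_WIDTH = 390  # assumed phone viewport width
iframe = Rect(left=0, width=824)  # assumed unscaled wide-layout frame

# A clipping ancestor can keep the document scroll width equal to the viewport.
page_overflow = page_overflow_px(scroll_width=390, viewport_width=PHONE_WIDTH)
frame_overflow = frame_overflow_px(iframe, PHONE_WIDTH)
```

Here the page-level number is 0 while the frame-level number is 434, so only the second check reports the break.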

A manifest row rendered a broken saved game Riddle artifact
May 14, 2026 · < $0.01 · resource integrity · iframe · fallback
Catch card

A manifest row rendered a broken saved game

Saved-resource proof should distinguish “listed in manifest” from “actually playable,” and should assert friendly no-iframe fallback states for missing resources.

What Riddle caught
The failing run showed Game not found was absent, an iframe was present, and the browser emitted resource failures for the unavailable saved HTML.
Why it matters
This is a real product-integrity check: the page can look routable while the underlying artifact is gone.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The player trusted a saved-game manifest row enough to render an iframe even when the saved HTML resource was unavailable.

Why normal checks missed it: The manifest route existed and the app shell rendered. Without probing the saved resource and asserting iframe absence, the broken player looked like a normal loading edge case.

Proof lesson: Saved-resource proof should distinguish “listed in manifest” from “actually playable,” and should assert friendly no-iframe fallback states for missing resources.

Evidence: The failing run showed Game not found was absent, an iframe was present, and the browser emitted resource failures for the unavailable saved HTML.
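
The distinction between listed and playable can be written as an expected-view rule plus a comparison against what the browser observed. The helper names, the observation keys, and the 404 status are assumptions; the entry only says the saved HTML was unavailable and that `Game not found` was absent.

```python
from typing import Optional


def expected_player_view(in_manifest: bool, resource_status: Optional[int]) -> dict:
    """A game is playable only if it is listed AND its saved HTML actually loads."""
    playable = in_manifest and resource_status == 200
    return {"iframe": playable, "game_not_found": not playable}


def check_player(observed: dict, in_manifest: bool, resource_status: Optional[int]) -> list:
    """List every observed element that disagrees with the expected view."""
    expected = expected_player_view(in_manifest, resource_status)
    return [
        f"{key}: expected {want}, saw {observed.get(key)}"
        for key, want in expected.items()
        if observed.get(key) != want
    ]


# What the failing run saw: an iframe and no friendly fallback.
observed = {"iframe": True, "game_not_found": False}
failures = check_player(observed, in_manifest=True, resource_status=404)
```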

The game worked, but the iframe was clipped Riddle artifact
May 13, 2026 · pennies · iframe · responsive · black-box
Catch card

The game worked, but the iframe was clipped

Element bounds and screenshots catch user-visible clipping that scalar scroll-width checks miss.

What Riddle caught
A phone viewport screenshot from the browser run shows the embedded game still active, but visibly cropped inside its frame.
Why it matters
This is the kind of issue a product team can miss when automated checks only ask “did the route load?” Riddle Proof turns the browser screenshot into a reviewable receipt.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: Hot Path completed its two-player interaction in every viewport, but the embedded game surface was visibly clipped on phone and tablet.

Why normal checks missed it: The saved game used overflow hidden, so document scroll width stayed clean. A simple page overflow check would have passed.

Proof lesson: Element bounds and screenshots catch user-visible clipping that scalar scroll-width checks miss.

Evidence: A phone viewport screenshot from the browser run shows the embedded game still active, but visibly cropped inside its frame.

The link worked in production, but escaped mounted preview Riddle artifact
May 13, 2026 · < $0.01 · preview · routing · PR evidence
Catch card

The link worked in production, but escaped mounted preview

PR proof should exercise the artifact reviewers actually open: the preview URL, not only production-shaped assumptions.

What Riddle caught
The browser run opened the saved player flow through a mounted preview URL, which is the same shape a reviewer would click from a PR.
Why it matters
Preview-only route bugs are expensive because reviewers click previews, not production.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: The builder saved-game link was rendered as an absolute /play URL. Production accepted it, but mounted Riddle Preview needed the /s/{preview_id} basename preserved.

Why normal checks missed it: A production-only check would never see the mounted preview basename. A code review could miss the subtle difference between anchor hrefs and router links.

Proof lesson: PR proof should exercise the artifact reviewers actually open: the preview URL, not only production-shaped assumptions.

Evidence: The browser run opened the saved player flow through a mounted preview URL, which is the same shape a reviewer would click from a PR.
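
The difference between a basename-aware link and an absolute href is easy to see in a small sketch. The `/play` path and the `/s/{preview_id}` shape come from the entry; `abc123` is a made-up preview id and both helpers are hypothetical.

```python
def resolve_link(basename: str, path: str) -> str:
    """Join the router basename and an app path, as a router link would."""
    return basename.rstrip("/") + "/" + path.lstrip("/")


def escapes_preview(href: str, basename: str) -> bool:
    """True when a rendered href leaves the mounted preview's basename."""
    if not basename:
        return False  # production has no basename to escape
    return not href.startswith(basename.rstrip("/") + "/")


PREVIEW_BASENAME = "/s/abc123"  # hypothetical preview id

production_href = resolve_link("", "/play")
preview_href = resolve_link(PREVIEW_BASENAME, "/play")

absolute_escapes = escapes_preview("/play", PREVIEW_BASENAME)
router_escapes = escapes_preview(preview_href, PREVIEW_BASENAME)
```

The same `/play` href is correct in production and wrong under the preview mount, which is why a production-only check cannot see this bug.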

A fixed nav made full-screen routes one nav-height too tall Riddle artifact
May 13, 2026 · ~$0.01 · layout · route inventory · responsive
Catch card

A fixed nav made full-screen routes one nav-height too tall

A generic app-shell profile can find repeated layout classes: fixed nav offset, route root height, scroll policy, and top offenders.

What Riddle caught
A desktop screenshot from the route inventory run captures the game route inside the app shell while the measured route-root bounds reveal the repeated nav-height overflow pattern.
Why it matters
One proof profile can find a whole class of layout bugs across a site.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: Multiple older full-screen routes were exactly one fixed navigation bar too tall on desktop/tablet/phone.

Why normal checks missed it: Each route still loaded and looked mostly functional. The bug only became obvious when route-root bounds were measured across a viewport matrix.

Proof lesson: A generic app-shell profile can find repeated layout classes: fixed nav offset, route root height, scroll policy, and top offenders.

Evidence: A desktop screenshot from the route inventory run captures the game route inside the app shell while the measured route-root bounds reveal the repeated nav-height overflow pattern.
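
Measured across a route and viewport matrix, the pattern shows up as an overflow equal to the nav height. The sketch below assumes each measurement records the route root's bottom edge and the viewport height; the routes, the pixel values, and the 64px nav height are invented for illustration.

```python
def route_height_offenders(measurements: list, nav_height: float) -> list:
    """Return routes whose root extends below the viewport, worst first."""
    offenders = []
    for m in measurements:
        overflow = m["root_bottom"] - m["viewport_height"]
        if overflow > 0:
            offenders.append(
                {
                    "route": m["route"],
                    "viewport": m["viewport"],
                    "overflow_px": overflow,
                    # A full-height root pushed down by a fixed nav overflows
                    # by exactly the nav height.
                    "matches_nav_height": overflow == nav_height,
                }
            )
    return sorted(offenders, key=lambda o: o["overflow_px"], reverse=True)


NAV_HEIGHT = 64  # assumed fixed nav height
measurements = [
    {"route": "/games/a", "viewport": "desktop", "viewport_height": 800, "root_bottom": 864},
    {"route": "/games/a", "viewport": "phone", "viewport_height": 844, "root_bottom": 908},
    {"route": "/games/b", "viewport": "desktop", "viewport_height": 800, "root_bottom": 800},
]

offenders = route_height_offenders(measurements, NAV_HEIGHT)
```

When every offender has `matches_nav_height` set, the finding reads as one repeated layout class instead of a list of unrelated route bugs.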

A green semantic state still hid the win result Riddle artifact
May 13, 2026 · pennies · visual evidence · canvas · quality
Catch card

A green semantic state still hid the win result

Screenshots are not just decoration. They catch places where machine-readable state is stronger than the actual user experience.

What Riddle caught
The after-continue screenshot shows the game state after escape, making it possible to judge whether the win/result was actually visible to a player.
Why it matters
A test can be technically green while the customer experience is still unclear.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: Gem Mine reached the escaped state semantically, but the user-facing terminal panel did not clearly show ESCAPED! until the screenshot review caught it.

Why normal checks missed it: State assertions proved the game outcome. They did not prove that the outcome was visible and understandable to a player.

Proof lesson: Screenshots are not just decoration. They catch places where machine-readable state is stronger than the actual user experience.

Evidence: The after-continue screenshot shows the game state after escape, making it possible to judge whether the win/result was actually visible to a player.

Restart-only texture errors after gameplay looked fine Riddle artifact
May 13, 2026 · < $0.03 · console errors · game lifecycle · restart
Catch card

Restart-only texture errors after gameplay looked fine

Terminal/recovery proof finds defects that only appear after users finish, restart, replay, or revisit a route.

What Riddle caught
The restart screenshot captures the post-restart state; the same browser run also recorded the duplicate Phaser texture-key errors that did not appear on first load.
Why it matters
Recovery paths and repeat use are where many browser bugs hide.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: Classic Slalom passed its behavior checks, but repeated scene restart generated 180 duplicate Phaser texture-key console errors.

Why normal checks missed it: A first-load smoke test would stop before restart. The bug lived in the lifecycle, not the happy-path load.

Proof lesson: Terminal/recovery proof finds defects that only appear after users finish, restart, replay, or revisit a route.

Evidence: The restart screenshot captures the post-restart state; the same browser run also recorded the duplicate Phaser texture-key errors that did not appear on first load.
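
A lifecycle-aware console check can be sketched by tagging each console event with the phase in which it was recorded. The event tuple shape, the phase names, and the message text are assumptions; the count of 180 errors comes from the entry.

```python
from collections import Counter


def errors_by_phase(console_events: list) -> Counter:
    """Count console errors per lifecycle phase.

    console_events: list of (phase, level, message) tuples.
    """
    counts = Counter()
    for phase, level, _message in console_events:
        if level == "error":
            counts[phase] += 1
    return counts


# Illustrative event log: a clean first load, then errors only after restart.
events = [("first_load", "log", "scene booted")]
events += [("restart", "error", "duplicate texture key")] * 180

counts = errors_by_phase(events)
restart_only = counts["restart"] > 0 and counts["first_load"] == 0
```

A check that stops after `first_load` sees zero errors in this log, which is why the restart phase has to be part of the proof.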

The homepage rendered games, but hid community games Riddle artifact
May 13, 2026 · < $0.01 · manifest drift · route inventory · integration
Catch card

The homepage rendered games, but hid community games

Route inventory should prove both direct route health and source-page clickthrough/discovery health.

What Riddle caught
The homepage screenshot shows the discovered community-games area after the manifest fix, tying the bug to route discovery rather than the direct player route.
Why it matters
A site can have healthy destination pages while the conversion/discovery path is broken.
Does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical ledger

Bug: Saved community games loaded directly by URL, but the homepage did not list them because the manifest schema had drifted.

Why normal checks missed it: The player route was healthy and built-in routes were healthy. The bug was in discovery: the source page failed to expose a valid route.

Proof lesson: Route inventory should prove both direct route health and source-page clickthrough/discovery health.

Evidence: The homepage screenshot shows the discovered community-games area after the manifest fix, tying the bug to route discovery rather than the direct player route.
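
Discovery health can be expressed as a set difference between routes that load directly and routes the source page actually links to. The route paths and the helper name below are hypothetical.

```python
def discovery_gaps(healthy_routes: set, homepage_links: set) -> list:
    """Routes that load by direct URL but are not reachable from the homepage."""
    return sorted(healthy_routes - homepage_links)


# Routes that passed a direct-load check (hypothetical paths).
healthy_routes = {"/games/built-in-1", "/play/community-1"}

# Links the homepage rendered before the manifest fix (hypothetical).
homepage_links = {"/games/built-in-1"}

gaps = discovery_gaps(healthy_routes, homepage_links)
```

Both inputs look fine on their own; the failure only appears when the two inventories are compared.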

The pattern

These are not “AI vibes.” Each catch has a browser run behind it: a URL, a viewport, a user-like action sequence, screenshots, console/page evidence, and a structured result.

  • Black-box enough to find bugs on arbitrary real pages.
  • Specific enough to become repeatable project profiles.
  • Cheap enough to run while agents work, not only before a major release.

Why it matters

Every useful catch becomes a short evidence story: what the browser saw, why normal checks missed it, and how that pattern can become a reusable proof profile.