Evidence Manifest

Not loose screenshots. Reviewable proof receipts.

Each catch below keeps two layers visible: a plain-English catch card for readers, and a technical manifest for agents, engineers, and future proof profiles.

87 catch manifests with a reusable proof pattern.
3+ artifact links per catch: screenshot, run receipt, and console capture.
0 source: Designed to make browser-visible proof understandable without code access.
Profiles: Each manifest points toward a repeatable audit profile.

Table of contents

also the sales pitch
Catch 1

Riddle stopped Neon proof receipts from drifting into app glue.

The app had a reusable proof-pack helper available, but still carried local coverage summary code. The fix moved receipt formatting back to the pack and added a faster verified lane for local and live checks.

Catch 2

Riddle stopped an over-aggressive bass cut.

The user asked to turn the bass down "a little," but the proof loop initially ranked the biggest safe cut. Riddle caught that the candidate moved the right track in the right direction, but too far for the request.

Catch 3

Neon made approval surrogate evidence-backed

Approval is part of the proof surface.

Catch 4

Neon made audio heuristics browser-safe instead of copy-pasted

Reusable proof packs need browser-safe subpaths when app contracts consume them in the runtime bundle.

Catch 5

Neon promoted a guitar-down candidate into a deployed override

For creative agent work, the ratchet needs a promotion loop, not just a recommendation loop.

Catch 6

Neon stopped hiding tiny guitar energy deltas

Review-packet formatting is part of the proof surface.

Catch 7

Neon turned a natural mix request into proof-backed candidates

A creative proof loop gets more useful when the requested claim is part of the run contract.

Catch 8

Neon refused to promote an off-target mix candidate

Claim-candidate loops need target and direction receipts, not just mix-health receipts.

Catch 9

Neon section-energy guard rejected a disappearing chord cut

Audio proof should make candidate rejection deterministic without pretending to automate taste.

Catch 10

Neon profile sync blessed a stale current target

Current-target profiles for app-owned durable state should be generated from the app state they claim to audit.

Catch 11

Neon candidate loop tested the wrong audio path

Claim-candidate loops must use the same source-preparation and state-isolation preconditions as the guardrail proofs they depend on.

Catch 12

Neon current-target proof hid its own nested receipts

Passing proof is not enough if the handoff cannot find the proof.

Catch 13

Neon review packets hid candidate evidence in raw JSON

Human-review packets are proof artifacts, not summaries after the fact.

Catch 14

Neon batch confused patch-plan identity with current target proof

A proof batch needs to distinguish planned durable edits from already-active current-target evidence.

Catch 15

Neon durable mix proof could not prove profile-source agreement

Durable creative edits need a current-target proof that checks both runtime state and source/profile descriptors.

Catch 16

Neon deep exploration found proof-window overclaim and hot presets

One-piece-at-a-time ratchets are useful while shaping the contract, but once a round is clean the efficient move is a deeper local sweep before deploy.

Catch 17

Neon playback proof could pass without proving playback

Interaction proofs need action-specific receipts.

Catch 18

Neon app profiles drifted from the reusable proof pack

Reusable proof packs need a synchronization gate in the target app.

Catch 19

Neon mix candidates needed a durable source handoff

A creative proof loop should separate three things: objective receipts, human or surrogate approval, and durable application.

Catch 20

Neon Step Sequencer had hidden clipping in built-in mixes

Audio proof should separate objective guardrails from taste.

Catch 21

Ski Adventure touch input landed half a player width off

Input proofs should check geometry, not only movement.

Catch 22

Coin Clicker dashboard milestone ETA used wrong source of truth

Dashboards need source-of-truth proofs, not only visible-label proofs.

Catch 23

Dashboard retry copy had no retry button

Recovery copy should include a local recovery action when the user can retry without leaving the page.

Catch 24

Dashboard recent jobs failure looked like no jobs

List-load failures are not empty states.

Catch 25

Playground screenshot hid secondary terminal evidence

No screenshots does not mean no proof evidence.

Catch 26

Playground screenshot leaked malformed success body

Handled action recovery needs content proof, not only an error box.

Catch 27

Playground timeout hid partial evidence

Terminal timeout receipts need the same artifact-honesty contract as terminal errors.

Catch 28

Playground sync terminal error looked successful

Artifact preservation and result honesty are separate contracts.

Catch 29

Billing history failure looked like no transactions

List-load recovery profiles should prove that a failed optional list is not rendered as an empty list.

Catch 30

Dashboard API-key transport failure logged fatal

Transport-failure profiles should distinguish expected browser resource noise from application-level fatal logging, then prove the visible recovery state and browser health together.

Catch 31

Playground Batch discarded secondary error artifacts

Failure receipts should preserve all useful evidence, not only screenshots.

Catch 32

Docs code copy claimed success after clipboard denial

Clipboard-copy controls need feedback honesty under browser permission restrictions.

Catch 33

Redeem promo code leaked malformed success body

Shared backend contracts need per-surface proof.

Catch 34

Billing promo code leaked malformed success body

Handled action recovery is a text contract, not just the presence of an error box.

Catch 35

Dashboard API key create logged handled parser failure

Handled action failures need browser-health proof even when the visible fallback is correct.

Catch 36

Playground optional artifacts leaked browser warnings

Optional evidence failures should degrade silently.

Catch 37

Playground partial results were screenshot-biased

Evidence products must avoid screenshot bias.

Catch 38

Dashboard API key modal copy crashed on clipboard denial

Credential controls need browser-permission-aware interaction proof.

Catch 39

Dashboard MCP token copy crashed on clipboard denial

Security-sensitive controls need interaction proof, not just visual proof.

Catch 40

Docs Markdown leaked code entities

Agent-readable docs need their own proof surface.

Catch 41

Serverless page taught stale screenshot polling

API education pages need contract proof, not just route proof.

Catch 42

Homepage taught stale screenshot JSON

Homepage examples are integration docs.

Catch 43

Preview guide taught stale URL shape

Preview docs are part of the deployment surface.

Catch 44

Playground async Workflow ignored artifact URLs

Agent-facing evidence UIs need artifact-contract checks per mode.

Catch 45

Playground async Script ignored HAR artifacts

Artifact-contract proof needs to check every evidence family, not just the first screenshot or console log.

Catch 46

Playground hid a single partial screenshot label

A proof surface is not done when it merely stores evidence.

Catch 47

OpenClaw Moltbook article was referenced but unpublished

Public proof stories need route, markdown, sitemap, and placeholder checks together.

Catch 48

Good Catch Diary preloaded noisy route assets

Public proof-storytelling pages should be quiet enough to inspect.

Catch 49

Profile Warnings docs lagged behind the shipped surface

The proof product needs proof for its own proof-authoring contract.

Catch 50

Builder accepted a saved preview path as a fresh build

Preview URL safety is contextual.

Catch 51

Evidence Manifest preloaded noisy unused assets

Warning hygiene deserves its own contract.

Catch 52

Profile Mode docs lagged behind proof primitives

The proof surface itself needs proof.

Catch 53

llms.txt hid the raw proof bundle

Agent indexes should point to raw receipts, not only review pages.

Catch 54

Proof example bundle drifted behind the agent-proof contract

Proof examples are product surfaces too.

Catch 55

Agent Guide omitted the proof loop

Agent-facing docs should connect low-level browser control to the reusable proof loop.

Catch 56

Riddle had no llms.txt agent index

Agent-facing product surfaces need an index, not just scattered docs.

Catch 57

Sitemap hid public Riddle routes from crawlers

Agent-facing contracts include sitemap and discovery surfaces.

Catch 58

Robots blocked agent markdown docs

Agent-facing docs need both availability and crawlability.

Catch 59

Builder saved link said home page

This is a semantic UI contract catch: if a link opens a saved game, the visible action should say Play saved game.

Catch 60

Playground Batch curl hid async mode

This is a generated-command contract catch: copy buttons and examples should preserve the same request semantics as the real UI action.

Catch 61

Playground async results hid the job receipt

This is a receipt-traceability catch: artifact UIs should show the durable job id whenever they show async results.

Catch 62

Billing Stripe hydration failed invisibly

This is a screenshot-is-not-enough catch: a proof profile should pair visible business-state assertions with fatal/page-error evidence.

Catch 63

Playground Script failed jobs looked neutral

Async artifact UIs should treat every terminal failure status as a first-class visible state and preserve the service error message plus partial artifact evidence.

Catch 64

Dashboard terminal jobs leaked raw service statuses

Account-state audits should verify service-contract translation, not just row presence.

Catch 65

Playground Script assumed artifacts_url

Async artifact UIs should treat job_id as the stable contract and artifacts_url as an optional convenience.

Catch 66

Playground timeout hid the artifact reason

Artifact UIs should preserve failure reasons, not just thumbnails.

Catch 67

Dashboard balance failure looked like zero credits

Dashboard proof should isolate partial backend failures: one widget can fail while the rest of the page stays healthy, and the user still needs the real reason.

Catch 68

Auto-recharge disable hid the backend error

Settings rollback proof should verify both state integrity and message integrity.

Catch 69

Playground hid structured workflow errors

Interactive API tools need fallback profiles for realistic structured errors, not just happy-path runs or generic error-element checks.

Catch 70

Payment-method setup hid the backend error

Fallback profiles should assert the exact human message from structured backend errors, not just that some error element appears.

Catch 71

Handled API-key revoke failure still logged as fatal

Negative-path proof should keep console/page health in scope after the visible UI looks right, because handled failures can still poison the browser evidence stream.

Catch 72

A structured API-key error crashed the dashboard

Dashboard and settings profiles should include structured failure payloads, not only string errors, and should prove that existing data remains visible after failed writes.

Catch 73

Auto-recharge stayed on after a failed save

Settings proof should verify rollback state after rejected saves, not only that an error message appears.

Catch 74

A failed dashboard job looked queued

Authenticated dashboards need profile checks for negative and in-flight states, not only route health and happy-path data.

Catch 75

Authenticated nav overflowed on billing

Workflow proof should keep app-shell layout assertions active after auth setup, because the shell can break even when the page-level task succeeds.

Catch 76

A malformed login token opened the builder

Auth proof should assert both sides of the boundary: the login surface remains visible after malformed identity responses, and privileged UI stays absent.

Catch 77

Logout worked, until the delayed build came back

Async session proof needs controlled network delays, logout/relogin actions, and final absence checks for stale previews and save controls.

Catch 78

Canceling save still leaked the draft

For builder flows, screenshot proof should be paired with request-body assertions so hidden stale form state cannot slip through.

Catch 79

A rainbow flag was saved as a broken emoji

Browser proof can catch Unicode/data-boundary bugs by asserting exact request-body content, not just rendered page state.

Catch 80

The player ignored its own layout metadata

Layout proof should inspect embedded frame dimensions and metadata-driven rendering, not just route success or document scroll width.

Catch 81

A manifest row rendered a broken saved game

Saved-resource proof should distinguish “listed in manifest” from “actually playable,” and should assert friendly no-iframe fallback states for missing resources.

Catch 82

The game worked, but the iframe was clipped

Element bounds and screenshots catch user-visible clipping that scalar scroll-width checks miss.

Catch 83

The link worked in production, but escaped mounted preview

PR proof should exercise the artifact reviewers actually open: the preview URL, not only production-shaped assumptions.

Catch 84

A fixed nav made full-screen routes one nav-height too tall

A generic app-shell profile can find repeated layout classes: fixed nav offset, route root height, scroll policy, and top offenders.

Catch 85

A green semantic state still hid the win result

Screenshots are not just decoration.

Catch 86

Restart-only texture errors after gameplay looked fine

Terminal/recovery proof finds defects that only appear after users finish, restart, replay, or revisit a route.

Catch 87

The homepage rendered games, but hid community games

Route inventory should prove both direct route health and source-page clickthrough/discovery health.

Catch 1

Neon made coverage receipts reusable instead of app-local

May 25, 2026 | package + app + deploy | LilArcade | Riddle Proof | proof packs | audio proof
Plain-English catch card

Riddle stopped Neon proof receipts from drifting into app glue.

The app had a reusable proof-pack helper available, but still carried local coverage summary code. The fix moved receipt formatting back to the pack and added a faster verified lane for local and live checks.

What went wrong
The real app proof script duplicated coverage summary and Markdown formatting that belonged in the reusable proof pack.
What Riddle caught
The ratchet exposed a framework boundary problem: proof receipts were passing, but the evidence language could drift because it was still app-local.
Why it matters
Reusable proof packs only pay off when the real app consumes them; otherwise every future agent has to maintain app-specific proof vocabulary.
What changed
LilArcade now consumes @riddledc/riddle-proof-packs@0.8.0 for audio exploration coverage receipts and has a documented fast lane for Neon iteration.
What this does not prove
It does not prove the mix sounds better, and the bounded live sample does not replace a full promotion batch when broad coverage is needed.
Technical receipt
PR #531 removed 144 duplicated lines, PR #532 added test:neon and post-deploy-fast, Amplify jobs 709 and 710 succeeded, and the final live fast audit returned 0 findings.

Claim: Neon can consume reusable audio exploration coverage receipts from Riddle Proof packs and verify the deployed target with a fast bounded live audit.

Bug: The reusable audio exploration coverage helper existed in Riddle Proof packs, but LilArcade still carried its own coverage summarizer and Markdown formatter inside the app proof script. That meant the real target could drift from the reusable proof-pack receipt language even after the package shipped.

Why normal checks missed it: Nothing looked broken from a normal pass/fail perspective. The app built, the proof passed, and the deployed target was healthy. The issue was architectural: the ratchet was still using app-local proof glue where the reusable pack should own the evidence shape.

Why this sells Riddle Proof: This is the ratchet becoming operationally useful: reusable proof-pack receipts are now consumed by the live app, and production deploy is no longer the default way to discover basic proof-script or receipt-shape mistakes.

Reusable profile seed: For app proof labs: move reusable evidence summarization into a proof pack, keep app scripts as orchestration, add a fast local unit/profile lane, add a bounded live post-deploy smoke, and reserve full promotion batches for broad release confidence.

What the browser run checked

  • Added reusable audio exploration coverage summaries in @riddledc/riddle-proof-packs.
  • Published @riddledc/riddle-proof-packs@0.8.0 through trusted publishing.
  • Updated LilArcade to consume @riddledc/riddle-proof-packs@0.8.0.
  • Removed duplicated local audio coverage summary and Markdown formatting from scripts/neonDeepExplorationProof.mjs.
  • Ran node --test scripts/__tests__/neonDeepExplorationProof.test.mjs scripts/__tests__/neonRatchetBatch.test.mjs.
  • Ran npm run test:sequencer before merge of PR #531.
  • Ran npm run build before merge of PR #531.
  • Verified GitHub CI passed for LilArcade PR #531.
  • Verified Amplify job 709 succeeded for commit 6380c8976fcaeff2baad6e2177962c101e1962f1.
  • Added test:neon, proof:sequencer:deep-explore-fast, and proof:sequencer:post-deploy-fast.
  • Documented the Neon Ratchet Iteration Lane with local, promotion, and post-deploy gates.
  • Ran npm run test:neon.
  • Ran npm run proof:sequencer:post-deploy-fast.
  • Ran npm run build before merge of PR #532.
  • Verified GitHub CI passed for LilArcade PR #532.
  • Verified Amplify job 710 succeeded for commit 364ee2c4fef8151caeb9150413fc0027a5bc70c3.
  • Reran npm run proof:sequencer:post-deploy-fast from main after deployment.
  • Verified the deployed deep exploration sample returned 1 song, 1 part, 1 window, 0 findings, and restoration OK.
  • Verified the deployed current-target durable audit returned 2 overrides and 0 findings.

Proof lesson

A reusable proof pack is not truly reusable until the real app consumes it. The ratchet should make the fast local path cheap, keep deployment as a promotion gate, and use shared receipt language so future agents do not have to re-learn the same audio coverage vocabulary.
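As a rough illustration of what "shared receipt language" can mean, here is a minimal sketch of a coverage summarizer and Markdown formatter living in one reusable module. The function names, the sample shape, and the output layout are all assumptions made for this example; this is not the actual @riddledc/riddle-proof-packs API.

```javascript
// Hypothetical sketch, not the real proof-pack helper.
// Assumed sample shape: { song, part, window, findings: [...] }.
function summarizeCoverage(samples) {
  const songs = new Set();
  const parts = new Set();
  let findings = 0;
  for (const sample of samples) {
    songs.add(sample.song);
    // A part is only unique within its song, so key it by both.
    parts.add(`${sample.song}/${sample.part}`);
    findings += sample.findings.length;
  }
  return {
    songs: songs.size,
    parts: parts.size,
    windows: samples.length,
    findings,
  };
}

// One formatter owns the receipt wording, so apps only orchestrate.
function formatCoverageMarkdown(summary) {
  return [
    "Audio exploration coverage",
    `- Songs: ${summary.songs}`,
    `- Parts: ${summary.parts}`,
    `- Windows: ${summary.windows}`,
    `- Findings: ${summary.findings}`,
  ].join("\n");
}

// Usage: a bounded sample of one song, one part, one window.
const coverage = summarizeCoverage([
  { song: "demo-song", part: "bass", window: "0-4s", findings: [] },
]);
console.log(formatCoverageMarkdown(coverage));
```

With this split, an app script would call the shared functions and keep only its own orchestration, which is the boundary the catch describes.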

Artifact | Type | What it proves
Coverage fast-lane report | Markdown summary | Summarizes the package release, app consumption, fast-lane scripts, deploy jobs, and proof/taste boundary.
Coverage fast-lane report data | JSON metadata | Structured metadata for the package version, PRs, commits, Amplify jobs, local checks, and live proof outcome.
Post-deploy deep exploration summary | Markdown summary | Shows the live bounded coverage receipt generated by the reusable proof-pack formatter.
Post-deploy deep exploration data | JSON metadata | Structured live coverage result with sampled song/part/window counts, finding count, and restoration receipt.
Post-deploy current-target summary | Markdown summary | Shows the current-target durable override audit that ran after the fast coverage sample.
Post-deploy current-target data | JSON metadata | Structured live current-target result with override count, finding count, and deterministic audio guardrails.
Post-deploy fast screenshot | PNG screenshot | Visual evidence from the deployed Neon target during the fast deep-exploration proof.

Catch 2

Neon stopped "a little" from ranking the biggest cut

May 25, 2026 | production proof + deploy | LilArcade | Riddle Proof | claim translation | audio proof
Plain-English catch card

Riddle stopped an over-aggressive bass cut.

The user asked to turn the bass down "a little," but the proof loop initially ranked the biggest safe cut. Riddle caught that the candidate moved the right track in the right direction, but too far for the request.

What went wrong
The candidate ranking optimized for measurable movement while the claim translation did not yet include requested magnitude.
What Riddle caught
The larger bass cuts failed candidate_magnitude_matches_requested_intent once "a little" became an explicit proof constraint.
Why it matters
Agents can over-optimize small creative requests into heavy-handed edits unless the proof constrains request scope before ranking candidates.
What changed
The Neon ratchet now infers subtle magnitude language, applies a 0.12 max absolute level delta, and rejects oversized candidates as claim-translation mismatches.
What this does not prove
It does not prove bass -0.10 sounds better. It proves the candidate better matches the requested scope and preserved objective guardrails.
Technical receipt
bass -0.18 and bass -0.25 failed candidate_magnitude_matches_requested_intent; bass -0.10 and bass -0.05 passed.

Claim: A Neon claim-candidate packet should constrain target, direction, and magnitude for subtle natural-language mix requests before ranking metric-supported candidates for listening review.

Bug: The Neon ratchet loop could understand "turn the bass part down a little" as bass/down, but it had no objective receipt for the requested magnitude. The production packet therefore ranked the largest tested guardrail-preserving cut, bass -0.25, ahead of subtler candidates even though the user asked for "a little."

Why normal checks missed it: The run was not broken in the usual pass/fail sense. Fast mix health, mobile layout, playback sync, section-energy floors, clipping/headroom, low-level guardrails, and state restoration all passed. The problem only showed up when reading the claim translation: the packet proved target and direction, but not magnitude.

Why this sells Riddle Proof: This is the ratchet becoming more useful for natural agent instructions: it does not pretend to know taste, but it can now prove that a proposed edit matches the requested scope before handing candidates to a listener.

Reusable profile seed: For natural creative requests: parse target, direction, and requested magnitude into explicit claim constraints. Keep metric ranking as review order only, and reject candidates that are objectively safe but semantically too large for the request.

What the browser run checked

  • Ran a before-fix production prelim-candidate batch with the request "turn the bass part down a little."
  • Verified the before packet recommended bass -0.25, supported four candidates, rejected zero candidates, and had no magnitude receipt.
  • Added subtle natural-language magnitude inference to the Neon proof contract.
  • Added candidate_magnitude_matches_requested_intent to the claim-candidate receipts.
  • Classified oversized subtle-request candidates as claim_translation_mismatch.
  • Exposed requested magnitude, maxAbsDelta, magnitude source, candidate delta, and candidate abs delta in the human-review packet.
  • Ran node --test src/proof/__tests__/neonProofContract.test.mjs with 11 passing tests.
  • Ran npm run test:sequencer with 150 passing tests.
  • Ran npm run build successfully.
  • Merged LilArcade PR #527 and verified GitHub CI passed.
  • Verified Amplify job 705 succeeded across BUILD, DEPLOY, and VERIFY.
  • Reran the same production prelim-candidate batch against the live Amplify target.
  • Verified the final packet recommended bass -0.10, supported bass -0.10 and bass -0.05, and rejected bass -0.18 and bass -0.25 on candidate_magnitude_matches_requested_intent.

Proof lesson

Natural-language creative requests need claim constraints, not just metric ranking. For "a little," the proof should reject oversized candidates as claim-translation mismatches before review-order ranking can make the biggest movement look like the best next candidate.
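The magnitude constraint can be sketched as a filter that runs before any ranking. In this minimal example, only the receipt name candidate_magnitude_matches_requested_intent, the claim_translation_mismatch class, and the 0.12 limit come from the catch above; the function names, the phrase list, and the candidate shape are assumptions, and ordering by movement is just a stand-in for the real metric ranking.

```javascript
// Hypothetical sketch of a subtle-magnitude claim constraint.
const MAX_ABS_DELTA_BY_MAGNITUDE = { subtle: 0.12 };

// Assumed phrase list; the real inference rules are not shown in the source.
function inferMagnitude(request) {
  return /\b(a little|a bit|slightly)\b/i.test(request) ? "subtle" : null;
}

function reviewCandidates(request, candidates) {
  const magnitude = inferMagnitude(request);
  const maxAbsDelta = magnitude
    ? MAX_ABS_DELTA_BY_MAGNITUDE[magnitude]
    : Infinity;
  const supported = [];
  const rejected = [];
  for (const candidate of candidates) {
    const absDelta = Math.abs(candidate.delta);
    if (absDelta <= maxAbsDelta) {
      supported.push({ ...candidate, absDelta });
    } else {
      // Objectively safe, but too large for the requested scope.
      rejected.push({
        ...candidate,
        absDelta,
        failedReceipt: "candidate_magnitude_matches_requested_intent",
        classification: "claim_translation_mismatch",
      });
    }
  }
  // Review order only: applied after the constraint, never instead of it.
  supported.sort((a, b) => b.absDelta - a.absDelta);
  return { magnitude, maxAbsDelta, supported, rejected };
}

// Usage with the four bass candidates named in this catch.
const packet = reviewCandidates("turn the bass part down a little", [
  { track: "bass", delta: -0.25 },
  { track: "bass", delta: -0.18 },
  { track: "bass", delta: -0.1 },
  { track: "bass", delta: -0.05 },
]);
console.log(packet.supported.map((c) => c.delta)); // [ -0.1, -0.05 ]
console.log(packet.rejected.map((c) => c.delta)); // [ -0.25, -0.18 ]
```

Filtering first and ranking second is the design point: the biggest movement can only rank highest among candidates that already fit the request.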

Artifact | Type | What it proves
Subtle-intent magnitude report | Markdown summary | Summarizes the before/after production proof, PR #527 fix, validation, and proof/taste boundary.
Subtle-intent magnitude report data | JSON metadata | Structured metadata for the before recommendation, final recommendation, receipt, PR, deployment, and validation.
Before human-review packet | Markdown review packet | Shows the live packet before the fix recommending bass -0.25 with four supported candidates and no rejected candidates.
Before human-review data | JSON metadata | Structured before-fix packet data showing the missing magnitude constraint.
Before ratchet batch summary | Markdown summary | Records the before-fix production batch gates that passed while still producing the oversized recommendation.
Before generated profile | JSON profile | Shows the generated bass/down profile before magnitude became part of the claim target.
Final human-review packet | Markdown review packet | Shows the deployed packet after the fix: bass -0.10 recommended, two candidates supported, two oversized candidates rejected.
Final human-review data | JSON metadata | Structured final packet data with magnitude subtle, maxAbsDelta 0.12, and rejected candidate receipt evidence.
Final ratchet batch summary | Markdown summary | Records the production proof steps after deployment: fast mix health, mobile layout, playback sync, claim-candidate review, and human packet extraction.
Final ratchet batch data | JSON metadata | Structured final batch summary with recommendation, step timings, artifact index, and proof/taste boundary.
Final generated profile | JSON profile | Shows the generated production profile that carried the subtle intent into the claim-candidate loop.
Final ratchet screenshot | PNG screenshot | Visual evidence from the deployed Neon target used for the final subtle-intent proof.

Catch 3

Neon made approval surrogate evidence-backed

May 25, 2026 | package + app + production proof | LilArcade | Riddle Proof | approval surrogate | audio proof
Plain-English catch card

Neon made approval surrogate evidence-backed

Approval is part of the proof surface.

What went wrong
The approved-candidate flow could carry mixing_canon_surrogate as an approval mode, but the approval decision itself was not yet a first-class proof artifact.
What Riddle caught
Integrations PR #741 added createMixingCanonSurrogateReview to @riddledc/riddle-proof-packs/audio-mix-review, then release PR #742 published @riddledc/riddle-proof-packs@0.7.0.
Why it matters
This is the practical shape of creative proof: Riddle Proof can keep development moving with a conservative approval surrogate, but every step remains auditable and refuses to claim subjective mix quality.
What changed
For creative candidate loops: make approval a proof artifact.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt
Neon can convert a metric-supported guitar-down review candidate into a development-only approval surrogate with explicit checks before applying a one-candidate proof, while preserving the listening-review boundary.

Claim: Neon can convert a metric-supported guitar-down review candidate into a development-only approval surrogate with explicit checks before applying a one-candidate proof, while preserving the listening-review boundary.

Bug: The approved-candidate flow could carry mixing_canon_surrogate as an approval mode, but the approval decision itself was not yet a first-class proof artifact. That made it harder to inspect why Codex was allowed to stand in for a human during development iteration.

Why normal checks missed it: The batch could already prove a supported candidate, apply a one-candidate profile, and prepare a durable patch plan. A normal pass/fail run would miss the missing handoff evidence because the approval mode appeared downstream, while the decision checklist that produced it was not independently reviewable.

Why this sells Riddle Proof: This is the practical shape of creative proof: Riddle Proof can keep development moving with a conservative approval surrogate, but every step remains auditable and refuses to claim subjective mix quality.

Reusable profile seed: For creative candidate loops: make approval a proof artifact. Require a ready review packet, conservative action size, passed objective receipts, preserved section-energy floors, clipping/headroom/low-level guardrails, state restoration, review-order-only ranking, and a clear listening-review caveat before applying a candidate.

What the browser run checked

  • Added reusable createMixingCanonSurrogateReview in @riddledc/riddle-proof-packs/audio-mix-review.
  • Published @riddledc/riddle-proof-packs@0.7.0 through Changesets release PR #742 and trusted npm publishing.
  • Wired LilArcade to write mixing-canon-surrogate-review JSON and Markdown artifacts before approved-candidate application.
  • Propagated the evidence-backed approval basis into the generated approved-candidate profile and durable patch plan.
  • Ran pnpm --filter @riddledc/riddle-proof-packs test.
  • Ran node --test scripts/__tests__/neonRatchetBatch.test.mjs with 25 passing tests.
  • Ran node --test src/proof/__tests__/neonProofContract.test.mjs with 10 passing tests.
  • Ran npm run build successfully.
  • Ran npm run test:sequencer with 149 passing tests.
  • Merged LilArcade PR #525 and verified Amplify job 703 succeeded.
  • Ran the production Neon ratchet batch against the live Amplify target.
  • Verified the production surrogate review returned approved_for_development_application with failedChecks [].
  • Verified the approved candidate packet returned candidate_applied_for_listening_review with approval_mode mixing_canon_surrogate.
  • Verified the durable patch plan returned ready_for_durable_patch for guitar 0.6 -> 0.55.
  • Merged LilArcade PR #526 to apply the durable guitar 0.55 override and supersede the previous guitar 0.6 override.
  • Verified Amplify job 704 succeeded for the durable source promotion.
  • Ran a production current-target proof against the live Amplify target after deployment.
  • Verified the deployed target returned ready_for_promotion_review with 2 active overrides and 0 findings.
  • Verified the new guitar override passed current-target proof at guitar 0.55 with no clipping, 2.47 dB headroom, and no low-level proof window.

Proof lesson

Approval is part of the proof surface. If an agent applies a creative candidate to keep work moving, the approval surrogate needs its own artifact with conservative-delta checks, objective receipts, section-energy guardrails, state restoration, and an explicit proof/taste boundary.
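One way to picture approval as its own artifact is a checklist function that returns every check alongside the decision. The sketch below is hypothetical and is not the real createMixingCanonSurrogateReview. The status approved_for_development_application and the failedChecks field appear in this catch, while the check names, the evidence shape, the 0.05 default limit, and the needs_human_review fallback status are assumptions.

```javascript
// Hypothetical sketch of an evidence-backed approval surrogate.
function reviewForDevelopmentApplication(evidence, maxAbsDelta = 0.05) {
  const checks = {
    packet_ready:
      evidence.packetStatus === "candidate_ready_for_listening_review",
    conservative_delta: Math.abs(evidence.delta) <= maxAbsDelta,
    objective_receipts_passed: evidence.receipts.every((r) => r.passed),
    section_energy_floors_preserved: evidence.sectionEnergyFloorsPreserved,
    audio_guardrails_passed: evidence.guardrailsPassed,
    state_restored: evidence.stateRestored,
    ranking_is_review_order_only: evidence.rankingIsReviewOrderOnly,
    listening_review_caveat_present: Boolean(evidence.listeningReviewCaveat),
  };
  const failedChecks = Object.keys(checks).filter((name) => !checks[name]);
  return {
    // The fallback status is an assumed name for illustration only.
    status:
      failedChecks.length === 0
        ? "approved_for_development_application"
        : "needs_human_review",
    approvalMode: "mixing_canon_surrogate",
    checks,
    failedChecks,
  };
}

// Usage: a guitar-down candidate of -0.05 with all guardrails intact.
const review = reviewForDevelopmentApplication({
  packetStatus: "candidate_ready_for_listening_review",
  delta: -0.05,
  receipts: [{ name: "direction_match", passed: true }],
  sectionEnergyFloorsPreserved: true,
  guardrailsPassed: true,
  stateRestored: true,
  rankingIsReviewOrderOnly: true,
  listeningReviewCaveat: "Does not prove the mix sounds better.",
});
console.log(review.status, review.failedChecks);
```

Because the per-check results are returned rather than collapsed into a single boolean, the review object can be written out as JSON or Markdown and inspected independently of whatever applies the candidate downstream.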

| Artifact | Type | What it proves |
| --- | --- | --- |
| Mixing canon surrogate report | Markdown summary | Summarizes the approval-surrogate gap, reusable helper, package release, LilArcade deployment, and production proof result. |
| Mixing canon surrogate report data | JSON metadata | Structured metadata for the PRs, package version, deployment, validation, candidate, approval mode, and durable patch plan. |
| Surrogate review artifact | Markdown review packet | Shows the explicit approval checklist: packet ready, conservative delta, objective receipts, direction match, guardrails, state restoration, review-order ranking, and taste boundary. |
| Surrogate review data | JSON metadata | Structured review object with status approved_for_development_application, failedChecks [], approval basis, and per-check evidence. |
| Pre-approval human review packet | Markdown review packet | Shows the original candidate_ready_for_listening_review packet before the development surrogate applied anything. |
| Pre-approval human review data | JSON metadata | Structured candidate data showing guitar -0.05, two supported candidates, zero rejected candidates, receipts, and section-energy comparisons. |
| Approved candidate packet | Markdown review packet | Shows the one-candidate packet after the surrogate approval, including approval_mode mixing_canon_surrogate and the inherited approval basis. |
| Approved candidate data | JSON metadata | Structured proof packet for candidate_applied_for_listening_review with approval fields and the same proof/taste caveats. |
| Durable patch plan | Markdown patch plan | Shows the source-level handoff for guitar 0.6 -> 0.55, with approval boundary and post-patch proof caveat. |
| Durable patch plan data | JSON metadata | Structured patch handoff with override id, source file, mixer levels, approval, and boundary. |
| Production ratchet batch summary | Markdown summary | Records the production batch steps: claim-candidate review, surrogate review, approved candidate, and durable patch plan. |
| Production ratchet batch data | JSON metadata | Structured rollup with candidate_ready_for_review, surrogate approval, durable patch plan, and artifact index. |
| Generated approved-candidate profile | JSON profile | Shows the one-candidate proof profile generated from the review packet and surrogate approval basis. |
| Approved candidate profile result | JSON metadata | Records the passed approved-candidate profile run against the live Neon target. |
| Approved candidate screenshot | PNG screenshot | Visual evidence from the live Neon target during the approved-candidate proof. |
| Post-promotion current-target summary | Markdown summary | Shows the deployed durable source promotion passed with 2 active overrides and 0 findings. |
| Post-promotion current-target data | JSON metadata | Structured current-target proof summary for the deployed guitar 0.55 and chord 0.16 overrides. |
| Post-promotion guitar proof | JSON proof | Proof receipt for the deployed guitar 0.55 current-target audit. |
| Post-promotion guitar profile result | JSON metadata | Records the passed current-target profile for the deployed guitar 0.55 override. |
| Post-promotion guitar summary | Markdown summary | Short runner summary for the live guitar 0.55 current-target proof. |
| Post-promotion guitar console | JSON console | Browser console evidence for the deployed guitar 0.55 proof. |
| Post-promotion guitar DOM summary | JSON metadata | Shows the current-target proof loaded the expected Neon route. |
| Post-promotion guitar screenshot | PNG screenshot | Visual evidence from the deployed Neon target after the durable guitar 0.55 override landed. |

Catch 4

Neon made audio heuristics browser-safe instead of copy-pasted

May 25, 2026 · package + app + deploy proof · LilArcade · Riddle Proof · package boundary · audio heuristics
Plain-English catch card

Neon made audio heuristics browser-safe instead of copy-pasted

Reusable proof packs need browser-safe subpaths when app contracts consume them in the runtime bundle.

What went wrong
The reusable audio heuristics layer existed in Riddle Proof packs, but the first LilArcade cleanup imported the proof-pack root package from browser app code.
What Riddle caught
Integrations PR #739 added @riddledc/riddle-proof-packs/audio-mix-heuristics as a pure browser-safe subpath, then release PR #740 published @riddledc/riddle-proof-packs@0.6.4 through trusted publishing.
Why it matters
This is the kind of integration bug Riddle Proof should catch early: the proof code was correct in isolation, but the product boundary was wrong for a browser app.
What changed
For reusable app-contract helpers: expose browser-safe subpaths for pure runtime code, keep Node artifact/CLI helpers out of app bundles, then verify with both production build and a real app proof run.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt
Neon can consume reusable audio section-energy and loudness-style heuristics from Riddle Proof packs in browser app code without importing Node-oriented proof framework surfaces.

Claim: Neon can consume reusable audio section-energy and loudness-style heuristics from Riddle Proof packs in browser app code without importing Node-oriented proof framework surfaces.

Bug: The reusable audio heuristics layer existed in Riddle Proof packs, but the first LilArcade cleanup imported the proof-pack root package from browser app code. That pulled Node-oriented proof framework chunks into the Vite browser bundle and failed the production build on Node externals.

Why normal checks missed it: The focused proof-contract tests passed because they bundle the contract for Node. The issue only appeared when the actual browser production build tried to resolve the package graph. Without the build gate, this would have looked like a harmless dedupe and then broken deploy.

Why this sells Riddle Proof: This is the kind of integration bug Riddle Proof should catch early: the proof code was correct in isolation, but the product boundary was wrong for a browser app. The ratchet turned that into a package contract and a post-deploy proof, not another local copy.

Reusable profile seed: For reusable app-contract helpers: expose browser-safe subpaths for pure runtime code, keep Node artifact/CLI helpers out of app bundles, then verify with both production build and a real app proof run.

What the browser run checked

  • Confirmed the root proof-pack import failed the Vite browser build with a Node externalization error.
  • Added the browser-safe @riddledc/riddle-proof-packs/audio-mix-heuristics subpath in integrations PR #739.
  • Published @riddledc/riddle-proof-packs@0.6.4 through Changesets release PR #740 and trusted npm publishing.
  • Moved LilArcade to the safe subpath and removed duplicated local heuristic code.
  • Ran node --test src/proof/__tests__/neonProofContract.test.mjs with 10 passing tests.
  • Ran npm run build successfully after the subpath fix.
  • Ran npm run test:sequencer with 148 passing tests.
  • Ran a local built-app preliminary candidate proof that returned preliminary_candidate_ready and recommended guitar -0.05.
  • Verified the local human-review packet still contained section-energy and loudness-style details with the proof/taste boundary intact.
  • Merged LilArcade PR #524 and verified Amplify job 702 succeeded.
  • Ran the post-deploy preset against production and verified 0 findings across the bounded catalog sweep and durable override audit.

Proof lesson

Reusable proof packs need browser-safe subpaths when app contracts consume them in the runtime bundle. A reusable helper is not truly reusable until the import boundary matches the environment that will run it.
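
One way to guard that boundary before the production build is a lint-style scan of browser source for root imports. The sketch below is an assumption-laden illustration: only the package name and the audio-mix-heuristics subpath come from this manifest, while the function name and the allow-list approach are hypothetical.

```typescript
// Hypothetical lint-style helper, not part of Riddle Proof.
// Only the package name and the audio-mix-heuristics subpath come from the manifest.
const PROOF_PACKS = "@riddledc/riddle-proof-packs";
const BROWSER_SAFE_SUBPATHS = ["audio-mix-heuristics"];

/**
 * Returns proof-pack import specifiers in `source` that are not on the
 * browser-safe allow-list. Only static `... from "x"` imports are matched;
 * side-effect imports and dynamic import() calls are ignored in this sketch.
 */
function findUnsafeProofPackImports(source: string): string[] {
  const unsafe: string[] = [];
  const importPattern = /from\s+["']([^"']+)["']/g;
  let match: RegExpExecArray | null;
  while ((match = importPattern.exec(source)) !== null) {
    const specifier = match[1] ?? "";
    const isRoot = specifier === PROOF_PACKS;
    const isSubpath = specifier.indexOf(PROOF_PACKS + "/") === 0;
    if (!isRoot && !isSubpath) continue; // unrelated package
    // Root import has an empty subpath, which is never on the allow-list.
    const subpath = isRoot ? "" : specifier.slice(PROOF_PACKS.length + 1);
    if (BROWSER_SAFE_SUBPATHS.indexOf(subpath) === -1) unsafe.push(specifier);
  }
  return unsafe;
}
```

A scan like this is cheaper than a full build, but it does not replace the build gate: the production bundle is still the check that resolves the real package graph.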

| Artifact | Type | What it proves |
| --- | --- | --- |
| Browser-safe heuristics report | Markdown summary | Summarizes the package-boundary catch, fix, release, LilArcade cleanup, deploy, and post-deploy proof. |
| Browser-safe heuristics report data | JSON metadata | Structured metadata for the root-import failure signal, package version, PRs, deployment, and proof results. |
| Local human-review packet | Markdown review packet | Shows the built app still produced the section-energy/loudness review packet after consuming the reusable subpath. |
| Local human-review packet data | JSON metadata | Structured candidate, receipt, target movement, and sectionEnergyComparison data from the built-app proof. |
| Local ratchet batch summary | Markdown summary | Records the built-app preliminary proof gates and candidate_ready_for_listening_review handoff. |
| Local ratchet batch data | JSON metadata | Structured local proof summary with step timings, recommendation, and artifact index. |
| Generated ratchet profile | JSON profile | Shows the generated guitar/down profile that exercised the browser-safe helper path. |
| Local ratchet screenshot | PNG screenshot | Visual evidence from the built-app claim-candidate proof. |
| Post-deploy batch summary | Markdown summary | Shows the deployed target passed the post-deploy preset after the package-boundary cleanup. |
| Post-deploy batch data | JSON metadata | Structured post-deploy proof coverage: 6 songs, 19 parts, 22 windows, 2 active overrides, 0 findings. |

Catch 5

Neon promoted a guitar-down candidate into a deployed override

May 25, 2026 · production proof + deploy · LilArcade · Riddle Proof · durable patch · current target
Plain-English catch card

Neon promoted a guitar-down candidate into a deployed override

For creative agent work, the ratchet needs a promotion loop, not just a recommendation loop.

What went wrong
The earlier guitar-down packet was still only a transient recommendation.
What Riddle caught
A production promotion batch ran the natural request "turn the guitar part down a little" with explicit guitar/down constraints.
Why it matters
Riddle Proof can manage the boring but critical handoff from supported candidate to deployed app state.
What changed
For promotion workflows: require a review packet, create an approval-scoped one-candidate profile, generate a durable patch plan, apply source data only after that handoff, then run a current-target proof after deploy.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt
A proof-backed Neon guitar-down candidate can be promoted into a durable deployed override and verified as active on the current target without claiming subjective mix quality.

Claim: A proof-backed Neon guitar-down candidate can be promoted into a durable deployed override and verified as active on the current target without claiming subjective mix quality.

Bug: The earlier guitar-down packet was still only a transient recommendation. That is useful for review, but it does not prove the app can safely carry the candidate into source, deploy it, and verify that the deployed target is actually running the durable override.

Why normal checks missed it: A normal mix-change workflow can stop after "candidate looks supported" or after a source patch lands. The risky gap is the handoff between transient browser state, durable source data, production deploy, and current-target proof. Any of those layers could drift while the packet still looked persuasive.

Why this sells Riddle Proof: Riddle Proof can manage the boring but critical handoff from supported candidate to deployed app state. It keeps the taste boundary explicit while proving the candidate is durable, active, reversible in source, and still inside deterministic audio guardrails.

Reusable profile seed: For promotion workflows: require a review packet, create an approval-scoped one-candidate profile, generate a durable patch plan, apply source data only after that handoff, then run a current-target proof after deploy.

What the browser run checked

  • Ran the production promotion preset with explicit intent, target track, direction, and candidate budget.
  • Verified fast mix health, mobile trainer layout, playback sync, and deep exploration all passed.
  • Verified the claim-candidate packet recommended guitar -0.05 with three supported candidates and zero rejected candidates.
  • Generated a one-candidate approved profile from the review packet.
  • Captured candidate_applied_for_listening_review with approval_mode mixing_canon_surrogate.
  • Generated and applied a durable patch plan for guitar 0.65 -> 0.6.
  • Synced the generated durable current-target profile metadata to reflect two active overrides.
  • Ran npm run test:sequencer with 148 passing tests.
  • Merged LilArcade PR #522 and verified Amplify job 700 succeeded.
  • Ran post-deploy durable current-target proof against the live Amplify target.
  • Verified both active overrides passed current-target proof with zero findings.

Proof lesson

For creative agent work, the ratchet needs a promotion loop, not just a recommendation loop. A supported candidate should become durable only through an explicit approval boundary, generated patch plan, source edit, deploy, and post-deploy current-target audit.
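
The ordering in that lesson can be sketched as a small gate that refuses to skip stages. The stage identifiers and the function name below are hypothetical labels for the steps named in the paragraph above, not names from the LilArcade batch scripts.

```typescript
// Hypothetical stage names that mirror the promotion steps described above.
const PROMOTION_STAGES = [
  "review_packet",
  "approval",
  "durable_patch_plan",
  "source_edit",
  "deploy",
  "current_target_audit",
] as const;

type PromotionStage = (typeof PROMOTION_STAGES)[number];

/**
 * Given the stages completed so far (in order), returns the next required
 * stage, or "done" when the loop is complete. Throws if a stage was run out
 * of order, for example a source edit before any approval.
 */
function nextPromotionStage(completed: readonly PromotionStage[]): PromotionStage | "done" {
  for (let i = 0; i < completed.length; i++) {
    const expected = PROMOTION_STAGES[i];
    if (completed[i] !== expected) {
      throw new Error(`stage ${completed[i]} is out of order; expected ${expected ?? "nothing"}`);
    }
  }
  return PROMOTION_STAGES[completed.length] ?? "done";
}
```

Modeling the loop as an ordered list makes the risky gap explicit: a source patch that lands without a preceding approval stage is an error, not a shortcut.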

| Artifact | Type | What it proves |
| --- | --- | --- |
| Durable guitar report | Markdown summary | Human-readable summary of the transient-to-durable promotion and post-deploy current-target proof. |
| Durable guitar report data | JSON metadata | Structured deploy, candidate, patch, approval, and current-target proof metadata. |
| Promotion batch summary | Markdown summary | Shows the full promotion gate: deterministic checks, review packet, approved packet, durable patch plan, and pre-patch current-target audit. |
| Promotion batch data | JSON metadata | Structured rollup of the promotion batch, coverage, recommendation, approval packet, and artifact index. |
| Generated ratchet profile | JSON profile | Shows the generated guitar/down candidate-loop profile used for the promotion run. |
| Approved candidate profile | JSON profile | Shows the one-candidate approval profile generated from the review packet. |
| Human review packet | Markdown review packet | Shows the transient candidate recommendation and proof/taste boundary before approval. |
| Approved review packet | Markdown review packet | Shows the approval-surrogate packet with candidate_applied_for_listening_review. |
| Durable patch plan | Markdown patch plan | Shows the source-level durable override handoff for guitar 0.65 -> 0.6. |
| Post-deploy current-target summary | Markdown summary | Shows that both active durable overrides passed after deploy with zero findings. |
| Post-deploy current-target data | JSON metadata | Structured current-target summary with override count, finding count, and per-override mix health. |
| Post-deploy guitar proof | JSON proof | Proof receipt for the deployed guitar durable override. |
| Post-deploy guitar profile result | JSON metadata | Records the passed generated current-target profile for the guitar override. |
| Post-deploy guitar summary | Markdown summary | Short runner summary for the deployed guitar current-target proof. |
| Post-deploy guitar console | JSON console | Browser console evidence for the deployed guitar override proof. |
| Post-deploy guitar DOM summary | JSON metadata | Shows the current-target proof loaded the expected Neon route. |
| Post-deploy guitar screenshot | PNG screenshot | Visual evidence from the deployed Neon target after the durable guitar override landed. |

Catch 6

Neon stopped hiding tiny guitar energy deltas

May 25, 2026 · local + npm + production proof · LilArcade · Riddle Proof · audio heuristics · review packets
Plain-English catch card

Neon stopped hiding tiny guitar energy deltas

Review-packet formatting is part of the proof surface.

What went wrong
The Neon ratchet packet had started tracking requested-instrument section energy, but the human-facing Markdown rounded some very small nonzero guitar energy deltas to 0.
What Riddle caught
@riddledc/riddle-proof-packs 0.6.3 changed the shared packet formatter to preserve tiny nonzero audio values.
Why it matters
Riddle Proof is not just collecting artifacts; it is improving the reliability of the human handoff.
What changed
For audio proof packets: preserve small nonzero metric deltas in Markdown, keep tracked instruments explicit, and state that section-energy and loudness-style values rank candidates for review rather than proving subjective quality.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt
A Neon review packet can preserve tiny nonzero tracked-instrument audio deltas in human-readable evidence while keeping candidate ranking as review order, not taste proof.

Claim: A Neon review packet can preserve tiny nonzero tracked-instrument audio deltas in human-readable evidence while keeping candidate ranking as review order, not taste proof.

Bug: The Neon ratchet packet had started tracking requested-instrument section energy, but the human-facing Markdown rounded some very small nonzero guitar energy deltas to 0. The raw proof JSON still had the movement, but the review packet made a real rendered change look less observable than it was.

Why normal checks missed it: The proof still passed and the candidate recommendation was reasonable, so a normal pass/fail check would not catch it. The issue only showed up when reading the human packet as a reviewer would: the packet had the right tracked-instrument column, but the formatting was too coarse for tiny audio-energy deltas.

Why this sells Riddle Proof: Riddle Proof is not just collecting artifacts; it is improving the reliability of the human handoff. The same proof packet that guards against clipping and wrong-target edits now preserves tiny measurable audio deltas so a reviewer can evaluate supported candidates without digging into raw JSON.

Reusable profile seed: For audio proof packets: preserve small nonzero metric deltas in Markdown, keep tracked instruments explicit, and state that section-energy and loudness-style values rank candidates for review rather than proving subjective quality.

What the browser run checked

  • Released @riddledc/riddle-proof-packs 0.6.3 through the trusted Changesets publishing flow.
  • Bumped LilArcade to consume @riddledc/riddle-proof-packs ^0.6.3.
  • Ran npm run test:sequencer with 148 passing tests.
  • Merged LilArcade PR #521 and verified Amplify production job 699 succeeded.
  • Ran a live Riddle Proof Playwright preliminary candidate batch against the deployed Amplify branch URL.
  • Verified the live batch returned candidate_ready_for_listening_review with guitar -0.05 recommended.
  • Verified the live packet supported three guitar-down candidates, rejected zero candidates, and restored state after the loop.
  • Verified tracked guitar energy deltas are shown as small nonzero values in Markdown, including -0.000044, -0.000066, and -0.000054.
  • Verified section energy floors, clipping, headroom, low-level, and proof/taste boundary receipts remained visible.

Proof lesson

Review-packet formatting is part of the proof surface. If the proof asks a human to review measurable candidate movement, small nonzero values must stay visible instead of being rounded into apparent no-ops. Metrics still do not prove taste, but they must be precise enough to support review.
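
A minimal sketch of a formatter with that property follows. It is not the formatter shipped in @riddledc/riddle-proof-packs 0.6.3; the function name, the three-decimal default, and the two-significant-digit fallback are assumptions chosen for illustration.

```typescript
// Hypothetical formatter, not the shipped packet formatter.
// Uses fixed decimals for ordinary values, but falls back to significant
// digits when fixed rounding would turn a nonzero value into an apparent 0.
function formatDelta(value: number, decimals = 3): string {
  if (!isFinite(value)) return String(value);
  if (value === 0) return "0";
  const fixed = value.toFixed(decimals);
  // Number("-0.000") is -0, which compares equal to 0, so tiny negatives land here too.
  if (Number(fixed) !== 0) return fixed;
  return value.toPrecision(2);
}

// A coarse formatter would print "-0.000" for the first value.
const rendered = [-0.000044, -0.05].map((v) => formatDelta(v));
// rendered is ["-0.000044", "-0.050"]
```

The fallback only triggers when rounding would hide real movement, so ordinary values keep a uniform column width while tiny deltas stay visible.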

| Artifact | Type | What it proves |
| --- | --- | --- |
| Tracked energy precision report | Markdown summary | Human-readable summary of the packet precision catch, package release, deployment, and live production proof result. |
| Tracked energy precision report data | JSON metadata | Structured summary of deploy metadata, recommended candidate, small nonzero tracked guitar deltas, and proof boundary. |
| Generated ratchet profile | JSON profile | Shows the temporary profile generated from the guitar/down intent and iteration constraints. |
| Live human-review packet | Markdown review packet | Shows the reviewer-facing packet with tracked guitar energy deltas preserved as small nonzero values. |
| Live human-review packet data | JSON metadata | Shows the underlying structured target movement, sectionEnergyComparison, trackedInstruments, and receipt data. |
| Live ratchet batch summary | Markdown summary | Records the production preliminary batch steps and candidate_ready_for_listening_review result. |
| Live ratchet batch data | JSON metadata | Structured batch summary with recommendation, artifact index, and proof/taste boundary. |
| Post-deploy proof receipt | JSON proof | Captures the browser proof receipt from the deployed Amplify branch. |
| Post-deploy profile result | JSON metadata | Records the passed generated profile, route match, setup action return summary, and proof artifacts. |
| Post-deploy summary | Markdown summary | Short summary emitted by the local Playwright runner for the live claim-candidate proof. |
| Post-deploy console | JSON console | Shows the deployed proof run browser console events. |
| Post-deploy DOM summary | JSON metadata | Shows the live route loaded and matched /games/drum-sequencer. |
| Post-deploy ratchet screenshot | PNG screenshot | Visual evidence from the deployed Neon target used for the tracked-energy precision proof. |

Catch 7

Neon turned a natural mix request into proof-backed candidates

May 25, 2026 · local + production proof · LilArcade · Riddle Proof · claim candidates · audio heuristics
Plain-English catch card

Neon turned a natural mix request into proof-backed candidates

A creative proof loop gets more useful when the requested claim is part of the run contract.

What went wrong
The Neon ratchet loop had become good at running a fixed chord-down claim, but trying a new musical request still meant editing profile JSON or leaning on hard-coded intent.
What Riddle caught
LilArcade PR #519 added --ratchet-intent, --ratchet-target-tracks, and --ratchet-direction to the Neon ratchet batch CLI.
Why it matters
Riddle Proof is becoming a practical candidate operator: an agent can ask for a natural musical change, get bounded candidates with objective receipts, and hand off a compact review packet instead of a vague "sounds better" claim.
What changed
For creative proof workflows: make claim text and target constraints first-class run parameters.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt
A Neon ratchet batch can accept a natural mix-change request as a run parameter, generate a constrained claim-candidate profile, and return proof-backed candidates for listening review without claiming automatic taste.

Claim: A Neon ratchet batch can accept a natural mix-change request as a run parameter, generate a constrained claim-candidate profile, and return proof-backed candidates for listening review without claiming automatic taste.

Bug: The Neon ratchet loop had become good at running a fixed chord-down claim, but trying a new musical request still meant editing profile JSON or leaning on hard-coded intent. That slowed the loop and made natural claim translation harder to prove.

Why normal checks missed it: The profile already supported claim candidates, receipts, section-energy comparisons, and review packets. The missing layer was ergonomic: the batch CLI could narrow focus tracks and iteration count, but it could not declare the actual claim text, target tracks, or direction as first-class run inputs.

Why this sells Riddle Proof: Riddle Proof is becoming a practical candidate operator: an agent can ask for a natural musical change, get bounded candidates with objective receipts, and hand off a compact review packet instead of a vague "sounds better" claim.

Reusable profile seed: For creative proof workflows: make claim text and target constraints first-class run parameters. Generate temporary profiles for local iteration, preserve objective receipts and state restoration, and keep ranking as review order rather than taste proof.

What the browser run checked

  • Added CLI flags for ratchet intent, target tracks, and requested direction.
  • Threaded those flags into generated ratchet profiles as runRatchetLoop args.
  • Recorded the generated intent, focus tracks, target tracks, direction, and iteration budget in profile metadata.
  • Exposed claim-translation overrides in dry-run batch plans.
  • Added parser, generated-profile, and batch-plan tests.
  • Ran node --test scripts/__tests__/neonRatchetBatch.test.mjs.
  • Ran npm run test:sequencer with 148 passing tests.
  • Ran a live Riddle Proof Playwright preliminary candidate batch against the deployed Amplify branch URL.
  • Verified the live packet requested guitar/down, supported three guitar-down candidates, rejected zero candidates, restored state, and recommended guitar -0.05 for listening review.
  • Verified the live packet included section-energy details for every supported candidate and kept the proof/taste boundary explicit.

Proof lesson

A creative proof loop gets more useful when the requested claim is part of the run contract. The proof should record the natural request, explicit target constraints, candidate actions, receipt verdicts, section-energy tables, and the listening-review boundary in one packet.
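
To make the run-contract idea concrete, here is a sketch of how the three flags added in LilArcade PR #519 might be parsed into run arguments. The flag names come from this manifest; the `--flag value` syntax, comma-separated track lists, the "up" direction value, and every type and function name are assumptions, and the real CLI may differ.

```typescript
// Hypothetical parser. Flag names are from the manifest; value syntax is assumed.
interface RatchetRunArgs {
  intent?: string;
  targetTracks?: string[];
  direction?: "up" | "down";
}

function parseRatchetFlags(argv: readonly string[]): RatchetRunArgs {
  const args: RatchetRunArgs = {};
  for (let i = 0; i < argv.length; i++) {
    const flag = argv[i];
    const value = argv[i + 1];
    if (value === undefined) continue; // a trailing flag without a value is ignored
    if (flag === "--ratchet-intent") {
      args.intent = value;
      i++;
    } else if (flag === "--ratchet-target-tracks") {
      // Assumed format: comma-separated track names, e.g. "guitar,chord".
      args.targetTracks = value.split(",").map((t) => t.trim()).filter((t) => t.length > 0);
      i++;
    } else if (flag === "--ratchet-direction") {
      if (value !== "up" && value !== "down") {
        throw new Error(`unsupported direction: ${value}`);
      }
      args.direction = value;
      i++;
    }
  }
  return args;
}
```

Recording the parsed intent, target tracks, and direction in the generated profile is what lets the later packet show which claim the candidates were tested against.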

| Artifact | Type | What it proves |
| --- | --- | --- |
| Natural claim report | Markdown summary | Human-readable summary of the guitar-down claim, generated constraints, supported candidates, receipts, and proof boundary. |
| Natural claim report data | JSON metadata | Structured summary of the request, deploy, recommended candidate, receipts, ranking role, and section-energy deltas. |
| Generated ratchet profile | JSON profile | Shows the temporary profile generated from CLI intent/target/direction overrides. |
| Live human-review packet | Markdown review packet | Shows the reviewer-facing supported-candidate packet with all-candidate section-energy tables. |
| Live human-review packet data | JSON metadata | Shows claimTarget targetTracks ["guitar"], direction "down", supported candidate count, receipts, and sectionEnergyComparison details. |
| Live ratchet batch summary | Markdown summary | Records the live preliminary batch steps: mix health, layout, playback, generated claim-candidate run, and review-packet extraction. |
| Live ratchet batch data | JSON metadata | Structured batch summary with candidate_ready_for_listening_review, recommendation, artifact index, and proof boundary. |
| Post-deploy proof receipt | JSON proof | Captures the browser proof receipt from the deployed Amplify branch. |
| Post-deploy profile result | JSON metadata | Records the passed generated profile, route match, setup action return summary, and proof artifacts. |
| Post-deploy summary | Markdown summary | Short summary emitted by the local Playwright runner for the live claim-candidate proof. |
| Post-deploy console | JSON console | Shows the deployed proof run had no fatal browser console events. |
| Post-deploy DOM summary | JSON metadata | Shows the live route loaded and matched /games/drum-sequencer. |
| Post-deploy ratchet screenshot | PNG screenshot | Visual evidence from the deployed Neon target used for the natural-claim proof. |

Catch 8

Neon refused to promote an off-target mix candidate

May 25, 2026 · local + production proof · LilArcade · Riddle Proof · intent guard · claim candidates
Plain-English catch card

Neon refused to promote an off-target mix candidate

Claim-candidate loops need target and direction receipts, not just mix-health receipts.

What went wrong
The Neon ratchet could be asked to "turn the chord part down" and still surface a bass -0.18 candidate, because broad review-order ranking found bass objectively guardrail-preserving.
What Riddle caught
LilArcade PR #517 added request-aware target inference and candidate-track/direction receipts.
Why it matters
Riddle Proof caught a claim-translation bug in a creative loop.
What changed
For natural-language change loops: parse or explicitly declare requested target and direction, attach receipts proving candidate-action alignment, allow no-candidate review packets, and make approval/durable patch handoffs require an actual supported candidate.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt
A Neon claim-candidate packet should not promote a candidate unless the candidate action matches the requested target track and direction, and matching candidates must still satisfy deterministic preservation receipts.

Claim: A Neon claim-candidate packet should not promote a candidate unless the candidate action matches the requested target track and direction, and matching candidates must still satisfy deterministic preservation receipts.

Bug: The Neon ratchet could be asked to "turn the chord part down" and still surface a bass -0.18 candidate, because broad review-order ranking found bass objectively guardrail-preserving. That was useful exploration evidence, but it was not support for the requested chord-down claim.

Why normal checks missed it: The candidate packet already separated metrics from taste, so a normal review could see that ranking was only review_order_only. The mismatch was subtler: the recommended candidate matched the ranking metric but not the natural-language claim target. Only comparing requested_intent, candidate action, and preservation receipts exposed the claim-translation gap.

Why this sells Riddle Proof: Riddle Proof caught a claim-translation bug in a creative loop. The system did not decide taste; it prevented a wrong-target candidate from becoming a proof-backed recommendation and refused to prepare a durable patch when every matching candidate broke deterministic preservation receipts.

Reusable profile seed: For natural-language change loops: parse or explicitly declare requested target and direction, attach receipts proving candidate-action alignment, allow no-candidate review packets, and make approval/durable patch handoffs require an actual supported candidate.

What the browser run checked

  • Added intent parsing for requested track and direction, with explicit target args still supported.
  • Added candidate_track_matches_requested_intent and candidate_direction_matches_requested_intent receipts to every mix-level claim candidate.
  • Classified off-target or wrong-direction candidates as claim_translation_mismatch.
  • Narrowed default candidate generation to the requested track and direction when the intent is clear.
  • Relaxed the local claim-candidate profile so a no-candidate needs_followup packet is a valid review artifact instead of a product-regression assertion failure.
  • Changed the promotion batch so it skips approval-surrogate and durable patch planning when the review packet has no recommendation.
  • Ran focused contract tests, profile-sync tests, batch tests, full npm test, npm run build, GitHub CI, and Amplify deploy job 695.
  • Ran a post-deploy Riddle Proof Playwright promotion batch against the live Amplify branch URL.
  • Verified the live packet inferred chord/down, supported zero candidates, rejected four chord-down edits, restored state, and kept the current durable target clean.
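
The skip behavior described in the checks above can be sketched as a planning function. The two packet statuses appear in this manifest; the step names, the packet shape, and the function itself are hypothetical.

```typescript
// Hypothetical planner. Packet statuses are from the manifest; step names are assumed.
interface ReviewPacket {
  status: "candidate_ready_for_listening_review" | "needs_followup";
  recommendation: { track: string; delta: number } | null;
}

type BatchStep =
  | "claim_candidate_review"
  | "approval_surrogate"
  | "durable_patch_plan"
  | "current_target_audit";

/**
 * Plans the promotion batch. Approval and durable patch planning only run
 * when the review packet actually recommends a candidate; the current-target
 * audit runs either way so the deployed state is still checked.
 */
function planPromotionSteps(packet: ReviewPacket): BatchStep[] {
  const steps: BatchStep[] = ["claim_candidate_review"];
  if (packet.recommendation !== null) {
    steps.push("approval_surrogate", "durable_patch_plan");
  }
  steps.push("current_target_audit");
  return steps;
}
```

Treating a no-candidate packet as a valid outcome keeps the batch from failing as if the product had regressed when the honest answer is that nothing qualified.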

Proof lesson

Claim-candidate loops need target and direction receipts, not just mix-health receipts. A candidate can be measurable, reversible, and guardrail-preserving while still being the wrong candidate for the claim.
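
A small sketch shows how those two receipts separate "wrong candidate" from "unsafe candidate". The receipt keys and the `claim_translation_mismatch` label are taken from this manifest; the other classification labels, the type shapes, and the function are assumptions for illustration.

```typescript
// Hypothetical classifier. Receipt keys and claim_translation_mismatch come
// from the manifest; "supported" and "rejected_by_preservation" are assumed labels.
type Direction = "up" | "down";

interface ClaimTarget {
  targetTracks: string[];
  direction: Direction;
}

interface Candidate {
  track: string;
  delta: number; // negative lowers the level, positive raises it
  preservationPassed: boolean;
}

interface CandidateVerdict {
  candidate_track_matches_requested_intent: boolean;
  candidate_direction_matches_requested_intent: boolean;
  classification: "supported" | "claim_translation_mismatch" | "rejected_by_preservation";
}

function classifyCandidate(claim: ClaimTarget, candidate: Candidate): CandidateVerdict {
  const trackMatches = claim.targetTracks.indexOf(candidate.track) !== -1;
  const candidateDirection: Direction = candidate.delta < 0 ? "down" : "up";
  // A zero delta moves nothing, so it cannot match any requested direction.
  const directionMatches = candidate.delta !== 0 && candidateDirection === claim.direction;

  let classification: CandidateVerdict["classification"];
  if (!trackMatches || !directionMatches) {
    classification = "claim_translation_mismatch";
  } else if (!candidate.preservationPassed) {
    classification = "rejected_by_preservation";
  } else {
    classification = "supported";
  }
  return {
    candidate_track_matches_requested_intent: trackMatches,
    candidate_direction_matches_requested_intent: directionMatches,
    classification,
  };
}
```

With a chord/down claim, a bass -0.18 candidate that passes every preservation check is still classified as a mismatch, which is the case this catch describes.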

| Artifact | Type | What it proves |
| --- | --- | --- |
| Intent candidate guard report | Markdown summary | Human-readable summary of the off-target-candidate catch and no-supported-candidate live result. |
| Intent candidate guard report data | JSON metadata | Structured deploy, target, rejected-candidate, and proof-boundary metadata. |
| Live human-review packet | Markdown review packet | Shows the reviewer-facing needs_followup packet with zero supported candidates and four rejected chord-down candidates. |
| Live human-review packet data | JSON metadata | Shows claimTarget targetTracks ["chord"], direction "down", and the rejected candidate receipts. |
| Live ratchet batch summary | Markdown summary | Records the live promotion batch steps, skipped approval handoff, coverage, and current-target audit. |
| Live ratchet batch data | JSON metadata | Structured batch summary with deterministic_findings_present, allowed findings, and current-target result. |
| Post-deploy proof receipt | JSON proof | Captures the browser proof receipt from the deployed Amplify branch after PR #517. |
| Post-deploy profile result | JSON metadata | Records the passed profile, route match, setup action return summary, and proof artifacts. |
| Post-deploy summary | Markdown summary | Short summary emitted by the local Playwright runner for the live claim-candidate proof. |
| Post-deploy console | JSON console | Shows the deployed proof run had no fatal browser console events. |
| Post-deploy DOM summary | JSON metadata | Shows the live route loaded and matched /games/drum-sequencer. |
| Post-deploy ratchet screenshot | PNG screenshot | Visual evidence from the deployed Neon target used for the intent-guard proof. |

Catch 9

Neon section-energy guard rejected a disappearing chord cut

May 25, 2026 · local + production proof · LilArcade · Riddle Proof · audio heuristics · claim candidates
Plain-English catch card

Neon section-energy guard rejected a disappearing chord cut

Audio proof should make candidate rejection deterministic without pretending to automate taste.

What went wrong
The Neon candidate loop needed a more inspectable reason to reject a small-looking chord cut.
What Riddle caught
After @riddledc/riddle-proof-packs 0.6.0 shipped and LilArcade synced the Neon profiles, a live Riddle Proof Playwright run against the deployed Amplify branch passed.
Why it matters
Riddle Proof turned a fuzzy mixing concern into a deterministic candidate guard: it did not claim to know the better mix, but it caught a candidate that would make a required chord lane vanish and produced a compact review packet explaining why.
What changed
For audio/app proof packs: add section-window summaries, required-lane energy floors, headroom/clipping/low-level guardrails, and review-order-only ranking fields.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt

Claim: A Neon claim-candidate packet should reject a candidate when a required section lane drops below measurable energy floors, while ranking surviving candidates only for human review.

Bug: The Neon candidate loop needed a more inspectable reason to reject a small-looking chord cut. The control edit chord 0.16 -> 0.06 was just another bounded candidate, but the live section-energy receipt showed the required chord lane disappearing in the Intro Bed proof window.

Why normal checks missed it: A normal candidate table can say a candidate failed preservation, but it does not show whether the failure is a meaningful audio-lane disappearance or just a ranking preference. The new section-energy receipt compared baseline and candidate windows directly and exposed RMS, peak, and total-energy floors for the required chord lane.

Why this sells Riddle Proof: Riddle Proof turned a fuzzy mixing concern into a deterministic candidate guard: it did not claim to know the better mix, but it caught a candidate that would make a required chord lane vanish and produced a compact review packet explaining why.

Reusable profile seed: For audio/app proof packs: add section-window summaries, required-lane energy floors, headroom/clipping/low-level guardrails, and review-order-only ranking fields. Rejections should name the failed receipt and the baseline/candidate metric delta that supports it.
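
A required-lane energy floor of this kind can be sketched as follows. This is a minimal sketch under stated assumptions: the type names, function names, and floor values are illustrative and are not the API or thresholds of @riddledc/riddle-proof-packs. The baseline metrics in the usage lines are the Intro Bed chord values reported for this catch.

```typescript
// Floors and field names here are illustrative assumptions, not the values
// or API used by @riddledc/riddle-proof-packs.
interface WindowEnergy {
  rms: number;
  peak: number;
  totalEnergy: number;
}

// Summarize one section window of mono samples in the range [-1, 1].
function summarizeWindow(samples: number[]): WindowEnergy {
  let sumSquares = 0;
  let peak = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
    peak = Math.max(peak, Math.abs(samples[i]));
  }
  const rms = samples.length > 0 ? Math.sqrt(sumSquares / samples.length) : 0;
  return { rms, peak, totalEnergy: sumSquares };
}

interface FloorReceipt {
  passed: boolean;
  failedMetrics: string[];
  baseline: WindowEnergy;
  candidate: WindowEnergy;
}

// Reject only when the baseline lane met a floor and the candidate lane
// falls below it; the receipt names the metrics that failed and keeps both
// windows so the baseline/candidate delta can be reported.
function checkRequiredLane(
  baseline: WindowEnergy,
  candidate: WindowEnergy,
  floors: WindowEnergy
): FloorReceipt {
  const failedMetrics: string[] = [];
  if (baseline.rms >= floors.rms && candidate.rms < floors.rms) failedMetrics.push("rms");
  if (baseline.peak >= floors.peak && candidate.peak < floors.peak) failedMetrics.push("peak");
  if (baseline.totalEnergy >= floors.totalEnergy && candidate.totalEnergy < floors.totalEnergy) {
    failedMetrics.push("totalEnergy");
  }
  return { passed: failedMetrics.length === 0, failedMetrics, baseline, candidate };
}

// Baseline values are the reported Intro Bed chord metrics; floors are assumed.
const introBedBaseline: WindowEnergy = { rms: 0.0022, peak: 0.0079, totalEnergy: 0.000001 };
const introBedCandidate: WindowEnergy = { rms: 0, peak: 0, totalEnergy: 0 };
const assumedFloors: WindowEnergy = { rms: 0.001, peak: 0.004, totalEnergy: 0.0000005 };
const introBedReceipt = checkRequiredLane(introBedBaseline, introBedCandidate, assumedFloors);
console.log(introBedReceipt.passed, introBedReceipt.failedMetrics);
```

The check says nothing about which mix sounds better. It only records that a lane which met the floors in the baseline no longer does in the candidate.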

What the browser run checked

  • Added reusable section-energy and RMS-derived loudness-style helpers to @riddledc/riddle-proof-packs.
  • Published @riddledc/riddle-proof-packs 0.6.0 through the trusted Changesets/npm provenance flow.
  • Added Neon proof-contract receipts for required section energy floors and headroom preservation.
  • Synced the LilArcade Neon ratchet profiles so they explicitly pass sectionHeuristics args and capture sectionEnergyComparison return fields.
  • Ran focused Neon proof-contract tests, full npm test, npm run build, GitHub CI, and Amplify deploys for LilArcade PRs #515 and #516.
  • Ran a post-deploy Riddle Proof Playwright profile against the live Amplify branch URL.
  • Verified the live profile passed with five supported claim candidates, one rejected candidate, zero fatal console events, and HTTP 200.
  • Verified chord -0.10 was rejected because Intro Bed chord energy dropped from RMS 0.0022 / peak 0.0079 / total energy 0.000001 to all zeroes.
  • Verified the recommended bass -0.18 candidate preserved section energy floors and guardrails, and remained review-order only rather than a taste claim.

Proof lesson

Audio proof should make candidate rejection deterministic without pretending to automate taste. Section-by-section energy floors and loudness-style deltas are useful review aids when they reject disappearing required lanes, preserve headroom guardrails, and keep the surviving candidates ranked for human listening review.

| Artifact | Type | What it proves |
| --- | --- | --- |
| Candidate guard report | Markdown summary | Human-readable explanation of the chord -0.10 rejection and proof/taste boundary. |
| Candidate guard report data | JSON metadata | Structured summary of candidate counts, ranking role, rejected chord metrics, deploy metadata, and caveats. |
| Post-deploy proof receipt | JSON proof | Captures the live browser proof receipt from the deployed Amplify branch. |
| Post-deploy profile result | JSON metadata | Records the passed profile, claim_candidate_supported packet, section-energy comparisons, and rejected candidate receipt. |
| Post-deploy summary | Markdown summary | Short summary emitted by the local Playwright runner for the live deployed proof. |
| Post-deploy console | JSON console | Shows the deployed proof run had no fatal browser console events. |
| Post-deploy DOM summary | JSON metadata | Shows the live route loaded, matched /games/drum-sequencer, and had no horizontal overflow. |
| Post-deploy ratchet screenshot | PNG screenshot | Visual evidence from the deployed Neon target used for the ratchet-loop proof. |

Catch 10

Neon profile sync blessed a stale current target

Neon profile sync blessed a stale current target evidence screenshot
May 25, 2026 · local + production proof · LilArcade · Riddle Proof · profile sync · current target
Plain-English catch card

Neon profile sync blessed a stale current target

Current-target profiles for app-owned durable state should be generated from the app state they claim to audit.

What went wrong
After the chord 0.16 preservation candidate became the active durable override, the checked-in Neon durable current-target profile still referenced the older chord 0.18 candidate from the pack sample.
What Riddle caught
LilArcade PR #514 changed the Neon profile sync command so the durable current-target profile is instantiated from the active durable override file.
Why it matters
Riddle Proof caught a source-of-truth drift bug in the proof machinery itself.
What changed
For app-owned current-target audits: pack samples should provide profile shape, while active app state supplies concrete claims.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt

Claim: The deployed Neon current target and checked-in durable current-target profile should agree with the active app-owned durable override, not a stale pack sample.

Bug: After the chord 0.16 preservation candidate became the active durable override, the checked-in Neon durable current-target profile still referenced the older chord 0.18 candidate from the pack sample. The profile sync gate passed because it only compared local profiles to the published pack, not to the app source of truth.

Why normal checks missed it: The sync check was internally consistent: local files matched the reusable pack sample. The mismatch only appeared when comparing three evidence roles together: the active app override, the checked-in current-target profile, and the deployed browser proof target.

Why this sells Riddle Proof: Riddle Proof caught a source-of-truth drift bug in the proof machinery itself. The app had the right active override, but a stale synced profile could have produced misleading review evidence. The fix makes the ratchet more trustworthy without adding more deploy churn.

Reusable profile seed: For app-owned current-target audits: pack samples should provide profile shape, while active app state supplies concrete claims. Sync checks should fail when generated proof profiles drift from the app-owned source of truth.
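
A drift check along these lines can be sketched as below. The shapes, ids, and function names are hypothetical; the real override file and profile format may differ. The sketch only shows the idea of generating the concrete claim from the first active override and failing when the checked-in profile disagrees.

```typescript
// Hypothetical shapes; the real override and profile files may differ.
interface DurableOverride {
  id: string;
  track: string;
  level: number;
  active: boolean;
}

interface CurrentTargetProfile {
  overrideId: string;
  track: string;
  expectedLevel: number;
}

// Generate the concrete claim from the first active app-owned override.
function generateProfile(overrides: DurableOverride[]): CurrentTargetProfile {
  for (let i = 0; i < overrides.length; i++) {
    const o = overrides[i];
    if (o.active) return { overrideId: o.id, track: o.track, expectedLevel: o.level };
  }
  throw new Error("no active durable override");
}

// Return drift messages; an empty list means the checked-in profile agrees
// with the app-owned source of truth.
function findProfileDrift(checkedIn: CurrentTargetProfile, overrides: DurableOverride[]): string[] {
  const expected = generateProfile(overrides);
  const drift: string[] = [];
  if (checkedIn.overrideId !== expected.overrideId) {
    drift.push("overrideId: profile " + checkedIn.overrideId + ", app " + expected.overrideId);
  }
  if (checkedIn.track !== expected.track) {
    drift.push("track: profile " + checkedIn.track + ", app " + expected.track);
  }
  if (checkedIn.expectedLevel !== expected.expectedLevel) {
    drift.push("expectedLevel: profile " + checkedIn.expectedLevel + ", app " + expected.expectedLevel);
  }
  return drift;
}

// Illustrative ids; the levels mirror the stale 0.18 profile and active 0.16 override.
const appOverrides: DurableOverride[] = [
  { id: "chord-0p18", track: "chord", level: 0.18, active: false },
  { id: "chord-0p16", track: "chord", level: 0.16, active: true },
];
const staleProfile: CurrentTargetProfile = { overrideId: "chord-0p18", track: "chord", expectedLevel: 0.18 };
console.log(findProfileDrift(staleProfile, appOverrides));
```

A sync gate built this way compares against app state, so a profile that still matches the pack sample but not the active override fails.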

What the browser run checked

  • Found the active source override at chord 0.16 while the synced static current-target profile still expected chord 0.18.
  • Updated the profile sync command to read active durable overrides and generate the durable current-target profile from the first active override.
  • Added regression coverage for default active override generation and custom override files.
  • Regenerated .riddle-proof/profiles/neon-durable-current-target.json with the chord 0.16 preservation candidate.
  • Ran profile sync check, focused node tests, full npm test, npm run build, and a local browser proof against the regenerated static profile.
  • Merged LilArcade PR #514 after GitHub CI passed and waited for Amplify job 692 to deploy commit 1505ab8.
  • Ran a post-deploy current-target proof against the live branch URL and verified chord 0.16 across contract state, profile state, and visible UI.

Proof lesson

Current-target profiles for app-owned durable state should be generated from the app state they claim to audit. Reusable pack samples are useful seeds, but they should not become stale source-of-truth evidence for a live app.

| Artifact | Type | What it proves |
| --- | --- | --- |
| Profile sync report | Markdown summary | Explains the stale-profile catch, the fix, the deploy, and the proof/taste boundary. |
| Profile sync report data | JSON metadata | Structured summary of the stale chord 0.18 profile, active chord 0.16 override, and post-deploy proof receipt. |
| Generated current-target profile | JSON profile | Shows the checked-in profile generated from the active chord 0.16 durable override. |
| Post-deploy proof receipt | JSON proof | Captures the browser proof receipt from the deployed branch URL after PR #514 merged. |
| Post-deploy profile result | JSON metadata | Records status passed, HTTP 200, visible 0.16X, chord level agreement, audio guardrails, and zero fatal console events. |
| Post-deploy summary | Markdown summary | Human-readable summary from the deployed current-target proof run. |
| Post-deploy console | JSON console | Shows the deployed proof run had no fatal browser console events. |
| Post-deploy DOM summary | JSON metadata | Shows the route loaded, matched /games/drum-sequencer, and had no horizontal overflow. |
| Post-deploy current-target screenshot | PNG screenshot | Visual evidence from the deployed current-target proof. |

Catch 11

Neon candidate loop tested the wrong audio path

Neon candidate loop tested the wrong audio path evidence screenshot
May 25, 2026 · local + production proof · LilArcade · Riddle Proof · claim candidates · audio guardrails
Plain-English catch card

Neon candidate loop tested the wrong audio path

Claim-candidate loops must use the same source-preparation and state-isolation preconditions as the guardrail proofs they depend on.

What went wrong
After the Neon chord 0.08 durable override shipped, the deeper production sweep found the required Monkberry chord lane missing in intro windows.
What Riddle caught
Before the fix, the production candidate batch failed deep exploration with two Monkberry intro findings: chord was required but inactive at chord 0.08, with no clipping or low-level window.
Why it matters
Riddle Proof caught a ratchet consistency bug: one proof path said the chord lane was missing, while another path was about to recommend cutting it further.
What changed
For rich app proof loops: run deterministic sweeps first, use the same setup actions for candidate review, restore state before every candidate, treat candidate ranking as review order only, and reject candidates that violate required active lanes before any human approval surrogate can promote them.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt

Claim: A Neon claim-candidate loop should evaluate candidate edits under the same prepared audio path as deep exploration, isolate each candidate from prior attempts, and reject candidates that break required instrument preservation.

Bug: After the Neon chord 0.08 durable override shipped, the deeper production sweep found the required Monkberry chord lane missing in intro windows. Raising the active chord floor to 0.16 fixed that deterministic guardrail, but the claim-candidate loop still recommended another chord -0.10 cut because it rendered candidates without the prepared sample-source path used by deep exploration. Candidate attempts also inherited prior candidate edits instead of starting from the original mix each time.

Why normal checks missed it: The individual proof surfaces were each plausible: current-target proof could verify the active override, and the candidate packet could produce a green recommendation. The bug only appeared when comparing evidence roles across the ratchet: deep exploration loaded the piano/sample path and rejected the low chord floor, while the review loop used a different render setup where the same cut looked preserved.

Why this sells Riddle Proof: Riddle Proof caught a ratchet consistency bug: one proof path said the chord lane was missing, while another path was about to recommend cutting it further. The fix makes the creative loop safer and more reproducible without pretending the system can hear taste.

Reusable profile seed: For rich app proof loops: run deterministic sweeps first, use the same setup actions for candidate review, restore state before every candidate, treat candidate ranking as review order only, and reject candidates that violate required active lanes before any human approval surrogate can promote them.
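
The restore-before-every-candidate step can be sketched as below. This is a minimal sketch, assuming mixer state is a plain track-to-level map; the type names, function names, and the bass level are illustrative, and the real loop drives a browser rather than an in-memory object.

```typescript
// Assumed state shape: a plain map from track name to mixer level.
type MixerLevels = Record<string, number>;

interface LevelEdit {
  track: string;
  delta: number;
}

interface CandidateRun {
  edit: LevelEdit;
  levels: MixerLevels;
}

function cloneLevels(levels: MixerLevels): MixerLevels {
  const copy: MixerLevels = {};
  const keys = Object.keys(levels);
  for (let i = 0; i < keys.length; i++) copy[keys[i]] = levels[keys[i]];
  return copy;
}

// Round to two decimals so level arithmetic does not accumulate float noise.
function roundLevel(value: number): number {
  return Math.round(value * 100) / 100;
}

// Evaluate each candidate from a fresh copy of the original mix so no edit
// leaks into the next attempt.
function runIsolated(original: MixerLevels, edits: LevelEdit[]): CandidateRun[] {
  const runs: CandidateRun[] = [];
  for (let i = 0; i < edits.length; i++) {
    const levels = cloneLevels(original);
    levels[edits[i].track] = roundLevel((levels[edits[i].track] || 0) + edits[i].delta);
    runs.push({ edit: edits[i], levels });
  }
  return runs;
}

// chord 0.16 is the active floor from this catch; the bass level is made up.
const originalMix: MixerLevels = { chord: 0.16, bass: 0.5 };
const isolatedRuns = runIsolated(originalMix, [
  { track: "chord", delta: -0.1 },
  { track: "bass", delta: -0.18 },
]);
console.log(isolatedRuns[1].levels);
```

In the second run the chord level is still 0.16, because the first candidate's cut was applied to a copy and never to the original mix.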

What the browser run checked

  • Ran the production candidate batch after the chord 0.08 promotion and saw deep exploration fail on Monkberry intro windows where chord was required but inactive.
  • Ran a bounded current-target level sweep and found chord 0.12 and above restored required chord activity, while 0.08 and 0.10 failed preservation.
  • Superseded the active chord 0.08 durable override and added chord 0.16 as the active preservation candidate.
  • Added candidate-loop isolation so every claim candidate starts from the original mixer levels instead of inheriting previous attempts.
  • Added ratchet-loop source preparation so baseline and candidate renders use the same loaded sample-source path as deep exploration.
  • Added regression tests for candidate isolation and source-preparation failure before candidate search.
  • Merged LilArcade PR #513, waited for deployment, and ran production current-target and candidate batches against https://lilarcade.com.
  • Verified production current-target proof passed with chord 0.16, zero findings, no clipping, and no low-level window.
  • Verified production candidate review rejected chord -0.10 for required_instruments_preserved and kept ranking as review_order_only.

Proof lesson

Claim-candidate loops must use the same source-preparation and state-isolation preconditions as the guardrail proofs they depend on. Otherwise the packet can rank a candidate for listening review under easier conditions than the current-target audit or deep sweep.

| Artifact | Type | What it proves |
| --- | --- | --- |
| Before deep exploration summary | JSON metadata | Shows the production deterministic finding: Monkberry intro windows required chord, but chord was inactive at the chord 0.08 durable override. |
| Before deep exploration report | Markdown summary | Human-readable version of the pre-fix deep exploration failure. |
| Before source-prep review packet | Markdown summary | Shows the review packet still recommending chord -0.10 before the ratchet loop used the prepared audio-source path. |
| Before source-prep packet data | JSON metadata | Structured packet data behind the inconsistent chord -0.10 recommendation. |
| Final production candidate batch | JSON metadata | Records the deployed candidate batch passing with zero deterministic findings and a review-order recommendation of bass -0.18. |
| Final production review packet | Markdown summary | Shows the human-readable packet after the fix: chord -0.10 is rejected and ranking remains review-order only. |
| Final production packet data | JSON metadata | Preserves supported and rejected candidates, including the required_instruments_preserved rejection for chord -0.10. |
| Final claim-candidate proof | JSON proof | Captures the browser proof receipt for the deployed claim-candidate loop. |
| Final current-target summary | JSON metadata | Shows the deployed active override at chord 0.16 with zero findings, no clipping, and no low-level window. |
| Final current-target proof | JSON proof | Captures the browser proof receipt for the active durable override after deployment. |
| Generated current-target profile | JSON profile | Shows the generated profile that asserted the active chord 0.16 durable override. |
| Final claim-candidate screenshot | PNG screenshot | Shows the deployed Neon target used for the final source-prepared claim-candidate proof. |
| Final current-target screenshot | PNG screenshot | Shows the deployed Neon current target used to prove chord 0.16 was active. |

Catch 12

Neon current-target proof hid its own nested receipts

Neon current-target proof hid its own nested receipts evidence screenshot
May 25, 2026 · local run + production proof · LilArcade · Riddle Proof · artifact handoff · audio guardrails
Plain-English catch card

Neon current-target proof hid its own nested receipts

Passing proof is not enough if the handoff cannot find the proof.

What went wrong
After the Neon chord 0.08 durable override was applied locally, the current-target proof passed but the batch artifact index only listed the three top-level batch files.
What Riddle caught
Before the fix, the post-apply current-target run for the new chord 0.08 override passed with current_target_ready, one active override, zero findings, peak 0.7817, headroom 2.14 dB, clipping false, and lowLevel false, but artifactIndex contained only 3 entries.
Why it matters
Riddle Proof found a handoff failure after the product proof passed: the app state was correct, but the evidence packet was too shallow for reliable review.
What changed
For durable current-target audits: always index nested per-target receipts, not just batch-level summaries.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt

Claim: A Neon durable current-target audit should prove the live app sees the active chord 0.08 source override and should expose the nested proof receipts needed to review that claim.

Bug: After the Neon chord 0.08 durable override was applied locally, the current-target proof passed but the batch artifact index only listed the three top-level batch files. The actual per-override proof receipt, generated profile, console capture, DOM summary, and screenshots were nested under the durable-current-target step and missing from the reviewer-facing artifact index.

Why normal checks missed it: The proof itself was green, so a normal pass/fail check would have stopped there. The weakness only appeared when treating the batch summary as the handoff surface a reviewer or agent would use to inspect the evidence after promotion.

Why this sells Riddle Proof: Riddle Proof found a handoff failure after the product proof passed: the app state was correct, but the evidence packet was too shallow for reliable review. The fix makes promoted creative changes easier to audit without claiming the mix is automatically better.

Reusable profile seed: For durable current-target audits: always index nested per-target receipts, not just batch-level summaries. The summary should answer what changed, what was proved, what stayed guarded, where the screenshots are, and what still requires human taste review.
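
Indexing nested receipts can be sketched as a depth-first walk over the batch steps. The step shape, file paths, and function name below are assumptions for illustration only; the actual batch output layout is not shown in this manifest.

```typescript
// Hypothetical batch shape: a step carries its own artifacts and may nest
// further steps (for example a durable-current-target step).
interface ProofArtifact {
  path: string;
  kind: string;
}

interface BatchStep {
  stepName: string;
  artifacts: ProofArtifact[];
  children?: BatchStep[];
}

interface IndexedArtifact {
  step: string;
  path: string;
  kind: string;
}

// Walk the step tree depth-first so nested receipts land in the same index
// as the batch-level summaries.
function indexArtifacts(step: BatchStep, prefix: string = ""): IndexedArtifact[] {
  const label = prefix ? prefix + "/" + step.stepName : step.stepName;
  const out: IndexedArtifact[] = [];
  for (let i = 0; i < step.artifacts.length; i++) {
    out.push({ step: label, path: step.artifacts[i].path, kind: step.artifacts[i].kind });
  }
  const children = step.children || [];
  for (let i = 0; i < children.length; i++) {
    const nested = indexArtifacts(children[i], label);
    for (let j = 0; j < nested.length; j++) out.push(nested[j]);
  }
  return out;
}

// Illustrative paths only.
const sampleBatch: BatchStep = {
  stepName: "batch",
  artifacts: [
    { path: "batch-summary.json", kind: "JSON metadata" },
    { path: "batch-report.md", kind: "Markdown summary" },
    { path: "batch-index.json", kind: "JSON metadata" },
  ],
  children: [
    {
      stepName: "durable-current-target",
      artifacts: [
        { path: "durable-current-target/proof.json", kind: "JSON proof" },
        { path: "durable-current-target/screenshot.png", kind: "PNG screenshot" },
      ],
    },
  ],
};
console.log(indexArtifacts(sampleBatch).length);
```

An index that stops at the first loop would list only the three batch-level files, which is the shape of the failure described here.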

What the browser run checked

  • Ran the local promotion gate that recommended the reviewed chord -0.10 candidate from the active chord 0.18 state.
  • Applied the durable source patch that superseded chord 0.18 and made chord 0.08 active for Monkberry Moon Delight (Tab).
  • Ran a local post-apply current-target audit and saw the audit pass while the artifact index exposed only three top-level files.
  • Added durable-current-target artifact indexing for aggregate summaries, nested proof receipts, generated profiles, and screenshots.
  • Updated the batch artifact-index test and the Monkberry durable mix-profile assertion.
  • Merged LilArcade PR #512 and waited for the live deployment notification to complete.
  • Ran a production current-target proof against https://lilarcade.com and verified chord 0.08, zero findings, no clipping, no low-level window, and 14 indexed artifacts.

Proof lesson

Passing proof is not enough if the handoff cannot find the proof. Current-target audits need to index nested per-override receipts, screenshots, and aggregate summaries so a durable source change can be reviewed without spelunking through output directories.

| Artifact | Type | What it proves |
| --- | --- | --- |
| Before current-target summary | JSON metadata | Shows the green post-apply local audit with only three top-level artifact-index entries. |
| Final production batch summary | JSON metadata | Shows the deployed current-target proof with chord 0.08 active, zero findings, and 14 indexed artifacts. |
| Final production report | Markdown summary | Gives the reviewer-facing current-target summary after nested artifact indexing was fixed. |
| Durable current-target summary | JSON metadata | Records the active durable override id, expected chord level, mix health, and zero findings. |
| Production proof receipt | JSON proof | Captures the browser proof receipt for the deployed current target. |
| Generated current-target profile | JSON profile | Shows the generated profile that asserted the active chord 0.08 durable override. |
| Production current-target screenshot | PNG screenshot | Shows the deployed Neon target used by the final current-target proof. |

Catch 13

Neon review packets hid candidate evidence in raw JSON

Neon review packets hid candidate evidence in raw JSON evidence screenshot
May 25, 2026 · local run · LilArcade · Riddle Proof · human review · audio guardrails
Plain-English catch card

Neon review packets hid candidate evidence in raw JSON

Human-review packets are proof artifacts, not summaries after the fact.

What went wrong
The local Neon preliminary ratchet could produce a valid human-review packet, but the Markdown artifact only showed the recommendation and counts.
What Riddle caught
The before packet from the preliminary Neon loop recommended chord -0.10 and counted two supported candidates, but had no Supported Candidates table.
Why it matters
Riddle Proof improved the handoff between objective browser proof and human creative judgment: the system still does not decide taste, but it gives reviewers the evidence needed to make the listening decision quickly.
What changed
For creative proof loops: treat human-review Markdown as a first-class artifact.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt

Claim: A Neon human-review packet should carry the concrete candidate evidence a listener needs for review in Markdown, not only in raw JSON, while preserving that objective receipts do not prove subjective mix taste.

Bug: The local Neon preliminary ratchet could produce a valid human-review packet, but the Markdown artifact only showed the recommendation and counts. Candidate actions, measured target movement, receipt pass/fail status, and ranking values were present in JSON but hidden from the reviewer-readable packet.

Why normal checks missed it: The proof run was green and the packet said the right proof/taste boundary. The weakness only showed up when using the artifact for its actual purpose: deciding whether a supported candidate is worth listening to without reading raw JSON.

Why this sells Riddle Proof: Riddle Proof improved the handoff between objective browser proof and human creative judgment: the system still does not decide taste, but it gives reviewers the evidence needed to make the listening decision quickly.

Reusable profile seed: For creative proof loops: treat human-review Markdown as a first-class artifact. Include candidate rows, action deltas, target-movement deltas, pass/fail receipt summaries, ranking-as-review-order, and explicit caveats that objective guardrails are not taste.
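
Rendering candidate rows into the reviewer-facing Markdown can be sketched as below. The row fields, column names, function name, and sample values are assumptions for illustration and are not the table format emitted by @riddledc/riddle-proof-packs.

```typescript
// Hypothetical row shape for one candidate in a review packet.
interface ReviewCandidate {
  action: string; // e.g. "chord -0.10"
  targetMovement: string; // measured movement summary
  receiptsPassed: number;
  receiptsTotal: number;
  reviewOrder: number; // review order only, not a taste score
}

// Render a Markdown section with one table row per candidate, sorted by
// review order, followed by the proof/taste caveat.
function renderCandidateTable(title: string, rows: ReviewCandidate[]): string {
  const lines: string[] = [
    "## " + title,
    "",
    "| Review order | Action | Target movement | Receipts |",
    "| --- | --- | --- | --- |",
  ];
  const sorted = rows.slice().sort((a, b) => a.reviewOrder - b.reviewOrder);
  for (let i = 0; i < sorted.length; i++) {
    const r = sorted[i];
    lines.push(
      "| " + r.reviewOrder + " | " + r.action + " | " + r.targetMovement +
        " | " + r.receiptsPassed + "/" + r.receiptsTotal + " passed |"
    );
  }
  lines.push("", "Ranking is review order only; objective guardrails are not taste.");
  return lines.join("\n");
}

// Illustrative rows; the measured values are made up.
const sampleCandidates: ReviewCandidate[] = [
  { action: "bass -0.05", targetMovement: "bass level lowered", receiptsPassed: 4, receiptsTotal: 4, reviewOrder: 2 },
  { action: "chord -0.10", targetMovement: "chord level lowered", receiptsPassed: 4, receiptsTotal: 4, reviewOrder: 1 },
];
console.log(renderCandidateTable("Supported Candidates", sampleCandidates));
```

Because the caveat is emitted by the same function as the rows, a packet cannot show a ranking without also stating that the ranking is not a taste verdict.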

What the browser run checked

  • Compared the earlier preliminary packet Markdown against its JSON and confirmed candidate rows were hidden from the human-readable artifact.
  • Added reusable supported and rejected candidate tables to @riddledc/riddle-proof-packs human-review Markdown.
  • Published @riddledc/riddle-proof-packs@0.5.2 through trusted publishing.
  • Updated LilArcade to consume the published package and assert table rows in the Neon review-packet tests.
  • Ran a real local Neon prelim-candidate batch after the package bump.
  • Verified the final Markdown packet lists candidate action, target movement, receipt status, and ranking while retaining review_order_only and taste caveats.

Proof lesson

Human-review packets are proof artifacts, not summaries after the fact. Creative ratchets need to show supported and rejected candidates, target movement, receipt status, and ranking-as-review-order directly in Markdown while still refusing to call the mix automatically better.

| Artifact | Type | What it proves |
| --- | --- | --- |
| Before review packet | Markdown summary | Shows the earlier reviewer-facing packet with recommendation, counts, and boundary language, but no candidate evidence table. |
| Before review packet data | JSON metadata | Shows that the candidate details existed structurally but required raw JSON inspection. |
| Final review packet | Markdown summary | Shows the supported-candidates table in the human-readable packet, including action, target movement, receipts, and ranking. |
| Final review packet data | JSON metadata | Preserves the structured review packet behind the Markdown table. |
| Final batch summary | JSON metadata | Records preliminary_candidate_ready, chord -0.10 recommendation, two supported candidates, zero rejected candidates, and state restoration. |
| Final proof receipt | JSON metadata | Captures the browser proof receipt for the generated preliminary candidate target. |
| Generated ratchet profile | JSON profile | Shows the generated profile that produced the final local proof run. |
| Final claim-candidate screenshot | PNG screenshot | Shows the Neon target used for the final preliminary candidate proof. |
| Review packet table report | Markdown summary | Summarizes the catch, package fix, final receipt, and proof/taste boundary. |

Catch 14

Neon batch confused patch-plan identity with current target proof

Neon batch confused patch-plan identity with current target proof evidence screenshot
May 24, 2026 · local run · LilArcade · Riddle Proof · audio guardrails · local ratchet
Plain-English catch card

Neon batch confused patch-plan identity with current target proof

A proof batch needs to distinguish planned durable edits from already-active current-target evidence.

What went wrong
The local Neon ratchet batch could show a new durable patch plan and a durable current-target proof with the same override id even though they referred to different mixer levels: the proposed future patch wanted chord 0.08 while the active current target still proved chord 0.18.
What Riddle caught
Before the fix, the full local batch produced a durable patch plan for chord 0.18 -> 0.08 with override id monkberry-moon-delight-tab-chord-minus-01-approved-candidate, while the current-target proof used the same id for the active chord 0.18 override.
Why it matters
Riddle Proof caught a workflow-level ambiguity: the product state was healthy, but the proof handoff could mislabel a future edit as the same durable object as the current target.
What changed
For creative app patch handoffs: generate durable ids from absolute target state, compare planned edits against current-target receipts, fail on id collisions with different state, and label unapplied plans as handoffs rather than live proof.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt

Claim: A Neon ratchet batch should keep durable patch plans and durable current-target proofs semantically distinct, and should fail if the same override id points at different mixer levels.

Bug: The local Neon ratchet batch could show a new durable patch plan and a durable current-target proof with the same override id even though they referred to different mixer levels: the proposed future patch wanted chord 0.08 while the active current target still proved chord 0.18.

Why normal checks missed it: Each step looked reasonable alone. The approved-candidate proof produced a valid listening-review handoff, and the current-target proof correctly verified the already-active source override. The mismatch only appeared when the batch stitched the patch-plan and current-target roles together.

Why this sells Riddle Proof: Riddle Proof caught a workflow-level ambiguity: the product state was healthy, but the proof handoff could mislabel a future edit as the same durable object as the current target. The fix makes local batching safer before deployment without pretending to judge musical taste.

Reusable profile seed: For creative app patch handoffs: generate durable ids from absolute target state, compare planned edits against current-target receipts, fail on id collisions with different state, and label unapplied plans as handoffs rather than live proof.
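
Absolute-state ids and the plan comparison can be sketched as below. The id format and function names are assumptions. The statuses `planned_override_id_collision` and `planned_override_not_applied_yet` come from this catch, while `planned_override_matches_current_target` is a hypothetical third status added so the sketch is total.

```typescript
interface OverrideRef {
  id: string;
  track: string;
  level: number;
}

// Build an id from the absolute from/to levels instead of the delta, so
// repeating the same delta from a new starting point yields a new id.
// The "0p18-to-0p08" format is an assumption for illustration.
function absoluteOverrideId(song: string, track: string, from: number, to: number): string {
  const fmt = (n: number): string => n.toFixed(2).replace(".", "p");
  return song + "-" + track + "-" + fmt(from) + "-to-" + fmt(to);
}

type PlanComparison =
  | "planned_override_id_collision"
  | "planned_override_not_applied_yet"
  | "planned_override_matches_current_target"; // hypothetical status

// Compare a planned durable edit against the active current-target proof.
function comparePlan(planned: OverrideRef, current: OverrideRef): PlanComparison {
  const sameState = planned.track === current.track && planned.level === current.level;
  if (planned.id === current.id && !sameState) return "planned_override_id_collision";
  if (!sameState) return "planned_override_not_applied_yet";
  return "planned_override_matches_current_target";
}

// The delta-based id from this catch names two different chord levels.
const deltaId = "monkberry-moon-delight-tab-chord-minus-01-approved-candidate";
const activeTarget: OverrideRef = { id: deltaId, track: "chord", level: 0.18 };
const collidingPlan: OverrideRef = { id: deltaId, track: "chord", level: 0.08 };
const absolutePlan: OverrideRef = {
  id: absoluteOverrideId("monkberry-moon-delight-tab", "chord", 0.18, 0.08),
  track: "chord",
  level: 0.08,
};
console.log(comparePlan(collidingPlan, activeTarget));
console.log(comparePlan(absolutePlan, activeTarget));
```

With delta-only naming, a second chord -0.10 cut from a new starting level reuses the first cut's id. Encoding both endpoints in the id removes that reuse, and the comparison catches any collision that remains.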

What the browser run checked

  • Ran a full local Neon ratchet batch with approved-candidate and durable-current-target gates included.
  • Observed that the old durable patch plan reused the active current-target override id for a different absolute chord level.
  • Changed durable patch-plan override ids from delta-only naming to absolute from/to level naming.
  • Added a batch comparison receipt between the planned durable patch and active current-target proof results.
  • Added a deterministic failure path for planned_override_id_collision.
  • Verified that a not-yet-applied planned patch is reported as planned_override_not_applied_yet rather than confused with the current target.
  • Re-ran the focused browser batch and verified the planned chord 0.08 patch and active chord 0.18 current-target proof were clearly separated.

Proof lesson

A proof batch needs to distinguish planned durable edits from already-active current-target evidence. Repeated deltas are not stable identities for creative changes; durable handoff ids need absolute target evidence, and the batch should fail if one id names two different level states.

| Artifact | Type | What it proves |
| --- | --- | --- |
| Before batch summary | JSON metadata | Shows the full local batch before the comparison receipt existed, including the approved patch plan and durable current-target proof in one run. |
| Before durable patch plan | JSON metadata | Records the planned chord 0.08 durable edit using the same delta-based override id as the active chord 0.18 current-target override. |
| Before durable current-target summary | JSON metadata | Records the active current-target override at chord 0.18, proving the same id referred to the already-active source state. |
| Final batch summary | JSON metadata | Shows the corrected batch output with planComparison planned_override_not_applied_yet and no current-target findings. |
| Final durable patch plan | JSON metadata | Records the corrected planned override id using absolute 0.18-to-0.08 level identity. |
| Final human review packet | JSON metadata | Preserves the approved-candidate packet while keeping ranking as review order, not a taste verdict. |
| Final current-target proof | JSON metadata | Records the active durable current-target proof with chord 0.18, zero findings, no clipping, and no low-level window. |
| Final approved-candidate screenshot | PNG screenshot | Shows the rich Neon target used for the approved-candidate handoff that created the corrected patch plan. |
| Patch-plan identity report | Markdown summary | Gives a compact reviewer-readable account of the catch, the fix, and the proof/taste boundary. |

Catch 15

Neon durable mix proof could not prove profile-source agreement

May 24, 2026 · local + production run · LilArcade · Riddle Proof · app contract · audio guardrails
Plain-English catch card

Neon durable mix proof could not prove profile-source agreement

Durable creative edits need a current-target proof that checks both runtime state and source/profile descriptors.

What went wrong
The durable current-target proof could see the approved chord level in the live Neon app and visible UI, but the proof contract did not expose mixProfile.mixerLevels, so it could not prove the source/profile descriptor agreed with the running mixer state.
What Riddle caught
The first production run of npm run proof:sequencer:durable-current-target failed with selectedSongMatches true, mixProfileMatches true, contractMatches true, visibleMatches true, actualLevel 0.18, visibleToken 0.18X, but profileMatches false because profileLevel was null.
Why it matters
Riddle Proof found a proof-contract weakness in a healthy-looking applied mix: the UI and live state were right, but the durable source/profile evidence was not strong enough yet.
What changed
For durable app edits: prove the current target by comparing the durable record, route identity, live app contract state, source/profile descriptors, visible UI tokens, render metrics, and artifact summary parser behavior.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt
Claim: A durable approved Neon mix override should be provable on the current production target through live contract state, source/profile descriptor state, visible UI text, and basic offline render guardrails.

Bug: The durable current-target proof could see the approved chord level in the live Neon app and visible UI, but the proof contract did not expose mixProfile.mixerLevels, so it could not prove the source/profile descriptor agreed with the running mixer state.

Why normal checks missed it: The production app looked correct: Monkberry Moon Delight (Tab) loaded, the mix profile id matched, the live contract level was chord 0.18, and the UI showed 0.18X. A weaker current-target smoke would have stopped there. The stricter durable override audit compared live state, visible text, and profile/source evidence together and found profileLevel null.

Why this sells Riddle Proof: Riddle Proof found a proof-contract weakness in a healthy-looking applied mix: the UI and live state were right, but the durable source/profile evidence was not strong enough yet. The fix makes future creative edits more auditable without pretending the proof has judged musical taste.

Reusable profile seed: For durable app edits: prove the current target by comparing the durable record, route identity, live app contract state, source/profile descriptors, visible UI tokens, render metrics, and artifact summary parser behavior. Keep subjective quality out of the verdict.

What the browser run checked

  • Ran a production durable current-target proof for the active Monkberry Tab chord 0.18 override.
  • Verified the selected song and mix profile id matched the durable override target.
  • Verified the live app contract level and visible UI token already showed chord 0.18 / 0.18X.
  • Caught that the profile/source descriptor did not expose mixerLevels, leaving profileLevel null and profileMatches false.
  • Caught that the first summary parser missed stored window_eval evidence because the runner reported return_stored_to rather than label.
  • Added a repeatable npm run proof:sequencer:durable-current-target command with current_target evidence-role language.
  • Exposed mixProfile.mixerLevels through Neon proof state, mixer state, and offline metric receipts.
  • Re-ran production proof after deploy and verified contractMatches, profileMatches, visibleMatches, non-clipping, and non-silence guardrails all passed.

Proof lesson

Durable creative edits need a current-target proof that checks both runtime state and source/profile descriptors. The result still does not judge taste; it proves the applied override is observable, source-backed, and inside objective guardrails.
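A minimal sketch of the three-way agreement check follows. The result names contractMatches, profileMatches, visibleMatches and the evidence names actualLevel, profileLevel, visibleToken mirror the receipts quoted above. The evidence shape, the tolerance, and the assumption that the UI token is the two-decimal level followed by "X" are illustrative guesses, not the proof contract itself.

```typescript
// Hypothetical evidence shape assembled from the receipts named in this catch.
interface CurrentTargetEvidence {
  expectedLevel: number;       // level recorded in the durable override
  actualLevel: number | null;  // live app contract state
  profileLevel: number | null; // source/profile descriptor (mixProfile.mixerLevels)
  visibleToken: string | null; // UI text, assumed to look like "0.18X"
}

interface CurrentTargetResult {
  contractMatches: boolean;
  profileMatches: boolean;
  visibleMatches: boolean;
  ok: boolean;
}

function checkCurrentTarget(e: CurrentTargetEvidence, tolerance: number = 1e-6): CurrentTargetResult {
  // A null level means the evidence was never exposed, which must count as a mismatch.
  const near = (v: number | null): boolean => v !== null && Math.abs(v - e.expectedLevel) <= tolerance;
  const contractMatches = near(e.actualLevel);
  const profileMatches = near(e.profileLevel);
  const visibleMatches = e.visibleToken === `${e.expectedLevel.toFixed(2)}X`;
  return { contractMatches, profileMatches, visibleMatches, ok: contractMatches && profileMatches && visibleMatches };
}

// First production run: UI and live state agree, but the profile descriptor is missing.
const firstRun = checkCurrentTarget({ expectedLevel: 0.18, actualLevel: 0.18, profileLevel: null, visibleToken: "0.18X" });
// After the contract exposed mixer levels.
const finalRun = checkCurrentTarget({ expectedLevel: 0.18, actualLevel: 0.18, profileLevel: 0.18, visibleToken: "0.18X" });

console.log(firstRun.profileMatches, firstRun.ok); // false false
console.log(finalRun.profileMatches, finalRun.ok); // true true
```

Treating a missing descriptor as a failure rather than a skip is the design choice that surfaced this catch: a check that quietly ignores null evidence would have passed the first run.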

Artifact | Type | What it proves
Initial durable current-target screenshot | PNG screenshot | Shows the production Neon target that looked visually correct while the profile/source proof receipt was still incomplete.
Initial failing profile result | JSON metadata | Preserves the returned check with actualLevel 0.18, visibleToken 0.18X, profileLevel null, and profileMatches false.
Initial failing summary | JSON metadata | Shows the top-level summary blind spot before the parser learned to read return_stored_to evidence.
Final durable current-target screenshot | PNG screenshot | Shows the same production Neon target after the proof contract exposed profile mixer levels.
Final passing profile result | JSON metadata | Records the final production pass with actualLevel 0.18, profileLevel 0.18, visibleMatches true, peak 0.7777, RMS 0.1112, clipping false, and lowLevel false.
Final command summary | JSON metadata | Records the repeatable command output: one active override, zero findings, and ready_for_promotion_review.
Durable current-target catch report | JSON metadata | Summarizes the proof-contract gap, PR #501 fix, validation commands, deploy notification, and final production receipt.
Durable current-target narrative | Markdown summary | Gives a compact reviewer-readable account of the catch while keeping the proof/taste boundary explicit.

Catch 16

Neon deep exploration found proof-window overclaim and hot presets

May 24, 2026 · local run · LilArcade · Riddle Proof · audio guardrails · local ratchet
Plain-English catch card

Neon deep exploration found proof-window overclaim and hot presets

One-piece-at-a-time ratchets are useful while shaping the contract, but once a round is clean the efficient move is a deeper local sweep before deploy.

What went wrong
The bounded Neon proof loop was clean, but a deeper all-current-song local sweep found one proof-window calibration overclaim and five additional built-in preset clipping regressions.
What Riddle caught
The failing local sweep sampled 6 available songs, 6 proof-capable songs, 19 parts, and 23 proof windows.
Why it matters
Riddle Proof found both proof weakness and product weakness: a declared-window overclaim and objective clipping in rich audio presets.
What changed
Use a two-speed ratchet for rich apps: keep a fast bounded current-target profile for normal iteration, then run a deeper local exploration profile that samples more app states, validates declared assumptions against live contract receipts, records deterministic findings, restores state, and leaves subjective taste to human review.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt
Claim: A deeper local Neon exploration proof should catch deterministic app/audio guardrail failures that the fast bounded loop can miss, then close with a repeatable command and a clean final receipt.

Bug: The bounded Neon proof loop was clean, but a deeper all-current-song local sweep found one proof-window calibration overclaim and five additional built-in preset clipping regressions.

Why normal checks missed it: The default ratchet was intentionally bounded for speed, and single-song smoke proofs only exercised the current target. They could miss later Monkberry parts whose active lanes differed from the declared proof window, plus hot presets outside the first sampled set.

Why this sells Riddle Proof: Riddle Proof found both proof weakness and product weakness: a declared-window overclaim and objective clipping in rich audio presets. The final command makes the expensive check repeatable locally, so deploys can be batched after a cleaner proof suite.

Reusable profile seed: Use a two-speed ratchet for rich apps: keep a fast bounded current-target profile for normal iteration, then run a deeper local exploration profile that samples more app states, validates declared assumptions against live contract receipts, records deterministic findings, restores state, and leaves subjective taste to human review.

What the browser run checked

  • Ran an all-current-song local Neon exploration sweep against the live Drum Sequencer target.
  • Sampled 6 available songs, 6 proof-capable songs, 19 parts, and 23 proof windows in the failing run.
  • Classified a Monkberry Tab declared-window mismatch as profile_calibration rather than product taste.
  • Classified Yakety Yak Drop/Resolve, Dark Progression A/B, and Midnight Protocol A clipping as product_regression guardrail failures.
  • Validated declared proof windows against per-part active lanes before accepting required-active claims.
  • Lowered hot built-in preset levels and reran the deeper sweep.
  • Added a repeatable npm run proof:sequencer:deep-explore command so this deeper local ratchet can run before batching deploys.

Proof lesson

One-piece-at-a-time ratchets are useful while shaping the contract, but once a round is clean the efficient move is a deeper local sweep before deploy. The deeper pass should batch deterministic guardrail failures while still saying nothing about subjective mix taste.
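The two finding classes from this sweep can be sketched as one pass over per-part receipts. The receipt shape, the lane names, the sample values, and the 1.0 linear-peak clipping threshold are assumptions for illustration. The profile_calibration and product_regression labels are the ones the run used.

```typescript
type FindingKind = "profile_calibration" | "product_regression";

interface Finding {
  kind: FindingKind;
  part: string;
  detail: string;
}

// Hypothetical per-part receipt; field names are illustrative.
interface PartReceipt {
  part: string;
  activeLanes: string[];    // lanes the live contract reports as active for this part
  requiredActive: string[]; // lanes the declared proof window claims are active
  peak: number;             // linear peak of the rendered window
}

function sweepFindings(parts: PartReceipt[], clipAt: number = 1.0): Finding[] {
  const findings: Finding[] = [];
  for (const p of parts) {
    // A declared window that requires a lane the part never activates is a proof overclaim.
    const missing = p.requiredActive.filter((lane) => p.activeLanes.indexOf(lane) === -1);
    if (missing.length > 0) {
      findings.push({ kind: "profile_calibration", part: p.part, detail: `declared lanes not active: ${missing.join(", ")}` });
    }
    // Clipping is an objective guardrail failure in the product, not a taste call.
    if (p.peak >= clipAt) {
      findings.push({ kind: "product_regression", part: p.part, detail: `peak ${p.peak} reaches clip threshold ${clipAt}` });
    }
  }
  return findings;
}

// Made-up sample data, not values from the actual run.
const sampleParts: PartReceipt[] = [
  { part: "Song A / Part 1", activeLanes: ["drums", "bass"], requiredActive: ["drums", "bass"], peak: 0.82 },
  { part: "Song A / Part 2", activeLanes: ["drums"], requiredActive: ["drums", "guitar"], peak: 0.7 },
  { part: "Song B / Drop", activeLanes: ["drums", "chord"], requiredActive: ["drums"], peak: 1.04 },
];

for (const f of sweepFindings(sampleParts)) {
  console.log(f.kind, f.part, f.detail);
}
```

Keeping the two kinds separate matters for triage: a calibration finding means the proof profile needs fixing, while a regression finding means the app does.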

Artifact | Type | What it proves
Failing deep exploration screenshot | PNG screenshot | Shows the local Neon target used when the deeper sweep surfaced declared-window and clipping findings.
Failing deep exploration receipt | JSON metadata | Records the product_regression proof with 6 findings across 19 sampled parts and 23 sampled windows.
Failing deep profile result | JSON metadata | Preserves the setup-action receipt where runExplorationSweep returned the declared-window mismatch and hot-preset clipping findings.
Deep exploration catch report | JSON metadata | Summarizes the failing coverage, each deterministic finding, the clean final coverage, and the repeatable command.
Final deep exploration screenshot | PNG screenshot | Shows the same local Neon target after the deeper sweep closed with 0 findings.
Final deep exploration receipt | JSON metadata | Records the final passed profile after the proof-window validation and preset headroom fixes.
Final deep exploration command summary | JSON metadata | Records the repeatable command output: 6 proof-capable songs, 19 sampled parts, 22 sampled windows, 0 findings, and restoration ok.
Deep exploration narrative | Markdown summary | Gives a compact reviewer-readable account of the catch, the cleanup, and the objective-guardrail boundary.

Catch 17

Neon playback proof could pass without proving playback

May 24, 2026 · local run · LilArcade · Riddle Proof · interaction proof
Plain-English catch card

Neon playback proof could pass without proving playback

Interaction proofs need action-specific receipts.

What went wrong
The Neon playback-sync proof pack profile could pass after clicking Play even when the captured app contract still reported playback stopped: post-action isPlaying false and trainer currentStep 0.
What Riddle caught
The before receipt passed with 7 setup actions and no Stop text wait, while post-action contract evidence still showed isPlaying false and currentStep 0.
Why it matters
Riddle Proof caught a flaw in the proof, not just the product.
What changed
For interaction proofs: capture pre-action state, perform the user action, wait for a visible UI transition, capture post-action app-contract state, and assert both the state value and movement/delta implied by the claim.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt
Claim: A Neon playback interaction proof should not pass merely because Play was clicked; it should prove the visible control switched to Stop and the app contract reports playback running with the trainer playhead advanced.

Bug: The Neon playback-sync proof pack profile could pass after clicking Play even when the captured app contract still reported playback stopped: post-action isPlaying false and trainer currentStep 0.

Why normal checks missed it: The route loaded, the Play button was visible, the click did not throw, the screenshot was captured, and the profile only asserted that post-playback evidence existed. It did not wait for the visible Stop state or assert that playback was actually running and advancing.

Why this sells Riddle Proof: Riddle Proof caught a flaw in the proof, not just the product. The fix turns a vague click-and-screenshot interaction into a deterministic receipt that the target state actually changed.

Reusable profile seed: For interaction proofs: capture pre-action state, perform the user action, wait for a visible UI transition, capture post-action app-contract state, and assert both the state value and movement/delta implied by the claim.

What the browser run checked

  • Re-ran the old synced playback-sync profile against a local LilArcade dev server and captured a passing false positive.
  • Confirmed the old proof had no wait_for_text receipt for Stop after clicking Play.
  • Confirmed the old post-action contract capture still returned isPlaying false and trainer currentStep 0.
  • Verified the app itself could play by checking a direct browser sanity run where Play All changed to Stop and the trainer playhead advanced.
  • Hardened the reusable Neon playback-sync pack to normalize nested trainer fields, wait for Stop, assert isPlaying true, and assert movedForward true.
  • Published the corrected proof pack as @riddledc/riddle-proof-packs@0.4.8 through the trusted changesets flow.
  • Synced LilArcade to the published pack and reran the local playback proof from the app-local profile.

Proof lesson

Interaction proofs need action-specific receipts. For playback, the proof should pair a visible UI transition with live app-contract state, then assert the state changed in the claimed direction.
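As a rough sketch, the hardened playback assertion pairs the visible transition with a state value and a state delta. The receipt shape and function name below are assumptions. The isPlaying, currentStep, and movedForward names follow the receipts quoted in this catch.

```typescript
interface PlaybackState {
  isPlaying: boolean;
  currentStep: number;
}

// Hypothetical receipt assembled by the proof run around the Play click.
interface InteractionReceipt {
  sawStopText: boolean;  // did the visible control switch to Stop after the click?
  before: PlaybackState; // contract state captured before the action
  after: PlaybackState;  // contract state captured after the UI transition
}

interface PlaybackVerdict {
  ok: boolean;
  movedForward: boolean;
  failures: string[];
}

function judgePlayback(r: InteractionReceipt): PlaybackVerdict {
  // Simplification: a looping playhead could wrap past the last step; this sketch ignores that case.
  const movedForward = r.after.currentStep > r.before.currentStep;
  const failures: string[] = [];
  if (!r.sawStopText) failures.push("visible control never switched to Stop");
  if (!r.after.isPlaying) failures.push("contract reports isPlaying false after Play");
  if (!movedForward) failures.push("trainer playhead did not advance");
  return { ok: failures.length === 0, movedForward, failures };
}

// Shape of the old false positive: clicked Play, but nothing proved playback.
const oldReceipt = judgePlayback({
  sawStopText: false,
  before: { isPlaying: false, currentStep: 0 },
  after: { isPlaying: false, currentStep: 0 },
});
// Shape of the hardened receipt: Stop visible, playing, playhead at step 2.
const hardenedReceipt = judgePlayback({
  sawStopText: true,
  before: { isPlaying: false, currentStep: 0 },
  after: { isPlaying: true, currentStep: 2 },
});

console.log(oldReceipt.ok, oldReceipt.failures.length); // false 3
console.log(hardenedReceipt.ok, hardenedReceipt.movedForward); // true true
```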

Artifact | Type | What it proves
False-positive playback receipt | JSON metadata | Records the old passing proof with no Stop wait and post-action isPlaying false, currentStep 0 evidence.
False-positive playback screenshot | PNG screenshot | Shows why a screenshot alone was too weak to prove the playback claim.
Hardened playback receipt | JSON metadata | Records the corrected proof with visible Stop text, isPlaying true, currentStep 2, and movedForward true.
Hardened playback screenshot | PNG screenshot | Shows the running Neon target after the hardened interaction proof captured the playback state.
Playback proof catch report | JSON metadata | Summarizes the before/after receipts, package release, LilArcade sync, validation commands, and does-not-prove boundary.
Playback proof catch narrative | Markdown summary | Gives a compact reviewer-readable account of the proof-profile false positive and the hardened interaction-proof pattern.

Catch 18

Neon app profiles drifted from the reusable proof pack

May 24, 2026 · local run · LilArcade · Riddle Proof · proof packs
Plain-English catch card

Neon app profiles drifted from the reusable proof pack

Reusable proof packs need a synchronization gate in the target app.

What went wrong
LilArcade had only five local Neon Riddle Proof profiles while the published Neon Step Sequencer pack had nine, and the existing local files were missing newer pack checks, receipts, and metadata.
What Riddle caught
LilArcade PR #492 added a profile sync/check script, synced the full nine-profile Neon pack surface, and wired npm test through proof:sequencer:check-profiles.
Why it matters
Riddle Proof caught a workflow bug in the proof system itself: the reusable pack had advanced, but the app was no longer carrying the whole proof surface.
What changed
For apps consuming proof packs: generate local profile fixtures from the pack, store the source pack profile in metadata, fail test runs on drift, and run at least one representative current-target proof after syncing.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt
Claim: A target app that consumes reusable Riddle Proof packs should fail tests when its local proof profiles drift from the published pack surface.

Bug: LilArcade had only five local Neon Riddle Proof profiles while the published Neon Step Sequencer pack had nine, and the existing local files were missing newer pack checks, receipts, and metadata.

Why normal checks missed it: The app tests and individual proof runs could still pass because they only exercised whichever local profile files happened to exist. Nothing forced the app-local profiles to stay generated from the reusable pack, so coverage could quietly lag behind the framework work.

Why this sells Riddle Proof: Riddle Proof caught a workflow bug in the proof system itself: the reusable pack had advanced, but the app was no longer carrying the whole proof surface. The fix turns that drift into a deterministic failing test.

Reusable profile seed: For apps consuming proof packs: generate local profile fixtures from the pack, store the source pack profile in metadata, fail test runs on drift, and run at least one representative current-target proof after syncing.

What the browser run checked

  • Compared the local LilArcade Neon profile set against the published neon_step_sequencer pack.
  • Found four pack profiles missing from the app-local .riddle-proof/profiles directory.
  • Found five local profiles out of sync with newer pack checks, receipts, and metadata.
  • Added a deterministic generator that maps neon-step-sequencer-* pack profiles to lilarcade-neon-* local profiles.
  • Added a check mode that fails on missing, stale, or unexpected local Neon profile files.
  • Wired npm test through proof:sequencer:check-profiles so drift becomes a normal app test failure.
  • Re-ran local Riddle Proof fast mix health and source-readiness profiles after sync.

Proof lesson

Reusable proof packs need a synchronization gate in the target app. Otherwise the pack can improve while the app keeps running stale local profiles and loses coverage without an obvious failure.
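A synchronization gate of this kind might look like the sketch below. It assumes profiles can be compared as serialized strings keyed by name. The neon-step-sequencer-* to lilarcade-neon-* mapping comes from the run above, while the specific profile suffixes and contents in the example are made up.

```typescript
interface DriftReport {
  missing: string[];    // in the pack, absent locally
  stale: string[];      // present locally, content differs from the pack
  unexpected: string[]; // present locally, not generated from any pack profile
  ok: boolean;
}

// Map a pack profile name to its app-local name.
function toLocalName(packName: string): string {
  return packName.replace(/^neon-step-sequencer-/, "lilarcade-neon-");
}

// Both arguments map profile name -> serialized profile content (an assumption of this sketch).
function checkProfileDrift(pack: Record<string, string>, local: Record<string, string>): DriftReport {
  const missing: string[] = [];
  const stale: string[] = [];
  const expected: Record<string, boolean> = {};
  for (const packName of Object.keys(pack)) {
    const localName = toLocalName(packName);
    expected[localName] = true;
    if (!(localName in local)) {
      missing.push(localName);
    } else if (local[localName] !== pack[packName]) {
      stale.push(localName);
    }
  }
  const unexpected = Object.keys(local).filter((n) => !expected[n]);
  const ok = missing.length === 0 && stale.length === 0 && unexpected.length === 0;
  return { missing, stale, unexpected, ok };
}

// Hypothetical profile names and contents for illustration.
const packProfiles: Record<string, string> = {
  "neon-step-sequencer-fast-mix-health": "v2",
  "neon-step-sequencer-source-readiness": "v1",
};
const localProfiles: Record<string, string> = {
  "lilarcade-neon-fast-mix-health": "v1",
  "lilarcade-neon-scratch": "v1",
};

const driftReport = checkProfileDrift(packProfiles, localProfiles);
console.log(driftReport.ok);         // false
console.log(driftReport.missing);    // [ 'lilarcade-neon-source-readiness' ]
console.log(driftReport.stale);      // [ 'lilarcade-neon-fast-mix-health' ]
console.log(driftReport.unexpected); // [ 'lilarcade-neon-scratch' ]
```

A test-suite wrapper would then exit non-zero whenever ok is false, which is what turns quiet coverage lag into an ordinary failing test.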

Artifact | Type | What it proves
Profile sync report | JSON metadata | Records the missing profiles, stale profiles, PR, merge commit, validation commands, proof metrics, and does-not-prove boundary.
Profile sync narrative | Markdown summary | Summarizes the catch in the same compact form a reviewer can read without opening the full PR diff.
Synced fast mix health screenshot | PNG screenshot | Shows the running Neon target after the synced fast current-target profile passed.
Synced fast mix health receipt | JSON metadata | Records the synced fast profile run: 7 checks, 9 setup actions, source preparation ok, RMS 0.1236, peak 0.8356, clipping false, and lowLevel false.
Synced source-readiness screenshot | PNG screenshot | Shows one of the profiles that was missing locally before the sync gate was added.
Synced source-readiness receipt | JSON metadata | Records the newly local source-readiness proof: source helper available, preparation ok, required source states idle, and clean browser health.

Catch 19

Neon mix candidates needed a durable source handoff

May 24, 2026 · local run · LilArcade · Riddle Proof · human review
Plain-English catch card

Neon mix candidates needed a durable source handoff

A creative proof loop should separate three things: objective receipts, human or surrogate approval, and durable application.

What went wrong
The Neon ratchet loop could produce and apply a proof-backed mix candidate inside the running browser, but that was still not enough to make the change a durable, reviewable source edit.
What Riddle caught
Run 007 produced a human-review packet with status candidate_applied_for_listening_review: Monkberry Moon Delight (Tab), candidate chord -0.10, mixer action chord 0.38 -> 0.28, 6 supported candidates, 0 rejected candidates, approval mode mixing_canon_surrogate, approvedCandidateApplied true, candidateActionsAreTransient false, and ranking role review_order_only.
Why it matters
Riddle Proof did not pretend the chord reduction was artistically better.
What changed
For proof-backed creative edits: generate bounded candidates, produce a human-review packet, require explicit approval metadata before durable application, write a scoped source patch, and re-run a current-target proof to verify the app now sees the durable state.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt
Claim: A proof-backed Neon mix candidate should not become product state until there is an explicit applied-candidate packet, a narrow durable source patch, and a final current-target proof showing the running app sees the durable level.

Bug: The Neon ratchet loop could produce and apply a proof-backed mix candidate inside the running browser, but that was still not enough to make the change a durable, reviewable source edit.

Why normal checks missed it: A normal browser proof can stop once the app state says the approved candidate was applied. That misses the last-mile question: whether the accepted candidate became a narrow source-level patch, whether its approval basis stayed attached, and whether the current target still passes after the durable edit lands.

Why this sells Riddle Proof: Riddle Proof did not pretend the chord reduction was artistically better. It made the path from candidate receipt to source patch observable, constrained, reversible, and auditable.

Reusable profile seed: For proof-backed creative edits: generate bounded candidates, produce a human-review packet, require explicit approval metadata before durable application, write a scoped source patch, and re-run a current-target proof to verify the app now sees the durable state.

What the browser run checked

  • Started from an approved human-review packet rather than a transient candidate packet.
  • Required candidate_applied_for_listening_review, listen_to_applied_candidate, approvedCandidateApplied true, candidateActionsAreTransient false, approval metadata, and review_order_only ranking.
  • Rejected broad patch plans by requiring a selected song and mixProfileId before writing the durable handoff.
  • Applied the chord level 0.28 override only to Monkberry Moon Delight (Tab), leaving the broader Sheet import at chord level 0.38.
  • Ran the full LilArcade test suite and production build after the source edit.
  • Ran a local Riddle Proof current-target profile after the durable edit and verified the browser proof contract saw chordLevel 0.28 with no clipping or low-level audio window.

Proof lesson

A creative proof loop should separate three things: objective receipts, human or surrogate approval, and durable application. Riddle Proof can prove that a candidate changed a measurable mix control and stayed inside guardrails; the handoff should still say that musical taste needs listening review.
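The approval gate before durable application could be sketched like this. The status and ranking values are the ones reported in the review packet above. The packet field names other than approvedCandidateApplied and candidateActionsAreTransient, and the profile id in the example, are assumptions for illustration.

```typescript
// Hypothetical packet shape; only some field names appear in the real receipts.
interface ReviewPacket {
  status: string;
  approvedCandidateApplied: boolean;
  candidateActionsAreTransient: boolean;
  approvalMode: string | null; // e.g. "mixing_canon_surrogate"
  rankingRole: string;
  selectedSong: string | null;
  mixProfileId: string | null;
}

// Return every reason to refuse a durable handoff; an empty list means the packet may proceed.
function refusalReasons(p: ReviewPacket): string[] {
  const reasons: string[] = [];
  if (p.status !== "candidate_applied_for_listening_review") reasons.push("packet is not an applied-candidate packet");
  if (!p.approvedCandidateApplied) reasons.push("approved candidate was not applied");
  if (p.candidateActionsAreTransient) reasons.push("candidate actions are still transient");
  if (!p.approvalMode) reasons.push("approval metadata is missing");
  if (p.rankingRole !== "review_order_only") reasons.push("ranking must be review order, not a verdict");
  if (!p.selectedSong || !p.mixProfileId) reasons.push("patch scope is too broad: song and mix profile are required");
  return reasons;
}

const approvedPacket: ReviewPacket = {
  status: "candidate_applied_for_listening_review",
  approvedCandidateApplied: true,
  candidateActionsAreTransient: false,
  approvalMode: "mixing_canon_surrogate",
  rankingRole: "review_order_only",
  selectedSong: "Monkberry Moon Delight (Tab)",
  mixProfileId: "example-profile-id", // placeholder, not the real id
};

// Same packet with the scope stripped, which should be refused as too broad.
const unscopedPacket: ReviewPacket = { ...approvedPacket, selectedSong: null, mixProfileId: null };

console.log(refusalReasons(approvedPacket).length); // 0
console.log(refusalReasons(unscopedPacket));
```

Collecting reasons instead of returning a single boolean keeps the refusal itself reviewable, in line with the receipt-first pattern used elsewhere in this diary.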

Artifact | Type | What it proves
Approved Neon candidate screenshot | PNG screenshot | Shows the running Neon target used for the approved claim-candidate loop before source persistence.
Approved human-review packet | JSON metadata | Records the explicit applied-candidate status, approval mode, chord 0.38 -> 0.28 action, review-order ranking role, and taste caveat.
Durable mix patch plan | JSON metadata | Shows the validated handoff from approved packet to a scoped source-file override, including the refusal boundary for non-approved packets.
Durable current-target screenshot | PNG screenshot | Shows the Neon target after the durable source edit landed locally and the current-target proof still passed.
Durable current-target receipt | JSON metadata | Records the final proof: selected song Monkberry Moon Delight (Tab), chordLevel 0.28, peak 0.8303, RMS 0.1234, clipping false, lowLevel false, and healthy route/layout checks.
Durable proof summary | Markdown summary | Summarizes the post-source-edit current-target proof in a compact reviewable form.

Catch 20

Neon Step Sequencer had hidden clipping in built-in mixes

May 24, 2026 · local run · LilArcade · Riddle Proof · audio guardrails
Plain-English catch card

Neon Step Sequencer had hidden clipping in built-in mixes

Audio proof should separate objective guardrails from taste.

What went wrong
A bounded Neon Step Sequencer exploration sweep found objective clipping in built-in song presets even though the UI loaded, audio sources prepared, and the main Monkberry Tab proof windows were healthy.
What Riddle caught
Run 005 first exposed proof and app-contract gaps while making the exploration sweep real: arbitrary song/part states needed tempo/bar-count normalization, and saved/song snapshots preserved rhythmSynthEnabled but not bass/chord/guitar lane flags.
Why it matters
Riddle Proof caught an objective audio regression in a running app without pretending to judge taste: it found clipping, explained the weak proof layers that had to be fixed first, and ended with inspectable before/final receipts.
What changed
Medium term, Neon should become the audio proof lab for Riddle Proof: bounded current-target sweeps, interaction snapshots for edits, claim-candidate loops, app-contract diagnostics, and human-review packets that prove what changed and what stayed inside guardrails while leaving musical taste to people.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt
Claim: A bounded Neon current-target exploration sweep should sample multiple songs and parts, preserve lane state, render offline proof windows, and fail when objective audio guardrails such as clipping are violated.

Bug: A bounded Neon Step Sequencer exploration sweep found objective clipping in built-in song presets even though the UI loaded, audio sources prepared, and the main Monkberry Tab proof windows were healthy.

Why normal checks missed it: A route smoke, a single selected-song proof, or a subjective listen pass could all miss this. The issue surfaced only after the proof contract swept multiple song/part states, normalized historical snapshots, rendered offline audio windows, and treated peak/headroom receipts as guardrails.

Why this sells Riddle Proof: Riddle Proof caught an objective audio regression in a running app without pretending to judge taste: it found clipping, explained the weak proof layers that had to be fixed first, and ended with inspectable before/final receipts.

Reusable profile seed: Medium term, Neon should become the audio proof lab for Riddle Proof: bounded current-target sweeps, interaction snapshots for edits, claim-candidate loops, app-contract diagnostics, and human-review packets that prove what changed and what stayed inside guardrails while leaving musical taste to people.

What the browser run checked

  • Loaded /games/drum-sequencer with the Monkberry Tab trainer target on a local dev server.
  • Verified the Neon proof contract was available and exposed runExplorationSweep.
  • Prepared audio sources for drums samples, bass/chord/guitar hybrid sources, and voice_oohs vocal source.
  • Swept 4 songs and 8 song/part entries with bounded proof windows.
  • Captured active-instrument receipts, peak/headroom receipts, clipping receipts, prioritized findings, screenshots, console health, and layout overflow.
  • Required the final profile to assert __neonProof.exploration.ok === true so product findings fail the proof, not only appear inside captured JSON.
  • Re-ran after the app-contract normalization, lane-state preservation, and targeted mix-headroom changes until the sweep passed with 8/8 entries and 0 findings.

Proof lesson

Audio proof should separate objective guardrails from taste. Riddle Proof can prove that a running app prepared sources, rendered bounded audio windows, preserved lane activity, avoided clipping, and produced a confidence map. It should not claim that the mix is artistically better without human review.
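The objective side of that boundary reduces to a few numbers per rendered window. The sketch below computes peak, RMS, a clipping flag, and a low-level flag from raw samples. The thresholds (clipping at a linear peak of 1.0, low level below an RMS of 0.01) and the function names are assumptions, not the values Neon uses.

```typescript
interface WindowMetrics {
  peak: number;
  rms: number;
  clipping: boolean;
  lowLevel: boolean;
}

// Measure one rendered window of linear samples in the nominal range [-1, 1].
function measureWindow(samples: number[], clipAt: number = 1.0, lowLevelRms: number = 0.01): WindowMetrics {
  let peak = 0;
  let sumSquares = 0;
  for (const s of samples) {
    peak = Math.max(peak, Math.abs(s));
    sumSquares += s * s;
  }
  // An empty window counts as silence rather than dividing by zero.
  const rms = samples.length > 0 ? Math.sqrt(sumSquares / samples.length) : 0;
  return { peak, rms, clipping: peak >= clipAt, lowLevel: rms < lowLevelRms };
}

// The sweep passes only when every window stays inside both guardrails.
function explorationOk(windows: WindowMetrics[]): boolean {
  return windows.every((w) => !w.clipping && !w.lowLevel);
}

const healthyWindow = measureWindow([0.5, -0.5, 0.5, -0.5]);
const hotWindow = measureWindow([0.9, -1.2, 0.4, -0.3]);

console.log(healthyWindow.peak, healthyWindow.rms, healthyWindow.clipping); // 0.5 0.5 false
console.log(hotWindow.clipping);                                            // true
console.log(explorationOk([healthyWindow, hotWindow]));                     // false
```

Nothing in these numbers says whether the mix sounds good, which is the point: the guardrail can fail a proof deterministically while taste stays with a human listener.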

Artifact | Type | What it proves
Failing Neon exploration screenshot | PNG screenshot | Shows the local Neon target used for the sweep that surfaced clipping findings after the app-contract gaps were fixed.
Failing Neon exploration receipt | JSON metadata | Records the product_regression findings for Yakety Yak clipping and the bounded sweep evidence that made the issue deterministic.
Failing Neon exploration console capture | JSON logs | Shows the catch was not caused by fatal browser console failures.
Final Neon exploration screenshot | PNG screenshot | Shows the final local Neon target after the sweep closed with no prioritized findings.
Final Neon exploration receipt | JSON metadata | Records the final asserted pass: 4 songs, 8 song/part entries, 8 passed entries, 0 findings, and clean browser/layout health.
Final Neon exploration summary | Markdown summary | Summarizes the claim, evidence receipts, final peak/headroom table, and what the bounded sweep does not prove.

Catch 21

Ski Adventure touch input landed half a player width off

May 20, 2026 · < $0.01 · LilArcade · mobile input · gameplay geometry
Plain-English catch card

Ski Adventure touch input landed half a player width off

Input proofs should check geometry, not only movement.

What went wrong
Ski Adventure responded to touch movement, but the skier landed one half-width left of the finger because the touch handler subtracted half the player width before CSS translateX(-50%) applied the same offset visually.
What Riddle caught
Initial phone job job_7e220c74 proved live trees existed, then dragged on .game-area from ratio (0.5, 0.82) to (0.8, 0.82). The rendered skier center landed about half a player width left of the drag target.
Why it matters
Riddle Proof caught a subtle mobile gameplay-feel bug that a clickability or movement smoke would likely miss: the player moved, but under the user's finger it felt wrong.
What changed
For mobile games: prove the moving world is live, dispatch a real touch gesture, read both intended input geometry and rendered actor geometry, set a tight alignment threshold, and repeat on phone/tablet viewports.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt

Claim: A mobile Ski Adventure drag should place the visual skier center at the finger target, not merely move the skier in the right direction.

Bug: Ski Adventure responded to touch movement, but the skier landed one half-width left of the finger because the touch handler subtracted half the player width before CSS translateX(-50%) applied the same offset visually.

Why normal checks missed it: A basic mobile smoke could prove the skier moved and the game stayed playable. The regression only appeared when the proof compared the drag target against the rendered player center after proving live obstacles existed.

Why this sells Riddle Proof: Riddle Proof caught a subtle mobile gameplay-feel bug that a clickability or movement smoke would likely miss: the player moved, but under the user's finger it felt wrong.

Reusable profile seed: For mobile games: prove the moving world is live, dispatch a real touch gesture, read both intended input geometry and rendered actor geometry, set a tight alignment threshold, and repeat on phone/tablet viewports.

What the browser run checked

  • Loaded Ski Adventure and proved the gameplay scene had live randomized trees before isolating the input contract.
  • Dispatched a real touch drag on .game-area from the center lane toward the right lane.
  • Measured targetX, rendered player center, playerX, player width, and alignment error after the gesture.
  • Failed production when alignmentError was about half the player width even though movement happened.
  • Re-ran the fixed production matrix across desktop, phone, iPad Mini, and iPad.
  • Required touch-aligned phone/tablet evidence, 0px overflow, no page errors, no fatal console errors, and no console warnings.

Proof lesson

Input proofs should check geometry, not only movement. For touch games, prove live gameplay first, dispatch a real gesture, then compare intended target coordinates to the rendered actor center.
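
The double offset can be reproduced with plain arithmetic. The sketch below is illustrative only: it models `translateX(-50%)` as a half-width shift. The pixel values, the 2px threshold, and the shape of the fixed handler are assumptions, not values from the receipts.

```python
def rendered_center_x(player_x: float, player_width: float) -> float:
    """Sprite center when CSS applies left: player_x plus translateX(-50%)."""
    rendered_left = player_x - player_width / 2  # effect of translateX(-50%)
    return rendered_left + player_width / 2


def buggy_player_x(target_x: float, player_width: float) -> float:
    # Bug pattern: the handler subtracts half the width, then CSS does it again.
    return target_x - player_width / 2


def fixed_player_x(target_x: float) -> float:
    # Assumed fix: store the finger target and let CSS do the centering.
    return target_x


def alignment_error(target_x: float, center_x: float) -> float:
    return abs(target_x - center_x)


TARGET_X = 312.0      # hypothetical finger position in px
PLAYER_WIDTH = 48.0   # hypothetical sprite width in px
THRESHOLD_PX = 2.0    # hypothetical "tight alignment" threshold

buggy_error = alignment_error(
    TARGET_X, rendered_center_x(buggy_player_x(TARGET_X, PLAYER_WIDTH), PLAYER_WIDTH)
)
fixed_error = alignment_error(
    TARGET_X, rendered_center_x(fixed_player_x(TARGET_X), PLAYER_WIDTH)
)
```

A movement-only check passes in both cases because the skier does move toward the target; only the comparison against the threshold separates them.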

| Artifact | Type | What it proves |
| --- | --- | --- |
| Failing Ski Adventure touch alignment screenshot | PNG screenshot | Shows the live mobile gameplay state used for the failing touch-alignment measurement. |
| Failing Ski Adventure touch alignment receipt job_7e220c74 | JSON metadata | Records targetX, visualCenterX, playerX, playerWidth, and the half-width alignment error from the failing production phone proof. |
| Failing Ski Adventure touch alignment console capture | JSON logs | Preserves browser-health evidence from the failing production proof. |
| Final Ski Adventure touch alignment screenshot | PNG screenshot | Shows the fixed production phone run after the skier center aligned with the touch target. |
| Final Ski Adventure touch alignment receipt job_f9b9b6ce | JSON metadata | Records the fixed matrix proof with phone and tablet alignment errors near zero plus clean layout and browser health. |
| Final Ski Adventure touch alignment console capture | JSON logs | Preserves browser-health evidence from the final production proof. |

Catch 22

Coin Clicker dashboard milestone ETA used wrong source of truth

May 20, 2026 · < $0.01 · LilArcade · dashboard math · state seeding
Plain-English catch card

Coin Clicker dashboard milestone ETA used wrong source of truth

Dashboards need source-of-truth proofs, not only visible-label proofs.

What went wrong
Coin Clicker Math Dashboard selected the next total-earned milestone correctly, but computed the ETA from current spendable coins instead of lifetime totalCoins after the player had spent currency on upgrades.
What Riddle caught
Initial desktop job job_7e208368 seeded a saved Coin Clicker state with 100,000 current coins, 250,000 lifetime totalCoins, and spent upgrades. The ETA assertion failed because the displayed ETA was derived from spendable coins.
Why it matters
Riddle Proof caught a source-of-truth math bug in a healthy-looking dashboard by seeding realistic spent-state data and checking the arithmetic contract behind the displayed copy.
What changed
For dashboards: seed persisted state with intentionally divergent source metrics, assert both the selected row/label and the derived calculation, then prove reset or cleanup so stale seed data cannot mask the result.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt

Claim: The Coin Clicker dashboard should compute next total-earned milestone ETA from lifetime totalCoins, even when current spendable coins are lower because the player bought upgrades.

Bug: Coin Clicker Math Dashboard selected the next total-earned milestone correctly, but computed the ETA from current spendable coins instead of lifetime totalCoins after the player had spent currency on upgrades.

Why normal checks missed it: The dashboard looked healthy: the route loaded, upgrades rendered, and the next milestone was the right 1.00M target. The bug only surfaced after seeding realistic prior-spend state and checking the arithmetic behind the ETA text.

Why this sells Riddle Proof: Riddle Proof caught a source-of-truth math bug in a healthy-looking dashboard by seeding realistic spent-state data and checking the arithmetic contract behind the displayed copy.

Reusable profile seed: For dashboards: seed persisted state with intentionally divergent source metrics, assert both the selected row/label and the derived calculation, then prove reset or cleanup so stale seed data cannot mask the result.

What the browser run checked

  • Seeded persisted Coin Clicker state with current coins lower than lifetime totalCoins after upgrade spending.
  • Loaded the Math Dashboard and required the next milestone to be the total-earned 1.00M target.
  • Asserted the displayed ETA matched the remaining lifetime-total-earned distance rather than spendable coins.
  • Checked dashboard what-if values and reset-cleared receipts so the seeded state did not leave stale UI behind.
  • Re-ran the fixed production matrix across desktop, phone, iPad Mini, and iPad.
  • Required 0px overflow, no page errors, no fatal console errors, and no console warnings.

Proof lesson

Dashboards need source-of-truth proofs, not only visible-label proofs. Seed realistic persisted state, prove the chosen milestone, and assert the displayed estimate uses the same metric as the milestone definition.
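
A worked version of the arithmetic, using the seeded values from the failing receipt. This is a sketch under stated assumptions: the earning rate of 250 coins per second and the `milestone_eta_seconds` helper are hypothetical, chosen only to make the two sources visibly diverge.

```python
import math


def milestone_eta_seconds(progress: float, milestone: float, coins_per_second: float) -> float:
    """Seconds until `progress` reaches `milestone` at a constant earning rate."""
    if coins_per_second <= 0:
        return math.inf
    return max(milestone - progress, 0) / coins_per_second


# Seeded state from the receipt: spendable coins lag lifetime totalCoins.
state = {"coins": 100_000, "totalCoins": 250_000}
MILESTONE = 1_000_000   # the 1.00M total-earned target
RATE = 250              # hypothetical coins per second

correct_eta = milestone_eta_seconds(state["totalCoins"], MILESTONE, RATE)  # lifetime source
buggy_eta = milestone_eta_seconds(state["coins"], MILESTONE, RATE)         # spendable source
```

With identical `coins` and `totalCoins` the two results match, which is why the seed has to make the metrics diverge before the assertion means anything.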

| Artifact | Type | What it proves |
| --- | --- | --- |
| Failing Coin Clicker milestone ETA screenshot | PNG screenshot | Shows the seeded dashboard state where the visible next milestone was correct but the ETA source was wrong. |
| Failing Coin Clicker milestone ETA receipt job_7e208368 | JSON metadata | Records the failing ETA assertion, seeded spent-state values, and dashboard evidence from production. |
| Failing Coin Clicker milestone ETA console capture | JSON logs | Preserves browser-health evidence from the failing production proof. |
| Final Coin Clicker milestone ETA screenshot | PNG screenshot | Shows the fixed production dashboard displaying the total-earned ETA from the correct source. |
| Final Coin Clicker milestone ETA receipt job_7e5005e1 | JSON metadata | Records the fixed matrix proof across desktop, phone, iPad Mini, and iPad with the corrected ETA and reset-cleared receipts. |
| Final Coin Clicker milestone ETA console capture | JSON logs | Preserves browser-health evidence from the final production proof. |

Catch 23

Dashboard retry copy had no retry button

May 19, 2026 · < $0.01 · Riddle site · Dashboard · actionable recovery
Plain-English catch card

Dashboard retry copy had no retry button

Recovery copy should include a local recovery action when the user can retry without leaving the page.

What went wrong
The authenticated Dashboard correctly stopped showing a failed recent-jobs load as an empty account, but the recovery copy told users to try again without rendering a "Retry recent jobs" button.
What Riddle caught
Initial production job job_246afd37 loaded /dashboard/ across desktop, phone, iPad Mini, and iPad with valid balance and API-key mocks while /v1/jobs?limit=10 returned HTTP 503. The unavailable copy rendered, but the retry-button assertions failed on all four viewports.
Why it matters
Riddle Proof caught a practical recovery UX gap: the dashboard no longer lied about failed history, but it still left users without the local retry action the copy promised.
What changed
For dashboard recovery states: fail one list endpoint while adjacent account data succeeds, require honest unavailable copy plus an actionable retry control, click the control, and assert stale error/empty/backend text disappears after recovery.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt

Claim: A Dashboard recent-jobs load failure should render an actionable retry affordance, and clicking it should recover the list without disturbing balance or API-key data.

Bug: The authenticated Dashboard correctly stopped showing a failed recent-jobs load as an empty account, but the recovery copy told users to try again without rendering a "Retry recent jobs" button.

Why normal checks missed it: The page looked broadly healthy: auth worked, balance loaded, API keys loaded, the recent-jobs section showed an honest unavailable message, raw backend details stayed hidden, layout stayed stable, and browser health was clean. The issue only surfaced when the proof required the recovery state to be actionable, not merely truthful.

Why this sells Riddle Proof: Riddle Proof caught a practical recovery UX gap: the dashboard no longer lied about failed history, but it still left users without the local retry action the copy promised.

Reusable profile seed: For dashboard recovery states: fail one list endpoint while adjacent account data succeeds, require honest unavailable copy plus an actionable retry control, click the control, and assert stale error/empty/backend text disappears after recovery.

What the browser run checked

  • Loaded authenticated Dashboard across desktop, phone, iPad Mini, and iPad.
  • Mocked billing balance as successful with 2h 55m and one active job.
  • Mocked /v1/jobs?limit=10 as HTTP 503 with a synthetic recent-jobs outage.
  • Mocked API keys as successful with Riddle Proof v556 existing key.
  • Required "Recent jobs unavailable. Please try again.", a visible .recent-jobs-section .jobs-retry-button, and "Retry recent jobs" in the recovery surface.
  • Rejected "No jobs yet", raw backend message/code, "[object Object]", "Application error", horizontal overflow, warnings, page errors, and fatal console errors.
  • Re-ran the fixed matrix contract on final production, then ran a focused click-through proof where the first jobs response failed and the retry response returned job_v556_retry_recovered.

Proof lesson

Recovery copy should include a local recovery action when the user can retry without leaving the page. Honest error text is only half the contract for account dashboards.
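
The fail-then-success click-through can be modeled without a browser. The sketch below is a simplified stand-in, not the Dashboard's code: `RecentJobsSection` and `fail_then_succeed` are hypothetical names. The copy strings and the recovered job ID come from the run notes above.

```python
from typing import Callable, List

UNAVAILABLE_COPY = "Recent jobs unavailable. Please try again."
RETRY_LABEL = "Retry recent jobs"


class RecentJobsSection:
    """Hypothetical model of a list section with a local retry action."""

    def __init__(self, fetch_jobs: Callable[[], List[str]]):
        self._fetch_jobs = fetch_jobs
        self.jobs: List[str] = []
        self.error = False
        self.load()

    def load(self) -> None:
        """Initial load and retry share one path, so recovery clears the error flag."""
        try:
            self.jobs = self._fetch_jobs()
            self.error = False
        except RuntimeError:
            self.jobs = []
            self.error = True

    def visible_text(self) -> List[str]:
        if self.error:
            # Honest copy plus the action the copy promises.
            return [UNAVAILABLE_COPY, RETRY_LABEL]
        return self.jobs or ["No jobs yet"]


def fail_then_succeed() -> Callable[[], List[str]]:
    """Mock endpoint: first call fails like HTTP 503, later calls return one job."""
    calls = {"count": 0}

    def fetch() -> List[str]:
        calls["count"] += 1
        if calls["count"] == 1:
            raise RuntimeError("HTTP 503")
        return ["job_v556_retry_recovered"]

    return fetch


section = RecentJobsSection(fail_then_succeed())
before_click = section.visible_text()
section.load()  # stands in for clicking the retry control
after_click = section.visible_text()
```

Checking `after_click` for the absence of the unavailable copy mirrors the stale-text assertion in the focused click-through proof.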

| Artifact | Type | What it proves |
| --- | --- | --- |
| Failing Dashboard retry affordance screenshot | PNG screenshot | Shows production rendering unavailable recent-jobs copy without the "Retry recent jobs" affordance. |
| Failing Dashboard retry affordance receipt job_246afd37 | JSON metadata | Records the missing retry-button assertions across desktop, phone, iPad Mini, and iPad while balance and API-key evidence stayed healthy. |
| Failing Dashboard retry affordance console capture | JSON logs | Preserves browser-health evidence from the failing proof. |
| Final Dashboard retry affordance screenshot | PNG screenshot | Shows final production rendering the retry affordance in the unavailable recent-jobs state. |
| Final Dashboard retry affordance receipt job_1b771150 | JSON metadata | Shows the fixed recovery state passed the four-viewport matrix contract with the retry button visible. |
| Retry click before screenshot | PNG screenshot | Captures the retryable unavailable state immediately before the scripted click. |
| Retry click after screenshot | PNG screenshot | Shows the recovered recent-jobs table after clicking "Retry recent jobs". |
| Retry click receipt job_8e924e2f | JSON metadata | Records the fail-then-success mock sequence, click action, recovered row, stale-copy absence, and clean browser/layout checks. |

Catch 24

Dashboard recent jobs failure looked like no jobs

May 19, 2026 · < $0.01 · Riddle site · Dashboard · account-state honesty
Plain-English catch card

Dashboard recent jobs failure looked like no jobs

List-load failures are not empty states.

What went wrong
The authenticated Dashboard rendered the empty-account recent-jobs state after the recent jobs endpoint failed, even though balance and API-key data loaded normally.
What Riddle caught
Initial production job job_ea46646e loaded /dashboard/ across desktop, phone, iPad Mini, and iPad with valid balance and API-key mocks while /v1/jobs?limit=10 returned HTTP 503. The recent-jobs section rendered the empty "No jobs yet" state instead of unavailable copy.
Why it matters
Riddle Proof caught an account-state lie in a Dashboard recent-jobs surface: a failed history load was shown as an empty account while other account data looked healthy.
What changed
For dashboard list sections: fail one list endpoint while adjacent account data succeeds, require an unavailable/retry state, forbid empty-state copy and backend leakage, and prove responsive/browser health across the viewport matrix.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt

Claim: A Dashboard recent-jobs load failure should show unavailable-history recovery, not the empty no-jobs account state, while balance and API-key sections remain usable.

Bug: The authenticated Dashboard rendered the empty-account recent-jobs state after the recent jobs endpoint failed, even though balance and API-key data loaded normally.

Why normal checks missed it: Most of the dashboard looked healthy: auth worked, balance loaded, API keys loaded, layout stayed stable, and the failed jobs request was just one section of a larger page. A shallow dashboard smoke could see useful account data and miss that a service outage was represented as no user activity.

Why this sells Riddle Proof: Riddle Proof caught an account-state lie in a Dashboard recent-jobs surface: a failed history load was shown as an empty account while other account data looked healthy.

Reusable profile seed: For dashboard list sections: fail one list endpoint while adjacent account data succeeds, require an unavailable/retry state, forbid empty-state copy and backend leakage, and prove responsive/browser health across the viewport matrix.

What the browser run checked

  • Loaded authenticated Dashboard across desktop, phone, iPad Mini, and iPad.
  • Mocked billing balance as successful with 2h 15m and one active job.
  • Mocked /v1/jobs?limit=10 as HTTP 503 with a synthetic unavailable-history response.
  • Mocked API keys as successful with Riddle Proof v553 existing key.
  • Required "Recent jobs unavailable. Please try again.", API-key continuity, and one .recent-jobs-section .dashboard-inline-error.
  • Rejected "No jobs yet", raw backend message/code, "[object Object]", "Application error", horizontal overflow, warnings, page errors, and fatal console errors.
  • Re-ran the same contract on final production after recentJobsError separated list-load failure from empty-list state.

Proof lesson

List-load failures are not empty states. Account dashboards should distinguish unavailable history from an empty account while preserving independent account data and keeping raw backend details out of the UI.
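
One way to keep the two states apart is to carry the load error separately from the list, then check the rendered text against required and rejected strings. The sketch below is an assumption-laden illustration: the function names and the sample page text are hypothetical, while the copy strings come from the contract above.

```python
from typing import List, Optional


def recent_jobs_state(jobs: Optional[List[dict]], load_error: bool) -> str:
    """Decide the section state; a load error wins over an empty list."""
    if load_error:
        return "unavailable"
    if not jobs:
        return "empty"
    return "list"


def contract_violations(page_text: str, required: List[str], rejected: List[str]) -> List[str]:
    """Positive and negative text checks in one pass."""
    missing = [f"missing: {s}" for s in required if s not in page_text]
    leaked = [f"forbidden: {s}" for s in rejected if s in page_text]
    return missing + leaked


REQUIRED = ["Recent jobs unavailable. Please try again."]
REJECTED = ["No jobs yet", "[object Object]", "Application error"]

# Hypothetical flattened page text for the failing and fixed runs.
failing_page = "Balance 2h 15m. API keys: 1. No jobs yet"
fixed_page = "Balance 2h 15m. API keys: 1. Recent jobs unavailable. Please try again."

failing_report = contract_violations(failing_page, REQUIRED, REJECTED)
fixed_report = contract_violations(fixed_page, REQUIRED, REJECTED)
```

Pairing the required and rejected lists is what catches the failing run here: healthy balance and API-key text alone would pass a positive-only check.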

| Artifact | Type | What it proves |
| --- | --- | --- |
| Failing Dashboard recent-jobs load recovery | PNG screenshot | Shows production rendering "No jobs yet" after the recent-jobs endpoint failed while balance and API keys still loaded. |
| Failing Dashboard recent-jobs receipt job_ea46646e | JSON metadata | Records the 503 recent-jobs mock, visible empty-state copy, missing unavailable copy, healthy adjacent account data, and clean layout evidence. |
| Failing Dashboard recent-jobs console capture | JSON logs | Preserves browser-health evidence from the failing recent-jobs load proof. |
| Final Dashboard recent-jobs load recovery | PNG screenshot | Shows final production rendering the unavailable recent-jobs state while balance and API keys remain visible. |
| Final Dashboard recent-jobs receipt job_24bf862f | JSON metadata | Shows final production passed the recent-jobs load-failure contract across desktop, phone, iPad Mini, and iPad. |
| Final Dashboard recent-jobs console capture | JSON logs | Shows the fixed Dashboard stayed warning-clean and fatal-console-clean while exercising the failed recent-jobs endpoint. |

Catch 25

Playground screenshot hid secondary terminal evidence

May 19, 2026 · < $0.01 · Riddle site · Playground · evidence honesty
Plain-English catch card

Playground screenshot hid secondary terminal evidence

No screenshots does not mean no proof evidence.

What went wrong
The authenticated Playground Screenshot flow received a terminal completed_error JSON response with console and HAR evidence but no screenshot, then treated it like an async job receipt and kept waiting instead of rendering the returned evidence.
What Riddle caught
Initial production job job_96fb3328 loaded /playground/ across desktop, phone, iPad Mini, and iPad, submitted sync:true Screenshot requests 4/4 with console and HAR included, and received terminal completed_error JSON with job_rp551_sync_screenshot_secondary_only, one console log, one warning, one HAR entry, and no screenshot.
Why it matters
Riddle Proof caught secondary evidence being hidden when a terminal Screenshot response returned no image but did return console and HAR artifacts.
What changed
For API consoles with optional screenshot evidence: return terminal JSON with no screenshots but useful console/HAR evidence, require immediate receipt rendering and secondary evidence reviewability, and assert selector polarity plus browser health in the same viewport matrix.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt

Claim: A Playground sync Screenshot response with terminal completed_error status, no screenshot, and secondary console/HAR evidence should render an error receipt immediately instead of falling into async polling.

Bug: The authenticated Playground Screenshot flow received a terminal completed_error JSON response with console and HAR evidence but no screenshot, then treated it like an async job receipt and kept waiting instead of rendering the returned evidence.

Why normal checks missed it: The request itself was correct: auth worked, Screenshot mode submitted sync:true with console and HAR included, the mock was hit exactly once per viewport, and the response carried a job ID plus useful secondary artifacts. A shallow API or click smoke could stop at the successful submission and miss that the visible receipt never appeared.

Why this sells Riddle Proof: Riddle Proof caught secondary evidence being hidden when a terminal Screenshot response returned no image but did return console and HAR artifacts.

Reusable profile seed: For API consoles with optional screenshot evidence: return terminal JSON with no screenshots but useful console/HAR evidence, require immediate receipt rendering and secondary evidence reviewability, and assert selector polarity plus browser health in the same viewport matrix.

What the browser run checked

  • Loaded authenticated Playground Screenshot mode across desktop, phone, iPad Mini, and iPad.
  • Submitted /v1/run with sync:true, include:["console","har"], and URL https://example.com/rp551-screenshot-secondary-only, then capped the mocked request at exactly four viewport hits.
  • Returned HTTP 200 JSON with status completed_error, success:false, job_rp551_sync_screenshot_secondary_only, one console log, one console warning, one HAR request, billing metadata, raw response data, and no screenshot.
  • Required "Result", "Error", exact service message, job ID, partial results available, "No screenshots captured", "Console Output", "Network HAR", HAR URL evidence, and "Raw Response (Debug)".
  • Rejected "Success", empty console/HAR copy, "Failed at step", "Application error", screenshot items, loading state, horizontal overflow, console warnings, page errors, and fatal console errors.
  • Re-ran the same contract on final production after terminal Screenshot JSON was routed through the shared sync result renderer.

Proof lesson

No screenshots does not mean no proof evidence. Terminal receipts should render returned console, HAR, raw response, job ID, and service error evidence immediately, even when image evidence is absent.
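
The routing decision behind this lesson can be sketched as a status check that runs before any polling. Everything below is illustrative: the response field names (`screenshots`, `console`, `har`), the `queued` status, and both helper functions are assumptions. The two terminal statuses and the section labels appear in this manifest.

```python
from typing import List

# Terminal statuses named in this manifest; other terminal values are not assumed here.
TERMINAL_STATUSES = {"completed_error", "completed_timeout"}


def next_action(response: dict) -> str:
    """Render terminal responses immediately; only non-terminal ones fall into polling."""
    if response.get("status") in TERMINAL_STATUSES:
        return "render_receipt"
    return "poll"


def evidence_sections(response: dict) -> List[str]:
    """List the evidence sections a receipt should show, even without an image."""
    sections = []
    if response.get("screenshots"):        # hypothetical field name
        sections.append("Screenshots")
    else:
        sections.append("No screenshots captured")
    if response.get("console"):            # hypothetical field name
        sections.append("Console Output")
    if response.get("har"):                # hypothetical field name
        sections.append("Network HAR")
    return sections


secondary_only = {
    "status": "completed_error",
    "success": False,
    "job_id": "job_rp551_sync_screenshot_secondary_only",
    "console": [{"level": "log"}, {"level": "warning"}],
    "har": [{"url": "https://example.com/rp551-screenshot-secondary-only"}],
}
```

The absence of a screenshot only changes one section label; it does not decide whether the receipt renders.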

| Artifact | Type | What it proves |
| --- | --- | --- |
| Failing Screenshot secondary-only terminal receipt | PNG screenshot | Shows production after the terminal response was submitted but before any useful result receipt rendered. |
| Failing Screenshot secondary-only receipt job_96fb3328 | JSON metadata | Records the correct sync Screenshot request body, four mock hits, terminal completed_error response, missing .result-state, and clean browser/layout evidence. |
| Failing Screenshot secondary-only console capture | JSON logs | Preserves browser-health evidence from the failing terminal Screenshot proof. |
| Final Screenshot secondary-only terminal receipt | PNG screenshot | Shows final production rendering the error receipt, no-screenshot state, console evidence, and HAR evidence after the fix. |
| Final Screenshot secondary-only receipt job_80b21e0f | JSON metadata | Shows final production passed the terminal secondary-evidence contract across desktop, phone, iPad Mini, and iPad. |
| Final Screenshot secondary-only console capture | JSON logs | Shows the fixed Playground stayed warning-clean and fatal-console-clean while exercising terminal Screenshot JSON with no image evidence. |

Catch 26

Playground screenshot leaked malformed success body

May 19, 2026 · < $0.01 · Riddle site · Playground · recovery honesty
Plain-English catch card

Playground screenshot leaked malformed success body

Handled action recovery needs content proof, not only an error box.

What went wrong
The authenticated Playground Screenshot flow handled an HTTP 200 response with malformed JSON by exposing parser-specific failure text and the raw malformed response body to the user.
What Riddle caught
Initial production job job_cccf7a47 loaded /playground/ across desktop, phone, iPad Mini, and iPad, submitted sync:true Screenshot requests 4/4 with console and HAR included, and mocked HTTP 200 application/json with a malformed body.
Why it matters
Riddle Proof caught raw-response leakage in a malformed-success recovery that otherwise looked visibly handled and browser-clean.
What changed
For API consoles and action forms: return HTTP 200 with malformed JSON, require action-specific generic recovery, forbid parser/raw/object leaks and contradictory success UI, and keep selector polarity plus browser health in the same proof.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt

Claim: A Playground sync Screenshot response with HTTP 200 but malformed JSON should render one generic screenshot failure, not parser-specific or raw response text, while avoiding success states and keeping browser evidence clean.

Bug: The authenticated Playground Screenshot flow handled an HTTP 200 response with malformed JSON by exposing parser-specific failure text and the raw malformed response body to the user.

Why normal checks missed it: Most of the flow looked healthy: authentication worked, Screenshot mode submitted the right sync request, the error container appeared, mock counts matched, browser health was clean, and layout stayed stable. A shallow recovery check would see a handled error state and miss that the recovery copy leaked internal response details.

Why this sells Riddle Proof: Riddle Proof caught raw-response leakage in a malformed-success recovery that otherwise looked visibly handled and browser-clean.

Reusable profile seed: For API consoles and action forms: return HTTP 200 with malformed JSON, require action-specific generic recovery, forbid parser/raw/object leaks and contradictory success UI, and keep selector polarity plus browser health in the same proof.

What the browser run checked

  • Loaded authenticated Playground Screenshot mode across desktop, phone, iPad Mini, and iPad.
  • Submitted /v1/run with sync:true, include:["console","har"], and URL https://example.com/rp546-malformed-screenshot-success, then capped the mocked request at exactly four viewport hits.
  • Returned HTTP 200 application/json with a syntactically malformed body.
  • Required "Failed to take screenshot. Please try again." and exactly one .error-state .error-message.
  • Rejected raw malformed body text, "Failed to parse JSON response", "SyntaxError", "[object Object]", "Success", "No screenshots captured", "Application error", result/loading/success selectors, horizontal overflow, console warnings, page errors, and fatal console errors.
  • Re-ran the same contract on final production after the fallback-aware sync JSON parse fix.

Proof lesson

Handled action recovery needs content proof, not only an error box. Successful HTTP status with malformed JSON should render a generic action-specific fallback while rejecting raw response text, parser text, contradictory success states, and browser noise in the same viewport matrix.
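
A fallback-aware parse can be sketched with the standard `json` module. This is not the Playground's implementation: `parse_sync_body` is a hypothetical helper, and only the generic message text comes from the contract above.

```python
import json
from typing import Optional, Tuple

GENERIC_SCREENSHOT_ERROR = "Failed to take screenshot. Please try again."


def parse_sync_body(body: str) -> Tuple[Optional[dict], Optional[str]]:
    """Return (data, user_message). Parser text and the raw body never reach the message."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        # Log details elsewhere if needed; the user sees only the generic fallback.
        return None, GENERIC_SCREENSHOT_ERROR
    if not isinstance(data, dict):
        # Valid JSON that is not an object is still unusable as a result.
        return None, GENERIC_SCREENSHOT_ERROR
    return data, None


malformed_body = '{"status": "completed", "screenshots": ['
bad_data, bad_message = parse_sync_body(malformed_body)
good_data, good_message = parse_sync_body('{"status": "completed_error"}')
```

Returning the message as data, rather than raising, keeps the HTTP 200 path from ever formatting an exception string into the UI.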

| Artifact | Type | What it proves |
| --- | --- | --- |
| Failing Screenshot malformed-success recovery | PNG screenshot | Shows the Playground error state leaking parser text and the raw malformed JSON body instead of the generic screenshot fallback. |
| Failing Screenshot malformed-success receipt job_cccf7a47 | JSON metadata | Records the correct sync Screenshot request body, four mock hits, missing generic fallback, visible raw response text, and clean browser/layout evidence. |
| Failing Screenshot malformed-success console capture | JSON logs | Preserves browser-health evidence from the failing malformed-success recovery proof. |
| Final Screenshot malformed-success recovery | PNG screenshot | Shows final production rendering the generic screenshot fallback after the malformed sync JSON response. |
| Final Screenshot malformed-success receipt job_2ec55527 | JSON metadata | Shows final production passed the recovery-honesty contract across desktop, phone, iPad Mini, and iPad. |
| Final Screenshot malformed-success console capture | JSON logs | Shows the fixed Playground stayed warning-clean and fatal-console-clean while exercising the malformed sync Screenshot response. |

Catch 27

Playground timeout hid partial evidence

May 18, 2026 · < $0.01 · Riddle site · Playground · result honesty
Plain-English catch card

Playground timeout hid partial evidence

Terminal timeout receipts need the same artifact-honesty contract as terminal errors.

What went wrong
The authenticated Playground sync Workflow flow rendered a terminal timeout and preserved screenshot, console, HAR, raw response, and billing evidence, but failed to tell the user that the returned evidence was partial.
What Riddle caught
Initial production job job_d7a29899 loaded /playground/ across desktop, phone, iPad Mini, and iPad, submitted sync:true Workflow requests 4/4 with screenshots, console, and HAR included, and rendered "Timed Out", the exact timeout message, screenshot evidence, console evidence, HAR evidence, and raw response debug, but no partial-results copy.
Why it matters
Riddle Proof caught incomplete evidence-status copy in a timeout receipt that otherwise looked healthy: the debug evidence was preserved, but the user was not told it was partial.
What changed
For API consoles with timeout states: return terminal timeout JSON with partial evidence in a 200 response, require timeout status honesty and partial-results copy, preserve each artifact class, and assert positive/negative selectors in the same viewport matrix.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt

Claim: A Playground sync Workflow response with terminal completed_timeout status should render as a timeout with partial evidence, not as a bare timeout receipt, while preserving screenshots, console logs, HAR, billing, and raw response evidence.

Bug: The authenticated Playground sync Workflow flow rendered a terminal timeout and preserved screenshot, console, HAR, raw response, and billing evidence, but failed to tell the user that the returned evidence was partial.

Why normal checks missed it: The receipt looked mostly healthy: the route loaded, auth worked, the Workflow request body was correct, "Timed Out" rendered, the timeout message rendered, artifact sections were populated, and browser health was clean. A shallow timeout check would stop there and miss that the evidence status copy was incomplete.

Why this sells Riddle Proof: Riddle Proof caught incomplete evidence-status copy in a timeout receipt that otherwise looked healthy: the debug evidence was preserved, but the user was not told it was partial.

Reusable profile seed: For API consoles with timeout states: return terminal timeout JSON with partial evidence in a 200 response, require timeout status honesty and partial-results copy, preserve each artifact class, and assert positive/negative selectors in the same viewport matrix.

What the browser run checked

  • Loaded authenticated Playground Workflow mode across desktop, phone, iPad Mini, and iPad.
  • Submitted /v1/run with sync:true and include:["screenshots","console","har"], then capped the mocked request at exactly four viewport hits.
  • Returned HTTP 200 JSON with status completed_timeout, success:false, a timeout message, one screenshot, one console log, one HAR request, billing metadata, and raw response data.
  • Required "Timed Out", the exact timeout message, "partial results available", screenshot evidence, console evidence, HAR evidence, and raw response debug.
  • Rejected "Success", "Error:", empty evidence states, "Application error", wrong success/error/timeout selector counts, loading state, horizontal overflow, console warnings, page errors, and fatal console errors.
  • Re-ran the same contract on static Preview and final production after the timeout partial-results fix.

Proof lesson

Terminal timeout receipts need the same artifact-honesty contract as terminal errors. If screenshots, console logs, or HAR entries survive a timeout, the UI should label them as partial results while keeping success/error states and browser health honest.
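
The lesson above can be sketched as a small mapping from terminal status to visible receipt. This is a minimal illustration, not the Playground's actual code: the function name build_receipt, the Receipt type, the "completed" success status, and the response field names are assumptions. Only the completed_error and completed_timeout statuses and the "Timed Out", "Error", and "Success" labels come from the text.

```python
from dataclasses import dataclass

# Hypothetical status-to-label table. "completed" as the success status is an
# assumption; completed_error and completed_timeout appear in the catches.
STATUS_LABELS = {
    "completed": "Success",
    "completed_error": "Error",
    "completed_timeout": "Timed Out",
}


@dataclass
class Receipt:
    label: str
    partial_results: bool  # True when evidence survived a non-success status


def build_receipt(response: dict) -> Receipt:
    """Derive the visible receipt from a terminal sync response."""
    status = response.get("status", "")
    # Unknown statuses fall back to "Error" so they can never read as success.
    label = STATUS_LABELS.get(status, "Error")
    # Field names here are assumed; any surviving artifact class counts as evidence.
    has_evidence = any(response.get(key) for key in ("screenshots", "console", "har"))
    return Receipt(label=label, partial_results=has_evidence and label != "Success")


timeout_response = {
    "status": "completed_timeout",
    "success": False,
    "screenshots": ["shot-1.png"],
    "console": [{"level": "log", "text": "started"}],
    "har": [{"url": "https://example.test/"}],
}
receipt = build_receipt(timeout_response)
print(receipt.label, receipt.partial_results)
```

Keeping the label and the partial-results flag in one derived value makes it harder for a timeout to render with artifacts but without the partial-results copy.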

Artifact | Type | What it proves
Failing sync Workflow timeout screenshot | PNG screenshot | Shows the Playground rendering a timeout with screenshot, console, and HAR evidence while omitting partial-results copy.
Failing sync Workflow timeout receipt job_d7a29899 | JSON metadata | Records the completed_timeout response, correct sync request body, rendered artifacts, missing partial-results copy, and timeout/success/error selector counts.
Failing sync Workflow timeout console capture | JSON logs | Preserves browser-health evidence from the failing Playground timeout receipt proof.
Final sync Workflow timeout screenshot | PNG screenshot | Shows the final production Playground rendering Timed Out, partial-results copy, screenshot, console, HAR, and raw response evidence.
Final sync Workflow timeout receipt job_5ef41407 | JSON metadata | Shows final production passed the full terminal-timeout-with-partial-evidence contract across all four viewports.
Final sync Workflow timeout console capture | JSON logs | Shows the fixed Playground stayed warning-clean and fatal-console-clean while exercising the sync Workflow timeout path.

Catch 28

Playground sync terminal error looked successful

May 18, 2026 · < $0.01 · Riddle site · Playground · result honesty
Plain-English catch card

Playground sync terminal error looked successful

Artifact preservation and result honesty are separate contracts.

What went wrong
The authenticated Playground sync Script flow preserved screenshot, console, HAR, raw response, and billing evidence from a terminal completed_error response, but labeled the run Success and hid the service error message.
What Riddle caught
Initial production job job_994d843b loaded /playground/ across desktop, phone, iPad Mini, and iPad, submitted sync:true Script requests 4/4 with screenshots, console, and HAR included, and rendered the returned screenshot, console log/error, HAR request, raw response, and billing metadata.
Why it matters
Riddle Proof caught a semantic receipt lie in a proof surface that otherwise looked artifact-rich: the debug evidence was there, but the user-facing result said the failed job succeeded.
What changed
For sync API consoles: return terminal error JSON with partial evidence in a 200 response, require status honesty and service error copy, preserve each artifact class, and assert positive/negative selectors in the same viewport matrix.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt

Claim: A Playground sync Script response with terminal completed_error status should render as an error with partial evidence, not as a successful run, while preserving screenshots, console logs, HAR, billing, and raw response evidence.

Bug: The authenticated Playground sync Script flow preserved screenshot, console, HAR, raw response, and billing evidence from a terminal completed_error response, but labeled the run Success and hid the service error message.

Why normal checks missed it: The proof surface looked rich: the route loaded, the sync request body was correct, the screenshot appeared, console and HAR evidence expanded, and billing metadata rendered. A shallow artifact check would see plenty of useful evidence and miss that the top-level receipt contradicted the terminal backend status.

Why this sells Riddle Proof: Riddle Proof caught a semantic receipt lie in a proof surface that otherwise looked artifact-rich: the debug evidence was there, but the user-facing result said the failed job succeeded.

Reusable profile seed: For sync API consoles: return terminal error JSON with partial evidence in a 200 response, require status honesty and service error copy, preserve each artifact class, and assert positive/negative selectors in the same viewport matrix.

What the browser run checked

  • Loaded authenticated Playground Script mode across desktop, phone, iPad Mini, and iPad.
  • Submitted /v1/run with sync:true and include:["screenshots","console","har"], then capped the mocked request at exactly four viewport hits.
  • Returned HTTP 200 JSON with status completed_error, success:false, one screenshot, one console log, one console error, one HAR request, billing metadata, and raw response data.
  • Required "Error", the exact service error, "partial results available", screenshot evidence, console evidence, HAR evidence, and raw response debug.
  • Rejected "Success", empty evidence states, "Failed at step", "Application error", wrong success/error selector counts, loading state, horizontal overflow, console warnings, page errors, and fatal console errors.
  • Re-ran the same contract on static Preview and final production after the sync terminal-state fix.

Proof lesson

Artifact preservation and result honesty are separate contracts. Proof profiles for sync terminal JSON should assert the visible status, service error copy, partial-results copy, success/error selectors, artifact sections, and browser health together.
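
One way to assert those items together is a single required/rejected text contract applied to every viewport. The sketch below is illustrative only: check_contract, check_matrix, and the sample page text are hypothetical, and a real proof run would read visible text and selector counts from the browser rather than from strings.

```python
VIEWPORTS = ["desktop", "phone", "iPad Mini", "iPad"]

# Positive and negative text assertions taken from the checks listed above.
REQUIRED = ["Error", "partial results available"]
REJECTED = ["Success", "Failed at step", "Application error"]


def check_contract(visible_text: str) -> list[str]:
    """Return the contract violations for one viewport's visible text."""
    failures = [f"missing: {text}" for text in REQUIRED if text not in visible_text]
    failures += [f"unexpected: {text}" for text in REJECTED if text in visible_text]
    return failures


def check_matrix(text_by_viewport: dict[str, str]) -> dict[str, list[str]]:
    """Run the same contract in every viewport; an uncaptured viewport fails."""
    return {
        viewport: check_contract(text_by_viewport[viewport])
        if viewport in text_by_viewport
        else ["viewport not captured"]
        for viewport in VIEWPORTS
    }


# Hypothetical page text: artifact-rich, but the top-level status is wrong.
failing_page = "Success\nScreenshot\nConsole Output\nNetwork HAR"
fixed_page = "Error\npartial results available\nScreenshot\nConsole Output\nNetwork HAR"

print(check_contract(failing_page))
print(check_matrix({name: fixed_page for name in VIEWPORTS}))
```

Because the rejected list runs alongside the required list, a page full of preserved artifacts still fails when its status label contradicts the backend.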

Artifact | Type | What it proves
Failing sync terminal screenshot | PNG screenshot | Shows the Playground preserving screenshot, console, and HAR evidence while labeling the terminal error as Success.
Failing sync terminal receipt job_994d843b | JSON metadata | Records the completed_error response, correct sync request body, rendered artifacts, missing service error copy, missing partial-results copy, and wrong success/error selectors.
Failing sync terminal console capture | JSON logs | Preserves browser-health evidence from the failing Playground sync terminal proof.
Final sync terminal screenshot | PNG screenshot | Shows the final production Playground rendering Error, the service message, partial-results copy, screenshot, console, HAR, and raw response evidence.
Final sync terminal receipt job_4bc68ef7 | JSON metadata | Shows final production passed the full terminal-error-with-partial-evidence contract across all four viewports.
Final sync terminal console capture | JSON logs | Shows the fixed Playground stayed warning-clean and fatal-console-clean while exercising the sync terminal error path.

Catch 29

Billing history failure looked like no transactions

May 18, 2026 · < $0.01 · Riddle site · Billing · load recovery
Plain-English catch card

Billing history failure looked like no transactions

List-load recovery profiles should prove that a failed optional list is not rendered as an empty list.

What went wrong
The authenticated Billing page handled balance and auto-recharge correctly when transaction history failed, but translated the failed history request into the empty-account state "No transactions yet."
What Riddle caught
Initial corrected production job job_e99911a5 mocked /api/billing/history to return 503 while balance and auto-recharge loaded normally across desktop, phone, iPad Mini, and iPad.
Why it matters
Riddle Proof caught an account-state lie in a Billing surface: the page looked usable, but a failed data load was being reported as if the user had no transactions.
What changed
For optional list data: mock sibling data as healthy, fail the target list, require explicit unavailable copy, reject empty-state copy, reject raw backend text, and keep independent page sections visible.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt

Claim: A Billing transaction-history load failure should be presented as a load failure, not as an empty account, while the rest of Billing stays usable and browser-clean.

Bug: The authenticated Billing page handled balance and auto-recharge correctly when transaction history failed, but translated the failed history request into the empty-account state "No transactions yet."

Why normal checks missed it: Most of the page looked healthy: the user was signed in, balance rendered, active job copy rendered, purchase controls loaded, auto-recharge loaded, and the history section stayed visible. The bug was a small but important state distinction inside one optional list.

Why this sells Riddle Proof: Riddle Proof caught an account-state lie in a Billing surface: the page looked usable, but a failed data load was being reported as if the user had no transactions.

Reusable profile seed: For optional list data: mock sibling data as healthy, fail the target list, require explicit unavailable copy, reject empty-state copy, reject raw backend text, and keep independent page sections visible.

What the browser run checked

  • Seeded Cognito-style authenticated storage for Billing.
  • Mocked balance and auto-recharge as healthy while making /api/billing/history return 503.
  • Required Browser Time Balance, active job copy, Auto-Recharge, Promo Code, and Transaction History to remain visible.
  • Required "Transaction history unavailable. Please try again." and exactly one .transaction-history .error-message element.
  • Rejected "No transactions yet.", "No billable transactions yet.", raw backend text, internal error tokens, object leakage, and application error copy.
  • Asserted the history table had zero rows for the failed-load state, but did not allow that zero-row state to become the empty-account message.
  • Proved the same contract across desktop, phone, iPad Mini, and iPad with overflow, warning, page-error, and fatal-console health checks.

Proof lesson

List-load recovery profiles should prove that a failed optional list is not rendered as an empty list. The contract needs explicit failure copy, empty-state absence, raw backend text absence, and browser/layout health in the same run.
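
The distinction is easier to hold when the load state is tracked separately from the row count. The sketch below is one possible shape, not the Billing page's actual implementation: LoadState, state_from_status, and history_message are hypothetical names, and only the two user-facing messages come from the checks above.

```python
from enum import Enum
from typing import Optional


class LoadState(Enum):
    LOADING = "loading"
    LOADED = "loaded"
    FAILED = "failed"


def state_from_status(http_status: int) -> LoadState:
    """Treat any non-2xx history response as a failed load."""
    return LoadState.LOADED if 200 <= http_status < 300 else LoadState.FAILED


def history_message(state: LoadState, rows: list) -> Optional[str]:
    """Pick the history copy from the load state first, the row count second."""
    if state is LoadState.FAILED:
        return "Transaction history unavailable. Please try again."
    if state is LoadState.LOADED and not rows:
        return "No transactions yet."
    return None  # still loading, or there are rows to render


# A mocked 503 leaves zero rows, but zero rows alone must not mean "empty account".
print(history_message(state_from_status(503), []))
print(history_message(state_from_status(200), []))
```

The empty-account message is reachable only from a successful load, so a failed request with zero rows cannot fall through to it.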

Artifact | Type | What it proves
Failing Billing history screenshot | PNG screenshot | Shows the Billing page looked mostly healthy while the failed history request was presented as "No transactions yet."
Failing Billing history receipt job_e99911a5 | JSON metadata | Records the mocked 503 history response, passing sibling Billing sections, missing unavailable message, visible empty-state copy, and absent history error element.
Failing Billing history console capture | JSON logs | Shows browser console evidence from the failing proof, with the mocked 503 resource noise treated as expected.
Final Billing history screenshot | PNG screenshot | Shows the final production Billing page with the explicit history-unavailable recovery state.
Final Billing history receipt job_59ebe466 | JSON metadata | Shows final production passed the full load-failure contract across all four viewports.
Final Billing history console capture | JSON logs | Shows no warnings and only the expected mocked history 503 resource events remained.

Catch 30

Dashboard API-key transport failure logged fatal

May 18, 2026 · < $0.01 · Riddle site · dashboard · network recovery
Plain-English catch card

Dashboard API-key transport failure logged fatal

Transport-failure profiles should distinguish expected browser resource noise from application-level fatal logging, then prove the visible recovery state and browser health together.

What went wrong
The Dashboard API-key revoke flow handled a fetch-level transport failure visibly, but still emitted an app-level fatal console error for the expected recovery path.
What Riddle caught
Initial production job job_e27b8e47 used Riddle Proof abort mocks to make the API-key DELETE fail at the fetch layer across desktop, phone, iPad Mini, and iPad.
Why it matters
Riddle Proof caught operational-health debt in a transport failure that looked visibly handled: the UI recovered, but browser evidence still showed an app-level fatal error.
What changed
For destructive network actions: include both cancel and accept paths, use fetch-level abort mocks for transport failures, cap destructive request counts, require preserved state, and separate expected resource errors from application console failures.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt

Claim: A dropped API-key revoke request should preserve the active key row, show human recovery, count destructive confirm/cancel events, allow only the expected mocked resource failure, and emit no app-level fatal console errors.

Bug: The Dashboard API-key revoke flow handled a fetch-level transport failure visibly, but still emitted an app-level fatal console error for the expected recovery path.

Why normal checks missed it: The visible UI looked safe: canceling the confirm dialog preserved the active key, accepting the dialog kept the key active after the failed request, and the user saw Failed to revoke API key. The bug was hidden in browser evidence: the app logged the handled TypeError as fatal while the mocked net::ERR_FAILED resource event was expected.

Why this sells Riddle Proof: Riddle Proof caught operational-health debt in a transport failure that looked visibly handled: the UI recovered, but browser evidence still showed an app-level fatal error.

Reusable profile seed: For destructive network actions: include both cancel and accept paths, use fetch-level abort mocks for transport failures, cap destructive request counts, require preserved state, and separate expected resource errors from application console failures.

What the browser run checked

  • Seeded Cognito-style authenticated storage for the Dashboard.
  • Mocked balance, jobs, and API-key list responses while aborting the DELETE request at the network layer.
  • Dismissed the destructive confirm dialog first and proved no failure copy or revoked state appeared.
  • Accepted the confirm dialog next and proved the visible transport-failure recovery, preserved active key row, exact dialog counts, and exact DELETE hit count.
  • Asserted no object leakage, no application error, stable row/button counts, no horizontal overflow, warning hygiene, page health, and fatal-console health across desktop, phone, iPad Mini, and iPad.

Proof lesson

Transport-failure profiles should distinguish expected browser resource noise from application-level fatal logging, then prove the visible recovery state and browser health together.
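
A rough sketch of that separation follows. The ConsoleEvent shape, its source field, and the unallowed_errors helper are assumptions made for illustration; real console captures may label event origin differently. The net::ERR_FAILED token and the "Error revoking API key" message are the ones named in this catch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsoleEvent:
    level: str   # "log", "warning", or "error"
    source: str  # assumed field: "network" for resource events, "app" for script logs
    text: str


# Resource failures the abort mock is expected to cause.
ALLOWED_RESOURCE_ERRORS = ("net::ERR_FAILED",)


def unallowed_errors(events: list) -> list:
    """Return error-level events that are not expected mocked resource noise."""
    bad = []
    for event in events:
        if event.level != "error":
            continue
        expected = event.source == "network" and any(
            token in event.text for token in ALLOWED_RESOURCE_ERRORS
        )
        if not expected:
            bad.append(event)
    return bad


resource_noise = ConsoleEvent("error", "network", "Failed to load resource: net::ERR_FAILED")
app_fatal = ConsoleEvent("error", "app", "Error revoking API key: TypeError")

failing_capture = [resource_noise, app_fatal]
fixed_capture = [resource_noise]

print(unallowed_errors(failing_capture))
print(unallowed_errors(fixed_capture))
```

An allowlist keyed on both origin and message keeps the mocked network failure from masking an application log that happens to quote the same token.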

Artifact | Type | What it proves
Failing transport-recovery screenshot | PNG screenshot | Shows the visible recovery looked reasonable: the key stayed active and the user saw the revoke failure message.
Failing transport-recovery receipt job_e27b8e47 | JSON metadata | Records the mocked DELETE abort, dialog counts, preserved key state, allowed browser resource failure, and unallowed app-level fatal console error.
Failing transport-recovery console capture | JSON logs | Preserves the hidden "Error revoking API key" console error that separated this from expected net::ERR_FAILED browser noise.
Final transport-recovery screenshot | PNG screenshot | Shows the same user-facing recovery state after the fix, with the active key still preserved.
Final transport-recovery receipt job_ed5c02fc | JSON metadata | Shows final production passed the full abort-mock recovery contract across all four viewports with no unallowed fatal console errors.
Final transport-recovery console capture | JSON logs | Shows only the expected mocked network-failure browser noise remained after the handled app-level console error was removed.

Catch 31

Playground Batch discarded secondary error artifacts

May 18, 2026 · < $0.01 · Riddle site · Playground · artifacts
Plain-English catch card

Playground Batch discarded secondary error artifacts

Failure receipts should preserve all useful evidence, not only screenshots.

What went wrong
The authenticated Playground Batch result rendered a terminal error and no-screenshot guidance, but silently discarded returned console.json and network.har artifacts.
What Riddle caught
Initial production job job_6c3bcbf9 loaded /playground/ across desktop, phone, iPad Mini, and iPad, submitted Batch /v1/run 4/4 with sync:false, used the custom artifacts_url 4/4, and avoided the guessed /artifacts URL.
Why it matters
Riddle Proof caught evidence loss in a page that otherwise looked like it had handled the failure: the useful debug artifacts existed, but the UI hid them.
What changed
For async terminal jobs: include a zero-screenshot failed case with console/HAR artifacts, require the explicit artifact URL, assert partial-result copy, expand secondary sections, and verify no guessed fallback URL was used.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt

Claim: A failed Playground Batch job with no screenshots but valid console/HAR artifacts should show a useful error receipt, preserve secondary evidence, mark partial results available, avoid guessed artifact URLs, and stay browser-clean.

Bug: The authenticated Playground Batch result rendered a terminal error and no-screenshot guidance, but silently discarded returned console.json and network.har artifacts.

Why normal checks missed it: The route loaded, auth state worked, Batch submitted with sync:false, the service-provided artifacts_url was used, the backend error rendered, and the UI honestly said no screenshots were captured. A shallow check would see an error receipt and stop; the proof required secondary artifacts to be fetched and shown as partial evidence.

Why this sells Riddle Proof: Riddle Proof caught evidence loss in a page that otherwise looked like it had handled the failure: the useful debug artifacts existed, but the UI hid them.

Reusable profile seed: For async terminal jobs: include a zero-screenshot failed case with console/HAR artifacts, require the explicit artifact URL, assert partial-result copy, expand secondary sections, and verify no guessed fallback URL was used.

What the browser run checked

  • Loaded authenticated Playground Batch mode across desktop, phone, iPad Mini, and iPad.
  • Submitted two Batch URLs to /v1/run with sync:false and captured the request-body contract.
  • Required the service-provided custom artifacts_url to be used and the guessed /artifacts URL to stay uncalled.
  • Mocked a terminal completed_error artifact response with no screenshots but valid console.json and network.har files.
  • Required the backend error, the "partial results available" receipt, honest "No screenshots captured" guidance, the "Console Output" summary, the "Network HAR" summary, expanded console messages, and the expanded HAR request URL.
  • Rejected misleading empty secondary sections, loading state, application errors, page errors, fatal console errors, console warnings, and horizontal overflow.
  • Re-ran the same zero-screenshot secondary-evidence contract on static Preview and final production after the fix.

Proof lesson

Failure receipts should preserve all useful evidence, not only screenshots. Console and HAR artifacts are partial results too when a terminal job has no images.
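
As a sketch of that rule, the helpers below refuse to guess an artifact URL and count console and HAR files as partial results even when there are no images. The function names, the job dictionary shape, and the example URL are hypothetical; only artifacts_url, console.json, and network.har come from the catch.

```python
def artifacts_url_for(job: dict) -> str:
    """Use only the URL the service returned; never construct a guessed fallback."""
    url = job.get("artifacts_url")
    if not url:
        raise ValueError("job has no artifacts_url; refusing to guess one")
    return url


def summarize_artifacts(files: list) -> dict:
    """Group returned artifact file names and decide which receipt copy applies."""
    screenshots = [name for name in files if name.endswith(".png")]
    console = [name for name in files if name == "console.json"]
    har = [name for name in files if name.endswith(".har")]
    return {
        "screenshots": screenshots,
        "console": console,
        "har": har,
        # Zero screenshots is reported honestly...
        "no_screenshots_captured": not screenshots,
        # ...but console and HAR files still count as partial results.
        "partial_results_available": bool(screenshots or console or har),
    }


job = {"status": "completed_error", "artifacts_url": "https://example.test/custom/job-1"}
summary = summarize_artifacts(["console.json", "network.har"])
print(artifacts_url_for(job))
print(summary["no_screenshots_captured"], summary["partial_results_available"])
```

Raising instead of falling back keeps the "guessed /artifacts URL stayed uncalled" check meaningful: there is no code path that could build that URL.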

Artifact | Type | What it proves
Failing Batch secondary-artifact screenshot | PNG screenshot | Shows the production Playground Batch error receipt with no screenshots and no visible secondary evidence.
Failing Batch secondary-artifact receipt job_6c3bcbf9 | JSON metadata | Records that console.json and network.har were returned but requested in 0/4 viewports, while partial results available and secondary evidence were absent.
Failing Batch secondary-artifact console capture | JSON logs | Preserves browser-health evidence from the failing Playground Batch proof.
Fixed static Preview Batch secondary-artifact screenshot | PNG screenshot | Shows the deploy candidate rendering partial-result copy plus expanded console and HAR evidence.
Fixed static Preview Batch secondary-artifact receipt job_be3524fc | JSON metadata | Shows Preview fetched console/HAR artifacts 4/4, avoided the guessed artifact URL, and passed browser/layout health.
Fixed static Preview Batch secondary-artifact console capture | JSON logs | Shows the Preview proof had no unallowed browser-health noise after the Batch artifact fix.
Final production Batch secondary-artifact screenshot | PNG screenshot | Shows the fixed Batch error receipt live in production with secondary evidence expanded.
Final production Batch secondary-artifact receipt job_5b0cd240 | JSON metadata | Shows final production passed the full zero-screenshot Batch secondary-evidence contract across the viewport matrix.
Final production Batch secondary-artifact console capture | JSON logs | Shows the final production proof stayed app/page clean while exercising the failed Batch artifact path.

Catch 32

Docs code copy claimed success after clipboard denial

May 18, 2026 · < $0.01 · Riddle site · Docs · clipboard
Plain-English catch card

Docs code copy claimed success after clipboard denial

Clipboard-copy controls need feedback honesty under browser permission restrictions.

What went wrong
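The public Preview docs code-copy control visibly claimed Copied! after the browser denied clipboard write permission, while the page recorded an unhandled clipboard permission error.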
What Riddle caught
Initial production job job_bc7d9340 loaded /docs/preview/ across desktop, phone, iPad Mini, and iPad, clicked the first code-block copy button, and saw Copied!
Why it matters
Riddle Proof caught feedback dishonesty in a public agent-facing docs control: the UI claimed success while the browser recorded a failed clipboard write.
What changed
For public copy controls: click under realistic browser clipboard restrictions, require visible feedback plus clean page-error evidence, and then repeat the same contract for each sibling copy surface touched by the shared helper.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt

Claim: A public docs code-copy click should never report success while leaving a browser clipboard permission denial as an unhandled page error.

Bug: The public Preview docs code-copy control visibly claimed Copied! after the browser denied clipboard write permission, while the page recorded an unhandled clipboard permission error.

Why normal checks missed it: The docs route loaded, the code block rendered, required docs links stayed present, and the clicked button changed to Copied!. A shallow visual or smoke check would treat that as success; the proof required browser page health after the interaction.

Why this sells Riddle Proof: Riddle Proof caught feedback dishonesty in a public agent-facing docs control: the UI claimed success while the browser recorded a failed clipboard write.

Reusable profile seed: For public copy controls: click under realistic browser clipboard restrictions, require visible feedback plus clean page-error evidence, and then repeat the same contract for each sibling copy surface touched by the shared helper.

What the browser run checked

  • Loaded the public Preview docs page across desktop, phone, iPad Mini, and iPad.
  • Required the Preview docs layout, code-block wrapper, code-copy buttons, plain-text markdown link, and tool comparison copy to remain visible.
  • Clicked the first code-block copy button and captured before/after screenshots in every viewport.
  • Required visible post-click feedback while rejecting browser page errors, fatal console errors, application errors, and horizontal overflow.
  • Verified the same interaction after the shared clipboard helper fix on static Preview and final production.

Proof lesson

Clipboard-copy controls need feedback honesty under browser permission restrictions. A visible success label is not enough when the browser API rejected the write.
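
The idea can be shown with a language-neutral sketch in which the clipboard write is an injected callable. This is not the shared clipboard helper from the fix, whose behavior the receipts do not describe: copy_with_feedback, the PermissionError stand-in for a denied browser write, and the "Copy failed" label are all assumptions.

```python
def copy_with_feedback(text: str, write_to_clipboard) -> str:
    """Return the button label after attempting a copy.

    write_to_clipboard stands in for the browser clipboard API and is assumed
    to raise PermissionError when the write is denied.
    """
    try:
        write_to_clipboard(text)
    except PermissionError:
        # The denial is handled here, so it never surfaces as an unhandled
        # page error, and the label does not claim success.
        return "Copy failed"  # hypothetical failure label
    return "Copied!"


def denied_write(_text: str) -> None:
    """Simulate a browser that rejects clipboard writes."""
    raise PermissionError("clipboard write denied")


clipboard = []
print(copy_with_feedback("curl https://example.test", clipboard.append))
print(copy_with_feedback("curl https://example.test", denied_write))
```

The success label is set only after the write returns, which is the ordering the failing proof showed was missing.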

Artifact | Type | What it proves
Failing docs code-copy screenshot | PNG screenshot | Shows the public Preview docs page reporting Copied! immediately after the denied clipboard write.
Failing docs code-copy receipt job_bc7d9340 | JSON metadata | Records the visible Copied! state together with page_error_count 1 and the browser clipboard write denial.
Failing docs code-copy console capture | JSON logs | Preserves browser-health evidence from the failed public docs copy proof.
Fixed static Preview docs code-copy screenshot | PNG screenshot | Shows the deploy candidate rendering the same post-click docs copy state after the shared clipboard helper fix.
Fixed static Preview docs code-copy receipt job_81b738dd | JSON metadata | Shows Preview passed the docs copy, layout, app/page health, and warning contract after the fix.
Fixed static Preview docs code-copy console capture | JSON logs | Shows the Preview proof had no fatal browser-health noise after the copy helper change.
Final production docs code-copy screenshot | PNG screenshot | Shows the fixed docs copy interaction live in production.
Final production docs code-copy receipt job_59dc499b | JSON metadata | Shows final production passed the full public docs copy contract across the viewport matrix.
Final production docs code-copy console capture | JSON logs | Shows the final production proof stayed app/page clean while exercising the docs copy control.

Catch 33

Redeem promo code leaked malformed success body

May 18, 2026 · < $0.01 · Riddle site · Redeem · error handling
Plain-English catch card

Redeem promo code leaked malformed success body

Shared backend contracts need per-surface proof.

What went wrong
The public Redeem promo-code flow visibly recovered from a malformed 200 response, but showed Server error (200) and raw malformed response text to the user.
What Riddle caught
Initial production job job_28a9fecb submitted the promo redeem request across desktop, phone, iPad Mini, and iPad with {"promo_code":"RP523-MALFORMED-SUCCESS"}.
Why it matters
Riddle Proof caught sibling-surface contract drift: a backend recovery path fixed in Billing still leaked raw response text on public Redeem.
What changed
For sibling action recovery: reuse the same malformed-success mock against each UI surface that calls the endpoint, require the local surface state to survive, and assert the exact fallback plus raw-text absences.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt

Claim: A malformed promo-code redeem success body on the public Redeem page should show one generic redeem error, preserve signed-in form state, avoid success copy, reject raw status/body/parser/object leaks, and stay clean on app/page health.

Bug: The public Redeem promo-code flow visibly recovered from a malformed 200 response, but showed Server error (200) and raw malformed response text to the user.

Why normal checks missed it: The route loaded, synthetic auth worked, the signed-in form stayed usable, the promo POST fired with the expected uppercase request body, and success UI stayed absent. A shallow check would see an error state and stop; the proof asserted the exact recovery text and raw-body absences.

Why this sells Riddle Proof: Riddle Proof caught sibling-surface contract drift: a backend recovery path fixed in Billing still leaked raw response text on public Redeem.

Reusable profile seed: For sibling action recovery: reuse the same malformed-success mock against each UI surface that calls the endpoint, require the local surface state to survive, and assert the exact fallback plus raw-text absences.

What the browser run checked

  • Loaded the authenticated public Redeem page across desktop, phone, iPad Mini, and iPad using synthetic local auth state and network mocks.
  • Required the signed-in Redeem form and user identity to remain visible before and after the failed promo action.
  • Captured the promo-code POST request body and required uppercase RP523-MALFORMED-SUCCESS in all four viewports.
  • Mocked the promo response as HTTP 200 with malformed JSON to exercise a success/body mismatch.
  • Required exactly one generic "Failed to redeem promo code. Please try again." recovery state while rejecting "Credits Added!", "Code redeemed successfully!", "Server error (200)", raw malformed body text, parser text, object placeholders, application errors, and horizontal overflow.
  • Rejected fatal console errors, console warnings, and page errors on the handled malformed-success path.
  • Re-ran the same action-recovery contract on static Preview and final production after the fix.

Proof lesson

Shared backend contracts need per-surface proof. Billing had already been fixed for malformed promo responses, but the sibling Redeem route still leaked raw status/body text until it had its own browser contract.
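
A shared parsing helper is one way to keep sibling surfaces from drifting. The sketch below assumes a success payload shaped like {"success": true} and a helper named redeem_message; neither is confirmed by the receipts. The generic error and success strings are the ones the browser run required and rejected.

```python
import json

GENERIC_ERROR = "Failed to redeem promo code. Please try again."


def redeem_message(status: int, body: str) -> tuple[bool, str]:
    """Return (succeeded, user-facing message) without echoing raw response text."""
    if status != 200:
        return False, GENERIC_ERROR
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # Malformed success body: show no status code, body text, or parser text.
        return False, GENERIC_ERROR
    # Assumed payload shape; anything else is treated as a failed redeem.
    if not isinstance(payload, dict) or payload.get("success") is not True:
        return False, GENERIC_ERROR
    return True, "Code redeemed successfully!"


# HTTP 200 with a truncated JSON body, as in the malformed-success mock.
print(redeem_message(200, '{"success": tru'))
print(redeem_message(200, '{"success": true}'))
```

If Billing and Redeem both called a helper like this, the fallback text would be fixed in one place, though each surface would still need its own browser contract to prove it renders that text.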

Artifact | Type | What it proves
Failing Redeem promo recovery screenshot | PNG screenshot | Shows the public Redeem page preserved signed-in form state while leaking "Server error (200)" and raw malformed response text.
Failing Redeem promo recovery receipt job_28a9fecb | JSON metadata | Records the uppercase request-body match, preserved Redeem state, missing generic recovery copy, and raw text leakage.
Failing Redeem promo recovery console capture | JSON logs | Preserves the browser-health evidence from the failing public Redeem malformed-success proof.
Fixed static Preview Redeem promo recovery screenshot | PNG screenshot | Shows the deploy candidate rendered the generic recovery text without raw response leakage.
Fixed static Preview Redeem promo recovery receipt job_00373d49 | JSON metadata | Shows Preview passed the malformed-success Redeem recovery, layout, app/page health, and warning contract.
Fixed static Preview Redeem promo recovery console capture | JSON logs | Shows the Preview proof had no unallowed browser-health noise after the Redeem recovery fix.
Final production Redeem promo recovery screenshot | PNG screenshot | Shows the fixed malformed-success Redeem recovery live in production.
Final production Redeem promo recovery receipt job_3210c98a | JSON metadata | Shows final production passed the full public Redeem handled action recovery contract across the viewport matrix.
Final production Redeem promo recovery console capture | JSON logs | Shows the final production proof stayed app/page clean while exercising the malformed-success Redeem path.

Catch 34

Billing promo code leaked malformed success body

May 17, 2026 · < $0.01 · Riddle site · Billing · error handling
Plain-English catch card

Billing promo code leaked malformed success body

Handled action recovery is a text contract, not just the presence of an error box.

What went wrong
The Billing promo-code redeem flow visibly recovered from a malformed 200 response, but showed "Server error (200)" and raw malformed response text to the user.
What Riddle caught
Initial production job job_794020bd submitted the promo redeem request across desktop, phone, iPad Mini, and iPad with {"promo_code":"RP521-MALFORMED-SUCCESS"}, then failed on the missing generic recovery copy and the raw response text leakage.
Why it matters
Riddle Proof caught user-visible raw response leakage in a Billing action that otherwise looked like it had handled the failure.
What changed
For handled action recovery: mock HTTP 200 with malformed body, require preserved surrounding state plus one visible fallback, assert success UI and raw response text stay absent, and keep app/page health clean.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt

Claim: A malformed promo-code redeem success body should show one generic redeem error, preserve Billing state, avoid success copy, reject raw status/body/parser/object leaks, and stay clean on app/page health.

Bug: The Billing promo-code redeem flow visibly recovered from a malformed 200 response, but showed "Server error (200)" and raw malformed response text to the user.

Why normal checks missed it: The Billing route loaded, authentication worked, the promo POST fired with the expected request body, and the surrounding balance, transaction history, and auto-recharge state stayed visible. A shallow check would see an error state and stop; the proof asserted the exact recovery text and raw-body absences.

Why this sells Riddle Proof: Riddle Proof caught user-visible raw response leakage in a Billing action that otherwise looked like it had handled the failure.

Reusable profile seed: For handled action recovery: mock HTTP 200 with malformed body, require preserved surrounding state plus one visible fallback, assert success UI and raw response text stay absent, and keep app/page health clean.

What the browser run checked

  • Loaded the authenticated Billing page across desktop, phone, iPad Mini, and iPad using synthetic local auth state and network mocks.
  • Required balance, transaction history, and auto-recharge state to remain visible before and after the failed promo action.
  • Captured the promo-code POST request body and required uppercase RP521-MALFORMED-SUCCESS in all four viewports.
  • Mocked the promo response as HTTP 200 with malformed JSON to exercise a success/body mismatch.
  • Required exactly one generic "Failed to redeem promo code. Please try again." recovery state while rejecting "Code redeemed successfully!", "Server error (200)", raw malformed body text, parser text, object placeholders, application errors, and horizontal overflow.
  • Rejected fatal console errors and page errors, while allowing known third-party Stripe hCaptcha WebGL warnings unrelated to the app recovery path.
  • Re-ran the same action-recovery contract on static Preview and final production after the fix.

Proof lesson

Handled action recovery is a text contract, not just the presence of an error box. Successful HTTP status with malformed body should produce generic recovery copy and never leak raw response text.
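A text contract of this kind can be sketched as a pure check over the page's visible text. This is a minimal sketch under assumptions: `RecoveryTextContract` and `checkRecoveryText` are hypothetical names, and the "SyntaxError" and "[object Object]" markers are assumed stand-ins for the parser text and object placeholders the run rejected. The fallback and success strings come from the checks listed above.

```typescript
interface RecoveryTextContract {
  requiredOnce: string; // generic fallback that must appear exactly once
  forbidden: string[]; // success copy and raw-leak markers that must stay absent
}

// Count non-overlapping occurrences of needle in haystack.
function countOccurrences(haystack: string, needle: string): number {
  if (needle.length === 0) return 0;
  return haystack.split(needle).length - 1;
}

// Return a list of human-readable failures; an empty list means the contract held.
function checkRecoveryText(visibleText: string, contract: RecoveryTextContract): string[] {
  const failures: string[] = [];
  const seen = countOccurrences(visibleText, contract.requiredOnce);
  if (seen !== 1) failures.push(`expected exactly one fallback, saw ${seen}`);
  for (const marker of contract.forbidden) {
    if (countOccurrences(visibleText, marker) > 0) {
      failures.push(`forbidden text visible: ${marker}`);
    }
  }
  return failures;
}

const billingPromoContract: RecoveryTextContract = {
  requiredOnce: "Failed to redeem promo code. Please try again.",
  // The last two markers are assumptions, not strings taken from the receipts.
  forbidden: ["Code redeemed successfully!", "Server error (200)", "SyntaxError", "[object Object]"],
};
```

A page that shows an error box but prints "Server error (200)" fails twice here: once for the missing fallback and once for the leak. That is the gap a "did an error appear" check leaves open.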

Artifact | Type | What it proves
Failing Billing promo recovery screenshot | PNG screenshot | Shows the production Billing page preserved state while leaking Server error (200) and raw malformed response text.
Failing Billing promo recovery receipt job_794020bd | JSON metadata | Records the request-body match, preserved Billing state, missing generic recovery copy, and raw text leakage.
Failing Billing promo recovery console capture | JSON logs | Preserves the browser-health evidence from the failing malformed-success promo proof.
Fixed static Preview Billing promo recovery screenshot | PNG screenshot | Shows the deploy candidate rendered the generic recovery text without raw response leakage.
Fixed static Preview Billing promo recovery receipt job_bfc43eef | JSON metadata | Shows Preview passed the malformed-success promo recovery, layout, app/page health, and warning contract.
Fixed static Preview Billing promo recovery console capture | JSON logs | Shows the Preview proof had no unallowed browser-health noise after the promo recovery fix.
Final production Billing promo recovery screenshot | PNG screenshot | Shows the fixed malformed-success promo recovery live in production.
Final production Billing promo recovery receipt job_616c2533 | JSON metadata | Shows final production passed the full handled action recovery contract across the viewport matrix.
Final production Billing promo recovery console capture | JSON logs | Shows the final production proof stayed app/page clean while exercising the malformed-success promo path.

Catch 35

Dashboard API key create logged handled parser failure

May 17, 2026 · < $0.01 · Riddle site · Dashboard · console health
Plain-English catch card

Dashboard API key create logged handled parser failure

Handled action failures need browser-health proof even when the visible fallback is correct.

What went wrong
The Dashboard API-key create flow visibly recovered from a malformed 200 response, but still logged the JSON parser failure as a fatal browser console error.
What Riddle caught
Initial production job job_4a3386ea submitted the API-key create request across desktop, phone, iPad Mini, and iPad with the expected request body, then failed the fatal-console assertion on the logged JSON parser error.
Why it matters
Riddle Proof caught hidden browser-health debt in a Dashboard action that looked correctly handled to the user.
What changed
For handled action recovery: mock a successful HTTP status with malformed body, require preserved surrounding data plus one visible fallback, assert success UI stays absent, and fail on parser leaks or console/page noise.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt

Claim: A malformed API-key create success body should show one generic create error, preserve the existing API-key row, avoid the success modal, reject parser/object leaks, and stay clean on browser console health.

Bug: The Dashboard API-key create flow visibly recovered from a malformed 200 response, but still logged the JSON parser failure as a fatal browser console error.

Why normal checks missed it: The Dashboard route loaded, surrounding account data stayed visible, the existing API key remained active, the create form showed a generic failure, and the success modal stayed closed. The failure only appeared because the proof treated browser console health as part of the handled action contract.

Why this sells Riddle Proof: Riddle Proof caught hidden browser-health debt in a Dashboard action that looked correctly handled to the user.

Reusable profile seed: For handled action recovery: mock a successful HTTP status with malformed body, require preserved surrounding data plus one visible fallback, assert success UI stays absent, and fail on parser leaks or console/page noise.

What the browser run checked

  • Loaded the authenticated Dashboard across desktop, phone, iPad Mini, and iPad using synthetic local auth state and network mocks.
  • Required balance, recent jobs, and an existing active API key to remain visible before and after the failed create action.
  • Captured the API-key create POST request body and required the expected key name in all four viewports.
  • Mocked the create response as HTTP 200 with malformed JSON to exercise a success/body mismatch.
  • Required exactly one generic "Failed to create API key" recovery state while rejecting "API Key Created!", parser text, object placeholders, application errors, and horizontal overflow.
  • Rejected fatal console errors, console warnings, and page errors on the handled malformed-success path.
  • Re-ran the same action-recovery contract on static Preview and final production after the fix.

Proof lesson

Handled action failures need browser-health proof even when the visible fallback is correct. A 200 response with malformed body should be treated like a recovery path, not a fatal app error.
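Treating console health as part of the contract can be sketched as a filter over captured console entries. This is an illustrative sketch only: `ConsoleEntry` and `consoleHealthFailures` are hypothetical names, and the allowlist parameter is an assumption modeled on the Billing catch, which tolerated known third-party warnings. The default empty allowlist matches this catch, which rejected all warnings.

```typescript
type ConsoleLevel = "error" | "warning" | "info";

interface ConsoleEntry {
  level: ConsoleLevel;
  text: string;
}

// Return the entries that break browser health: every error, plus any
// warning that does not match an explicitly allowed pattern.
function consoleHealthFailures(
  entries: ConsoleEntry[],
  allowedWarningPatterns: RegExp[] = [],
): ConsoleEntry[] {
  return entries.filter((entry) => {
    if (entry.level === "error") return true;
    if (entry.level === "warning") {
      return !allowedWarningPatterns.some((pattern) => pattern.test(entry.text));
    }
    return false; // info-level entries never fail the contract
  });
}
```

Under this rule the visible fallback can be perfect and the proof still fails, because a single logged "Error creating API key: SyntaxError" entry is returned as a failure.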

Artifact | Type | What it proves
Failing API key create recovery screenshot | PNG screenshot | Shows the production Dashboard visibly recovered while the hidden console-health assertion still failed.
Failing API key create recovery receipt job_4a3386ea | JSON metadata | Records the request-body match, preserved key row, absent success modal, and failing fatal-console assertion.
Failing API key create recovery console capture | JSON logs | Preserves the Error creating API key: SyntaxError browser event that made the handled recovery contract fail.
Fixed static Preview API key create recovery screenshot | PNG screenshot | Shows the deploy candidate kept the visible recovery while removing the fatal console noise.
Fixed static Preview API key create recovery receipt job_e7070e36 | JSON metadata | Shows Preview passed the malformed-success action recovery, layout, page-error, fatal-console, and warning contract.
Fixed static Preview API key create recovery console capture | JSON logs | Shows the Preview proof stayed browser-clean after the handled create recovery fix.
Final production API key create recovery screenshot | PNG screenshot | Shows the fixed malformed-success create recovery live in production.
Final production API key create recovery receipt job_55ca9d1d | JSON metadata | Shows final production passed the full handled action recovery contract across the viewport matrix.
Final production API key create recovery console capture | JSON logs | Shows the final production proof stayed clean while exercising the malformed-success create path.

Catch 36

Playground optional artifacts leaked browser warnings

May 17, 2026 · < $0.01 · Riddle site · Playground · artifact handling
Plain-English catch card

Playground optional artifacts leaked browser warnings

Optional evidence failures should degrade silently.

What went wrong
A failed async Script result preserved screenshot evidence and rendered the correct error state, but malformed optional console.json leaked a browser console warning from the Playground artifact loader.
What Riddle caught
Initial production job job_4fb7aedd loaded a failed Script result with a valid screenshot artifact and intentionally malformed optional secondary artifacts, then failed on a browser console warning from the artifact loader.
Why it matters
Riddle Proof caught an evidence-product bug that made a reviewable failed result look healthy while optional artifact parsing still leaked browser warning noise.
What changed
For evidence viewers: load a failed result with one valid artifact and malformed optional secondary artifacts, require the useful receipt to remain visible, and fail on browser warnings or page errors.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt

Claim: The Playground should keep failed Script results reviewable when optional secondary artifacts are missing or malformed, without leaking browser warnings from artifact parsing.

Bug: A failed async Script result preserved screenshot evidence and rendered the correct error state, but malformed optional console.json leaked a browser console warning from the Playground artifact loader.

Why normal checks missed it: The page was usable, the route stayed stable, network and layout checks passed, and the failure receipt looked reviewable. The issue only appeared because the proof treated browser-health signals as part of the artifact contract.

Why this sells Riddle Proof: Riddle Proof caught an evidence-product bug that made a reviewable failed result look healthy while optional artifact parsing still leaked browser warning noise.

Reusable profile seed: For evidence viewers: load a failed result with one valid artifact and malformed optional secondary artifacts, require the useful receipt to remain visible, and fail on browser warnings or page errors.

What the browser run checked

  • Loaded the Playground Script result route with a failed async job receipt.
  • Required the failed status, screenshot evidence, route stability, and core result text to remain visible.
  • Presented intentionally malformed optional console/HAR artifacts to exercise graceful degradation.
  • Rejected browser console warnings, fatal console noise, page errors, and horizontal overflow.
  • Re-ran the same malformed-secondary-artifact contract on static Preview and final production after the fix.

Proof lesson

Optional evidence failures should degrade silently. Evidence UIs need browser-health checks, not screenshot-only review.
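Silent degradation for an optional artifact might look like the sketch below. It is an assumption about the shape of such a loader, not the Playground's actual code: `OptionalArtifact` and `parseOptionalArtifact` are hypothetical names.

```typescript
// An optional artifact is either usable or simply unavailable;
// "malformed" is deliberately not a third, noisy state.
type OptionalArtifact<T> = { available: true; value: T } | { available: false };

function parseOptionalArtifact<T>(raw: string | null | undefined): OptionalArtifact<T> {
  if (raw == null || raw.trim() === "") return { available: false };
  try {
    return { available: true, value: JSON.parse(raw) as T };
  } catch {
    // Malformed optional evidence: report it as unavailable and stay quiet.
    // No console.warn here, since the proof treats warnings as failures.
    return { available: false };
  }
}
```

The design choice is that the caller renders "no console capture" the same way whether the file was missing or unparseable, so the primary receipt stays reviewable and the browser stays quiet.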

Artifact | Type | What it proves
Failing malformed optional artifact screenshot | PNG screenshot | Shows the failed Script result stayed reviewable even while optional secondary artifact parsing leaked warning noise.
Failing malformed optional artifact receipt job_4fb7aedd | JSON metadata | Records the warning failure that made the optional artifact loader contract fail in production.
Failing malformed optional artifact console capture | JSON logs | Preserves the browser-health evidence from the broken optional console.json handling.
Fixed static Preview malformed optional artifact screenshot | PNG screenshot | Shows the deploy candidate kept the failed result reviewable after malformed optional artifacts were made silent.
Fixed static Preview malformed optional artifact receipt job_7566934e | JSON metadata | Shows Preview passed the malformed-secondary-artifact, layout, page-error, fatal-console, and warning contract.
Fixed static Preview malformed optional artifact console capture | JSON logs | Shows the Preview proof stayed clean while loading malformed optional secondary artifacts.
Final production malformed optional artifact screenshot | PNG screenshot | Shows the fixed failed Script receipt live in production.
Final production malformed optional artifact receipt job_a4ba8716 | JSON metadata | Shows final production passed the full malformed-secondary-artifact browser-health contract.
Final production malformed optional artifact console capture | JSON logs | Shows the final production proof had no warnings after the optional artifact handling fix.

Catch 37

Playground partial results were screenshot-biased

May 17, 2026 · < $0.01 · Riddle site · Playground · artifact handling
Plain-English catch card

Playground partial results were screenshot-biased

Evidence products must avoid screenshot bias.

What went wrong
A failed Script result with no screenshots but valid console and HAR artifacts rendered the secondary evidence, but omitted the "partial results available" receipt because the product counted only screenshots as partial evidence.
What Riddle caught
Initial production job job_f8acfe7d loaded a failed Script result with zero screenshots and valid console/HAR artifacts, then failed because the "partial results available" receipt was missing.
Why it matters
Riddle Proof caught screenshot bias in an evidence UI by proving that console and HAR artifacts need the same recovery weight as screenshots.
What changed
For failed-result evidence viewers: run one proof with screenshots absent and console/HAR present, assert the secondary evidence is visible, and require the same partial-results receipt used for screenshot-backed failures.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt

Claim: The Playground should mark failed Script results as partial when any useful evidence exists, including console and HAR artifacts without screenshots.

Bug: A failed Script result with no screenshots but valid console and HAR artifacts rendered the secondary evidence, but omitted the "partial results available" receipt because the product counted only screenshots as partial evidence.

Why normal checks missed it: The route, failed state, console section, HAR section, layout, browser errors, and warning checks all passed. The semantic receipt check caught that console and HAR evidence were not treated as first-class partial results.

Why this sells Riddle Proof: Riddle Proof caught screenshot bias in an evidence UI by proving that console and HAR artifacts need the same recovery weight as screenshots.

Reusable profile seed: For failed-result evidence viewers: run one proof with screenshots absent and console/HAR present, assert the secondary evidence is visible, and require the same partial-results receipt used for screenshot-backed failures.

What the browser run checked

  • Loaded the Playground Script result route with a failed async job receipt.
  • Required zero screenshots while providing valid console and HAR artifacts.
  • Required console and HAR evidence sections to render without empty secondary sections.
  • Required the "partial results available" receipt even when no screenshot evidence existed.
  • Rejected browser warnings, fatal console noise, page errors, default artifact fallbacks, and horizontal overflow.
  • Re-ran the same secondary-only partial evidence contract on static Preview and final production after the fix.

Proof lesson

Evidence products must avoid screenshot bias. Console logs and HAR records are first-class recovery artifacts, especially when screenshots are missing.
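The difference between screenshot-biased and evidence-neutral detection is small enough to show side by side. Both functions are hypothetical sketches, and the `FailedResultArtifacts` shape is assumed for illustration rather than taken from the Playground.

```typescript
interface FailedResultArtifacts {
  screenshots: string[]; // screenshot URLs or paths
  consoleEntries: unknown[]; // parsed console capture
  harEntries: unknown[]; // parsed HAR entries
}

// Screenshot-biased: the behavior the bug describes.
function hasPartialResultsBiased(artifacts: FailedResultArtifacts): boolean {
  return artifacts.screenshots.length > 0;
}

// Evidence-neutral: any useful artifact counts as a partial result.
function hasPartialResults(artifacts: FailedResultArtifacts): boolean {
  return (
    artifacts.screenshots.length > 0 ||
    artifacts.consoleEntries.length > 0 ||
    artifacts.harEntries.length > 0
  );
}
```

For a result with zero screenshots and one console entry, the biased check returns false and the neutral check returns true. That case is the one the secondary-only proof exercises.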

Artifact | Type | What it proves
Failing secondary-only partial result screenshot | PNG screenshot | Shows console and HAR evidence rendered in production while the partial-results receipt was missing.
Failing secondary-only partial result receipt job_f8acfe7d | JSON metadata | Records the missing semantic receipt that revealed screenshot-biased partial-result detection.
Failing secondary-only partial result console capture | JSON logs | Shows the browser stayed clean while the semantic artifact receipt failed.
Fixed static Preview secondary-only partial result screenshot | PNG screenshot | Shows the deploy candidate marked secondary-only evidence as partial results.
Fixed static Preview secondary-only partial result receipt job_1fe8aa7b | JSON metadata | Shows Preview passed the console/HAR partial-evidence, layout, page-error, fatal-console, and warning contract.
Fixed static Preview secondary-only partial result console capture | JSON logs | Shows the Preview proof stayed clean while rendering secondary-only partial evidence.
Final production secondary-only partial result screenshot | PNG screenshot | Shows the fixed secondary-only partial-result receipt live in production.
Final production secondary-only partial result receipt job_adba0f31 | JSON metadata | Shows final production passed the full secondary-only partial-evidence contract.
Final production secondary-only partial result console capture | JSON logs | Shows the final production proof stayed clean while proving console/HAR as first-class recovery evidence.

Catch 38

Dashboard API key modal copy crashed on clipboard denial

May 17, 2026 · < $0.01 · Riddle site · Dashboard · security controls
Plain-English catch card

Dashboard API key modal copy crashed on clipboard denial

Credential controls need browser-permission-aware interaction proof.

What went wrong
The Dashboard one-time API-key modal called the browser clipboard API directly.
What Riddle caught
Initial production job job_6599260b created the API key across desktop, phone, iPad Mini, and iPad, and rendered "API Key Created!", the one-time secret, and the curl usage snippet. It then failed because the modal Copy button threw a clipboard permission page error and "Copied" never appeared.
Why it matters
Riddle Proof caught a real browser-permission failure in a credential-copy flow that looked healthy until the proof exercised the modal interaction.
What changed
For one-time secret modals: set synthetic auth state, mock account APIs, submit the exact create action, assert request body and modal content, click Copy, require success or visible recovery, and fail on page errors plus console noise.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt

Claim: The Dashboard API-key creation modal should create a one-time key, show the secret and usage snippet, copy the secret under browser clipboard restrictions, show visible Copied feedback, and stay clean on page errors and console noise.

Bug: The Dashboard one-time API-key modal called the browser clipboard API directly. In Riddle's browser run, clipboard writes were denied, the page threw a fatal browser error, and the modal never reached the expected Copied state.

Why normal checks missed it: The Dashboard loaded, mocked account data rendered, the API-key POST succeeded, the one-time key modal appeared, and the curl snippet was visible. The failure only appeared when the proof clicked Copy inside the browser permission model and treated modal feedback plus page errors as part of the contract.

Why this sells Riddle Proof: Riddle Proof caught a real browser-permission failure in a credential-copy flow that looked healthy until the proof exercised the modal interaction.

Reusable profile seed: For one-time secret modals: set synthetic auth state, mock account APIs, submit the exact create action, assert request body and modal content, click Copy, require success or visible recovery, and fail on page errors plus console noise.

What the browser run checked

  • Loaded the authenticated Dashboard across desktop, phone, iPad Mini, and iPad using synthetic local auth state and network mocks.
  • Required mocked balance, recent jobs, and existing API-key data to render before creating the new key.
  • Captured the API-key create POST request body and required the expected key name in all four viewports.
  • Required the one-time API-key modal, secret, and curl usage snippet to render.
  • Clicked modal Copy and required visible Copied feedback inside the modal.
  • Rejected manual-copy failure text, object placeholders, application errors, page errors, fatal console noise, warning noise, and horizontal overflow.
  • Re-ran the same modal-copy contract on static Preview and final production after the fix.

Proof lesson

Credential controls need browser-permission-aware interaction proof. If the product asks users to copy a secret, the proof should click that exact control, require visible feedback, and fail on page errors across real browser environments.
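The verdict on a Copy click can be sketched as a check over what the run observed afterwards. The names `ModalCopyObservation` and `judgeModalCopy` are hypothetical, and the `allowManualFallback` flag is an assumption. It covers both the profile seed, which accepts success or visible recovery, and this catch's stricter run, which rejected manual-copy failure text.

```typescript
interface ModalCopyObservation {
  copiedFeedbackVisible: boolean; // "Copied" shown inside the modal
  manualCopyFallbackVisible: boolean; // visible "copy it manually" recovery text
  pageErrors: string[]; // uncaught page errors captured during the click
}

// Return failure reasons; an empty list means the modal-copy contract held.
function judgeModalCopy(seen: ModalCopyObservation, allowManualFallback = false): string[] {
  const failures: string[] = [];
  const recovered = allowManualFallback && seen.manualCopyFallbackVisible;
  if (!seen.copiedFeedbackVisible && !recovered) {
    failures.push("no Copied feedback or accepted recovery after clicking Copy");
  }
  if (!allowManualFallback && seen.manualCopyFallbackVisible) {
    failures.push("manual-copy failure text visible");
  }
  for (const message of seen.pageErrors) {
    failures.push(`page error: ${message}`);
  }
  return failures;
}
```

Page errors fail the contract in either mode, so a modal that looks fine in a screenshot still fails once the click throws.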

Artifact | Type | What it proves
Failing API key modal screenshot | PNG screenshot | Shows the one-time API-key modal in production before the Copy click exposed the clipboard-denial failure.
Failing API key modal receipt job_6599260b | JSON metadata | Records the create request-body receipts, missing Copied assertion, clipboard permission page error, and viewport/browser evidence.
Failing API key modal console capture | JSON logs | Preserves the browser error evidence from the failed modal copy flow.
Fixed static Preview API key modal screenshot | PNG screenshot | Shows the deploy candidate after modal Copy reached visible Copied feedback.
Fixed static Preview API key modal receipt job_8cf778ad | JSON metadata | Shows Preview passed the modal-copy, request-body, layout, page-error, fatal-console, and warning contract across all four viewports.
Fixed static Preview API key modal console capture | JSON logs | Shows the Preview proof had no page errors, fatal console errors, or warnings after the fix.
Final production API key modal before-copy screenshot | PNG screenshot | Shows the fixed one-time key modal live in production before the copy action.
Final production API key modal copied screenshot | PNG screenshot | Shows production reached visible Copied feedback in the credential modal.
Final production API key modal receipt job_28c9b5b7 | JSON metadata | Shows final production passed the full modal-copy, request-body, layout, page-error, fatal-console, and warning contract.
Final production API key modal console capture | JSON logs | Shows the final production modal-copy proof stayed clean while exercising the sensitive control.

Catch 39

Dashboard MCP token copy crashed on clipboard denial

May 17, 2026 · < $0.01 · Riddle site · Dashboard · security controls
Plain-English catch card

Dashboard MCP token copy crashed on clipboard denial

Security-sensitive controls need interaction proof, not just visual proof.

What went wrong
The Dashboard MCP Login Token card called the browser clipboard API directly.
What Riddle caught
Initial production job job_0c54da5c proved the authenticated Dashboard loaded across desktop, phone, iPad Mini, and iPad, then failed because Copy token threw a clipboard permission page error.
Why it matters
This is authenticated product proof material: Riddle Proof caught a real browser-permission failure in a token-control flow that looked healthy until the proof exercised the interaction.
What changed
For sensitive auth/account controls: set synthetic auth state, mock account APIs, click the exact controls, assert secret redaction before and after copy, assert explicit reveal/hide behavior, and fail on page errors plus console noise.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt

Claim: The Dashboard MCP Login Token card should let users copy a short-lived token without revealing it, reveal it only on explicit click, hide it again, and stay clean when clipboard writes are denied.

Bug: The Dashboard MCP Login Token card called the browser clipboard API directly. In Riddle's browser run, clipboard writes were denied, the page threw a fatal browser error, and the token copy flow never reached the expected masked Copied state.

Why normal checks missed it: The Dashboard loaded, mocked account data rendered, the token stayed masked, and the buttons looked right. The failure only appeared when the proof clicked Copy token inside the browser permission model and treated page errors plus token-redaction state as part of the contract.

Why this sells Riddle Proof: This is authenticated product proof material: Riddle Proof caught a real browser-permission failure in a token-control flow that looked healthy until the proof exercised the interaction.

Reusable profile seed: For sensitive auth/account controls: set synthetic auth state, mock account APIs, click the exact controls, assert secret redaction before and after copy, assert explicit reveal/hide behavior, and fail on page errors plus console noise.

What the browser run checked

  • Loaded the authenticated Dashboard across desktop, phone, iPad Mini, and iPad using synthetic local auth state and network mocks.
  • Required mocked balance, recent jobs, API-key, and MCP Login Token sections to render.
  • Asserted the synthetic token marker stayed absent before copy, after copy, and after Hide.
  • Clicked Copy token, Reveal, and Hide while capturing screenshots after each state transition.
  • Rejected unhandled page errors, fatal console noise, warning noise, and horizontal overflow.
  • Re-ran the same token-state contract on static Preview and final production after the fix.

Proof lesson

Security-sensitive controls need interaction proof, not just visual proof. Copy, reveal, hide, and redaction states should be checked as a state machine across browser environments.
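Checking the card as a state machine can be sketched with a tiny reducer plus a trace of whether the token is visible after each action. Everything here is a hypothetical model for illustration (`TokenCardState`, `reduceTokenCard`, `renderTokenCard`, `tokenVisibilityTrace`, and the asterisk mask), not the Dashboard's real component.

```typescript
type TokenAction = "copy" | "reveal" | "hide";

interface TokenCardState {
  revealed: boolean;
  copied: boolean;
}

const initialTokenCard: TokenCardState = { revealed: false, copied: false };

function reduceTokenCard(state: TokenCardState, action: TokenAction): TokenCardState {
  switch (action) {
    case "copy":
      return { ...state, copied: true }; // copying must never reveal the token
    case "reveal":
      return { ...state, revealed: true };
    case "hide":
      return { ...state, revealed: false };
  }
}

// Render the card text: masked unless explicitly revealed.
function renderTokenCard(state: TokenCardState, token: string): string {
  const shown = state.revealed ? token : "********";
  return state.copied ? `${shown} Copied` : shown;
}

// Record whether the raw token is visible initially and after each action.
function tokenVisibilityTrace(token: string, actions: TokenAction[]): boolean[] {
  let state = initialTokenCard;
  const trace = [renderTokenCard(state, token).indexOf(token) !== -1];
  for (const action of actions) {
    state = reduceTokenCard(state, action);
    trace.push(renderTokenCard(state, token).indexOf(token) !== -1);
  }
  return trace;
}
```

For the sequence copy, reveal, hide, the trace is `[false, false, true, false]`: masked at load, still masked after copy, visible only after the explicit Reveal, and masked again after Hide.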

Artifact | Type | What it proves
Failing Dashboard token screenshot | PNG screenshot | Shows the token card in its masked production state before the copy click exposed the clipboard-denial failure.
Failing Dashboard token receipt job_0c54da5c | JSON metadata | Records the clipboard permission page error, failed setup action, token-redaction assertions, and viewport/browser evidence.
Failing Dashboard token console capture | JSON logs | Preserves the fatal browser error that made the copy path fail under Riddle.
Fixed static Preview token screenshot | PNG screenshot | Shows the deploy candidate after copy, reveal, and hide completed with the token masked again.
Fixed static Preview token receipt job_27b77be1 | JSON metadata | Shows Preview passed the copy/reveal/hide token-state contract across all four viewports.
Fixed static Preview token console capture | JSON logs | Shows the Preview proof had no page errors, fatal console errors, or warnings after the fix.
Final production token copied screenshot | PNG screenshot | Shows production reached the copied state while the token remained masked.
Final production token revealed screenshot | PNG screenshot | Shows production reveals the token only after the explicit Reveal action.
Final production token hidden screenshot | PNG screenshot | Shows production returns to the masked token state after Hide.
Final production Dashboard token receipt job_49453ba2 | JSON metadata | Shows final production passed the full token-state, layout, page-error, fatal-console, and warning contract.
Final production Dashboard token console capture | JSON logs | Shows the final production token proof stayed clean while exercising the sensitive control.

Catch 40

Docs Markdown leaked code entities

May 17, 2026 · < $0.01 · Riddle site · agent markdown · docs copy
Plain-English catch card

Docs Markdown leaked code entities

Agent-readable docs need their own proof surface.

What went wrong
The agent-facing Riddle Proof docs markdown export leaked HTML entity text into a code example, so the rendered docs were readable but the raw markdown that agents consume carried a stale escaped-code contract.
What Riddle caught
Initial production job job_ced017b6 proved the rendered Riddle Proof docs were healthy but the markdown export still leaked escaped code entities.
Why it matters
This is a public agent-surface catch: Riddle Proof caught docs drift that a human browser review could miss because the bug lived in the markdown contract agents actually read.
What changed
For docs and agent surfaces: prove rendered HTML, raw markdown exports, required terms, forbidden stale or escaped snippets, viewport layout, fatal-console health, and warning hygiene together.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt

Claim: Riddle Proof docs should expose clean, machine-consumable markdown examples without leaked HTML entity text while the rendered docs remain healthy.

Bug: The agent-facing Riddle Proof docs markdown export leaked HTML entity text into a code example, so the rendered docs were readable but the raw markdown that agents consume carried a stale escaped-code contract.

Why normal checks missed it: The human docs page loaded, the visible section looked correct, and normal route checks would not read the raw markdown body as an agent would. The failure only surfaced when the proof treated rendered docs and /docs/riddle-proof/markdown.md as separate public contracts.

Why this sells Riddle Proof: This is a public agent-surface catch: Riddle Proof caught docs drift that a human browser review could miss because the bug lived in the markdown contract agents actually read.

Reusable profile seed: For docs and agent surfaces: prove rendered HTML, raw markdown exports, required terms, forbidden stale or escaped snippets, viewport layout, fatal-console health, and warning hygiene together.

What the browser run checked

  • Loaded the public Riddle Proof docs across desktop, phone, iPad Mini, and iPad.
  • Fetched the agent-facing markdown export and checked the raw response body directly.
  • Required current Riddle Proof terms and example syntax to stay present in both human and machine-readable surfaces.
  • Rejected leaked code entity text in the raw markdown export.
  • Re-ran the same rendered-docs, markdown-body, overflow, fatal-console, and warning contract on static Preview and final production.
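
The raw-markdown check in the list above can be sketched roughly as follows: scan only the fenced code blocks of the export for HTML entities such as &lt;. The function name, the entity list, and the fence handling are illustrative assumptions, not the actual Riddle Proof profile; the catch only states that leaked entity text in the raw body was rejected.

```python
import re

# Entities that usually signal HTML-escaped code leaking into raw markdown.
# This list is an assumption for illustration.
ENTITY_RE = re.compile(r"&(?:lt|gt|amp|quot|apos|#\d+|#x[0-9a-fA-F]+);")

# A fenced block: an opening fence line, the body, then a closing fence line.
FENCE_RE = re.compile(r"^`{3}[^\n]*\n(.*?)^`{3}", re.DOTALL | re.MULTILINE)


def find_leaked_entities(markdown: str) -> list:
    """Return every HTML entity found inside fenced code blocks."""
    leaks = []
    for block in FENCE_RE.finditer(markdown):
        # Only the code body is scanned; prose outside fences is ignored.
        leaks.extend(ENTITY_RE.findall(block.group(1)))
    return leaks
```

Entities in ordinary prose are left alone in this sketch, since markdown prose may legitimately contain them. A stricter profile could scan the whole response body instead.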

Proof lesson

Agent-readable docs need their own proof surface. A rendered page can be correct while the markdown export that models and CLIs rely on is stale, escaped, or semantically different.

Artifact | Type | What it proves
Failing docs markdown screenshot | PNG screenshot | Shows the production docs page while the same proof caught leaked entity text in the raw markdown export.
Failing docs markdown receipt job_ced017b6 | JSON metadata | Records the raw markdown body failure and the clean browser/layout evidence around it.
Failing docs markdown console capture | JSON logs | Shows the failure was not caused by unrelated fatal console or warning noise.
Fixed static Preview docs markdown screenshot | PNG screenshot | Shows the deploy candidate after the markdown entity leak was corrected.
Fixed static Preview docs markdown receipt job_8a7bd738 | JSON metadata | Shows Preview passed the rendered docs and raw markdown contract before production deploy.
Fixed static Preview docs markdown console capture | JSON logs | Shows the Preview proof stayed clean while proving the agent markdown fix.
Final production docs markdown screenshot | PNG screenshot | Shows the fixed docs page live in production after deploy.
Final production docs markdown receipt job_764fcc77 | JSON metadata | Shows final production passed the raw markdown, rendered route, overflow, fatal-console, and warning checks.
Final production docs markdown console capture | JSON logs | Shows the final production proof had no browser noise hiding the docs result.

Catch 41

Serverless page taught stale screenshot polling

May 17, 2026 | < $0.01 | Riddle site | serverless docs | screenshot API
Plain-English catch card

Serverless page taught stale screenshot polling

API education pages need contract proof, not just route proof.

What went wrong
The Serverless docs still taught an old screenshot polling response shape, even though the current API returns immediate screenshot evidence for the simple screenshot flow.
What Riddle caught
Initial production job job_5cf3a0a0 proved the Serverless page was alive but still described stale screenshot polling behavior instead of the current simple screenshot response.
Why it matters
This is public docs proof material: Riddle Proof caught stale integration guidance before it could keep teaching users and agents the wrong screenshot API shape.
What changed
For API documentation: prove route health, exact current response-shape language, forbidden stale snippets, raw or rendered docs as needed, viewport layout, fatal-console health, and warning hygiene together.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt

Claim: The Serverless docs should teach the current simple screenshot response contract and avoid stale polling-oriented screenshot guidance.

Bug: The Serverless docs still taught an old screenshot polling response shape, even though the current API returns immediate screenshot evidence for the simple screenshot flow.

Why normal checks missed it: The page loaded, the surrounding copy looked plausible, and stale API snippets can survive normal visual checks. The proof failed because it searched for the exact current contract and rejected the older polling-oriented language.

Why this sells Riddle Proof: This is public docs proof material: Riddle Proof caught stale integration guidance before it could keep teaching users and agents the wrong screenshot API shape.

Reusable profile seed: For API documentation: prove route health, exact current response-shape language, forbidden stale snippets, raw or rendered docs as needed, viewport layout, fatal-console health, and warning hygiene together.

What the browser run checked

  • Loaded the Serverless docs across desktop, phone, iPad Mini, and iPad.
  • Required current simple screenshot response language and code-contract terms to be visible.
  • Rejected stale screenshot polling copy that would send integrators down the wrong path.
  • Kept the page clean on horizontal overflow, fatal browser errors, and console warnings.
  • Re-ran the same Serverless docs contract on static Preview and final production.
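
A required-and-forbidden copy check like the one listed above might look roughly like this. The phrases passed in are placeholders: the catch does not quote the exact current or stale wording, so every string in the usage example is hypothetical, as are the function and class names.

```python
from dataclasses import dataclass, field


@dataclass
class CopyContractResult:
    missing_required: list = field(default_factory=list)
    forbidden_found: list = field(default_factory=list)

    @property
    def passed(self) -> bool:
        # The contract passes only when nothing is missing and nothing stale remains.
        return not self.missing_required and not self.forbidden_found


def _normalize(text: str) -> str:
    # Collapse whitespace and case so line wrapping does not hide a phrase.
    return " ".join(text.split()).lower()


def check_copy_contract(page_text, required, forbidden) -> CopyContractResult:
    """Report required phrases that are absent and forbidden phrases that are present."""
    haystack = _normalize(page_text)
    result = CopyContractResult()
    for phrase in required:
        if _normalize(phrase) not in haystack:
            result.missing_required.append(phrase)
    for phrase in forbidden:
        if _normalize(phrase) in haystack:
            result.forbidden_found.append(phrase)
    return result
```

For example, `check_copy_contract(text, ["hypothetical current phrase"], ["hypothetical stale phrase"])` fails a page that loads cleanly but still carries the old wording, which is the gap a plain route check leaves open.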

Proof lesson

API education pages need contract proof, not just route proof. A stale snippet can be more damaging than a missing page because it teaches agents and users the wrong integration path.

Artifact | Type | What it proves
Failing Serverless docs screenshot | PNG screenshot | Shows the production Serverless docs while the same run caught stale screenshot polling guidance.
Failing Serverless docs receipt job_5cf3a0a0 | JSON metadata | Records the missing current contract text, forbidden stale-copy evidence, and viewport/browser health checks.
Failing Serverless docs console capture | JSON logs | Shows the stale docs catch was not masked by unrelated browser errors.
Fixed static Preview Serverless screenshot | PNG screenshot | Shows the deploy candidate after the Serverless screenshot response copy was corrected.
Fixed static Preview Serverless receipt job_fa59cd7a | JSON metadata | Shows Preview passed the current screenshot response and stale-copy rejection checks before deploy.
Fixed static Preview Serverless console capture | JSON logs | Shows the Preview proof stayed clean while proving the docs correction.
Final production Serverless screenshot | PNG screenshot | Shows the fixed Serverless docs live in production after deploy.
Final production Serverless receipt job_42300b18 | JSON metadata | Shows final production passed the current response-shape, stale-copy, layout, fatal-console, and warning checks.
Final production Serverless console capture | JSON logs | Shows the final production proof had no warning or fatal-console noise.

Catch 42

Homepage taught stale screenshot JSON

May 17, 2026 | < $0.01 | Riddle site | homepage | screenshot API
Plain-English catch card

Homepage taught stale screenshot JSON

Homepage examples are integration docs.

What went wrong
The homepage still advertised a stale screenshot JSON response shape, so the top public entry point taught a contract that no longer matched the simple screenshot API.
What Riddle caught
Initial production job job_0a9292ea proved the homepage loaded cleanly but still carried stale screenshot JSON copy.
Why it matters
This is public conversion-surface proof: Riddle Proof caught stale API teaching copy on the homepage, where wrong examples shape a buyer or agent before they reach deeper docs.
What changed
For homepages with live API examples: prove exact response-shape copy, forbidden stale examples, visible call-to-action context, viewport layout, fatal-console health, and warning hygiene together.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt

Claim: The Riddle homepage should teach the current simple screenshot response shape and avoid stale screenshot JSON examples.

Bug: The homepage still advertised a stale screenshot JSON response shape, so the top public entry point taught a contract that no longer matched the simple screenshot API.

Why normal checks missed it: The homepage was visually polished and route-healthy. A normal screenshot review would see the marketing section, not whether the JSON example matched the actual API response contract.

Why this sells Riddle Proof: This is public conversion-surface proof: Riddle Proof caught stale API teaching copy on the homepage, where wrong examples shape a buyer or agent before they reach deeper docs.

Reusable profile seed: For homepages with live API examples: prove exact response-shape copy, forbidden stale examples, visible call-to-action context, viewport layout, fatal-console health, and warning hygiene together.

What the browser run checked

  • Loaded the homepage across desktop, phone, iPad Mini, and iPad.
  • Required current simple screenshot response copy and integration terms to stay visible.
  • Rejected stale screenshot JSON snippets from the public homepage example.
  • Kept the homepage clean on horizontal overflow, fatal browser errors, and console warnings.
  • Re-ran the same homepage contract on static Preview and final production.
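
For a homepage JSON example, the stale-snippet check above could also compare the example's keys against the response contract rather than matching text alone. This is a minimal sketch under that assumption. The key names in the usage note are placeholders, because the catch does not list the actual fields of the current or stale response shape.

```python
import json


def check_json_example(snippet: str, required_keys: set, forbidden_keys: set) -> dict:
    """Parse a docs JSON example and compare its top-level keys with a contract."""
    try:
        example = json.loads(snippet)
    except json.JSONDecodeError as exc:
        return {"ok": False, "error": "example is not valid JSON: " + exc.msg}
    if not isinstance(example, dict):
        return {"ok": False, "error": "example is not a JSON object"}
    keys = set(example)
    missing = sorted(required_keys - keys)   # current fields the example lacks
    stale = sorted(forbidden_keys & keys)    # legacy fields the example still shows
    return {"ok": not missing and not stale, "missing": missing, "stale": stale}
```

Called with a hypothetical contract such as `required_keys={"current_field"}` and `forbidden_keys={"legacy_field"}`, a polished page that still shows the old example fails with both lists populated.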

Proof lesson

Homepage examples are integration docs. The public first impression should be checked for exact API contract language, forbidden stale snippets, layout health, and browser noise together.

Artifact | Type | What it proves
Failing homepage screenshot | PNG screenshot | Shows the production homepage while the same run caught stale screenshot JSON copy.
Failing homepage receipt job_0a9292ea | JSON metadata | Records the stale homepage contract failure and clean layout/browser evidence.
Failing homepage console capture | JSON logs | Shows the homepage catch was not driven by unrelated console errors.
Fixed static Preview homepage screenshot | PNG screenshot | Shows the deploy candidate after the homepage screenshot response example was corrected.
Fixed static Preview homepage receipt job_c1fb675b | JSON metadata | Shows Preview passed the current screenshot response and stale-copy rejection checks before deploy.
Fixed static Preview homepage console capture | JSON logs | Shows the Preview proof stayed clean while proving the homepage correction.
Final production homepage screenshot | PNG screenshot | Shows the fixed homepage live in production after deploy.
Final production homepage receipt job_6d79c361 | JSON metadata | Shows final production passed the current response-shape, stale-copy, layout, fatal-console, and warning checks.
Final production homepage console capture | JSON logs | Shows the final production proof had no warning or fatal-console noise.

Catch 43

Preview guide taught stale URL shape

May 17, 2026 | < $0.01 | Riddle site | Preview | docs copy
Plain-English catch card

Preview guide taught stale URL shape

Preview docs are part of the deployment surface.

What went wrong
The Preview Tools guide still taught the old subdomain-style Preview URL and response field names instead of the current /s/pv_* preview_url contract.
What Riddle caught
Initial production job job_e8beb4d5 proved the Preview Tools guide still contained stale Preview URL shape copy.
Why it matters
This is public integration-docs proof: Riddle Proof caught a stale Preview URL contract in the guide that agents use to create before/after proof environments.
What changed
For Preview and deployment docs: prove exact current URL shapes, response field names, before/after wiring examples, forbidden legacy snippets, viewport layout, fatal-console health, and warning hygiene together.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt

Claim: The Preview Tools guide should teach the current preview_url and /s/pv_* URL contract, and should not contain legacy subdomain Preview examples.

Bug: The Preview Tools guide still taught the old subdomain-style Preview URL and response field names instead of the current /s/pv_* preview_url contract.

Why normal checks missed it: The guide route loaded and the examples looked credible. The failure was a stale integration contract embedded in code snippets, so it needed exact text checks for both required current snippets and forbidden legacy snippets.

Why this sells Riddle Proof: This is public integration-docs proof: Riddle Proof caught a stale Preview URL contract in the guide that agents use to create before/after proof environments.

Reusable profile seed: For Preview and deployment docs: prove exact current URL shapes, response field names, before/after wiring examples, forbidden legacy snippets, viewport layout, fatal-console health, and warning hygiene together.

What the browser run checked

  • Loaded the Preview Tools guide across desktop, phone, iPad Mini, and iPad.
  • Required the current preview_url response field and /s/pv_* URL example to be visible.
  • Required url_before and url_after examples to use preview_url values.
  • Rejected the legacy pv-a1b2c3d4.preview.riddledc.com shape and stale url field examples.
  • Re-ran the same guide contract on static Preview and final production with overflow, fatal-console, and warning checks.
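
The URL-shape checks above can be approximated with a small classifier. The legacy host pattern follows the pv-a1b2c3d4.preview.riddledc.com example and the current pattern follows the /s/pv_* path named in the catch. The exact character set allowed in an ID, the example.test host, and both function names are assumptions made for illustration.

```python
import re
from urllib.parse import urlparse

# Legacy shape named in the catch: pv-<id>.preview.riddledc.com
LEGACY_HOST_RE = re.compile(r"^pv-[0-9a-z]+\.preview\.riddledc\.com$")
# Current shape named in the catch: a path that starts with /s/pv_
CURRENT_PATH_RE = re.compile(r"^/s/pv_[0-9A-Za-z_-]+")


def classify_preview_url(url: str) -> str:
    """Return 'legacy', 'current', or 'unknown' for a Preview URL example."""
    # Bare hosts in docs snippets have no scheme, so add one before parsing.
    parsed = urlparse(url if "://" in url else "https://" + url)
    if LEGACY_HOST_RE.match(parsed.hostname or ""):
        return "legacy"
    if CURRENT_PATH_RE.match(parsed.path):
        return "current"
    return "unknown"


def wiring_is_current(proof_request: dict) -> bool:
    """True when url_before and url_after both use the current URL shape."""
    return all(
        classify_preview_url(proof_request.get(key, "")) == "current"
        for key in ("url_before", "url_after")
    )
```

A docs check could run `classify_preview_url` over every URL found in the guide's code snippets and fail on any `legacy` result, while `wiring_is_current` covers the before/after examples.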

Proof lesson

Preview docs are part of the deployment surface. If the docs teach stale URL shapes, agents can deploy correctly and then wire proof runs to the wrong URL field.

Artifact | Type | What it proves
Failing Preview guide screenshot | PNG screenshot | Shows the production Preview Tools guide while the same run caught stale URL contract copy.
Failing Preview guide receipt job_e8beb4d5 | JSON metadata | Records the missing current Preview snippets, forbidden legacy URL snippets, and clean browser evidence.
Failing Preview guide console capture | JSON logs | Shows the stale Preview guide catch was not caused by unrelated browser errors.
Fixed static Preview guide screenshot | PNG screenshot | Shows the deploy candidate after the Preview URL examples were corrected.
Fixed static Preview guide receipt job_f1902246 | JSON metadata | Shows Preview passed the current URL-shape and forbidden legacy-snippet checks before deploy.
Fixed static Preview guide console capture | JSON logs | Shows the Preview proof stayed clean while proving the guide correction.
Final production Preview guide screenshot | PNG screenshot | Shows the fixed Preview Tools guide live in production after deploy.
Final production Preview guide receipt job_ebf7a878 | JSON metadata | Shows final production passed the current Preview URL, forbidden legacy-snippet, layout, fatal-console, and warning checks.
Final production Preview guide console capture | JSON logs | Shows the final production proof had no warning or fatal-console noise.

Catch 44

Playground async Workflow ignored artifact URLs

May 17, 2026 | < $0.01 | Riddle site | Playground | artifact evidence
Plain-English catch card

Playground async Workflow ignored artifact URLs

Agent-facing evidence UIs need artifact-contract checks per mode.

What went wrong
The public Playground accepted an async Workflow job response with a service-returned artifacts_url, but still polled the guessed default /artifacts URL and rendered no Workflow screenshot, console, or HAR evidence.
What Riddle caught
Initial production job job_d040676d exercised the async Workflow flow across desktop, phone, iPad Mini, and iPad. /v1/run returned job_rp486_workflow_har_artifacts plus an explicit artifacts_url, but the UI never requested that custom URL, hit the forbidden guessed /artifacts URL 4/4 times, and rendered "No screenshots captured" with no console or Network HAR evidence.
Why it matters
This is proof-driven-product material: Riddle Proof caught a mode-specific missing evidence branch inside Riddle Playground after the Workflow job itself already looked successful.
What changed
For multi-mode artifact UIs: prove each mode honors service-returned artifact URLs, accepts files[] and artifacts[] payloads, supports url and download_url variants, avoids guessed fallback URLs, renders every expected evidence family, and stays clean on layout, fatal-console, and warning hygiene.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt

Claim: The public Playground should render async Workflow screenshot, console, and Network HAR evidence from the service-returned artifacts_url, including files[] and download_url artifact payloads.

Bug: The public Playground accepted an async Workflow job response with a service-returned artifacts_url, but still polled the guessed default /artifacts URL and rendered no Workflow screenshot, console, or HAR evidence.

Why normal checks missed it: The visible job result looked superficially successful: the Workflow request submitted, the job ID appeared, and the page showed a Success result. The failure lived in the evidence branch: the UI ignored the exact artifact collection URL returned by the service, so the proof artifacts existed but never reached the review surface.

Why this sells Riddle Proof: This is proof-driven-product material: Riddle Proof caught a mode-specific missing evidence branch inside Riddle Playground after the Workflow job itself already looked successful.

Reusable profile seed: For multi-mode artifact UIs: prove each mode honors service-returned artifact URLs, accepts files[] and artifacts[] payloads, supports url and download_url variants, avoids guessed fallback URLs, renders every expected evidence family, and stays clean on layout, fatal-console, and warning hygiene.

What the browser run checked

  • Loaded /playground/ with mocked Cognito-style authenticated state across desktop, phone, iPad Mini, and iPad.
  • Submitted Workflow async mode and verified the request body included steps, sync:false, target URL, and screenshot step.
  • Required the UI to poll the explicit custom artifacts_url returned by /v1/run exactly once per viewport and never hit the forbidden guessed /artifacts fallback.
  • Required files[] artifacts with screenshot, console.json, and network.har download_url values to render together.
  • Required the screenshot label rp486-workflow-before, console output rp486 workflow har console log, Network HAR, and HAR request rp486-workflow-resource to be visible.
  • Required empty screenshot, console, and network states, failed-step copy, app errors, horizontal overflow, fatal errors, and console warnings to stay absent.
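
The artifact contract in the list above comes down to two rules: poll the URL the service returned, and accept both payload shapes. A minimal sketch, assuming the run response and artifact payload are plain dictionaries. The artifacts_url, files, artifacts, url, and download_url names come from the catch; the name field, the helper names, and the choice to raise rather than fall back are hypothetical.

```python
def resolve_artifacts_url(run_response: dict) -> str:
    """Use the URL the service returned; never build a guessed /artifacts path."""
    url = run_response.get("artifacts_url")
    if not url:
        # Assumption: failing loudly is preferable to polling a guessed URL.
        raise ValueError("run response has no artifacts_url; refusing to guess one")
    return url


def normalize_artifacts(payload: dict) -> list:
    """Flatten files[] and artifacts[] entries into {name, href} records."""
    entries = list(payload.get("files") or []) + list(payload.get("artifacts") or [])
    normalized = []
    for entry in entries:
        # Either link variant is accepted; url wins when both are present.
        href = entry.get("url") or entry.get("download_url")
        if href:
            normalized.append({"name": entry.get("name", ""), "href": href})
    return normalized
```

Running every mode (Script, Workflow, Batch) through the same two helpers is one way to keep a single mode from drifting onto its own URL or payload assumptions.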

Proof lesson

Agent-facing evidence UIs need artifact-contract checks per mode. Script, Workflow, and Batch can share a product surface while accidentally carrying different artifact URL and payload assumptions.

Artifact | Type | What it proves
Failing Playground Workflow result screenshot | PNG screenshot | Shows the production Playground result after the Workflow job succeeded but before any returned artifact evidence was visible.
Failing Playground Workflow receipt job_d040676d | JSON metadata | Records the custom artifacts_url miss, forbidden fallback hits, missing screenshot/console/HAR assertions, and clean layout/browser evidence across the viewport matrix.
Failing Playground Workflow console capture | JSON logs | Shows the failing proof had no unrelated fatal browser-console noise hiding the artifact-contract issue.
Fixed static Preview Workflow screenshot | PNG screenshot | Shows the deploy candidate rendering screenshot, console, and expanded Network HAR evidence from the same mocked Workflow artifact payload.
Fixed static Preview Workflow receipt job_4c04dfe6 | JSON metadata | Shows the Preview fix passed all Workflow artifact, HAR, network-mock, layout, fatal-console, and warning checks.
Fixed static Preview Workflow console capture | JSON logs | Shows the fixed Preview proof had clean console evidence while proving the Workflow HAR branch.
Final production Workflow screenshot | PNG screenshot | Shows the fixed Workflow artifact behavior live in production after Amplify deployed PR #146.
Final production Workflow receipt job_3fc1124e | JSON metadata | Shows final production passed the async Workflow screenshot, console, HAR, fallback-avoidance, overflow, fatal-console, and warning checks.
Final production Workflow console capture | JSON logs | Shows the final production proof had zero warning events and no fatal console or page errors.

Catch 45

Playground async Script ignored HAR artifacts

May 17, 2026 | < $0.01 | Riddle site | Playground | artifact evidence
Plain-English catch card

Playground async Script ignored HAR artifacts

Artifact-contract proof needs to check every evidence family, not just the first screenshot or console log.

What went wrong
The public Playground rendered screenshot and console artifacts from an async Script files[] response, but ignored the network.har artifact in the same payload.
What Riddle caught
Initial production job job_abb56468 exercised the async Script flow across desktop, phone, iPad Mini, and iPad. /v1/run returned job_rp484_async_har_artifacts plus an explicit artifacts_url, and the UI polled that custom endpoint 4/4 times, never hit the forbidden default /artifacts URL, rendered screenshot label rp484-har-before, and fetched console output rp484 async har console log. The run still failed because the network.har artifact in the same payload never became a Network HAR review section.
Why it matters
This is proof-driven-product material: Riddle Proof caught a missing evidence branch inside Riddle Playground after the async run already looked successful.
What changed
For artifact UIs and agent-run consoles: prove service-returned artifact URLs, all artifact payload families, url and download_url variants, forbidden fallback URLs, expandable evidence panes, layout, fatal-console health, and warning hygiene together.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt

Claim: The public Playground should render screenshot, console, and Network HAR evidence from async Script files[] artifact payloads, including download_url-only HAR files.

Bug: The public Playground rendered screenshot and console artifacts from an async Script files[] response, but ignored the network.har artifact in the same payload.

Why normal checks missed it: The visible run looked mostly healthy: the async job completed, the custom artifacts_url was used instead of a guessed /artifacts URL, the screenshot label appeared, and console output rendered. The missing branch was narrower: HAR evidence existed in the artifact payload but never became a Network HAR review section.

Why this sells Riddle Proof: This is proof-driven-product material: Riddle Proof caught a missing evidence branch inside Riddle Playground after the async run already looked successful.

Reusable profile seed: For artifact UIs and agent-run consoles: prove service-returned artifact URLs, all artifact payload families, url and download_url variants, forbidden fallback URLs, expandable evidence panes, layout, fatal-console health, and warning hygiene together.

What the browser run checked

  • Loaded /playground/ with mocked Cognito-style authenticated state across desktop, phone, iPad Mini, and iPad.
  • Submitted Script async mode and verified the request body included the script, sync:false, console log text, and saveScreenshot call.
  • Required the UI to poll the explicit custom artifacts_url returned by /v1/run exactly once per viewport and never hit the forbidden guessed /artifacts fallback.
  • Required files[] artifacts with screenshot download_url, console url, and network.har download_url to render together.
  • Required the screenshot label rp484-har-before, console output rp484 async har console log, Network HAR, and HAR request rp484-har-resource to be visible.
  • Required empty screenshot, console, and network states, failed-step copy, app errors, horizontal overflow, fatal errors, and console warnings to stay absent.
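
Checking every evidence family, as the list above requires, can be sketched as a comparison between what the payload contained and what the UI rendered. The family names, the file-extension rules, and both function names are assumptions for illustration; the catch itself only names console.json and network.har.

```python
from typing import Optional

EXPECTED_FAMILIES = ("screenshot", "console", "har")


def artifact_family(name: str) -> Optional[str]:
    """Map an artifact file name to an evidence family, or None if unrecognized."""
    lowered = name.lower()
    if lowered.endswith(".har"):
        return "har"
    if lowered.endswith((".png", ".jpg", ".jpeg")):
        return "screenshot"
    if lowered.startswith("console") and lowered.endswith(".json"):
        return "console"
    return None


def missing_families(payload_names, rendered_families) -> list:
    """Families present in the payload that the UI did not render."""
    in_payload = {artifact_family(name) for name in payload_names} - {None}
    rendered = set(rendered_families)
    return [f for f in EXPECTED_FAMILIES if f in in_payload and f not in rendered]
```

With a payload holding a screenshot, console.json, and network.har, a UI that renders only the first two yields `["har"]`, which is the shape of the failure in this catch.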

Proof lesson

Artifact-contract proof needs to check every evidence family, not just the first screenshot or console log. A result can look successful while one artifact type silently disappears from the review surface.

Artifact | Type | What it proves
Failing Playground HAR result screenshot | PNG screenshot | Shows the production Playground result after screenshot and console rendered but before Network HAR evidence was available.
Failing Playground HAR receipt job_abb56468 | JSON metadata | Records the explicit artifacts_url, files[] screenshot/console/HAR payload, forbidden fallback hit count, missing Network HAR assertion, and warning evidence across the viewport matrix.
Failing Playground HAR console capture | JSON logs | Preserves the five unused CSS preload warnings caught during the long async artifact review flow.
Fixed static Preview HAR screenshot | PNG screenshot | Shows the deploy candidate rendering screenshot, console, and expanded Network HAR evidence from the same mocked artifact payload.
Fixed static Preview HAR receipt job_b69d0aa4 | JSON metadata | Shows the Preview fix passed all async artifact, HAR, network-mock, layout, fatal-console, and warning checks.
Fixed static Preview HAR console capture | JSON logs | Shows the fixed Preview proof had clean console evidence while proving the HAR branch.
Final production HAR screenshot | PNG screenshot | Shows the fixed Playground HAR artifact behavior live in production after Amplify deployed PR #144.
Final production HAR receipt job_ad6fa952 | JSON metadata | Shows final production passed the async Script screenshot, console, HAR, fallback-avoidance, overflow, fatal-console, and warning checks.
Final production HAR console capture | JSON logs | Shows the final production proof had zero warning events and no fatal console or page errors.

Catch 46

Playground hid a single partial screenshot label

May 17, 2026 | < $0.01 | Riddle site | Playground | artifact evidence
Plain-English catch card

Playground hid a single partial screenshot label

A proof surface is not done when it merely stores evidence.

What went wrong
The public Playground preserved a partial screenshot from an async Script run that ended completed_error, but hid the screenshot name when there was only one screenshot.
What Riddle caught
Initial production job job_188e7a69 passed the explicit artifacts_url, files[], download_url, completed_error, partial-results, console, no-fallback, overflow, fatal-console, and warning checks across desktop, phone, iPad Mini, and iPad, but failed because .screenshots-section did not visibly contain rp482-explicit-before-error.
Why it matters
This is proof-driven-product material: Riddle Proof caught an evidence-reviewability bug inside Riddle Playground itself after the underlying async artifact contract already worked.
What changed
For hosted tool and agent UI audits: prove the service-returned artifact URL, artifact payload shape, partial-failure evidence, visible artifact labels, console evidence, forbidden fallback URLs, layout, fatal-console health, and warning hygiene together.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt

Claim: The public Playground should render partial async Script evidence from the explicit service-provided artifacts URL, including visible screenshot names for single-screenshot completed_error runs.

Bug: The public Playground preserved a partial screenshot from an async Script run that ended completed_error, but hid the screenshot name when there was only one screenshot.

Why normal checks missed it: The hard artifact contract worked: /v1/run returned an explicit artifacts_url, the UI used that custom endpoint instead of a guessed /artifacts path, accepted files[] with screenshot download_url, showed the structured error, preserved partial results, fetched console output, and stayed clean on overflow, fatal errors, and warnings. The remaining failure was evidence reviewability: the screenshot existed, but its visible artifact label was missing.

Why this sells Riddle Proof: This is proof-driven-product material: Riddle Proof caught an evidence-reviewability bug inside Riddle Playground itself after the underlying async artifact contract already worked.

Reusable profile seed: For hosted tool and agent UI audits: prove the service-returned artifact URL, artifact payload shape, partial-failure evidence, visible artifact labels, console evidence, forbidden fallback URLs, layout, fatal-console health, and warning hygiene together.

What the browser run checked

  • Loaded /playground/ with mocked Cognito-style authenticated state across desktop, phone, iPad Mini, and iPad.
  • Submitted Script async mode and verified the request body included the script, sync:false, console log text, and saveScreenshot call.
  • Required the UI to poll the explicit custom artifacts_url returned by /v1/run exactly once per viewport and never hit the forbidden guessed /artifacts fallback.
  • Required files[] artifacts with screenshot download_url and console url to render during a completed_error result.
  • Required the structured error, partial results copy, screenshot label rp482-explicit-before-error, and console output to be visible.
  • Required empty screenshot/console states, failed-step copy, app errors, horizontal overflow, fatal errors, and console warnings to stay absent.
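
The label requirement above implies that a caption must be produced for every screenshot, including the single-screenshot case. A minimal sketch of that rule follows, with a matching visibility assertion. The name field, the positional fallback caption, and both function names are hypothetical; this is not the Playground's actual rendering code.

```python
def screenshot_captions(screenshots: list) -> list:
    """Return one visible caption per screenshot, including the single-item case."""
    captions = []
    for index, shot in enumerate(screenshots, start=1):
        # Fall back to a positional caption only when the payload has no name.
        captions.append(shot.get("name") or "Screenshot " + str(index))
    return captions


def label_is_visible(section_text: str, expected_label: str) -> bool:
    """Mirror the proof assertion: the section text must contain the label."""
    return expected_label in section_text
```

The point of the sketch is that the caption list has the same length as the screenshot list for any count, so a lone partial screenshot from a completed_error run keeps its name.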

Proof lesson

A proof surface is not done when it merely stores evidence. Failed or partial agent runs need reviewable artifact labels so a reader can connect screenshots, receipts, and console evidence without guessing.

Artifact | Type | What it proves
Failing Playground result screenshot | PNG screenshot | Shows the failed production Playground result after the partial screenshot was preserved but the single screenshot label was not visible.
Failing Playground receipt job_188e7a69 | JSON metadata | Records the explicit artifacts_url, files[] artifact parsing, forbidden fallback URL hit count, completed_error evidence, and missing screenshot-label assertion across the viewport matrix.
Failing Playground console capture | JSON logs | Shows the failing production proof did not depend on unrelated browser warning or fatal-console noise.
Fixed static Preview screenshot | PNG screenshot | Shows the deploy candidate rendering the same completed_error result with the single partial screenshot label visible.
Fixed static Preview receipt job_ba0cfaf8 | JSON metadata | Shows the Preview fix passed all 24 artifact, evidence, network, layout, fatal-console, and warning checks.
Fixed static Preview console capture | JSON logs | Shows the fixed Preview proof preserved clean console evidence while proving the artifact-label repair.
Final production screenshot | PNG screenshot | Shows the fixed Playground behavior live in production after Amplify deployed PR #142.
Final production receipt job_8d08662d | JSON metadata | Shows final production passed 24 checks across desktop, phone, iPad Mini, and iPad with the single screenshot label visible.
Final production console capture | JSON logs | Shows the final production proof had zero warning events and no fatal console or page errors.

Catch 47

OpenClaw Moltbook article was referenced but unpublished

May 17, 2026 | < $0.01 | Riddle site | OpenClaw | blog route
Plain-English catch card

OpenClaw Moltbook article was referenced but unpublished

Public proof stories need route, markdown, sitemap, and placeholder checks together.

What went wrong
Riddle public blog and agent-facing surfaces referred to the OpenClaw/Moltbook article, but /blog/openclaw-moltbook returned 404 and the unpublished source still carried draft placeholder copy.
What Riddle caught
Production jobs job_9e485320 and job_15a4c2d6 failed /blog/openclaw-moltbook across desktop, phone, iPad Mini, and iPad: the route was missing, .blog-post was absent, expected article text was missing, and route/fatal evidence was captured.
Why it matters
This is a public proof-surface catch: Riddle Proof found a broken story route and draft-copy debt in the article explaining Riddle Proof/OpenClaw work, then proved the Preview and production fix with durable artifacts.
What changed
For public article and agent-surface audits: prove the rendered route, index/discovery link, raw markdown export, sitemap entry, placeholder absence, viewport layout, fatal-console health, and warning hygiene together.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt

Claim: A public blog article referenced by Riddle-owned discovery surfaces should exist as a rendered route and raw markdown export, and should not contain unfinished placeholder copy.

Bug: Riddle public blog and agent-facing surfaces referred to the OpenClaw/Moltbook article, but /blog/openclaw-moltbook returned 404 and the unpublished source still carried draft placeholder copy.

Why normal checks missed it: The surrounding blog index, footer, sitemap, and existing article pages were healthy. The issue only surfaced when the proof treated the specific article route, its raw markdown export, and finished-copy markers as one public contract across the viewport matrix.

Why this sells Riddle Proof: This is a public proof-surface catch: Riddle Proof found a broken story route and draft-copy debt in the article explaining Riddle Proof/OpenClaw work, then proved the Preview and production fix with durable artifacts.

Reusable profile seed: For public article and agent-surface audits: prove the rendered route, index/discovery link, raw markdown export, sitemap entry, placeholder absence, viewport layout, fatal-console health, and warning hygiene together.

What the browser run checked

  • Loaded /blog/openclaw-moltbook across desktop, phone, iPad Mini, and iPad.
  • Required the blog post shell and expected OpenClaw/Moltbook article text to be visible.
  • Required the article raw markdown export to return 200 with finished article body text.
  • Rejected visible placeholder copy such as TBD, Replace with actual data, and List real issues here after migration.
  • Re-ran the same route, markdown, placeholder, overflow, fatal-console, and warning contract on static Preview and final production.
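The placeholder and markdown-export checks above could look roughly like this. It is a sketch under stated assumptions: the function names are hypothetical, and only the three marker strings come from the receipt. Matching is case-insensitive and bounded so that a marker inside a longer word is not flagged.

```python
import re

# Marker strings quoted in the browser-run checklist above.
PLACEHOLDER_MARKERS = (
    "TBD",
    "Replace with actual data",
    "List real issues here after migration",
)


def find_placeholders(markdown_body, markers=PLACEHOLDER_MARKERS):
    """Return (line_number, marker) pairs for every placeholder found."""
    hits = []
    for line_no, line in enumerate(markdown_body.splitlines(), start=1):
        for marker in markers:
            # Lookarounds keep "TBD" from matching inside a longer word.
            pattern = rf"(?<!\w){re.escape(marker)}(?!\w)"
            if re.search(pattern, line, flags=re.IGNORECASE):
                hits.append((line_no, marker))
    return hits


def markdown_export_ok(status, body):
    """The export passes only with a 200, a non-empty body, and no placeholders."""
    return status == 200 and bool(body.strip()) and not find_placeholders(body)
```

Returning line numbers rather than a boolean keeps the failure reviewable: a receipt can point at the exact draft line that blocked publication.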

Proof lesson

Public proof stories need route, markdown, sitemap, and placeholder checks together. A referenced article is not real proof material until the rendered route and raw agent-facing markdown both load and carry finished copy.

Artifact | Type | What it proves
Failing OpenClaw Moltbook route screenshot | PNG screenshot | Shows the public route before the article existed while the same run checked for the expected blog post content.
Failing OpenClaw Moltbook receipt job_9e485320 | JSON metadata | Records the missing route, missing blog post selector, missing article text, and fatal route evidence across the viewport matrix.
Failing OpenClaw Moltbook console capture | JSON logs | Preserves the browser console/page evidence from the failing route proof.
Fixed static Preview screenshot | PNG screenshot | Shows the fixed article in mounted Preview after the heading mismatch caught by the first Preview proof was corrected.
Fixed static Preview receipt job_561a87ac | JSON metadata | Shows the deploy candidate passed route, rendered text, raw markdown, placeholder, overflow, fatal-console, and warning checks.
Fixed static Preview console capture | JSON logs | Shows the fixed Preview proof had no console warning or fatal-error noise.
Final production screenshot | PNG screenshot | Shows the published OpenClaw/Moltbook article live on production after Amplify deployed the fix.
Final production receipt job_a9b4b56e | JSON metadata | Shows final production passed all 14 article route and markdown checks across desktop, phone, iPad Mini, and iPad.
Final production console capture | JSON logs | Shows the final production proof had zero warning events and no fatal console or page errors.

Catch 48

Good Catch Diary preloaded noisy route assets

May 17, 2026 | < $0.01 | Riddle site | Good Catch Diary | warning hygiene
Plain-English catch card

Good Catch Diary preloaded noisy route assets

Public proof-storytelling pages should be quiet enough to inspect.

What went wrong
The human-facing Good Catch Diary rendered the right stories and artifact links, but the browser console still emitted unused CSS preload warnings from prefetched internal routes.
What Riddle caught
Initial production job job_3034ef9f proved /proof/good-catches/ had 39 cards, healthy screenshots, healthy evidence anchors, clean llms/docs checks, and 0 fatal errors, but failed no_console_warnings with four unused Next CSS preload warnings.
Why it matters
This is a public proof-surface catch: Riddle Proof found browser warning debt on the page that sells Riddle Proof catches, then proved the fix through Preview and production with durable artifacts.
What changed
For proof diaries, galleries, and marketing evidence pages: prove story count, newest story text, screenshot artifact health, evidence-link health, machine-readable discovery surfaces, layout, fatal-console health, and no_console_warnings in one reusable profile.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt

Claim: The public Good Catch Diary should be current, artifact-backed, and clean enough to use as proof material: route, content, artifacts, evidence links, layout, fatal-console health, and warning hygiene should all pass together.

Bug: The human-facing Good Catch Diary rendered the right stories and artifact links, but the browser console still emitted unused CSS preload warnings from prefetched internal routes.

Why normal checks missed it: The route loaded cleanly, all 39 catch cards rendered, the newest Profile Warnings story was visible, 39 screenshot artifacts and 39 evidence links were healthy, /llms.txt and Riddle Proof markdown were healthy, overflow stayed at 0px, and fatal console/page errors were clean. The failure lived in warning-level browser noise that only the no_console_warnings contract treats as product evidence.

Why this sells Riddle Proof: This is a public proof-surface catch: Riddle Proof found browser warning debt on the page that sells Riddle Proof catches, then proved the fix through Preview and production with durable artifacts.

Reusable profile seed: For proof diaries, galleries, and marketing evidence pages: prove story count, newest story text, screenshot artifact health, evidence-link health, machine-readable discovery surfaces, layout, fatal-console health, and no_console_warnings in one reusable profile.

What the browser run checked

  • Loaded /proof/good-catches/ across desktop, phone, iPad Mini, and iPad.
  • Asserted 39 catch cards rendered and the newest Profile Warnings story stayed visible.
  • Probed all 39 diary screenshot artifacts for same-origin, nonzero image responses.
  • Probed all 39 evidence-manifest anchors for healthy same-origin HTML responses.
  • Fetched /llms.txt and /docs/riddle-proof/markdown.md to keep agent discovery and Profile Warnings docs tied to the diary proof.
  • Required horizontal overflow, fatal console/page errors, and console warnings to stay at 0 after disabling route prefetch.
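The screenshot-artifact probe in the list above (same-origin, nonzero image responses) can be sketched as follows. The probe record shape (`url`, `status`, `content_type`, `byte_length`) and the function names are assumptions made for illustration; the receipt only states what the probe required.

```python
from urllib.parse import urljoin, urlsplit


def _origin(url):
    """Scheme, host, and port together define the origin being compared."""
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname, parts.port)


def unhealthy_image_probes(page_url, probes):
    """Return (absolute_url, reasons) for every probe that breaks the contract."""
    page_origin = _origin(page_url)
    unhealthy = []
    for probe in probes:
        # Resolve relative artifact paths against the page before comparing origins.
        absolute = urljoin(page_url, probe["url"])
        reasons = []
        if _origin(absolute) != page_origin:
            reasons.append("cross_origin")
        if probe["status"] != 200:
            reasons.append("bad_status")
        if not probe["content_type"].lower().startswith("image/"):
            reasons.append("not_image")
        if probe["byte_length"] <= 0:
            reasons.append("empty_body")
        if reasons:
            unhealthy.append((absolute, reasons))
    return unhealthy
```

A run over all 39 artifacts would pass when the returned list is empty; any entry names the artifact and the specific reason it failed.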

Proof lesson

Public proof-storytelling pages should be quiet enough to inspect. Warning hygiene catches performance and browser-noise debt on pages that otherwise look correct, and it keeps later proof runs from normalizing noisy evidence.

Artifact | Type | What it proves
Failing Good Catch Diary warning screenshot | PNG screenshot | Shows the production Good Catch Diary while the same run caught unused CSS preload warning noise.
Failing Good Catch Diary receipt job_3034ef9f | JSON metadata | Records that content, artifact, link, layout, and fatal-console checks passed while no_console_warnings failed.
Failing Good Catch Diary console capture | JSON logs | Shows the exact unused CSS preload warning samples that turned the visible page into a reproducible product contract.
Fixed static Preview receipt job_f6aebd25 | JSON metadata | Shows the deploy candidate passed all 14 diary checks after route prefetch was disabled.
Fixed static Preview console capture | JSON logs | Shows the fixed Preview had zero console warning events.
Fixed static Preview screenshot | PNG screenshot | Shows the mounted Preview Good Catch Diary that passed the warning-hygiene profile before production deploy.
Final production receipt job_f448859c | JSON metadata | Shows production passed all 14 diary checks after Amplify deployed the fix.
Final production console capture | JSON logs | Shows final production proof had zero warning events and no fatal console or page errors.
Final production screenshot | PNG screenshot | Shows the fixed public Good Catch Diary live in production.

Catch 49

Profile Warnings docs lagged behind the shipped surface

May 17, 2026 | < $0.01 | Riddle site | Riddle Proof | Profile Warnings
Plain-English catch card

Profile Warnings docs lagged behind the shipped surface

The proof product needs proof for its own proof-authoring contract.

What went wrong
Riddle Proof had shipped nonblocking profile warnings for ambiguous network mock response selectors, but the public Riddle Proof docs and agent-facing markdown did not mention Profile Warnings or the warnings result field.
What Riddle caught
Initial production job job_6d37e766 proved /docs/riddle-proof/ was otherwise healthy, while the rendered docs were missing Profile Warnings and /docs/riddle-proof/markdown.md was missing both Profile Warnings and the warnings result field.
Why it matters
This is a self-audit catch with durable artifacts: Riddle Proof caught docs drift for its own newly shipped warning surface and proved the human page, agent markdown, Preview, and production fix with the same profile.
What changed
For proof-product docs: prove rendered terms, raw markdown body terms, sidebar/navigation exposure, viewport layout, fatal-console health, and post-deploy production state together whenever a proof package adds a new reusable contract.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt

Claim: Riddle Proof docs should expose Profile Warnings in both rendered and machine-consumable docs when the package can emit nonblocking warnings for ambiguous proof profiles.

Bug: Riddle Proof had shipped nonblocking profile warnings for ambiguous network mock response selectors, but the public Riddle Proof docs and agent-facing markdown did not mention Profile Warnings or the warnings result field.

Why normal checks missed it: The docs page loaded cleanly, the existing Profile Mode section and network mock terms were visible, /docs/riddle-proof/markdown.md returned 200 text/markdown with nonzero bytes, overflow stayed at 0px, and fatal console/page errors were clean. The gap was semantic: the newly shipped warning surface was absent from both human and machine-consumable docs.

Why this sells Riddle Proof: This is a self-audit catch with durable artifacts: Riddle Proof caught docs drift for its own newly shipped warning surface and proved the human page, agent markdown, Preview, and production fix with the same profile.

Reusable profile seed: For proof-product docs: prove rendered terms, raw markdown body terms, sidebar/navigation exposure, viewport layout, fatal-console health, and post-deploy production state together whenever a proof package adds a new reusable contract.

What the browser run checked

  • Loaded /docs/riddle-proof/ across desktop, phone, iPad Mini, and iPad.
  • Asserted existing Riddle Proof and Profile Mode terms remained visible, including network_mocks and request_body_contains.
  • Required rendered docs to expose Profile Warnings.
  • Fetched /docs/riddle-proof/markdown.md and required Profile Warnings, request_body_contains, network_mocks, and the warnings result field in the raw markdown body.
  • Required horizontal overflow to stay within the 1px contract and fatal console/page errors to stay at 0.
  • Re-ran the same rendered and markdown contract on static Preview and production after adding the Profile Warnings section and guards.
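The raw-markdown half of this contract can be sketched as one check that combines response health with required terms. The function name and return shape are assumptions; the required terms and the 200 text/markdown expectation come from the receipt. The term match is deliberately case-sensitive so that the lowercase warnings field is not satisfied by the "Profile Warnings" heading alone.

```python
# Terms the receipt required in the raw markdown body.
REQUIRED_TERMS = ("Profile Warnings", "request_body_contains", "network_mocks", "warnings")


def check_markdown_export(status, content_type, body, required_terms=REQUIRED_TERMS):
    """Return (failed_check_names, missing_terms) for a markdown export response."""
    failures = []
    if status != 200:
        failures.append("status")
    # Ignore parameters such as "; charset=utf-8" when comparing the media type.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != "text/markdown":
        failures.append("content_type")
    if not body.strip():
        failures.append("empty_body")
    # Case-sensitive substring match: "Warnings" does not satisfy "warnings".
    missing = [term for term in required_terms if term not in body]
    if missing:
        failures.append("missing_terms")
    return failures, missing
```

This shape reproduces the catch: a response can be a healthy 200 text/markdown with nonzero bytes and still fail on `missing_terms`.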

Proof lesson

The proof product needs proof for its own proof-authoring contract. When a package adds evidence, warnings, or profile semantics, rendered docs and raw agent-facing markdown should be audited together so agents do not rediscover shipped behavior from changelogs or dogfood notes.

Artifact | Type | What it proves
Failing Profile Warnings docs screenshot | PNG screenshot | Shows the production Riddle Proof docs before the Profile Warnings section existed, while the same run checked rendered and markdown terms.
Failing Profile Warnings receipt job_6d37e766 | JSON metadata | Records that the page was otherwise healthy while rendered docs and raw markdown missed the newly shipped warning surface.
Failing Profile Warnings console capture | JSON logs | Shows the initial failure was not caused by fatal browser console or page errors.
Static Preview receipt job_cc796a28 | JSON metadata | Shows the deploy candidate passed the rendered docs and mounted markdown contract across desktop, phone, iPad Mini, and iPad.
Static Preview console capture | JSON logs | Shows the Preview proof had no fatal console or page errors.
Static Preview screenshot | PNG screenshot | Shows the Preview docs with Profile Warnings visible before production deploy.
Final production receipt job_ad1cc6ec | JSON metadata | Shows production passed all 9 checks after Amplify deployed the docs update.
Final production console capture | JSON logs | Shows final production proof had no fatal console or page errors.
Final production screenshot | PNG screenshot | Shows the fixed public Riddle Proof docs with Profile Warnings live in production.

Catch 50

Builder accepted a saved preview path as a fresh build

May 17, 2026 | < $0.05 | LilArcade | Builder | preview boundary
Plain-English catch card

Builder accepted a saved preview path as a fresh build

Preview URL safety is contextual.

What went wrong
LilArcade Builder accepted a same-host saved preview URL from /saved/riddle-proof-v454-sneaky-existing/index.html as if it were a fresh build artifact, leaving Open in new tab and Save to Arcade available instead of rejecting the stale saved-artifact path.
What Riddle caught
Initial production job job_643e881d caught Builder rendering Open in new tab and Save to Arcade for the same-host saved preview URL, with forbidden-saved-preview-path-v454 hit twice.
Why it matters
This is a real product-boundary catch: normal same-host URL checks were not enough, but browser proof caught the unsafe Builder preview path before it became a reusable save/open escape.
What changed
For generated-app builders: prove auth setup, build response URL classification, forbidden artifact-path hit counts, visible failure copy, and a full recovery save/play path in one profile.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt

Claim: Builder build previews must accept only fresh temporary build preview URLs; same-host saved artifact paths must be rejected before Open in new tab or Save to Arcade can act on them.

Bug: LilArcade Builder accepted a same-host saved preview URL from /saved/riddle-proof-v454-sneaky-existing/index.html as if it were a fresh build artifact, leaving Open in new tab and Save to Arcade available instead of rejecting the stale saved-artifact path.

Why normal checks missed it: The unsafe URL used the trusted preview bucket host, auth and chat both worked, and the page stayed visually stable. The defect only surfaced when proof treated Builder preview context as stricter than saved-player context and required the forbidden saved-preview-path mock to stay at zero hits.

Why this sells Riddle Proof: This is a real product-boundary catch: normal same-host URL checks were not enough, but browser proof caught the unsafe Builder preview path before it became a reusable save/open escape.

Reusable profile seed: For generated-app builders: prove auth setup, build response URL classification, forbidden artifact-path hit counts, visible failure copy, and a full recovery save/play path in one profile.

What the browser run checked

  • Logged into Builder through mocked auth across desktop, phone, iPad Mini, and iPad.
  • Returned a build response whose preview URL pointed at a trusted-host /saved/ artifact path.
  • Required Builder to show Build failed: Invalid preview URL instead of rendering Open in new tab or Save to Arcade for that path.
  • Tracked forbidden-saved-preview-path-v454 and required 0 network hits.
  • Then recovered with a fresh temporary build preview, saved the game, opened the saved player, and asserted the iframe route/content/overflow contract.
  • Re-ran the same contract on static Preview and production after the Builder-only preview URL gate shipped.

Proof lesson

Preview URL safety is contextual. A URL sanitizer that is correct for saved player pages can be too permissive for fresh Builder builds, so proof profiles should encode allowed artifact prefixes, forbidden network hits, and recovery behavior together.
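A context-aware preview gate along these lines might look like the sketch below. The host name, the `/builds/tmp/` prefix, and the context names are hypothetical placeholders, since the receipt does not state the real values; only the `/saved/` path comes from the catch. The path is decoded and normalized before the prefix test so that `..` segments cannot smuggle a saved path past the Builder gate.

```python
import posixpath
from urllib.parse import unquote, urlsplit

# Hypothetical values for illustration; the real host and prefixes are not in the receipt.
TRUSTED_PREVIEW_HOST = "previews.example.test"
ALLOWED_PREFIXES = {
    "builder": ("/builds/tmp/",),   # fresh temporary build previews only
    "saved_player": ("/saved/",),   # saved artifacts are fine in the saved player
}


def is_allowed_preview_url(url, context):
    """Accept a preview URL only if its host and path prefix fit the given context."""
    parts = urlsplit(url)
    if parts.scheme != "https" or parts.hostname != TRUSTED_PREVIEW_HOST:
        return False
    # Decode and normalize first so "../" or "%2e%2e" cannot escape the prefix.
    path = posixpath.normpath(unquote(parts.path))
    return path.startswith(ALLOWED_PREFIXES[context])
```

The same trusted-host URL yields different answers depending on context, which is the point of the catch: a same-host check alone would have accepted the saved path in Builder.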

Artifact | Type | What it proves
Failing saved-preview-path screenshot | PNG screenshot | Shows production Builder accepted the same-host saved preview URL and exposed Open in new tab plus Save to Arcade.
Failing saved-preview-path receipt job_643e881d | JSON metadata | Records the product regression: the forbidden saved-preview-path mock was hit twice and the recovery save/play checks could not complete.
Failing saved-preview-path console capture | JSON logs | Shows the catch was not caused by fatal browser console or page errors.
Fixed static Preview rejection screenshot | PNG screenshot | Shows the deploy candidate rejected the same-host saved preview URL with Build failed: Invalid preview URL.
Fixed static Preview receipt job_5e778318 | JSON metadata | Shows the fixed Preview passed the saved-path rejection and full recovery save/play contract across desktop, phone, iPad Mini, and iPad.
Fixed static Preview console capture | JSON logs | Shows the deploy candidate had no fatal console or page errors during the recovery proof.
Final production saved-player screenshot | PNG screenshot | Shows final production still loaded the saved recovery player after rejecting the unsafe build preview path.
Final production receipt job_bc56fa3c | JSON metadata | Shows production passed all 16 checks with forbidden-saved-preview-path-v454 at 0 hits after deploy.
Final production console capture | JSON logs | Shows final production proof had no fatal console or page errors.

Catch 51

Evidence Manifest preloaded noisy unused assets

May 16, 2026 | < $0.01 | Riddle site | Good Catch Diary | warning hygiene
Plain-English catch card

Evidence Manifest preloaded noisy unused assets

Warning hygiene deserves its own contract.

What went wrong
The public Good Catch Evidence Manifest looked healthy, but browser console evidence showed unused preload warnings from automatic route prefetch and eager below-the-fold proof screenshots.
What Riddle caught
Initial production job job_07d46452 used the new no_console_warnings contract and caught 9 unused Next CSS preload warnings while the page otherwise passed.
Why it matters
This is a proof-system dogfood catch: a newly promoted base Riddle Proof warning contract immediately found real warning/performance debt on Riddle public proof material.
What changed
For proof galleries, evidence manifests, and long artifact pages: prove route/content/layout/fatal-console health together with no_console_warnings so warning debt cannot accumulate under otherwise green browser proof.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt

Claim: The public Good Catch Evidence Manifest should be clean enough to serve as proof material: route, content, layout, fatal-console health, and warning hygiene should all pass together.

Bug: The public Good Catch Evidence Manifest looked healthy, but browser console evidence showed unused preload warnings from automatic route prefetch and eager below-the-fold proof screenshots.

Why normal checks missed it: The route loaded, all 36 manifest cards rendered, the Profile Mode docs catch and job IDs were visible, overflow stayed at 0px, and fatal console/page errors were clean. The issue lived in warning-level browser noise that the previous fatal-console contract intentionally ignored.

Why this sells Riddle Proof: This is a proof-system dogfood catch: a newly promoted base Riddle Proof warning contract immediately found real warning/performance debt on Riddle public proof material.

Reusable profile seed: For proof galleries, evidence manifests, and long artifact pages: prove route/content/layout/fatal-console health together with no_console_warnings so warning debt cannot accumulate under otherwise green browser proof.

What the browser run checked

  • Loaded /proof/good-catches/evidence/ across desktop, phone, iPad Mini, and iPad.
  • Asserted the manifest rendered at least 36 cards and still showed the Profile Mode docs catch plus v441 job IDs.
  • Required horizontal overflow to stay within the 1px contract.
  • Required fatal console/page errors to stay at 0.
  • Required the new no_console_warnings Profile Mode check to report 0 unallowed warnings.
  • Re-ran the same contract on static Preview and production after disabling route prefetch and lazy-loading evidence screenshots.
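The no_console_warnings check above, which reports unallowed warnings, can be approximated with a small filter. This is a sketch, not the shipped contract: the event shape (`level`, `text`), the function name, and the allowlist-by-regex mechanism are assumptions.

```python
import re


def unallowed_warnings(console_events, allow_patterns=()):
    """Return warning-level events whose text matches none of the allow patterns."""
    allowed = [re.compile(pattern) for pattern in allow_patterns]
    return [
        event
        for event in console_events
        # Errors are the fatal-console contract's job; only warnings count here.
        if event.get("level") == "warning"
        and not any(p.search(event.get("text", "")) for p in allowed)
    ]
```

A mature page would require the returned list to be empty with an empty allowlist. Pages that cannot reach zero yet could pass explicit patterns, which keeps every tolerated warning visible in the profile rather than silently ignored.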

Proof lesson

Warning hygiene deserves its own contract. Nonfatal browser warnings can hide performance debt and make later proof runs noisy, so mature public evidence pages should be able to require zero unallowed warnings.

Artifact | Type | What it proves
Failing warning-hygiene screenshot | PNG screenshot | Shows the production Evidence Manifest looked healthy while the same run caught warning-level preload noise.
Failing warning-hygiene receipt job_07d46452 | JSON metadata | Records that route, content, overflow, and fatal-console checks passed while no_console_warnings failed on 9 unused preload warnings.
Failing warning-hygiene console capture | JSON logs | Shows the exact unused preload warning samples that turned the noisy page into a reproducible product contract.
Fixed static Preview receipt job_1a70de27 | JSON metadata | Shows the deploy candidate passed warning hygiene across desktop, phone, iPad Mini, and iPad before production deploy.
Fixed static Preview console capture | JSON logs | Shows the fixed Preview had zero console warning events.
Fixed static Preview screenshot | PNG screenshot | Shows the deploy candidate Evidence Manifest after route prefetch and image loading changes.
Final production receipt job_4339e21c | JSON metadata | Shows production passed all 9 checks with zero warnings after Amplify deploy.
Final production console capture | JSON logs | Shows final production warning hygiene stayed clean.
Final production screenshot | PNG screenshot | Shows the final production Evidence Manifest with the warning-hygiene fix live.

Catch 52

Profile Mode docs lagged behind proof primitives

May 16, 2026 | < $0.01 | Riddle site | Riddle Proof | Profile Mode
Plain-English catch card

Profile Mode docs lagged behind proof primitives

The proof surface itself needs proof.

What went wrong
The public Riddle Proof docs explained profile text semantics, but they did not document the Profile Mode primitives that recent real proof runs were using: network mocks, repeated and delayed responses, request-body receipts, setup actions, and iframe checks.
What Riddle caught
Initial production job job_bb0aa65a proved /docs/riddle-proof/ was healthy while both rendered docs and /docs/riddle-proof/markdown.md missed Profile Mode, network_mocks, repeat_responses, delay_ms, request_body_contains, setup_actions, frame_text_visible, and frame_url_equals.
Why it matters
This is a self-audit catch: Riddle Proof found that Riddle Proof docs had fallen behind the exact reusable profile primitives being used for real audits.
What changed
For proof-product docs: prove rendered docs, raw markdown body terms, viewport layout, fatal-console health, and public proof-story promotion together so the human and agent surfaces stay aligned.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt

Claim: Riddle Proof docs should expose the Profile Mode contract in both rendered and machine-consumable docs so agents can reuse audit/proof profiles without rediscovering the primitives from dogfood notes.

Bug: The public Riddle Proof docs explained profile text semantics, but they did not document the Profile Mode primitives that recent real proof runs were using: network mocks, repeated and delayed responses, request-body receipts, setup actions, and iframe checks.

Why normal checks missed it: The docs page loaded cleanly, the existing Profile Text Semantics section was visible, the raw markdown route returned 200, overflow stayed at 0px, and fatal console evidence was clean. The drift was semantic: the rendered and machine-consumable docs had not caught up with the reusable proof contract.

Why this sells Riddle Proof: This is a self-audit catch: Riddle Proof found that Riddle Proof docs had fallen behind the exact reusable profile primitives being used for real audits.

Reusable profile seed: For proof-product docs: prove rendered docs, raw markdown body terms, viewport layout, fatal-console health, and public proof-story promotion together so the human and agent surfaces stay aligned.

What the browser run checked

  • Loaded /docs/riddle-proof/ across desktop, phone, iPad Mini, and iPad.
  • Confirmed the existing Profile Text Semantics section remained visible; in the initial run, Profile Mode and the newer profile primitives were missing from the rendered docs.
  • Fetched /docs/riddle-proof/markdown.md and asserted the raw markdown carried Profile Mode, network_mocks, repeat_responses, delay_ms, request_body_contains, setup_actions, frame_text_visible, and frame_url_equals.
  • Re-ran the same rendered and markdown contract on static Preview and production after adding the Profile Mode section.
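Because this contract spans two surfaces, one way to express it is a drift report that records, per term, which surface is missing it. The function name and the surface labels are assumptions; the term list is taken from the receipt.

```python
# Profile Mode terms the receipt required on both surfaces.
PROFILE_MODE_TERMS = (
    "Profile Mode",
    "network_mocks",
    "repeat_responses",
    "delay_ms",
    "request_body_contains",
    "setup_actions",
    "frame_text_visible",
    "frame_url_equals",
)


def docs_drift(rendered_text, markdown_body, terms=PROFILE_MODE_TERMS):
    """Map each missing term to the surfaces ("rendered", "markdown") that lack it."""
    drift = {}
    for term in terms:
        missing_from = []
        if term not in rendered_text:
            missing_from.append("rendered")
        if term not in markdown_body:
            missing_from.append("markdown")
        if missing_from:
            drift[term] = missing_from
    return drift
```

An empty report means the human and agent surfaces agree on the contract. A non-empty one shows whether the gap is in the rendered page, the raw markdown, or both.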

Proof lesson

The proof surface itself needs proof. When a package adds or relies on reusable audit primitives, public rendered docs and raw agent-facing markdown should be tested as product contracts.

Artifact | Type | What it proves
Failing Profile Mode docs screenshot | PNG screenshot | Shows the production Riddle Proof docs before the Profile Mode section existed, while the same run checked rendered and markdown terms.
Failing Profile Mode receipt job_bb0aa65a | JSON metadata | Records that the page was otherwise healthy while rendered docs and raw markdown missed the Profile Mode contract.
Failing Profile Mode console capture | JSON logs | Shows the initial failure was not browser runtime noise; the public docs contract itself was incomplete.
Static Preview receipt job_88ad03aa | JSON metadata | Shows the updated Profile Mode docs and markdown contract passed on static Preview before production deploy.
Static Preview screenshot | PNG screenshot | Shows the deploy candidate docs with the Profile Mode section present.
Final production receipt job_22ee6a7c | JSON metadata | Shows the fixed production docs passed all 14 checks across desktop, phone, iPad Mini, and iPad.
Final production console capture | JSON logs | Shows the final production proof had no fatal console or page errors.
Final production screenshot | PNG screenshot | Shows the fixed public Riddle Proof docs with Profile Mode live in production.

Catch 53

llms.txt hid the raw proof bundle

May 16, 2026 | < $0.01 | Riddle site | llms.txt | proof receipts
Plain-English catch card

llms.txt hid the raw proof bundle

Agent indexes should point to raw receipts, not only review pages.

What went wrong
The public llms.txt agent index linked to the human proof example page, but it did not link directly to the raw machine-consumable proof bundle that agents should ingest.
What Riddle caught
Production job job_a5d4383b proved /llms.txt, /examples/riddle-proof/, and /examples/riddle-proof/docs-live-proof-bundle.json were healthy while llms.txt omitted the raw bundle URL.
Why it matters
This is an agent-discovery catch: the machine-consumable proof receipt was public and healthy, but the compact agent entrypoint hid it.
What changed
For llms.txt and agent indexes: prove the file route, raw linked bodies, neighboring human review page, viewport layout, fatal-console health, and direct links to machine-consumable artifacts together.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt
Riddle llms.txt should link agents directly to the raw proof example bundle, not only to the human proof example page.

Claim: Riddle llms.txt should link agents directly to the raw proof example bundle, not only to the human proof example page.

Bug: The public llms.txt agent index linked to the human proof example page, but it did not link directly to the raw machine-consumable proof bundle that agents should ingest.

Why normal checks missed it: The file itself returned 200 text/plain, the human proof example page returned 200, the raw JSON bundle returned 200 application/json, overflow stayed at 0px, and fatal console evidence was clean. The missing contract was discovery: agents had to infer the raw proof receipt from the human page.

Why this sells Riddle Proof: This is an agent-discovery catch: the machine-consumable proof receipt was public and healthy, but the compact agent entrypoint hid it.

Reusable profile seed: For llms.txt and agent indexes: prove the file route, raw linked bodies, neighboring human review page, viewport layout, fatal-console health, and direct links to machine-consumable artifacts together.

What the browser run checked

  • Loaded /llms.txt across desktop, phone, iPad Mini, and iPad.
  • Fetched /llms.txt and asserted it contained proof receipts, the Evidence Manifest, the proof example page, and the raw proof bundle URL.
  • Fetched /examples/riddle-proof/ and asserted the human review page still showed "Proof Bundle Example", "Raw bundle", and "Bring your agent; Riddle brings the proof".
  • Fetched /examples/riddle-proof/docs-live-proof-bundle.json and asserted the raw JSON carried "riddle-proof.example-bundle.v1", "proof receipts", "agent-proof", "publicArtifactUrls", and the product promise.
  • Re-ran the same contract on static Preview and production after adding the raw bundle link and static llms guard.
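The discovery part of this contract can be sketched as a small link check. This is a minimal illustration, not Riddle's actual guard: it assumes llms.txt uses markdown-style links, and the sample index bodies and the `missing_links` helper are hypothetical. Only the two paths come from the catch above.

```python
import re

# Matches the target of a markdown link such as [label](/path).
LINK_TARGET = re.compile(r"\]\(([^)\s]+)\)")

def missing_links(llms_txt: str, required_targets: list[str]) -> list[str]:
    """Return required link targets that the agent index never links to."""
    linked = set(LINK_TARGET.findall(llms_txt))
    return [target for target in required_targets if target not in linked]

REQUIRED = [
    "/examples/riddle-proof/",
    "/examples/riddle-proof/docs-live-proof-bundle.json",
]

# Hypothetical index bodies: the first links only the human review page.
before_fix = "# Riddle\n\n- [Proof example](/examples/riddle-proof/)\n"
after_fix = (
    before_fix
    + "- [Raw proof bundle JSON](/examples/riddle-proof/docs-live-proof-bundle.json)\n"
)

print(missing_links(before_fix, REQUIRED))
print(missing_links(after_fix, REQUIRED))
```

Link targets are compared exactly rather than by substring, because the human page path is a prefix of the raw bundle path and a substring test would let one link satisfy both requirements.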

Proof lesson

Agent indexes should point to raw receipts, not only review pages. If a product publishes machine-consumable proof artifacts, the compact discovery surface needs to expose them directly.

Artifact | Type | What it proves
Failing llms.txt screenshot | PNG screenshot | Shows the production llms.txt page while the same proof run checked that the raw proof bundle URL was missing from the agent index.
Failing llms.txt receipt job_a5d4383b | JSON metadata | Records that llms.txt, the proof example page, and the raw proof bundle were healthy while the raw bundle link was absent from the agent index.
Failing llms.txt console capture | JSON logs | Shows the initial failure was not browser runtime noise; the discovery contract itself was incomplete.
Static Preview receipt job_df22fbc2 | JSON metadata | Shows the updated llms.txt and raw bundle contract passed on static Preview before production deploy.
Static Preview screenshot | PNG screenshot | Shows the deploy candidate llms.txt with the Raw proof bundle JSON link present.
Final production receipt job_ceafae1b | JSON metadata | Shows the fixed production llms.txt passed the proof-bundle discovery contract across desktop, phone, iPad Mini, and iPad.
Final production console capture | JSON logs | Shows the final production proof had no fatal console or page errors.
Final production screenshot | PNG screenshot | Shows production llms.txt after the raw proof bundle link shipped.

Catch 54

Proof example bundle drifted behind the agent-proof contract

May 16, 2026 | < $0.01 | Riddle site | proof receipts | agent-proof
Plain-English catch card

Proof example bundle drifted behind the agent-proof contract

Proof examples are product surfaces too.

What went wrong
The public proof example page rendered cleanly and linked to healthy artifacts, but the raw JSON bundle that agents consume was stale and did not carry proof receipts, the "Bring your agent; Riddle brings the proof" promise, or the agent-proof contract.
What Riddle caught
Initial production job job_30609bc5 proved the page and seven artifact links were healthy while /examples/riddle-proof/docs-live-proof-bundle.json was missing "proof receipts", "Bring your agent; Riddle brings the proof", and "agent-proof".
Why it matters
This is a machine-consumable proof-surface catch: the public page looked healthy, but Riddle Proof found the raw agent-facing bundle had drifted behind the product contract.
What changed
For public proof examples: prove rendered status, artifact link quality, raw JSON body terms, viewport layout, and fatal-console health together so human and agent surfaces cannot drift independently.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt
Riddle public proof examples should keep the rendered review page, artifact links, and raw agent-facing proof bundle aligned with the current proof-loop contract.

Claim: Riddle public proof examples should keep the rendered review page, artifact links, and raw agent-facing proof bundle aligned with the current proof-loop contract.

Bug: The public proof example page rendered cleanly and linked to healthy artifacts, but the raw JSON bundle that agents consume was stale and did not carry proof receipts, the "Bring your agent; Riddle brings the proof" promise, or the agent-proof contract.

Why normal checks missed it: The human page looked trustworthy: the route loaded, the proof example status was passed, all seven artifact links were healthy, overflow was 0px, and fatal console evidence was clean. The drift lived in the machine-consumable proof contract behind the page.

Why this sells Riddle Proof: This is a machine-consumable proof-surface catch: the public page looked healthy, but Riddle Proof found the raw agent-facing bundle had drifted behind the product contract.

Reusable profile seed: For public proof examples: prove rendered status, artifact link quality, raw JSON body terms, viewport layout, and fatal-console health together so human and agent surfaces cannot drift independently.

What the browser run checked

  • Loaded /examples/riddle-proof/ across desktop, phone, iPad Mini, and iPad.
  • Asserted the proof example page rendered "Proof Bundle Example", "passed", "live-url", and "Raw bundle".
  • Checked that all seven public artifact links returned healthy, nonzero image or JSON responses.
  • Fetched /examples/riddle-proof/docs-live-proof-bundle.json and asserted the raw JSON carried "proof receipts", "Bring your agent; Riddle brings the proof", and "agent-proof".
  • Re-ran the same contract on static Preview and production after refreshing the raw bundle and rendered Proof Contract section.
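The raw-bundle step above could look roughly like the following body-term check. It is a sketch under stated assumptions: the field names in the sample bundles (`schema`, `status`, `contract`, `promise`, `summary`) are hypothetical, since the real bundle layout is not shown here; only the required terms and the schema id come from the manifest.

```python
import json

REQUIRED_TERMS = [
    "proof receipts",
    "Bring your agent; Riddle brings the proof",
    "agent-proof",
]

def missing_bundle_terms(raw_body: str, terms: list[str] = REQUIRED_TERMS) -> list[str]:
    """Return contract terms absent from a raw JSON proof bundle body."""
    json.loads(raw_body)  # raises ValueError when the body is not valid JSON
    return [term for term in terms if term not in raw_body]

# Hypothetical bundles; the field names are assumptions for illustration.
stale = json.dumps({"schema": "riddle-proof.example-bundle.v1", "status": "passed"})
fresh = json.dumps({
    "schema": "riddle-proof.example-bundle.v1",
    "status": "passed",
    "contract": "agent-proof",
    "promise": "Bring your agent; Riddle brings the proof.",
    "summary": "Public proof receipts for the docs live check.",
})

print(missing_bundle_terms(stale))
print(missing_bundle_terms(fresh))
```

Checking terms against the raw body keeps the sketch independent of the bundle's schema; a stricter profile would assert on specific fields once the schema is pinned down.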

Proof lesson

Proof examples are product surfaces too. If agents are expected to consume a raw proof bundle, the proof should validate the raw JSON contract, not only the rendered page and artifact links.

Artifact | Type | What it proves
Failing proof example screenshot | PNG screenshot | Shows the public proof example page looked healthy while the same proof run failed the raw bundle contract.
Failing proof example receipt job_30609bc5 | JSON metadata | Records the stale raw JSON bundle: page route, artifact links, overflow, and console health passed while proof receipts, agent-proof, and the product promise were absent.
Failing proof example console capture | JSON logs | Shows the initial failure was not browser runtime noise; the public proof bundle contract itself was stale.
Static Preview receipt job_d91c3c67 | JSON metadata | Shows the refreshed proof example contract passed on static Preview before production deploy.
Static Preview screenshot | PNG screenshot | Shows the rendered Proof Contract section on the deploy candidate.
Final production receipt job_002d95c1 | JSON metadata | Shows the fixed production page and raw bundle passed all 11 checks across desktop, phone, iPad Mini, and iPad.
Final production console capture | JSON logs | Shows the final production proof had no fatal console or page errors.
Final production screenshot | PNG screenshot | Shows the fixed public proof example with the current proof-loop contract visible to reviewers.

Catch 55

Agent Guide omitted the proof loop

May 16, 2026 | < $0.01 | Riddle site | Agent Guide | Riddle Proof
Plain-English catch card

Agent Guide omitted the proof loop

Agent-facing docs should connect low-level browser control to the reusable proof loop.

What went wrong
The public Agent Guide explained raw browser screenshot and /v1/run mechanics, but it did not connect agents to the reusable Riddle Proof loop, proof receipts, or the "Bring your agent; Riddle brings the proof." contract.
What Riddle caught
Production job job_77aceb4b proved the rendered Agent Guide and /ai-agents/guide/markdown.md were both missing "Riddle Proof", "proof receipts", and "Bring your agent; Riddle brings the proof".
Why it matters
This is an agent-surface catch on Riddle itself: the docs were healthy as browser API docs, but proof found the missing bridge to the productized evidence loop agents should reuse.
What changed
For agent-facing docs: prove rendered copy, raw markdown copy, neighboring proof docs, responsive layout, fatal-console health, and public proof-story promotion together.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt
The Agent Guide should point agents from raw browser API mechanics to the reusable Riddle Proof loop, and the public evidence surface should show the catch with durable artifacts.

Claim: The Agent Guide should point agents from raw browser API mechanics to the reusable Riddle Proof loop, and the public evidence surface should show the catch with durable artifacts.

Bug: The public Agent Guide explained raw browser screenshot and /v1/run mechanics, but it did not connect agents to the reusable Riddle Proof loop, proof receipts, or the "Bring your agent; Riddle brings the proof." contract.

Why normal checks missed it: The route loaded cleanly, the existing guide sections rendered, the raw markdown export returned 200, overflow stayed at 0px, and the neighboring Riddle Proof docs were healthy. The missing product contract was the bridge from browser primitives to durable proof workflow.

Why this sells Riddle Proof: This is an agent-surface catch on Riddle itself: the docs were healthy as browser API docs, but proof found the missing bridge to the productized evidence loop agents should reuse.

Reusable profile seed: For agent-facing docs: prove rendered copy, raw markdown copy, neighboring proof docs, responsive layout, fatal-console health, and public proof-story promotion together.

What the browser run checked

  • Loaded /ai-agents/guide/ across desktop, phone, iPad Mini, and iPad.
  • Asserted existing guide sections remained visible while "Riddle Proof", "proof receipts", and "Bring your agent; Riddle brings the proof" were missing.
  • Fetched /ai-agents/guide/markdown.md and /docs/riddle-proof/markdown.md with HTTP body/content-type checks.
  • Re-ran the contract on static Preview and production after adding the Proof Loop section and fixing invalid nested paragraph markup.
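One way to picture the rendered-versus-markdown part of this run is a per-surface gap report. The sketch below is illustrative only: the `contract_gaps` helper and the sample bodies are hypothetical, while the two routes and the three required terms are taken from the catch above.

```python
PROOF_LOOP_TERMS = [
    "Riddle Proof",
    "proof receipts",
    "Bring your agent; Riddle brings the proof",
]

def contract_gaps(surfaces: dict[str, str], terms: list[str]) -> dict[str, list[str]]:
    """Map each surface name to the contract terms its body is missing."""
    gaps = {}
    for name, body in surfaces.items():
        missing = [term for term in terms if term not in body]
        if missing:
            gaps[name] = missing
    return gaps

# Hypothetical bodies: the rendered page was updated, the markdown export was not.
surfaces = {
    "/ai-agents/guide/": (
        "Agent Guide. Use /v1/run for screenshots. Riddle Proof turns runs into "
        "proof receipts. Bring your agent; Riddle brings the proof."
    ),
    "/ai-agents/guide/markdown.md": "# Agent Guide\n\nUse /v1/run for screenshots.",
}

print(contract_gaps(surfaces, PROOF_LOOP_TERMS))
```

Reporting gaps per surface makes it visible when the rendered page and the raw markdown export drift apart, which a single combined pass/fail would hide.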

Proof lesson

Agent-facing docs should connect low-level browser control to the reusable proof loop. Otherwise every wrapper has to rediscover the same pattern instead of sharing one inspectable evidence contract.

Artifact | Type | What it proves
Failing Agent Guide screenshot | PNG screenshot | Shows the live Agent Guide before the fix while the same proof run checked rendered and markdown proof-loop copy.
Failing Agent Guide receipt job_77aceb4b | JSON metadata | Records the missing Riddle Proof, proof receipts, and Bring your agent copy across rendered and markdown surfaces.
Failing Agent Guide console capture | JSON logs | Shows the initial failure was not browser runtime noise; the public docs contract itself was incomplete.
Static Preview React receipt job_79a6afb6 | JSON metadata | Shows Riddle caught five fatal React #418 page errors from invalid nested paragraph markup before production.
Final static Preview receipt job_e8b53136 | JSON metadata | Shows the corrected static Preview passed after flattening the MDX paragraphs.
Final production receipt job_5d94bf48 | JSON metadata | Shows the fixed Agent Guide and markdown export passed the proof-loop contract in production.
Final production console capture | JSON logs | Shows the final production proof had no fatal console or page errors.
Final Agent Guide screenshot | PNG screenshot | Shows the fixed public Agent Guide with the Proof Loop section live in production.

Catch 56

Riddle had no llms.txt agent index

May 16, 2026 | < $0.01 | Riddle site | llms.txt | agent discovery
Plain-English catch card

Riddle had no llms.txt agent index

Agent-facing product surfaces need an index, not just scattered docs.

What went wrong
Riddle had markdown docs, Proof docs, Preview docs, MCP docs, OpenAPI YAML, and robots surfaces, but no public llms.txt entrypoint for agents to discover the set quickly.
What Riddle caught
Initial production job job_8fc84c72 proved /llms.txt returned 404 on desktop, phone, iPad Mini, and iPad while docs markdown, Riddle Proof markdown, Preview markdown, MCP markdown, OpenAPI YAML, and robots all stayed healthy.
Why it matters
This is an agent-discovery catch: the public docs were individually healthy, but browser proof found that agents had no compact starting point for the product surface.
What changed
For agent-facing products: prove llms.txt, raw markdown docs, OpenAPI specs, sitemap, robots, and public proof examples together so machine-consumable surfaces cannot drift independently.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt
Riddle should expose a compact llms.txt agent index that points agents to docs, Proof, Preview, MCP, OpenAPI, sitemap, robots, and public proof evidence.

Claim: Riddle should expose a compact llms.txt agent index that points agents to docs, Proof, Preview, MCP, OpenAPI, sitemap, robots, and public proof evidence.

Bug: Riddle had markdown docs, Proof docs, Preview docs, MCP docs, OpenAPI YAML, and robots surfaces, but no public llms.txt entrypoint for agents to discover the set quickly.

Why normal checks missed it: The neighboring machine-readable surfaces were healthy and the rendered site worked. The missing contract was the compact agent index itself, which only surfaced when the proof treated agent-readable docs and discovery links as first-class product behavior.

Why this sells Riddle Proof: This is an agent-discovery catch: the public docs were individually healthy, but browser proof found that agents had no compact starting point for the product surface.

Reusable profile seed: For agent-facing products: prove llms.txt, raw markdown docs, OpenAPI specs, sitemap, robots, and public proof examples together so machine-consumable surfaces cannot drift independently.

What the browser run checked

  • Loaded /llms.txt across desktop, phone, iPad Mini, and iPad.
  • Fetched the adjacent markdown docs, Proof docs, Preview docs, MCP docs, OpenAPI YAML, and robots policy with status, content-type, byte, and body checks.
  • Confirmed the neighboring machine-readable surfaces were healthy while /llms.txt itself returned a 404 page.
  • Re-ran the contract after adding public/llms.txt and a static guard that preserves required agent-index links.
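The status, content-type, byte, and body checks listed above can be sketched as one function over an already-fetched response. This is an assumption-laden illustration: the `Fetched` record and `surface_failures` helper are hypothetical names, and the sample responses are invented; no network call is made.

```python
from dataclasses import dataclass

@dataclass
class Fetched:
    """Hypothetical record of one fetched surface (not a Riddle type)."""
    status: int
    content_type: str
    body: bytes

def surface_failures(resp: Fetched, expected_type: str, fragments: list[str]) -> list[str]:
    """Return human-readable failures for one machine-readable surface."""
    failures = []
    if resp.status != 200:
        failures.append(f"status {resp.status}")
    if not resp.content_type.startswith(expected_type):
        failures.append(f"content-type {resp.content_type}")
    if len(resp.body) == 0:
        failures.append("empty body")
    text = resp.body.decode("utf-8", errors="replace")
    failures += [f"missing {frag!r}" for frag in fragments if frag not in text]
    return failures

# Hypothetical responses: a 404 page before the fix, a plain-text index after.
not_found = Fetched(404, "text/html", b"<h1>Not found</h1>")
index = Fetched(200, "text/plain; charset=utf-8", b"# Riddle\n- /docs/riddle-proof/markdown.md\n")
fragments = ["/docs/riddle-proof/markdown.md"]

print(surface_failures(not_found, "text/plain", fragments))
print(surface_failures(index, "text/plain", fragments))
```

Collecting every failure instead of stopping at the first one mirrors the receipt described in this catch, which records the 404 and the missing body fragments together.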

Proof lesson

Agent-facing product surfaces need an index, not just scattered docs. A site can expose the right pieces individually while still making agents guess where to start.

Artifact | Type | What it proves
Failing llms.txt screenshot | PNG screenshot | Shows the production 404 page for /llms.txt while the same proof run checked the adjacent agent-facing docs surfaces.
Failing llms.txt receipt job_8fc84c72 | JSON metadata | Records the /llms.txt 404, missing body fragments, healthy neighboring markdown/OpenAPI/robots checks, viewport matrix, overflow, and console evidence.
Failing llms.txt console capture | JSON logs | Shows the catch was not browser runtime noise; the public agent index contract itself was absent.
Final llms.txt receipt job_b0dc37de | JSON metadata | Shows the final production contract passed after deploy, including 200 text/plain, required agent-readable docs links, 0px overflow, and clean fatal-console evidence.

Catch 57

Sitemap hid public Riddle routes from crawlers

May 16, 2026 | < $0.01 | Riddle site | sitemap | agent discovery
Plain-English catch card

Sitemap hid public Riddle routes from crawlers

Agent-facing contracts include sitemap and discovery surfaces.

What went wrong
Riddle public pages rendered correctly, but sitemap.xml omitted docs, MCP, blog, guide, proof, and Good Catch routes that crawlers and agents rely on for discovery.
What Riddle caught
Production jobs job_f39d58a4 and job_06fb59fc proved sitemap.xml was missing docs/MCP and public-content routes while the target pages themselves loaded cleanly.
Why it matters
This is a machine-consumable discovery catch: the product looked fine to humans, but browser proof found that crawler and agent discovery was stale.
What changed
For public docs and marketing sites: assert rendered page health, sitemap route coverage, robots policy, and raw agent artifacts together so human and machine surfaces cannot drift apart.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt
Riddle public docs, blog, guide, proof, and Good Catch routes should be discoverable through sitemap.xml, not only reachable by direct URL.

Claim: Riddle public docs, blog, guide, proof, and Good Catch routes should be discoverable through sitemap.xml, not only reachable by direct URL.

Bug: Riddle public pages rendered correctly, but sitemap.xml omitted docs, MCP, blog, guide, proof, and Good Catch routes that crawlers and agents rely on for discovery.

Why normal checks missed it: The visible pages were healthy: docs and blog routes loaded, route text was present, responsive overflow stayed at 0px, and console/page evidence was clean. The defect lived in a machine-consumable discovery file, not in the rendered page.

Why this sells Riddle Proof: This is a machine-consumable discovery catch: the product looked fine to humans, but browser proof found that crawler and agent discovery was stale.

Reusable profile seed: For public docs and marketing sites: assert rendered page health, sitemap route coverage, robots policy, and raw agent artifacts together so human and machine surfaces cannot drift apart.

What the browser run checked

  • Loaded the live docs and blog pages across desktop, phone, iPad Mini, and iPad.
  • Fetched https://riddledc.com/sitemap.xml from the browser proof and asserted required route entries.
  • Confirmed rendered page health, responsive overflow, and fatal console/page health stayed clean while sitemap coverage failed.
  • Re-ran the contract after converting the sitemap to a generated route manifest guarded by static validation.
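The sitemap coverage assertion could be sketched as below, using the standard sitemap XML namespace. The sample sitemap body and the `missing_routes` helper are hypothetical; the host and the two non-root routes appear elsewhere in this manifest, but which routes the real check requires is an assumption.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def missing_routes(sitemap_xml: str, required_paths: list[str]) -> list[str]:
    """Return required URL paths that have no <loc> entry in the sitemap."""
    root = ET.fromstring(sitemap_xml)
    listed = {
        urlparse(loc.text.strip()).path
        for loc in root.findall("sm:url/sm:loc", NS)
        if loc.text
    }
    return [path for path in required_paths if path not in listed]

# Hypothetical stale sitemap that lists only the home page.
STALE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://riddledc.com/</loc></url>
</urlset>"""

REQUIRED_PATHS = ["/", "/ai-agents/guide/", "/examples/riddle-proof/"]

print(missing_routes(STALE_SITEMAP, REQUIRED_PATHS))
```

Comparing parsed paths rather than searching the raw XML avoids false passes when one route is a prefix of another.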

Proof lesson

Agent-facing contracts include sitemap and discovery surfaces. A page can be perfectly healthy for a human visitor while still being invisible to crawlers, maps, and docs-ingestion workflows.

Artifact | Type | What it proves
Failing docs sitemap screenshot | PNG screenshot | Shows the rendered docs page was healthy while the same proof run failed sitemap coverage for docs and MCP routes.
Failing docs sitemap receipt job_f39d58a4 | JSON metadata | Records the missing sitemap entries for docs tool routes and MCP while page route, overflow, and browser health checks passed.
Failing docs sitemap console capture | JSON logs | Preserves the browser-console evidence from the same failing docs sitemap proof run.
Failing public-content sitemap receipt job_06fb59fc | JSON metadata | Records the broader missing sitemap entries for about, blog, guide, and proof pages.
Failing public-content sitemap console capture | JSON logs | Keeps console evidence attached to the broader public-content sitemap failure.
Final generated-sitemap guard receipt job_83c9c01c | JSON metadata | Shows the generated sitemap guard held in production after deploy, with route coverage, overflow, and console checks passing.
Final generated-sitemap guard console capture | JSON logs | Preserves the clean browser-console receipt from the final production sitemap proof.

Catch 58

Robots blocked agent markdown docs

May 16, 2026 | < $0.01 | Riddle site | robots.txt | agent docs
Plain-English catch card

Robots blocked agent markdown docs

Agent-facing docs need both availability and crawlability.

What went wrong
robots.txt returned 200, advertised the sitemap, and allowed the site generally, but it explicitly disallowed the raw markdown exports that Riddle presents as agent-consumable docs.
What Riddle caught
Production job job_c268d7ce loaded robots.txt across desktop, phone, iPad Mini, and iPad and proved four stale markdown Disallow lines were present while the docs markdown endpoint returned text/markdown.
Why it matters
This is an agent-surface catch: the human docs existed, but crawler policy still discouraged the machine-readable versions that agents should consume.
What changed
For agent-facing docs: prove rendered pages, raw markdown content, robots policy, sitemap references, and forbidden crawl rules in one profile.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt
Riddle robots.txt should allow public raw markdown exports used by agents, while still advertising the sitemap.

Claim: Riddle robots.txt should allow public raw markdown exports used by agents, while still advertising the sitemap.

Bug: robots.txt returned 200, advertised the sitemap, and allowed the site generally, but it explicitly disallowed the raw markdown exports that Riddle presents as agent-consumable docs.

Why normal checks missed it: The docs markdown itself was fetchable as text/markdown and the robots file looked superficially healthy. The mismatch only surfaced because the proof asserted absence of the four stale markdown Disallow rules.

Why this sells Riddle Proof: This is an agent-surface catch: the human docs existed, but crawler policy still discouraged the machine-readable versions that agents should consume.

Reusable profile seed: For agent-facing docs: prove rendered pages, raw markdown content, robots policy, sitemap references, and forbidden crawl rules in one profile.

What the browser run checked

  • Loaded robots.txt as the primary browser target across desktop, phone, iPad Mini, and iPad.
  • Asserted User-agent, Allow, and Sitemap text were present.
  • Asserted the markdown Disallow rules for docs, blog, blog posts, and the agent guide were absent.
  • Fetched robots.txt and docs/markdown.md with HTTP body/content-type checks before and after the fix.
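The absence assertion on stale Disallow rules might be sketched as a line scan over robots.txt. The exact rule paths in the sample file are assumptions (the manifest names the affected areas but not the literal rules), and `markdown_disallows` is a hypothetical helper.

```python
def markdown_disallows(robots_txt: str) -> list[str]:
    """Return Disallow rule values that target raw markdown exports."""
    blocked = []
    for line in robots_txt.splitlines():
        rule = line.split("#", 1)[0].strip()  # drop trailing comments
        field, _, value = rule.partition(":")
        if field.strip().lower() == "disallow" and value.strip().endswith("markdown.md"):
            blocked.append(value.strip())
    return blocked

# Hypothetical robots.txt bodies; the Disallow paths are illustrative.
before_fix = """User-agent: *
Allow: /
Disallow: /docs/markdown.md
Disallow: /ai-agents/guide/markdown.md
Sitemap: https://riddledc.com/sitemap.xml
"""
after_fix = """User-agent: *
Allow: /
Sitemap: https://riddledc.com/sitemap.xml
"""

print(markdown_disallows(before_fix))
print(markdown_disallows(after_fix))
```

A plain scan is used here instead of a full robots parser so that wildcard rules, which parsers handle differently, are still reported as long as they end in markdown.md.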

Proof lesson

Agent-facing docs need both availability and crawlability. A raw markdown endpoint can exist and still be discouraged by robots policy unless the proof treats robots.txt as part of the public contract.

Artifact | Type | What it proves
Failing robots screenshot | PNG screenshot | Shows the public robots.txt file before the fix, including the stale markdown Disallow rules.
Failing robots receipt job_c268d7ce | JSON metadata | Records the four failed absence checks plus the successful sitemap and markdown endpoint evidence.
Failing robots console capture | JSON logs | Shows the catch was not caused by browser runtime noise; the issue was the public robots contract itself.
Final robots receipt job_f4674917 | JSON metadata | Shows the same robots and raw markdown contract passed in production after the fix.

Catch 60

Playground Batch curl hid async mode

May 16, 2026 | < $0.01 | Riddle site | Playground | generated code
Plain-English catch card

Playground Batch curl hid async mode

This is a generated-command contract catch: copy buttons and examples should preserve the same request semantics as the real UI action.

What went wrong
The authenticated Playground Batch flow correctly sent sync:false in the real /v1/run request body, returned a durable job receipt, and rendered screenshot artifacts, but the generated and copyable curl command omitted "sync": false.
What Riddle caught
The failing production run job_3155f0c1 hit the Batch submit and artifacts mocks 4/4 times, and the captured request body included "sync":false, but the required visible text "sync": false was absent from .result-state in every viewport.
Why it matters
This is a generated-command contract catch: Riddle Proof found that the UI worked but the integration command taught a subtly wrong async API call.
What changed
For API playgrounds and copy buttons: assert the visible generated command includes the fields proven in the actual request body, especially mode flags, endpoints, auth shape, and artifact polling contracts.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt
An async-only Playground Batch result should show a generated curl command that includes "sync": false and matches the actual request body.

Claim: An async-only Playground Batch result should show a generated curl command that includes "sync": false and matches the actual request body.

Bug: The authenticated Playground Batch flow correctly sent sync:false in the real /v1/run request body, returned a durable job receipt, and rendered screenshot artifacts, but the generated and copyable curl command omitted "sync": false.

Why normal checks missed it: The product behavior worked: submit succeeded, artifacts rendered, job receipt was visible, loading cleared, layout stayed clean, and browser console/page evidence was healthy. The mismatch only surfaced because the proof treated the integration snippet as part of the product contract, not as passive docs.

Why this sells Riddle Proof: This is a generated-command contract catch: Riddle Proof found that the UI worked but the integration command taught a subtly wrong async API call.

Reusable profile seed: For API playgrounds and copy buttons: assert the visible generated command includes the fields proven in the actual request body, especially mode flags, endpoints, auth shape, and artifact polling contracts.

What the browser run checked

  • Seeded Cognito-style authenticated storage for the Playground.
  • Switched to Batch mode, filled two URLs, confirmed the Async only state, and clicked Run Batch across desktop, phone, iPad Mini, and iPad.
  • Mocked POST /v1/run with a required request body containing "urls", "sync":false, and the two batch URLs.
  • Mocked /v1/jobs/job_rp359_batch_async_curl/artifacts with two PNG artifacts.
  • Asserted Success, Job ID, job_rp359_batch_async_curl, both artifact labels, the visible generated "sync": false curl field, screenshot-item count, loading-state absence, overflow, and final console/page health.
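The core of this contract, that the visible curl command carries the same fields as the real request body, can be sketched as follows. Everything here is illustrative: the API host `api.example.invalid`, the batch URLs, and the helper names are assumptions, and the real run asserted on visible text rather than parsing the command.

```python
import json
import shlex

def curl_payload(command: str) -> dict:
    """Parse the JSON body passed to curl via -d, --data, or --data-raw."""
    args = shlex.split(command)
    for flag in ("-d", "--data", "--data-raw"):
        if flag in args:
            return json.loads(args[args.index(flag) + 1])
    return {}

def snippet_drift(command: str, actual_body: dict) -> dict:
    """Return fields of the real request body that the snippet omits or changes."""
    shown = curl_payload(command)
    return {k: v for k, v in actual_body.items() if k not in shown or shown[k] != v}

def curl_for(body: dict) -> str:
    """Build a one-line curl command (hypothetical host) for a JSON body."""
    return (
        "curl -X POST https://api.example.invalid/v1/run "
        "-H 'Content-Type: application/json' -d " + shlex.quote(json.dumps(body))
    )

# The body the UI actually sent, per the captured request in this catch.
actual = {"urls": ["https://example.com/a", "https://example.com/b"], "sync": False}

before_fix = curl_for({"urls": actual["urls"]})  # snippet dropped the mode flag
after_fix = curl_for(actual)

print(snippet_drift(before_fix, actual))
print(snippet_drift(after_fix, actual))
```

Comparing parsed payloads rather than raw strings tolerates spacing differences such as "sync":false versus "sync": false, which is exactly the mismatch in formatting between the captured body and the visible snippet.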

Proof lesson

This is a generated-command contract catch: copy buttons and examples should preserve the same request semantics as the real UI action. A working result can still teach users the wrong API call.

Artifact | Type | What it proves
Failing Batch curl result screenshot | PNG screenshot | Shows the successful Batch result surface before the fix: artifacts and job receipt were present, but the generated curl sample omitted the async mode field.
Failing run receipt job_3155f0c1 | JSON metadata | Records the 4/4 submit and artifact mock hits, the actual request body containing "sync":false, and the missing visible "sync": false assertion across all four viewports.
Failing run console capture job_3155f0c1 | JSON logs | Keeps browser warning and console context beside the generated-command contract failure.
Final passing run receipt job_c892e0c0 | JSON metadata | Shows the same generated-command contract passed after the Playground curl text included "sync": false in desktop, phone, iPad Mini, and iPad views.

Catch 61

Playground async results hid the job receipt

May 15, 2026 | < $0.01 | Riddle site | Playground | artifact handling
Plain-English catch card

Playground async results hid the job receipt

This is a receipt-traceability catch: artifact UIs should show the durable job id whenever they show async results.

What went wrong
The authenticated Playground async Script result rendered screenshot and console artifacts from a successful mocked job, but did not expose the durable job id on the visible result screen.
What Riddle caught
The failing production run job_77bc1541 hit the Script submit and artifacts mocks 4/4 times, rendered screenshot and console evidence, and kept overflow at 0px, but failed because Job ID and job_rp356_script_receipt were absent from .result-state in every viewport.
Why it matters
This is a receipt-traceability catch: the artifact UI looked successful but omitted the identifier users, support, and agents need to connect the result back to a durable Riddle proof bundle.
What changed
For async artifact UIs: assert not only artifact rendering and terminal state, but also the visible job id or run receipt that makes the result shareable and supportable.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt
An async Playground Script result should expose the durable Riddle job id alongside screenshot and console artifacts.

Claim: An async Playground Script result should expose the durable Riddle job id alongside screenshot and console artifacts.

Bug: The authenticated Playground async Script result rendered screenshot and console artifacts from a successful mocked job, but did not expose the durable job id on the visible result screen.

Why normal checks missed it: The submit request worked, artifact polling worked, the screenshot item rendered, console output rendered, loading cleared, layout stayed clean, and browser console/page evidence was healthy. The missing receipt only surfaced because the proof required the result UI to be traceable back to the exact Riddle job.

Why this sells Riddle Proof: This is a receipt-traceability catch: the artifact UI looked successful but omitted the identifier users, support, and agents need to connect the result back to a durable Riddle proof bundle.

Reusable profile seed: For async artifact UIs: assert not only artifact rendering and terminal state, but also the visible job id or run receipt that makes the result shareable and supportable.

What the browser run checked

  • Seeded Cognito-style authenticated storage for the Playground.
  • Switched to Script mode, chose Async, filled a script with saveScreenshot and console.log, and clicked Run Script across desktop, phone, iPad Mini, and iPad.
  • Mocked POST /v1/run as a job-id-only async submission and mocked /v1/jobs/job_rp356_script_receipt/artifacts with one PNG artifact plus console.json.
  • Asserted Success, Job ID, job_rp356_script_receipt, screenshot-item count, console section, loading-state absence, overflow, and final console/page health.
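
The job-id assertion in the last bullet can be sketched as a check over the text each viewport rendered. This is a minimal illustration, not Riddle's profile format: the function name `missing_receipt_viewports` and the captured-text mapping are assumptions, and only the job id and viewport names come from the run above.

```python
def missing_receipt_viewports(visible_text: dict, job_id: str) -> list:
    """Return the viewports whose rendered result text does not show the job id."""
    return [name for name, text in visible_text.items() if job_id not in text]


# Hypothetical captured text per viewport (illustrative data, not from a real run).
captured = {
    "desktop": "Success Job ID job_rp356_script_receipt 1 screenshot",
    "phone": "Success 1 screenshot",
    "iPad Mini": "Success Job ID job_rp356_script_receipt 1 screenshot",
    "iPad": "Success Job ID job_rp356_script_receipt 1 screenshot",
}

failures = missing_receipt_viewports(captured, "job_rp356_script_receipt")
print(failures)
```

A result screen can render artifacts and a Success state in every viewport and still fail this check, which is the gap the catch describes.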

Proof lesson

This is a receipt-traceability catch: artifact UIs should show the durable job id whenever they show async results. Screenshots and logs are easier to trust when the user can connect them to the exact proof bundle.

Artifact | Type | What it proves
Failing Script result screenshot | PNG screenshot | Shows the successful artifact result surface before the fix: screenshots and console output exist, but no visible job receipt ties the UI back to the Riddle job.
Failing run receipt job_77bc1541 | JSON metadata | Records the missing Job ID assertions, 4/4 submit and artifact mock hits, screenshot/console evidence, viewport matrix, and clean overflow/browser evidence.
Failing run console capture job_77bc1541 | JSON logs | Preserves the console-output context from the same async Script result proof.
Final passing run receipt job_c747c2ec | JSON metadata | Shows the same async Script receipt profile passed after the UI exposed job_rp356_script_receipt in all four viewports.

Catch 62

Billing Stripe hydration failed invisibly

May 15, 2026 | < $0.01 | Riddle site | billing | hydration
Plain-English catch card

Billing Stripe hydration failed invisibly

This is a screenshot-is-not-enough catch: a proof profile should pair visible business-state assertions with fatal/page-error evidence.

What went wrong
The authenticated Billing page recovered from a forced transient balance-load failure and rendered the expected account state, but browser page-error evidence still captured Minified React error #418 from the Stripe Elements surface.
What Riddle caught
The failing production run job_0a9320d5 passed the recovered Billing state but failed no_fatal_console_errors with page_error_count 1 for Minified React error #418.
Why it matters
This is a screenshot-is-not-enough catch: the UI looked healthy after recovery, but the browser still recorded a React hydration failure that would be easy to miss in manual review.
What changed
For payment and embedded-widget pages: prove the visible account state and also require page_error_count 0; then run the same profile after mount/defer fixes to confirm the hidden browser error is gone.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt
Billing should recover from a transient balance-load failure and render Stripe-backed payment UI without React hydration page errors.

Claim: Billing should recover from a transient balance-load failure and render Stripe-backed payment UI without React hydration page errors.

Bug: The authenticated Billing page recovered from a forced transient balance-load failure and rendered the expected account state, but browser page-error evidence still captured Minified React error #418 from the Stripe Elements surface.

Why normal checks missed it: The visible page looked recovered: Billing & Credits rendered, the Retry flow worked, balance and transaction text appeared, and responsive layout stayed clean. The issue was only visible because the proof treated page errors as first-class evidence instead of trusting the final screenshot alone.

Why this sells Riddle Proof: This is a screenshot-is-not-enough catch: the UI looked healthy after recovery, but the browser still recorded a React hydration failure that would be easy to miss in manual review.

Reusable profile seed: For payment and embedded-widget pages: prove the visible account state and also require page_error_count 0; then run the same profile after mount/defer fixes to confirm the hidden browser error is gone.

What the browser run checked

  • Seeded Cognito-style authenticated storage for the Billing page.
  • Mocked the first balance request as a 503, then reused a successful balance response while history and auto-recharge returned valid account data.
  • Attempted the visible Retry recovery path on desktop and proved final recovered state directly on phone, iPad Mini, and iPad.
  • Asserted final balance, transaction history, payment-method state, auto-recharge state, absence of load-error text, responsive overflow, and fatal console/page health.
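
The pairing of visible-state checks with page-error evidence can be sketched as follows. This assumes a per-viewport evidence dictionary; `no_fatal_console_errors` and `page_error_count` are the names used in the run receipt, while `balance_visible`, `load_error_visible`, and the function itself are hypothetical.

```python
def failed_checks(evidence: dict) -> list:
    """Return the names of the checks that failed for one viewport's evidence."""
    checks = {
        # Visible business state: what a screenshot review would also see.
        "balance_visible": evidence.get("balance_visible", False),
        "load_error_absent": not evidence.get("load_error_visible", True),
        # Browser health: invisible in a screenshot, so it is asserted explicitly.
        "no_fatal_console_errors": evidence.get("page_error_count", 1) == 0,
    }
    return [name for name, passed in checks.items() if not passed]


# A page that looks recovered but still recorded one page error.
recovered_but_broken = {
    "balance_visible": True,
    "load_error_visible": False,
    "page_error_count": 1,
}
print(failed_checks(recovered_but_broken))
```

Missing evidence defaults to a failing value here, so an incomplete capture cannot pass by omission. That default is a design choice of this sketch, not something the receipt specifies.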

Proof lesson

This is a screenshot-is-not-enough catch: a proof profile should pair visible business-state assertions with fatal/page-error evidence. Hydration failures can be invisible in a happy-path screenshot while still making the page brittle.

Artifact | Type | What it proves
Failing recovered Billing screenshot | PNG screenshot | Shows why screenshot-only review could miss the issue: the recovered Billing page appears usable even though the proof captured a React page error.
Failing run receipt job_0a9320d5 | JSON metadata | Records page_error_count 1, the Minified React error #418 message, recovered Billing evidence, and the viewport matrix.
Failing run console capture job_0a9320d5 | JSON logs | Keeps the billing retry console evidence, including expected transient balance-load errors and embedded-payment browser warnings, beside the hidden hydration failure receipt.
Final passing run receipt job_a1a528af | JSON metadata | Shows the same Billing retry profile passed after the Stripe mount fix with page_error_count 0 and recovered account state in all four viewports.

Catch 63

Playground Script failed jobs looked neutral

May 15, 2026 | < $0.01 | Riddle site | Playground | terminal states
Plain-English catch card

Playground Script failed jobs looked neutral

Async artifact UIs should treat every terminal failure status as a first-class visible state and preserve the service error message plus partial artifact evidence.

What went wrong
The Playground async Script path received a terminal artifact response (status: failed) carrying a service error and a partial screenshot, but rendered a neutral Result state without .error-warning, the backend message, or the "partial results available" warning.
What Riddle caught
The failing production run job_0144ef09 hit the Script submit and failed-artifacts mocks 4/4 times and showed one partial screenshot item, but failed because three things were absent in all four viewports: the service message "Synthetic v347 script sandbox failed after preserving partial screenshot evidence", the "partial results available" warning, and the .error-warning element.
Why it matters
This is an async artifact-status catch: the product preserved partial evidence but hid the terminal failure semantics users and support need.
What changed
For async artifact UIs: include terminal failed, completed_error, completed_timeout, partial artifacts, exact service messages, and structural error-warning assertions.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt
An async Script artifact response with status: failed should render an error warning, preserve the service error message, and keep partial screenshot evidence visible.

Claim: An async Script artifact response with status: failed should render an error warning, preserve the service error message, and keep partial screenshot evidence visible.

Bug: The Playground async Script path received a terminal artifact response (status: failed) carrying a service error and a partial screenshot, but rendered a neutral Result state without .error-warning, the backend message, or the "partial results available" warning.

Why normal checks missed it: The route loaded, auth setup worked, submit and artifacts mocks hit exactly once per viewport, one partial screenshot rendered, loading cleared, layout stayed clean, and final console/page evidence was clean. The issue only surfaced when the profile asserted exact terminal-failure UI semantics.

Why this sells Riddle Proof: This is an async artifact-status catch: the product preserved partial evidence but hid the terminal failure semantics users and support need.

Reusable profile seed: For async artifact UIs: include terminal failed, completed_error, completed_timeout, partial artifacts, exact service messages, and structural error-warning assertions.

What the browser run checked

  • Seeded Cognito-style authenticated storage for the Playground.
  • Switched to Script mode, chose Async, filled a script with saveScreenshot and an intentional thrown error, and clicked Run Script across desktop, phone, iPad Mini, and iPad.
  • Mocked POST /v1/run as a job-id-only submission and mocked /v1/jobs/job_rp347_script_failed/artifacts as status: failed with a structured error and one PNG artifact.
  • Asserted the backend error message, partial results available warning, .error-warning count, one screenshot item, loading-state absence, overflow, and final console/page health.
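
The terminal-failure semantics asserted above can be sketched as a mapping from an artifacts response to what the result screen should render. The response shape (`status`, `error.message`, `artifacts`) and the function `result_view` are assumptions made for illustration; the three failure statuses are the ones named in the profile seed.

```python
TERMINAL_FAILURE_STATUSES = {"failed", "completed_error", "completed_timeout"}


def result_view(response: dict) -> dict:
    """Map an artifacts response to the pieces the result screen should render."""
    artifacts = response.get("artifacts", [])
    is_failure = response.get("status") in TERMINAL_FAILURE_STATUSES
    error = response.get("error") or {}
    return {
        "show_error_warning": is_failure,
        "message": error.get("message") if is_failure else None,
        # Partial evidence stays visible even though the job failed.
        "partial_results": is_failure and bool(artifacts),
        "artifacts": artifacts,
    }


# Hypothetical mocked response modeled on the failed-artifacts mock above.
failed = {
    "status": "failed",
    "error": {"message": "Synthetic v347 script sandbox failed after preserving partial screenshot evidence"},
    "artifacts": [{"name": "partial.png", "type": "image/png"}],
}
view = result_view(failed)
print(view["show_error_warning"], view["partial_results"])
```

Keeping the failure flag separate from the artifact list is what lets the UI show both the warning and the partial screenshot, rather than choosing one.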

Proof lesson

Async artifact UIs should treat every terminal failure status as a first-class visible state and preserve the service error message plus partial artifact evidence.

Artifact | Type | What it proves
Failing Script result screenshot | PNG screenshot | Shows the neutral Result view with a partial screenshot but no error warning or backend failure message.
Run receipt job_0144ef09 | JSON metadata | Records mock hits, missing terminal-failure text assertions, .error-warning count 0, preserved partial screenshot, and the four-viewport matrix.
Console capture | JSON logs | Shows the catch was not a browser crash or noisy console issue; final console/page evidence stayed clean while failure semantics were hidden.

Catch 64

Dashboard terminal jobs leaked raw service statuses

May 15, 2026 | < $0.01 | Riddle site | dashboard | status semantics
Plain-English catch card

Dashboard terminal jobs leaked raw service statuses

Account-state audits should verify service-contract translation, not just row presence.

What went wrong
The authenticated dashboard Recent Jobs table rendered service terminal statuses as Completed Timeout and Completed Error instead of human labels Timed Out and Failed.
What Riddle caught
The failing production run job_0c6e4f93 rendered job_rp345_timeout and job_rp345_error across four viewports, but failed because Timed Out and Failed were absent while Completed Timeout and Completed Error were visible everywhere.
Why it matters
This is a practical dashboard-audit catch: every row existed and the page was clean, but the product leaked backend vocabulary where users needed clear terminal-state meaning.
What changed
For account dashboards: mock representative service states, assert the visible human label for each state, add absence checks for raw service labels, and pair semantic checks with responsive and console evidence.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt
Dashboard Recent Jobs should translate completed_timeout to Timed Out and completed_error to Failed across authenticated desktop, phone, and tablet views.

Claim: Dashboard Recent Jobs should translate completed_timeout to Timed Out and completed_error to Failed across authenticated desktop, phone, and tablet views.

Bug: The authenticated dashboard Recent Jobs table rendered service terminal statuses as Completed Timeout and Completed Error instead of human labels Timed Out and Failed.

Why normal checks missed it: The dashboard route loaded, auth setup worked, balance data rendered, API keys rendered, recent job rows rendered, layout stayed clean, and final console/page evidence was clean. The issue only surfaced when the profile asserted exact status semantics for terminal service states.

Why this sells Riddle Proof: This is a practical dashboard-audit catch: every row existed and the page was clean, but the product leaked backend vocabulary where users needed clear terminal-state meaning.

Reusable profile seed: For account dashboards: mock representative service states, assert the visible human label for each state, add absence checks for raw service labels, and pair semantic checks with responsive and console evidence.

What the browser run checked

  • Seeded Cognito-style authenticated storage for the Riddle site dashboard.
  • Mocked balance, recent-jobs, and API-key responses across desktop, phone, iPad Mini, and iPad.
  • Required three job rows: completed_timeout, completed_error, and running.
  • Asserted Timed Out, Failed, and Running were visible while Completed Timeout and Completed Error stayed absent.
  • Measured responsive overflow and final console/page health after the authenticated account state rendered.
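
The translation the run required can be sketched as an explicit lookup table. The three status-to-label pairs come from the assertions above; `status_label` and `naive_label` are hypothetical names, and the naive version is only one plausible way raw labels like "Completed Timeout" could reach the screen (the source does not say how the app produced them).

```python
STATUS_LABELS = {
    "completed_timeout": "Timed Out",
    "completed_error": "Failed",
    "running": "Running",
}


def status_label(raw_status: str) -> str:
    """Translate a service status into its human label, failing loudly if unmapped."""
    try:
        return STATUS_LABELS[raw_status]
    except KeyError:
        raise ValueError(f"no human label defined for status {raw_status!r}")


def naive_label(raw_status: str) -> str:
    """Title-case the raw status; this leaks backend vocabulary into the UI."""
    return raw_status.replace("_", " ").title()


print(status_label("completed_timeout"), "vs", naive_label("completed_timeout"))
```

Raising on an unmapped status is an assumption of this sketch; a product might prefer a neutral fallback label. Either way, the absence checks for "Completed Timeout" and "Completed Error" are what distinguish the two functions in a browser run.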

Proof lesson

Account-state audits should verify service-contract translation, not just row presence. A table can be healthy, populated, and responsive while still mislabeling the business meaning of a job.

Artifact | Type | What it proves
Failing dashboard screenshot | PNG screenshot | Shows the authenticated Recent Jobs table with Completed Timeout and Completed Error visible before the fix.
Run receipt | JSON metadata | Records the missing Timed Out and Failed assertions, raw completed-label failures, three mocked job rows, network mocks, and viewport matrix.
Console capture | JSON logs | Shows the catch was not a frontend crash or noisy browser failure; console/page evidence stayed clean while the table semantics were wrong.

Catch 65

Playground Script assumed artifacts_url

May 15, 2026 | < $0.01 | Riddle site | Playground | artifact handling
Plain-English catch card

Playground Script assumed artifacts_url

Async artifact UIs should treat job_id as the stable contract and artifacts_url as an optional convenience.

What went wrong
The Playground async Script path accepted a job-id-only response, but then polled https://api.riddledc.comundefined/ instead of the standard /v1/jobs/{job_id}/artifacts endpoint.
What Riddle caught
The failing production run job_50bcafca hit the Script submit mock 4/4 times, but hit the required /v1/jobs/job_rp342_script/artifacts mock 0/4 times while the browser repeatedly tried https://api.riddledc.comundefined/.
Why it matters
This is an async-contract catch: the UI looked ready to run, but a valid response shape left users polling a malformed URL instead of seeing the artifact they requested.
What changed
For async job UIs: return only job_id from submit, require /v1/jobs/{job_id}/artifacts, assert artifact rendering, and add a negative signal for malformed legacy poll URLs.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt
An async Script response with only job_id should poll /v1/jobs/{job_id}/artifacts and render the returned screenshot artifact.

Claim: An async Script response with only job_id should poll /v1/jobs/{job_id}/artifacts and render the returned screenshot artifact.

Bug: The Playground async Script path accepted a job-id-only response, but then polled https://api.riddledc.comundefined/ instead of the standard /v1/jobs/{job_id}/artifacts endpoint.

Why normal checks missed it: The route loaded, auth setup worked, Script async controls were reachable, the submit request body was correct, and layout stayed clean. The issue only showed up when the proof required the artifacts endpoint hit count, screenshot result, loading-state cleanup, and final console health.

Why this sells Riddle Proof: This is an async-contract catch: the UI looked ready to run, but a valid response shape left users polling a malformed URL instead of seeing the artifact they requested.

Reusable profile seed: For async job UIs: return only job_id from submit, require /v1/jobs/{job_id}/artifacts, assert artifact rendering, and add a negative signal for malformed legacy poll URLs.

What the browser run checked

  • Seeded Cognito-style authenticated storage for the Playground.
  • Switched to Script mode, chose Async, filled a Playwright script that calls saveScreenshot, and clicked Run Script across desktop, phone, iPad Mini, and iPad.
  • Mocked POST /v1/run as a job-id-only async submission and required the standard job artifacts endpoint to return one PNG artifact.
  • Asserted the artifact endpoint hit count, Success result, rp342-script label, screenshot-item count, loading-state absence, overflow, and final console/page health.
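
The contract the profile enforces, job_id as the stable field and artifacts_url as optional, can be sketched like this. The base URL and the /v1/jobs/{job_id}/artifacts path come from the receipt; the function name is hypothetical, and treating artifacts_url as a complete URL when present is an assumption, since the source does not say whether the real field is absolute or relative.

```python
from typing import Optional

API_BASE = "https://api.riddledc.com"


def artifacts_poll_url(job_id: str, artifacts_url: Optional[str] = None) -> str:
    """Choose the URL to poll for artifacts after an async submit."""
    if artifacts_url:
        # Optional convenience: use it only when the response actually provided one.
        return artifacts_url
    if not job_id:
        raise ValueError("submit response is missing job_id")
    # Stable contract: derive the standard endpoint from job_id alone.
    return f"{API_BASE}/v1/jobs/{job_id}/artifacts"


print(artifacts_poll_url("job_rp342_script"))
```

The malformed URL in the failing run is what results when a missing field is concatenated onto the base without a presence check like the one above.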

Proof lesson

Async artifact UIs should treat job_id as the stable contract and artifacts_url as an optional convenience. Every async mode should converge on the same job artifacts polling behavior.

Artifact | Type | What it proves
Failing Script screenshot | PNG screenshot | Shows the Playground after the Script run failed to reach a result because it was polling the malformed undefined artifacts URL.
Run receipt | JSON metadata | Records the submit mock hits, missing standard artifacts mock hits, setup wait failure, loading-state failure, and viewport matrix.
Console capture | JSON logs | Shows the repeated browser resource failures against https://api.riddledc.comundefined/ instead of a product crash hiding the issue.

Catch 66

Playground timeout hid the artifact reason

May 15, 2026 | < $0.01 | Riddle site | Playground | artifact handling
Plain-English catch card

Playground timeout hid the artifact reason

Artifact UIs should preserve failure reasons, not just thumbnails.

What went wrong
The Playground async Workflow timeout path preserved a partial screenshot, but replaced the service timeout detail with generic "Workflow timed out after 120 seconds" copy.
What Riddle caught
The failing production run job_483da63f returned completed_timeout with the message "Synthetic v340 workflow timed out waiting for purchase-confirmation" and kept rp340-timeout-first visible across four viewports, but failed because the timeout detail was absent everywhere.
Why it matters
This is an artifact-trust catch: the proof showed the product kept a useful receipt but hid the reason the receipt mattered, making timeout triage weaker than the underlying Riddle evidence.
What changed
For async artifact UIs: mock completed_timeout with a structured timeout message and partial artifacts, then assert both the visible failure reason and the preserved artifact rows/thumbnails.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt
An async Workflow timeout should show the timeout message from job artifacts while preserving any partial screenshot evidence.

Claim: An async Workflow timeout should show the timeout message from job artifacts while preserving any partial screenshot evidence.

Bug: The Playground async Workflow timeout path preserved a partial screenshot, but replaced the service timeout detail with generic "Workflow timed out after 120 seconds" copy.

Why normal checks missed it: The route loaded, auth setup worked, Workflow async controls were reachable, the submit and artifacts mocks hit exactly once per viewport, the timeout state rendered, the partial screenshot stayed visible, layout stayed clean, and final console/page evidence was clean. The issue was only visible when the proof asserted the exact timeout reason from the artifacts payload.

Why this sells Riddle Proof: This is an artifact-trust catch: the proof showed the product kept a useful receipt but hid the reason the receipt mattered, making timeout triage weaker than the underlying Riddle evidence.

Reusable profile seed: For async artifact UIs: mock completed_timeout with a structured timeout message and partial artifacts, then assert both the visible failure reason and the preserved artifact rows/thumbnails.

What the browser run checked

  • Seeded Cognito-style authenticated storage for the Playground.
  • Switched to Workflow mode, chose Async, filled a workflow JSON payload, and clicked Run Workflow across desktop, phone, iPad Mini, and iPad.
  • Mocked POST /v1/run as a successful async submission and mocked the artifacts endpoint as completed_timeout with one PNG artifact.
  • Asserted the artifact timeout message, Timed Out state, partial screenshot label, screenshot-item count, loading-state absence, overflow, and final console/page health.
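
Preferring the service-provided timeout reason over generic copy can be sketched as below. The generic string is the copy quoted in the bug description; the `error` field shape and the function name are assumptions for illustration.

```python
GENERIC_TIMEOUT_COPY = "Workflow timed out after 120 seconds"


def timeout_message(artifacts_response: dict) -> str:
    """Return the service's timeout reason, falling back to generic copy only if absent."""
    error = artifacts_response.get("error")
    # Accept either a structured error object or a bare string (assumed shapes).
    detail = error.get("message") if isinstance(error, dict) else error
    if isinstance(detail, str) and detail.strip():
        return detail.strip()
    return GENERIC_TIMEOUT_COPY


# Hypothetical mocked payload modeled on the completed_timeout mock above.
mocked = {
    "status": "completed_timeout",
    "error": {"message": "Synthetic v340 workflow timed out waiting for purchase-confirmation"},
    "artifacts": [{"name": "rp340-timeout-first.png"}],
}
print(timeout_message(mocked))
```

The generic copy stays as a fallback, so a response without a reason still renders something; it just no longer overrides a reason that exists.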

Proof lesson

Artifact UIs should preserve failure reasons, not just thumbnails. A partial screenshot is useful, but the user still needs the service-provided explanation for why the run stopped.

Artifact | Type | What it proves
Failing timeout screenshot | PNG screenshot | Shows the Playground timeout result with the partial screenshot present but the artifact timeout reason missing.
Run receipt | JSON metadata | Records the missing timeout-message assertion, exact mock hits, preserved partial screenshot, viewport matrix, and clean layout evidence.
Console capture | JSON logs | Shows this was not a frontend crash; console/page evidence stayed clean while the visible timeout reason was wrong.

Catch 67

Dashboard balance failure looked like zero credits

May 15, 2026 | < $0.01 | Riddle site | dashboard | error handling
Plain-English catch card

Dashboard balance failure looked like zero credits

Dashboard proof should isolate partial backend failures: one widget can fail while the rest of the page stays healthy, and the user still needs the real reason.

What went wrong
The authenticated dashboard hid a structured balance-load backend failure and silently showed 0s / $0.00, making a dependency outage look like an empty account.
What Riddle caught
The failing production run job_519cdc28 mocked GET /billing/balance as a structured 503, kept jobs and API keys visible across four viewports, but failed because the message "Synthetic v338 dashboard balance unavailable" was absent everywhere.
Why it matters
This is a support-quality catch: the page looked healthy enough to trust, but the balance widget silently converted a backend failure into an apparent zero-credit account.
What changed
For dashboards with multiple backend dependencies: mock one read dependency as a structured failure, keep sibling reads successful, assert the exact human message, and prove unaffected data remains visible.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt
A dashboard balance-load failure should show the backend human message while preserving other account data such as recent jobs and API keys.

Claim: A dashboard balance-load failure should show the backend human message while preserving other account data such as recent jobs and API keys.

Bug: The authenticated dashboard hid a structured balance-load backend failure and silently showed 0s / $0.00, making a dependency outage look like an empty account.

Why normal checks missed it: The dashboard route loaded, auth setup worked, recent jobs rendered, the API-key row rendered, layout stayed clean, and final console/page evidence was clean. The issue was only visible when the proof asserted the exact backend message from the failed balance dependency.

Why this sells Riddle Proof: This is a support-quality catch: the page looked healthy enough to trust, but the balance widget silently converted a backend failure into an apparent zero-credit account.

Reusable profile seed: For dashboards with multiple backend dependencies: mock one read dependency as a structured failure, keep sibling reads successful, assert the exact human message, and prove unaffected data remains visible.

What the browser run checked

  • Seeded Cognito-style authenticated storage for the Riddle site dashboard.
  • Mocked billing balance as a structured 503 while jobs and API keys returned valid data.
  • Asserted the backend balance error text, Recent Jobs, job_rp338_completed, and Dashboard Balance Fallback Key across desktop, phone, iPad Mini, and iPad.
  • Asserted [object Object] and Application error stayed absent, exact row counts stayed stable, overflow stayed clean, and final console/page health stayed clean.
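
Isolating one failed dependency from its siblings can be sketched by loading each widget independently and keeping failure distinct from an empty value. Everything here is illustrative: `load_widget`, `balance_text`, and the `balance_usd` field are hypothetical, and a raised RuntimeError stands in for the structured 503.

```python
def load_widget(fetch):
    """Run one widget's fetch and capture failure instead of defaulting to empty data."""
    try:
        return {"ok": True, "data": fetch()}
    except RuntimeError as exc:
        return {"ok": False, "error": str(exc)}


def balance_text(widget):
    """Render the balance widget: real value on success, backend message on failure."""
    if not widget["ok"]:
        return widget["error"]
    return f"${widget['data']['balance_usd']:.2f}"


def failing_balance():
    # Hypothetical stand-in for a structured 503 from GET /billing/balance.
    raise RuntimeError("Synthetic v338 dashboard balance unavailable")


dashboard = {
    "balance": load_widget(failing_balance),
    "jobs": load_widget(lambda: ["job_rp338_completed"]),
}
print(balance_text(dashboard["balance"]))
```

A genuine zero balance and a failed balance load now render differently, which is the distinction the failing run showed was missing.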

Proof lesson

Dashboard proof should isolate partial backend failures: one widget can fail while the rest of the page stays healthy, and the user still needs the real reason.

Artifact | Type | What it proves
Failing balance screenshot | PNG screenshot | Shows the dashboard after the balance dependency failed: jobs and API keys remained visible, but the human balance error was missing.
Run receipt | JSON metadata | Records the missing backend message assertion, 21 network mock hits, preserved jobs/API-key rows, and viewport matrix.
Console capture | JSON logs | Shows the catch was not a runtime crash; final console/page evidence was clean while the visible account message was wrong.

Catch 68

Auto-recharge disable hid the backend error

May 15, 2026 | < $0.01 | Riddle site | billing | error handling
Plain-English catch card

Auto-recharge disable hid the backend error

Settings rollback proof should verify both state integrity and message integrity.

What went wrong
The billing page correctly rolled a failed auto-recharge disable attempt back to (ON), but rendered [object Object] instead of the backend human rejection message.
What Riddle caught
The failing production run job_89d53b2f hit PUT /api/billing/auto-recharge four times with {"enabled":false}, left (ON) visible and (OFF) absent, but failed because the message "Synthetic v333 auto-recharge disable rejected" was absent while [object Object] rendered in all four viewports.
Why it matters
This is a billing-support catch: the account state stayed safe, but the UI hid the backend reason a user or support person would need to understand the failed settings change.
What changed
For account settings: combine failed-write mocks, request-body assertions, rollback-state assertions, exact backend-message checks, object-placeholder absence, and expected 4xx console handling.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt
A rejected auto-recharge disable request should roll the visible toggle back to the persisted on state and render the backend human message, never an object placeholder.

Claim: A rejected auto-recharge disable request should roll the visible toggle back to the persisted on state and render the backend human message, never an object placeholder.

Bug: The billing page correctly rolled a failed auto-recharge disable attempt back to (ON), but rendered [object Object] instead of the backend human rejection message.

Why normal checks missed it: The route loaded, authenticated billing data rendered, the failed PUT fired exactly once per viewport, the inline error existed, and the visible toggle rollback was correct. The regression was only obvious when the proof asserted the exact backend message and object-placeholder absence.

Why this sells Riddle Proof: This is a billing-support catch: the account state stayed safe, but the UI hid the backend reason a user or support person would need to understand the failed settings change.

Reusable profile seed: For account settings: combine failed-write mocks, request-body assertions, rollback-state assertions, exact backend-message checks, object-placeholder absence, and expected 4xx console handling.

What the browser run checked

  • Seeded Cognito-style authenticated storage for the billing page.
  • Mocked billing balance, history, saved payment method, and enabled auto-recharge settings.
  • Clicked the auto-recharge disable path while the mocked PUT returned a structured 400 rejection.
  • Captured the failed request body and required {"enabled":false}.
  • Asserted the human backend message, [object Object] absence, (ON) visible, (OFF) absent, exact inline error count, overflow, and final console/page health.
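
State integrity and message integrity can be sketched together in one rollback handler. The request body {"enabled": false} and the rejection text come from the run; the function names, the (status, body) return convention, and the `error.message` shape are assumptions for illustration.

```python
def disable_auto_recharge(state: dict, put_settings) -> dict:
    """Try to disable auto-recharge; on rejection restore the persisted value."""
    persisted = state["enabled"]
    status, body = put_settings({"enabled": False})
    if 200 <= status < 300:
        return {"enabled": False, "error": None}
    # Rejected write: keep the persisted setting and surface the backend text,
    # not the error object itself.
    message = body.get("error", {}).get("message", "Could not update auto-recharge.")
    return {"enabled": persisted, "error": message}


def rejecting_put(payload):
    # Hypothetical mock of the PUT, mirroring a structured 400 rejection.
    assert payload == {"enabled": False}
    return 400, {"error": {"message": "Synthetic v333 auto-recharge disable rejected"}}


result = disable_auto_recharge({"enabled": True}, rejecting_put)
print(result)
```

The failing run passed the first half (the toggle stayed on) and failed the second (the message was an object placeholder), so a profile needs an assertion for each.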

Proof lesson

Settings rollback proof should verify both state integrity and message integrity. A rejected write can preserve the old setting while still hiding the reason the user needs.

Artifact | Type | What it proves
Failing run screenshot | PNG screenshot | Anchors the authenticated billing state from the failing run; the run receipt records the rejected disable assertion and object-placeholder failure.
Run receipt | JSON metadata | Records the matching failed disable PUT bodies, missing human backend message, object-placeholder assertion failure, and viewport matrix.
Console capture | JSON logs | Shows the expected mocked 400 resource event was separate from the product regression.

Catch 69

Playground hid structured workflow errors

May 15, 2026 | < $0.01 | Riddle site | Playground | error handling
Plain-English catch card

Playground hid structured workflow errors

Interactive API tools need fallback profiles for realistic structured errors, not just happy-path runs or generic error-element checks.

What went wrong
The Playground async Workflow path handled a structured validation failure but rendered the error as [object Object] instead of showing the backend human message.
What Riddle caught
The failing production run job_6a27f3cd submitted a workflow payload with steps, sync false, and screenshot rp330-structured, then failed because the message "Synthetic v330 workflow validation rejected" was absent while [object Object] appeared in all four viewports.
Why it matters
This is a support-facing API-tool catch: the workflow failure was handled, but the UI hid the reason a user or support person would need to debug the request.
What changed
For playgrounds and API consoles: drive the real mode controls, submit realistic request bodies, mock structured 4xx failures, and assert exact human backend messages instead of generic error boxes.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt
An async Workflow validation failure should preserve the backend human message and never render a structured error object placeholder.

Claim: An async Workflow validation failure should preserve the backend human message and never render a structured error object placeholder.

Bug: The Playground async Workflow path handled a structured validation failure but rendered the error as [object Object] instead of showing the backend human message.

Why normal checks missed it: The route loaded, auth setup worked, the Workflow and Async controls were reachable, the request body was correct, and the mocked 400 fired exactly once per viewport. The regression was only visible when the proof asserted the exact structured backend message.

Why this sells Riddle Proof: This is a support-facing API-tool catch: the workflow failure was handled, but the UI hid the reason a user or support person would need to debug the request.

Reusable profile seed: For playgrounds and API consoles: drive the real mode controls, submit realistic request bodies, mock structured 4xx failures, and assert exact human backend messages instead of generic error boxes.

What the browser run checked

  • Seeded Cognito-style authenticated storage for the Playground.
  • Switched to Workflow mode, chose Async, filled a workflow JSON payload, and clicked Run Workflow across desktop, phone, iPad Mini, and iPad.
  • Mocked POST /v1/run as a structured 400 validation failure and captured the matching request body.
  • Asserted the backend error text, [object Object] absence, application-error absence, exact error element count, no result/loading state, overflow, and final console/page health.
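
An [object Object] placeholder typically appears when a structured error is rendered as a string without first pulling out its message. A defensive extraction can be sketched as below; the candidate keys (`message`, `error`, `detail`), the `code` field in the sample, and the function name are assumptions, since the source does not show the real payload shape.

```python
def human_error_message(payload, fallback="Request failed."):
    """Pull a readable message out of a structured error payload."""
    if isinstance(payload, str):
        return payload or fallback
    if isinstance(payload, dict):
        # Check common message-bearing keys, descending into nested objects.
        for key in ("message", "error", "detail"):
            if key in payload:
                found = human_error_message(payload[key], fallback="")
                if found:
                    return found
    return fallback


# Hypothetical structured 400 body modeled on the validation failure above.
body = {"error": {"code": "validation_failed", "message": "Synthetic v330 workflow validation rejected"}}
print(human_error_message(body))
```

Returning a readable fallback for unrecognized shapes keeps the object placeholder off the screen, while the exact-message assertion in the profile still catches cases where the real reason was dropped.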

Proof lesson

Interactive API tools need fallback profiles for realistic structured errors, not just happy-path runs or generic error-element checks.
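
The exact-message assertion described above can be sketched as a small check over the visible error text. This is a minimal Python illustration, not the actual Riddle Proof profile format; the function name and the placeholder list are assumptions.

```python
# Hypothetical helper: compares visible error copy with the exact backend message.
FORBIDDEN_PLACEHOLDERS = ("[object Object]", "Application error")  # assumed list


def check_error_copy(visible_text: str, expected_message: str) -> list[str]:
    """Return a list of failures; an empty list means the copy passed."""
    failures = []
    if expected_message not in visible_text:
        failures.append(f"missing backend message: {expected_message!r}")
    for placeholder in FORBIDDEN_PLACEHOLDERS:
        if placeholder in visible_text:
            failures.append(f"forbidden placeholder rendered: {placeholder!r}")
    return failures
```

A generic "some error element exists" check would pass on [object Object]; this one fails twice, once for the missing message and once for the placeholder.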

Artifact | Type | What it proves
Failing workflow screenshot | PNG screenshot | Shows the Playground after the async workflow failure rendered [object Object] instead of the backend message.
Run receipt | JSON metadata | Records the matching workflow POST body, the missing human message, [object Object] in every viewport, and clean layout evidence.
Console capture | JSON logs | Shows the expected mocked 400 resource event was separated from application failures while the visible error copy was wrong.

Catch 70

Payment-method setup hid the backend error

May 15, 2026 | < $0.01 | Riddle site | billing | error handling
Plain-English catch card

Payment-method setup hid the backend error

Fallback profiles should assert the exact human message from structured backend errors, not just that some error element appears.

What went wrong
The billing page handled a rejected payment-method setup request but replaced the backend human message with generic "Failed to create setup intent" copy.
What Riddle caught
The failing production run job_2882931c clicked Save Payment Method in four viewports, hit the mocked setup failure four times, and failed because the expected message "Synthetic v328 payment method setup rejected" was absent in every viewport.
Why it matters
This is a practical checkout/settings catch: the workflow failed safely, but the product hid the backend reason a user or support team would need.
What changed
For billing and checkout fallbacks: mock structured 4xx failures, assert the exact backend message, assert the unchanged account state, and keep expected resource errors separate from application failures.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt
A rejected payment-method setup request should preserve the backend human error while keeping the account in the no-payment-method state.

Claim: A rejected payment-method setup request should preserve the backend human error while keeping the account in the no-payment-method state.

Bug: The billing page handled a rejected payment-method setup request but replaced the backend human message with generic "Failed to create setup intent" copy.

Why normal checks missed it: The route loaded, the Stripe-backed form opened, the failed POST fired exactly once per viewport, the no-payment-method state remained, and an inline error rendered. Only the proof checked that the specific backend message survived to the user.

Why this sells Riddle Proof: This is a practical checkout/settings catch: the workflow failed safely, but the product hid the backend reason a user or support team would need.

Reusable profile seed: For billing and checkout fallbacks: mock structured 4xx failures, assert the exact backend message, assert the unchanged account state, and keep expected resource errors separate from application failures.

What the browser run checked

  • Seeded Cognito-style authenticated storage for the billing page.
  • Mocked billing balance, history, auto-recharge reads, and a structured payment-method setup failure.
  • Opened the Stripe-backed setup form and clicked Save Payment Method across desktop, phone, iPad Mini, and iPad.
  • Asserted the backend error text, success absence, no-payment-method state, exact inline error count, overflow, and final console/page health.

Proof lesson

Fallback profiles should assert the exact human message from structured backend errors, not just that some error element appears.
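
The "exactly once per viewport" request count mentioned above can be expressed as a simple comparison. A sketch in Python; the function name and the viewport keys are hypothetical.

```python
# Hypothetical shape: viewport name -> number of times the mocked request fired.
def check_hit_counts(hits_by_viewport: dict[str, int], expected: int = 1) -> list[str]:
    """Flag any viewport whose mocked request did not fire exactly `expected` times."""
    return [
        f"{viewport}: expected {expected} hit(s), saw {hits}"
        for viewport, hits in hits_by_viewport.items()
        if hits != expected
    ]
```

Capping the count this way also catches accidental double submits, which a plain "request was made" check would accept.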

Artifact | Type | What it proves
Failing setup screenshot | PNG screenshot | Shows the failed payment-method setup state where the user saw generic frontend copy instead of the backend reason.
Run receipt | JSON metadata | Records the missing backend message assertion, four failed setup POST hits, no-payment-method state, and viewport matrix.
Console capture | JSON logs | Shows the catch was not caused by an unhandled runtime crash; the product state was stable but the error copy was wrong.

Catch 71

Handled API-key revoke failure still logged as fatal

May 15, 2026 | < $0.01 | Riddle site | dashboard | console health
Plain-English catch card

Handled API-key revoke failure still logged as fatal

Negative-path proof should keep console/page health in scope after the visible UI looks right, because handled failures can still poison the browser evidence stream.

What went wrong
The dashboard visibly handled a rejected API-key revoke request, but still emitted an app-level console.error for the handled domain failure.
What Riddle caught
The failing production run job_64814348 accepted four revoke dialogs, hit the mocked DELETE four times, preserved the key row, and showed "Synthetic v327 API key revoke rejected", then failed on the unallowed "Revoke API key failed" console error.
Why it matters
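This catch is useful because the user-visible fallback was already correct: Riddle Proof still found the hidden app-level error that would make browser evidence noisy and mask real regressions later.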
What changed
For destructive dashboard actions: accept/dismiss browser dialogs explicitly, cap the destructive request count, assert preserved rows, and treat app console errors separately from expected mocked 4xx resource events.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt
A rejected API-key revoke should show the user-facing failure and preserve the active key row without emitting an app-level fatal console error.

Claim: A rejected API-key revoke should show the user-facing failure and preserve the active key row without emitting an app-level fatal console error.

Bug: The dashboard visibly handled a rejected API-key revoke request, but still emitted an app-level console.error for the handled domain failure.

Why normal checks missed it: The visible fallback looked correct: the confirm dialog was accepted, the backend rejection message appeared, the active key row stayed present, and the revoked/empty states stayed absent. The bug was the hidden fatal console signal after an expected failure path.

Why this sells Riddle Proof: This catch is useful because the user-visible fallback was already correct. Riddle Proof still found the hidden app-level error that would make browser evidence noisy and mask real regressions later.

Reusable profile seed: For destructive dashboard actions: accept/dismiss browser dialogs explicitly, cap the destructive request count, assert preserved rows, and treat app console errors separately from expected mocked 4xx resource events.

What the browser run checked

  • Seeded Cognito-style authenticated storage for the dashboard.
  • Configured dialog acceptance and clicked the destructive Revoke path in four viewports.
  • Mocked the API-key DELETE as a structured conflict and required the active key row plus human rejection text to remain visible.
  • Asserted revoked/empty states stayed absent, exact row/button counts stayed stable, overflow stayed clean, and final console/page health remained clean.

Proof lesson

Negative-path proof should keep console/page health in scope after the visible UI looks right, because handled failures can still poison the browser evidence stream.
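
One way to keep expected mocked resource errors apart from app-level console errors is to classify each console entry before asserting health. The sketch below is Python for illustration; the entry shape (`kind`, `text`, `status`) is an assumption, not the real console-capture schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConsoleEntry:
    # Hypothetical shape; a real console capture may differ.
    kind: str                     # "resource" for failed loads, "app" for console.error calls
    text: str
    status: Optional[int] = None  # HTTP status, only set for resource entries


def unallowed_errors(entries: list[ConsoleEntry], allowed_statuses: set[int]) -> list[ConsoleEntry]:
    """Drop expected mocked resource failures; everything left should fail the run."""
    return [
        entry for entry in entries
        if not (entry.kind == "resource" and entry.status in allowed_statuses)
    ]
```

With the mocked conflict status allowed, the resource event is filtered out and only the app-level error remains to fail the console-health assertion.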

Artifact | Type | What it proves
Failing revoke screenshot | PNG screenshot | Shows the visible failure state was handled even while the app emitted a fatal console signal.
Run receipt | JSON metadata | Records the accepted dialogs, failed DELETE hits, preserved active key row, and failing console-health assertion.
Console capture | JSON logs | Preserves the app-level Revoke API key failed console error that separated this catch from expected mocked-resource noise.

Catch 72

A structured API-key error crashed the dashboard

May 15, 2026 | < $0.01 | Riddle site | dashboard | error handling
Plain-English catch card

A structured API-key error crashed the dashboard

Dashboard and settings profiles should include structured failure payloads, not only string errors, and should prove that existing data remains visible after failed writes.

What went wrong
The authenticated dashboard tried to render a structured API-key validation error object directly, crashing the API Keys section instead of showing the human message.
What Riddle caught
The failing production run job_d622f658 submitted {"name":"Structured Error Key v324"}; React error #31 then removed the dashboard content, and the expected message "Synthetic v324 API key rejected" never appeared.
Why it matters
This is a strong authenticated-product catch: the request was right and the API failure was realistic, but the UI crashed because it rendered an object as a React child.
What changed
For admin/settings forms: mock structured error payloads, assert human copy, assert existing rows remain, and require [object Object] plus application-error text to stay absent.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt
A structured API-key validation error should render as human text without crashing the authenticated dashboard or dropping the existing key list.

Claim: A structured API-key validation error should render as human text without crashing the authenticated dashboard or dropping the existing key list.

Bug: The authenticated dashboard tried to render a structured API-key validation error object directly, crashing the API Keys section instead of showing the human message.

Why normal checks missed it: The dashboard loaded, auth setup worked, existing keys rendered, and the create request body was correct. The bug only appeared when the mocked backend returned a realistic nested error payload.

Why this sells Riddle Proof: This is a strong authenticated-product catch: the request was right and the API failure was realistic, but the UI crashed because it rendered an object as a React child.

Reusable profile seed: For admin/settings forms: mock structured error payloads, assert human copy, assert existing rows remain, and require [object Object] plus application-error text to stay absent.

What the browser run checked

  • Seeded Cognito-style authenticated storage for the Riddle site dashboard.
  • Mocked balance, recent jobs, existing API keys, and a structured API-key create failure.
  • Captured the create request body and required {"name":"Structured Error Key v324"}.
  • Asserted the human error, existing key row, dashboard content, modal absence, overflow, and fatal console/page health.

Proof lesson

Dashboard and settings profiles should include structured failure payloads, not only string errors, and should prove that existing data remains visible after failed writes.
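
On the product side, this class of crash is usually avoided by reducing any error payload to a plain string before it reaches the renderer. A minimal sketch in Python, assuming the payload is either a string or a nested object with keys such as `message`, `detail`, or `error`; the key names and the fallback copy are assumptions.

```python
def to_display_text(error: object, fallback: str = "Something went wrong") -> str:
    """Reduce a string or nested error payload to one human-readable string."""
    if isinstance(error, str):
        return error
    if isinstance(error, dict):
        # Assumed keys, checked from most to least specific.
        for key in ("message", "detail", "error"):
            if key in error:
                return to_display_text(error[key], fallback)
    return fallback
```

The function always returns a string, so the UI never receives an object to render; the proof then only needs to assert that the returned text is the backend message rather than the fallback.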

Artifact | Type | What it proves
Failing dashboard screenshot | PNG screenshot | Shows the dashboard after the structured API-key error crashed the UI instead of rendering the human message.
Run receipt | JSON metadata | Records the correct POST body, missing human error text, vanished dashboard selectors, and React fatal-console evidence.
Console capture | JSON logs | Preserves the React error #31 evidence that distinguished a product crash from an expected mocked 400 resource error.

Catch 73

Auto-recharge stayed on after a failed save

May 15, 2026 | < $0.01 | Riddle site | billing | settings integrity
Plain-English catch card

Auto-recharge stayed on after a failed save

Settings proof should verify rollback state after rejected saves, not only that an error message appears.

What went wrong
The billing page showed a rejected auto-recharge save error but left the toggle label at (ON), advertising a setting that the backend had not persisted.
What Riddle caught
The failing production run job_3bb5a0cf hit the failed auto-recharge PUT four times and showed "Synthetic v322 auto-recharge rejected", but found (OFF) absent and (ON) still visible in every viewport.
Why it matters
This is a settings-integrity catch: a route and error toast can both be green while the UI lies about whether the account setting was actually saved.
What changed
For account settings: mock failed writes, assert rollback to the prior visible state, cap network hit counts, and keep expected mocked 4xx console evidence separate from product failures.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt
A rejected auto-recharge settings save should display the failure while restoring the previous off state.

Claim: A rejected auto-recharge settings save should display the failure while restoring the previous off state.

Bug: The billing page showed a rejected auto-recharge save error but left the toggle label at (ON), advertising a setting that the backend had not persisted.

Why normal checks missed it: The page loaded, payment-method state rendered, the expected error appeared, and the mocked PUT request fired. The regression was the stale optimistic UI state after the failed write.

Why this sells Riddle Proof: This is a settings-integrity catch: a route and error toast can both be green while the UI lies about whether the account setting was actually saved.

Reusable profile seed: For account settings: mock failed writes, assert rollback to the prior visible state, cap network hit counts, and keep expected mocked 4xx console evidence separate from product failures.

What the browser run checked

  • Seeded authenticated storage and mocked billing balance, history, saved payment method, and auto-recharge settings.
  • Clicked the auto-recharge enable path while the mocked PUT returned 400.
  • Required the synthetic rejection message and exactly one error element.
  • Asserted (OFF) stayed visible, (ON) stayed absent, overflow stayed clean, and no unallowed fatal console/page errors appeared.

Proof lesson

Settings proof should verify rollback state after rejected saves, not only that an error message appears.
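
The rollback behaviour the proof expects can be sketched as an optimistic update that restores the previous value when the save is rejected. This is an illustrative Python model, not the billing page's code; the class name, the `save` callback, and the use of `RuntimeError` as the rejection signal are assumptions.

```python
from typing import Callable, Optional


class AutoRechargeToggle:
    """Hypothetical UI state holder; `save` stands in for the PUT request."""

    def __init__(self, enabled: bool = False) -> None:
        self.enabled = enabled
        self.error: Optional[str] = None

    @property
    def label(self) -> str:
        return "(ON)" if self.enabled else "(OFF)"

    def set_enabled(self, value: bool, save: Callable[[bool], None]) -> None:
        previous = self.enabled
        self.enabled = value         # optimistic update
        self.error = None
        try:
            save(value)
        except RuntimeError as exc:  # assumed signal for a rejected save
            self.enabled = previous  # roll back to the persisted state
            self.error = str(exc)
```

Without the rollback line, the error is shown but the label stays at (ON), which is the state the failing run recorded.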

Artifact | Type | What it proves
Failing billing screenshot | PNG screenshot | Shows the rejected auto-recharge save state where the page still advertised (ON).
Run receipt | JSON metadata | Records the failed (OFF)/(ON) assertions, matching failed PUT body, network mock hits, and viewport matrix.
Console capture | JSON logs | Shows the only fatal console noise was the expected mocked 400 resource failure, not the source of the product regression.

Catch 74

A failed dashboard job looked queued

May 15, 2026 | < $0.01 | Riddle site | dashboard | status semantics
Plain-English catch card

A failed dashboard job looked queued

Authenticated dashboards need profile checks for negative and in-flight states, not only route health and happy-path data.

What went wrong
The authenticated Riddle site dashboard rendered a mocked failed job row as Queued, hiding the failed state from the recent-jobs table.
What Riddle caught
The failing production run job_6711719e showed job_rp317_failed on the dashboard while Failed was absent in every viewport; the same run also caught that the phone widened by 293px.
Why it matters
This is a product-quality dashboard catch: the page was alive and authenticated, but the business meaning of a failed browser job was wrong.
What changed
For dashboards: mock representative completed, failed, and running rows, assert the visible status label for each state, and pair semantic assertions with responsive overflow checks.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt
A dashboard recent-jobs row with status failed should visibly render as Failed across authenticated desktop, phone, and tablet views.

Claim: A dashboard recent-jobs row with status failed should visibly render as Failed across authenticated desktop, phone, and tablet views.

Bug: The authenticated Riddle site dashboard rendered a mocked failed job row as Queued, hiding the failed state from the recent-jobs table.

Why normal checks missed it: The dashboard route loaded, balance data rendered, API keys rendered, and auth storage was accepted. The issue only showed up when the proof asserted the exact status semantics of a non-happy job row.

Why this sells Riddle Proof: This is a product-quality dashboard catch: the page was alive and authenticated, but the business meaning of a failed browser job was wrong.

Reusable profile seed: For dashboards: mock representative completed, failed, and running rows, assert the visible status label for each state, and pair semantic assertions with responsive overflow checks.

What the browser run checked

  • Seeded Cognito-style authenticated storage for the Riddle site dashboard.
  • Mocked balance, recent-jobs, and API-key responses across desktop, phone, iPad Mini, and iPad.
  • Required failed-job evidence (the check that failed in run job_6711719e) by asserting that the job_rp317_failed row and the Failed label were both visible.
  • Measured responsive overflow and found the phone widened by 293px before the product fix.

Proof lesson

Authenticated dashboards need profile checks for negative and in-flight states, not only route health and happy-path data.
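
The source does not say why the failed row rendered as Queued. One plausible cause is a label lookup that falls back to a healthy-looking default, and the Python sketch below shows a mapping that avoids that. The status keys and the "Unknown" fallback are assumptions.

```python
# Assumed backend status values mapped to visible labels.
STATUS_LABELS = {
    "queued": "Queued",
    "running": "Running",
    "completed": "Completed",
    "failed": "Failed",
}


def status_label(status: str) -> str:
    """Map a backend status to its label without defaulting to a healthy state."""
    return STATUS_LABELS.get(status.strip().lower(), "Unknown")
```

A profile that mocks one row per status and asserts each label covers every branch of a mapping like this, including the fallback.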

Artifact | Type | What it proves
Failing phone dashboard screenshot | PNG screenshot | Shows the authenticated dashboard state that paired with the failed status-label assertion and phone overflow finding.
Run receipt | JSON metadata | Records the missing Failed text, the mocked job rows, and the phone widened by 293px responsive failure.
Console capture | JSON logs | Shows the product catch was not caused by fatal browser noise.

Catch 75

Authenticated nav overflowed on billing

May 15, 2026 | < $0.01 | Riddle site | billing | responsive
Plain-English catch card

Authenticated nav overflowed on billing

Workflow proof should keep app-shell layout assertions active after auth setup, because the shell can break even when the page-level task succeeds.

What went wrong
On the authenticated Riddle site billing page, the desktop nav overflowed once the signed-in email and Sign Out control were present.
What Riddle caught
The failing production run job_4eb1e278 reached the successful billing retry state but measured that the desktop nav overflowed by 16px.
Why it matters
This shows why browser proof should stay on after the business flow succeeds: the workflow was green, but the authenticated product shell was visibly broken.
What changed
For authenticated workflows: combine task success, retry cleanup, shell overflow, and expected-error console handling in the same profile.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt
An authenticated billing workflow should complete without widening the page shell on desktop, phone, or tablet viewports.

Claim: An authenticated billing workflow should complete without widening the page shell on desktop, phone, or tablet viewports.

Bug: On the authenticated Riddle site billing page, the desktop nav overflowed once the signed-in email and Sign Out control were present.

Why normal checks missed it: The billing route reached the final promo-code success state, all mocked billing calls behaved correctly, and the workflow looked usable. The regression was in the authenticated shell around the workflow.

Why this sells Riddle Proof: This shows why browser proof should stay on after the business flow succeeds: the workflow was green, but the authenticated product shell was visibly broken.

Reusable profile seed: For authenticated workflows: combine task success, retry cleanup, shell overflow, and expected-error console handling in the same profile.

What the browser run checked

  • Seeded authenticated storage and mocked billing balance, history, auto-recharge, and promo-code responses.
  • Forced a structured promo error, retried successfully, and asserted the stale error disappeared.
  • Verified human-readable success text and absence of raw JSON or [object Object] output.
  • Measured the app shell and found the desktop nav overflowed by 16px before the fix.

Proof lesson

Workflow proof should keep app-shell layout assertions active after auth setup, because the shell can break even when the page-level task succeeds.
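
A shell overflow measurement of this kind is commonly computed as the difference between an element's scroll width and its visible width. The source does not state how Riddle measures it, so treat the Python sketch below as an assumption; the function names and the sample widths in the usage note are hypothetical.

```python
def horizontal_overflow_px(scroll_width: int, client_width: int) -> int:
    """Pixels by which content is wider than its visible box (0 when it fits)."""
    return max(0, scroll_width - client_width)


def overflowing_viewports(measurements: dict[str, tuple[int, int]]) -> dict[str, int]:
    """Map viewport name to overflow in px, keeping only viewports that overflow."""
    result = {}
    for name, (scroll_width, client_width) in measurements.items():
        overflow = horizontal_overflow_px(scroll_width, client_width)
        if overflow > 0:
            result[name] = overflow
    return result
```

For example, a nav measured at 1296px of content inside a 1280px box reports 16px of overflow, while a viewport whose content fits is omitted from the result.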

Artifact | Type | What it proves
Failing desktop billing screenshot | PNG screenshot | Shows the final authenticated billing page state where the shell overflow measurement mattered.
Run receipt | JSON metadata | Records the desktop nav overflowed by 16px while the billing retry workflow itself completed.
Console capture | JSON logs | Keeps the expected mocked 400 resource noise separate from the actual layout finding.

Catch 76

A malformed login token opened the builder

May 14, 2026 | < $0.01 | auth | trust boundary | request proof
Plain-English catch card

A malformed login token opened the builder

Auth proof should assert both sides of the boundary: the login surface remains visible after malformed identity responses, and privileged UI stays absent.

What went wrong
The builder treated a successful Cognito response with an empty AuthenticationResult as a real authenticated session and mounted the builder UI without a usable token.
What Riddle caught
The failing browser run shows the malformed login opening the builder while request-body assertions prove the mocked Cognito login path was the one exercised.
Why it matters
This is the kind of auth-boundary bug that looks fine in the browser until the proof asks whether the privileged UI opened from a valid token or just a friendly HTTP shape.
What changed
For login surfaces: mock malformed success payloads, assert privileged selectors stay absent, and capture request bodies for the identity-provider exchange.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt
A malformed identity-provider success response must not create an authenticated builder session.

Claim: A malformed identity-provider success response must not create an authenticated builder session.

Bug: The builder treated a successful Cognito response with an empty AuthenticationResult as a real authenticated session and mounted the builder UI without a usable token.

Why normal checks missed it: The HTTP status was 200 and the screen changed to the authenticated builder. A smoke test that only checks the happy path would never inspect whether a valid token was actually present.

Why this sells Riddle Proof: This is the kind of auth-boundary bug that looks fine in the browser until the proof asks whether the privileged UI opened from a valid token or just a friendly HTTP shape.

Reusable profile seed: For login surfaces: mock malformed success payloads, assert privileged selectors stay absent, and capture request bodies for the identity-provider exchange.

What the browser run checked

  • Opened the builder login route in desktop, phone, iPad Mini, and iPad viewports.
  • Mocked Cognito USER_PASSWORD_AUTH with an empty AuthenticationResult object.
  • Asserted the login surface and Login failed copy stayed visible.
  • Asserted the authenticated builder prompt was absent and .builder-root count stayed zero.

Proof lesson

Auth proof should assert both sides of the boundary: the login surface remains visible after malformed identity responses, and privileged UI stays absent.
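
The product-side guard implied here is to treat a login as successful only when the response actually carries tokens. A minimal Python sketch: `AccessToken` and `IdToken` are fields of Cognito's `AuthenticationResult`, but requiring both of them, and the function name, are assumptions about what the builder needs.

```python
def has_usable_session(response: dict) -> bool:
    """True only when the identity response carries non-empty token strings."""
    result = response.get("AuthenticationResult")
    if not isinstance(result, dict):
        return False
    # Assumption: the builder needs both tokens before mounting privileged UI.
    return all(
        isinstance(result.get(field), str) and bool(result[field].strip())
        for field in ("AccessToken", "IdToken")
    )
```

An HTTP 200 with an empty `AuthenticationResult` returns False here, so the login surface would stay visible instead of the builder mounting.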

Artifact | Type | What it proves
Failing desktop screenshot | PNG screenshot | Shows the malformed login incorrectly opened the authenticated builder interface.
Run receipt | JSON metadata | Records the profile result, selector failures, request-body assertions, and viewport matrix for the failing run.
Console capture | JSON logs | Shows the bug was not a noisy runtime crash; the boundary failed quietly with clean console/page-error state.

Catch 77

Logout worked, until the delayed build came back

May 14, 2026 | < $0.01 | async | session isolation | network delay
Plain-English catch card

Logout worked, until the delayed build came back

Async session proof needs controlled network delays, logout/relogin actions, and final absence checks for stale previews and save controls.

What went wrong
A delayed build response was allowed to apply preview and save state after the user logged out and returned to a fresh builder session.
What Riddle caught
The failing run used a delayed build mock, then proved the fresh session still showed Open in new tab, Save to Arcade, and a preview iframe that should have been gone.
Why it matters
The browser proof makes race conditions reproducible: not by reading code, but by controlling timing and checking what the user sees after the stale response lands.
What changed
For async workflows: add delayed network mocks, abort/reset actions, post-reset selector absence checks, and optional mock-hit evidence for stale responses.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt
A delayed build completion from a prior session must not mutate a fresh logged-in builder session.

Claim: A delayed build completion from a prior session must not mutate a fresh logged-in builder session.

Bug: A delayed build response was allowed to apply preview and save state after the user logged out and returned to a fresh builder session.

Why normal checks missed it: A normal logout check can pass if no in-flight request completes late. The bug only appears when the browser keeps the delayed network response alive across a session reset.

Why this sells Riddle Proof: The browser proof makes race conditions reproducible: not by reading code, but by controlling timing and checking what the user sees after the stale response lands.

Reusable profile seed: For async workflows: add delayed network mocks, abort/reset actions, post-reset selector absence checks, and optional mock-hit evidence for stale responses.

What the browser run checked

  • Started a builder workflow with a delayed mocked build response.
  • Logged out and signed back in before the delayed response completed.
  • Asserted the empty preview copy returned in the fresh session.
  • Asserted Open in new tab, Save to Arcade, and preview iframe selectors were absent.

Proof lesson

Async session proof needs controlled network delays, logout/relogin actions, and final absence checks for stale previews and save controls.
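
A common way to make a fresh session immune to late responses is a generation counter: each request remembers the generation it started in, and results from an older generation are dropped. The Python sketch below illustrates that idea under assumed names; it is not the builder's actual fix.

```python
from typing import Optional


class BuilderSession:
    """Hypothetical session state that rejects results from earlier sessions."""

    def __init__(self) -> None:
        self.generation = 0
        self.preview_url: Optional[str] = None

    def start_build(self) -> int:
        """Return the token a build response must present when it lands."""
        return self.generation

    def reset(self) -> None:
        """Called on logout or relogin: clear state and invalidate in-flight work."""
        self.generation += 1
        self.preview_url = None

    def apply_build_result(self, token: int, preview_url: str) -> bool:
        """Apply the result only if it belongs to the current session."""
        if token != self.generation:
            return False  # stale response from a previous session
        self.preview_url = preview_url
        return True
```

The delayed-mock proof exercises exactly the path where `reset` runs between `start_build` and `apply_build_result`.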

Artifact | Type | What it proves
Failing phone screenshot | PNG screenshot | Shows stale preview/save state visible after logout and re-login.
Run receipt | JSON metadata | Records the delayed mock, selector absence failures, and cross-viewport session-isolation checks.
Console capture | JSON logs | Shows the race did not rely on visible runtime errors; the UI state was the evidence.

Catch 78

Canceling save still leaked the draft

May 14, 2026 | < $0.01 | request body | form state | builder
Plain-English catch card

Canceling save still leaked the draft

For builder flows, screenshot proof should be paired with request-body assertions so hidden stale form state cannot slip through.

What went wrong
After canceling a save form, stale optional name, emoji, and description fields leaked into a later save request even though the final player looked correct.
What Riddle caught
The browser run reached the clean player route, but the captured save request still contained the canceled emoji and stale description.
Why it matters
This is the perfect “visually green, semantically wrong” catch: the browser reached the right page, but the network receipt proved the app submitted stale user data.
What changed
For forms: capture request bodies, assert required and forbidden fields, and include cancel/reopen paths before the final submit.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt
Canceling a save draft should clear optional form state before a later save request is submitted.

Claim: Canceling a save draft should clear optional form state before a later save request is submitted.

Bug: After canceling a save form, stale optional name, emoji, and description fields leaked into a later save request even though the final player looked correct.

Why normal checks missed it: The route, iframe, layout, and console checks were all green. Only the captured /api/save request body showed the canceled draft values were still being submitted.

Why this sells Riddle Proof: This is the perfect “visually green, semantically wrong” catch: the browser reached the right page, but the network receipt proved the app submitted stale user data.

Reusable profile seed: For forms: capture request bodies, assert required and forbidden fields, and include cancel/reopen paths before the final submit.

What the browser run checked

  • Opened the builder, generated a preview, opened the save form, filled in draft values, and canceled.
  • Saved a later clean draft and clicked through to the player route.
  • Asserted the iframe loaded and layout stayed clean across four viewports.
  • Asserted the captured /api/save body did not include the canceled emoji or stale description.

Proof lesson

For builder flows, screenshot proof should be paired with request-body assertions so hidden stale form state cannot slip through.
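
A request-body assertion with required and forbidden parts might look like the Python sketch below. The function name, field names, and sample values are hypothetical; only the idea of checking a captured /api/save body comes from the catch.

```python
import json


def check_request_body(raw_body: str, required: dict, forbidden_values: list[str]) -> list[str]:
    """Return failures for missing required fields or leaked forbidden values."""
    body = json.loads(raw_body)
    # Re-serialize without ASCII escaping so non-ASCII values can be matched directly.
    normalized = json.dumps(body, ensure_ascii=False)
    failures = []
    for key, expected in required.items():
        if body.get(key) != expected:
            failures.append(f"required field {key!r} != {expected!r}")
    for value in forbidden_values:
        if value in normalized:
            failures.append(f"forbidden value present: {value!r}")
    return failures
```

Listing the canceled draft values as forbidden is what turns a visually green run into a failing one when stale form state leaks into the request.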

Artifact | Type | What it proves
Failing desktop screenshot | PNG screenshot | Shows the UI looked healthy while the hidden request-body assertion failed.
Run receipt | JSON metadata | Contains the failed forbidden-body checks for the leaked emoji and stale description.
Console capture | JSON logs | Confirms the finding was request integrity, not a console-error failure.

Catch 79

A rainbow flag was saved as a broken emoji

May 14, 2026 | < $0.01 | unicode | request body | builder
Plain-English catch card

A rainbow flag was saved as a broken emoji

Browser proof can catch Unicode/data-boundary bugs by asserting exact request-body content, not just rendered page state.

What went wrong
The builder emoji input truncated a valid compound emoji, saving the rainbow flag as the broken partial sequence 🏳️‍.
What Riddle caught
The failing run captured save request bodies in all four viewports and proved they did not contain the full 🏳️‍🌈 value.
Why it matters
This is a crisp example of why proof receipts should include network payload evidence: the page looked fine, but the saved data was corrupt.
What changed
For user-generated content: include exact Unicode values, capture serialized request bodies, and assert both required and forbidden payload strings.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt
A valid compound emoji entered in the builder should survive serialization into the save request body.

Claim: A valid compound emoji entered in the builder should survive serialization into the save request body.

Bug: The builder emoji input truncated a valid compound emoji, saving the rainbow flag as the broken partial sequence 🏳️‍.

Why normal checks missed it: The builder preview, saved state, iframe, layout, and console checks all looked fine. The failure was inside the serialized save payload.

Why this sells Riddle Proof: This is a crisp example of why proof receipts should include network payload evidence: the page looked fine, but the saved data was corrupt.

Reusable profile seed: For user-generated content: include exact Unicode values, capture serialized request bodies, and assert both required and forbidden payload strings.

What the browser run checked

  • Filled the builder save form with a compound emoji value.
  • Captured the /api/save request body from the real browser workflow.
  • Required the body to include the full 🏳️‍🌈 emoji.
  • Verified the preview iframe, saved state, overflow, and console/page-error checks still passed.
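
The required/forbidden payload assertions above can be sketched in a few lines of Python. This is a minimal illustration, not the actual proof script: it assumes the captured save body is JSON, and the field name `emoji` and the helper `check_save_body` are hypothetical. It also assumes the rainbow flag is the standard four-code-point sequence (white flag, variation selector-16, zero-width joiner, rainbow). Because the truncated prefix is itself a substring of the full emoji, the forbidden check here looks for a value that ends in a dangling joiner rather than doing a plain substring match.

```python
import json

# Standard rainbow-flag sequence: white flag, VS16, zero-width joiner, rainbow.
RAINBOW_FLAG = "\U0001F3F3\uFE0F\u200D\U0001F308"
ZWJ = "\u200D"


def check_save_body(raw_body: str, field: str = "emoji") -> list[str]:
    """Return failure messages for one captured save request body.

    Assumes a JSON body; `field` is a hypothetical key name.
    """
    value = json.loads(raw_body).get(field, "")
    failures = []
    if RAINBOW_FLAG not in value:
        failures.append("required: full rainbow flag missing from body")
    if value.endswith(ZWJ):
        failures.append("forbidden: value ends with a dangling zero-width joiner")
    return failures


good = json.dumps({"emoji": RAINBOW_FLAG})
truncated = json.dumps({"emoji": RAINBOW_FLAG[:3]})  # flag + VS16 + ZWJ only
print(check_save_body(good))
print(check_save_body(truncated))
```

Running the same check against the body captured in each viewport keeps the payload evidence separate from the rendered-page checks, which all passed in this catch.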

Proof lesson

Browser proof can catch Unicode/data-boundary bugs by asserting exact request-body content, not just rendered page state.

Artifact | Type | What it proves
Failing desktop screenshot | PNG screenshot | Shows the builder flow completed while the hidden payload check caught the truncation.
Run receipt | JSON metadata | Records the missing full emoji and the truncated request-body samples.
Console capture | JSON logs | Shows the Unicode bug did not announce itself as a runtime error.

Catch 80

The player ignored its own layout metadata

[Evidence screenshot: The player ignored its own layout metadata]
May 14, 2026 · < $0.01 · layout · iframe · metadata
Plain-English catch card

The player ignored its own layout metadata

Layout proof should inspect embedded frame dimensions and metadata-driven rendering, not just route success or document scroll width.

What went wrong
A saved game with safe wide-layout metadata rendered as a normal unscaled iframe, overflowing by 434px on phone and 56px on iPad Mini.
What Riddle caught
The failing phone screenshot and proof receipt show a playable saved game with a too-wide iframe that escaped page-level overflow checks.
Why it matters
The route was technically healthy, but the user experience was broken.
What changed
For embedded media/apps: combine route checks, frame text, frame overflow, page overflow, and metadata-specific rendering assertions.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt

Claim: Saved-player layout metadata should scale wide embedded games so they fit phone and tablet viewports.

Bug: A saved game with safe wide-layout metadata rendered as a normal unscaled iframe, overflowing by 434px on phone and 56px on iPad Mini.

Why normal checks missed it: The route loaded, the iframe existed, frame text was visible, page-level overflow was 0px, and there were no console errors. Only iframe overflow checks exposed the user-visible layout break.

Why this sells Riddle Proof: The route was technically healthy, but the user experience was broken. Riddle Proof turns that embedded-layout nuance into a concrete receipt.

Reusable profile seed: For embedded media/apps: combine route checks, frame text, frame overflow, page overflow, and metadata-specific rendering assertions.

What the browser run checked

  • Loaded the saved player route in desktop, phone, iPad Mini, and iPad viewports.
  • Confirmed the saved manifest and player HTML mocks were hit.
  • Asserted the player iframe existed and frame text was visible.
  • Measured iframe overflow and found 434px phone overflow plus 56px iPad Mini overflow.
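
The iframe overflow measurement can be expressed as simple arithmetic on the frame and viewport widths. The sketch below is illustrative only: the helper names are hypothetical, and the widths are assumptions chosen because an 824px-wide frame in 390px and 768px viewports would be consistent with the reported 434px and 56px overflows. The text above reports the overflows, not these inputs.

```python
def frame_overflow_px(frame_width: float, viewport_width: float) -> float:
    """Pixels by which the iframe extends past the viewport (0 if it fits)."""
    return max(0.0, frame_width - viewport_width)


def fit_scale(frame_width: float, viewport_width: float) -> float:
    """Scale factor that would make the frame fit; never scales up."""
    return min(1.0, viewport_width / frame_width)


# Assumed widths, chosen to be consistent with the reported overflow figures.
viewports = {"phone": 390, "iPad Mini": 768, "desktop": 1280}
frame_width = 824

for name, width in viewports.items():
    overflow = frame_overflow_px(frame_width, width)
    scale = round(fit_scale(frame_width, width), 3)
    print(f"{name}: overflow={overflow}px, scale-to-fit={scale}")
```

A metadata-specific assertion would then require the rendered frame to use a scale at or below the scale-to-fit value whenever the saved game declares a wide layout.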

Proof lesson

Layout proof should inspect embedded frame dimensions and metadata-driven rendering, not just route success or document scroll width.

Artifact | Type | What it proves
Failing phone screenshot | PNG screenshot | Shows the wide player rendered in the constrained phone shell where overflow mattered most.
Run receipt | JSON metadata | Records the exact iframe overflow measurements across the viewport matrix.
Console capture | JSON logs | Shows the layout bug happened with clean runtime logs, making visual/frame evidence essential.

Catch 81

A manifest row rendered a broken saved game

[Evidence screenshot: A manifest row rendered a broken saved game]
May 14, 2026 · < $0.01 · resource integrity · iframe · fallback
Plain-English catch card

A manifest row rendered a broken saved game

Saved-resource proof should distinguish “listed in manifest” from “actually playable,” and should assert friendly no-iframe fallback states for missing resources.

What went wrong
The player trusted a saved-game manifest row enough to render an iframe even when the saved HTML resource was unavailable.
What Riddle caught
The failing run showed that the "Game not found" fallback was absent, an iframe was present, and the browser emitted resource failures for the unavailable saved HTML.
Why it matters
This is a real product-integrity check: the page can look routable while the underlying artifact is gone.
What changed
For user-saved artifacts: probe backing resources, assert no broken iframe/object renders, and allow expected resource errors only when the fallback UI is correct.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt

Claim: A saved-game manifest entry should not render a player iframe when the backing saved HTML resource is missing.

Bug: The player trusted a saved-game manifest row enough to render an iframe even when the saved HTML resource was unavailable.

Why normal checks missed it: The manifest route existed and the app shell rendered. Without probing the saved resource and asserting iframe absence, the broken player looked like a normal loading edge case.

Why this sells Riddle Proof: This is a real product-integrity check: the page can look routable while the underlying artifact is gone. Riddle Proof verifies the browser can actually reach the thing users need.

Reusable profile seed: For user-saved artifacts: probe backing resources, assert no broken iframe/object renders, and allow expected resource errors only when the fallback UI is correct.

What the browser run checked

  • Loaded the saved player route across desktop, phone, iPad Mini, and iPad.
  • Mocked the manifest row while making the saved HTML resource unavailable.
  • Asserted that the "Game not found" fallback appeared and that no .game-player-root iframe rendered.
  • Captured console/resource failures alongside DOM and screenshot evidence.
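
The rule "allow expected resource errors only when the fallback UI is correct" can be written as a small decision function. This is a sketch under stated assumptions: the `PlayerObservation` fields and the `evaluate` helper are hypothetical names, and the real run would fill them from DOM queries and the console capture.

```python
from dataclasses import dataclass


@dataclass
class PlayerObservation:
    """What one viewport run observed. All field names are hypothetical."""
    resource_available: bool  # did the saved HTML probe succeed?
    fallback_visible: bool    # is the "Game not found" text on the page?
    iframe_present: bool      # did a player iframe render?
    resource_errors: int      # failed-resource entries in the console capture


def evaluate(obs: PlayerObservation) -> list[str]:
    """Return failures for the missing-resource contract described above."""
    failures = []
    if obs.resource_available:
        if obs.resource_errors:
            failures.append("unexpected resource errors for an available game")
        return failures
    if not obs.fallback_visible:
        failures.append("fallback text missing")
    if obs.iframe_present:
        failures.append("iframe rendered for a missing resource")
    # Resource errors are tolerated only when the fallback state is correct.
    fallback_ok = obs.fallback_visible and not obs.iframe_present
    if obs.resource_errors and not fallback_ok:
        failures.append("resource errors not covered by a correct fallback")
    return failures


failing_run = PlayerObservation(False, False, True, 2)
fixed_run = PlayerObservation(False, True, False, 2)
print(evaluate(failing_run))
print(evaluate(fixed_run))
```

Keeping the resource-error allowance conditional means a clean fallback does not fail the run, while the same errors next to a rendered iframe still do.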

Proof lesson

Saved-resource proof should distinguish “listed in manifest” from “actually playable,” and should assert friendly no-iframe fallback states for missing resources.

Artifact | Type | What it proves
Failing phone screenshot | PNG screenshot | Shows the broken saved-player state on a real phone viewport.
Run receipt | JSON metadata | Records the iframe/fallback failures and resource-error evidence that made this a product regression.
Console capture | JSON logs | Preserves the resource failures that explain why the iframe should not have rendered.

Catch 82

The game worked, but the iframe was clipped

[Evidence screenshot: The game worked, but the iframe was clipped]
May 13, 2026 · pennies · iframe · responsive · black-box
Plain-English catch card

The game worked, but the iframe was clipped

Element bounds and screenshots catch user-visible clipping that scalar scroll-width checks miss.

What went wrong
Hot Path completed its two-player interaction in every viewport, but the embedded game surface was visibly clipped on phone and tablet.
What Riddle caught
A phone viewport screenshot from the browser run shows the embedded game still active, but visibly cropped inside its frame.
Why it matters
This is the kind of issue a product team can miss when automated checks only ask “did the route load?” Riddle Proof turns the browser screenshot into a reviewable receipt.
What changed
For embedded apps: run post-interaction screenshots, iframe bounds checks, and visible-canvas assertions across phone/tablet/desktop.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt

Claim: A playable embedded game should remain visible and inside its frame on phone-sized screens after interaction.

Bug: Hot Path completed its two-player interaction in every viewport, but the embedded game surface was visibly clipped on phone and tablet.

Why normal checks missed it: The saved game used overflow: hidden, which clips content instead of widening the page, so document scroll width stayed clean. A simple page overflow check would have passed.

Why this sells Riddle Proof: This is the kind of issue a product team can miss when automated checks only ask “did the route load?” Riddle Proof turns the browser screenshot into a reviewable receipt.

Reusable profile seed: For embedded apps: run post-interaction screenshots, iframe bounds checks, and visible-canvas assertions across phone/tablet/desktop.

What the browser run checked

  • Opened the player in a real browser at responsive viewports.
  • Drove the two-player interaction far enough to prove the game was active.
  • Captured screenshots after interaction instead of only at first load.
  • Compared visible element bounds against the viewport and the iframe's own bounds.
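
The bounds comparison in the last step can be sketched as rectangle containment. This is a minimal illustration with hypothetical names and made-up numbers: in a browser the rectangles would come from something like `getBoundingClientRect()` on the game surface and on the iframe, expressed in the same coordinate space.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    left: float
    top: float
    right: float
    bottom: float


def clipped_edges(inner: Rect, outer: Rect) -> list[str]:
    """Name each edge where `inner` extends beyond `outer`."""
    edges = []
    if inner.left < outer.left:
        edges.append("left")
    if inner.top < outer.top:
        edges.append("top")
    if inner.right > outer.right:
        edges.append("right")
    if inner.bottom > outer.bottom:
        edges.append("bottom")
    return edges


# Illustrative numbers, not taken from the run receipt.
frame = Rect(0, 0, 390, 600)
canvas = Rect(0, 0, 480, 600)
print(clipped_edges(canvas, frame))  # ['right']
```

Unlike a scroll-width check, this still reports the clipped edge when the container hides its overflow.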

Proof lesson

Element bounds and screenshots catch user-visible clipping that scalar scroll-width checks miss.

Artifact | Type | What it proves
Phone screenshot after interaction | PNG screenshot | The human-readable artifact: it shows the game active, but clipped in the viewport.
Run receipt | JSON metadata | Records the browser run timestamp, duration, worker, and whether the proof script itself errored.
Console capture | JSON logs | Preserves browser console output so visual findings can be read alongside runtime errors or warnings.

Catch 84

A fixed nav made full-screen routes one nav-height too tall

[Evidence screenshot: A fixed nav made full-screen routes one nav-height too tall]
May 13, 2026 · ~$0.01 · layout · route inventory · responsive
Plain-English catch card

A fixed nav made full-screen routes one nav-height too tall

A generic app-shell profile can find recurring classes of layout bug by measuring fixed-nav offset, route-root height, scroll policy, and top offenders.

What went wrong
Multiple older full-screen routes were exactly one fixed navigation bar too tall on desktop/tablet/phone.
What Riddle caught
A desktop screenshot from the route inventory run captures the game route inside the app shell while the measured route-root bounds reveal the repeated nav-height overflow pattern.
Why it matters
One proof profile can find a whole class of layout bugs across a site.
What changed
For app shells: measure route-root bounds, fixed-nav offsets, document scroll policy, and top layout offenders across a route inventory.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt

Claim: Full-screen game routes should fit the visible app shell instead of exceeding the viewport by the fixed navigation height.

Bug: Multiple older full-screen routes were exactly one fixed navigation bar too tall on desktop/tablet/phone.

Why normal checks missed it: Each route still loaded and looked mostly functional. The bug only became obvious when route-root bounds were measured across a viewport matrix.

Why this sells Riddle Proof: One proof profile can find a whole class of layout bugs across a site. The result is not just a screenshot; it is a reusable test idea.

Reusable profile seed: For app shells: measure route-root bounds, fixed-nav offsets, document scroll policy, and top layout offenders across a route inventory.

What the browser run checked

  • Inventoried representative game routes instead of checking one page.
  • Measured route-root bounds across desktop, tablet, and phone shapes.
  • Captured route screenshots so the numeric overflow had visual context.
  • Grouped repeated offenders into a single app-shell pattern.
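
Grouping repeated offenders into one pattern can be sketched as a filter on measured overflow. The helper name, the tuple shape, the 64px nav height, and the routes below are all assumptions for illustration; none of them come from the run receipt.

```python
from collections import defaultdict


def group_nav_height_offenders(measurements, nav_height, tolerance=1.0):
    """Group routes whose root is about one nav-height taller than the viewport.

    `measurements` is an iterable of (route, viewport_name, viewport_height,
    route_root_height) tuples; this shape is an assumption for illustration.
    """
    offenders = defaultdict(list)
    for route, viewport, viewport_height, root_height in measurements:
        overflow = root_height - viewport_height
        if abs(overflow - nav_height) <= tolerance:
            offenders[route].append(viewport)
    return dict(offenders)


# Illustrative data: a hypothetical 64px fixed nav and made-up routes.
rows = [
    ("/games/a", "desktop", 800, 864),
    ("/games/a", "phone", 844, 908),
    ("/games/b", "desktop", 800, 800),
]
print(group_nav_height_offenders(rows, nav_height=64))
```

Matching the overflow against the nav height, rather than flagging any overflow, is what turns a list of tall routes into a single app-shell finding.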

Proof lesson

A generic app-shell profile can find recurring classes of layout bug by measuring fixed-nav offset, route-root height, scroll policy, and top offenders.

Artifact | Type | What it proves
Representative route screenshot | PNG screenshot | Gives a concrete example from the route inventory run where app-shell sizing mattered.
Run receipt | JSON metadata | Shows this was a longer route-inventory browser run, not a static screenshot pasted into a page.
Console capture | JSON logs | Keeps route-load console evidence with the same browser run.

Catch 85

A green semantic state still hid the win result

[Evidence screenshot: A green semantic state still hid the win result]
May 13, 2026 · pennies · visual evidence · canvas · quality
Plain-English catch card

A green semantic state still hid the win result

Screenshots are not just decoration. They catch places where machine-readable state is stronger than the actual user experience.

What went wrong
Gem Mine reached the escaped state semantically, but the user-facing terminal panel did not clearly show "ESCAPED!" to the player.
What Riddle caught
The after-continue screenshot shows the game state after escape, making it possible to judge whether the win/result was actually visible to a player.
Why it matters
A test can be technically green while the customer experience is still unclear.
What changed
For canvas/game flows: pair state assertions with visible end-state screenshots and human-readable HUD/result checks.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt

Claim: When a game reaches a win/escape state, the visible UI should clearly communicate that result to the player.

Bug: Gem Mine reached the escaped state semantically, but the user-facing terminal panel did not clearly show "ESCAPED!"; the gap went unnoticed until screenshot review caught it.

Why normal checks missed it: State assertions proved the game outcome. They did not prove that the outcome was visible and understandable to a player.

Why this sells Riddle Proof: A test can be technically green while the customer experience is still unclear. Riddle Proof keeps semantic checks and screenshots in the same receipt.

Reusable profile seed: For canvas/game flows: pair state assertions with visible end-state screenshots and human-readable HUD/result checks.

What the browser run checked

  • Drove the game through the continue/escape path in a browser.
  • Captured the post-state screenshot rather than stopping at a green semantic assertion.
  • Compared machine-readable state against what a human could see in the terminal panel.
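
The comparison between machine-readable state and what a player can see can be partly automated when the result label is present as readable text. The sketch below assumes the terminal panel's text can be extracted from the DOM; the mapping and the helper name are hypothetical. Text drawn only on a canvas would still need the screenshot review described above.

```python
# Hypothetical mapping from machine-readable state to the label a player should see.
EXPECTED_LABELS = {"escaped": "ESCAPED!"}


def check_end_state(state: str, panel_text: str) -> list[str]:
    """Return failures when a reached end state is not shown in the panel text."""
    failures = []
    label = EXPECTED_LABELS.get(state)
    if label is None:
        failures.append(f"no expected label defined for state {state!r}")
    elif label not in panel_text:
        failures.append(f"state {state!r} reached but {label!r} not visible")
    return failures


# Illustrative panel text, not copied from the run.
print(check_end_state("escaped", "Depth 12  Gems 4"))
print(check_end_state("escaped", "ESCAPED! Gems 4"))
```

Pairing this with the end-state screenshot keeps the semantic assertion and the human-readable check in the same receipt.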

Proof lesson

Screenshots are not just decoration. They catch places where machine-readable state is stronger than the actual user experience.

Artifact | Type | What it proves
After-continue screenshot | PNG screenshot | Shows the actual end-state UI that a player would review.
Run receipt | JSON metadata | Records the run as browser evidence with timing and script status.
Console capture | JSON logs | Keeps runtime logs attached to the same end-state proof.

Catch 86

Restart-only texture errors after gameplay looked fine

[Evidence screenshot: Restart-only texture errors after gameplay looked fine]
May 13, 2026 · < $0.03 · console errors · game lifecycle · restart
Plain-English catch card

Restart-only texture errors after gameplay looked fine

Terminal/recovery proof finds defects that only appear after users finish, restart, replay, or revisit a route.

What went wrong
Classic Slalom passed its behavior checks, but repeated scene restart generated 180 duplicate Phaser texture-key console errors.
What Riddle caught
The restart screenshot captures the post-restart state; the same browser run also recorded the duplicate Phaser texture-key errors that did not appear on first load.
Why it matters
Recovery paths and repeat use are where many browser bugs hide.
What changed
For games and rich apps: include finish/restart/revisit flows, console-error budgets, and post-recovery screenshots.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt

Claim: A game should be able to finish and restart without accumulating duplicate runtime errors.

Bug: Classic Slalom passed its behavior checks, but repeated scene restart generated 180 duplicate Phaser texture-key console errors.

Why normal checks missed it: A first-load smoke test would stop before restart. The bug lived in the lifecycle, not the happy-path load.

Why this sells Riddle Proof: Recovery paths and repeat use are where many browser bugs hide. The manifest shows the screenshot and the log trail in one place.

Reusable profile seed: For games and rich apps: include finish/restart/revisit flows, console-error budgets, and post-recovery screenshots.

What the browser run checked

  • Loaded and played beyond the first render.
  • Restarted the game scene instead of stopping after initial success.
  • Captured the post-restart screenshot and console output together.
  • Flagged lifecycle errors that were invisible in a first-load smoke test.
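
A console-error budget per lifecycle phase can be sketched with a pair of counters. This is an illustration under assumptions: the tuple shape, the phase names, and the message text are hypothetical, and only the count of 180 comes from the catch itself.

```python
from collections import Counter


def errors_over_budget(entries, budgets):
    """Count console errors per lifecycle phase and report phases over budget.

    `entries` is an iterable of (phase, level, message) tuples and `budgets`
    maps phase -> allowed error count; both shapes are assumptions.
    """
    counts = Counter(phase for phase, level, _ in entries if level == "error")
    return {phase: n for phase, n in counts.items() if n > budgets.get(phase, 0)}


def top_duplicates(entries, limit=3):
    """Most repeated error messages, to surface duplicate-error patterns."""
    messages = Counter(msg for _, level, msg in entries if level == "error")
    return messages.most_common(limit)


# Illustrative log: the message text is a placeholder, not the captured output.
log = [("first-load", "log", "game ready")]
log += [("restart", "error", "duplicate texture key: gate")] * 180
print(errors_over_budget(log, {"first-load": 0, "restart": 0}))
print(top_duplicates(log))
```

Tagging each entry with its phase is the design choice that matters: a first-load budget alone would have passed this run.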

Proof lesson

Terminal/recovery proof finds defects that only appear after users finish, restart, replay, or revisit a route.

Artifact | Type | What it proves
After-restart screenshot | PNG screenshot | Shows the post-restart browser state that paired with the console-error finding.
Run receipt | JSON metadata | Records the longer lifecycle run and script status.
Console capture | JSON logs | The important non-visual artifact: it preserves the duplicate texture-key errors.

Catch 87

The homepage rendered games, but hid community games

[Evidence screenshot: The homepage rendered games, but hid community games]
May 13, 2026 · < $0.01 · manifest drift · route inventory · integration
Plain-English catch card

The homepage rendered games, but hid community games

Route inventory should prove both direct route health and source-page clickthrough/discovery health.

What went wrong
Saved community games loaded directly by URL, but the homepage did not list them because the manifest schema had drifted.
What Riddle caught
The homepage screenshot shows the discovered community-games area after the manifest fix, tying the bug to route discovery rather than the direct player route.
Why it matters
A site can have healthy destination pages while the conversion/discovery path is broken.
What changed
For content or generated-route apps: check direct routes, source-page inventories, and clickthroughs from the pages where users discover those routes.
What this does not prove
It does not prove the whole product is perfect; it proves this specific browser/app claim with the listed artifacts.
Technical receipt

Claim: If generated/community routes load directly, the homepage should also expose them through discovery and clickthrough paths.

Bug: Saved community games loaded directly by URL, but the homepage did not list them because the manifest schema had drifted.

Why normal checks missed it: The player route was healthy and built-in routes were healthy. The bug was in discovery: the source page failed to expose a valid route.

Why this sells Riddle Proof: A site can have healthy destination pages while the conversion/discovery path is broken. Riddle Proof treats source-page exposure as part of the contract.

Reusable profile seed: For content or generated-route apps: check direct routes, source-page inventories, and clickthroughs from the pages where users discover those routes.

What the browser run checked

  • Compared direct route health with homepage discovery health.
  • Inspected the rendered homepage state after the manifest changed.
  • Captured the page area that should expose the generated routes.
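
Comparing direct route health with homepage discovery reduces to a set difference once both inventories are collected. The helper name and the routes below are hypothetical; the sketch assumes one set of routes that loaded successfully by URL and one set of routes the homepage actually links to.

```python
def discovery_gaps(direct_ok: set[str], homepage_links: set[str]) -> dict[str, list[str]]:
    """Compare routes that load directly with routes the homepage links to."""
    return {
        "healthy_but_undiscoverable": sorted(direct_ok - homepage_links),
        "linked_but_unhealthy": sorted(homepage_links - direct_ok),
    }


# Illustrative inventories with made-up routes.
direct_ok = {"/games/builtin-1", "/play/community-1", "/play/community-2"}
homepage_links = {"/games/builtin-1"}
print(discovery_gaps(direct_ok, homepage_links))
```

Reporting both directions covers the drift in this catch (healthy routes the homepage hides) as well as the opposite case of links that point at broken routes.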

Proof lesson

Route inventory should prove both direct route health and source-page clickthrough/discovery health.

Artifact | Type | What it proves
Homepage discovery screenshot | PNG screenshot | Shows the source page where generated routes should be discoverable.
Run receipt | JSON metadata | Records the browser run that checked the discovery path.
Console capture | JSON logs | Keeps page-load logs attached to the same discovery proof.