Riddle Proof

Turn a URL into evidence an agent can cite.

Riddle Proof runs real browser checks against previews, production URLs, and black-box sites, then returns the proof receipt: what was checked, what passed, what failed, and the artifacts a human can review.

Read the Good Catch Diary | Read the docs | Try the Playground

<$0.10

A ten-minute hosted browser proof is under nine cents of browser time at $0.50/hour.

Source optional

Detection can be black-box. Source helps fix bugs, but many issues are visible from the browser alone.

Profiles

Repeatable contracts for route inventory, setup actions, storage, mocks, selectors, and responsive evidence.

Receipts

Screenshots, console logs, route checks, network/resource evidence, and explicit caveats.
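
The under-ten-cents figure above is plain arithmetic on the quoted $0.50/hour rate. A minimal sketch, assuming browser time is billed pro rata with no minimum charge or rounding (the page does not state the billing rules); `browser_cost_usd` is a hypothetical helper, not part of any Riddle API.

```python
from fractions import Fraction

# $0.50 per browser-hour, the rate quoted on this page.
HOURLY_RATE_USD = Fraction(1, 2)


def browser_cost_usd(minutes: int, hourly_rate: Fraction = HOURLY_RATE_USD) -> Fraction:
    """Cost of `minutes` of browser time, assuming pro rata hourly billing."""
    return hourly_rate * Fraction(minutes, 60)


cost = browser_cost_usd(10)
print(f"${float(cost):.4f}")  # prints $0.0833
```

Ten minutes is one sixth of an hour, so the cost is one twelfth of a dollar, a little over eight cents.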

Not another vague QA score.

browser proof receipts

Black-box browser audits

Point Riddle at a URL and catch browser-visible issues without source access: fatal console errors, broken resources, bad routes, mobile overflow, clipped elements, and stale clickthroughs.

Proof receipts for PRs

Attach what was checked, which preview ran, which routes passed, which screenshots were captured, and what remains unmeasured. Less “trust me,” more evidence.
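
To make the PR receipt idea concrete, here is a rough sketch of how such a receipt could be modeled and rendered as comment text. The class names, field names, and output layout are all assumptions made for illustration; the actual receipt format is not shown on this page.

```python
from dataclasses import dataclass, field


@dataclass
class RouteResult:
    route: str
    passed: bool
    note: str = ""


@dataclass
class ProofReceipt:
    # Hypothetical shape: which preview ran, what passed, what was captured,
    # and what was explicitly left unmeasured.
    preview_url: str
    routes: list[RouteResult] = field(default_factory=list)
    screenshots: list[str] = field(default_factory=list)
    unmeasured: list[str] = field(default_factory=list)

    def to_comment(self) -> str:
        """Render the receipt as plain text suitable for a PR comment."""
        lines = [f"Proof receipt for {self.preview_url}", ""]
        for r in self.routes:
            status = "PASS" if r.passed else "FAIL"
            suffix = f" ({r.note})" if r.note else ""
            lines.append(f"- {status} {r.route}{suffix}")
        lines.append("")
        lines.append(f"Screenshots: {', '.join(self.screenshots) or 'none'}")
        lines.append(f"Not measured: {', '.join(self.unmeasured) or 'nothing declared'}")
        return "\n".join(lines)


receipt = ProofReceipt(
    preview_url="https://preview.example.com",
    routes=[
        RouteResult("/docs/api", True),
        RouteResult("/docs/preview", False, "404 on direct load"),
    ],
    screenshots=["docs-api-phone.png"],
    unmeasured=["authenticated flows"],
)
print(receipt.to_comment())
```

Keeping "not measured" as a first-class field is the point of the sketch: the receipt states its own limits instead of implying full coverage.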

Agent-native profiles

Profiles express repeatable contracts: setup actions, storage seeding, network mocks, route inventories, selector text order, responsive bounds, and expected UI states.

The wedge: agents already write code and summaries. Riddle Proof gives them browser evidence: a reviewable receipt that says exactly what was proved, what failed, and what was not measured.

The loop is simple.

claim → browser → receipt

Step 1

State the claim

Name the behavior: this route loads, this form reaches success, these docs cards click through, this preview link keeps its base path.

Step 2

Run the browser

Riddle executes the flow in a hosted Playwright browser across the viewports and states that matter.

Step 3

Save the evidence

The result is a compact receipt: screenshots, route snapshots, console/page errors, resource failures, and structured JSON checks.
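
The three steps above can be sketched locally with the open-source Playwright Python API. This is not Riddle's hosted runner, whose interface is not shown here; the function names, the receipt keys, the phone-sized viewport, and the pass rule (HTTP status below 400 and no recorded errors) are all assumptions.

```python
from typing import Callable, Optional


def collect_evidence(url: str, screenshot_path: str) -> dict:
    """Step 2: load `url` in a local Chromium and record browser-visible evidence."""
    # Imported lazily so the rest of the sketch works without Playwright installed.
    from playwright.sync_api import sync_playwright

    console_errors: list[str] = []
    page_errors: list[str] = []
    failed_resources: list[str] = []

    def on_console(msg) -> None:
        if msg.type == "error":
            console_errors.append(msg.text)

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page(viewport={"width": 390, "height": 844})
        page.on("console", on_console)
        page.on("pageerror", lambda err: page_errors.append(str(err)))
        page.on("requestfailed", lambda req: failed_resources.append(req.url))
        response = page.goto(url, wait_until="load")
        page.screenshot(path=screenshot_path, full_page=True)
        status: Optional[int] = response.status if response else None
        browser.close()

    return {
        "status": status,
        "console_errors": console_errors,
        "page_errors": page_errors,
        "failed_resources": failed_resources,
        "screenshot": screenshot_path,
    }


def build_receipt(claim: str, url: str, evidence: dict) -> dict:
    """Step 3: fold the evidence into a structured, JSON-serializable receipt."""
    passed = (
        evidence["status"] is not None
        and evidence["status"] < 400
        and not evidence["console_errors"]
        and not evidence["page_errors"]
        and not evidence["failed_resources"]
    )
    return {"claim": claim, "url": url, "passed": passed, "evidence": evidence}


def prove(claim: str, url: str,
          run_browser: Callable[[str, str], dict] = collect_evidence) -> dict:
    """Step 1 is the `claim` string; the browser step is swappable for testing."""
    return build_receipt(claim, url, run_browser(url, "proof.png"))
```

Calling `prove("the docs route loads", "https://preview.example.com/docs")` needs Playwright and a Chromium install. Passing a stub as `run_browser` exercises the receipt logic without a browser.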

What a profile can express

Expected routes and real clickthroughs
Phone, tablet, and desktop viewports
Auth/localStorage/sessionStorage setup
Network mocks for frontend-only flows
Text, selector counts, and ordered visible rows
Fatal console/page errors and failed resources

{
  "target": "https://preview.example.com",
  "viewports": ["phone", "ipad", "desktop"],
  "checks": [
    {
      "type": "route_inventory",
      "source_selector": ".docs-cards",
      "expected_routes": ["/docs/api", "/docs/preview"],
      "run_direct_routes": true,
      "run_clickthroughs": true
    },
    {
      "type": "selector_text_order",
      "selector": ".pricing-row",
      "expected_texts": ["Free", "Pro", "Team"]
    }
  ]
}
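
As a rough illustration of how the two checks in the profile above might be evaluated, the sketch below applies them to values a browser run would have observed. The semantics are assumptions, not documented behavior: `selector_text_order` is treated as an exact, ordered match of visible texts, and `route_inventory` passes when no expected route is missing, with extra routes reported but not failed. The function names and the `observed` data are hypothetical.

```python
import json

# The checks from the profile above, abbreviated to the fields this sketch reads.
PROFILE = json.loads("""
{
  "target": "https://preview.example.com",
  "checks": [
    {"type": "route_inventory",
     "source_selector": ".docs-cards",
     "expected_routes": ["/docs/api", "/docs/preview"]},
    {"type": "selector_text_order",
     "selector": ".pricing-row",
     "expected_texts": ["Free", "Pro", "Team"]}
  ]
}
""")


def check_route_inventory(check: dict, linked_routes: list[str]) -> dict:
    """Compare routes linked from the source selector with the expected set."""
    expected, found = set(check["expected_routes"]), set(linked_routes)
    missing = sorted(expected - found)
    return {"type": check["type"], "passed": not missing,
            "missing": missing, "unexpected": sorted(found - expected)}


def check_selector_text_order(check: dict, visible_texts: list[str]) -> dict:
    """Require the visible texts to match the expected texts in the same order."""
    expected = check["expected_texts"]
    return {"type": check["type"], "passed": visible_texts == expected,
            "expected": expected, "actual": visible_texts}


HANDLERS = {
    "route_inventory": check_route_inventory,
    "selector_text_order": check_selector_text_order,
}

# Hypothetical observations; a real run would read these from the page.
observed = {
    "route_inventory": ["/docs/api", "/docs/preview"],
    "selector_text_order": ["Free", "Team", "Pro"],  # rows rendered out of order
}

results = [HANDLERS[c["type"]](c, observed[c["type"]]) for c in PROFILE["checks"]]
print(json.dumps(results, indent=2))
```

Here the route inventory passes and the pricing rows fail, because "Team" appears before "Pro" on the page.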

Recent proof catches show the shape.

from real browser runs

Good catch

A route can work on production and still fail in a mounted preview when a link escapes the preview basename.

Good catch

Document scroll width can be clean while a fixed-width iframe is visibly clipped on phone.

Good catch

Build checks can pass while the browser throws runtime or hydration errors.

Good catch

A semantic game state can be correct while the visible terminal panel hides the important result text.

Good catch

Full-screen route roots can be exactly one fixed-nav height too tall across older surfaces.

Good catch

Restart-heavy proof can catch duplicate generated assets that first-load smoke tests miss.
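
The clipped-iframe catch above shows why a single document-level overflow check is not enough. A minimal sketch of the idea, assuming element boxes have already been read from the page (for example via `getBoundingClientRect()`); the `Box` type, function names, selector, and half-pixel tolerance are illustrative choices, not Riddle's implementation.

```python
from dataclasses import dataclass


@dataclass
class Box:
    selector: str
    left: float   # x offset from the viewport's left edge, in CSS pixels
    width: float

    @property
    def right(self) -> float:
        return self.left + self.width


def find_clipped(boxes: list[Box], viewport_width: float, tolerance: float = 0.5) -> list[Box]:
    """Return boxes that extend past either horizontal edge of the viewport."""
    return [b for b in boxes
            if b.left < -tolerance or b.right > viewport_width + tolerance]


def audit_overflow(document_scroll_width: float, viewport_width: float,
                   boxes: list[Box]) -> dict:
    """Report document-level overflow and per-element clipping separately."""
    return {
        "document_overflow": document_scroll_width > viewport_width,
        "clipped_elements": [b.selector for b in find_clipped(boxes, viewport_width)],
    }


# A 600px-wide iframe inside a container that hides overflow: the document
# does not scroll sideways, yet the iframe runs past a 390px phone viewport.
report = audit_overflow(
    document_scroll_width=390,
    viewport_width=390,
    boxes=[Box("iframe.demo", left=16, width=600), Box("header", left=0, width=390)],
)
print(report)
```

The report shows `document_overflow` as false while `iframe.demo` is still listed as clipped, which is the case a scroll-width check alone would miss.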

Open the diary

Evidence over summaries