Black-box browser audits
Point Riddle at a URL and catch browser-visible issues without source access: fatal console errors, broken resources, bad routes, mobile overflow, clipped elements, and stale clickthroughs.
Riddle Proof runs real browser checks against previews, production URLs, and black-box sites, then returns the proof receipt: what was checked, what passed, what failed, and the artifacts a human can review.
Attach what was checked, which preview ran, which routes passed, which screenshots were captured, and what remains unmeasured. Less “trust me,” more evidence.
Profiles express repeatable contracts: setup actions, storage seeding, network mocks, route inventories, selector text order, responsive bounds, and expected UI states.
The wedge: agents already write code and summaries. Riddle Proof gives them browser evidence, a reviewable receipt that says exactly what was proved, what failed, and what was not measured.
Name the behavior: this route loads, this form reaches success, these docs cards click through, this preview link keeps its base path.
Riddle executes the flow in a hosted Playwright browser across the viewports and states that matter.
The result is a compact receipt: screenshots, route snapshots, console/page errors, resource failures, and structured JSON checks.
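To make the run-then-receipt idea concrete, here is a minimal sketch of the same pattern using a local Playwright browser from Python. It is not Riddle's API or receipt format: the `ViewportResult` class, the `audit` and `build_receipt` functions, the receipt field names, and the pixel sizes chosen for each viewport are all assumptions made for illustration. Only the Playwright calls themselves (`page.on`, `page.goto`, `page.screenshot`) are real library API.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field

# Hypothetical viewport sizes; the names match the ones used on this page,
# but the pixel values are assumptions.
VIEWPORTS = {
    "phone": {"width": 390, "height": 844},
    "ipad": {"width": 820, "height": 1180},
    "desktop": {"width": 1440, "height": 900},
}


@dataclass
class ViewportResult:
    """Evidence collected for one viewport (illustrative shape, not Riddle's)."""
    viewport: str
    console_errors: list[str] = field(default_factory=list)
    page_errors: list[str] = field(default_factory=list)
    failed_requests: list[str] = field(default_factory=list)
    screenshot: str | None = None

    @property
    def passed(self) -> bool:
        # A viewport passes only if nothing browser-visible went wrong.
        return not (self.console_errors or self.page_errors or self.failed_requests)


def _watch(page, result: ViewportResult) -> None:
    """Attach listeners that record console errors, uncaught exceptions, and failed requests."""
    def on_console(msg):
        if msg.type == "error":
            result.console_errors.append(msg.text)

    page.on("console", on_console)
    page.on("pageerror", lambda err: result.page_errors.append(str(err)))
    page.on("requestfailed", lambda req: result.failed_requests.append(req.url))


def audit(url: str, out_dir: str = ".") -> list[ViewportResult]:
    """Load the URL once per viewport and collect evidence. Requires Playwright installed."""
    from playwright.sync_api import sync_playwright  # lazy import: only needed for a real run

    results = []
    with sync_playwright() as p:
        browser = p.chromium.launch()
        for name, size in VIEWPORTS.items():
            result = ViewportResult(viewport=name)
            page = browser.new_page(viewport=size)
            _watch(page, result)  # listeners must be attached before navigation
            page.goto(url, wait_until="load")
            result.screenshot = f"{out_dir}/{name}.png"
            page.screenshot(path=result.screenshot, full_page=True)
            page.close()
            results.append(result)
        browser.close()
    return results


def build_receipt(target: str, results: list[ViewportResult]) -> str:
    """Serialize the collected evidence as a JSON receipt."""
    return json.dumps(
        {
            "target": target,
            "passed": all(r.passed for r in results),
            "results": [{**asdict(r), "passed": r.passed} for r in results],
        },
        indent=2,
    )
```

Calling `build_receipt(url, audit(url))` would produce one JSON document with a screenshot path and error lists per viewport. Keeping collection (`audit`) separate from serialization (`build_receipt`) means the receipt logic can be exercised without launching a browser.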
{
  "target": "https://preview.example.com",
  "viewports": ["phone", "ipad", "desktop"],
  "checks": [
    {
      "type": "route_inventory",
      "source_selector": ".docs-cards",
      "expected_routes": ["/docs/api", "/docs/preview"],
      "run_direct_routes": true,
      "run_clickthroughs": true
    },
    {
      "type": "selector_text_order",
      "selector": ".pricing-row",
      "expected_texts": ["Free", "Pro", "Team"]
    }
  ]
}
A route can work on production and still fail in a mounted preview when a link escapes the preview basename.
Document scroll width can be clean while a fixed-width iframe is visibly clipped on phone.
Build checks can pass while the browser throws runtime or hydration errors.
A semantic game state can be correct while the visible terminal panel hides the important result text.
On older surfaces, a full-screen route root can be too tall by exactly the height of the fixed nav.
Restart-heavy proof can catch duplicate generated assets that first-load smoke tests miss.
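The first case in this list, a link escaping the preview basename, is easy to state as a pure check. The sketch below is one way to express it, not Riddle's implementation: the function name `escapes_basename`, the `/pr-42` basename, and the choice to ignore cross-origin links are all assumptions for illustration.

```python
from urllib.parse import urljoin, urlsplit


def escapes_basename(page_url: str, href: str, basename: str) -> bool:
    """Return True if href, resolved against page_url, stays on the same
    origin but lands outside the preview basename (hypothetical helper)."""
    origin = urlsplit(page_url)
    resolved = urlsplit(urljoin(page_url, href))

    # Assumption: links to other origins are external and not an escape.
    if (resolved.scheme, resolved.netloc) != (origin.scheme, origin.netloc):
        return False

    base = "/" + basename.strip("/")
    if base == "/":
        return False  # a root-mounted app has no basename to escape

    path = resolved.path or "/"
    # Inside the basename means the path equals it or continues below it.
    return not (path == base or path.startswith(base + "/"))


# A preview mounted under a hypothetical /pr-42 basename.
page = "https://preview.example.com/pr-42/docs"
print(escapes_basename(page, "/docs/api", "/pr-42"))        # root-absolute link
print(escapes_basename(page, "/pr-42/docs/api", "/pr-42"))  # basename-aware link
```

The root-absolute `/docs/api` link resolves fine on production, where the app is mounted at `/`, but in the preview it points outside `/pr-42`, which is the kind of failure a clickthrough run surfaces and a direct route check alone would not.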