E2E Testing Tips: Lessons from Testing with Riddle

Practical tips learned from extensive browser automation—including using Riddle to test Riddle itself.

We recently ran a comprehensive test suite against our own playground UI—using Riddle to test Riddle. Along the way, we learned (and re-learned) several lessons about effective browser automation.

Here are the practical tips that will save you time and money.

1. Know When to Use Sync vs Async

Riddle offers two modes, and choosing correctly matters:

Sync Mode

  • 28-second hard limit—your workflow must complete within this window
  • Returns results directly in the response
  • Great for quick screenshots and simple workflows
  • Use include: ["screenshots"] to get JSON with all screenshots

// Sync mode - results come back directly
const response = await fetch("https://api.riddledc.com/v1/run", {
  method: "POST",
  headers: {
    "Authorization": "Bearer " + token,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    steps: [
      { goto: "https://example.com" },
      { screenshot: "homepage" }
    ],
    sync: true,
    timeout_sec: 28,
    include: ["screenshots"]  // Important! Gets JSON with all screenshots
  })
});

const data = await response.json();
// data.screenshots = [{ name: "homepage.png", data: "base64..." }]

Async Mode

  • No 28-second cap; jobs can run for up to 30 minutes
  • Returns a job ID immediately; you poll for results
  • Required for complex, multi-page workflows
  • Better for production pipelines

// Async mode - poll for results
const submitResponse = await fetch("https://api.riddledc.com/v1/run", {
  method: "POST",
  headers: { "Authorization": "Bearer " + token, "Content-Type": "application/json" },
  body: JSON.stringify({
    steps: [...],
    sync: false,  // Async mode
    timeout_sec: 120
  })
});

const { job_id } = await submitResponse.json();

// Poll for completion
const artifacts = await pollForArtifacts(job_id);
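
The pollForArtifacts call above is a helper you write yourself. Here is one way the polling loop could look, as a minimal sketch. The checkStatus callback, the "completed" and "failed" status values, and the option names are all assumptions made for illustration, not Riddle's documented API, so swap in whatever your status request actually returns.

```javascript
// Generic polling loop. `checkStatus` is a hypothetical async callback that
// asks the API for the job's current state and resolves to an object such as
// { status: "running" } or { status: "completed", artifacts: [...] }.
async function pollUntilDone(checkStatus, { intervalMs = 2000, timeoutMs = 120000 } = {}) {
  const deadline = Date.now() + timeoutMs;

  while (true) {
    const job = await checkStatus();

    // Assumed terminal states; adjust to the real status values.
    if (job.status === "completed") return job;
    if (job.status === "failed") throw new Error("Job failed");

    // Give up if the next poll would land past the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Job still ${job.status} after ${timeoutMs} ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Passing the status check in as a callback keeps the loop independent of any particular endpoint, which also makes it easy to exercise with a fake in unit tests.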

Rule of thumb: Use sync for anything you expect to finish in under 20 seconds, which leaves headroom below the 28-second hard limit, and async for everything else.
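
If you submit jobs from code, that rule of thumb can be turned into a small helper. This is only a sketch: chooseMode and its constants are hypothetical names, and doubling the estimate for the async timeout is an arbitrary safety margin, not a Riddle recommendation.

```javascript
// Limits quoted in this post: sync jobs are cut off at 28 seconds, async jobs
// can run for up to 30 minutes. 20 seconds is the rule-of-thumb threshold.
const SYNC_HARD_LIMIT_SEC = 28;
const SYNC_COMFORT_LIMIT_SEC = 20;
const ASYNC_MAX_SEC = 30 * 60;

// Returns the mode-related fields for a run request body.
// `estimatedSec` is your own guess at how long the workflow takes.
function chooseMode(estimatedSec) {
  if (estimatedSec < SYNC_COMFORT_LIMIT_SEC) {
    return { sync: true, timeout_sec: SYNC_HARD_LIMIT_SEC };
  }
  // Async: allow twice the estimate, capped at the 30-minute ceiling.
  const timeoutSec = Math.min(Math.ceil(estimatedSec * 2), ASYNC_MAX_SEC);
  return { sync: false, timeout_sec: timeoutSec };
}
```

You would then spread the result into the request body, for example JSON.stringify({ steps, ...chooseMode(10) }).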

2. The include Parameter Changes Everything

Without include, sync mode returns a raw PNG of just the first screenshot. With it, you get rich JSON:

{
  include: ["screenshots", "console", "har"]
}

  • screenshots: all captured screenshots as base64 (or URLs if the response is large)
  • console: browser console logs with timestamps
  • har: full network request/response data

Pro tip: If you're capturing multiple screenshots in one job, always include screenshots, or you'll only get the first one.

3. Handling Large Responses Gracefully

Multi-page workflows with HAR can generate massive responses. A workflow navigating through several pages might produce a 10MB+ HAR file.

Riddle handles this automatically: when the response would exceed size limits, screenshots and HAR are returned as CDN URLs instead of base64 data:

// Normal response (small)
{
  screenshots: [
    { name: "page1.png", data: "data:image/png;base64,..." }
  ]
}

// Large response (auto-converted to URLs)
{
  screenshots: [
    { name: "page1.png", url: "https://cdn.riddledc.com/..." }
  ],
  har: {
    url: "https://cdn.riddledc.com/...",
    _note: "HAR returned as URL due to size limits"
  },
  _note: "Screenshots returned as URLs instead of base64 due to response size limits"
}

Your code should handle both formats:

function getScreenshotUrl(screenshot) {
  if (screenshot.data) {
    // Base64 data - use directly as image src
    return screenshot.data.startsWith('data:')
      ? screenshot.data
      : `data:image/png;base64,${screenshot.data}`;
  } else if (screenshot.url) {
    // CDN URL - fetch or use directly
    return screenshot.url;
  }
}
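
If you want the image bytes rather than something to put in an img tag (to write files to disk in CI, say), the base64 case needs decoding. A minimal sketch, assuming Node.js (it relies on the built-in Buffer); screenshotBytes is a hypothetical helper name.

```javascript
// Turns the `data` field of a screenshot entry into raw PNG bytes.
// Accepts both a bare base64 string and a full data: URI.
// Returns null for URL-style entries, which need a separate download.
function screenshotBytes(screenshot) {
  if (!screenshot.data) return null;

  // Strip the "data:image/png;base64," prefix when present.
  const base64 = screenshot.data.startsWith("data:")
    ? screenshot.data.slice(screenshot.data.indexOf(",") + 1)
    : screenshot.data;

  return Buffer.from(base64, "base64");
}
```

When the function returns null, fall back to fetching screenshot.url and saving that response body instead.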

4. Always Wait After Navigation

This is the most common source of flaky tests. After any action that triggers navigation, wait for the page to stabilize:

// BAD - screenshot might capture loading state
await page.click("a.next-page");
await saveScreenshot("results");

// GOOD - wait for page to fully load
await page.click("a.next-page");
await page.waitForLoadState("networkidle");
await saveScreenshot("results");

In steps mode (JSON), use the waitForLoadState step:

{
  "steps": [
    { "click": "a.next-page" },
    { "waitForLoadState": "networkidle" },
    { "screenshot": "results" }
  ]
}

networkidle waits until there are no network requests for 500ms—usually what you want for SPAs and dynamic content.
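
It's easy to forget the wait when writing steps by hand. One option is to add the waits mechanically before submitting the job. The sketch below uses only the step names shown in this post (goto, click, waitForLoadState); addWaits itself is a hypothetical helper, and whether a wait after goto is strictly needed is an assumption on our part.

```javascript
// Returns a new steps array with a networkidle wait inserted after every
// `goto` or `click` step that is not already followed by a wait.
function addWaits(steps) {
  const out = [];
  steps.forEach((step, i) => {
    out.push(step);
    const navigates = "goto" in step || "click" in step;
    const next = steps[i + 1];
    const alreadyWaits = next !== undefined && "waitForLoadState" in next;
    if (navigates && !alreadyWaits) {
      out.push({ waitForLoadState: "networkidle" });
    }
  });
  return out;
}
```

Not every click triggers navigation, so this blanket approach trades a little run time for fewer half-loaded screenshots.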

5. Write Reliable Selectors

Generic selectors are the enemy of reliable tests:

// FRAGILE - could match anything
await page.click("button");
await page.click("a");
await page.locator("div").first().click();

// ROBUST - specific and intentional
await page.click('button:has-text("Submit")');
await page.click('a[href="/dashboard"]');
await page.locator('[data-testid="user-menu"]').click();

Selector priority (most to least reliable):

  1. data-testid attributes (if available)
  2. Unique IDs: #submit-btn
  3. Text content: button:has-text("Submit")
  4. Specific classes: .primary-action-btn
  5. Attribute selectors: input[name="email"]
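
To make that ordering concrete, here is a small sketch that picks the highest-priority selector from whatever you know about an element. bestSelector and the shape of its hints argument are hypothetical, invented for this example.

```javascript
// Builds the most reliable selector available for an element, following the
// priority list above. Every field in `hints` is optional.
function bestSelector(hints) {
  const { testId, id, tag = "", text, className, name } = hints;
  if (testId) return `[data-testid="${testId}"]`;
  if (id) return `#${id}`;
  // Text matching without a tag would also match every ancestor element,
  // so only use it when the tag is known.
  if (text && tag) return `${tag}:has-text("${text}")`;
  if (className) return `${tag}.${className}`;
  if (name) return `${tag}[name="${name}"]`;
  throw new Error("No reliable hint available; consider adding a data-testid");
}
```

For example, bestSelector({ tag: "button", text: "Submit" }) yields the same button:has-text("Submit") selector used earlier, while adding a testId to the hints would take precedence over it.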

6. Use Console Logging for Debugging

When things go wrong, console logs are your best friend:

await page.goto("https://example.com");

// Log what you find
const itemCount = await page.locator(".item").count();
console.log("Found items:", itemCount);

const buttonVisible = await page.locator('button:has-text("Submit")').isVisible();
console.log("Submit button visible:", buttonVisible);

// Log current URL after navigation
await page.click("a.next");
await page.waitForLoadState("networkidle");
console.log("Current URL:", page.url());

await saveScreenshot("debug");

Include console in your request to retrieve these logs:

{
  include: ["screenshots", "console"]
}

// Response includes:
{
  console: {
    summary: { total_entries: 3, log_count: 3 },
    entries: {
      log: [
        { timestamp: 1702..., message: "Found items: 5" },
        { timestamp: 1702..., message: "Submit button visible: true" },
        { timestamp: 1702..., message: "Current URL: https://example.com/results" }
      ]
    }
  }
}
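
Because entries are grouped by level, reading them in the order they happened takes a little work. A minimal sketch of flattening them into one timeline: flattenConsole is a hypothetical helper, and since only the log level appears in the response above, the code loops over whatever levels are present rather than assuming their names.

```javascript
// Flattens every level under `console.entries` into one list sorted by
// timestamp, tagging each entry with the level it came from.
function flattenConsole(consoleData) {
  const entries = consoleData?.entries ?? {};
  return Object.entries(entries)
    .flatMap(([level, items]) => items.map((item) => ({ level, ...item })))
    .sort((a, b) => a.timestamp - b.timestamp);
}
```

The function returns an empty array when the console data is missing, so it is safe to call even if console was left out of include.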

7. Optimize for the 30-Second Minimum

Riddle bills per second with a 30-second minimum per job. This has two implications:

Pack Multiple Screenshots Into One Job

A 5-second job costs the same as a 30-second job. So this:

// 5 separate jobs = 5 × $0.004 = $0.02
await runJob({ url: "https://example.com/page1" });
await runJob({ url: "https://example.com/page2" });
await runJob({ url: "https://example.com/page3" });
await runJob({ url: "https://example.com/page4" });
await runJob({ url: "https://example.com/page5" });

Is much more expensive than this:

// 1 job with 5 screenshots = $0.004
await runJob({
  steps: [
    { goto: "https://example.com/page1" },
    { screenshot: "page1" },
    { goto: "https://example.com/page2" },
    { screenshot: "page2" },
    { goto: "https://example.com/page3" },
    { screenshot: "page3" },
    { goto: "https://example.com/page4" },
    { screenshot: "page4" },
    { goto: "https://example.com/page5" },
    { screenshot: "page5" }
  ]
});

5x cheaper for the same result.
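
With more than a handful of pages, you probably don't want to type the steps out by hand. A short sketch that generates the same goto/screenshot pairs from a list of URLs (batchSteps is a hypothetical helper name):

```javascript
// Builds one steps array that visits each URL and captures a screenshot,
// naming the screenshots page1, page2, ... as in the example above.
function batchSteps(urls) {
  return urls.flatMap((url, i) => [
    { goto: url },
    { screenshot: `page${i + 1}` },
  ]);
}
```

The result drops straight into the job: runJob({ steps: batchSteps(urls) }). Keep an eye on total run time, since the saving only holds while the batched job stays close to the 30-second minimum.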

Don't Fear Longer Jobs

After the 30-second minimum, you're only paying ~$0.008/minute. At that rate a complex 2-minute workflow costs about $0.016, so don't artificially split it into multiple jobs just to "keep things short."
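
To sanity-check a batching decision, the arithmetic fits in a few lines. This is a rough sketch built only from the figures quoted in this post ($0.004 for the 30-second minimum, which works out to about $0.008 per minute). Rounding partial seconds up is our assumption, and actual pricing may differ.

```javascript
// Approximate rates derived from the numbers in this post.
const MIN_BILLED_SEC = 30;
const USD_PER_SEC = 0.004 / 30;

// Estimated cost of one job in US dollars.
function estimateCostUsd(durationSec) {
  // Assumption: partial seconds are rounded up before billing.
  const billedSec = Math.max(Math.ceil(durationSec), MIN_BILLED_SEC);
  return billedSec * USD_PER_SEC;
}
```

Under these assumptions, five separate 5-second jobs come to about $0.02 in total, while one 25-second job covering the same five pages stays at about $0.004.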

8. The E2E Testing Pattern

Here's the pattern we used to test our own UI:

async function testFeature(token, testName, script) {
  // Submit job
  const job = await submitJob(token, script);
  console.log(`[${testName}] Job: ${job.job_id}`);

  // Poll for completion
  const result = await pollForCompletion(token, job.job_id);
  console.log(`[${testName}] Status: ${result.status}`);

  // Check console output for test result
  const consoleData = await fetchConsole(result.artifacts);
  const logs = consoleData?.entries?.log || [];

  const resultLog = logs.find(l => l.message.includes("RESULT:"));
  const testResult = resultLog?.message.replace("RESULT:", "").trim();

  return {
    test: testName,
    status: testResult === "SUCCESS" ? "PASS" : "FAIL",
    screenshots: result.artifacts.filter(a => a.name.endsWith(".png")).length
  };
}

// The test script reports its own result
const script = `
await page.goto("https://myapp.com/login");
await page.fill("#email", "test@example.com");
await page.fill("#password", "password");
await page.click('button[type="submit"]');
await page.waitForLoadState("networkidle");

const loggedIn = await page.locator(".dashboard").isVisible();
if (loggedIn) {
  console.log("RESULT: SUCCESS");
} else {
  console.log("RESULT: FAIL");
}
await saveScreenshot("final-state");
`;

This pattern lets you run comprehensive test suites and programmatically check results.
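
Once you have several of these result objects, you'll usually want one summary line and an exit code for CI. A minimal sketch, assuming each result has the test and status fields returned by testFeature above; summarize is a hypothetical helper.

```javascript
// Takes the objects returned by testFeature and produces a one-line summary
// plus an exit code (0 when every test passed, 1 otherwise).
function summarize(results) {
  const failed = results.filter((r) => r.status !== "PASS");
  const passed = results.length - failed.length;
  const failedNames = failed.map((r) => r.test).join(", ");
  return {
    line: `${passed}/${results.length} passed` +
      (failed.length > 0 ? `, failed: ${failedNames}` : ""),
    exitCode: failed.length === 0 ? 0 : 1,
  };
}
```

A runner can then print summary.line and call process.exit(summary.exitCode) so the pipeline fails when any test does.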

9. Give Screenshots Unique, Descriptive Names

Each saveScreenshot() call needs a unique name:

// BAD - overwrites previous screenshot
await saveScreenshot("screenshot");
await page.click(".next");
await saveScreenshot("screenshot");  // Overwrites!

// GOOD - clear, sequential names
await saveScreenshot("1-homepage");
await page.click(".next");
await saveScreenshot("2-after-click");
await page.fill("#search", "query");
await saveScreenshot("3-search-filled");

Descriptive names make debugging much easier when reviewing results.
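
If numbering by hand gets tedious, a tiny counter can do it for you. This is just a sketch; createNamer is a hypothetical helper, not something Riddle provides.

```javascript
// Returns a function that turns a label into a unique, ordered screenshot
// name: a running counter followed by a lowercase, hyphenated slug.
function createNamer() {
  let counter = 0;
  return (label) => {
    counter += 1;
    const slug = label
      .toLowerCase()
      .replace(/[^a-z0-9]+/g, "-") // collapse anything non-alphanumeric
      .replace(/^-+|-+$/g, "");    // trim leading/trailing hyphens
    return `${counter}-${slug}`;
  };
}
```

Create one namer per script (const name = createNamer();) and call saveScreenshot(name("Homepage")). Because the counter always advances, reusing a label can no longer overwrite an earlier screenshot.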

Quick Reference

Situation               | Solution
Quick screenshot (<20s) | Sync mode
Complex workflow        | Async mode
Multiple screenshots    | include: ["screenshots"]
Debugging failures      | include: ["console"] + console.log()
Flaky after clicks      | waitForLoadState("networkidle")
Cost optimization       | Batch screenshots into one job

Try It Yourself

The best way to learn is by doing. Start with a simple workflow and build up from there.