Autonomous Regression

ID: UC-5 Phase: Phase 2 (requires agent runner infrastructure) Compute: heavy — browser sandbox + LLM inference per run

Problem

After every deploy, someone needs to manually verify that key flows still work. This is repetitive, time-consuming, and usually the first thing skipped under pressure — leading to regressions reaching users.

Solution

Hackorda's agent runner autonomously exercises a target URL following the cycle's test plan. It navigates the app, observes state, and files structured issues for anything that looks broken or anomalous — with full context (screenshot, console errors, network failures, reproduction steps).

How it works

Admin in Hackorda:
  1. Creates a "regression" cycle for the product
  2. Adds a test plan doc with the flows to cover
  3. Sets a deploy trigger: "run after every staging deploy"

Agent runner (per triggered run):
  1. Reads the cycle's test plan via get_cycle_doc()
  2. Spawns a sandboxed Playwright browser
  3. Drives an agent loop:
     a. LLM reads next test step from the plan
     b. LLM decides what action to take (click, type, navigate)
     c. Playwright executes the action
     d. Screenshots the result
     e. LLM evaluates: "did this step pass or reveal a bug?"
     f. On anomaly: captures full context → file_issue()
     g. Repeat until all steps done or max-steps reached
  4. Completes the run with a summary
  5. Admin sees new issues in the triage queue

What an agent-filed issue looks like

Title: "Checkout flow broken — payment step shows blank page"
Type: bug
Severity: critical (agent-assessed)
Description: |
  During regression run on staging.app.com after deploy #482.
  
  Step 4 of checkout flow: after entering card details and clicking
  "Pay Now", the page went blank instead of showing the confirmation.
  
  Console error: "TypeError: Cannot read property 'orderId' of undefined"
  Network: POST /api/checkout returned 500

Steps to reproduce:
  1. Add item to cart at /products/123
  2. Click "Checkout"
  3. Enter test card: 4242 4242 4242 4242
  4. Click "Pay Now"
  → blank page + console error

URL: https://staging.app.com/checkout
Reporter: Hackorda Agent Runner (run-abc123)
Screenshot: [attached]
Console log: [attached]

Compute requirements

Resource	Per run	Notes
Browser sandbox	2 vCPU / 2 GB RAM	Playwright Chromium
LLM calls	~20–50 per run	Anthropic Sonnet
Duration	5–15 min	Depends on test plan length
Artifacts	50–200 MB	Screenshots + traces
Cost	~$0.10–0.30	See agent-platform.md §6.2

What this isn't

Not a replacement for unit/integration tests (those are faster, cheaper, run in CI)
Not guaranteed to find every bug (it follows the test plan; exploratory coverage is limited)
Not a human tester (it can't judge "this looks ugly but works")

Setup (Phase 2)

Create a cycle of type regression
Add a test plan doc with step-by-step flows
Set a webhook trigger from your deploy system → POST /api/runner/trigger
Agent runner picks up the job from the queue