Use cases
Autonomous Regression
ID: UC-5 Phase: Phase 2 (requires agent runner infrastructure) Compute: heavy — browser sandbox + LLM inference per run
Problem
After every deploy, someone needs to manually verify that key flows still work. This is repetitive, time-consuming, and usually the first thing skipped under pressure — leading to regressions reaching users.
Solution
Hackorda's agent runner autonomously exercises a target URL following the cycle's test plan. It navigates the app, observes state, and files structured issues for anything that looks broken or anomalous — with full context (screenshot, console errors, network failures, reproduction steps).
How it works
Admin in Hackorda:
1. Creates a "regression" cycle for the product
2. Adds a test plan doc with the flows to cover
3. Sets a deploy trigger: "run after every staging deploy"
Agent runner (per triggered run):
1. Reads the cycle's test plan via get_cycle_doc()
2. Spawns a sandboxed Playwright browser
3. Drives an agent loop:
a. LLM reads next test step from the plan
b. LLM decides what action to take (click, type, navigate)
c. Playwright executes the action
d. Screenshots the result
e. LLM evaluates: "did this step pass or reveal a bug?"
f. On anomaly: captures full context → file_issue()
g. Repeat until all steps done or max-steps reached
4. Completes the run with a summary
5. Admin sees new issues in the triage queueWhat an agent-filed issue looks like
Title: "Checkout flow broken — payment step shows blank page"
Type: bug
Severity: critical (agent-assessed)
Description: |
During regression run on staging.app.com after deploy #482.
Step 4 of checkout flow: after entering card details and clicking
"Pay Now", the page went blank instead of showing the confirmation.
Console error: "TypeError: Cannot read property 'orderId' of undefined"
Network: POST /api/checkout returned 500
Steps to reproduce:
1. Add item to cart at /products/123
2. Click "Checkout"
3. Enter test card: 4242 4242 4242 4242
4. Click "Pay Now"
→ blank page + console error
URL: https://staging.app.com/checkout
Reporter: Hackorda Agent Runner (run-abc123)
Screenshot: [attached]
Console log: [attached]Compute requirements
| Resource | Per run | Notes |
|---|---|---|
| Browser sandbox | 2 vCPU / 2 GB RAM | Playwright Chromium |
| LLM calls | ~20–50 per run | Anthropic Sonnet |
| Duration | 5–15 min | Depends on test plan length |
| Artifacts | 50–200 MB | Screenshots + traces |
| Cost | ~$0.10–0.30 | See agent-platform.md §6.2 |
What this isn't
- Not a replacement for unit/integration tests (those are faster, cheaper, run in CI)
- Not guaranteed to find every bug (it follows the test plan; exploratory coverage is limited)
- Not a human tester (it can't judge "this looks ugly but works")
Setup (Phase 2)
- Create a cycle of type
regression - Add a test plan doc with step-by-step flows
- Set a webhook trigger from your deploy system →
POST /api/runner/trigger - Agent runner picks up the job from the queue