Agent Platform

Strategic document. Defines how Hackorda becomes accessible to and runnable by AI agents — the MCP/connector surface, the compute map, the permissions model extended to agents, and the two-phase roadmap. Direction set with the owner on 2026-05-30.

Companion docs: Core Feature Roadmap · Infra Roadmap · System Overview · Flows · Feature Matrix

1. The reference model: Firecrawl + AgentMail

Both Firecrawl and AgentMail made the same strategic bet: expose their domain capability as infrastructure that AI agents can call, not just as a UI that humans click. They did this by:

Wrapping existing API endpoints as typed MCP tools with clear names, inputs, and outputs.
Adding agent-native auth: scoped API keys with permission bounds, so an agent can be given exactly cycles:read + issues:write and nothing more.
Designing outputs for LLM consumption — structured JSON, not HTML; summaries + IDs, not full blobs; pagination that agents can walk.
Publishing an MCP server that any MCP-compatible host (Claude Desktop, Cursor, custom agent) can install in seconds.

The result: their product became a verb in AI workflows. A user doesn't go to Firecrawl's UI to scrape; they tell their agent "scrape this site" and the agent calls Firecrawl under the hood.

Hackorda can do the same — and the domain is even better suited: QA workflows are repetitive, structured, and benefit from automation at every step.

2. The two-phase strategy

Phase 1 — Agents USE Hackorda (the MCP surface)

External AI agents (Claude, GPT, custom agents in customer workflows, or Hackorda's own workflow automation) call Hackorda as a service:

Customer agent or AI workflow
        │
        │  MCP tools / REST API + API key
        ▼
  Hackorda MCP Server
        │
        ▼
  Existing Next.js API (/api/test-cycles/*, /api/admin/*)
        │
        ▼
  Postgres (existing data model)

Compute: light. A stateless MCP server wrapping the existing API — same compute footprint as an extra API route.

Business model: usage-based API pricing (per tool call) or inclusion in per-seat SaaS. More agents calling Hackorda means more activity, and more activity means deeper lock-in.

Phase 2 — Hackorda RUNS agents (agentic QA)

Hackorda itself operates autonomous AI testers that drive real browsers, find bugs, and file them as a service. Humans assign test targets; agents do the testing.

Admin assigns target URL + test scenario
        │
        ▼
  Hackorda Agent Runner
  ├── Spawns sandboxed browser (Playwright)
  ├── Drives agent loop (LLM → actions → observations)
  ├── Detects anomalies / unexpected states
  └── Files structured issues via the same API
        │
        ▼
  Normal triage/payout flow

Compute: heavy. Sandboxed browser processes per concurrent run, LLM inference per test session, artifact storage (screenshots, traces). Billed separately per run.

Business model: per-test-run or per-bug-found. Replaces (or augments) human testers for regression / smoke test coverage.

3. Full permissions model — agents as actors

The existing model has two axes: system role × per-cycle role (see system-overview.md §4). Agents introduce a new, non-human kind of actor that must map cleanly onto both axes.

3.1 The five actor types

Actor	Identity	System role	Per-cycle role	Notes
Human admin	Clerk user	`ADMIN` (1) or `SUPER_ADMIN` (5)	—	Global access
Human tester	Clerk user	`QA` (4)	`tester` or `lead`	Cycle-scoped
Human observer	Clerk user	`QA` (4)	`observer`	Read-only in cycle
Agent (external)	API key	Bound to key's permission set	Optional cycle scope	Programmatic access
Agent (runner)	Internal system token	`SUPER_ADMIN` (5) internally	Writes to any cycle	Only the Runner service, not exposed externally

3.2 Agent API key model

External agents authenticate via a scoped API key — a new auth layer sitting alongside Clerk. Each key has:

ApiKey {
  id            uuid
  orgId         uuid           -- which org this key belongs to
  name          text           -- e.g. "Claude workflow - staging"
  keyHash       text           -- SHA-256 of the actual key
  scopes        text[]         -- e.g. ['cycles:read', 'issues:write', 'triage:read']
  cycleIds      uuid[]         -- optional: restrict to specific cycles; empty = all in org
  rateLimit     int            -- calls per minute
  expiresAt     timestamptz    -- optional expiry
  lastUsedAt    timestamptz
  createdBy     uuid           -- user who created it
}

The key is sent in the header Authorization: Bearer hk_live_... and its hk_ prefix distinguishes it from Clerk session tokens (which start with ey), so the middleware can route each request to the right auth path.
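Key issuance can be sketched with Node's built-in crypto module. This is a minimal sketch, assuming a 32-byte random body encoded as base64url after the hk_live_ prefix; the body length, the encoding, and the helper names (generateApiKey, hashApiKey) are hypothetical. Only the SHA-256 hash is stored in keyHash, so the plaintext is shown to the user once, at creation.

```typescript
import { createHash, randomBytes } from "node:crypto";

// Prefix from this doc; body length and encoding are assumptions.
const KEY_PREFIX = "hk_live_";

/** SHA-256 hex digest of the plaintext key, as stored in the keyHash column. */
function hashApiKey(plaintext: string): string {
  return createHash("sha256").update(plaintext, "utf8").digest("hex");
}

/** Hypothetical issuance helper: return the plaintext once, persist only the hash. */
function generateApiKey(): { plaintext: string; keyHash: string } {
  const plaintext = KEY_PREFIX + randomBytes(32).toString("base64url");
  return { plaintext, keyHash: hashApiKey(plaintext) };
}
```

On each request the middleware would hash the presented key with the same function and look up the row by keyHash (so that column should be indexed). A database leak then exposes hashes rather than usable keys.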

3.3 Permission scopes catalog

Scopes follow <resource>:<action> naming, matching the MCP tool catalog:

Scope	What it allows
`cycles:read`	List + get cycles, docs, members
`cycles:write`	Create/update cycles (admin scope)
`issues:read`	Read issues, comments, attachments
`issues:write`	File issues, add comments
`issues:triage`	Approve/reject/reclassify — admin scope
`runs:write`	Start/end test runs
`payouts:read`	Read payout status + balance
`payouts:write`	Trigger batch + mark paid — super-admin scope
`analytics:read`	Usage events, cycle reports
`ai:write`	Trigger AI re-analysis on an issue
`admin:read`	Read org/product/user data — admin scope

Scopes are additive, and each key should get only the privileges its job needs. A key for a triage automation gets issues:read + issues:triage (plus ai:write if it re-runs AI analysis, as in UC-1); a key for a filing agent gets cycles:read + issues:write + runs:write.

A key with cycleIds: [uuid1, uuid2] can only interact with those cycles — the API enforces the same checkTestCycleAccess() gate that human testers hit, but resolves via the key's cycleIds instead of the testers table. An empty cycleIds means all cycles in the key's org.
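The scope and cycle checks above reduce to two small predicates. A minimal sketch, assuming a hypothetical ApiKeyContext carrying the key's orgId, scopes, and cycleIds; hasScopes and keyCanAccessCycle are illustrative names, not existing Hackorda functions. The org comparison runs first so that an empty cycleIds never widens access beyond the key's own org.

```typescript
// Scope names from the catalog in §3.3; the context shape is hypothetical.
type Scope =
  | "cycles:read" | "cycles:write"
  | "issues:read" | "issues:write" | "issues:triage"
  | "runs:write"
  | "payouts:read" | "payouts:write"
  | "analytics:read" | "ai:write" | "admin:read";

interface ApiKeyContext {
  orgId: string;
  scopes: Scope[];
  cycleIds: string[]; // empty = every cycle in the key's org
}

interface CycleRef {
  id: string;
  orgId: string;
}

/** True only if the key holds every scope the route requires. */
function hasScopes(key: ApiKeyContext, required: Scope[]): boolean {
  return required.every((s) => key.scopes.includes(s));
}

/** Key-based counterpart to checkTestCycleAccess(): org match, then cycle allow-list. */
function keyCanAccessCycle(key: ApiKeyContext, cycle: CycleRef): boolean {
  if (cycle.orgId !== key.orgId) return false; // never cross org boundaries
  return key.cycleIds.length === 0 || key.cycleIds.includes(cycle.id);
}
```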

3.4 How the middleware changes

Incoming request
   │
   ├── Authorization: Bearer ey...  (Clerk token)  → existing Clerk flow
   │
   └── Authorization: Bearer hk_...  (API key)     → new path:
            │
            ├── Resolve key from DB (cache in Redis/memory)
            ├── Validate scopes vs route requirements
            ├── Validate cycleId restriction if present
            ├── Rate-limit check
            └── Inject ApiKeyContext (replaces AuthContext)

No change to existing human flows — the new path is additive.
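The prefix-based routing and per-key rate limit can be sketched as follows. This assumes Clerk tokens arrive as JWTs starting with ey and API keys with hk_, as in the diagram; routeAuth and FixedWindowLimiter are hypothetical, and the in-memory Map stands in for the existing rate_limit_buckets table that the plan actually reuses. A one-minute fixed window matches the rateLimit column's calls-per-minute unit.

```typescript
// Hypothetical routing sketch; prefixes come from the diagram above.
type AuthPath = "clerk" | "api_key" | "reject";

function routeAuth(authorizationHeader: string | undefined): AuthPath {
  const match = /^Bearer\s+(\S+)$/.exec(authorizationHeader ?? "");
  if (!match) return "reject";
  const token = match[1];
  if (token.startsWith("hk_")) return "api_key"; // new API-key path
  if (token.startsWith("ey")) return "clerk"; // existing Clerk flow, unchanged
  return "reject";
}

// In-memory fixed-window limiter standing in for the rate_limit_buckets table.
class FixedWindowLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly windowMs: number;

  constructor(windowMs: number = 60_000) {
    this.windowMs = windowMs;
  }

  /** Returns true if this call fits under `limit` calls for the current window. */
  allow(keyId: string, limit: number, now: number = Date.now()): boolean {
    const w = this.windows.get(keyId);
    if (!w || now - w.start >= this.windowMs) {
      this.windows.set(keyId, { start: now, count: 1 }); // open a new window
      return limit >= 1;
    }
    if (w.count >= limit) return false;
    w.count += 1;
    return true;
  }
}
```

A fixed window can admit up to twice the limit across a window boundary; that is usually acceptable for per-minute API quotas, and a sliding window is the usual upgrade if it is not.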

4. The MCP tool catalog

The MCP server exposes Hackorda's domain as typed tools. Each tool maps to one or more existing API routes. Initial catalog — Phase 1 launch set:

4.1 Cycle tools

Tool name	Maps to	Scope	Description
`list_cycles`	`GET /api/test-cycles/browse`	`cycles:read`	List cycles the key's org has access to. Returns id, name, status, product.
`get_cycle`	`GET /api/test-cycles/[id]`	`cycles:read`	Full cycle detail: docs, members, payout rates, status.
`create_cycle`	`POST /api/admin/test-cycles`	`cycles:write`	Create a cycle for an org/product.
`update_cycle_status`	`PATCH /api/admin/test-cycles/[id]`	`cycles:write`	Advance status: planned → active → review → closed.
`list_cycle_docs`	`GET /api/test-cycles/[id]/documents`	`cycles:read`	List docs (briefs, runbooks, reports) for a cycle.
`get_cycle_doc`	`GET /api/test-cycles/[id]/documents/[docId]`	`cycles:read`	Full markdown content of a cycle doc.

4.2 Issue tools

Tool name	Maps to	Scope	Description
`list_issues`	`GET /api/test-cycles/issues`	`issues:read`	Cross-cycle issue list. Filterable by severity, status, payout status, cycle.
`get_issue`	`GET /api/test-cycles/[id]/issues/[issueId]`	`issues:read`	Full issue: description, steps, attachments, AI suggestions, payout.
`file_issue`	`POST /api/test-cycles/[id]/issues`	`issues:write`	File a new bug. Accepts title, description, steps, expected/actual, severity, attachments. Returns issueId.
`comment_on_issue`	`POST /api/test-cycles/[id]/issues/[issueId]/comments`	`issues:write`	Add a comment (markdown).
`get_issue_comments`	`GET /api/test-cycles/[id]/issues/[issueId]/comments`	`issues:read`	Thread of comments on an issue.
`trigger_ai_analysis`	`POST /api/test-cycles/[id]/issues/[issueId]/intake`	`ai:write`	Re-run AI intake on an issue (get title/severity/type suggestions).

4.3 Triage tools (admin scope)

Tool name	Maps to	Scope	Description
`list_triage_queue`	`GET /api/admin/triage`	`issues:triage`	All pending issues awaiting a payout decision.
`decide_issue`	`POST /api/admin/triage/decide`	`issues:triage`	Approve or reject a payout. Accepts issueId, decision, optional new severity + amount.
`get_payout_status`	`GET /api/admin/test-cycles/[id]/payouts/by-tester`	`payouts:read`	Payout breakdown per tester for a cycle.

4.4 Run tools

Tool name	Maps to	Scope	Description
`start_run`	`POST /api/test-cycles/[id]/runs`	`runs:write`	Start a test run in a cycle. Returns runId.
`complete_run`	`PATCH /api/test-cycles/[id]/runs/[runId]`	`runs:write`	Mark a run complete with notes.

4.5 Resource / read tools

Tool name	Maps to	Scope	Description
`get_balance`	`GET /api/me/balance`	`payouts:read`	Tester's earnings breakdown (pending verification, available, paid).
`list_organizations`	`GET /api/admin/organizations`	`admin:read`	Orgs the key has access to.
`get_cycle_report`	`GET /api/admin/test-cycles/[id]`	`cycles:read`	Cycle summary: issue counts by severity/status, payout totals.
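Because each tool is a thin wrapper, the catalog can be held as data and resolved mechanically. A sketch under assumptions: TOOL_ROUTES copies a few rows from the tables above, while the ToolRoute shape and the resolveRoute helper are hypothetical. Tool arguments are assumed to use the same names as the route's [param] segments.

```typescript
// A slice of the §4 catalog as data; routes and scopes copied from the tables.
interface ToolRoute {
  method: "GET" | "POST" | "PATCH";
  path: string; // Next.js-style template with [param] segments
  scope: string;
}

const TOOL_ROUTES: Record<string, ToolRoute> = {
  list_cycles: { method: "GET", path: "/api/test-cycles/browse", scope: "cycles:read" },
  get_cycle: { method: "GET", path: "/api/test-cycles/[id]", scope: "cycles:read" },
  get_issue: { method: "GET", path: "/api/test-cycles/[id]/issues/[issueId]", scope: "issues:read" },
  file_issue: { method: "POST", path: "/api/test-cycles/[id]/issues", scope: "issues:write" },
  decide_issue: { method: "POST", path: "/api/admin/triage/decide", scope: "issues:triage" },
  complete_run: { method: "PATCH", path: "/api/test-cycles/[id]/runs/[runId]", scope: "runs:write" },
};

/** Fill [param] segments from tool arguments; throws on unknown tools or missing params. */
function resolveRoute(
  tool: string,
  params: Record<string, string>,
): { method: string; url: string; scope: string } {
  const route = TOOL_ROUTES[tool];
  if (!route) throw new Error(`unknown tool: ${tool}`);
  const url = route.path.replace(/\[(\w+)\]/g, (_match, name: string) => {
    const value = params[name];
    if (value === undefined) throw new Error(`missing param "${name}" for ${tool}`);
    return encodeURIComponent(value);
  });
  return { method: route.method, url, scope: route.scope };
}
```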

4.6 Tool output design principles

All tools return agent-readable JSON, not HTML or UI-shaped responses:

IDs always present for follow-up calls (issueId, cycleId, runId).
Status as enum strings ("open", "approved") not display labels.
Truncate long markdown bodies by default; pass full=true to get the full content.
List responses include total + cursor for agents that need to walk pages.
Error responses always include code (machine-readable) + message (human-readable).
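The output principles translate into a small set of response types. This is a sketch, assuming an items field alongside the total and cursor fields named above, a null cursor on the last page, and a hypothetical 2,000-character truncation limit; shapeBody and walkAll are illustrative helpers, and the error code shown is an example only.

```typescript
// Hypothetical shapes; total, cursor, code, and message come from the list above.
interface ListResponse<T> {
  items: T[];
  total: number;
  cursor: string | null; // null when there are no more pages
}

interface ToolError {
  code: string; // machine-readable, e.g. "SCOPE_MISSING" (illustrative)
  message: string; // human-readable
}

/** Truncate a markdown body unless the caller passed full=true. */
function shapeBody(body: string, full: boolean, maxChars = 2000): { body: string; truncated: boolean } {
  if (full || body.length <= maxChars) return { body, truncated: false };
  return { body: body.slice(0, maxChars), truncated: true };
}

/** Walk every page by following cursors until the server returns null. */
async function walkAll<T>(fetchPage: (cursor: string | null) => Promise<ListResponse<T>>): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | null = null;
  do {
    const page: ListResponse<T> = await fetchPage(cursor);
    all.push(...page.items);
    cursor = page.cursor;
  } while (cursor !== null);
  return all;
}
```

Returning a truncated flag lets an agent decide on its own whether a follow-up call with full=true is worth the extra tokens.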

5. Key use cases

UC-1: Triage automation agent

An agent in an admin's workflow runs each morning, reviews the triage queue, applies consistent severity standards, and pre-approves low-risk issues.

list_triage_queue()
  → for each queued issue:
      trigger_ai_analysis(issueId)            # refresh suggestions first
      issue = get_issue(issueId)              # then read the fresh suggestions
      if issue.aiSuggestions.confidence > 0.9 and issue.severity == 'low':
          decide_issue(issueId, decision='approve', severity='low')
      else:
          pass                                # leave for human review

Scope needed: issues:read + issues:triage + ai:write
Value: Admin saves an estimated 60–80% of triage time on the low-severity backlog.
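UC-1 can be written in TypeScript with the MCP tools behind a hypothetical TriageClient interface, so the decision rule is testable against a fake. The 0.9 threshold and the low-severity condition come from the pseudocode; the interface and function names are assumptions. Analysis is triggered before the issue is read so the decision uses fresh suggestions, which assumes trigger_ai_analysis completes before it returns.

```typescript
// Hypothetical client over the UC-1 tools; field names follow the pseudocode.
interface TriageIssue {
  issueId: string;
  severity: string;
  aiSuggestions: { confidence: number };
}

interface TriageClient {
  listTriageQueue(): Promise<{ issueId: string }[]>;
  triggerAiAnalysis(issueId: string): Promise<void>;
  getIssue(issueId: string): Promise<TriageIssue>;
  decideIssue(issueId: string, decision: "approve" | "reject", severity: string): Promise<void>;
}

const CONFIDENCE_THRESHOLD = 0.9; // from UC-1

function shouldAutoApprove(issue: TriageIssue): boolean {
  return issue.severity === "low" && issue.aiSuggestions.confidence > CONFIDENCE_THRESHOLD;
}

/** Approves qualifying issues and returns their ids; the rest stay queued for a human. */
async function runMorningTriage(client: TriageClient): Promise<string[]> {
  const approved: string[] = [];
  for (const { issueId } of await client.listTriageQueue()) {
    await client.triggerAiAnalysis(issueId); // refresh suggestions first
    const issue = await client.getIssue(issueId); // then read them
    if (shouldAutoApprove(issue)) {
      await client.decideIssue(issueId, "approve", "low");
      approved.push(issueId);
    }
  }
  return approved;
}
```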

UC-2: Automated bug filing from CI

A CI pipeline (GitHub Actions, Jenkins) catches a failing test and automatically files a structured bug report in the active cycle.

# In CI workflow, on test failure:
list_cycles(status='active', productId=env.PRODUCT_ID)
  → get the active cycle id
start_run(cycleId)
  → runId
file_issue(cycleId, {
  title: test.name,
  description: test.failureMessage,
  stepsToReproduce: test.steps,
  severity: 'high',
  url: deployUrl,
  type: 'bug'
})
complete_run(cycleId, runId)

Scope needed: cycles:read + issues:write + runs:write
Value: Immediate bug reporting from automated test suites. Every CI failure becomes a tracked bug, which becomes payable once a tester confirms it.
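A CI job could build the UC-2 payload from a failing test and post it to the filing route. A minimal sketch, assuming the route accepts the field names used in the UC-2 sketch (not a confirmed contract) and that the HTTP transport is injected; buildIssuePayload, fileCiFailure, and PostJson are hypothetical.

```typescript
// Hypothetical CI helper; route and bearer scheme come from this doc.
interface TestFailure {
  name: string;
  failureMessage: string;
  steps: string[];
}

interface IssuePayload {
  title: string;
  description: string;
  stepsToReproduce: string;
  severity: string;
  url: string;
  type: "bug";
}

function buildIssuePayload(test: TestFailure, deployUrl: string): IssuePayload {
  return {
    title: test.name,
    description: test.failureMessage,
    // Number the steps so the report renders as an ordered markdown list.
    stepsToReproduce: test.steps.map((s, i) => `${i + 1}. ${s}`).join("\n"),
    severity: "high",
    url: deployUrl,
    type: "bug",
  };
}

// Transport is injected so the sketch does not depend on a particular HTTP client.
type PostJson = (url: string, headers: Record<string, string>, body: unknown) => Promise<{ status: number }>;

async function fileCiFailure(
  post: PostJson, baseUrl: string, apiKey: string, cycleId: string, payload: IssuePayload,
): Promise<void> {
  const res = await post(
    `${baseUrl}/api/test-cycles/${encodeURIComponent(cycleId)}/issues`,
    { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    payload,
  );
  if (res.status >= 300) throw new Error(`file_issue failed with HTTP ${res.status}`);
}
```

In GitHub Actions the key would come from a repository secret rather than being hard-coded in the workflow.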

UC-3: QA agent in Claude/Cursor

A developer uses Claude Desktop with the Hackorda MCP server installed. "What bugs are open in the v0.9 cycle?" → agent calls list_issues and returns a structured summary. "File that as a bug" → agent calls file_issue with the conversation context.

Scope needed: cycles:read + issues:read + issues:write
Value: QA workflow lives inside the developer's existing AI assistant. No context switch to a separate tool.

UC-4: Linear → Hackorda sync agent (pairs with roadmap F)

An agent polls Linear for status changes and updates the corresponding Hackorda issue's externalStatus, keeping the payout pipeline accurate without manual intervention.

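Scope needed: issues:read + issues:write (to update external status). The Phase 1 launch set in §4.2 has no tool that writes externalStatus yet, so this use case needs one added.
Value: Closes the "deferred" Linear webhook gap without a full webhook infrastructure build.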
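The sync agent's core step is a diff: compare each linked issue's Linear state with its stored externalStatus and write only what changed. A sketch, assuming the agent already holds both values and stores Linear's state name verbatim; LinkedIssue and pendingStatusUpdates are hypothetical names.

```typescript
// Hypothetical reconciliation step; only the externalStatus field name is from this doc.
interface LinkedIssue {
  issueId: string; // Hackorda issue id
  linearState: string; // state just read from Linear
  externalStatus: string; // value currently stored on the Hackorda issue
}

/** Return only the issues whose Hackorda externalStatus is stale. */
function pendingStatusUpdates(linked: LinkedIssue[]): { issueId: string; externalStatus: string }[] {
  return linked
    .filter((l) => l.linearState !== l.externalStatus)
    .map((l) => ({ issueId: l.issueId, externalStatus: l.linearState }));
}
```

Writing only stale rows keeps each poll cheap against the key's per-minute rate limit and makes repeated polls idempotent.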

UC-5 (Phase 2): Autonomous regression tester

Admin schedules "run a smoke test against staging.product.com after every deploy." Hackorda's agent runner boots a sandboxed browser, navigates the app following the cycle's test plan, and files any anomalies it finds.

Scope needed: Internal runner token (not an external API key).
Compute: See §6.2.

6. Compute map

6.1 Phase 1 — MCP surface (light)

Component	Compute	Where
MCP server process	~50 MB, stateless	Same DO droplet as the app, or a small dedicated droplet
API key table + scope check	Postgres query	Existing DB (Neon)
Rate limiter	Existing `rate_limit_buckets` table	Existing DB
Key cache	In-process LRU (< 1 MB)	MCP server memory

No new infra needed for Phase 1. The MCP server is a thin gateway process deployable on the existing droplet.

6.2 Phase 2 — Agent runner (heavy)

Component	Compute	Scale
Browser sandbox	1–2 CPU + 2 GB RAM per concurrent run (Playwright)	1 container per run
LLM inference	Anthropic API calls per agent step (~10–50 steps/run)	Per-run cost
Artifact storage	Screenshots, traces, video (~50–200 MB/run)	DO Spaces / S3
Runner orchestrator	1 small process	Shared with the pg-boss worker (Infra Roadmap Phase 1)
Sandbox isolation	Docker-in-Docker or separate container per run	1 run = 1 container

Rough per-run cost estimate:

Browser container: ~$0.01–0.05 (5–15 min of a 2 vCPU/2 GB droplet)
Anthropic calls: ~$0.03–0.15 (10–50 steps × ~$0.003/step, Sonnet)
Storage: ~$0.001–0.005 (50–200 MB at DO Spaces pricing)
Total: ~$0.04–0.21 per run (sum of the component ranges above)
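The per-run estimate can be recomputed from its stated inputs. This worked calculation uses only the figures in this section, not measured pricing: the container and storage ranges as given, and 10–50 steps at ~$0.003/step. Multiplying and summing gives an LLM line of about $0.03 to $0.15 and a total of roughly $0.04 to $0.21 per run.

```typescript
// Inputs are the figures stated in §6.2; nothing here is measured pricing.
interface Range { low: number; high: number }

const browserContainer: Range = { low: 0.01, high: 0.05 }; // 5–15 min of a 2 vCPU / 2 GB droplet
const agentSteps: Range = { low: 10, high: 50 }; // agent steps per run
const costPerStep = 0.003; // ~$ per Sonnet step
const artifactStorage: Range = { low: 0.001, high: 0.005 }; // 50–200 MB of artifacts

const llm: Range = { low: agentSteps.low * costPerStep, high: agentSteps.high * costPerStep };
const total: Range = {
  low: browserContainer.low + llm.low + artifactStorage.low,
  high: browserContainer.high + llm.high + artifactStorage.high,
};

console.log(`LLM: $${llm.low.toFixed(2)} to $${llm.high.toFixed(2)}`);
console.log(`Total: $${total.low.toFixed(3)} to $${total.high.toFixed(3)}`);
```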

Infra shape for Phase 2:

Runner VM (separate from app, 4 vCPU / 8 GB):
  ├── Runner orchestrator process (pg-boss worker)
  ├── Docker daemon
  ├── Container pool: up to N concurrent browser sandboxes
  └── Artifact uploader → DO Spaces

  N concurrent runs = N × (2 CPU + 2 GB)
  A 4 vCPU / 8 GB droplet → 2 concurrent runs
  Scale: add runner VMs horizontally
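Concurrency per runner VM is set by whichever resource runs out first. A sketch, assuming the 2 vCPU + 2 GB per-run upper bound from the table above and an optional reserve for the orchestrator and Docker daemon (the reserve is an assumption); concurrentRuns and runnerVmsNeeded are hypothetical. On a 4 vCPU / 8 GB droplet CPU binds at 2 runs, while memory alone would allow 4.

```typescript
// Per-run size from §6.2 (upper bound); the reserve parameter is an assumption.
interface Vm { vcpu: number; ramGb: number }

function concurrentRuns(
  vm: Vm,
  perRun: Vm = { vcpu: 2, ramGb: 2 },
  reserve: Vm = { vcpu: 0, ramGb: 0 },
): number {
  const byCpu = Math.floor((vm.vcpu - reserve.vcpu) / perRun.vcpu);
  const byRam = Math.floor((vm.ramGb - reserve.ramGb) / perRun.ramGb);
  return Math.max(0, Math.min(byCpu, byRam)); // whichever resource runs out first
}

/** Runner VMs needed for a target number of concurrent runs (horizontal scaling). */
function runnerVmsNeeded(targetConcurrent: number, vm: Vm): number {
  const perVm = concurrentRuns(vm);
  if (perVm === 0) throw new Error("VM too small for a single run");
  return Math.ceil(targetConcurrent / perVm);
}
```

Reserving one vCPU for the orchestrator drops that droplet to 1 concurrent run, which is worth checking before sizing the first runner VM.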

7. The full wiki structure

Based on the Firecrawl/Linear model — docs organized for both humans and AI agents (agents increasingly read docs to understand how to use a platform).

docs/
├── README.md                        ← index / "start here"
├── system-overview.md               ← architecture, stack, permissions
├── roadmap.md                       ← infra roadmap (Phase 0→4)
├── feature-roadmap.md               ← product roadmap (buckets A→I)
├── feature-matrix.md                ← current feature state
├── agent-platform.md                ← this doc: agent strategy
│
├── guides/                          ← NEW: task-oriented how-tos
│   ├── agent-quickstart.md          ← "file your first bug via MCP in 5 min"
│   ├── api-key-setup.md             ← create + scope an API key
│   ├── mcp-server-install.md        ← Claude Desktop, Cursor, custom agent
│   └── ci-integration.md            ← GitHub Actions + Hackorda
│
├── reference/                       ← NEW: exhaustive reference (agent-readable)
│   ├── mcp-tools.md                 ← every MCP tool, inputs/outputs, examples
│   ├── api-routes.md                ← existing (update with agent endpoints)
│   ├── permissions.md               ← NEW: full scope/role/key model
│   ├── webhooks.md                  ← NEW (Phase 1B): event webhooks
│   └── errors.md                    ← error codes catalog
│
├── flows/                           ← canonical user journeys (F-01→F-16)
│   └── (existing 16 flow files)
│
├── use-cases/                       ← NEW: UC-1→UC-N (this doc §5, expanded)
│   ├── triage-automation.md
│   ├── ci-bug-filing.md
│   ├── claude-desktop-qa.md
│   └── autonomous-regression.md     ← Phase 2
│
├── authentication.md                ← existing (update with API key model)
├── deployment.md                    ← existing
└── ops/
    ├── database.md                  ← existing
    └── self-hosted-runner.md        ← existing

8. The build roadmap for agent features

Phase 1A — Foundation (prerequisite, no user-visible features)

api_keys table + Drizzle schema
API key middleware (sits alongside Clerk middleware)
Admin UI: create / revoke / scope API keys per org
Rate limiting that reuses the existing rate_limit_buckets table

Phase 1B — MCP server

packages/mcp-server/ — standalone Node process using @modelcontextprotocol/sdk
Implements the Phase 1 launch set (§4.1–4.5)
Deployed alongside the app
Published to npm for self-hosting (optional, low cost)
Agent-readable error messages + structured outputs

Phase 1C — Connector ecosystem

REST API documentation (OpenAPI spec → usable in any connector platform)
Zapier / Make.com connector (file issue, list issues, update status)
Webhook outbound events (issue filed, triage decided, payout released)

Phase 2A — Runner infrastructure

runner/ service: pg-boss worker + Docker orchestrator
Sandbox container image (Playwright + Anthropic SDK + artifact uploader)
Admin UI: schedule a run, set target URL + test plan
Run result → structured issues + tester attribution

Phase 2B — Agentic test plans

AI agent uses the cycle's docs (test plan/runbook) as its instruction set
Executes steps, takes screenshots at each, compares to expected state
Files anomalies with full context: screenshot, page URL, console errors
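The runner's agent loop (LLM → actions → observations) with a hard step cap might look like this. A minimal sketch: the Action and Observation shapes and the injected decide (LLM) and act (Playwright driver) functions are hypothetical, and the default cap of 50 is taken from the 10–50 steps/run estimate in §6.2. Anomalies are collected with the latest observation as evidence so they can be filed with a screenshot, page URL, and console errors.

```typescript
// Hypothetical shapes; decide() stands in for the LLM call, act() for Playwright.
type Action =
  | { kind: "navigate"; url: string }
  | { kind: "click"; selector: string }
  | { kind: "report"; title: string; detail: string }
  | { kind: "done" };

interface Observation {
  url: string;
  screenshotRef: string; // e.g. an artifact key in object storage
  consoleErrors: string[];
}

interface Anomaly { title: string; detail: string; evidence: Observation }

async function runAgentLoop(
  decide: (history: Observation[]) => Promise<Action>,
  act: (action: Action) => Promise<Observation>,
  initial: Observation,
  maxSteps = 50, // upper end of the 10–50 steps/run estimate
): Promise<Anomaly[]> {
  const history: Observation[] = [initial];
  const anomalies: Anomaly[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const action = await decide(history);
    if (action.kind === "done") break;
    const latest = history[history.length - 1];
    if (action.kind === "report") {
      // Filed later as a structured issue with screenshot, URL and console errors.
      anomalies.push({ title: action.title, detail: action.detail, evidence: latest });
      continue;
    }
    history.push(await act(action)); // execute and record the new observation
  }
  return anomalies;
}
```

The step cap bounds both LLM spend and sandbox time per run, which keeps the per-run cost estimate meaningful even when the model never emits done.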

9. Business value summary

Capability	Value to customers	Revenue model
MCP server	QA workflow in their AI assistant (no context switch)	Included in SaaS plan or per-seat API tier
API keys	Integrate Hackorda into CI/CD, internal tools	Unblocks enterprise customers who won't use OAuth
Webhooks	Real-time sync to Slack, Linear, Jira without polling	Sticky integration = retention
Agent triage	Reduces admin triage time by 60–80%	Feature premium or included in growth tier
Agent runner	Continuous regression testing without human testers	Per-run billing — new revenue stream
Connector marketplace	Lower barrier to adoption	Marketplace distribution = acquisition

Why this matters competitively: QA platforms that become composable infrastructure (callable by agents) have a different retention curve than those that are just UI tools. Every AI workflow that depends on Hackorda is a workflow that can't easily be ripped out.

10. Open decisions (owner's call)

API key billing tier — included in existing seats, or a separate API plan? (Affects pricing architecture before Phase 1A ships.)
MCP server distribution — hosted only (SaaS), self-hosted (open source npm package), or both? Firecrawl does both.
Webhook delivery: in-process (fire-and-forget, with the same durability gap as current AI calls) vs durable (through the job queue from Infra Roadmap Phase 1). Recommendation: wait for that queue.
Runner isolation — Docker-in-Docker on a shared VM vs dedicated per-run Firecracker/Fly Machines. DinD is simpler; Firecracker is more secure for untrusted targets.
Agent runner pricing — per-run flat fee vs per-minute vs per-bug-found? Per-bug-found is most aligned but hard to meter.