Hackorda Docs

System Overview

Read this first. A single front-door to the Hackorda QA platform: what it is, the stack, how it's wired, who can do what, and how each layer scales. Detailed docs are linked inline — this page is the map, not the territory.


1. What Hackorda is

Hackorda is a QA bug-bounty platform. The loop:

Admin sets up a cycle  →  testers join & file bugs (with screenshots)
   →  AI pre-triages each bug  →  admin approves a payout by severity
   →  (optional) verification gate + Linear export  →  testers get paid

An organization puts a product version under test as a test cycle. Testers run through it and file issues with attachments; an AI intake agent suggests title, severity, and type; admins and leads triage and set payouts; and testers are paid in period batch runs. Two steps are optional: a verification gate before money goes out, and export to Linear.

The product is substantially shipped — see the full registry in Feature Matrix. Snapshot:

Area | State
Identity, role management, admin impersonation | ✅ shipped (multi-tenant deferred)
Catalog: orgs → products → versions → cycles | ✅ shipped
Tester flow: runs, file issue, drafts, comments, docs | ✅ shipped
Triage: cross-cycle queue, reclassify, mobile, lightbox | ✅ shipped
Payouts: period batch run, per-tester, verification gate | ✅ shipped (void/correct UI pending)
Tester earnings: balance buckets, CSV export | ✅ shipped
Notifications: in-app bell, 11 event types | ✅ (email/Telegram deferred)
AI agents: intake, run-summary, cycle-close, provenance | ✅ (cost-cap not enforced; dup-detect deferred)
Integrations: Linear export + admin UI | ✅ (webhook/Jira/GitHub deferred)

2. Tech stack

Layer | Choice | Notes
Framework | Next.js 15 (App Router, RSC) | API routes + server components in one app
UI | React 19, TailwindCSS 4, Radix + shadcn, Lucide, Sonner | components/ui/ are shadcn primitives
Language | TypeScript 5 (strict) | tsc --noEmit gates every PR
Data | PostgreSQL + Drizzle ORM | one pg.Pool → Drizzle (src/db/index.ts)
Auth | Clerk | sessions + social login; app maps Clerk → users row
AI | Anthropic (Sonnet) | intake / run-summary / cycle-close agents, vision via DO Spaces URLs
Tests | Vitest (unit) + Playwright (e2e) | unit gates every PR; e2e gated on E2E_ENABLED
Infra | DigitalOcean droplet + Caddy + Docker + GHCR | self-managed; details below

Full inventory: Tech Stack.


3. How it's all set up

3.1 Runtime architecture (production)

                          Internet (https://hackorda.kz)
                                     │  TLS
                          ┌──────────▼───────────┐
                          │  Caddy (reverse proxy)│   :443 → 127.0.0.1:3001
                          └──────────┬───────────┘
   ┌─────────────────────────────────┼─────────────────────────────────┐
   │  DigitalOcean droplet           │                                  │
   │                      ┌──────────▼───────────┐                      │
   │                      │ hackorda-app (Docker) │  Next.js :3000       │
   │                      │  /api/health          │  (mapped to :3001)   │
   │                      └──────────┬───────────┘                      │
   │                                 │ pg.Pool (DB_POOL_MAX)             │
   │                      ┌──────────▼───────────┐                      │
   │                      │ Postgres (self-managed)│                     │
   │                      └──────────────────────┘                      │
   │                                                                     │
   │   ⚠️ also on this box today: the self-hosted GitHub Actions runner │
   │      + 4 GB swap + disk guards (scripts/setup-swap.sh)             │
   └─────────────────────────────────────────────────────────────────────┘

Everything (app + DB + CI runner) is co-located on one droplet today. That density is the root of the deploy incident in §5.4 and the #1 thing to unbundle as you scale.

3.2 Code architecture — the 3-tier service layer

Backend domains follow a strict 3-tier shape (canonical example: src/lib/payouts/):

route.ts          parse → authorize → call service → format response   (no SQL)

service (lib/<domain>)   business logic, transactions, zod inputs, typed errors

db/schema        drizzle tables + enum constants only

Rules and rationale: Service Layer. The migration to this layout is in progress (some routes still touch @/db directly — a per-PR ratchet is moving them).
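
As a rough illustration of the three tiers, the sketch below runs one hypothetical "approve payout" call through route → service → schema. Every name in it (approvePayout, PayoutError, the in-memory payoutsTable, the status-code mapping) is an assumption made for the example, not the real src/lib/payouts/ API; the hand-rolled parser stands in for a zod schema and the Map stands in for a Drizzle table.

```typescript
// --- db/schema tier: tables + enum constants only (all names hypothetical) ---
const SEVERITIES = ["low", "medium", "high", "critical"] as const;
type Severity = (typeof SEVERITIES)[number];

interface PayoutRow { issueId: string; severity: Severity; amountCents: number; }
const payoutsTable = new Map<string, PayoutRow>(); // stand-in for a Drizzle table

// --- service tier: input validation, business rules, typed errors ---
type PayoutErrorCode = "INVALID_INPUT" | "ALREADY_PAID";

class PayoutError extends Error {
  readonly code: PayoutErrorCode;
  constructor(code: PayoutErrorCode, message: string) {
    super(message);
    this.name = "PayoutError";
    this.code = code;
    Object.setPrototypeOf(this, PayoutError.prototype); // keeps instanceof working on older compile targets
  }
}

// Hand-rolled check standing in for a zod schema.
function parseApproveInput(raw: unknown): PayoutRow {
  const r = raw as Partial<PayoutRow> | null;
  if (
    !r ||
    typeof r.issueId !== "string" || r.issueId.length === 0 ||
    !SEVERITIES.includes(r.severity as Severity) ||
    typeof r.amountCents !== "number" || !Number.isInteger(r.amountCents) || r.amountCents <= 0
  ) {
    throw new PayoutError("INVALID_INPUT", "issueId, severity and a positive integer amountCents are required");
  }
  return { issueId: r.issueId, severity: r.severity as Severity, amountCents: r.amountCents };
}

function approvePayout(raw: unknown): PayoutRow {
  const input = parseApproveInput(raw);
  if (payoutsTable.has(input.issueId)) {
    throw new PayoutError("ALREADY_PAID", `issue ${input.issueId} already has a payout`);
  }
  payoutsTable.set(input.issueId, input); // a real service would wrap this in a transaction
  return input;
}

// --- route tier: parse → authorize → call service → format response (no SQL) ---
interface RouteResponse { status: number; body: unknown; }

function handleApprovePayout(isAdmin: boolean, body: unknown): RouteResponse {
  if (!isAdmin) return { status: 403, body: { error: "forbidden" } };
  try {
    return { status: 200, body: approvePayout(body) };
  } catch (err) {
    if (err instanceof PayoutError) {
      // Typed errors let the route pick a status without knowing the business rule.
      return { status: err.code === "ALREADY_PAID" ? 409 : 400, body: { error: err.code } };
    }
    throw err;
  }
}
```

The point of the shape is that the route never touches storage and the service never touches HTTP, so either side can be tested or replaced on its own.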

Code-quality gates (CI-enforced via scripts/audit-frontend.ts): file-size caps per layer (routes 600 / components 400 / hooks 500 / lib 600 / db 600), ratcheted against a baseline so files can only shrink. See Agent guide (CLAUDE.md) and the team playbook ops-vault/kos/playbooks/code-quality.md.
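
A minimal sketch of how a cap-plus-ratchet check can work, using the per-layer caps listed above. This is not the contents of scripts/audit-frontend.ts; the function name, the FileSize shape, and the baseline format (path → recorded line count for files that were already over cap) are assumptions for illustration.

```typescript
type Layer = "routes" | "components" | "hooks" | "lib" | "db";

// Per-layer line caps, as listed in the paragraph above.
const CAPS: Record<Layer, number> = { routes: 600, components: 400, hooks: 500, lib: 600, db: 600 };

interface FileSize { path: string; layer: Layer; lines: number; }

// baseline: path → line count recorded when the ratchet was introduced
// (hypothetical format; only files that were already over their cap appear in it).
function auditFileSizes(files: FileSize[], baseline: Record<string, number>): string[] {
  const violations: string[] = [];
  for (const f of files) {
    const cap = CAPS[f.layer];
    if (f.lines <= cap) continue; // under the cap: always fine

    const allowed = baseline[f.path];
    if (allowed === undefined) {
      // Not grandfathered: a file may not cross the cap for the first time.
      violations.push(`${f.path}: ${f.lines} lines exceeds the ${f.layer} cap of ${cap}`);
    } else if (f.lines > allowed) {
      // Grandfathered files may shrink or stay put, never grow.
      violations.push(`${f.path}: grew from baseline ${allowed} to ${f.lines} lines`);
    }
  }
  return violations;
}
```

Under this scheme a file recorded at 637 lines against a 600-line cap passes at 637 or fewer and fails at 638, which is what "can only shrink" means in practice. Lowering the stored baseline whenever a file shrinks would tighten the ratchet further; that step is left out of the sketch.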

3.3 Data model

Schema is split by domain under src/db/schema/: users, test-cycles (cycles + testers membership), issues (issues + payouts + comments + attachments), events/teams (legacy), ai_runs (AI provenance). Full reference: Database Schema. DB connection/SSL/pooling: Database.

3.4 Deployment pipeline

Push to main → GitHub Actions (.github/workflows/workflows.yml):

check        (self-hosted)   lint · tsc · vitest · file-size audit

build-image  (ubuntu-latest) docker build + push → GHCR        ← hosted on purpose

deploy       (self-hosted)   SSH → droplet → deploy.sh → health-check.sh
  • build-image runs on a GitHub-hosted runner (ephemeral, ~16 GB) so the heavy Next.js build can't OOM the droplet — see §5.4.
  • deploy runs on the self-hosted runner and only SSHes to the droplet: docker compose pull → down → up → migrate → health-check.
  • e2e (Playwright) is gated on the E2E_ENABLED repo variable.
  • Concurrency: PR runs cancel-on-superseded; main runs never cancel (deploys serialize so they don't race the droplet).

Operational detail: Deployment · Self Hosted Runner.


4. Permissions

Two distinct systems: in-app authorization (who can do what in the product) and repo/ops governance (who can ship and touch infra).

4.1 In-app RBAC — a two-axis model

Axis 1 — system role (users.role_id, global), from ROLES (src/db/schema/_enums.ts):

Role | id | Meaning
SUPER_ADMIN | 5 | Admin plus gated money/role actions
ADMIN | 1 | Everything; can impersonate any user
QA | 4 | "Tester" in the UI; a label only, not the access gate
STUDENT | 2 | Legacy LMS role
GUEST | 3 | Public landing only

Axis 2 — per-cycle role (testers.role, scoped to one cycle), from TESTER_ROLES: lead / tester / observer.

Enforcement (src/lib/auth/):

  • isAdminRoleId() — admin or super-admin clears any admin gate (roles.ts).
  • checkTestCycleAccess() — admins pass globally; everyone else needs a testers row on that specific cycle (test-cycle-auth.ts). The QA system role does not by itself grant cycle access — the testers row does.
  • API routes wrap handlers in requireAdmin* / requireSuperAdmin* / createProtectedRoute → 401/403 (api-middleware.ts).
  • requireSuperAdmin narrows money + role-change actions to the super-admin tier — a deliberate privilege separation.
  • Clerk middleware rewrites unauth /app/* → 404 (why a logged-out request to /app/issues returns 404, not a redirect).
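
The two-axis check described in the list above can be sketched as follows. The role ids come from the table in this section and the function names match the ones mentioned, but the signatures, the TesterRow shape, and the return type are assumptions; the real helpers in src/lib/auth/ query the database and may look quite different.

```typescript
// Role ids as listed in the system-role table.
const ROLES = { ADMIN: 1, STUDENT: 2, GUEST: 3, QA: 4, SUPER_ADMIN: 5 } as const;

type TesterRole = "lead" | "tester" | "observer";

// Assumed shape of a membership row (the real testers table has more columns).
interface TesterRow { userId: string; cycleId: string; role: TesterRole; }

// Axis 1: admin or super-admin clears any admin gate.
function isAdminRoleId(roleId: number): boolean {
  return roleId === ROLES.ADMIN || roleId === ROLES.SUPER_ADMIN;
}

type CycleAccess =
  | { allowed: true; via: "admin" | TesterRole }
  | { allowed: false };

// Axis 2: non-admins need a testers row on this specific cycle.
// The testers array stands in for a database lookup.
function checkTestCycleAccess(
  user: { id: string; roleId: number },
  cycleId: string,
  testers: TesterRow[],
): CycleAccess {
  if (isAdminRoleId(user.roleId)) return { allowed: true, via: "admin" };
  const row = testers.find((t) => t.userId === user.id && t.cycleId === cycleId);
  return row ? { allowed: true, via: row.role } : { allowed: false };
}
```

Note that the QA role id never appears in the access decision: a QA user without a membership row is denied, and a user with any non-admin system role plus a membership row is allowed.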

Reference: Authentication. Known cleanups: a legacy 'student' | 'admin' string model coexists with the numeric roles (getUserRole() returns null for QA — harmless today but a trap); and app/api/_utils/validation.ts (637 lines) is over the lib cap.

4.2 Repo & ops governance

Surface | Today | Proper-management target
Branch protection | main requires an approving review → solo merges need gh pr merge --admin to bypass | Solo: require status checks, drop required-review (no --admin needed). 2+ people: required review + CODEOWNERS, never self-merge
Required checks | check + build run on PRs | Mark them Required in branch protection so nothing merges red
GHCR auth | ✅ ephemeral per-run GITHUB_TOKEN (expires with the run) | keep as-is
Droplet access | DROPLET_SSH_KEY/HOST/USER in GitHub secrets, deploy user only | rotate the SSH key; consider a production GitHub Environment with required reviewers to gate deploys
Self-hosted runner | runs on the prod box; private repo so PRs are trusted | separate build VM (never next to prod); never attach to a public repo (fork PRs = RCE); see Self Hosted Runner
App audit trail | role/payout actions exist; user.role_changed event planned | capture role-change + payout events so privilege use is traceable

5. How we scale it

The app is stateless (sessions live in Clerk; all state is in Postgres + object storage), so most scaling is "add capacity to a layer." The current bottleneck is everything-on-one-droplet, not the code. Tackle in this order.

5.1 App tier — horizontal scale

The container holds no local state, so it scales horizontally cleanly:

  1. Run N app containers behind a load balancer (DO Load Balancer or Caddy upstreams) instead of one.
  2. One caveat — the scheduler. Production Node runs an in-process auto-transition scheduler. Running it in every replica would double-fire. Set SCHEDULER_DISABLED=true on all but one replica (kill-switch already exists — see Deployment), or move the scheduler to a single worker (§5.3).
  3. Move the app off the DB box first — even before multiple replicas, separating app and Postgres onto different droplets removes the largest contention (and the OOM-near-prod risk).
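
A minimal sketch of the kill-switch from step 2, assuming the flag is compared against the literal string "true" and that the scheduler is a plain interval timer. Both are assumptions; the real startup code and its parsing of SCHEDULER_DISABLED are documented in Deployment.

```typescript
type Env = Record<string, string | undefined>;

// Assumption: only the exact value "true" disables the scheduler.
function shouldStartScheduler(env: Env): boolean {
  return env.SCHEDULER_DISABLED !== "true";
}

// Starts the in-process scheduler unless this replica has been told not to.
// Returns a stop function, or null when the scheduler was not started.
function startSchedulerIfEnabled(
  env: Env,
  tick: () => void,
  intervalMs: number,
): (() => void) | null {
  if (!shouldStartScheduler(env)) return null;
  const timer = setInterval(tick, intervalMs);
  return () => clearInterval(timer);
}
```

With N replicas, exactly one would be started without the variable and the other N-1 with SCHEDULER_DISABLED=true, so each transition fires once. This only works while someone keeps that invariant by hand, which is why moving the scheduler into a single worker is the sturdier option.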

5.2 Database — the real ceiling

All reads + writes hit one primary today. Scale path (already designed for in Database):

  1. Move to managed Postgres (DO Managed DB) — daily backups + PITR, the single highest-leverage reliability win. SSL is already env-driven (DB_CA_CERT/DB_SSL_MODE), so cutover is config, not code.
  2. Front with a connection pooler (PgBouncer / provider pooler) — point DATABASE_URL at the pooler rather than raising DB_POOL_MAX. This is what lets many app replicas share a bounded connection count.
  3. Read replica: DATABASE_URL_REPLICA is reserved but not yet wired into src/db/index.ts. When report/export/leaderboard reads dominate, route them to a replica (a future PR).
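
To make steps 2 and 3 concrete, here is a hedged sketch of how connection targets could be resolved from the environment. The function and the DbTargets shape are hypothetical, and this is not what src/db/index.ts does today (the replica is not wired in). The fallback of 10 mirrors the default pool size of node-postgres.

```typescript
type Env = Record<string, string | undefined>;

interface DbTargets {
  writeUrl: string; // primary, or the pooler in front of it
  readUrl: string;  // replica when configured, otherwise the primary
  poolMax: number;  // per-process pool size
}

function resolveDbTargets(env: Env): DbTargets {
  const writeUrl = env.DATABASE_URL;
  if (!writeUrl) throw new Error("DATABASE_URL is required");

  const parsed = Number.parseInt(env.DB_POOL_MAX ?? "", 10);
  return {
    writeUrl,
    // Fall back to the primary so read paths keep working before a replica exists.
    readUrl: env.DATABASE_URL_REPLICA || writeUrl,
    // Invalid or missing values fall back to 10.
    poolMax: Number.isInteger(parsed) && parsed > 0 ? parsed : 10,
  };
}
```

With a pooler in place, DATABASE_URL points at the pooler and poolMax stays small per replica; the pooler, not the app, bounds the total connection count on Postgres.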

5.3 Background / async work — durability gap

There is no queue and no worker today. The three AI agents (intake, run-summary, cycle-close) are fire-and-forget in the request process (void runIssueIntake(...)). They don't block responses, but if the process restarts mid-call the ai_runs row is stuck running and the result is lost — no retry. Phase 1 (shipped) added idempotency + a Postgres rate limiter on the payout/AI paths. Full analysis + the proposed queue migration: Background Jobs.

Scale path: extract a durable job queue (Postgres-backed table → a single worker process, or a managed queue) so AI work, batch payouts, and the scheduler move off the HTTP path and survive restarts. This also unblocks running multiple app replicas (the worker owns the scheduler).
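
The sketch below shows the claim-with-lease idea behind a Postgres-backed queue, modelled in memory so it runs on its own. It is not the design proposed in Background Jobs; the JobTable class, field names, and retry policy are assumptions. In Postgres the claim step would typically be a single UPDATE whose subquery uses SELECT ... FOR UPDATE SKIP LOCKED, so concurrent workers never pick the same row.

```typescript
type JobStatus = "queued" | "running" | "done" | "failed";

interface Job {
  id: number;
  kind: string;           // e.g. "issue-intake" (hypothetical)
  status: JobStatus;
  attempts: number;
  leaseExpiresAt: number; // epoch ms; only meaningful while running
}

// In-memory stand-in for a jobs table.
class JobTable {
  private rows: Job[] = [];
  private nextId = 1;

  enqueue(kind: string): Job {
    const job: Job = { id: this.nextId++, kind, status: "queued", attempts: 0, leaseExpiresAt: 0 };
    this.rows.push(job);
    return job;
  }

  get(id: number): Job | undefined {
    return this.rows.find((j) => j.id === id);
  }

  // Claims the oldest job that is queued, or running with an expired lease
  // (its worker presumably died mid-call). Jobs out of attempts are marked failed.
  claim(now: number, leaseMs: number, maxAttempts: number): Job | null {
    for (const job of this.rows) {
      const expired = job.status === "running" && job.leaseExpiresAt <= now;
      if (job.status !== "queued" && !expired) continue;
      if (job.attempts >= maxAttempts) {
        job.status = "failed";
        continue;
      }
      job.status = "running";
      job.attempts += 1;
      job.leaseExpiresAt = now + leaseMs;
      return job;
    }
    return null;
  }

  complete(id: number): void {
    const job = this.get(id);
    if (job && job.status === "running") job.status = "done";
  }
}
```

The lease is what closes the "stuck running" gap: a row left in running by a crashed process becomes claimable again once its lease expires, instead of staying stuck until someone notices.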

5.4 CI/CD — capacity & isolation

Today's incident-driven state: the Docker build was OOM-killing the self-hosted runner because it ran on the small prod-adjacent box. Fix shipped (#273): the heavy build now runs on a GitHub-hosted runner; the droplet only does the lightweight SSH deploy; #274 added swap + disk guards as backstop.

Scale path:

  1. Dedicated build VM — the proper end state per Self Hosted Runner: a separate s-2vcpu-4gb+ droplet for the runner, never co-located with prod. Then you can move the build back self-hosted (fixed cost, warm caches) without risking the app.
  2. Runner pool — the workflow already uses labels (self-hosted-build, self-hosted-svc); add runners to a label to parallelize PR checks as the team grows.

5.5 Frontend bundle

Every route pays a ~102 kB shared baseline; the heaviest routes hit ~240 kB First Load JS (admin cycle detail, issue detail). Code-splitting is backlog — the audit + targets are captured in Frontend Bundle. The file-size refactor train (controller-hook + subfolder splits) is the groundwork that makes lazy-loading these routes tractable.

5.6 AI cost

Rate-limiting per user/org is shipped; a per-cycle cost cap is logged but not enforced (ai_runs.cost_usd_cents). Before opening AI usage wide, wire the cap to actually short-circuit calls.
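
One way the cap could short-circuit calls, sketched under assumptions: the AiRun shape, the function names, and the idea of passing an estimated cost for the pending call are all hypothetical. Only the cost_usd_cents column on ai_runs comes from the text above.

```typescript
// Minimal projection of an ai_runs row (the real table has more columns).
interface AiRun { cycleId: string; costUsdCents: number; }

function cycleSpendCents(runs: AiRun[], cycleId: string): number {
  return runs
    .filter((r) => r.cycleId === cycleId)
    .reduce((sum, r) => sum + r.costUsdCents, 0);
}

type CapDecision =
  | { allowed: true; remainingCents: number }
  | { allowed: false; reason: "cap_reached" };

// Call before invoking an agent; skip the model call when allowed is false.
function checkCycleCostCap(
  runs: AiRun[],
  cycleId: string,
  capCents: number,
  estimateCents: number,
): CapDecision {
  const spent = cycleSpendCents(runs, cycleId);
  if (spent + estimateCents > capCents) return { allowed: false, reason: "cap_reached" };
  return { allowed: true, remainingCents: capCents - spent - estimateCents };
}
```

In a real deployment the spend would come from a SUM query over ai_runs rather than an in-memory array, and concurrent calls could each pass the check before any of them records its cost, so the cap is best treated as approximate unless the check and the insert share a transaction.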

5.7 Staged roadmap

Stage | Trigger | Moves
Now | single droplet, all-in-one | ✅ build on hosted runner · swap + disk guards
Harden | first paying cycles | managed Postgres (backups/PITR) · split app ↔ DB ↔ runner onto separate VMs
10× | sustained load | connection pooler · durable job queue + worker · multiple app replicas behind LB (SCHEDULER_DISABLED on extras)
Beyond | read-heavy reporting | read replica (DATABASE_URL_REPLICA) · route code-splitting · enforce AI cost caps

6. Where to find detail

Topic | Doc
Full feature registry + flows | Feature Matrix, Flows
Tech stack inventory | Tech Stack
Backend architecture pattern | Service Layer
Database schema | Database Schema
DB ops (topology, backups, restore) | Database
Auth & permissions | Authentication
Deploy pipeline + host runbook | Deployment
Self-hosted runner | Self Hosted Runner
Scaling: background jobs | Background Jobs
Scaling: frontend bundle | Frontend Bundle
Code-quality gates | Agent guide (CLAUDE.md)
