System Overview

Read this first. This page is the single front door to the Hackorda QA platform: what it is, the stack, how it's wired, who can do what, and how each layer scales. Detailed docs are linked inline; this page is the map, not the territory.

1. What Hackorda is

Hackorda is a QA bug-bounty platform. The loop:

Admin sets up a cycle  →  testers join & file bugs (with screenshots)
   →  AI pre-triages each bug  →  admin approves a payout by severity
   →  (optional) verification gate + Linear export  →  testers get paid

An organization puts a product version under test as a test cycle. Testers run through it and file issues with attachments. An AI intake agent suggests title, severity, and type; admins and leads then triage and set payouts. Testers are paid in period batch runs, with an optional verification gate before money goes out and optional export to Linear.

The product is substantially shipped — see the full registry in Feature Matrix. Snapshot:

Area	State
Identity, role management, admin impersonation	✅ shipped (multi-tenant deferred)
Catalog: orgs → products → versions → cycles	✅ shipped
Tester flow: runs, file issue, drafts, comments, docs	✅ shipped
Triage: cross-cycle queue, reclassify, mobile, lightbox	✅ shipped
Payouts: period batch run, per-tester, verification gate	✅ shipped (void/correct UI pending)
Tester earnings: balance buckets, CSV export	✅ shipped
Notifications: in-app bell, 11 event types	✅ (email/Telegram deferred)
AI agents: intake, run-summary, cycle-close, provenance	✅ (cost-cap not enforced; dup-detect deferred)
Integrations: Linear export + admin UI	✅ (webhook/Jira/GitHub deferred)

2. Tech stack

Layer	Choice	Notes
Framework	Next.js 15 (App Router, RSC)	API routes + server components in one app
UI	React 19, TailwindCSS 4, Radix + shadcn, Lucide, Sonner	`components/ui/` are shadcn primitives
Language	TypeScript 5 (strict)	`tsc --noEmit` gates every PR
Data	PostgreSQL + Drizzle ORM	one `pg.Pool` → Drizzle (`src/db/index.ts`)
Auth	Clerk	sessions + social login; app maps Clerk → `users` row
AI	Anthropic (Sonnet)	intake / run-summary / cycle-close agents, vision via DO Spaces URLs
Tests	Vitest (unit) + Playwright (e2e)	unit gates every PR; e2e gated on `E2E_ENABLED`
Infra	DigitalOcean droplet + Caddy + Docker + GHCR	self-managed; details below

Full inventory: Tech Stack.

3. How it's all set up

3.1 Runtime architecture (production)

                          Internet (https://hackorda.kz)
                                     │  TLS
                          ┌──────────▼───────────┐
                          │  Caddy (reverse proxy)│   :443 → 127.0.0.1:3001
                          └──────────┬───────────┘
   ┌─────────────────────────────────┼─────────────────────────────────┐
   │  DigitalOcean droplet           │                                  │
   │                      ┌──────────▼───────────┐                      │
   │                      │ hackorda-app (Docker) │  Next.js :3000       │
   │                      │  /api/health          │  (mapped to :3001)   │
   │                      └──────────┬───────────┘                      │
   │                                 │ pg.Pool (DB_POOL_MAX)             │
   │                      ┌──────────▼───────────┐                      │
   │                      │ Postgres (self-managed)│                     │
   │                      └──────────────────────┘                      │
   │                                                                     │
   │   ⚠️ also on this box today: the self-hosted GitHub Actions runner │
   │      + 4 GB swap + disk guards (scripts/setup-swap.sh)             │
   └─────────────────────────────────────────────────────────────────────┘

Everything (app + DB + CI runner) is co-located on one droplet today. That density is the root of the deploy incident in §5.4 and the #1 thing to unbundle as you scale.

3.2 Code architecture — the 3-tier service layer

Backend domains follow a strict 3-tier shape (canonical example: src/lib/payouts/):

route.ts          parse → authorize → call service → format response   (no SQL)
   │
service (lib/<domain>)   business logic, transactions, zod inputs, typed errors
   │
db/schema        drizzle tables + enum constants only

Rules and rationale: Service Layer. The migration to this layout is in progress (some routes still touch @/db directly — a per-PR ratchet is moving them).
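To make the three tiers concrete, here is a minimal, dependency-free sketch of one request passing through them. Every name in it (`PayoutRow`, `Store`, `ServiceError`, `approvePayout`, `handleApprovePayout`, and the severity values) is hypothetical and is not the real src/lib/payouts API. A hand-written parser stands in for the zod schema, and a small `Store` interface stands in for a Drizzle transaction, so the example runs without a database.

```typescript
// Hypothetical names throughout; this is not the real src/lib/payouts API.

// db/schema tier: row shapes and enum constants only.
const SEVERITIES = ["low", "medium", "high", "critical"] as const; // assumed values
type Severity = (typeof SEVERITIES)[number];
interface PayoutRow { issueId: string; severity: Severity; amountCents: number }

// Stand-in for a Drizzle transaction so the sketch runs without a database.
interface Store {
  issueExists(issueId: string): boolean;
  insertPayout(row: PayoutRow): void;
}

// service tier: input validation, business rules, typed errors.
type ServiceErrorCode = "invalid_input" | "not_found";
class ServiceError extends Error {
  readonly code: ServiceErrorCode;
  constructor(code: ServiceErrorCode, message: string) {
    super(message);
    this.code = code;
  }
}
function isServiceError(err: unknown): err is ServiceError {
  return err instanceof Error && "code" in err;
}

// Hand-written parser standing in for a zod schema.
function parsePayoutInput(raw: unknown): PayoutRow {
  if (typeof raw !== "object" || raw === null) {
    throw new ServiceError("invalid_input", "body must be an object");
  }
  const { issueId, severity, amountCents } = raw as Record<string, unknown>;
  if (typeof issueId !== "string" || issueId === "") {
    throw new ServiceError("invalid_input", "issueId is required");
  }
  const knownSeverity = SEVERITIES.find((s) => s === severity);
  if (knownSeverity === undefined) {
    throw new ServiceError("invalid_input", "unknown severity");
  }
  if (typeof amountCents !== "number" || !Number.isInteger(amountCents) || amountCents < 0) {
    throw new ServiceError("invalid_input", "amountCents must be a non-negative integer");
  }
  return { issueId, severity: knownSeverity, amountCents };
}

function approvePayout(store: Store, raw: unknown): PayoutRow {
  const input = parsePayoutInput(raw);
  if (!store.issueExists(input.issueId)) {
    throw new ServiceError("not_found", "issue not found");
  }
  store.insertPayout(input);
  return input;
}

// route tier: authorize, call the service, format the response. No SQL here.
interface RouteResult { status: number; body: unknown }
const STATUS_BY_CODE: Record<ServiceErrorCode, number> = { invalid_input: 400, not_found: 404 };

function handleApprovePayout(store: Store, callerIsAdmin: boolean, rawBody: unknown): RouteResult {
  if (!callerIsAdmin) return { status: 403, body: { error: "forbidden" } };
  try {
    return { status: 200, body: approvePayout(store, rawBody) };
  } catch (err) {
    if (isServiceError(err)) {
      return { status: STATUS_BY_CODE[err.code], body: { error: err.message } };
    }
    throw err; // unexpected errors bubble up to the framework
  }
}
```

The route never touches the store except by passing it to the service, and the service never builds an HTTP response. That split is what lets the service be tested without a request object.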

Code-quality gates (CI-enforced via scripts/audit-frontend.ts): per-layer file-size caps, in lines (routes 600 / components 400 / hooks 500 / lib 600 / db 600), ratcheted against a baseline so files can only shrink. See Agent guide (CLAUDE.md) and the team playbook ops-vault/kos/playbooks/code-quality.md.
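A ratcheted cap can be sketched as a single check per file. This is an illustration only, not the contents of scripts/audit-frontend.ts: the `FileStat` and `Baseline` shapes and the `auditFile` name are hypothetical, and the sketch assumes the baseline records the grandfathered line count of each file that was already over its cap.

```typescript
type Layer = "routes" | "components" | "hooks" | "lib" | "db";

// Caps in lines, as listed in this section.
const CAPS: Record<Layer, number> = { routes: 600, components: 400, hooks: 500, lib: 600, db: 600 };

interface FileStat { path: string; layer: Layer; lines: number }

// Hypothetical baseline format: path -> line count recorded when the ratchet started.
type Baseline = Record<string, number>;

// Returns a violation message, or null when the file passes.
function auditFile(file: FileStat, baseline: Baseline): string | null {
  const cap = CAPS[file.layer];
  if (file.lines <= cap) return null; // under the cap: always fine

  const grandfathered = baseline[file.path];
  if (grandfathered === undefined) {
    return `${file.path}: ${file.lines} lines exceeds the ${file.layer} cap of ${cap}`;
  }
  if (file.lines > grandfathered) {
    return `${file.path}: grew from ${grandfathered} to ${file.lines} lines (cap ${cap})`;
  }
  return null; // over the cap but no larger than the baseline: tolerated until it shrinks
}
```

Under these assumptions a file already over its cap passes as long as it does not grow, while a new file over the cap fails immediately. Lowering the baseline entry whenever a file shrinks is what turns the check into a ratchet.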

3.3 Data model

Schema is split by domain under src/db/schema/: users, test-cycles (cycles + testers membership), issues (issues + payouts + comments + attachments), events/teams (legacy), ai_runs (AI provenance). Full reference: Database Schema. DB connection/SSL/pooling: Database.

3.4 Deployment pipeline

Push to main → GitHub Actions (.github/workflows/workflows.yml):

check        (self-hosted)   lint · tsc · vitest · file-size audit
   │
build-image  (ubuntu-latest) docker build + push → GHCR        ← hosted on purpose
   │
deploy       (self-hosted)   SSH → droplet → deploy.sh → health-check.sh

build-image runs on a GitHub-hosted runner (ephemeral, ~16 GB) so the heavy Next.js build can't OOM the droplet — see §5.4.
deploy runs on the self-hosted runner and only SSHes to the droplet: docker compose pull → down → up → migrate → health-check.
e2e (Playwright) is gated on the E2E_ENABLED repo variable.
Concurrency: PR runs cancel-on-superseded; main runs never cancel (deploys serialize so they don't race the droplet).

Operational detail: Deployment · Self Hosted Runner.

4. Permissions

Two distinct systems: in-app authorization (who can do what in the product) and repo/ops governance (who can ship and touch infra).

4.1 In-app RBAC — a two-axis model

Axis 1 — system role (users.role_id, global), from ROLES (src/db/schema/_enums.ts):

Role	id	Meaning
`SUPER_ADMIN`	5	Admin plus gated money/role actions
`ADMIN`	1	Everything except the super-admin-gated money/role actions; can impersonate any user
`QA`	4	"Tester" in the UI — a label only, not the access gate
`STUDENT`	2	Legacy LMS role
`GUEST`	3	Public landing only

Axis 2 — per-cycle role (testers.role, scoped to one cycle), from TESTER_ROLES: lead / tester / observer.

Enforcement (src/lib/auth/):

isAdminRoleId() — admin or super-admin clears any admin gate (roles.ts).
checkTestCycleAccess() — admins pass globally; everyone else needs a testers row on that specific cycle (test-cycle-auth.ts). The QA system role does not by itself grant cycle access — the testers row does.
API routes wrap handlers in requireAdmin* / requireSuperAdmin* / createProtectedRoute → 401/403 (api-middleware.ts).
requireSuperAdmin narrows money + role-change actions to the super-admin tier — a deliberate privilege separation.
Clerk middleware rewrites unauth /app/* → 404 (why a logged-out request to /app/issues returns 404, not a redirect).
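The two-axis check can be sketched in a few lines. The function names `isAdminRoleId` and `checkTestCycleAccess` come from this page, but the signatures, the `CycleAccess` return shape, and the in-memory `testers` array (standing in for a database query) are assumptions made for illustration.

```typescript
// Role ids as listed in the Axis 1 table; the real constants live in src/db/schema/_enums.ts.
const ROLES = { ADMIN: 1, STUDENT: 2, GUEST: 3, QA: 4, SUPER_ADMIN: 5 } as const;
type TesterRole = "lead" | "tester" | "observer";

interface User { id: string; roleId: number }
interface TesterRow { userId: string; cycleId: string; role: TesterRole }

// Hypothetical return shape: how access was granted, and the per-cycle role if any.
interface CycleAccess {
  allowed: boolean;
  via: "admin" | "membership" | null;
  cycleRole: TesterRole | null;
}

// Axis 1: admin or super-admin clears any admin gate.
function isAdminRoleId(roleId: number): boolean {
  return roleId === ROLES.ADMIN || roleId === ROLES.SUPER_ADMIN;
}

// Axis 2: admins pass globally; everyone else needs a testers row on this cycle.
function checkTestCycleAccess(user: User, cycleId: string, testers: readonly TesterRow[]): CycleAccess {
  if (isAdminRoleId(user.roleId)) {
    return { allowed: true, via: "admin", cycleRole: null };
  }
  const row = testers.find((t) => t.userId === user.id && t.cycleId === cycleId);
  if (row === undefined) {
    return { allowed: false, via: null, cycleRole: null };
  }
  return { allowed: true, via: "membership", cycleRole: row.role };
}
```

Note that the system role is only consulted for the admin shortcut. A `QA` user without a membership row is denied, and a user with any non-admin system role is allowed once the row exists.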

Reference: Authentication. Known cleanups: a legacy 'student' | 'admin' string model coexists with the numeric roles (getUserRole() returns null for QA — harmless today but a trap); and app/api/_utils/validation.ts (637 lines) is over the lib cap.

4.2 Repo & ops governance

Surface	Today	Proper-management target
Branch protection	`main` requires an approving review → solo merges need `gh pr merge --admin` to bypass	Solo: require status checks, drop required-review (no `--admin` needed). 2+ people: required review + `CODEOWNERS`, never self-merge
Required checks	`check` + `build` run on PRs	Mark them Required in branch protection so nothing merges red
GHCR auth	✅ ephemeral per-run `GITHUB_TOKEN` (expires with the run)	keep as-is
Droplet access	`DROPLET_SSH_KEY/HOST/USER` in GitHub secrets, deploy user only	rotate the SSH key; consider a `production` GitHub Environment with required reviewers to gate deploys
Self-hosted runner	runs on the prod box; private repo so PRs are trusted	separate build VM (never next to prod); never attach to a public repo (fork PRs = RCE) — see Self Hosted Runner
App audit trail	role/payout actions exist; `user.role_changed` event planned	capture role-change + payout events so privilege use is traceable

5. How we scale it

The app is stateless (sessions live in Clerk; all state is in Postgres + object storage), so most scaling is "add capacity to a layer." The current bottleneck is everything-on-one-droplet, not the code. Tackle in this order.

5.1 App tier — horizontal scale

The container holds no local state, so it scales horizontally cleanly:

Run N app containers behind a load balancer (DO Load Balancer or Caddy upstreams) instead of one.
One caveat — the scheduler. Production Node runs an in-process auto-transition scheduler. Running it in every replica would double-fire. Set SCHEDULER_DISABLED=true on all but one replica (kill-switch already exists — see Deployment), or move the scheduler to a single worker (§5.3).
Move the app off the DB box first — even before multiple replicas, separating app and Postgres onto different droplets removes the largest contention (and the OOM-near-prod risk).
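The scheduler caveat reduces to one rule: exactly one replica may start the scheduler. A minimal sketch follows, assuming the kill-switch is read as the exact string `"true"`; the real parsing lives in the app and may differ. `shouldStartScheduler` and `replicaEnvs` are hypothetical helper names.

```typescript
type Env = Record<string, string | undefined>;

// Assumption: the kill-switch is active only when the variable equals "true".
function shouldStartScheduler(env: Env): boolean {
  return env.SCHEDULER_DISABLED !== "true";
}

// Build per-replica environments: replica 0 keeps the scheduler, the rest disable it.
function replicaEnvs(count: number, base: Env): Env[] {
  return Array.from({ length: count }, (_, index) => ({
    ...base,
    SCHEDULER_DISABLED: index === 0 ? "false" : "true",
  }));
}

// Count how many replicas would run the scheduler; should be 1 for any count >= 1.
function activeSchedulers(envs: readonly Env[]): number {
  return envs.filter(shouldStartScheduler).length;
}
```

A check like `activeSchedulers(replicaEnvs(n, base)) === 1` is cheap to run in a deploy script before rollout, which guards against a replica being added without the flag.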

5.2 Database — the real ceiling

All reads + writes hit one primary today. Scale path (the design is already covered in Database):

Move to managed Postgres (DO Managed DB) — daily backups + PITR, the single highest-leverage reliability win. SSL is already env-driven (DB_CA_CERT/DB_SSL_MODE), so cutover is config, not code.
Front with a connection pooler (PgBouncer / provider pooler) — point DATABASE_URL at the pooler rather than raising DB_POOL_MAX. This is what lets many app replicas share a bounded connection count.
Read replica — DATABASE_URL_REPLICA is reserved but not yet wired into src/db/index.ts. When report/export/leaderboard reads dominate, route them to a replica (a future PR).
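Two small calculations sit behind the pooler and replica steps. The sketch below is illustrative: `fitsServerLimit` and `pickDatabaseUrl` are hypothetical helpers, the pool size of 20 in the usage note is an assumed `DB_POOL_MAX`, and the replica routing is not wired into src/db/index.ts today. The server limits in the usage note are PostgreSQL's stock defaults (`max_connections` = 100, `superuser_reserved_connections` = 3); a managed provider will likely set different values.

```typescript
type Env = Record<string, string | undefined>;

// Without a pooler, each app replica opens up to poolMax connections of its own.
function directConnections(appReplicas: number, poolMax: number): number {
  return appReplicas * poolMax;
}

// True when all replicas at full pool size still fit under the server limit.
function fitsServerLimit(
  appReplicas: number,
  poolMax: number,
  maxConnections: number,
  reservedConnections: number,
): boolean {
  return directConnections(appReplicas, poolMax) <= maxConnections - reservedConnections;
}

// Route reads to the replica when one is configured; everything else goes to the primary.
function pickDatabaseUrl(env: Env, intent: "read" | "write"): string {
  const primary = env.DATABASE_URL;
  if (primary === undefined || primary === "") {
    throw new Error("DATABASE_URL is required");
  }
  if (intent === "read" && env.DATABASE_URL_REPLICA) {
    return env.DATABASE_URL_REPLICA;
  }
  return primary;
}
```

With the assumed pool size of 20 and stock limits, four replicas need 80 connections and fit under the 97 available, while five replicas need 100 and do not. That arithmetic is why the pooler comes before adding replicas rather than raising `DB_POOL_MAX`.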

5.3 Background / async work — durability gap

There is no queue and no worker today. The three AI agents (intake, run-summary, cycle-close) run fire-and-forget in the request process (void runIssueIntake(...)). They don't block responses, but if the process restarts mid-call, the ai_runs row stays stuck in the running state and the result is lost, with no retry. Phase 1 (shipped) added idempotency + a Postgres rate limiter on the payout/AI paths. Full analysis + the proposed queue migration: Background Jobs.

Scale path: extract a durable job queue (Postgres-backed table → a single worker process, or a managed queue) so AI work, batch payouts, and the scheduler move off the HTTP path and survive restarts. This also unblocks running multiple app replicas (the worker owns the scheduler).
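The property the queue has to provide is that a job claimed by a process that then dies becomes claimable again. The in-memory sketch below illustrates that lease-and-retry behavior only; it is not the proposed design from Background Jobs. `JobQueue` and its methods are hypothetical, and a Postgres-backed version would typically claim rows with `SELECT ... FOR UPDATE SKIP LOCKED` instead of scanning an array.

```typescript
type JobStatus = "queued" | "running" | "done" | "failed";

interface Job {
  id: number;
  kind: string;
  status: JobStatus;
  attempts: number;
  leaseExpiresAt: number | null; // ms timestamp; null when not claimed
}

class JobQueue {
  private readonly jobs: Job[] = [];
  private nextId = 1;
  private readonly leaseMs: number;
  private readonly maxAttempts: number;

  constructor(leaseMs: number, maxAttempts: number) {
    this.leaseMs = leaseMs;
    this.maxAttempts = maxAttempts;
  }

  enqueue(kind: string): number {
    const id = this.nextId++;
    this.jobs.push({ id, kind, status: "queued", attempts: 0, leaseExpiresAt: null });
    return id;
  }

  // Claim the oldest job that is queued, or running with an expired lease
  // (its worker is presumed dead). Jobs out of attempts are marked failed.
  claim(now: number): Job | null {
    for (const job of this.jobs) {
      const leaseExpired =
        job.status === "running" && job.leaseExpiresAt !== null && job.leaseExpiresAt <= now;
      if (job.status !== "queued" && !leaseExpired) continue;
      if (job.attempts >= this.maxAttempts) {
        job.status = "failed";
        job.leaseExpiresAt = null;
        continue;
      }
      job.status = "running";
      job.attempts += 1;
      job.leaseExpiresAt = now + this.leaseMs;
      return { ...job }; // copy, so callers cannot mutate queue state
    }
    return null;
  }

  complete(id: number): void {
    const job = this.jobs.find((j) => j.id === id);
    if (job !== undefined && job.status === "running") {
      job.status = "done";
      job.leaseExpiresAt = null;
    }
  }

  statusOf(id: number): JobStatus | undefined {
    return this.jobs.find((j) => j.id === id)?.status;
  }
}
```

Contrast this with the current fire-and-forget call: there, a restart leaves the row in running forever. Here the expired lease makes the same job eligible for another attempt, and the attempt limit stops a poison job from retrying indefinitely.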

5.4 CI/CD — capacity & isolation

Today's state is incident-driven: the Docker build was OOM-killing the self-hosted runner because it ran on the small droplet that also hosts prod. Fix shipped (#273): the heavy build now runs on a GitHub-hosted runner, and the droplet only does the lightweight SSH deploy. #274 added swap + disk guards as a backstop.

Scale path:

Dedicated build VM — the proper end state per Self Hosted Runner: a separate s-2vcpu-4gb+ droplet for the runner, never co-located with prod. Then you can move the build back self-hosted (fixed cost, warm caches) without risking the app.
Runner pool — the workflow already uses labels (self-hosted-build, self-hosted-svc); add runners to a label to parallelize PR checks as the team grows.

5.5 Frontend bundle

Every route pays a ~102 kB shared baseline; the heaviest routes hit ~240 kB First Load JS (admin cycle detail, issue detail). Code-splitting is backlog — the audit + targets are captured in Frontend Bundle. The file-size refactor train (controller-hook + subfolder splits) is the groundwork that makes lazy-loading these routes tractable.

5.6 AI cost

Rate-limiting per user/org is shipped. Spend is recorded in ai_runs.cost_usd_cents, but the per-cycle cost cap is not enforced. Before opening AI usage wide, wire the cap to actually short-circuit calls.
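Enforcing the cap amounts to a pre-flight check before each agent call. The sketch below assumes each ai_runs row carries a cycle id and its cost in cents; the `AiRun` shape, the `decideAiCall` name, and the idea of passing an estimated cost for the next call are all hypothetical.

```typescript
// Hypothetical projection of an ai_runs row; costUsdCents mirrors ai_runs.cost_usd_cents.
interface AiRun { cycleId: string; costUsdCents: number }

interface CapDecision { allowed: boolean; spentCents: number; remainingCents: number }

// Sum what the cycle has already spent and allow the call only if the
// estimated cost of the next call still fits under the cap.
function decideAiCall(
  runs: readonly AiRun[],
  cycleId: string,
  capCents: number,
  estimatedCents: number,
): CapDecision {
  const spentCents = runs
    .filter((run) => run.cycleId === cycleId)
    .reduce((sum, run) => sum + run.costUsdCents, 0);
  const remainingCents = Math.max(0, capCents - spentCents);
  return { allowed: estimatedCents <= remainingCents, spentCents, remainingCents };
}
```

An agent entry point would call this first and return early when `allowed` is false, so a cycle that has reached its cap stops generating provider calls instead of only logging them.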

5.7 Staged roadmap

Stage	Trigger	Moves
Now	single droplet, all-in-one	✅ build on hosted runner · swap + disk guards
Harden	first paying cycles	managed Postgres (backups/PITR) · split app ↔ DB ↔ runner onto separate VMs
10×	sustained load	connection pooler · durable job queue + worker · multiple app replicas behind LB (`SCHEDULER_DISABLED` on extras)
Beyond	read-heavy reporting	read replica (`DATABASE_URL_REPLICA`) · route code-splitting · enforce AI cost caps

6. Where to find detail

Topic	Doc
Full feature registry + flows	Feature Matrix, Flows
Tech stack inventory	Tech Stack
Backend architecture pattern	Service Layer
Database schema	Database Schema
DB ops (topology, backups, restore)	Database
Auth & permissions	Authentication
Deploy pipeline + host runbook	Deployment
Self-hosted runner	Self Hosted Runner
Scaling: background jobs	Background Jobs
Scaling: frontend bundle	Frontend Bundle
Code-quality gates	Agent guide (CLAUDE.md)
