Deployment
CI handles everything. After merging to main, the GitHub Actions pipeline:
- Builds the Docker image and pushes it to GHCR
- SSHs to the droplet and pulls the new image
- Migrations + seeds run automatically on container start (via scripts/docker-entrypoint.sh)
- The auto-transition scheduler runs automatically in production Node (no env knob)
So for a fresh deploy of the test cycles vertical, the answer is nothing manual. Push to main and wait.
Host resources (swap, disk) + runner recovery
The droplet runs the app container, self-managed Postgres, and (since the May-2026 migration) the self-hosted GitHub Actions runner. That's a lot on one box, so two failure modes are worth knowing.
Out of memory → "lost communication". A cold next build / Docker image
build spikes past 2 GB RAM. Without swap, the kernel OOM-killer kills
processes under pressure — and when it kills the runner agent, the job fails
with "The self-hosted runner lost communication with the server" and the
step logs come back empty (the runner died before flushing them). Fix once,
idempotently:
sudo ./scripts/setup-swap.sh 4G # creates /swapfile, persists to fstab, swappiness=10
free -h # confirm Swap line is non-zero
setup-server.sh now runs this automatically, so freshly-provisioned boxes
get swap. A box provisioned before this existed picks it up on the next
setup-server.sh run, or just run the script directly.
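The free -h check above can also be scripted, for example as a guard in a provisioning or pre-deploy step. The following is a minimal sketch, not part of the repository: swap_total_kb and has_swap are hypothetical helper names, and it assumes a Linux host where /proc/meminfo reports a SwapTotal line.

```shell
# Sketch only: swap_total_kb and has_swap are hypothetical helpers,
# not scripts that ship with the project.

# Print SwapTotal (in kB) from a meminfo-style file; defaults to /proc/meminfo.
swap_total_kb() {
  awk '/^SwapTotal:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Exit 0 only when the reported swap total is greater than zero.
has_swap() {
  total="$(swap_total_kb "${1:-}")"
  [ -n "$total" ] && [ "$total" -gt 0 ]
}
```

A possible use before a build: has_swap || echo 'no swap configured; run scripts/setup-swap.sh' >&2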
Disk full → stuck deploy. Image pulls fail halfway and containers won't
start. deploy.sh logs a warning when the Docker data root drops below 3 GB
free. To reclaim space safely (never touches volumes — Postgres data lives in
one):
docker system prune -af && docker builder prune -af
Recovering a dead runner. If a deploy failed with the "lost communication"
error, the runner agent may still be offline — and since check (and
everything downstream) runs on it, nothing will deploy until it's back:
ssh deploy@<droplet>
systemctl status 'actions.runner.*' # is it active?
sudo systemctl restart 'actions.runner.*' # if not
free -h && df -h / # confirm RAM/disk headroom
Database
The Postgres database is currently a self-managed instance on the droplet with no automated backups and no verified restore path. A move to managed Postgres (daily backups + point-in-time recovery) is planned.
For connection topology, backup policy, the restore drill, the cutover plan,
and all DB-related env vars (DATABASE_URL, DB_CA_CERT, DB_SSL_MODE,
DB_POOL_MAX, DB_STATEMENT_TIMEOUT_MS), see the operational runbook:
Database.
When you might still SSH
- The seed:tc-onboarding seed needs at least one user with role_id = 1 (ADMIN). If the database has no admin yet, the seed fails non-fatally on container start. After someone signs in to create their users row, promote them:
  ssh deploy@<droplet>
  docker exec hackorda-app sh -c \
    "psql \"\$DATABASE_URL\" -c \"UPDATE users SET role_id=1 WHERE email='you@example.com';\""
  docker compose -f ~/hackorda-mvp/docker-compose.prod.yml up -d --force-recreate
  The recreate triggers the seed again, and now it succeeds.
AI triage agent
The issue intake agent (src/lib/ai/intake.ts) is fire-and-forget — when a tester submits an issue, we call Anthropic in the background and write back aiSuggestions (suggested title / severity / bug type). The UI shows them as soft hints with one-click Apply buttons. Drafts skip the analysis.
Enable it by setting one env var on the droplet:
ANTHROPIC_API_KEY=sk-ant-…
Without it, isAiEnabled() returns false and every intake call no-ops. Issues still file fine; the AI card just never appears.
Cost: ~$0.005–0.015 per issue (Sonnet, ≤3 attached images). Logged to ai_runs.cost_usd_cents for monitoring. To see what the agent has been doing:
select kind, status, model, cost_usd_cents, latency_ms, created_at, error_message
from ai_runs order by created_at desc limit 20;
Provenance: every call appends one row to ai_runs. Failed runs land with status='failed' and the error message captured. The issue gets ai_intake_run_id pointing at the row that produced its current suggestions.
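The cost column can also be totalled for monitoring. The sketch below is an illustration under several assumptions, not the project's tooling: ai_cost_last_day_cents and cents_to_usd are hypothetical helper names, it reuses the docker exec plus psql pattern from the admin-promotion step (so it assumes psql and DATABASE_URL are available inside the hackorda-app container), and it assumes created_at is a timestamp column.

```shell
# Sketch only: both helpers are hypothetical and not part of the repository.

# Convert a cents value (possibly fractional) to dollars with two decimals.
cents_to_usd() {
  awk -v c="$1" 'BEGIN { printf "%.2f\n", c / 100 }'
}

# Print the sum of cost_usd_cents for runs created in the last 24 hours.
# Assumes psql is installed in the app container and DATABASE_URL is set there.
# -A (unaligned) and -t (tuples only) make psql print just the bare value.
ai_cost_last_day_cents() {
  docker exec hackorda-app sh -c \
    'psql "$DATABASE_URL" -At -c "select coalesce(sum(cost_usd_cents), 0) from ai_runs where created_at > now() - interval '\''1 day'\''"'
}
```

On the droplet, the two could be combined as: cents_to_usd "$(ai_cost_last_day_cents)"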
Disable for one cycle: unset the env var on the droplet, then restart the container.
Optional kill-switch
If you ever scale to multiple containers and want only one to run the scheduler:
SCHEDULER_DISABLED=true
…on the containers you want quiet. Default is on in production; you only set this if you specifically want to disable it.
Smoke test
After a deploy:
- https://hackorda.kz/app/test-cycles shows the Hackorda Onboarding — QA shake-down cycle
- File an issue with a screenshot → it appears in All Issues
- docker logs hackorda-app | grep scheduler shows [scheduler] registered
That's it. Hand the QA the onboarding handbook and test cases.
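The two smoke-test checks that do not need a browser could be scripted roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, and the page check assumes the test-cycles URL is reachable without signing in and that the cycle name appears in the served HTML, which may not hold if the page requires a session or renders client-side. Filing an issue with a screenshot remains a manual step.

```shell
# Sketch only: these helpers are hypothetical and not part of the repository.

# Succeed when the page body read from stdin mentions the onboarding cycle.
page_shows_cycle() {
  grep -F 'Hackorda Onboarding' >/dev/null
}

# Succeed when the container logs read from stdin show the scheduler registered.
logs_show_scheduler() {
  grep -F '[scheduler] registered' >/dev/null
}

# Run both automated checks against a live deploy (needs curl and docker).
smoke_test() {
  curl -fsS 'https://hackorda.kz/app/test-cycles' | page_shows_cycle \
    || { echo 'FAIL: test cycle not found on page' >&2; return 1; }
  docker logs hackorda-app 2>&1 | logs_show_scheduler \
    || { echo 'FAIL: scheduler not registered' >&2; return 1; }
  echo 'smoke test passed'
}
```

Run smoke_test on the droplet after a deploy; a non-zero exit status means one of the checks did not match.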