Database
Operational reference for the Hackorda QA Postgres database: connection topology, backup policy, restore drills, and the cutover from the self-managed droplet DB to managed Postgres.
Audience: ops / on-call. This describes a target managed-Postgres setup. Provisioning the managed instance and configuring backups are human ops tasks — see Human ops actions.
1. Connection topology
The app connects through a single pg.Pool built in
src/db/index.ts and wrapped by Drizzle.
Next.js app (Node)
│
│ pg.Pool (max=DB_POOL_MAX, idle 30s, connect 10s,
│ per-session statement_timeout)
▼
Postgres primary ← DATABASE_URL- Primary — all reads and writes go to the primary today. There is one writable endpoint.
- Read replica (future) — managed providers offer read replicas for
offloading heavy read traffic (reports, exports). The env var
DATABASE_URL_REPLICAis reserved for this. It is not yet wired intosrc/db/index.ts; adding a replica is a future PR, not part of the cutover. Until then, leave it unset. - Pooling — the app keeps connection count bounded via
DB_POOL_MAX. If connection pressure grows, front the database with the provider's built-in connection pooler (e.g. DigitalOcean / PgBouncer) rather than raising the app pool — pointDATABASE_URLat the pooler endpoint.
SSL behaviour
src/db/index.ts resolves SSL without needing a code change at cutover:
| Condition | Result |
|---|---|
DB_CA_CERT set | ssl: { ca, rejectUnauthorized: true } (verified against the bundle) |
DB_SSL_MODE=verify / require | ssl: { rejectUnauthorized: true } |
DB_SSL_MODE=no-verify / allow | ssl: { rejectUnauthorized: false } |
DB_SSL_MODE=disable | no SSL |
nothing set, NODE_ENV=production | ssl: { rejectUnauthorized: true } (safe default) |
| nothing set, non-production | no SSL (local Postgres) |
Preferred for managed Postgres: provide DB_CA_CERT (the provider's CA
bundle). If the provider does not issue a downloadable CA bundle, set
DB_SSL_MODE=verify and rely on the OS trust store (the Docker entrypoint
already exports NODE_EXTRA_CA_CERTS when DB_CA_CERT is present). Use
no-verify only as a last resort.
2. Backup policy
The self-managed droplet DB has no automated backups and no verified restore path. The managed target must close that gap:
- Automated daily backups — enable the provider's daily backup. Retain at least 7 days.
- Point-in-time recovery (PITR) — enable PITR / continuous WAL archiving so the database can be restored to any moment within the retention window (typically the last 7 days). This bounds worst-case data loss to minutes, not a full day.
- Off-box copy — managed backups live with the provider. For a
defence-in-depth copy, schedule a periodic
pg_dumpto object storage (separate from the provider account). Weekly is sufficient given PITR. - Verification — backups are only real once a restore has succeeded. Run the restore drill below at least quarterly and after any major schema change.
3. Restore drill
A rehearsal that proves the backup is restorable without touching production. Restore into a scratch database, migrate it, and smoke-check row counts. Run quarterly.
Prerequisites: a recent backup or snapshot, a scratch Postgres instance
or empty database you can write to, and psql + pg_restore / pg_dump
available locally.
-
Obtain a backup artifact.
- From a provider snapshot: create a new database from snapshot via the provider console/CLI — this gives you a fresh scratch instance.
- Or from a logical dump:
pg_dump "$DATABASE_URL" --format=custom --no-owner --no-privileges \ --file=hackorda-$(date +%Y%m%d).dump
-
Create a scratch target database. Never restore over production.
createdb -h <scratch-host> -U <user> hackorda_restore_test export SCRATCH_URL="postgres://<user>:<pw>@<scratch-host>:5432/hackorda_restore_test?sslmode=require" -
Restore into the scratch database.
pg_restore --dbname="$SCRATCH_URL" --no-owner --no-privileges \ --clean --if-exists hackorda-YYYYMMDD.dump(If you restored from a provider snapshot instead, skip this — the snapshot database is already populated.)
-
Run migrations against the scratch database. Confirms the migration chain still applies cleanly on top of the restored data.
DATABASE_URL="$SCRATCH_URL" npm run migrate -
Smoke-check row counts on key tables. Compare against production (read-only) — counts should be equal or very close (allowing for writes since the backup was taken).
psql "$SCRATCH_URL" -c " SELECT 'users' AS table, count(*) FROM users UNION ALL SELECT 'test_cycles', count(*) FROM test_cycles UNION ALL SELECT 'issues', count(*) FROM issues UNION ALL SELECT 'issue_payouts', count(*) FROM issue_payouts;"Spot-check that a recent row exists, e.g.
SELECT max(created_at) FROM issues;. -
Record the result and tear down. Note the drill date, backup timestamp, and row counts in the ops log. Drop the scratch database:
dropdb -h <scratch-host> -U <user> hackorda_restore_test
A drill fails if any step errors, if migrations do not apply, or if key table counts are unexpectedly zero or wildly off. Investigate before relying on that backup.
4. Cutover plan: self-managed droplet → managed Postgres
One-time migration. Requires a short maintenance window (expect 15–30 min for a database of current size; measure with a dry-run dump first). Announce the window in advance.
Before the window (no downtime):
- Provision the managed Postgres instance; enable daily backups + PITR.
- Capture its connection string, CA bundle, and pooler endpoint.
- Do a dry-run:
pg_dumpthe droplet DB andpg_restoreinto the managed instance to measure timing and catch errors. Discard this test data afterward (or restore again fresh during the real window).
Maintenance window:
-
Quiesce. Stop the app so no writes hit the droplet DB:
ssh deploy@<droplet> docker compose -f ~/hackorda-mvp/docker-compose.prod.yml stop app(Optionally put up a maintenance page first.)
-
Final dump of the droplet DB:
docker exec hackorda-postgres pg_dump "$DATABASE_URL" \ --format=custom --no-owner --no-privileges \ --file=/tmp/hackorda-cutover.dump docker cp hackorda-postgres:/tmp/hackorda-cutover.dump ./hackorda-cutover.dump -
Restore into the managed instance:
pg_restore --dbname="$MANAGED_DATABASE_URL" --no-owner --no-privileges \ --clean --if-exists hackorda-cutover.dump -
Repoint
DATABASE_URL. Update the production env to the managed connection string. SetDB_CA_CERTto the provider's CA bundle (orDB_SSL_MODE=verifyif no bundle). No code change is needed — the SSL logic insrc/db/index.tsalready handles both. -
Run migrations against the managed instance to confirm the schema is at head:
DATABASE_URL="$MANAGED_DATABASE_URL" npm run migrate(Migrations also run on container start via
scripts/docker-entrypoint.sh, so this also happens in step 6 — running it here surfaces problems before traffic resumes.) -
Verify. Bring the app back up pointed at the managed DB:
docker compose -f ~/hackorda-mvp/docker-compose.prod.yml up -d --force-recreate appThen run the smoke checks:
- Row counts on
users,test_cycles,issues,issue_payoutsmatch the final droplet dump (see restore drill step 5). https://hackorda.kz/app/test-cyclesloads the onboarding cycle.- File a test issue end-to-end.
docker logs hackorda-appshows no SSL or connection errors.
- Row counts on
-
Resume. Remove the maintenance page. Announce completion.
Rollback: if verification fails, repoint DATABASE_URL back to the
droplet DB and recreate the app container. The droplet DB was only
quiesced, not deleted — it is still authoritative. Keep the droplet DB
running (read-only / untouched) for at least a few days post-cutover before
decommissioning.
4a. Neon specifics (our chosen provider)
Neon is the managed Postgres we're cutting over to (see Roadmap Phase 0). These notes layer provider- and codebase-specific detail on top of the generic plan above.
Start on the DIRECT endpoint (Phase 0), move to the pooler later
Neon exposes two hostnames per database:
| Endpoint | Host | Use for |
|---|---|---|
| Direct | ep-xxx.<region>.aws.neon.tech | migrations, psql/admin, the restore drill — full session support |
| Pooled | ep-xxx-**pooler**.<region>.aws.neon.tech | the app's DATABASE_URL once you run multiple replicas — PgBouncer transaction pooling, high connection count |
For Phase 0 (single app instance), point DATABASE_URL at the DIRECT
endpoint. Everything just works — the statement_timeout guard, migrations,
session GUCs. Switch the app to the pooled endpoint in Phase 3 when adding
replicas creates connection fan-out; that's when the caveats below apply.
⚠️ statement_timeout does NOT survive the pooler
src/db/index.ts sets statement_timeout via
pool.on('connect') — a session SET. On the pooled endpoint (transaction
pooling resets session state between transactions) that SET silently won't
apply, so the runaway-query guard is lost. When you move to the pooler
(Phase 3), set it at the database level so it holds regardless of pooling —
run once against the direct endpoint:
ALTER DATABASE neondb SET statement_timeout = '15s'; -- match DB_STATEMENT_TIMEOUT_MSThe pool.on('connect') SET keeps working on the direct endpoint, so
migrations/admin retain the guard for free. Same rule for any future session
GUC: prefer database/role-level ALTER for the pooled path.
SSL — no CA bundle needed
Neon's server cert chains to a public CA already trusted by the OS, so you do
not need DB_CA_CERT. Just set:
DB_SSL_MODE=require(Neon's DATABASE_URL already carries ?sslmode=require; the env var documents
intent and drives the SSL branch in src/db/index.ts.)
Restore drill + e2e: use a Neon branch, not a scratch instance
Replace restore-drill step 2 ("create a scratch database") with a Neon
branch — an instant copy-on-write clone of the primary. Branch main →
restore-test, point SCRATCH_URL at the branch, run the §3 drill, delete the
branch. Near-instant and isolated; the CI e2e job can do the same instead of
seeding a fresh Postgres each run.
Migrations run against DIRECT
npm run migrate (and the container-start migration in
scripts/docker-entrypoint.sh) need the direct endpoint — DDL + advisory
locks don't play well with transaction pooling. In Phase 0 this is automatic
(DATABASE_URL is direct). After the Phase-3 pooler switch, keep a separate
direct URL for migrations (e.g. DIRECT_DATABASE_URL) or run them manually
against direct during deploys.
Autosuspend
Neon scale-to-zero suspends idle compute. Harmless in prod — the always-on worker (roadmap Phase 1) keeps it warm. On low-traffic preview/staging, expect a ~1s cold start on the first query after idle.
Read replica (Phase 3 hook)
Neon read replicas are a console toggle. When Phase 3 wires
DATABASE_URL_REPLICA into src/db/index.ts, point it at a Neon read
replica endpoint to offload reports / exports / leaderboard reads from the
primary.
5. Environment variables
| Variable | Required | Purpose |
|---|---|---|
DATABASE_URL | yes | Postgres connection string. Point at the managed primary (or its pooler endpoint) after cutover. |
DB_CA_CERT | recommended (prod) | PEM CA bundle from the managed provider. When set, the connection is verified against it. Written to a file and exported as NODE_EXTRA_CA_CERTS by scripts/docker-entrypoint.sh. |
DB_SSL_MODE | optional | Overrides SSL behaviour when DB_CA_CERT is unset: verify/require, no-verify/allow, or disable. See the SSL table. |
DB_POOL_MAX | optional (def. 20) | Max pooled connections. Keep below the provider's max_connections. |
DB_STATEMENT_TIMEOUT_MS | optional (def. 15000) | Per-session statement_timeout; runaway queries are killed by Postgres. |
DATABASE_URL_REPLICA | reserved | Read replica endpoint. Not yet consumed by the app — leave unset until replica support ships. |
In production, set either DB_CA_CERT or DB_SSL_MODE=verify. With
neither set, the app still enables SSL with rejectUnauthorized: true, but
an explicit value documents intent.
Human ops actions still required
The following are not done by code and must be performed by a human:
- Provision the managed Postgres instance with the provider.
- Configure backups — enable daily automated backups and PITR; set
retention; (optionally) schedule an off-box
pg_dump. - Run the cutover (section 4) during a maintenance window.
- Run the restore drill (section 3) to verify backups are restorable — then repeat quarterly.
See also Deployment for the deploy pipeline.