Database

Operational reference for the Hackorda QA Postgres database: connection topology, backup policy, restore drills, and the cutover from the self-managed droplet DB to managed Postgres.

Audience: ops / on-call. This describes a target managed-Postgres setup. Provisioning the managed instance and configuring backups are human ops tasks — see Human ops actions.

1. Connection topology

The app connects through a single pg.Pool built in src/db/index.ts and wrapped by Drizzle.

Next.js app (Node)
      │
      │  pg.Pool  (max=DB_POOL_MAX, idle 30s, connect 10s,
      │            per-session statement_timeout)
      ▼
 Postgres primary   ← DATABASE_URL

Primary — all reads and writes go to the primary today. There is one writable endpoint.
Read replica (future) — managed providers offer read replicas for offloading heavy read traffic (reports, exports). The env var DATABASE_URL_REPLICA is reserved for this. It is not yet wired into src/db/index.ts; adding a replica is a future PR, not part of the cutover. Until then, leave it unset.
Pooling — the app keeps connection count bounded via DB_POOL_MAX. If connection pressure grows, front the database with the provider's built-in connection pooler (e.g. DigitalOcean / PgBouncer) rather than raising the app pool — point DATABASE_URL at the pooler endpoint.

SSL behaviour

src/db/index.ts resolves SSL without needing a code change at cutover:

Condition	Result
`DB_CA_CERT` set	`ssl: { ca, rejectUnauthorized: true }` (verified against the bundle)
`DB_SSL_MODE=verify` / `require`	`ssl: { rejectUnauthorized: true }`
`DB_SSL_MODE=no-verify` / `allow`	`ssl: { rejectUnauthorized: false }`
`DB_SSL_MODE=disable`	no SSL
nothing set, `NODE_ENV=production`	`ssl: { rejectUnauthorized: true }` (safe default)
nothing set, non-production	no SSL (local Postgres)

Preferred for managed Postgres: provide DB_CA_CERT (the provider's CA bundle). If the provider does not issue a downloadable CA bundle, set DB_SSL_MODE=verify and rely on the OS trust store (the Docker entrypoint already exports NODE_EXTRA_CA_CERTS when DB_CA_CERT is present). Use no-verify only as a last resort.

2. Backup policy

The self-managed droplet DB has no automated backups and no verified restore path. The managed target must close that gap:

Automated daily backups — enable the provider's daily backup. Retain at least 7 days.
Point-in-time recovery (PITR) — enable PITR / continuous WAL archiving so the database can be restored to any moment within the retention window (typically the last 7 days). This bounds worst-case data loss to minutes, not a full day.
Off-box copy — managed backups live with the provider. For a defence-in-depth copy, schedule a periodic pg_dump to object storage (separate from the provider account). Weekly is sufficient given PITR.
Verification — backups are only real once a restore has succeeded. Run the restore drill below at least quarterly and after any major schema change.

3. Restore drill

A rehearsal that proves the backup is restorable without touching production. Restore into a scratch database, migrate it, and smoke-check row counts. Run quarterly.

Prerequisites: a recent backup or snapshot, a scratch Postgres instance or empty database you can write to, and psql + pg_restore / pg_dump available locally.

Obtain a backup artifact.
- From a provider snapshot: create a new database from snapshot via the provider console/CLI — this gives you a fresh scratch instance.
- Or from a logical dump:
```
pg_dump "$DATABASE_URL" --format=custom --no-owner --no-privileges \
  --file=hackorda-$(date +%Y%m%d).dump
```

Create a scratch target database. Never restore over production.

createdb -h <scratch-host> -U <user> hackorda_restore_test
export SCRATCH_URL="postgres://<user>:<pw>@<scratch-host>:5432/hackorda_restore_test?sslmode=require"

Restore into the scratch database.
```
pg_restore --dbname="$SCRATCH_URL" --no-owner --no-privileges \
  --clean --if-exists hackorda-YYYYMMDD.dump
```
(If you restored from a provider snapshot instead, skip this — the snapshot database is already populated.)
Run migrations against the scratch database. Confirms the migration chain still applies cleanly on top of the restored data.
```
DATABASE_URL="$SCRATCH_URL" npm run migrate
```

Smoke-check row counts on key tables. Compare against production (read-only) — counts should be equal or very close (allowing for writes since the backup was taken).

psql "$SCRATCH_URL" -c "
  SELECT 'users'         AS table, count(*) FROM users
  UNION ALL SELECT 'test_cycles',    count(*) FROM test_cycles
  UNION ALL SELECT 'issues',         count(*) FROM issues
  UNION ALL SELECT 'issue_payouts',  count(*) FROM issue_payouts;"

Spot-check that a recent row exists, e.g. SELECT max(created_at) FROM issues;.

Record the result and tear down. Note the drill date, backup timestamp, and row counts in the ops log. Drop the scratch database:
```
dropdb -h <scratch-host> -U <user> hackorda_restore_test
```

A drill fails if any step errors, if migrations do not apply, or if key table counts are unexpectedly zero or wildly off. Investigate before relying on that backup.

4. Cutover plan: self-managed droplet → managed Postgres

One-time migration. Requires a short maintenance window (expect 15–30 min for a database of current size; measure with a dry-run dump first). Announce the window in advance.

Before the window (no downtime):

Provision the managed Postgres instance; enable daily backups + PITR.
Capture its connection string, CA bundle, and pooler endpoint.
Do a dry-run: pg_dump the droplet DB and pg_restore into the managed instance to measure timing and catch errors. Discard this test data afterward (or restore again fresh during the real window).

Maintenance window:

Quiesce. Stop the app so no writes hit the droplet DB:
```
ssh deploy@<droplet>
docker compose -f ~/hackorda-mvp/docker-compose.prod.yml stop app
```
(Optionally put up a maintenance page first.)

Final dump of the droplet DB:

docker exec hackorda-postgres pg_dump "$DATABASE_URL" \
  --format=custom --no-owner --no-privileges \
  --file=/tmp/hackorda-cutover.dump
docker cp hackorda-postgres:/tmp/hackorda-cutover.dump ./hackorda-cutover.dump

Restore into the managed instance:

pg_restore --dbname="$MANAGED_DATABASE_URL" --no-owner --no-privileges \
  --clean --if-exists hackorda-cutover.dump

Repoint DATABASE_URL. Update the production env to the managed connection string. Set DB_CA_CERT to the provider's CA bundle (or DB_SSL_MODE=verify if no bundle). No code change is needed — the SSL logic in src/db/index.ts already handles both.
Run migrations against the managed instance to confirm the schema is at head:
```
DATABASE_URL="$MANAGED_DATABASE_URL" npm run migrate
```
(Migrations also run on container start via scripts/docker-entrypoint.sh, so this also happens in step 6 — running it here surfaces problems before traffic resumes.)
Verify. Bring the app back up pointed at the managed DB:
```
docker compose -f ~/hackorda-mvp/docker-compose.prod.yml up -d --force-recreate app
```
Then run the smoke checks:
- Row counts on users, test_cycles, issues, issue_payouts match the final droplet dump (see restore drill step 5).
- https://hackorda.kz/app/test-cycles loads the onboarding cycle.
- File a test issue end-to-end.
- docker logs hackorda-app shows no SSL or connection errors.
Resume. Remove the maintenance page. Announce completion.

Rollback: if verification fails, repoint DATABASE_URL back to the droplet DB and recreate the app container. The droplet DB was only quiesced, not deleted — it is still authoritative. Keep the droplet DB running (read-only / untouched) for at least a few days post-cutover before decommissioning.

4a. Neon specifics (our chosen provider)

Neon is the managed Postgres we're cutting over to (see Roadmap Phase 0). These notes layer provider- and codebase-specific detail on top of the generic plan above.

Start on the DIRECT endpoint (Phase 0), move to the pooler later

Neon exposes two hostnames per database:

Endpoint	Host	Use for
Direct	`ep-xxx.<region>.aws.neon.tech`	migrations, `psql`/admin, the restore drill — full session support
Pooled	`ep-xxx-pooler.<region>.aws.neon.tech`	the app's `DATABASE_URL` once you run multiple replicas — PgBouncer transaction pooling, high connection count

For Phase 0 (single app instance), point DATABASE_URL at the DIRECT endpoint. Everything just works — the statement_timeout guard, migrations, session GUCs. Switch the app to the pooled endpoint in Phase 3 when adding replicas creates connection fan-out; that's when the caveats below apply.

⚠️ `statement_timeout` does NOT survive the pooler

src/db/index.ts sets statement_timeout via pool.on('connect') — a session SET. On the pooled endpoint (transaction pooling resets session state between transactions) that SET silently won't apply, so the runaway-query guard is lost. When you move to the pooler (Phase 3), set it at the database level so it holds regardless of pooling — run once against the direct endpoint:

ALTER DATABASE neondb SET statement_timeout = '15s';   -- match DB_STATEMENT_TIMEOUT_MS

The pool.on('connect') SET keeps working on the direct endpoint, so migrations/admin retain the guard for free. Same rule for any future session GUC: prefer database/role-level ALTER for the pooled path.

SSL — no CA bundle needed

Neon's server cert chains to a public CA already trusted by the OS, so you do not need DB_CA_CERT. Just set:

DB_SSL_MODE=require

(Neon's DATABASE_URL already carries ?sslmode=require; the env var documents intent and drives the SSL branch in src/db/index.ts.)

Restore drill + e2e: use a Neon branch, not a scratch instance

Replace restore-drill step 2 ("create a scratch database") with a Neon branch — an instant copy-on-write clone of the primary. Branch main → restore-test, point SCRATCH_URL at the branch, run the §3 drill, delete the branch. Near-instant and isolated; the CI e2e job can do the same instead of seeding a fresh Postgres each run.

Migrations run against DIRECT

npm run migrate (and the container-start migration in scripts/docker-entrypoint.sh) need the direct endpoint — DDL + advisory locks don't play well with transaction pooling. In Phase 0 this is automatic (DATABASE_URL is direct). After the Phase-3 pooler switch, keep a separate direct URL for migrations (e.g. DIRECT_DATABASE_URL) or run them manually against direct during deploys.

Variable	Required	Purpose
`DATABASE_URL`	yes	Postgres connection string. Point at the managed primary (or its pooler endpoint) after cutover.
`DB_CA_CERT`	recommended (prod)	PEM CA bundle from the managed provider. When set, the connection is verified against it. Written to a file and exported as `NODE_EXTRA_CA_CERTS` by `scripts/docker-entrypoint.sh`.
`DB_SSL_MODE`	optional	Overrides SSL behaviour when `DB_CA_CERT` is unset: `verify`/`require`, `no-verify`/`allow`, or `disable`. See the SSL table.
`DB_POOL_MAX`	optional (def. 20)	Max pooled connections. Keep below the provider's `max_connections`.
`DB_STATEMENT_TIMEOUT_MS`	optional (def. 15000)	Per-session `statement_timeout`; runaway queries are killed by Postgres.
`DATABASE_URL_REPLICA`	reserved	Read replica endpoint. Not yet consumed by the app — leave unset until replica support ships.

In production, set either DB_CA_CERT or DB_SSL_MODE=verify. With neither set, the app still enables SSL with rejectUnauthorized: true, but an explicit value documents intent.

Human ops actions still required

The following are not done by code and must be performed by a human:

Provision the managed Postgres instance with the provider.
Configure backups — enable daily automated backups and PITR; set retention; (optionally) schedule an off-box pg_dump.
Run the cutover (section 4) during a maintenance window.
Run the restore drill (section 3) to verify backups are restorable — then repeat quarterly.

See also Deployment for the deploy pipeline.