processor/.planning/phase-3-hardening/05-retire-migration-runner.md
julian 3c2c5cf50e docs(planning): file phase-3 task 3.5 — retire processor migration runner
Replaces the original "migration advisory lock" sketch. Once processor
doesn't run DDL, the lock concern delegates to Directus's db-init runner.

Context: positions hypertable + faulty column DDL currently exists in
both processor (src/db/migrations/0001 + 0002) and directus
(db-init/001/002/003). Two sources of truth for the same schema is a
known hazard — adding a column means editing two files in two repos,
and silent drift between them is invisible until runtime.

Fix: directus becomes the sole DDL owner. Processor's migration runner
is retired; only INSERT/SELECT/UPDATE remain.

Task spec covers:
- Pre-flight diff between processor migrations and directus db-init
  (must be byte/semantically equivalent before deletion)
- File-by-file deletion list
- Test infra migration (integration test moves to fixture-based schema
  setup, matching the established Phase 1.5 task 1.5.6 pattern)
- Wiki + ROADMAP updates
- compose.yaml depends_on directus: service_healthy
- Operational notes (existing migrations_applied table is left in place)

Sequence: ideally lands AFTER Phase 1.5 ships so the agent shipping the
WS endpoint isn't pulled into a side quest mid-flight.
2026-05-02 18:37:47 +02:00


Task 3.5 — Retire processor migration runner

Phase: 3 (Production hardening)
Status: Not started
Depends on: Phase 1.5 ideally landed (avoid mid-flight code churn for the agent shipping the WS endpoint). No hard code dependency.
Replaces: the original 3.5 sketch ("Migration advisory lock"). Once the processor doesn't run migrations, the lock concern is delegated to Directus's db-init runner, which is outside this service's surface.
Wiki refs: docs/wiki/entities/processor.md §"Schema ownership vs. write access" (the line that needs to change), docs/wiki/entities/directus.md §"Schema management — snapshot/apply pipeline", docs/wiki/entities/postgres-timescaledb.md

Goal

Establish directus as the single owner of all DDL against the shared Postgres database. Retire the processor's migration runner. After this task, the only DDL paths are:

  • trm/directus/db-init/*.sql (pre-schema: extensions, hypertables, raw tables Directus's snapshot-yaml format can't express).
  • trm/directus/snapshots/schema.yaml (Directus-managed user collections).
  • trm/directus/db-init-post/*.sql (post-schema: composite UNIQUE constraints on Directus-managed tables).

Processor exclusively does INSERT / SELECT / UPDATE. No CREATE, ALTER, CREATE EXTENSION, or any other DDL.
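
The no-DDL rule can be made mechanically checkable. The following is a minimal sketch, not existing processor code: the names containsDdl and assertNoDdl are hypothetical, the keyword list is an assumption, and splitting on ";" is deliberately naive (a semicolon inside a string literal would confuse it).

```typescript
// Hypothetical guard; the keyword list is an illustrative assumption.
const DDL_PATTERN = /^(CREATE|ALTER|DROP)\b/i;

/** Strip SQL comments so commented-out DDL does not trigger a false positive. */
function stripSqlComments(sql: string): string {
  return sql
    .replace(/\/\*[\s\S]*?\*\//g, " ") // block comments
    .replace(/--[^\n]*/g, " "); // line comments
}

/** True when any statement in `sql` begins with a DDL keyword. */
function containsDdl(sql: string): boolean {
  return stripSqlComments(sql)
    .split(";") // naive statement split; see the caveat in the lead-in
    .map((statement) => statement.trim())
    .some((statement) => DDL_PATTERN.test(statement));
}

/** Throws if the processor is about to send DDL to Postgres. */
function assertNoDdl(sql: string): void {
  if (containsDdl(sql)) {
    throw new Error("processor must not run DDL; schema changes belong in directus");
  }
}
```

A unit test could run assertNoDdl over every SQL string the processor ships, so a stray CREATE or ALTER fails CI instead of reaching a shared database.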

Context — why this exists

The current state has both services creating the positions hypertable and the faulty column:

  • trm/processor/src/db/migrations/0001_positions.sql and 0002_positions_faulty.sql (processor's runner, from Phase 1 task 1.4 + the recent 1.5.5 prep).
  • trm/directus/db-init/001_extensions.sql, 002_positions_hypertable.sql, 003_faulty_column.sql (directus's runner, added later when the destructive-apply incident showed positions had to exist before directus schema apply runs or it would get wiped).

Both runners are idempotent (IF NOT EXISTS, etc.) so the runtime collision is benign at the moment, but the architectural risks are real:

  • Two sources of truth. Adding a column means editing two files in two repos; either one can drift silently.
  • Schema divergence. A processor migration that adds a column the directus side doesn't know about is silently invisible to the admin UI.
  • Two migrations_applied tables, which already caused the ghost-collection apply conflict earlier in Phase 1 of directus.
  • Operator confusion. The wiki says "Directus owns the schema" but processor runs migrations — newcomers can't tell which is canonical.

The fix is the wiki's stated intent: directus owns DDL. Processor was the historical owner before directus's db-init story matured; the legacy runner survived the transition because nobody retired it.

Pre-flight (before deleting anything)

1. Confirm directus's db-init/ covers the full processor schema surface

Check that trm/directus/db-init/'s SQL is byte-equivalent (or semantically equivalent) to processor's migrations. As of writing, directus has:

  • 001_extensions.sql: CREATE EXTENSION IF NOT EXISTS timescaledb + postgis.
  • 002_positions_hypertable.sql: CREATE TABLE positions (...) + create_hypertable(...) + indexes.
  • 003_faulty_column.sql: ALTER TABLE positions ADD COLUMN IF NOT EXISTS faulty ... + positions_device_ts_idx.

Processor has:

  • src/db/migrations/0001_positions.sql: extensions + table + hypertable + positions_device_ts (UNIQUE on (device_id, ts)) + positions_ts (DESC).
  • src/db/migrations/0002_positions_faulty.sql: faulty column + positions_device_ts_idx ((device_id, ts DESC)).

Diff the two before retiring. If processor's SQL has an index, column, or constraint directus's db-init/ doesn't, the deliverable starts with porting that diff into directus's db-init/ (and snapshotting if applicable). Specific things to verify:

  • All three indexes exist in directus's db-init: positions_device_ts (UNIQUE), positions_ts, positions_device_ts_idx.
  • Column types match exactly: device_id text, ts timestamptz, ingested_at timestamptz DEFAULT now(), etc.
  • chunk_time_interval is INTERVAL '1 day' on both sides.
  • The ON CONFLICT (device_id, ts) DO NOTHING upsert path requires the UNIQUE on (device_id, ts) — that's the positions_device_ts index, not positions_device_ts_idx. Both must exist.
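
The index part of this check can be scripted as a first pass. The sketch below is an assumption-laden helper, not part of either repo: extractIndexNames and missingIndexes are hypothetical names, and the comparison is by index name only, so column lists, sort order, and uniqueness still need a manual read of both files.

```typescript
/** Collect index names declared via CREATE [UNIQUE] INDEX [CONCURRENTLY] [IF NOT EXISTS] <name>. */
function extractIndexNames(sql: string): Set<string> {
  const pattern =
    /create\s+(?:unique\s+)?index\s+(?:concurrently\s+)?(?:if\s+not\s+exists\s+)?("?)([a-z_][a-z0-9_]*)\1/gi;
  const names = new Set<string>();
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(sql)) !== null) {
    names.add(match[2].toLowerCase());
  }
  return names;
}

/** Index names present in the processor SQL but absent from the directus SQL. */
function missingIndexes(processorSql: string, directusSql: string): string[] {
  const directus = extractIndexNames(directusSql);
  return Array.from(extractIndexNames(processorSql))
    .filter((name) => !directus.has(name))
    .sort();
}

// Illustrative inputs only; these strings are NOT the real file contents.
const processorExample = `
  CREATE UNIQUE INDEX IF NOT EXISTS positions_device_ts ON positions (device_id, ts);
  CREATE INDEX IF NOT EXISTS positions_ts ON positions (ts DESC);
  CREATE INDEX IF NOT EXISTS positions_device_ts_idx ON positions (device_id, ts DESC);
`;
const directusExample = `
  CREATE INDEX IF NOT EXISTS positions_ts ON positions (ts DESC);
  CREATE INDEX IF NOT EXISTS positions_device_ts_idx ON positions (device_id, ts DESC);
`;

console.log(missingIndexes(processorExample, directusExample)); // [ 'positions_device_ts' ]
```

In this made-up input, the missing name is exactly the UNIQUE index the ON CONFLICT path depends on, which is the failure the pre-flight is meant to catch.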

2. Verify the directus db-init apply order is fixed

Per docs/wiki/entities/directus.md's 5-step boot pipeline:

1. db-init pre-schema   → positions hypertable, faulty column, timescaledb extension
2. directus bootstrap   → Directus system tables + first admin
3. directus schema apply → Directus-managed user collections
4. db-init post-schema  → composite UNIQUE constraints on user collections
5. pm2-runtime start    → server up at :8055

So when processor boots against a stage stack:

  • directus container has run steps 1-4 (positions exists; everything else exists).
  • processor container can connect and INSERT immediately.

Compose ordering. trm/deploy/compose.yaml's processor service should declare depends_on: directus with condition: service_healthy so processor doesn't try to read positions before directus's db-init has run on first deploy. Verify in this task.

Deliverables

trm/processor/

  1. Delete src/db/migrations/0001_positions.sql.
  2. Delete src/db/migrations/0002_positions_faulty.sql.
  3. Delete the migrations directory if it's now empty.
  4. Delete src/db/migrate.ts (or whatever the migration-runner module is named — the file that owns the migrations_applied table, the file walker, the pg_advisory_lock if any).
  5. Update src/main.ts to remove the await migrate(...) step from boot. Postgres pool creation stays; migration call goes.
  6. Update tests that exercise the migration runner; most likely this means deleting the corresponding test file. Integration tests that previously seeded the schema via migrate() should either:
    • (a) Use directus's db-init/*.sql files directly (read them in beforeAll, execute against the testcontainer Postgres), or
    • (b) Carry a fixture SQL file in test/fixtures/ (the same approach Phase 1.5 task 1.5.6 already takes for its integration test). Pick (b) — it's already the established pattern.
  7. Update Dockerfile to drop any migration-running step from the entrypoint (Phase 1's Dockerfile may not have this, but verify; the runtime container shouldn't carry the migrations directory if the runner is gone).
  8. Update package.json dependencies — if pg-migrate or any migration-runner library was a Phase-1-only dep, remove it.
  9. Update phase-1-throughput/04-postgres-schema.md's Done section with a note: "Migration runner retired in Phase 3 task 3.5 — see that task for context."
  10. Update ROADMAP.md to reflect the retired runner under Phase 1's "what changed since landing" note.
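
For item 6 option (b), fixture-based setup could look roughly like the sketch below. It is not the existing Phase 1.5 helper: Queryable, SYNC_MARKER, and applySchemaFixture are hypothetical names, and it assumes the driver accepts a multi-statement SQL string when no parameters are bound.

```typescript
/** Minimal structural type; intended to be satisfied by a Postgres pool or client. */
interface Queryable {
  query(sql: string): Promise<unknown>;
}

// Assumed header comment that ties the fixture to directus's db-init.
const SYNC_MARKER = "syncs with trm/directus/db-init/";

/** Apply the schema fixture to a fresh test database. */
async function applySchemaFixture(db: Queryable, fixtureSql: string): Promise<void> {
  if (!fixtureSql.includes(SYNC_MARKER)) {
    // Refuse fixtures that lost their "keep in sync" header comment.
    throw new Error(`fixture is missing the "${SYNC_MARKER}" header comment`);
  }
  await db.query(fixtureSql);
}

// Hypothetical usage inside an integration test:
//   beforeAll(async () => {
//     await applySchemaFixture(pool, await readFile("test/fixtures/schema.sql", "utf8"));
//   });
```

Checking for the header comment is a cheap way to keep the reminder about directus's db-init attached to the fixture file as it gets edited.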

trm/docs/wiki/

  1. Update wiki/entities/processor.md — drop the "Schema ownership vs. write access" caveat that says "the positions hypertable is owned by processor's migration runner." Replace with a single sentence: "Processor never runs DDL. Schema is exclusively owned by Directus (snapshot.yaml + db-init/ for things the snapshot can't express)."
  2. Update wiki/entities/directus.md — confirm the Schema-management section already lists db-init/ as covering everything (no edits unless current text implies a split).
  3. Update wiki/entities/postgres-timescaledb.md — verify the writer-side documentation; remove any "split schema authority" framing.
  4. Append docs/log.md with a note entry recording the retirement.

trm/deploy/

  1. Verify compose.yaml's processor service has depends_on: directus: { condition: service_healthy }. Add if missing.
  2. Confirm the deploy README doesn't mention the processor's migration runner anywhere.
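
Item 1 can be verified against a parsed compose file rather than by eye. The sketch below takes an already-parsed object (YAML parsing is left out), and processorWaitsForDirectus is a hypothetical helper name. It also assumes the directus service defines a healthcheck, since service_healthy is only meaningful when one exists.

```typescript
// Compose allows a short form (list of names) and a long form (map with conditions).
type DependsOn = string[] | Record<string, { condition?: string }>;

interface ComposeFile {
  services?: Record<string, { depends_on?: DependsOn }>;
}

/** True only when processor uses the long form with condition: service_healthy. */
function processorWaitsForDirectus(compose: ComposeFile): boolean {
  const dependsOn = compose.services?.processor?.depends_on;
  if (dependsOn === undefined || Array.isArray(dependsOn)) {
    return false; // the short form only orders startup; it carries no health condition
  }
  return dependsOn.directus?.condition === "service_healthy";
}

// Shape the check expects, written as the parsed equivalent of the YAML:
const expectedShape: ComposeFile = {
  services: {
    processor: { depends_on: { directus: { condition: "service_healthy" } } },
  },
};

console.log(processorWaitsForDirectus(expectedShape)); // true
```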

Specification

What stays in src/db/

  • pool.ts — Postgres Pool factory. Untouched.
  • Connection helpers, query helpers (if any). Untouched.

What goes

  • migrations/*.sql — gone.
  • migrate.ts (the runner). Gone.
  • migrations_applied table — directus's runner has its own; the processor's becomes orphaned but harmless. Don't drop it from existing databases. The retirement is a runtime change; the table is just unused. Phase 3 hardening's eventual OPERATIONS.md (task 3.7) can document a one-off DROP TABLE migrations_applied step for operators who want a clean schema.

Boot order on first deploy

1. postgres container starts → DB available.
2. directus container starts → runs 5-step boot pipeline.
   ├─ Step 1 (db-init pre-schema) creates positions hypertable + faulty column + extensions.
   ├─ Steps 2-4 set up Directus's own world.
   └─ Step 5 marks the container healthy.
3. processor container starts (depends_on: directus: service_healthy) → connects, finds positions, starts consuming.

If processor races directus on a fresh stack (no depends_on), it'll fail to find the positions table and crash-loop until directus catches up. depends_on: service_healthy makes the order deterministic.
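
As an optional complement to depends_on, the processor could fail fast at boot with a clear message instead of surfacing a missing-relation error from the first INSERT. This is a sketch only: assertPositionsTableExists and RowQueryable are hypothetical names, and it assumes positions lives in the public schema. The query is a plain SELECT, so it stays within the no-DDL rule.

```typescript
/** Minimal structural type for a client whose query results expose `rows`. */
interface RowQueryable {
  query(sql: string): Promise<{ rows: Array<Record<string, unknown>> }>;
}

/** Throws when the positions table is absent from the connected database. */
async function assertPositionsTableExists(db: RowQueryable): Promise<void> {
  // to_regclass returns NULL (instead of raising) when the relation does not exist.
  const result = await db.query("SELECT to_regclass('public.positions') AS relation");
  if (result.rows[0]?.relation == null) {
    throw new Error(
      "positions table not found; directus db-init has not run against this database",
    );
  }
}
```

Called once after pool creation in the boot path, this turns the crash-loop case into a single, self-explanatory log line per restart.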

Dev workflow

compose.dev.yaml in trm/processor (if it exists for processor-side dev) should declare depends_on: directus when running both. For pure-processor dev (no directus), the developer either:

  • Runs directus's db-init/*.sql manually against their local Postgres before booting processor.
  • Or copies the equivalent SQL into a one-off bootstrap script in processor/test/fixtures/.

Document the chosen path in processor/README.md.

What this task does NOT do

  • Does not retire directus's snapshot-managed collections.
  • Does not change Phase 1 or Phase 1.5 code paths beyond removing the migration runner step.
  • Does not introduce a new migration tool. The fix is fewer moving parts, not different ones.

Acceptance criteria

  • pnpm typecheck, pnpm lint, pnpm test clean.
  • pnpm test:integration runs green — the integration test no longer relies on migrate(); it loads schema from a fixture SQL file instead.
  • src/db/migrations/ directory is gone (or empty + gitignored).
  • No migrate() call anywhere in the source tree.
  • No migrations_applied references in processor source.
  • Stage smoke against a fresh DB: redeploy the stack, watch directus boot through its 5 steps, watch processor connect and start writing positions. No errors.
  • docs/wiki/entities/processor.md and directus.md agree: directus is the sole DDL owner.
  • docs/log.md has a note entry recording the retirement.
  • trm/deploy/compose.yaml's processor service has depends_on: directus: service_healthy.
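
The two "no references" criteria above lend themselves to an automated check. The following sketch assumes the caller has already read the source files into a path-to-contents map. findRunnerReferences is a hypothetical name, and the patterns are a guess at how the runner is referenced, so they may need adjusting to the actual module.

```typescript
// Patterns for leftovers of the retired runner (assumed spellings).
const FORBIDDEN_PATTERNS = [/\bmigrate\s*\(/, /\bmigrations_applied\b/];

/** Return "path:line" for every line that still references the migration runner. */
function findRunnerReferences(sources: Record<string, string>): string[] {
  const hits: string[] = [];
  for (const [path, text] of Object.entries(sources)) {
    text.split("\n").forEach((line, index) => {
      if (FORBIDDEN_PATTERNS.some((pattern) => pattern.test(line))) {
        hits.push(`${path}:${index + 1}`);
      }
    });
  }
  return hits;
}

// Illustrative input; not the real contents of these files.
const exampleSources = {
  "src/main.ts": "const pool = createPool();\nawait migrate(pool);\n",
  "src/db/pool.ts": "const maxConnections = 10;\n",
};

console.log(findRunnerReferences(exampleSources)); // [ 'src/main.ts:2' ]
```

An empty result corresponds to the two criteria being met; any hit points at the file and line still to clean up.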

Risks / open questions

  • Existing prod databases. If anyone has deployed the processor's migrations on a real DB, the migrations_applied table is harmless but stale. Document a one-off cleanup query for operators (in OPERATIONS.md when 3.7 lands).
  • Schema drift between processor's old migrations and directus's db-init. If the diff in pre-flight step 1 surfaces anything, that diff must land in directus's db-init/ before the processor's runner is retired. Order of operations matters: never delete the processor migration before the equivalent SQL is verified live in directus's runner.
  • Test container schema setup. The integration test fixture has to mirror what directus actually creates. If directus's db-init/ changes in a way that breaks processor's read paths, the fixture and the read paths both need updating. Mitigation: the fixture file lives in test/fixtures/ and a comment at its top says "syncs with trm/directus/db-init/ — update both when schema changes."
  • The original 3.5 ("Migration advisory lock") concern. Once processor doesn't run migrations, the advisory-lock concern is delegated to directus's runner. That's a directus concern; whether to add an advisory lock to directus's apply-db-init.sh is tracked as a follow-up in directus's own roadmap, not here.
  • PostGIS usage in Phase 2. Processor's 0001_positions.sql enables PostGIS even though Phase 1 doesn't use it. Directus's db-init/001_extensions.sql does the same. Confirm in pre-flight; no change needed if the directus side already has it.

Done

(Filled in when the task lands.)