Files
processor/.planning/phase-3-hardening/05-retire-migration-runner.md
T
julian 3c2c5cf50e docs(planning): file phase-3 task 3.5 — retire processor migration runner
Replaces the original "migration advisory lock" sketch. Once processor
doesn't run DDL, the lock concern delegates to Directus's db-init runner.

Context: positions hypertable + faulty column DDL currently exists in
both processor (src/db/migrations/0001 + 0002) and directus
(db-init/001/002/003). Two sources of truth for the same schema is a
known hazard — adding a column means editing two files in two repos,
and silent drift between them is invisible until runtime.

Fix: directus becomes the sole DDL owner. Processor's migration runner
is retired; only INSERT/SELECT/UPDATE remain.

Task spec covers:
- Pre-flight diff between processor migrations and directus db-init
  (must be byte/semantically equivalent before deletion)
- File-by-file deletion list
- Test infra migration (integration test moves to fixture-based schema
  setup, matching the established Phase 1.5 task 1.5.6 pattern)
- Wiki + ROADMAP updates
- compose.yaml depends_on directus: service_healthy
- Operational notes (existing migrations_applied table is left in place)

Sequence: ideally lands AFTER Phase 1.5 ships so the agent shipping the
WS endpoint isn't pulled into a side quest mid-flight.
2026-05-02 18:37:47 +02:00

170 lines
12 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Task 3.5 — Retire processor migration runner
**Phase:** 3 — Production hardening
**Status:** ⬜ Not started
**Depends on:** Phase 1.5 ideally landed (avoid mid-flight code churn for the agent shipping the WS endpoint). No hard code dependency.
**Replaces:** the original 3.5 sketch ("Migration advisory lock"). Once the processor doesn't run migrations, the lock concern is delegated to Directus's `db-init` runner — outside this service's surface.
**Wiki refs:** `docs/wiki/entities/processor.md` §"Schema ownership vs. write access" (the line that needs to change), `docs/wiki/entities/directus.md` §"Schema management — snapshot/apply pipeline", `docs/wiki/entities/postgres-timescaledb.md`
## Goal
Establish [[directus]] as the **single owner of all DDL** against the shared Postgres database. Retire the processor's migration runner. After this task, the only DDL paths are:
- `trm/directus/db-init/*.sql` (pre-schema: extensions, hypertables, raw tables Directus's snapshot-yaml format can't express).
- `trm/directus/snapshots/schema.yaml` (Directus-managed user collections).
- `trm/directus/db-init-post/*.sql` (post-schema: composite UNIQUE constraints on Directus-managed tables).
Processor exclusively does `INSERT` / `SELECT` / `UPDATE`. No `CREATE`, `ALTER`, `CREATE EXTENSION`, or any other DDL.
## Context — why this exists
The current state has **both** services creating the `positions` hypertable and the `faulty` column:
- `trm/processor/src/db/migrations/0001_positions.sql` and `0002_positions_faulty.sql` (processor's runner, from Phase 1 task 1.4 + the recent 1.5.5 prep).
- `trm/directus/db-init/001_extensions.sql`, `002_positions_hypertable.sql`, `003_faulty_column.sql` (directus's runner, added later when the destructive-apply incident showed positions had to exist *before* `directus schema apply` runs or it would get wiped).
Both runners are idempotent (`IF NOT EXISTS`, etc.) so the runtime collision is benign at the moment, but the architectural risks are real:
- **Two sources of truth.** Adding a column means editing two files in two repos; either one can drift silently.
- **Schema divergence.** A processor migration that adds a column the directus side doesn't know about is silently invisible to the admin UI.
- **Two `migrations_applied` tables**, which already caused the ghost-collection apply conflict earlier in Phase 1 of directus.
- **Operator confusion.** The wiki says "Directus owns the schema" but processor runs migrations — newcomers can't tell which is canonical.
The fix is the wiki's stated intent: directus owns DDL. Processor was the historical owner before directus's `db-init` story matured; the legacy runner survived the transition because nobody retired it.
## Pre-flight (before deleting anything)
### 1. Confirm directus's `db-init/` covers the full processor schema surface
Check that `trm/directus/db-init/`'s SQL is byte-equivalent (or semantically equivalent) to processor's migrations. As of writing, directus has:
- `001_extensions.sql``CREATE EXTENSION IF NOT EXISTS timescaledb` + `postgis`.
- `002_positions_hypertable.sql``CREATE TABLE positions (...)` + `create_hypertable(...)` + indexes.
- `003_faulty_column.sql``ALTER TABLE positions ADD COLUMN IF NOT EXISTS faulty ...` + `positions_device_ts_idx`.
Processor has:
- `src/db/migrations/0001_positions.sql` — extensions + table + hypertable + `positions_device_ts` (UNIQUE on `(device_id, ts)`) + `positions_ts` (DESC).
- `src/db/migrations/0002_positions_faulty.sql``faulty` column + `positions_device_ts_idx` (`(device_id, ts DESC)`).
**Diff the two before retiring.** If processor's SQL has an index, column, or constraint directus's `db-init/` doesn't, the deliverable starts with porting that diff into directus's `db-init/` (and snapshotting if applicable). Specific things to verify:
- All four indexes exist in directus's db-init: `positions_device_ts` (UNIQUE), `positions_ts`, `positions_device_ts_idx`.
- Column types match exactly: `device_id text`, `ts timestamptz`, `ingested_at timestamptz DEFAULT now()`, etc.
- `chunk_time_interval` is `INTERVAL '1 day'` on both sides.
- The `ON CONFLICT (device_id, ts) DO NOTHING` upsert path requires the UNIQUE on `(device_id, ts)` — that's the `positions_device_ts` index, not `positions_device_ts_idx`. Both must exist.
### 2. Verify the directus `db-init` apply order is fixed
Per `docs/wiki/entities/directus.md`'s 5-step boot pipeline:
```
1. db-init pre-schema → positions hypertable, faulty column, timescaledb extension
2. directus bootstrap → Directus system tables + first admin
3. directus schema apply → Directus-managed user collections
4. db-init post-schema → composite UNIQUE constraints on user collections
5. pm2-runtime start → server up at :8055
```
So when processor boots against a stage stack:
- `directus` container has run steps 14 (positions exists; everything else exists).
- `processor` container can connect and `INSERT` immediately.
**Compose ordering.** `trm/deploy/compose.yaml`'s `processor` service should `depends_on: directus` with `condition: service_healthy` so processor doesn't try to read `positions` before directus's db-init has run on first deploy. Verify in this task.
## Deliverables
### `trm/processor/`
1. **Delete** `src/db/migrations/0001_positions.sql`.
2. **Delete** `src/db/migrations/0002_positions_faulty.sql`.
3. **Delete** the migrations directory if it's now empty.
4. **Delete** `src/db/migrate.ts` (or whatever the migration-runner module is named — the file that owns the `migrations_applied` table, the file walker, the `pg_advisory_lock` if any).
5. **Update `src/main.ts`** to remove the `await migrate(...)` step from boot. Postgres pool creation stays; migration call goes.
6. **Update tests** that exercise the migration runner — most likely deletes the corresponding test file. Integration tests that previously seeded the schema via `migrate()` either:
- (a) Use directus's `db-init/*.sql` files directly (read them in `beforeAll`, execute against the testcontainer Postgres), or
- (b) Carry a fixture SQL file in `test/fixtures/` (the same approach Phase 1.5 task 1.5.6 already takes for its integration test).
Pick (b) — it's already the established pattern.
7. **Update `Dockerfile`** to drop any migration-running step from the entrypoint (Phase 1's Dockerfile may not have this, but verify; the runtime container shouldn't carry the migrations directory if the runner is gone).
8. **Update `package.json` `dependencies`** — if `pg-migrate` or any migration-runner library was a Phase-1-only dep, remove it.
9. **Update `phase-1-throughput/04-postgres-schema.md`'s Done section** with a note: "Migration runner retired in Phase 3 task 3.5 — see that task for context."
10. **Update `ROADMAP.md`** to reflect the retired runner under Phase 1's "what changed since landing" note.
### `trm/docs/wiki/`
1. **Update `wiki/entities/processor.md`** — drop the "Schema ownership vs. write access" caveat that says "the positions hypertable is owned by [[processor]]'s migration runner." Replace with a single sentence: "Processor never runs DDL. Schema is exclusively owned by Directus (snapshot.yaml + `db-init/` for things the snapshot can't express)."
2. **Update `wiki/entities/directus.md`** — confirm the Schema-management section already lists `db-init/` as covering everything (no edits unless current text implies a split).
3. **Update `wiki/entities/postgres-timescaledb.md`** — verify the writer-side documentation; remove any "split schema authority" framing.
4. **Append `docs/log.md`** with a `note` entry recording the retirement.
### `trm/deploy/`
1. **Verify** `compose.yaml`'s `processor` service has `depends_on: directus: { condition: service_healthy }`. Add if missing.
2. **Confirm** the deploy README doesn't mention the processor's migration runner anywhere.
## Specification
### What stays in `src/db/`
- `pool.ts` — Postgres `Pool` factory. Untouched.
- Connection helpers, query helpers (if any). Untouched.
### What goes
- `migrations/*.sql` — gone.
- `migrate.ts` (the runner). Gone.
- `migrations_applied` table — directus's runner has its own; the processor's becomes orphaned but harmless. **Don't drop it from existing databases.** The retirement is a *runtime* change; the table is just unused. Phase 3 hardening's eventual `OPERATIONS.md` (task 3.7) can document a one-off `DROP TABLE migrations_applied` step for operators who want a clean schema.
### Boot order on first deploy
```
1. postgres container starts → DB available.
2. directus container starts → runs 5-step boot pipeline.
├─ Step 1 (db-init pre-schema) creates positions hypertable + faulty column + extensions.
├─ Steps 2-4 set up Directus's own world.
└─ Step 5 marks the container healthy.
3. processor container starts (depends_on: directus: service_healthy) → connects, finds positions, starts consuming.
```
If processor races directus on a fresh stack (no `depends_on`), it'll fail to find the `positions` table and crash-loop until directus catches up. `depends_on: service_healthy` makes the order deterministic.
### Dev workflow
`compose.dev.yaml` in `trm/processor` (if it exists for processor-side dev) should `depends_on: directus` if running both. For pure-processor dev (no directus), the developer either:
- Runs directus's `db-init/*.sql` manually against their local Postgres before booting processor.
- Or copies the equivalent SQL into a one-off bootstrap script in `processor/test/fixtures/`.
Document the chosen path in `processor/README.md`.
### What this task does NOT do
- Does not retire directus's snapshot-managed collections.
- Does not change Phase 1 or Phase 1.5 code paths beyond removing the migration runner step.
- Does not introduce a new migration tool. The fix is *fewer* moving parts, not different ones.
## Acceptance criteria
- [ ] `pnpm typecheck`, `pnpm lint`, `pnpm test` clean.
- [ ] `pnpm test:integration` runs green — the integration test no longer relies on `migrate()`; it loads schema from a fixture SQL file instead.
- [ ] `src/db/migrations/` directory is gone (or empty + gitignored).
- [ ] No `migrate()` call anywhere in the source tree.
- [ ] No `migrations_applied` references in processor source.
- [ ] Stage smoke against a fresh DB: redeploy the stack, watch directus boot through its 5 steps, watch processor connect and start writing positions. No errors.
- [ ] `docs/wiki/entities/processor.md` and `directus.md` agree: directus is the sole DDL owner.
- [ ] `docs/log.md` has a `note` entry recording the retirement.
- [ ] `trm/deploy/compose.yaml`'s `processor` service has `depends_on: directus: service_healthy`.
## Risks / open questions
- **Existing prod databases.** If anyone has deployed the processor's migrations on a real DB, the `migrations_applied` table is harmless but stale. Document a one-off cleanup query for operators (in `OPERATIONS.md` when 3.7 lands).
- **Schema drift between processor's old migrations and directus's db-init.** If the diff in pre-flight step 1 surfaces anything, that diff must land in directus's `db-init/` *before* the processor's runner is retired. Order of operations matters: never delete the processor migration before the equivalent SQL is verified live in directus's runner.
- **Test container schema setup.** The integration test fixture has to mirror what directus actually creates. If directus's `db-init/` changes in a way that breaks processor's read paths, the fixture and the read paths both need updating. Mitigation: the fixture file lives in `test/fixtures/` and a comment at its top says "syncs with `trm/directus/db-init/` — update both when schema changes."
- **The original 3.5 ("Migration advisory lock") concern.** Once processor doesn't run migrations, the advisory-lock concern is delegated to directus's runner. That's a directus concern; whether to add an advisory lock to directus's `apply-db-init.sh` is tracked as a follow-up in directus's own roadmap, not here.
- **PostGIS usage in Phase 2.** Processor's `0001_positions.sql` enables PostGIS even though Phase 1 doesn't use it. Directus's `db-init/001_extensions.sql` does the same. Confirm in pre-flight; no change needed if the directus side already has it.
## Done
(Filled in when the task lands.)