3c2c5cf50e
Replaces the original "migration advisory lock" sketch. Once processor doesn't run DDL, the lock concern delegates to Directus's db-init runner. Context: positions hypertable + faulty column DDL currently exists in both processor (src/db/migrations/0001 + 0002) and directus (db-init/001/002/003). Two sources of truth for the same schema is a known hazard — adding a column means editing two files in two repos, and silent drift between them is invisible until runtime. Fix: directus becomes the sole DDL owner. Processor's migration runner is retired; only INSERT/SELECT/UPDATE remain. Task spec covers: - Pre-flight diff between processor migrations and directus db-init (must be byte/semantically equivalent before deletion) - File-by-file deletion list - Test infra migration (integration test moves to fixture-based schema setup, matching the established Phase 1.5 task 1.5.6 pattern) - Wiki + ROADMAP updates - compose.yaml depends_on directus: service_healthy - Operational notes (existing migrations_applied table is left in place) Sequence: ideally lands AFTER Phase 1.5 ships so the agent shipping the WS endpoint isn't pulled into a side quest mid-flight.
170 lines
12 KiB
Markdown
170 lines
12 KiB
Markdown
# Task 3.5 — Retire processor migration runner
|
||
|
||
**Phase:** 3 — Production hardening
|
||
**Status:** ⬜ Not started
|
||
**Depends on:** Phase 1.5 ideally landed (avoid mid-flight code churn for the agent shipping the WS endpoint). No hard code dependency.
|
||
**Replaces:** the original 3.5 sketch ("Migration advisory lock"). Once the processor doesn't run migrations, the lock concern is delegated to Directus's `db-init` runner — outside this service's surface.
|
||
**Wiki refs:** `docs/wiki/entities/processor.md` §"Schema ownership vs. write access" (the line that needs to change), `docs/wiki/entities/directus.md` §"Schema management — snapshot/apply pipeline", `docs/wiki/entities/postgres-timescaledb.md`
|
||
|
||
## Goal
|
||
|
||
Establish [[directus]] as the **single owner of all DDL** against the shared Postgres database. Retire the processor's migration runner. After this task, the only DDL paths are:
|
||
|
||
- `trm/directus/db-init/*.sql` (pre-schema: extensions, hypertables, raw tables Directus's snapshot-yaml format can't express).
|
||
- `trm/directus/snapshots/schema.yaml` (Directus-managed user collections).
|
||
- `trm/directus/db-init-post/*.sql` (post-schema: composite UNIQUE constraints on Directus-managed tables).
|
||
|
||
Processor exclusively does `INSERT` / `SELECT` / `UPDATE`. No `CREATE`, `ALTER`, `CREATE EXTENSION`, or any other DDL.
|
||
|
||
## Context — why this exists
|
||
|
||
The current state has **both** services creating the `positions` hypertable and the `faulty` column:
|
||
|
||
- `trm/processor/src/db/migrations/0001_positions.sql` and `0002_positions_faulty.sql` (processor's runner, from Phase 1 task 1.4 + the recent 1.5.5 prep).
|
||
- `trm/directus/db-init/001_extensions.sql`, `002_positions_hypertable.sql`, `003_faulty_column.sql` (directus's runner, added later when the destructive-apply incident showed positions had to exist *before* `directus schema apply` runs or it would get wiped).
|
||
|
||
Both runners are idempotent (`IF NOT EXISTS`, etc.) so the runtime collision is benign at the moment, but the architectural risks are real:
|
||
|
||
- **Two sources of truth.** Adding a column means editing two files in two repos; either one can drift silently.
|
||
- **Schema divergence.** A processor migration that adds a column the directus side doesn't know about is silently invisible to the admin UI.
|
||
- **Two `migrations_applied` tables**, which already caused the ghost-collection apply conflict earlier in Phase 1 of directus.
|
||
- **Operator confusion.** The wiki says "Directus owns the schema" but processor runs migrations — newcomers can't tell which is canonical.
|
||
|
||
The fix is the wiki's stated intent: directus owns DDL. Processor was the historical owner before directus's `db-init` story matured; the legacy runner survived the transition because nobody retired it.
|
||
|
||
## Pre-flight (before deleting anything)
|
||
|
||
### 1. Confirm directus's `db-init/` covers the full processor schema surface
|
||
|
||
Check that `trm/directus/db-init/`'s SQL is byte-equivalent (or semantically equivalent) to processor's migrations. As of writing, directus has:
|
||
|
||
- `001_extensions.sql` — `CREATE EXTENSION IF NOT EXISTS timescaledb` + `postgis`.
|
||
- `002_positions_hypertable.sql` — `CREATE TABLE positions (...)` + `create_hypertable(...)` + indexes.
|
||
- `003_faulty_column.sql` — `ALTER TABLE positions ADD COLUMN IF NOT EXISTS faulty ...` + `positions_device_ts_idx`.
|
||
|
||
Processor has:
|
||
|
||
- `src/db/migrations/0001_positions.sql` — extensions + table + hypertable + `positions_device_ts` (UNIQUE on `(device_id, ts)`) + `positions_ts` (DESC).
|
||
- `src/db/migrations/0002_positions_faulty.sql` — `faulty` column + `positions_device_ts_idx` (`(device_id, ts DESC)`).
|
||
|
||
**Diff the two before retiring.** If processor's SQL has an index, column, or constraint directus's `db-init/` doesn't, the deliverable starts with porting that diff into directus's `db-init/` (and snapshotting if applicable). Specific things to verify:
|
||
|
||
- All four indexes exist in directus's db-init: `positions_device_ts` (UNIQUE), `positions_ts`, `positions_device_ts_idx`.
|
||
- Column types match exactly: `device_id text`, `ts timestamptz`, `ingested_at timestamptz DEFAULT now()`, etc.
|
||
- `chunk_time_interval` is `INTERVAL '1 day'` on both sides.
|
||
- The `ON CONFLICT (device_id, ts) DO NOTHING` upsert path requires the UNIQUE on `(device_id, ts)` — that's the `positions_device_ts` index, not `positions_device_ts_idx`. Both must exist.
|
||
|
||
### 2. Verify the directus `db-init` apply order is fixed
|
||
|
||
Per `docs/wiki/entities/directus.md`'s 5-step boot pipeline:
|
||
|
||
```
|
||
1. db-init pre-schema → positions hypertable, faulty column, timescaledb extension
|
||
2. directus bootstrap → Directus system tables + first admin
|
||
3. directus schema apply → Directus-managed user collections
|
||
4. db-init post-schema → composite UNIQUE constraints on user collections
|
||
5. pm2-runtime start → server up at :8055
|
||
```
|
||
|
||
So when processor boots against a stage stack:
|
||
|
||
- `directus` container has run steps 1–4 (positions exists; everything else exists).
|
||
- `processor` container can connect and `INSERT` immediately.
|
||
|
||
**Compose ordering.** `trm/deploy/compose.yaml`'s `processor` service should `depends_on: directus` with `condition: service_healthy` so processor doesn't try to read `positions` before directus's db-init has run on first deploy. Verify in this task.
|
||
|
||
## Deliverables
|
||
|
||
### `trm/processor/`
|
||
|
||
1. **Delete** `src/db/migrations/0001_positions.sql`.
|
||
2. **Delete** `src/db/migrations/0002_positions_faulty.sql`.
|
||
3. **Delete** the migrations directory if it's now empty.
|
||
4. **Delete** `src/db/migrate.ts` (or whatever the migration-runner module is named — the file that owns the `migrations_applied` table, the file walker, the `pg_advisory_lock` if any).
|
||
5. **Update `src/main.ts`** to remove the `await migrate(...)` step from boot. Postgres pool creation stays; migration call goes.
|
||
6. **Update tests** that exercise the migration runner — most likely deletes the corresponding test file. Integration tests that previously seeded the schema via `migrate()` either:
|
||
- (a) Use directus's `db-init/*.sql` files directly (read them in `beforeAll`, execute against the testcontainer Postgres), or
|
||
- (b) Carry a fixture SQL file in `test/fixtures/` (the same approach Phase 1.5 task 1.5.6 already takes for its integration test).
|
||
Pick (b) — it's already the established pattern.
|
||
7. **Update `Dockerfile`** to drop any migration-running step from the entrypoint (Phase 1's Dockerfile may not have this, but verify; the runtime container shouldn't carry the migrations directory if the runner is gone).
|
||
8. **Update `package.json` `dependencies`** — if `pg-migrate` or any migration-runner library was a Phase-1-only dep, remove it.
|
||
9. **Update `phase-1-throughput/04-postgres-schema.md`'s Done section** with a note: "Migration runner retired in Phase 3 task 3.5 — see that task for context."
|
||
10. **Update `ROADMAP.md`** to reflect the retired runner under Phase 1's "what changed since landing" note.
|
||
|
||
### `trm/docs/wiki/`
|
||
|
||
1. **Update `wiki/entities/processor.md`** — drop the "Schema ownership vs. write access" caveat that says "the positions hypertable is owned by [[processor]]'s migration runner." Replace with a single sentence: "Processor never runs DDL. Schema is exclusively owned by Directus (snapshot.yaml + `db-init/` for things the snapshot can't express)."
|
||
2. **Update `wiki/entities/directus.md`** — confirm the Schema-management section already lists `db-init/` as covering everything (no edits unless current text implies a split).
|
||
3. **Update `wiki/entities/postgres-timescaledb.md`** — verify the writer-side documentation; remove any "split schema authority" framing.
|
||
4. **Append `docs/log.md`** with a `note` entry recording the retirement.
|
||
|
||
### `trm/deploy/`
|
||
|
||
1. **Verify** `compose.yaml`'s `processor` service has `depends_on: directus: { condition: service_healthy }`. Add if missing.
|
||
2. **Confirm** the deploy README doesn't mention the processor's migration runner anywhere.
|
||
|
||
## Specification
|
||
|
||
### What stays in `src/db/`
|
||
|
||
- `pool.ts` — Postgres `Pool` factory. Untouched.
|
||
- Connection helpers, query helpers (if any). Untouched.
|
||
|
||
### What goes
|
||
|
||
- `migrations/*.sql` — gone.
|
||
- `migrate.ts` (the runner). Gone.
|
||
- `migrations_applied` table — directus's runner has its own; the processor's becomes orphaned but harmless. **Don't drop it from existing databases.** The retirement is a *runtime* change; the table is just unused. Phase 3 hardening's eventual `OPERATIONS.md` (task 3.7) can document a one-off `DROP TABLE migrations_applied` step for operators who want a clean schema.
|
||
|
||
### Boot order on first deploy
|
||
|
||
```
|
||
1. postgres container starts → DB available.
|
||
2. directus container starts → runs 5-step boot pipeline.
|
||
├─ Step 1 (db-init pre-schema) creates positions hypertable + faulty column + extensions.
|
||
├─ Steps 2-4 set up Directus's own world.
|
||
└─ Step 5 marks the container healthy.
|
||
3. processor container starts (depends_on: directus: service_healthy) → connects, finds positions, starts consuming.
|
||
```
|
||
|
||
If processor races directus on a fresh stack (no `depends_on`), it'll fail to find the `positions` table and crash-loop until directus catches up. `depends_on: service_healthy` makes the order deterministic.
|
||
|
||
### Dev workflow
|
||
|
||
`compose.dev.yaml` in `trm/processor` (if it exists for processor-side dev) should `depends_on: directus` if running both. For pure-processor dev (no directus), the developer either:
|
||
|
||
- Runs directus's `db-init/*.sql` manually against their local Postgres before booting processor.
|
||
- Or copies the equivalent SQL into a one-off bootstrap script in `processor/test/fixtures/`.
|
||
|
||
Document the chosen path in `processor/README.md`.
|
||
|
||
### What this task does NOT do
|
||
|
||
- Does not retire directus's snapshot-managed collections.
|
||
- Does not change Phase 1 or Phase 1.5 code paths beyond removing the migration runner step.
|
||
- Does not introduce a new migration tool. The fix is *fewer* moving parts, not different ones.
|
||
|
||
## Acceptance criteria
|
||
|
||
- [ ] `pnpm typecheck`, `pnpm lint`, `pnpm test` clean.
|
||
- [ ] `pnpm test:integration` runs green — the integration test no longer relies on `migrate()`; it loads schema from a fixture SQL file instead.
|
||
- [ ] `src/db/migrations/` directory is gone (or empty + gitignored).
|
||
- [ ] No `migrate()` call anywhere in the source tree.
|
||
- [ ] No `migrations_applied` references in processor source.
|
||
- [ ] Stage smoke against a fresh DB: redeploy the stack, watch directus boot through its 5 steps, watch processor connect and start writing positions. No errors.
|
||
- [ ] `docs/wiki/entities/processor.md` and `directus.md` agree: directus is the sole DDL owner.
|
||
- [ ] `docs/log.md` has a `note` entry recording the retirement.
|
||
- [ ] `trm/deploy/compose.yaml`'s `processor` service has `depends_on: directus: service_healthy`.
|
||
|
||
## Risks / open questions
|
||
|
||
- **Existing prod databases.** If anyone has deployed the processor's migrations on a real DB, the `migrations_applied` table is harmless but stale. Document a one-off cleanup query for operators (in `OPERATIONS.md` when 3.7 lands).
|
||
- **Schema drift between processor's old migrations and directus's db-init.** If the diff in pre-flight step 1 surfaces anything, that diff must land in directus's `db-init/` *before* the processor's runner is retired. Order of operations matters: never delete the processor migration before the equivalent SQL is verified live in directus's runner.
|
||
- **Test container schema setup.** The integration test fixture has to mirror what directus actually creates. If directus's `db-init/` changes in a way that breaks processor's read paths, the fixture and the read paths both need updating. Mitigation: the fixture file lives in `test/fixtures/` and a comment at its top says "syncs with `trm/directus/db-init/` — update both when schema changes."
|
||
- **The original 3.5 ("Migration advisory lock") concern.** Once processor doesn't run migrations, the advisory-lock concern is delegated to directus's runner. That's a directus concern; whether to add an advisory lock to directus's `apply-db-init.sh` is tracked as a follow-up in directus's own roadmap, not here.
|
||
- **PostGIS usage in Phase 2.** Processor's `0001_positions.sql` enables PostGIS even though Phase 1 doesn't use it. Directus's `db-init/001_extensions.sql` does the same. Confirm in pre-flight; no change needed if the directus side already has it.
|
||
|
||
## Done
|
||
|
||
(Filled in when the task lands.)
|