docs(planning): file phase-3 task 3.5 — retire processor migration runner
Replaces the original "migration advisory lock" sketch. Once processor doesn't run DDL, the lock concern delegates to Directus's db-init runner. Context: positions hypertable + faulty column DDL currently exists in both processor (src/db/migrations/0001 + 0002) and directus (db-init/001/002/003). Two sources of truth for the same schema is a known hazard — adding a column means editing two files in two repos, and silent drift between them is invisible until runtime. Fix: directus becomes the sole DDL owner. Processor's migration runner is retired; only INSERT/SELECT/UPDATE remain. Task spec covers: - Pre-flight diff between processor migrations and directus db-init (must be byte/semantically equivalent before deletion) - File-by-file deletion list - Test infra migration (integration test moves to fixture-based schema setup, matching the established Phase 1.5 task 1.5.6 pattern) - Wiki + ROADMAP updates - compose.yaml depends_on directus: service_healthy - Operational notes (existing migrations_applied table is left in place) Sequence: ideally lands AFTER Phase 1.5 ships so the agent shipping the WS endpoint isn't pulled into a side quest mid-flight.
This commit is contained in:
@@ -0,0 +1,169 @@
|
||||
# Task 3.5 — Retire processor migration runner
|
||||
|
||||
**Phase:** 3 — Production hardening
|
||||
**Status:** ⬜ Not started
|
||||
**Depends on:** Phase 1.5 ideally landed (avoid mid-flight code churn for the agent shipping the WS endpoint). No hard code dependency.
|
||||
**Replaces:** the original 3.5 sketch ("Migration advisory lock"). Once the processor doesn't run migrations, the lock concern is delegated to Directus's `db-init` runner — outside this service's surface.
|
||||
**Wiki refs:** `docs/wiki/entities/processor.md` §"Schema ownership vs. write access" (the line that needs to change), `docs/wiki/entities/directus.md` §"Schema management — snapshot/apply pipeline", `docs/wiki/entities/postgres-timescaledb.md`
|
||||
|
||||
## Goal
|
||||
|
||||
Establish [[directus]] as the **single owner of all DDL** against the shared Postgres database. Retire the processor's migration runner. After this task, the only DDL paths are:
|
||||
|
||||
- `trm/directus/db-init/*.sql` (pre-schema: extensions, hypertables, raw tables Directus's snapshot-yaml format can't express).
|
||||
- `trm/directus/snapshots/schema.yaml` (Directus-managed user collections).
|
||||
- `trm/directus/db-init-post/*.sql` (post-schema: composite UNIQUE constraints on Directus-managed tables).
|
||||
|
||||
Processor exclusively does `INSERT` / `SELECT` / `UPDATE`. No `CREATE`, `ALTER`, `CREATE EXTENSION`, or any other DDL.
|
||||
|
||||
## Context — why this exists
|
||||
|
||||
The current state has **both** services creating the `positions` hypertable and the `faulty` column:
|
||||
|
||||
- `trm/processor/src/db/migrations/0001_positions.sql` and `0002_positions_faulty.sql` (processor's runner, from Phase 1 task 1.4 + the recent 1.5.5 prep).
|
||||
- `trm/directus/db-init/001_extensions.sql`, `002_positions_hypertable.sql`, `003_faulty_column.sql` (directus's runner, added later when the destructive-apply incident showed positions had to exist *before* `directus schema apply` runs or it would get wiped).
|
||||
|
||||
Both runners are idempotent (`IF NOT EXISTS`, etc.) so the runtime collision is benign at the moment, but the architectural risks are real:
|
||||
|
||||
- **Two sources of truth.** Adding a column means editing two files in two repos; either one can drift silently.
|
||||
- **Schema divergence.** A processor migration that adds a column the directus side doesn't know about is silently invisible to the admin UI.
|
||||
- **Two `migrations_applied` tables**, which already caused the ghost-collection apply conflict earlier in Phase 1 of directus.
|
||||
- **Operator confusion.** The wiki says "Directus owns the schema" but processor runs migrations — newcomers can't tell which is canonical.
|
||||
|
||||
The fix is the wiki's stated intent: directus owns DDL. Processor was the historical owner before directus's `db-init` story matured; the legacy runner survived the transition because nobody retired it.
|
||||
|
||||
## Pre-flight (before deleting anything)
|
||||
|
||||
### 1. Confirm directus's `db-init/` covers the full processor schema surface
|
||||
|
||||
Check that `trm/directus/db-init/`'s SQL is byte-equivalent (or semantically equivalent) to processor's migrations. As of writing, directus has:
|
||||
|
||||
- `001_extensions.sql` — `CREATE EXTENSION IF NOT EXISTS timescaledb` + `postgis`.
|
||||
- `002_positions_hypertable.sql` — `CREATE TABLE positions (...)` + `create_hypertable(...)` + indexes.
|
||||
- `003_faulty_column.sql` — `ALTER TABLE positions ADD COLUMN IF NOT EXISTS faulty ...` + `positions_device_ts_idx`.
|
||||
|
||||
Processor has:
|
||||
|
||||
- `src/db/migrations/0001_positions.sql` — extensions + table + hypertable + `positions_device_ts` (UNIQUE on `(device_id, ts)`) + `positions_ts` (DESC).
|
||||
- `src/db/migrations/0002_positions_faulty.sql` — `faulty` column + `positions_device_ts_idx` (`(device_id, ts DESC)`).
|
||||
|
||||
**Diff the two before retiring.** If processor's SQL has an index, column, or constraint directus's `db-init/` doesn't, the deliverable starts with porting that diff into directus's `db-init/` (and snapshotting if applicable). Specific things to verify:
|
||||
|
||||
- All four indexes exist in directus's db-init: `positions_device_ts` (UNIQUE), `positions_ts`, `positions_device_ts_idx`.
|
||||
- Column types match exactly: `device_id text`, `ts timestamptz`, `ingested_at timestamptz DEFAULT now()`, etc.
|
||||
- `chunk_time_interval` is `INTERVAL '1 day'` on both sides.
|
||||
- The `ON CONFLICT (device_id, ts) DO NOTHING` upsert path requires the UNIQUE on `(device_id, ts)` — that's the `positions_device_ts` index, not `positions_device_ts_idx`. Both must exist.
|
||||
|
||||
### 2. Verify the directus `db-init` apply order is fixed
|
||||
|
||||
Per `docs/wiki/entities/directus.md`'s 5-step boot pipeline:
|
||||
|
||||
```
|
||||
1. db-init pre-schema → positions hypertable, faulty column, timescaledb extension
|
||||
2. directus bootstrap → Directus system tables + first admin
|
||||
3. directus schema apply → Directus-managed user collections
|
||||
4. db-init post-schema → composite UNIQUE constraints on user collections
|
||||
5. pm2-runtime start → server up at :8055
|
||||
```
|
||||
|
||||
So when processor boots against a stage stack:
|
||||
|
||||
- `directus` container has run steps 1–4 (positions exists; everything else exists).
|
||||
- `processor` container can connect and `INSERT` immediately.
|
||||
|
||||
**Compose ordering.** `trm/deploy/compose.yaml`'s `processor` service should `depends_on: directus` with `condition: service_healthy` so processor doesn't try to read `positions` before directus's db-init has run on first deploy. Verify in this task.
|
||||
|
||||
## Deliverables
|
||||
|
||||
### `trm/processor/`
|
||||
|
||||
1. **Delete** `src/db/migrations/0001_positions.sql`.
|
||||
2. **Delete** `src/db/migrations/0002_positions_faulty.sql`.
|
||||
3. **Delete** the migrations directory if it's now empty.
|
||||
4. **Delete** `src/db/migrate.ts` (or whatever the migration-runner module is named — the file that owns the `migrations_applied` table, the file walker, the `pg_advisory_lock` if any).
|
||||
5. **Update `src/main.ts`** to remove the `await migrate(...)` step from boot. Postgres pool creation stays; migration call goes.
|
||||
6. **Update tests** that exercise the migration runner — most likely deletes the corresponding test file. Integration tests that previously seeded the schema via `migrate()` either:
|
||||
- (a) Use directus's `db-init/*.sql` files directly (read them in `beforeAll`, execute against the testcontainer Postgres), or
|
||||
- (b) Carry a fixture SQL file in `test/fixtures/` (the same approach Phase 1.5 task 1.5.6 already takes for its integration test).
|
||||
Pick (b) — it's already the established pattern.
|
||||
7. **Update `Dockerfile`** to drop any migration-running step from the entrypoint (Phase 1's Dockerfile may not have this, but verify; the runtime container shouldn't carry the migrations directory if the runner is gone).
|
||||
8. **Update `package.json` `dependencies`** — if `pg-migrate` or any migration-runner library was a Phase-1-only dep, remove it.
|
||||
9. **Update `phase-1-throughput/04-postgres-schema.md`'s Done section** with a note: "Migration runner retired in Phase 3 task 3.5 — see that task for context."
|
||||
10. **Update `ROADMAP.md`** to reflect the retired runner under Phase 1's "what changed since landing" note.
|
||||
|
||||
### `trm/docs/wiki/`
|
||||
|
||||
1. **Update `wiki/entities/processor.md`** — drop the "Schema ownership vs. write access" caveat that says "the positions hypertable is owned by [[processor]]'s migration runner." Replace with a single sentence: "Processor never runs DDL. Schema is exclusively owned by Directus (snapshot.yaml + `db-init/` for things the snapshot can't express)."
|
||||
2. **Update `wiki/entities/directus.md`** — confirm the Schema-management section already lists `db-init/` as covering everything (no edits unless current text implies a split).
|
||||
3. **Update `wiki/entities/postgres-timescaledb.md`** — verify the writer-side documentation; remove any "split schema authority" framing.
|
||||
4. **Append `docs/log.md`** with a `note` entry recording the retirement.
|
||||
|
||||
### `trm/deploy/`
|
||||
|
||||
1. **Verify** `compose.yaml`'s `processor` service has `depends_on: directus: { condition: service_healthy }`. Add if missing.
|
||||
2. **Confirm** the deploy README doesn't mention the processor's migration runner anywhere.
|
||||
|
||||
## Specification
|
||||
|
||||
### What stays in `src/db/`
|
||||
|
||||
- `pool.ts` — Postgres `Pool` factory. Untouched.
|
||||
- Connection helpers, query helpers (if any). Untouched.
|
||||
|
||||
### What goes
|
||||
|
||||
- `migrations/*.sql` — gone.
|
||||
- `migrate.ts` (the runner). Gone.
|
||||
- `migrations_applied` table — directus's runner has its own; the processor's becomes orphaned but harmless. **Don't drop it from existing databases.** The retirement is a *runtime* change; the table is just unused. Phase 3 hardening's eventual `OPERATIONS.md` (task 3.7) can document a one-off `DROP TABLE migrations_applied` step for operators who want a clean schema.
|
||||
|
||||
### Boot order on first deploy
|
||||
|
||||
```
|
||||
1. postgres container starts → DB available.
|
||||
2. directus container starts → runs 5-step boot pipeline.
|
||||
├─ Step 1 (db-init pre-schema) creates positions hypertable + faulty column + extensions.
|
||||
├─ Steps 2-4 set up Directus's own world.
|
||||
└─ Step 5 marks the container healthy.
|
||||
3. processor container starts (depends_on: directus: service_healthy) → connects, finds positions, starts consuming.
|
||||
```
|
||||
|
||||
If processor races directus on a fresh stack (no `depends_on`), it'll fail to find the `positions` table and crash-loop until directus catches up. `depends_on: service_healthy` makes the order deterministic.
|
||||
|
||||
### Dev workflow
|
||||
|
||||
`compose.dev.yaml` in `trm/processor` (if it exists for processor-side dev) should `depends_on: directus` if running both. For pure-processor dev (no directus), the developer either:
|
||||
|
||||
- Runs directus's `db-init/*.sql` manually against their local Postgres before booting processor.
|
||||
- Or copies the equivalent SQL into a one-off bootstrap script in `processor/test/fixtures/`.
|
||||
|
||||
Document the chosen path in `processor/README.md`.
|
||||
|
||||
### What this task does NOT do
|
||||
|
||||
- Does not retire directus's snapshot-managed collections.
|
||||
- Does not change Phase 1 or Phase 1.5 code paths beyond removing the migration runner step.
|
||||
- Does not introduce a new migration tool. The fix is *fewer* moving parts, not different ones.
|
||||
|
||||
## Acceptance criteria
|
||||
|
||||
- [ ] `pnpm typecheck`, `pnpm lint`, `pnpm test` clean.
|
||||
- [ ] `pnpm test:integration` runs green — the integration test no longer relies on `migrate()`; it loads schema from a fixture SQL file instead.
|
||||
- [ ] `src/db/migrations/` directory is gone (or empty + gitignored).
|
||||
- [ ] No `migrate()` call anywhere in the source tree.
|
||||
- [ ] No `migrations_applied` references in processor source.
|
||||
- [ ] Stage smoke against a fresh DB: redeploy the stack, watch directus boot through its 5 steps, watch processor connect and start writing positions. No errors.
|
||||
- [ ] `docs/wiki/entities/processor.md` and `directus.md` agree: directus is the sole DDL owner.
|
||||
- [ ] `docs/log.md` has a `note` entry recording the retirement.
|
||||
- [ ] `trm/deploy/compose.yaml`'s `processor` service has `depends_on: directus: service_healthy`.
|
||||
|
||||
## Risks / open questions
|
||||
|
||||
- **Existing prod databases.** If anyone has deployed the processor's migrations on a real DB, the `migrations_applied` table is harmless but stale. Document a one-off cleanup query for operators (in `OPERATIONS.md` when 3.7 lands).
|
||||
- **Schema drift between processor's old migrations and directus's db-init.** If the diff in pre-flight step 1 surfaces anything, that diff must land in directus's `db-init/` *before* the processor's runner is retired. Order of operations matters: never delete the processor migration before the equivalent SQL is verified live in directus's runner.
|
||||
- **Test container schema setup.** The integration test fixture has to mirror what directus actually creates. If directus's `db-init/` changes in a way that breaks processor's read paths, the fixture and the read paths both need updating. Mitigation: the fixture file lives in `test/fixtures/` and a comment at its top says "syncs with `trm/directus/db-init/` — update both when schema changes."
|
||||
- **The original 3.5 ("Migration advisory lock") concern.** Once processor doesn't run migrations, the advisory-lock concern is delegated to directus's runner. That's a directus concern; whether to add an advisory lock to directus's `apply-db-init.sh` is tracked as a follow-up in directus's own roadmap, not here.
|
||||
- **PostGIS usage in Phase 2.** Processor's `0001_positions.sql` enables PostGIS even though Phase 1 doesn't use it. Directus's `db-init/001_extensions.sql` does the same. Confirm in pre-flight; no change needed if the directus side already has it.
|
||||
|
||||
## Done
|
||||
|
||||
(Filled in when the task lands.)
|
||||
@@ -25,7 +25,7 @@ When Phase 3 is done:
|
||||
| 3.2 | Per-device state rehydration on first-packet | Single `SELECT ... LIMIT 1` per cold device. Memoized by LRU |
|
||||
| 3.3 | `XAUTOCLAIM` runner | Periodic + on-startup. Claims entries pending > `CLAIM_THRESHOLD_MS`. Re-runs the sink |
|
||||
| 3.4 | Dead-letter stream | After N failed decodes/writes, record goes to `telemetry:teltonika:dlq`; original ACKed off the main stream |
|
||||
| 3.5 | Migration advisory lock | `pg_advisory_lock(<hash>)` around the migrate runner; two instances can start simultaneously |
|
||||
| 3.5 | [Retire processor migration runner](./05-retire-migration-runner.md) | Delete `src/db/migrations/*` and the runner; Directus becomes the sole DDL owner via its `db-init/`. Closes the two-sources-of-truth hazard for `positions`. Replaces the original "migration advisory lock" sketch — once processor doesn't run migrations, the lock concern delegates to Directus. |
|
||||
| 3.6 | Uncaught exception / unhandled rejection handlers | Log, flush, exit. Match `tcp-ingestion`'s eventual Phase 1 task 1.12 work when that lands |
|
||||
| 3.7 | OPERATIONS.md | The runbook |
|
||||
| 3.8 | Multi-instance load test | A test (manual or in CI) that proves two instances split the work; document expected lag behaviour during failover |
|
||||
|
||||
Reference in New Issue
Block a user