Add planning documents for Phase 1 (throughput pipeline) and stub Phases 2-4

ROADMAP.md establishes a status legend, architectural anchors pointing at the wiki, and seven non-negotiable design rules. The most important are the core/domain boundary that protects Phase 1 from Phase 2 churn, the schema-authority split (positions hypertable owned here; everything else owned by Directus), and idempotent writes via (device_id, ts) ON CONFLICT.

Phase 1 (throughput pipeline) is fully detailed across 11 task files: scaffold, core types + sentinel decoder, config + logging, Postgres hypertable, Redis Stream consumer, per-device LRU state, batched writer, main wiring, observability, integration test, Dockerfile + Gitea CI. Observability is in Phase 1 (not deferred), a lesson learned from tcp-ingestion task 1.10.

Phases 2-4 are stub READMEs. Phase 2 (domain logic) blocks on Directus schema decisions and lists those open questions explicitly. Phase 3 (production hardening) and Phase 4 (future) sketch the task shape.
2026-04-30 21:16:26 +02:00
parent 1a4202f4d1
commit c314ba0902
17 changed files with 1191 additions and 0 deletions
@@ -0,0 +1,89 @@
+# Task 1.4 — Postgres connection & `positions` hypertable
+
+**Phase:** 1 — Throughput pipeline
+**Status:** ⬜ Not started
+**Depends on:** 1.1, 1.3
+**Wiki refs:** `docs/wiki/entities/postgres-timescaledb.md`
+
+## Goal
+
+Stand up the Postgres connection (a single `pg.Pool`) and define the `positions` hypertable migration. This is the only table whose schema the Processor owns directly (per the design rule in ROADMAP.md). Every other table is owned by Directus.
+
+## Deliverables
+
+- `src/db/pool.ts` exporting:
+  - `createPool(url: string): pg.Pool` — single Pool with sane defaults (`max: 10`, `idleTimeoutMillis: 30_000`, `connectionTimeoutMillis: 5_000`). Sets `application_name = 'processor'` so connections are identifiable in `pg_stat_activity`.
+  - `connectWithRetry(pool, logger): Promise<void>` — runs `SELECT 1` with exponential backoff (3 attempts, up to 5s). Mirrors `tcp-ingestion`'s `connectRedis` pattern. Calls `process.exit(1)` on final failure.
+- `src/db/migrations/0001_positions.sql` containing:
+  - `CREATE EXTENSION IF NOT EXISTS timescaledb;` (no-op if already enabled)
+  - `CREATE TABLE IF NOT EXISTS positions (...)` per the schema below
+  - `SELECT create_hypertable('positions', 'ts', if_not_exists => TRUE, chunk_time_interval => INTERVAL '1 day');`
+  - `CREATE UNIQUE INDEX IF NOT EXISTS positions_device_ts ON positions (device_id, ts);`
+  - `CREATE INDEX IF NOT EXISTS positions_ts ON positions (ts DESC);`
+- `src/db/migrate.ts` — minimal runner that executes pending migration files in order. Tracks applied migrations in a `schema_migrations(version, applied_at)` table. Idempotent. Called from `main.ts` before the consumer starts.
+- `test/db/migrate.test.ts` covering: applying a fresh migration; applying twice is a no-op; bad SQL fails loudly.
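
The `createPool` defaults listed above could be collected in a plain config object, as in this minimal sketch. `buildPoolConfig` and `ProcessorPoolConfig` are hypothetical names, and the field names are assumed to match the options that node-postgres accepts in `new pg.Pool(config)`. The `pg` package is deliberately not imported so the snippet stays self-contained.

```typescript
// Hypothetical shape: the subset of pool options this task cares about.
interface ProcessorPoolConfig {
  connectionString: string;
  max: number;
  idleTimeoutMillis: number;
  connectionTimeoutMillis: number;
  application_name: string;
}

// Builds the options for the single shared pool.
// Assumption: the result is passed as-is to `new pg.Pool(config)` in createPool.
function buildPoolConfig(url: string): ProcessorPoolConfig {
  return {
    connectionString: url,
    max: 10,                         // upper bound on concurrent connections
    idleTimeoutMillis: 30_000,       // close clients idle for 30 s
    connectionTimeoutMillis: 5_000,  // give up acquiring a connection after 5 s
    application_name: "processor",   // shows up in pg_stat_activity
  };
}
```

Keeping the defaults in a pure function like this would let a unit test check them without opening a connection.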
+
+## Specification
+
+### `positions` table schema
+
+```sql
+CREATE TABLE IF NOT EXISTS positions (
+  device_id    text        NOT NULL,
+  ts           timestamptz NOT NULL,        -- canonical event time from device GPS
+  ingested_at  timestamptz NOT NULL DEFAULT now(),  -- when Processor wrote the row
+  latitude     double precision NOT NULL,
+  longitude    double precision NOT NULL,
+  altitude     real        NOT NULL,
+  angle        real        NOT NULL,
+  speed        real        NOT NULL,
+  satellites   smallint    NOT NULL,
+  priority     smallint    NOT NULL,
+  codec        text        NOT NULL,        -- '8' | '8E' | '16'
+  attributes   jsonb       NOT NULL         -- the IO bag, sentinel-decoded
+);
+```
+
+### Why these column types
+
+- `device_id text` — IMEIs are 15 ASCII digits. Could be `bigint`, but `text` keeps the door open for non-IMEI device identifiers (future vendors) and avoids leading-zero loss.
+- `ts timestamptz` — the **device-reported** time, not ingestion time. This is the hypertable partitioning column.
+- `ingested_at timestamptz` — diagnostic: helps spot devices with clock skew or buffered records (the 55-record buffer flush we saw on stage). Not part of the natural key.
+- `altitude/angle/speed real` — float32 is plenty; saves space on a high-volume table.
+- `attributes jsonb` — preserves the IO bag verbatim. Per the design rule, no naming or unit conversion happens here; that's Phase 2 in `src/domain/`.
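
To make the column choices concrete, here is a minimal sketch of a row type and a single-row insert builder. `PositionRow`, `COLUMNS`, and `buildInsert` are hypothetical names. The `ON CONFLICT (device_id, ts)` target comes from the idempotent-writes rule in the commit message; the `DO NOTHING` action is an assumption, since the conflict action is not specified here.

```typescript
// Hypothetical row type mirroring the positions columns.
// ingested_at is omitted so the column DEFAULT now() applies.
interface PositionRow {
  device_id: string;
  ts: Date;
  latitude: number;
  longitude: number;
  altitude: number;
  angle: number;
  speed: number;
  satellites: number;
  priority: number;
  codec: "8" | "8E" | "16";
  attributes: Record<string, unknown>; // must already be JSON-safe
}

const COLUMNS = [
  "device_id", "ts", "latitude", "longitude", "altitude", "angle",
  "speed", "satellites", "priority", "codec", "attributes",
] as const;

// Builds a parameterized insert for one row.
// Assumption: duplicates on (device_id, ts) are silently skipped.
function buildInsert(row: PositionRow): { text: string; values: unknown[] } {
  const placeholders = COLUMNS.map((_, i) => `$${i + 1}`).join(", ");
  const values = COLUMNS.map((c) =>
    c === "attributes" ? JSON.stringify(row.attributes) : row[c],
  );
  return {
    text:
      `INSERT INTO positions (${COLUMNS.join(", ")}) VALUES (${placeholders}) ` +
      `ON CONFLICT (device_id, ts) DO NOTHING`,
    values,
  };
}
```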
+
+### bigint and Buffer attributes — JSONB encoding
+
+The codec (task 1.2) decodes `__bigint` to `bigint` and `__buffer_b64` to `Buffer`. Postgres `jsonb` is JSON, so we re-encode for storage:
+- `bigint` → could be a JSON number when it fits within `Number.MAX_SAFE_INTEGER` and a JSON string otherwise, but always storing a string is simpler and unambiguous. **Decision: always string for bigint.**
+- `Buffer` → base64 string.
+
+**Re-encoding loses the type tag.** Phase 2 IO interpretation (per-model mapping table) is responsible for knowing that `attributes.io_240` is a u64 stored as a string. Phase 1 doesn't need to query individual attributes — it's pass-through storage.
+
+If this becomes painful later, options to revisit: a separate `attributes_typed` column with structured shape; or store bigints as `numeric` and Buffers as `bytea` in dedicated columns. **Defer** — 80% of attributes are small ints, and the simple string approach unblocks Phase 1.
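
The re-encoding decision above could look like the following minimal sketch. `encodeAttributes`, `DecodedValue`, and `JsonValue` are hypothetical names, and the sketch assumes a flat IO bag (no nested objects), which is not stated in this task.

```typescript
// Hypothetical value types: what the decoder hands over vs. what goes into jsonb.
type DecodedValue = number | string | boolean | null | bigint | Uint8Array;
type JsonValue = number | string | boolean | null;

// Re-encodes a decoded IO bag into JSON-safe values.
// bigint  -> decimal string (always, regardless of magnitude)
// Buffer  -> base64 string (Buffer is a Uint8Array subclass in Node)
// Assumption: the bag is flat, so only top-level values are inspected.
function encodeAttributes(
  attrs: Record<string, DecodedValue>,
): Record<string, JsonValue> {
  const out: Record<string, JsonValue> = {};
  for (const [key, value] of Object.entries(attrs)) {
    if (typeof value === "bigint") {
      out[key] = value.toString();
    } else if (value instanceof Uint8Array) {
      out[key] = Buffer.from(value).toString("base64");
    } else {
      out[key] = value;
    }
  }
  return out;
}
```

After this step a bigint and a genuine string attribute are indistinguishable in storage, which is exactly the lost type tag described above.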
+
+### Migration runner
+
+Follow the simplest possible pattern. The runner:
+1. `CREATE TABLE IF NOT EXISTS schema_migrations (version text PRIMARY KEY, applied_at timestamptz NOT NULL DEFAULT now())`.
+2. Lists `*.sql` files in `src/db/migrations/` sorted by filename.
+3. For each, `SELECT 1 FROM schema_migrations WHERE version = $1`. If absent, run the SQL inside a transaction and insert the row.
+4. Logs each applied or skipped migration at `info`.
+
+Do **not** introduce a heavy framework (Knex, node-pg-migrate). The Processor has one migration file in Phase 1 — a 30-line runner is the right answer.
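
The four runner steps can be sketched as follows. This is a minimal sketch, not the final runner: `Queryable`, `Migration`, and `runMigrations` are hypothetical names, `db` stands in for a single checked-out client (so `BEGIN` and `COMMIT` share one connection), and migrations are passed in as data instead of being read from `src/db/migrations/`.

```typescript
// Hypothetical stand-in for a checked-out database client.
interface Queryable {
  query(text: string, values?: unknown[]): Promise<{ rows: unknown[] }>;
}

interface Migration {
  version: string; // e.g. the filename "0001_positions.sql"
  sql: string;     // file contents
}

// Applies pending migrations in filename order; returns the versions applied.
async function runMigrations(
  db: Queryable,
  migrations: Migration[],
  log: (msg: string) => void = () => {},
): Promise<string[]> {
  await db.query(
    "CREATE TABLE IF NOT EXISTS schema_migrations " +
      "(version text PRIMARY KEY, applied_at timestamptz NOT NULL DEFAULT now())",
  );
  const ordered = [...migrations].sort((a, b) =>
    a.version < b.version ? -1 : a.version > b.version ? 1 : 0,
  );
  const applied: string[] = [];
  for (const m of ordered) {
    const found = await db.query(
      "SELECT 1 FROM schema_migrations WHERE version = $1",
      [m.version],
    );
    if (found.rows.length > 0) {
      log(`migration skipped: ${m.version}`);
      continue;
    }
    await db.query("BEGIN");
    try {
      await db.query(m.sql);
      await db.query("INSERT INTO schema_migrations (version) VALUES ($1)", [m.version]);
      await db.query("COMMIT");
    } catch (err) {
      await db.query("ROLLBACK"); // leave no partial migration behind
      throw err;                  // fail loudly
    }
    applied.push(m.version);
    log(`migration applied: ${m.version}`);
  }
  return applied;
}
```

Taking the migrations as an argument keeps the sketch testable with a fake client; the real runner would list and read the `*.sql` files before calling something like this.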
+
+## Acceptance criteria
+
+- [ ] `pnpm typecheck`, `pnpm lint`, `pnpm test` clean.
+- [ ] Integration test (testcontainers TimescaleDB): apply migration; insert a row with a bigint-as-string attribute; query it back; verify shape.
+- [ ] Re-running the migration on an already-migrated database is a no-op.
+- [ ] `connectWithRetry` makes 3 attempts with exponential backoff between them, then calls `process.exit(1)` after the final failure. Verify with a unit test using a fake Pool.
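
The `connectWithRetry` criterion could be met by something like this minimal sketch. The third `opts` parameter, the `Pingable` and `RetryLogger` interfaces, and the 500 ms base delay are assumptions added for testability; the task only specifies 3 attempts, exponential backoff, a 5 s ceiling, and `process.exit(1)` on final failure.

```typescript
// Hypothetical minimal interfaces standing in for pg.Pool and the logger.
interface Pingable {
  query(text: string): Promise<unknown>;
}
interface RetryLogger {
  warn(msg: string): void;
  error(msg: string): void;
}
interface RetryOptions {
  attempts?: number;                     // total attempts, default 3
  baseDelayMs?: number;                  // assumed first delay, default 500
  maxDelayMs?: number;                   // delay ceiling, default 5000
  sleep?: (ms: number) => Promise<void>; // injectable for tests
  exit?: (code: number) => void;         // injectable for tests
}

async function connectWithRetry(
  pool: Pingable,
  logger: RetryLogger,
  opts: RetryOptions = {},
): Promise<void> {
  const attempts = opts.attempts ?? 3;
  const baseDelayMs = opts.baseDelayMs ?? 500;
  const maxDelayMs = opts.maxDelayMs ?? 5_000;
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((r) => setTimeout(r, ms)));
  const exit = opts.exit ?? ((code: number) => process.exit(code));

  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      await pool.query("SELECT 1");
      return; // connected
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      if (attempt === attempts) {
        logger.error(`postgres unreachable after ${attempts} attempts: ${reason}`);
        exit(1);
        return; // only reached when exit is a test double
      }
      // Exponential backoff: base, 2x base, 4x base, ... capped at maxDelayMs.
      const delay = Math.min(baseDelayMs * 2 ** (attempt - 1), maxDelayMs);
      logger.warn(`postgres attempt ${attempt} failed (${reason}); retrying in ${delay} ms`);
      await sleep(delay);
    }
  }
}
```

Injecting `sleep` and `exit` lets the unit test run instantly and observe the exit code without terminating the test process.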
+
+## Risks / open questions
+
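+- **TimescaleDB extension availability.** The `deploy/` repo's Postgres container must be the `timescale/timescaledb` image, not stock `postgres`. Document this explicitly in the deploy README when Phase 1 ships. Falling back to a regular table (no hypertable) when the extension is unavailable is not automatic: `CREATE EXTENSION` and `create_hypertable` both error on stock Postgres, and because the runner wraps each migration in a transaction, the `CREATE TABLE IF NOT EXISTS` is rolled back with them. A fallback would need a separate, extension-free migration path. A plain table stays functional, but performance falls off a cliff at scale.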
+- **Schema authority overlap with Directus.** Directus also speaks Postgres. When Directus connects and introspects the schema, it will see the `positions` table created by Processor. That's fine — Directus can reflect tables it didn't create. But if an operator later modifies `positions` from the Directus admin UI, the migration may break. Document: `positions` is a Processor-owned table; do not edit from Directus.
+
+## Done
+
+(Fill in once complete: commit SHA, brief notes.)