Add planning documents for Phase 1 (throughput pipeline) and stub Phases 2-4

ROADMAP.md establishes status legend, architectural anchors pointing at the
wiki, and seven non-negotiable design rules — most importantly the
core/domain boundary that protects Phase 1 from Phase 2 churn, the
schema-authority split (positions hypertable owned here; everything else
owned by Directus), and idempotent-writes via (device_id, ts) ON CONFLICT.

Phase 1 (throughput pipeline) is fully detailed across 11 task files:
scaffold, core types + sentinel decoder, config + logging, Postgres
hypertable, Redis Stream consumer, per-device LRU state, batched writer,
main wiring, observability, integration test, Dockerfile + Gitea CI.
Observability is in Phase 1 (not deferred) — lesson learned from
tcp-ingestion task 1.10.

Phases 2-4 are stub READMEs. Phase 2 (domain logic) blocks on Directus
schema decisions and lists those open questions explicitly. Phase 3
(production hardening) and Phase 4 (future) sketch the task shape.
2026-04-30 21:16:26 +02:00
parent 1a4202f4d1
commit c314ba0902
17 changed files with 1191 additions and 0 deletions
@@ -0,0 +1,89 @@
# Task 1.4 — Postgres connection & `positions` hypertable
**Phase:** 1 — Throughput pipeline
**Status:** ⬜ Not started
**Depends on:** 1.1, 1.3
**Wiki refs:** `docs/wiki/entities/postgres-timescaledb.md`
## Goal
Stand up the Postgres connection (a single `pg.Pool`) and define the `positions` hypertable migration. This is the only table whose schema the Processor owns directly (per the design rule in ROADMAP.md). Every other table is owned by Directus.
## Deliverables
- `src/db/pool.ts` exporting:
  - `createPool(url: string): pg.Pool` — single Pool with sane defaults (`max: 10`, `idleTimeoutMillis: 30_000`, `connectionTimeoutMillis: 5_000`). Sets `application_name = 'processor'` so connections are identifiable in `pg_stat_activity`.
  - `connectWithRetry(pool, logger): Promise<void>` — runs `SELECT 1` with exponential backoff (3 attempts, up to 5s). Mirrors `tcp-ingestion`'s `connectRedis` pattern. Calls `process.exit(1)` on final failure.
- `src/db/migrations/0001_positions.sql` containing:
  - `CREATE EXTENSION IF NOT EXISTS timescaledb;` (no-op if already enabled)
  - `CREATE TABLE IF NOT EXISTS positions (...)` per the schema below
  - `SELECT create_hypertable('positions', 'ts', if_not_exists => TRUE, chunk_time_interval => INTERVAL '1 day');`
  - `CREATE UNIQUE INDEX IF NOT EXISTS positions_device_ts ON positions (device_id, ts);`
  - `CREATE INDEX IF NOT EXISTS positions_ts ON positions (ts DESC);`
- `src/db/migrate.ts` — minimal runner that executes pending migration files in order. Tracks applied migrations in a `schema_migrations(version, applied_at)` table. Idempotent. Called from `main.ts` before the consumer starts.
- `test/db/migrate.test.ts` covering: applying a fresh migration; applying twice is a no-op; bad SQL fails loudly.
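As a rough illustration of the `createPool` defaults listed above, the sketch below builds the options object only. The helper name `poolOptions` and the `PoolOptions` interface are hypothetical (not part of the task file), and the `pg` import is left out so the snippet runs standalone; the option keys are the ones node-postgres accepts for a Pool.

```typescript
// Hypothetical helper: the options createPool might pass to `new pg.Pool(...)`.
// Kept free of the `pg` import so it can run on its own.
interface PoolOptions {
  connectionString: string;
  max: number;
  idleTimeoutMillis: number;
  connectionTimeoutMillis: number;
  application_name: string;
}

function poolOptions(url: string): PoolOptions {
  return {
    connectionString: url,
    max: 10,                        // upper bound on pooled connections
    idleTimeoutMillis: 30_000,      // close clients idle for 30 s
    connectionTimeoutMillis: 5_000, // give up on a new connection after 5 s
    application_name: "processor",  // visible in pg_stat_activity
  };
}

// In src/db/pool.ts this would presumably be wired up as:
//   import pg from "pg";
//   export const createPool = (url: string): pg.Pool => new pg.Pool(poolOptions(url));
```

Keeping the options in a plain function makes the defaults easy to assert on in a unit test without opening a connection.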
## Specification
### `positions` table schema
```sql
CREATE TABLE IF NOT EXISTS positions (
  device_id    text             NOT NULL,
  ts           timestamptz      NOT NULL,               -- canonical event time from device GPS
  ingested_at  timestamptz      NOT NULL DEFAULT now(), -- when Processor wrote the row
  latitude     double precision NOT NULL,
  longitude    double precision NOT NULL,
  altitude     real             NOT NULL,
  angle        real             NOT NULL,
  speed        real             NOT NULL,
  satellites   smallint         NOT NULL,
  priority     smallint         NOT NULL,
  codec        text             NOT NULL,               -- '8' | '8E' | '16'
  attributes   jsonb            NOT NULL                -- the IO bag, sentinel-decoded
);
```
### Why these column types
- `device_id text` — IMEIs are 15 ASCII digits. Could be `bigint`, but `text` keeps the door open for non-IMEI device identifiers (future vendors) and avoids leading-zero loss.
- `ts timestamptz` — the **device-reported** time, not ingestion time. This is the hypertable partitioning column.
- `ingested_at timestamptz` — diagnostic: helps spot devices with clock skew or buffered records (the 55-record buffer flush we saw on stage). Not part of the natural key.
- `altitude/angle/speed real` — float32 is plenty; saves space on a high-volume table.
- `attributes jsonb` — preserves the IO bag verbatim. Per the design rule, no naming or unit conversion happens here; that's Phase 2 in `src/domain/`.
### bigint and Buffer attributes — JSONB encoding
The codec (task 1.2) decodes `__bigint` to `bigint` and `__buffer_b64` to `Buffer`. Postgres `jsonb` is JSON, so we re-encode for storage:
- `bigint` → could be a JSON number when it fits within `Number.MAX_SAFE_INTEGER` and a JSON string otherwise, but always storing a string is simpler and unambiguous. **Decision: always string for bigint.**
- `Buffer` → base64 string.
**Re-encoding loses the type tag.** Phase 2 IO interpretation (per-model mapping table) is responsible for knowing that `attributes.io_240` is a u64 stored as a string. Phase 1 doesn't need to query individual attributes — it's pass-through storage.
If this becomes painful later, options to revisit: a separate `attributes_typed` column with structured shape; or store bigints as `numeric` and Buffers as `bytea` in dedicated columns. **Defer** — 80% of attributes are small ints, and the simple string approach unblocks Phase 1.
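The re-encoding rule above (bigint always as string, Buffer as base64) can be sketched as follows. The function name `encodeAttributes` is hypothetical, and the sketch assumes the IO bag is a flat object whose remaining values are already JSON-safe.

```typescript
// A value that JSON.stringify can represent without loss.
type JsonValue =
  | string
  | number
  | boolean
  | null
  | JsonValue[]
  | { [key: string]: JsonValue };

// Hypothetical helper: re-encodes a decoded IO bag for jsonb storage.
// Assumes a flat bag; nested bigint/Buffer values are not handled here.
function encodeAttributes(attrs: Record<string, unknown>): Record<string, JsonValue> {
  const out: Record<string, JsonValue> = {};
  for (const [key, value] of Object.entries(attrs)) {
    if (typeof value === "bigint") {
      out[key] = value.toString();         // always a string, even for small values
    } else if (Buffer.isBuffer(value)) {
      out[key] = value.toString("base64"); // Buffer -> base64 string
    } else {
      out[key] = value as JsonValue;       // assumed already JSON-safe
    }
  }
  return out;
}
```

After this step, `JSON.stringify` no longer throws on bigint values, and the result can be bound directly as the `attributes` parameter. As noted above, the type tag is gone: a reader of the row cannot tell a base64 string from an ordinary string without the per-model mapping.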
### Migration runner
Follow the simplest possible pattern. The runner:
1. `CREATE TABLE IF NOT EXISTS schema_migrations (version text PRIMARY KEY, applied_at timestamptz NOT NULL DEFAULT now())`.
2. Lists `*.sql` files in `src/db/migrations/` sorted by filename.
3. For each, `SELECT 1 FROM schema_migrations WHERE version = $1`. If absent, run the SQL and insert the `schema_migrations` row inside a single transaction, so a failed migration is never recorded as applied.
4. Logs each applied or skipped migration at `info`.
Do **not** introduce a heavy framework (Knex, node-pg-migrate). The Processor has one migration file in Phase 1 — a 30-line runner is the right answer.
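The four steps above could look roughly like the sketch below. It is not the final `migrate.ts`: the `Db` and `Migration` interfaces are hypothetical stand-ins (the real runner would read `*.sql` files from disk and use a single `pg` client so that `BEGIN`/`COMMIT` run on one connection), and the logger is reduced to a callback.

```typescript
// Hypothetical minimal client interface; a pg.PoolClient has a compatible query().
interface Db {
  query(sql: string, params?: unknown[]): Promise<{ rows: unknown[] }>;
}

// Hypothetical in-memory form of one migration file.
interface Migration {
  version: string; // e.g. the filename "0001_positions.sql"
  sql: string;
}

// Applies pending migrations in filename order; returns the versions applied.
async function runMigrations(
  db: Db,
  migrations: Migration[],
  log: (msg: string) => void = () => {},
): Promise<string[]> {
  await db.query(
    "CREATE TABLE IF NOT EXISTS schema_migrations (" +
      "version text PRIMARY KEY, applied_at timestamptz NOT NULL DEFAULT now())",
  );
  const ordered = [...migrations].sort((a, b) =>
    a.version < b.version ? -1 : a.version > b.version ? 1 : 0,
  );
  const applied: string[] = [];
  for (const m of ordered) {
    const seen = await db.query(
      "SELECT 1 FROM schema_migrations WHERE version = $1",
      [m.version],
    );
    if (seen.rows.length > 0) {
      log(`skipped ${m.version}`);
      continue;
    }
    await db.query("BEGIN");
    try {
      await db.query(m.sql);
      await db.query("INSERT INTO schema_migrations (version) VALUES ($1)", [m.version]);
      await db.query("COMMIT");
    } catch (err) {
      await db.query("ROLLBACK");
      throw err; // bad SQL fails loudly; nothing is recorded
    }
    applied.push(m.version);
    log(`applied ${m.version}`);
  }
  return applied;
}
```

Passing the client in as an interface keeps the runner testable with an in-memory fake, which covers the "applying twice is a no-op" and "bad SQL fails loudly" cases without a database.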
## Acceptance criteria
- [ ] `pnpm typecheck`, `pnpm lint`, `pnpm test` clean.
- [ ] Integration test (testcontainers TimescaleDB): apply migration; insert a row with a bigint-as-string attribute; query it back; verify shape.
- [ ] Re-running the migration on an already-migrated database is a no-op.
- [ ] `connectWithRetry` makes 3 attempts with exponential backoff, then calls `process.exit(1)`. Verify with a unit test using a fake Pool.
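One way to make the `connectWithRetry` criterion testable with a fake Pool is to inject the sleep and exit functions. The sketch below is an assumption-laden illustration: the third `opts` parameter, the 1 s base delay, and reading "up to 5s" as a cap on each delay are not specified in this task file.

```typescript
// Hypothetical minimal shapes; a pg.Pool and a pino-style logger would satisfy them.
interface Queryable {
  query(sql: string): Promise<unknown>;
}
interface Logger {
  warn(msg: string): void;
  error(msg: string): void;
}

// Hypothetical options, added only so tests can avoid real timers and exits.
interface RetryOptions {
  attempts?: number;    // total attempts, default 3
  baseDelayMs?: number; // first backoff delay, assumed 1000
  maxDelayMs?: number;  // cap per delay, assumed 5000
  sleep?: (ms: number) => Promise<void>;
  exit?: (code: number) => void;
}

async function connectWithRetry(
  pool: Queryable,
  logger: Logger,
  opts: RetryOptions = {},
): Promise<void> {
  const attempts = opts.attempts ?? 3;
  const baseDelayMs = opts.baseDelayMs ?? 1_000;
  const maxDelayMs = opts.maxDelayMs ?? 5_000;
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  const exit = opts.exit ?? ((code: number) => process.exit(code));

  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      await pool.query("SELECT 1");
      return; // connected
    } catch (err) {
      if (attempt === attempts) {
        logger.error(`Postgres unreachable after ${attempts} attempts: ${String(err)}`);
        exit(1);
        return; // only reached when exit is a test double
      }
      // Delay doubles each attempt and is capped at maxDelayMs.
      const delay = Math.min(baseDelayMs * 2 ** (attempt - 1), maxDelayMs);
      logger.warn(`Postgres connect attempt ${attempt} failed; retrying in ${delay} ms`);
      await sleep(delay);
    }
  }
}
```

With the defaults, a pool that never answers produces delays of 1000 ms and 2000 ms between the three attempts before the exit call.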
## Risks / open questions
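- **TimescaleDB extension availability.** The `deploy/` repo's Postgres container must be the `timescale/timescaledb` image, not stock `postgres`. Document this explicitly in the deploy README when Phase 1 ships. Falling back to a regular table (no hypertable) when the extension is unavailable is possible but not automatic: with the migration as specified, `CREATE EXTENSION` and `create_hypertable` both error on stock Postgres, and because the runner wraps each file in a transaction, the `CREATE TABLE IF NOT EXISTS` is rolled back along with them. A fallback would need the Timescale-specific statements split out or guarded. Without a hypertable, performance falls off a cliff at scale, but the pipeline stays functional.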
- **Schema authority overlap with Directus.** Directus also speaks Postgres. When Directus connects and introspects the schema, it will see the `positions` table created by Processor. That's fine — Directus can reflect tables it didn't create. But if an operator later modifies `positions` from the Directus admin UI, the migration may break. Document: `positions` is a Processor-owned table; do not edit from Directus.
## Done
(Fill in once complete: commit SHA, brief notes.)