processor/.planning/phase-1-throughput/04-postgres-schema.md

# Task 1.4 — Postgres connection & `positions` hypertable

**Phase:** 1 — Throughput pipeline
**Status:** ⬜ Not started
**Depends on:** 1.1, 1.3
**Wiki refs:** `docs/wiki/entities/postgres-timescaledb.md`

## Goal

Stand up the Postgres connection (a single `pg.Pool`) and define the `positions` hypertable migration. This is the only table whose schema the Processor owns directly (per the design rule in ROADMAP.md). Every other table is owned by Directus.

## Deliverables

- `src/db/pool.ts` exporting:
  - `createPool(url: string): pg.Pool` — single Pool with sane defaults (`max: 10`, `idleTimeoutMillis: 30_000`, `connectionTimeoutMillis: 5_000`). Sets `application_name = 'processor'` so connections are identifiable in `pg_stat_activity`.
  - `connectWithRetry(pool, logger): Promise<void>` — runs `SELECT 1` with exponential backoff (3 attempts, up to 5s). Mirrors `tcp-ingestion`'s `connectRedis` pattern. Calls `process.exit(1)` on final failure.
- `src/db/migrations/0001_positions.sql` containing:
  - `CREATE EXTENSION IF NOT EXISTS timescaledb;` (no-op if already enabled)
  - `CREATE TABLE IF NOT EXISTS positions (...)` per the schema below
  - `SELECT create_hypertable('positions', 'ts', if_not_exists => TRUE, chunk_time_interval => INTERVAL '1 day');`
  - `CREATE UNIQUE INDEX IF NOT EXISTS positions_device_ts ON positions (device_id, ts);`
  - `CREATE INDEX IF NOT EXISTS positions_ts ON positions (ts DESC);`
- `src/db/migrate.ts` — minimal runner that executes pending migration files in order. Tracks applied migrations in a `schema_migrations(version, applied_at)` table. Idempotent. Called from `main.ts` before the consumer starts.
- `test/db/migrate.test.ts` covering: applying a fresh migration; applying twice is a no-op; bad SQL fails loudly.
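
A minimal sketch of the two `src/db/pool.ts` deliverables, written against small structural interfaces so it runs without the `pg` package. Assumptions not in this task: the `poolConfig` helper (the spec asks for `createPool`, which would wrap it in `new pg.Pool(...)`), the `RetryOptions` knobs, the 500 ms base delay, and reading "up to 5s" as a cap on each individual delay.

```typescript
// Minimal shapes; a real pg.Pool and the project's logger should satisfy them structurally.
interface Queryable {
  query(sql: string): Promise<unknown>;
}
interface Logger {
  warn(msg: string): void;
  error(msg: string): void;
}

// Pool settings from the Deliverables list. Hypothetical helper:
// createPool(url) would be `new pg.Pool(poolConfig(url))`.
function poolConfig(url: string) {
  return {
    connectionString: url,
    max: 10,
    idleTimeoutMillis: 30_000,
    connectionTimeoutMillis: 5_000,
    application_name: "processor", // visible in pg_stat_activity
  };
}

// Hypothetical knobs, injected so a unit test needs neither real timers nor a real exit.
interface RetryOptions {
  attempts?: number;
  baseDelayMs?: number; // assumed value, not specified in this task
  maxDelayMs?: number;
  sleep?: (ms: number) => Promise<void>;
  exit?: (code: number) => void;
}

async function connectWithRetry(
  pool: Queryable,
  logger: Logger,
  opts: RetryOptions = {},
): Promise<void> {
  const attempts = opts.attempts ?? 3;
  const baseDelayMs = opts.baseDelayMs ?? 500;
  const maxDelayMs = opts.maxDelayMs ?? 5_000;
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  const exit = opts.exit ?? ((code: number) => process.exit(code));

  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      await pool.query("SELECT 1");
      return;
    } catch (err) {
      if (attempt === attempts) {
        logger.error(`postgres unreachable after ${attempts} attempts: ${String(err)}`);
        exit(1);
        return;
      }
      // Exponential backoff: base, 2x base, 4x base, ... capped at maxDelayMs.
      const delay = Math.min(baseDelayMs * 2 ** (attempt - 1), maxDelayMs);
      logger.warn(`postgres connect attempt ${attempt} failed; retrying in ${delay}ms`);
      await sleep(delay);
    }
  }
}
```

Injecting `sleep` and `exit` is one way to satisfy the "fake Pool" unit test in the acceptance criteria without waiting on real timers or killing the test process.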

## Specification

### `positions` table schema

```sql
CREATE TABLE IF NOT EXISTS positions (
  device_id    text        NOT NULL,
  ts           timestamptz NOT NULL,        -- canonical event time from device GPS
  ingested_at  timestamptz NOT NULL DEFAULT now(),  -- when Processor wrote the row
  latitude     double precision NOT NULL,
  longitude    double precision NOT NULL,
  altitude     real        NOT NULL,
  angle        real        NOT NULL,
  speed        real        NOT NULL,
  satellites   smallint    NOT NULL,
  priority     smallint    NOT NULL,
  codec        text        NOT NULL,        -- '8' | '8E' | '16'
  attributes   jsonb       NOT NULL         -- the IO bag, sentinel-decoded
);
```

### Why these column types

- `device_id text` — IMEIs are 15 ASCII digits. Could be `bigint`, but `text` keeps the door open for non-IMEI device identifiers (future vendors) and avoids leading-zero loss.
- `ts timestamptz` — the **device-reported** time, not ingestion time. This is the hypertable partitioning column.
- `ingested_at timestamptz` — diagnostic: helps spot devices with clock skew or buffered records (the 55-record buffer flush we saw on stage). Not part of the natural key.
- `altitude/angle/speed real` — float32 is plenty; saves space on a high-volume table.
- `attributes jsonb` — preserves the IO bag verbatim. Per the design rule, no naming or unit conversion happens here; that's Phase 2 in `src/domain/`.
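
For orientation, a sketch of how a row of this shape might look on the TypeScript side. `PositionRow`, its camelCase field names, and `buildInsert` are hypothetical; the writer itself is outside this task. `ingested_at` is left out so the column default `now()` applies.

```typescript
// Hypothetical in-memory shape mirroring the `positions` columns.
interface PositionRow {
  deviceId: string; // text, so a leading zero in an identifier survives
  ts: Date; // device-reported time (hypertable partitioning column)
  latitude: number;
  longitude: number;
  altitude: number;
  angle: number;
  speed: number;
  satellites: number;
  priority: number;
  codec: "8" | "8E" | "16";
  attributes: Record<string, unknown>; // assumed already JSON-safe
}

// Builds a parameterized INSERT in the { text, values } shape accepted by pg's query().
function buildInsert(row: PositionRow): { text: string; values: unknown[] } {
  return {
    text:
      "INSERT INTO positions " +
      "(device_id, ts, latitude, longitude, altitude, angle, speed, " +
      "satellites, priority, codec, attributes) " +
      "VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11::jsonb)",
    values: [
      row.deviceId,
      row.ts,
      row.latitude,
      row.longitude,
      row.altitude,
      row.angle,
      row.speed,
      row.satellites,
      row.priority,
      row.codec,
      JSON.stringify(row.attributes), // serialized explicitly for the jsonb column
    ],
  };
}
```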

### bigint and Buffer attributes — JSONB encoding

The codec (task 1.2) decodes `__bigint` to `bigint` and `__buffer_b64` to `Buffer`. Postgres `jsonb` is JSON, so we re-encode for storage:
- `bigint` → could be a JSON number when it fits in `Number.MAX_SAFE_INTEGER` and a JSON string otherwise, but always storing a string is simpler and unambiguous. **Decision: always string for bigint.**
- `Buffer` → base64 string.
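
The two rules above can be sketched as a small pure function. `encodeAttributes` is a hypothetical name, and the sketch assumes the IO bag is a flat object (no nested values); the `io_1` and `raw` keys in the usage lines are made up for illustration.

```typescript
// Re-encodes a decoded IO bag so it is safe to serialize into a jsonb column.
// Assumes a flat bag: only top-level values are inspected.
function encodeAttributes(attrs: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(attrs)) {
    if (typeof value === "bigint") {
      out[key] = value.toString(); // always a decimal string, regardless of magnitude
    } else if (Buffer.isBuffer(value)) {
      out[key] = value.toString("base64");
    } else {
      out[key] = value; // small ints and other JSON-safe values pass through
    }
  }
  return out;
}

const encoded = encodeAttributes({
  io_240: BigInt("18446744073709551615"), // u64 max, far above Number.MAX_SAFE_INTEGER
  io_1: 1,
  raw: Buffer.from("hi"),
});
console.log(JSON.stringify(encoded));
// {"io_240":"18446744073709551615","io_1":1,"raw":"aGk="}
```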

**Re-encoding loses the type tag.** Phase 2 IO interpretation (per-model mapping table) is responsible for knowing that `attributes.io_240` is a u64 stored as a string. Phase 1 doesn't need to query individual attributes — it's pass-through storage.

If this becomes painful later, options to revisit: a separate `attributes_typed` column with structured shape; or store bigints as `numeric` and Buffers as `bytea` in dedicated columns. **Defer** — 80% of attributes are small ints, and the simple string approach unblocks Phase 1.

### Migration runner

Follow the simplest possible pattern. The runner:
1. `CREATE TABLE IF NOT EXISTS schema_migrations (version text PRIMARY KEY, applied_at timestamptz NOT NULL DEFAULT now())`.
2. Lists `*.sql` files in `src/db/migrations/` sorted by filename.
3. For each, `SELECT 1 FROM schema_migrations WHERE version = $1`. If absent, run the SQL inside a transaction and insert the row.
4. Logs each applied or skipped migration at `info`.
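
The four steps above can be sketched as follows. This is a sketch under assumptions, not the final runner: reading the `*.sql` files from disk is left out (migrations arrive as an array whose `version` is the filename), and `Db`, `Migration`, and `runMigrations` are hypothetical names. `Db` is a minimal stand-in for a single `pg` client.

```typescript
interface Db {
  query(sql: string, params?: unknown[]): Promise<{ rows: unknown[] }>;
}
interface Migration {
  version: string; // filename, e.g. "0001_positions.sql"
  sql: string;
}
interface Logger {
  info(msg: string): void;
}

// Applies pending migrations in filename order; returns the versions applied in this run.
async function runMigrations(db: Db, migrations: Migration[], logger: Logger): Promise<string[]> {
  // Step 1: bookkeeping table.
  await db.query(
    "CREATE TABLE IF NOT EXISTS schema_migrations (" +
      "version text PRIMARY KEY, applied_at timestamptz NOT NULL DEFAULT now())",
  );

  // Step 2: sort by filename with a plain, locale-independent comparison.
  const ordered = [...migrations].sort((a, b) =>
    a.version < b.version ? -1 : a.version > b.version ? 1 : 0,
  );

  const applied: string[] = [];
  for (const m of ordered) {
    // Step 3: skip anything already recorded.
    const { rows } = await db.query("SELECT 1 FROM schema_migrations WHERE version = $1", [
      m.version,
    ]);
    if (rows.length > 0) {
      logger.info(`migration ${m.version}: skipped (already applied)`);
      continue;
    }
    await db.query("BEGIN");
    try {
      await db.query(m.sql);
      await db.query("INSERT INTO schema_migrations (version) VALUES ($1)", [m.version]);
      await db.query("COMMIT");
    } catch (err) {
      await db.query("ROLLBACK");
      throw err; // bad SQL fails loudly; nothing is recorded
    }
    // Step 4: log at info.
    logger.info(`migration ${m.version}: applied`);
    applied.push(m.version);
  }
  return applied;
}
```

With `pg`, this would need to run on one client checked out via `pool.connect()` rather than on the pool itself, since `pool.query` may use a different connection per call and `BEGIN`/`COMMIT` would then not bracket the migration.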

Do **not** introduce a heavy framework (Knex, node-pg-migrate). The Processor has one migration file in Phase 1 — a 30-line runner is the right answer.

## Acceptance criteria

- [ ] `pnpm typecheck`, `pnpm lint`, `pnpm test` clean.
- [ ] Integration test (testcontainers TimescaleDB): apply migration; insert a row with a bigint-as-string attribute; query it back; verify shape.
- [ ] Re-running the migration on an already-migrated database is a no-op.
- [ ] `connectWithRetry` makes 3 attempts with exponential backoff between them, then calls `process.exit(1)`. Verify with a unit test using a fake Pool.

## Risks / open questions

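- **TimescaleDB extension availability.** The `deploy/` repo's Postgres container must be the `timescale/timescaledb` image, not stock `postgres`. Document this explicitly in the deploy README when Phase 1 ships. Falling back to a regular table (no hypertable) is not automatic with the migration as specified: on stock Postgres, `CREATE EXTENSION` fails before `create_hypertable` is reached, and because the runner executes each migration inside a transaction, the `CREATE TABLE IF NOT EXISTS` is rolled back with it. A fallback would have to skip the extension and hypertable statements explicitly. The result would be functional, but performance falls off a cliff at scale.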
- **Schema authority overlap with Directus.** Directus also speaks Postgres. When Directus connects and introspects the schema, it will see the `positions` table created by Processor. That's fine — Directus can reflect tables it didn't create. But if an operator later modifies `positions` from the Directus admin UI, the migration may break. Document: `positions` is a Processor-owned table; do not edit from Directus.
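
For the TimescaleDB availability risk, one option is a preflight check before migrations run, so a wrong image produces a clear message instead of a failed migration. A minimal sketch, assuming a hypothetical `timescaleAvailable` helper and a `Db` stand-in for a `pg` client; `pg_available_extensions` is the standard Postgres catalog view listing extensions that can be installed.

```typescript
interface Db {
  query(sql: string, params?: unknown[]): Promise<{ rows: unknown[] }>;
}

// True when the server has the timescaledb extension files available to install.
async function timescaleAvailable(db: Db): Promise<boolean> {
  const { rows } = await db.query(
    "SELECT 1 FROM pg_available_extensions WHERE name = $1",
    ["timescaledb"],
  );
  return rows.length > 0;
}
```

Whether a `false` result should abort startup or trigger an explicit plain-table fallback is a decision this task leaves open.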

## Done

(Fill in once complete: commit SHA, brief notes.)