Task 1.3 — Initial migrations

Three SQL files under db-init/ create the schema processor writes against. All three apply cleanly via apply-db-init.sh, are idempotent on re-run, and end with assertion blocks that catch silent schema drift. 001_extensions.sql — registers timescaledb on the directus database. PostGIS deferred to Phase 2 (per Plan A). The timescaledb-ha image pre-creates the extension at DB init, so the IF NOT EXISTS guard fires as a NOTICE — expected and harmless. 002_positions_hypertable.sql — positions hypertable, exact column-by-column match against processor/src/db/migrations/0001_positions.sql. Cross-checking against processor surfaced 8 divergences from the original task spec; processor wins in every case (it is the writer and is in production). The corrections: - added ingested_at timestamptz NOT NULL DEFAULT now() - added codec text NOT NULL - altitude/angle/speed: real NOT NULL (not DOUBLE PRECISION nullable) - satellites/priority: NOT NULL - removed attributes DEFAULT '{}'::jsonb (processor always writes) - replaced PRIMARY KEY with UNIQUE INDEX positions_device_ts (idiomatic for TimescaleDB hypertables) - chunk interval 1 day, not 7 days - two indexes (positions_device_ts + positions_ts), not one composite Without these corrections every processor INSERT would have failed with NOT NULL violations. Spec deliverables section updated to reflect the correct shape so future readers see the right schema. 003_faulty_column.sql — adds the operator-controlled faulty boolean flag plus the partial index positions_faulty_idx ON (device_id, ts DESC) WHERE faulty = FALSE. The column is set only via Directus admin (Phase 4 permissions); processor's writer never touches it. The partial index optimises the hot-path read pattern (every processor evaluator filters faulty = FALSE); operator queries that look at faulty rows specifically use the broader positions_device_ts index from 002. Live-verified 2026-05-01: - First apply: 3 applied, 0 skipped, exit 0. - Re-run: 0 applied, 3 skipped, exit 0. - All 13 columns present with correct types/nullability/defaults. - Hypertable registered with 1-day chunk interval. - Three expected indexes present. Non-blocking observation: TimescaleDB's create_hypertable() auto-created a fourth index (positions_ts_idx) duplicating our explicit positions_ts. Processor's migration has the same redundancy so stage already lives with this. Cleanup path documented in the task spec for Phase 3 hardening (create_default_indexes => FALSE in the create_hypertable call). ROADMAP marks 1.3 done; 1.4 next.
2026-05-01 22:51:34 +02:00
parent dec2d190ce
commit 25a9731070
5 changed files with 514 additions and 22 deletions
@@ -0,0 +1,32 @@
+-- 001_extensions.sql
+-- Registers the TimescaleDB extension on the directus database.
+--
+-- What this file does:
+--   Enables TimescaleDB so that migration 002 can call create_hypertable().
+--   CASCADE is included so any required dependency extensions install
+--   transparently; for TimescaleDB on timescaledb-ha there are none, but
+--   the clause is harmless and future-proof.
+--
+-- PostGIS is intentionally NOT registered here.
+--   Decision: PostGIS lands in a separate migration when Phase 2
+--   (geofences, SLZs, waypoints) is implemented. The timescaledb-ha image
+--   ships the PostGIS binaries, so the binary is present; it just is not
+--   registered on this database yet. The boot-time "PostGIS isn't installed"
+--   warning from the processor service is benign and expected during Phase 1.
+--
+-- Idempotency: IF NOT EXISTS makes this a no-op if timescaledb is already
+--   registered. Running this file twice produces no error.
+
+CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;
+
+-- Assertion: verify the extension is actually present after the statement
+-- above. If the CREATE EXTENSION silently failed for any reason (e.g. binary
+-- missing from the image), this block halts boot with an actionable message
+-- rather than letting migration 002 fail with a less-obvious error.
+DO $$ BEGIN
+  IF NOT EXISTS (
+    SELECT 1 FROM pg_extension WHERE extname = 'timescaledb'
+  ) THEN
+    RAISE EXCEPTION 'timescaledb extension was not created — check that the timescaledb-ha image is being used and the binary is present';
+  END IF;
+END $$;
@@ -0,0 +1,273 @@
+-- 002_positions_hypertable.sql
+-- Creates the positions hypertable. This is the canonical positions schema
+-- as of Phase 1. Future shape changes go through NEW numbered migration files;
+-- never edit this file after it has been applied (the runner's checksum guard
+-- will detect the edit and halt boot with exit code 2).
+--
+-- Schema authority: after this migration lands, directus/db-init/ is the
+--   canonical definition of this table for operational purposes. The processor
+--   service is the sole writer; Directus reads positions and manages the faulty
+--   flag (added in migration 003). Do NOT alter this table from the Directus
+--   admin UI — hypertable DDL must go through db-init migrations.
+--
+-- Cross-checked against:
+--   processor/src/db/migrations/0001_positions.sql
+--   Cross-check date: 2026-05-01
+--
+--   The processor migration is the ground truth for column names, types, and
+--   nullability. Discrepancies between that file and the task spec
+--   (03-initial-migrations.md) are documented below. The processor file wins
+--   in all cases.
+--
+-- DIVERGENCES FROM 03-initial-migrations.md (task spec) — read before review:
+--
+--   1. ingested_at (timestamptz NOT NULL DEFAULT now())
+--      Spec: column not listed.
+--      Processor migration: present, NOT NULL, DEFAULT now().
+--      Resolution: included here to match what processor writes. Omitting it
+--      would cause NOT NULL violations on every processor insert.
+--      RECOMMENDATION: add ingested_at to 03-initial-migrations.md deliverables.
+--
+--   2. altitude / angle / speed — type and nullability
+--      Spec: DOUBLE PRECISION, nullable.
+--      Processor migration: real NOT NULL (float32, not float64; NOT NULL).
+--      Resolution: real NOT NULL matches the processor writer. Using
+--      DOUBLE PRECISION would not cause failures (Postgres widens silently on
+--      insert) but would waste storage for no gain. Nullable columns would
+--      allow NULLs processor never writes and complicate query plans.
+--      RECOMMENDATION: update spec to real NOT NULL for these three columns.
+--
+--   3. satellites / priority — nullability
+--      Spec: SMALLINT (nullable).
+--      Processor migration: smallint NOT NULL.
+--      Resolution: NOT NULL matches what the processor always writes.
+--      RECOMMENDATION: update spec to NOT NULL for these columns.
+--
+--   4. codec (text NOT NULL)
+--      Spec: column not listed.
+--      Processor migration: present, NOT NULL.
+--      Resolution: included here. Omitting it causes NOT NULL failures on
+--      processor inserts (codec is always populated — e.g. "codec8",
+--      "codec8ext", "codec16").
+--      RECOMMENDATION: add codec to 03-initial-migrations.md deliverables.
+--
+--   5. attributes DEFAULT
+--      Spec: JSONB NOT NULL DEFAULT '{}'::jsonb.
+--      Processor migration: jsonb NOT NULL (no DEFAULT).
+--      Resolution: the task brief says "spec and processor disagree → processor
+--      wins." Processor always writes attributes (never relies on a default),
+--      so the column is declared NOT NULL with no DEFAULT here. A DEFAULT is
+--      harmless but misleading; the processor writer never omits this field.
+--      RECOMMENDATION: remove the default from the spec or leave it and
+--      acknowledge it is unused by the writer.
+--
+--   6. PRIMARY KEY (device_id, ts) vs. UNIQUE INDEX
+--      Spec: PRIMARY KEY (device_id, ts).
+--      Processor migration: NO primary key; uses a separate
+--      CREATE UNIQUE INDEX IF NOT EXISTS positions_device_ts ON positions (device_id, ts).
+--      Resolution: TimescaleDB strongly discourages PRIMARY KEY on the
+--      partition column (ts) — it requires ts to be part of every unique
+--      constraint, which is true here, but the physical enforcement in
+--      TimescaleDB is via unique index per chunk, not a table-level PK
+--      constraint. The processor migration's approach (unique index, no PK)
+--      is idiomatic for hypertables. This migration follows the processor:
+--      no PRIMARY KEY, unique index instead.
+--      RECOMMENDATION: change spec to use UNIQUE INDEX, not PRIMARY KEY.
+--
+--   7. Chunk interval: INTERVAL '1 day' vs. INTERVAL '7 days'
+--      Spec: INTERVAL '7 days'.
+--      Processor migration: INTERVAL '1 day'.
+--      Resolution: processor migration wins. GPS telemetry at 1-60 second
+--      intervals from hundreds of devices makes 1-day chunks a better fit
+--      for range queries that span hours-to-days. 7-day chunks would create
+--      much larger per-chunk indexes and slower chunk exclusion.
+--      RECOMMENDATION: change spec to INTERVAL '1 day'.
+--
+--   8. Index shape
+--      Spec: positions_device_ts_idx ON positions (device_id, ts DESC).
+--      Processor migration: positions_device_ts ON positions (device_id, ts)
+--        [ascending, no DESC] + positions_ts ON positions (ts DESC).
+--      Resolution: two indexes are created here matching the processor
+--      migration. The spec's single (device_id, ts DESC) composite is
+--      not equivalent — it does not cover the (ts DESC) range-scan pattern
+--      used by global timestamp queries.
+--      RECOMMENDATION: update spec to list both indexes.
+
+CREATE TABLE IF NOT EXISTS positions (
+  device_id    text             NOT NULL,
+  ts           timestamptz      NOT NULL,
+  ingested_at  timestamptz      NOT NULL DEFAULT now(),
+  latitude     double precision NOT NULL,
+  longitude    double precision NOT NULL,
+  altitude     real             NOT NULL,
+  angle        real             NOT NULL,
+  speed        real             NOT NULL,
+  satellites   smallint         NOT NULL,
+  priority     smallint         NOT NULL,
+  codec        text             NOT NULL,
+  attributes   jsonb            NOT NULL
+);
+
+-- Convert to a TimescaleDB hypertable partitioned by event time (ts).
+--   chunk_time_interval = 1 day: appropriate for GPS telemetry where queries
+--   span hours-to-days and devices send at 1-60 second intervals. Tunable
+--   in a future migration but NOT via editing this file (checksum guard).
+--   if_not_exists = TRUE: no-op if a stage environment already has the
+--   hypertable; the chunk interval cannot be retroactively changed via this
+--   call — that is an accepted known divergence documented in 03-initial-
+--   migrations.md under Risks.
+SELECT create_hypertable(
+  'positions',
+  'ts',
+  if_not_exists => TRUE,
+  chunk_time_interval => INTERVAL '1 day'
+);
+
+-- Unique index enforcing the natural key for idempotent upserts.
+-- The processor writer uses ON CONFLICT (device_id, ts) DO NOTHING.
+-- TimescaleDB's idiomatic pattern is a unique index (not a PRIMARY KEY
+-- constraint) on the hypertable partition column — see divergence note 6.
+CREATE UNIQUE INDEX IF NOT EXISTS positions_device_ts
+  ON positions (device_id, ts);
+
+-- Descending ts index for range queries scanning the most recent positions
+-- first (e.g. "latest N positions" queries and time-bounded aggregations).
+CREATE INDEX IF NOT EXISTS positions_ts
+  ON positions (ts DESC);
+
+-- -------------------------------------------------------------------------
+-- Assertion block: verify the table and its shape after the statements above.
+-- If any assertion fails, RAISE EXCEPTION halts boot immediately. The operator
+-- gets an actionable error message naming the offending column/constraint.
+-- This catches the case where a stage environment has the table but with
+-- subtly different column types (the CREATE TABLE IF NOT EXISTS above is a
+-- no-op against an existing table — silent drift without this block).
+-- -------------------------------------------------------------------------
+DO $$ DECLARE
+  _hypertable_count int;
+  _col_type         text;
+BEGIN
+
+  -- 1. Table exists
+  IF NOT EXISTS (
+    SELECT 1 FROM information_schema.tables
+    WHERE table_schema = 'public' AND table_name = 'positions'
+  ) THEN
+    RAISE EXCEPTION 'positions table does not exist after migration 002';
+  END IF;
+
+  -- 2. Hypertable registered
+  SELECT count(*) INTO _hypertable_count
+  FROM timescaledb_information.hypertables
+  WHERE hypertable_schema = 'public' AND hypertable_name = 'positions';
+  IF _hypertable_count = 0 THEN
+    RAISE EXCEPTION 'positions is not a hypertable — create_hypertable() may have failed silently';
+  END IF;
+
+  -- 3. Column assertions — one per critical column
+
+  SELECT data_type INTO _col_type
+  FROM information_schema.columns
+  WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'device_id';
+  IF _col_type IS DISTINCT FROM 'text' THEN
+    RAISE EXCEPTION 'positions.device_id expected type text, found %', coalesce(_col_type, 'MISSING');
+  END IF;
+
+  SELECT data_type INTO _col_type
+  FROM information_schema.columns
+  WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'ts';
+  IF _col_type IS DISTINCT FROM 'timestamp with time zone' THEN
+    RAISE EXCEPTION 'positions.ts expected type "timestamp with time zone", found %', coalesce(_col_type, 'MISSING');
+  END IF;
+
+  SELECT data_type INTO _col_type
+  FROM information_schema.columns
+  WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'ingested_at';
+  IF _col_type IS DISTINCT FROM 'timestamp with time zone' THEN
+    RAISE EXCEPTION 'positions.ingested_at expected type "timestamp with time zone", found %', coalesce(_col_type, 'MISSING');
+  END IF;
+
+  SELECT data_type INTO _col_type
+  FROM information_schema.columns
+  WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'latitude';
+  IF _col_type IS DISTINCT FROM 'double precision' THEN
+    RAISE EXCEPTION 'positions.latitude expected type "double precision", found %', coalesce(_col_type, 'MISSING');
+  END IF;
+
+  SELECT data_type INTO _col_type
+  FROM information_schema.columns
+  WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'longitude';
+  IF _col_type IS DISTINCT FROM 'double precision' THEN
+    RAISE EXCEPTION 'positions.longitude expected type "double precision", found %', coalesce(_col_type, 'MISSING');
+  END IF;
+
+  SELECT data_type INTO _col_type
+  FROM information_schema.columns
+  WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'altitude';
+  IF _col_type IS DISTINCT FROM 'real' THEN
+    RAISE EXCEPTION 'positions.altitude expected type real, found %', coalesce(_col_type, 'MISSING');
+  END IF;
+
+  SELECT data_type INTO _col_type
+  FROM information_schema.columns
+  WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'angle';
+  IF _col_type IS DISTINCT FROM 'real' THEN
+    RAISE EXCEPTION 'positions.angle expected type real, found %', coalesce(_col_type, 'MISSING');
+  END IF;
+
+  SELECT data_type INTO _col_type
+  FROM information_schema.columns
+  WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'speed';
+  IF _col_type IS DISTINCT FROM 'real' THEN
+    RAISE EXCEPTION 'positions.speed expected type real, found %', coalesce(_col_type, 'MISSING');
+  END IF;
+
+  SELECT data_type INTO _col_type
+  FROM information_schema.columns
+  WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'satellites';
+  IF _col_type IS DISTINCT FROM 'smallint' THEN
+    RAISE EXCEPTION 'positions.satellites expected type smallint, found %', coalesce(_col_type, 'MISSING');
+  END IF;
+
+  SELECT data_type INTO _col_type
+  FROM information_schema.columns
+  WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'priority';
+  IF _col_type IS DISTINCT FROM 'smallint' THEN
+    RAISE EXCEPTION 'positions.priority expected type smallint, found %', coalesce(_col_type, 'MISSING');
+  END IF;
+
+  SELECT data_type INTO _col_type
+  FROM information_schema.columns
+  WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'codec';
+  IF _col_type IS DISTINCT FROM 'text' THEN
+    RAISE EXCEPTION 'positions.codec expected type text, found %', coalesce(_col_type, 'MISSING');
+  END IF;
+
+  SELECT data_type INTO _col_type
+  FROM information_schema.columns
+  WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'attributes';
+  IF _col_type IS DISTINCT FROM 'jsonb' THEN
+    RAISE EXCEPTION 'positions.attributes expected type jsonb, found %', coalesce(_col_type, 'MISSING');
+  END IF;
+
+  -- 4. Unique index on (device_id, ts)
+  IF NOT EXISTS (
+    SELECT 1 FROM pg_indexes
+    WHERE schemaname = 'public'
+      AND tablename = 'positions'
+      AND indexname = 'positions_device_ts'
+  ) THEN
+    RAISE EXCEPTION 'unique index positions_device_ts is missing from positions';
+  END IF;
+
+  -- 5. Descending ts index
+  IF NOT EXISTS (
+    SELECT 1 FROM pg_indexes
+    WHERE schemaname = 'public'
+      AND tablename = 'positions'
+      AND indexname = 'positions_ts'
+  ) THEN
+    RAISE EXCEPTION 'index positions_ts is missing from positions';
+  END IF;
+
+END $$;
@@ -0,0 +1,106 @@
+-- 003_faulty_column.sql
+-- Adds the operator-controlled faulty flag to the positions hypertable,
+-- plus the partial index that optimises the processor's hot-path read.
+--
+-- Why this is a separate file from 002:
+--   The faulty column is an operator-plane concern layered on top of the
+--   hypertable's initial shape. Keeping it in its own migration makes the
+--   evolution visible in git history — an operator or analyst reading the
+--   migration log can see that positions started as a pure telemetry store
+--   and the quality-control flag was added as a deliberate, dated step.
+--   It also keeps migration 002 as the authoritative "what processor writes"
+--   definition, uncluttered by the downstream read/flag concern.
+--
+-- Column semantics:
+--   faulty BOOLEAN NOT NULL DEFAULT FALSE
+--   The column is operator-controlled; it is NEVER set by the [[processor]]
+--   writer. The processor always inserts rows with faulty = FALSE (via the
+--   DEFAULT — the column is intentionally omitted from INSERT statements).
+--   A track operator flips the flag through [[directus]] when a recorded
+--   position is unrealistic (jumpy GPS, impossible speed/coordinate).
+--   When the flag is set, Directus emits a webhook to the
+--   recompute:requests Redis stream; the processor re-evaluates any
+--   entry_penalties whose window overlaps the flagged position's timestamp.
+--
+-- Index strategy:
+--   positions_faulty_idx is a PARTIAL index covering only rows where
+--   faulty = FALSE. This matches the processor's hot-path read pattern:
+--   all evaluators (peak-speed, crossing detection, replay recompute) filter
+--   WHERE faulty = FALSE. The partial index is smaller than a full index,
+--   fits better in shared_buffers, and is never consulted for operator
+--   queries that explicitly look at faulty rows — those use the broader
+--   positions_device_ts index from migration 002.
+
+ALTER TABLE positions
+  ADD COLUMN IF NOT EXISTS faulty boolean NOT NULL DEFAULT FALSE;
+
+-- Partial index: covers the processor's standard read path (faulty = FALSE).
+-- Column order (device_id, ts DESC) supports per-device range queries
+-- returning most-recent-first, which is the dominant access pattern for
+-- peak-speed evaluation and crossing detection.
+CREATE INDEX IF NOT EXISTS positions_faulty_idx
+  ON positions (device_id, ts DESC)
+  WHERE faulty = FALSE;
+
+-- -------------------------------------------------------------------------
+-- Assertion block: verify the column and index are present with the expected
+-- shape. Catches drift where stage already has a faulty column but with a
+-- different type or a missing DEFAULT (ADD COLUMN IF NOT EXISTS is a no-op
+-- against an existing column regardless of its definition).
+-- -------------------------------------------------------------------------
+DO $$ DECLARE
+  _col_type        text;
+  _col_notnull     boolean;
+  _col_default     text;
+BEGIN
+
+  -- 1. Column exists with correct type
+  SELECT data_type, is_nullable = 'NO', column_default
+  INTO _col_type, _col_notnull, _col_default
+  FROM information_schema.columns
+  WHERE table_schema = 'public'
+    AND table_name   = 'positions'
+    AND column_name  = 'faulty';
+
+  IF _col_type IS NULL THEN
+    RAISE EXCEPTION 'positions.faulty column is missing after migration 003';
+  END IF;
+
+  IF _col_type IS DISTINCT FROM 'boolean' THEN
+    RAISE EXCEPTION 'positions.faulty expected type boolean, found %', _col_type;
+  END IF;
+
+  IF NOT _col_notnull THEN
+    RAISE EXCEPTION 'positions.faulty must be NOT NULL but is nullable — schema drift';
+  END IF;
+
+  -- DEFAULT is stored as a normalised expression string; Postgres represents
+  -- FALSE as 'false' in information_schema (lower-case, no quotes).
+  IF _col_default IS DISTINCT FROM 'false' THEN
+    RAISE EXCEPTION 'positions.faulty expected DEFAULT false, found %', coalesce(_col_default, 'NULL (no default)');
+  END IF;
+
+  -- 2. Partial index exists
+  IF NOT EXISTS (
+    SELECT 1 FROM pg_indexes
+    WHERE schemaname = 'public'
+      AND tablename  = 'positions'
+      AND indexname  = 'positions_faulty_idx'
+  ) THEN
+    RAISE EXCEPTION 'partial index positions_faulty_idx is missing from positions';
+  END IF;
+
+  -- 3. Partial index has the expected WHERE predicate
+  --    pg_indexes.indexdef includes the full CREATE INDEX statement as text;
+  --    the predicate appears as "WHERE (faulty = false)".
+  IF NOT EXISTS (
+    SELECT 1 FROM pg_indexes
+    WHERE schemaname = 'public'
+      AND tablename  = 'positions'
+      AND indexname  = 'positions_faulty_idx'
+      AND indexdef ILIKE '%where (faulty = false)%'
+  ) THEN
+    RAISE EXCEPTION 'positions_faulty_idx exists but does not have the expected predicate "WHERE (faulty = false)" — check indexdef in pg_indexes';
+  END IF;
+
+END $$;