Task 1.3 — Initial migrations

Three SQL files under db-init/ create the schema the processor writes
against. All three apply cleanly via apply-db-init.sh, are idempotent
on re-run, and end with assertion blocks that catch silent
schema drift.
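apply-db-init.sh is not part of this diff. What follows is only a hypothetical Python sketch of the bookkeeping described here:

- apply each numbered file once;
- skip it on re-run;
- halt with exit code 2 (the code named in migration 002's header) if an applied file's contents change.

Several details are assumptions: the in-memory `ledger` dict, the SHA-256 choice, and the names `apply_migrations`, `ChecksumMismatch` and `main`. A real runner would keep the ledger in a database table and execute SQL rather than call a callback.

```python
import hashlib
from typing import Callable, Dict, NamedTuple


class ChecksumMismatch(Exception):
    """Raised when an already-applied migration file has been edited."""


class RunResult(NamedTuple):
    applied: int
    skipped: int


def checksum(sql: str) -> str:
    return hashlib.sha256(sql.encode("utf-8")).hexdigest()


def apply_migrations(
    files: Dict[str, str],
    ledger: Dict[str, str],
    execute: Callable[[str, str], None],
) -> RunResult:
    """Apply migrations in lexical order (001_, 002_, ...).

    `ledger` maps file name -> checksum of the version that was applied.
    """
    applied = skipped = 0
    for name in sorted(files):
        digest = checksum(files[name])
        recorded = ledger.get(name)
        if recorded is None:
            execute(name, files[name])  # if this raises, nothing is recorded
            ledger[name] = digest
            applied += 1
        elif recorded == digest:
            skipped += 1  # already applied and unchanged: no-op
        else:
            raise ChecksumMismatch(f"{name} was edited after it was applied")
    return RunResult(applied, skipped)


def main(files: Dict[str, str], ledger: Dict[str, str],
         execute: Callable[[str, str], None]) -> int:
    """Translate the outcome into an exit code: 0 ok, 2 edited migration."""
    try:
        result = apply_migrations(files, ledger, execute)
    except ChecksumMismatch as exc:
        print(f"halt: {exc}")
        return 2
    print(f"{result.applied} applied, {result.skipped} skipped")
    return 0


if __name__ == "__main__":
    # Stand-in bodies; a real runner would read db-init/*.sql from disk.
    migrations = {
        "001_extensions.sql": "-- stand-in for 001",
        "002_positions_hypertable.sql": "-- stand-in for 002",
        "003_faulty_column.sql": "-- stand-in for 003",
    }
    state: Dict[str, str] = {}

    def noop(name: str, sql: str) -> None:
        pass

    main(migrations, state, noop)  # prints "3 applied, 0 skipped"
    main(migrations, state, noop)  # prints "0 applied, 3 skipped"
```

The sketch records a checksum only after `execute` returns. A migration that fails partway is therefore retried on the next boot rather than marked as applied. That is safe only because each file is idempotent.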

001_extensions.sql — registers timescaledb on the directus database.
  PostGIS is deferred to Phase 2 (per Plan A). The timescaledb-ha image
  pre-creates the extension at DB init, so the IF NOT EXISTS guard
  turns the statement into a no-op that emits a NOTICE; expected and
  harmless.

002_positions_hypertable.sql — positions hypertable, exact
  column-by-column match against processor/src/db/migrations/0001_positions.sql.

  Cross-checking against the processor surfaced 8 divergences from the
  original task spec; the processor wins in every case (it is the writer
  and is in production). The corrections:

    - added ingested_at timestamptz NOT NULL DEFAULT now()
    - added codec text NOT NULL
    - altitude/angle/speed: real NOT NULL (not DOUBLE PRECISION nullable)
    - satellites/priority: NOT NULL
    - removed attributes DEFAULT '{}'::jsonb (processor always writes
      the column)
    - replaced PRIMARY KEY with UNIQUE INDEX positions_device_ts
      (matches processor; TimescaleDB accepts either, provided ts is
      part of the key)
    - chunk interval 1 day, not 7 days
    - two indexes (positions_device_ts + positions_ts), not one composite
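Replacing the PRIMARY KEY with a unique index does not affect the processor's idempotent writes. Migration 002's comments say the writer uses `ON CONFLICT (device_id, ts) DO NOTHING`, and that clause only needs a unique index on those columns to infer its conflict target.

The sketch below shows this behaviour. It uses Python's built-in `sqlite3` as a stand-in for Postgres; SQLite 3.24 and later accept the same upsert syntax. The trimmed three-column table, the sample rows and `write_batch` are illustrative, not the processor's code.

```python
import sqlite3
from typing import Iterable, Tuple

SCHEMA = """
CREATE TABLE positions (
    device_id TEXT NOT NULL,
    ts        TEXT NOT NULL,   -- ISO-8601 text stands in for timestamptz
    speed     REAL NOT NULL
);
-- No PRIMARY KEY: the unique index is the only uniqueness guarantee.
CREATE UNIQUE INDEX positions_device_ts ON positions (device_id, ts);
"""

UPSERT = (
    "INSERT INTO positions (device_id, ts, speed) VALUES (?, ?, ?) "
    "ON CONFLICT (device_id, ts) DO NOTHING"
)


def connect() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def write_batch(conn: sqlite3.Connection,
                rows: Iterable[Tuple[str, str, float]]) -> None:
    """Idempotent write: re-delivered (device_id, ts) pairs are ignored."""
    with conn:  # commit on success, roll back on error
        conn.executemany(UPSERT, rows)


if __name__ == "__main__":
    conn = connect()
    write_batch(conn, [("dev-1", "2026-05-01T10:00:00Z", 50.0),
                       ("dev-1", "2026-05-01T10:00:01Z", 52.0)])
    # Overlapping redelivery: the 10:00:01 row conflicts and is skipped.
    write_batch(conn, [("dev-1", "2026-05-01T10:00:01Z", 99.0),
                       ("dev-1", "2026-05-01T10:00:02Z", 54.0)])
    print(conn.execute("SELECT count(*) FROM positions").fetchone()[0])  # 3
```

The re-delivered 10:00:01 row keeps its original speed of 52.0, which is what DO NOTHING promises. In Postgres the same statement's conflict target resolves to positions_device_ts.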

Without these corrections every processor INSERT would have failed
against the spec's table, which lacked columns the writer supplies
(codec among them). The spec's deliverables section is updated to
reflect the correct shape so future readers see the right schema.
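The column half of that cross-check is mechanical enough to script. This is a hypothetical sketch:

- The two dicts encode only the columns that migration 002's divergence notes discuss. One holds the spec's versions as those notes describe them, the other the processor's.
- `diff_columns` reports every mismatch in presence, type, nullability or default.

The `Col` tuple and both dicts are assumptions for illustration. A real check would parse or query both schemas rather than hard-code them.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Col(NamedTuple):
    type: str
    not_null: bool
    default: Optional[str] = None


# Excerpts only: the columns the divergence notes in migration 002 cover.
SPEC: Dict[str, Col] = {
    "altitude": Col("double precision", False),
    "angle": Col("double precision", False),
    "speed": Col("double precision", False),
    "satellites": Col("smallint", False),
    "priority": Col("smallint", False),
    "attributes": Col("jsonb", True, "'{}'::jsonb"),
}
PROCESSOR: Dict[str, Col] = {
    "ingested_at": Col("timestamptz", True, "now()"),
    "altitude": Col("real", True),
    "angle": Col("real", True),
    "speed": Col("real", True),
    "satellites": Col("smallint", True),
    "priority": Col("smallint", True),
    "codec": Col("text", True),
    "attributes": Col("jsonb", True),
}

Divergence = Tuple[str, str, object, object]  # (column, aspect, spec, writer)


def diff_columns(spec: Dict[str, Col],
                 writer: Dict[str, Col]) -> List[Divergence]:
    """Column-by-column comparison of two schema excerpts."""
    out: List[Divergence] = []
    for name in sorted(spec.keys() | writer.keys()):
        s, w = spec.get(name), writer.get(name)
        if s is None or w is None:
            # Column exists on one side only.
            out.append((name, "presence", s is not None, w is not None))
            continue
        for aspect in Col._fields:
            if getattr(s, aspect) != getattr(w, aspect):
                out.append((name, aspect, getattr(s, aspect), getattr(w, aspect)))
    return out


if __name__ == "__main__":
    for column, aspect, spec_val, writer_val in diff_columns(SPEC, PROCESSOR):
        print(f"{column:12} {aspect:9} spec={spec_val!r} processor={writer_val!r}")
```

On these excerpts it reports 11 mismatches across 8 columns. The PRIMARY KEY, index shape and chunk interval are table-level properties that a column diff cannot see, so they still need checking separately.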

003_faulty_column.sql — adds the operator-controlled faulty boolean
  flag plus the partial index positions_faulty_idx ON (device_id,
  ts DESC) WHERE faulty = FALSE. The column is set only via Directus
  admin (Phase 4 permissions); the processor's writer never touches it.
  The partial index optimises the hot-path read pattern (every
  processor evaluator filters faulty = FALSE); operator queries that
  look at faulty rows specifically use the broader positions_device_ts
  index from 002.
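Here is a small sketch of how the partial index and the DEFAULT interact. It uses Python's `sqlite3`, which also supports partial indexes and `DESC` index keys.

Assumptions:

- SQLite stands in for Postgres.
- `faulty` is stored as 0/1, because SQLite has no native boolean.
- The rows and the `latest_clean` helper are invented.

Writer-style inserts omit `faulty`, and an operator-style UPDATE then flags one implausible point. The hot-path query, whose `faulty = 0` term matches the index predicate, no longer returns that point.

```python
import sqlite3

SCHEMA = """
CREATE TABLE positions (
    device_id TEXT    NOT NULL,
    ts        TEXT    NOT NULL,           -- ISO-8601 text stands in for timestamptz
    speed     REAL    NOT NULL,
    faulty    INTEGER NOT NULL DEFAULT 0  -- 0/1 stands in for boolean
);
-- Partial index: only rows with faulty = 0 are stored in it.
CREATE INDEX positions_faulty_idx
    ON positions (device_id, ts DESC)
    WHERE faulty = 0;
"""

HOT_PATH = (
    "SELECT ts, speed FROM positions "
    "WHERE device_id = ? AND faulty = 0 "  # same predicate as the index
    "ORDER BY ts DESC LIMIT ?"
)


def latest_clean(conn: sqlite3.Connection, device_id: str, limit: int):
    return conn.execute(HOT_PATH, (device_id, limit)).fetchall()


def demo() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Writer-style inserts omit faulty, so every row starts at the DEFAULT.
    conn.executemany(
        "INSERT INTO positions (device_id, ts, speed) VALUES (?, ?, ?)",
        [("dev-1", "2026-05-01T10:00:00Z", 50.0),
         ("dev-1", "2026-05-01T10:00:01Z", 52.0),
         ("dev-1", "2026-05-01T10:00:02Z", 310.0),  # implausible jump
         ("dev-1", "2026-05-01T10:00:03Z", 54.0)],
    )
    # Operator-style flag (done through Directus in the real system).
    conn.execute(
        "UPDATE positions SET faulty = 1 WHERE device_id = ? AND ts = ?",
        ("dev-1", "2026-05-01T10:00:02Z"),
    )
    return conn


if __name__ == "__main__":
    conn = demo()
    print(latest_clean(conn, "dev-1", 3))  # 10:00:03, 10:00:01, 10:00:00
    # Show which index SQLite chose for the hot-path query.
    for row in conn.execute("EXPLAIN QUERY PLAN " + HOT_PATH, ("dev-1", 3)):
        print(row)
```

The printed plan shows only SQLite's choice. Postgres makes its own decision, so running `EXPLAIN` on stage is the real test of whether positions_faulty_idx serves the evaluators' queries.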

Live-verified 2026-05-01:
  - First apply: 3 applied, 0 skipped, exit 0.
  - Re-run: 0 applied, 3 skipped, exit 0.
  - All 13 columns present with correct types/nullability/defaults.
  - Hypertable registered with 1-day chunk interval.
  - Three expected indexes present.
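One rough way to sanity-check the 1-day interval is to count rows per chunk. The sketch below rests on three assumptions:

- a hypothetical fleet of 300 devices (the migration comment only says "hundreds");
- reports at the two ends of the 1-60 second range that comment cites;
- every device reporting for the whole window.

The figures are therefore upper bounds, not measured stage volumes.

```python
SECONDS_PER_DAY = 86_400


def rows_per_chunk(devices: int, report_interval_s: int, chunk_days: int) -> int:
    """Rows landing in one chunk if every device reports at a fixed interval."""
    return devices * (SECONDS_PER_DAY // report_interval_s) * chunk_days


if __name__ == "__main__":
    fleet = 300  # hypothetical fleet size
    for interval_s in (1, 60):
        for chunk_days in (1, 7):
            n = rows_per_chunk(fleet, interval_s, chunk_days)
            print(f"{interval_s:>2} s reports, {chunk_days}-day chunk: {n:,} rows")
```

At one report per second, a 7-day chunk would hold about 181 million rows, against roughly 26 million for a 1-day chunk, and each per-chunk index grows in proportion. TimescaleDB's general guidance is to size chunks so that the most recent ones, indexes included, fit comfortably in memory. At this ingest rate that favours the smaller interval.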

Non-blocking observation: TimescaleDB's create_hypertable()
auto-created a fourth index (positions_ts_idx, its default index on
ts DESC) duplicating our explicit positions_ts. The processor's
migration has the same redundancy, so stage already runs with it. The
cleanup path (create_default_indexes => FALSE in the create_hypertable
call) is documented in the task spec for Phase 3 hardening.
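Until that cleanup lands, a redundancy like positions_ts_idx can be caught mechanically. The sketch below is in Python, and the helper name is hypothetical.

- It takes the (indexname, indexdef) rows returned by `SELECT indexname, indexdef FROM pg_indexes WHERE tablename = 'positions'`.
- It groups them by everything after the index name. Two indexes with the same uniqueness, table, access method, key columns and predicate therefore land in one group.

The sample rows follow the usual indexdef format for illustration; they were not captured from stage. The check is purely textual, so it will not flag an index that is merely a prefix of another.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def find_duplicate_indexes(rows: List[Tuple[str, str]]) -> List[List[str]]:
    """Group indexes whose definitions are identical apart from the name."""
    groups: Dict[Tuple[bool, str], List[str]] = defaultdict(list)
    for name, indexdef in rows:
        head, sep, tail = indexdef.partition(" ON ")
        if not sep:
            continue  # not a CREATE INDEX statement we can parse
        is_unique = head.startswith("CREATE UNIQUE INDEX")
        # `tail` holds table, access method, key columns and any WHERE predicate.
        groups[(is_unique, tail)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


if __name__ == "__main__":
    # Illustrative rows in pg_indexes.indexdef format, not captured from stage.
    sample = [
        ("positions_device_ts",
         "CREATE UNIQUE INDEX positions_device_ts ON public.positions USING btree (device_id, ts)"),
        ("positions_ts",
         "CREATE INDEX positions_ts ON public.positions USING btree (ts DESC)"),
        ("positions_ts_idx",
         "CREATE INDEX positions_ts_idx ON public.positions USING btree (ts DESC)"),
        ("positions_faulty_idx",
         "CREATE INDEX positions_faulty_idx ON public.positions USING btree (device_id, ts DESC) WHERE (faulty = false)"),
    ]
    print(find_duplicate_indexes(sample))  # [['positions_ts', 'positions_ts_idx']]
```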

ROADMAP marks 1.3 done; 1.4 next.
2026-05-01 22:51:34 +02:00
parent dec2d190ce
commit 25a9731070
5 changed files with 514 additions and 22 deletions
@@ -0,0 +1,32 @@
-- 001_extensions.sql
-- Registers the TimescaleDB extension on the directus database.
--
-- What this file does:
-- Enables TimescaleDB so that migration 002 can call create_hypertable().
-- CASCADE is included so any required dependency extensions install
-- transparently; for TimescaleDB on timescaledb-ha there are none, but
-- the clause is harmless and future-proof.
--
-- PostGIS is intentionally NOT registered here.
-- Decision: PostGIS lands in a separate migration when Phase 2
-- (geofences, SLZs, waypoints) is implemented. The timescaledb-ha image
-- ships the PostGIS binaries, so the binary is present; it just is not
-- registered on this database yet. The boot-time "PostGIS isn't installed"
-- warning from the processor service is benign and expected during Phase 1.
--
-- Idempotency: IF NOT EXISTS makes this a no-op if timescaledb is already
-- registered. Running this file twice produces no error.
CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;
-- Assertion: verify the extension is actually present after the statement
-- above. If the CREATE EXTENSION silently failed for any reason (e.g. binary
-- missing from the image), this block halts boot with an actionable message
-- rather than letting migration 002 fail with a less-obvious error.
DO $$ BEGIN
    IF NOT EXISTS (
        SELECT 1 FROM pg_extension WHERE extname = 'timescaledb'
    ) THEN
        RAISE EXCEPTION 'timescaledb extension was not created — check that the timescaledb-ha image is being used and the binary is present';
    END IF;
END $$;
@@ -0,0 +1,273 @@
-- 002_positions_hypertable.sql
-- Creates the positions hypertable. This is the canonical positions schema
-- as of Phase 1. Future shape changes go through NEW numbered migration files;
-- never edit this file after it has been applied (the runner's checksum guard
-- will detect the edit and halt boot with exit code 2).
--
-- Schema authority: after this migration lands, directus/db-init/ is the
-- canonical definition of this table for operational purposes. The processor
-- service is the sole writer; Directus reads positions and manages the faulty
-- flag (added in migration 003). Do NOT alter this table from the Directus
-- admin UI — hypertable DDL must go through db-init migrations.
--
-- Cross-checked against:
-- processor/src/db/migrations/0001_positions.sql
-- Cross-check date: 2026-05-01
--
-- The processor migration is the ground truth for column names, types, and
-- nullability. Discrepancies between that file and the task spec
-- (03-initial-migrations.md) are documented below. The processor file wins
-- in all cases.
--
-- DIVERGENCES FROM 03-initial-migrations.md (task spec) — read before review:
--
-- 1. ingested_at (timestamptz NOT NULL DEFAULT now())
-- Spec: column not listed.
-- Processor migration: present, NOT NULL, DEFAULT now().
-- Resolution: included here to match what the processor writes. Omitting
-- it would break every processor insert that targets this column.
-- RECOMMENDATION: add ingested_at to 03-initial-migrations.md deliverables.
--
-- 2. altitude / angle / speed — type and nullability
-- Spec: DOUBLE PRECISION, nullable.
-- Processor migration: real NOT NULL (float32, not float64).
-- Resolution: real NOT NULL matches the processor writer. Using
-- DOUBLE PRECISION would not cause failures (Postgres widens real to
-- double precision implicitly on insert) but would waste storage for no
-- gain. Nullable columns would admit NULLs the processor never writes.
-- RECOMMENDATION: update spec to real NOT NULL for these three columns.
--
-- 3. satellites / priority — nullability
-- Spec: SMALLINT (nullable).
-- Processor migration: smallint NOT NULL.
-- Resolution: NOT NULL matches what the processor always writes.
-- RECOMMENDATION: update spec to NOT NULL for these columns.
--
-- 4. codec (text NOT NULL)
-- Spec: column not listed.
-- Processor migration: present, NOT NULL.
-- Resolution: included here. Omitting it would break every processor
-- insert, because the writer always populates codec (e.g. "codec8",
-- "codec8ext", "codec16").
-- RECOMMENDATION: add codec to 03-initial-migrations.md deliverables.
--
-- 5. attributes DEFAULT
-- Spec: JSONB NOT NULL DEFAULT '{}'::jsonb.
-- Processor migration: jsonb NOT NULL (no DEFAULT).
-- Resolution: the task brief says "spec and processor disagree → processor
-- wins." Processor always writes attributes (never relies on a default),
-- so the column is declared NOT NULL with no DEFAULT here. A DEFAULT is
-- harmless but misleading; the processor writer never omits this field.
-- RECOMMENDATION: remove the default from the spec or leave it and
-- acknowledge it is unused by the writer.
--
-- 6. PRIMARY KEY (device_id, ts) vs. UNIQUE INDEX
-- Spec: PRIMARY KEY (device_id, ts).
-- Processor migration: NO primary key; uses a separate
-- CREATE UNIQUE INDEX IF NOT EXISTS positions_device_ts ON positions (device_id, ts).
-- Resolution: TimescaleDB requires the partitioning column (ts) to be
-- part of every unique index or primary key on a hypertable. Both a
-- PRIMARY KEY (device_id, ts) and a unique index on (device_id, ts)
-- satisfy that rule, and either is enforced through per-chunk unique
-- indexes. The processor migration uses the unique index with no PK;
-- this migration follows the processor: no PRIMARY KEY, unique index
-- instead.
-- RECOMMENDATION: change spec to use UNIQUE INDEX, not PRIMARY KEY.
--
-- 7. Chunk interval: INTERVAL '1 day' vs. INTERVAL '7 days'
-- Spec: INTERVAL '7 days'.
-- Processor migration: INTERVAL '1 day'.
-- Resolution: processor migration wins. GPS telemetry at 1-60 second
-- intervals from hundreds of devices makes 1-day chunks a better fit
-- for range queries that span hours to days. 7-day chunks would hold
-- about seven times as many rows each, giving much larger per-chunk
-- indexes and coarser chunk exclusion for short time ranges.
-- RECOMMENDATION: change spec to INTERVAL '1 day'.
--
-- 8. Index shape
-- Spec: positions_device_ts_idx ON positions (device_id, ts DESC).
-- Processor migration: positions_device_ts ON positions (device_id, ts)
-- [ascending, no DESC] + positions_ts ON positions (ts DESC).
-- Resolution: two indexes are created here matching the processor
-- migration. The spec's single (device_id, ts DESC) composite is
-- not equivalent — it does not cover the (ts DESC) range-scan pattern
-- used by global timestamp queries.
-- RECOMMENDATION: update spec to list both indexes.
CREATE TABLE IF NOT EXISTS positions (
    device_id   text             NOT NULL,
    ts          timestamptz      NOT NULL,
    ingested_at timestamptz      NOT NULL DEFAULT now(),
    latitude    double precision NOT NULL,
    longitude   double precision NOT NULL,
    altitude    real             NOT NULL,
    angle       real             NOT NULL,
    speed       real             NOT NULL,
    satellites  smallint         NOT NULL,
    priority    smallint         NOT NULL,
    codec       text             NOT NULL,
    attributes  jsonb            NOT NULL
);
-- Convert to a TimescaleDB hypertable partitioned by event time (ts).
-- chunk_time_interval = 1 day: appropriate for GPS telemetry where queries
-- span hours-to-days and devices send at 1-60 second intervals. Tunable
-- in a future migration but NOT via editing this file (checksum guard).
-- if_not_exists = TRUE: no-op if a stage environment already has the
-- hypertable. This call cannot retroactively change the chunk interval
-- of an existing hypertable; that is an accepted known divergence,
-- documented in 03-initial-migrations.md under Risks.
SELECT create_hypertable(
    'positions',
    'ts',
    if_not_exists       => TRUE,
    chunk_time_interval => INTERVAL '1 day'
);
-- Unique index enforcing the natural key for idempotent upserts.
-- The processor writer uses ON CONFLICT (device_id, ts) DO NOTHING.
-- Like the processor migration, this uses a unique index rather than a
-- PRIMARY KEY; see divergence note 6.
CREATE UNIQUE INDEX IF NOT EXISTS positions_device_ts
    ON positions (device_id, ts);
-- Descending ts index for range queries scanning the most recent positions
-- first (e.g. "latest N positions" queries and time-bounded aggregations).
CREATE INDEX IF NOT EXISTS positions_ts
    ON positions (ts DESC);
-- -------------------------------------------------------------------------
-- Assertion block: verify the table and its shape after the statements above.
-- If any assertion fails, RAISE EXCEPTION halts boot immediately. The operator
-- gets an actionable error message naming the offending column/constraint.
-- This catches the case where a stage environment has the table but with
-- subtly different column types (the CREATE TABLE IF NOT EXISTS above is a
-- no-op against an existing table — silent drift without this block).
-- -------------------------------------------------------------------------
DO $$ DECLARE
    _hypertable_count int;
    _col_type text;
BEGIN
    -- 1. Table exists
    IF NOT EXISTS (
        SELECT 1 FROM information_schema.tables
        WHERE table_schema = 'public' AND table_name = 'positions'
    ) THEN
        RAISE EXCEPTION 'positions table does not exist after migration 002';
    END IF;

    -- 2. Hypertable registered
    SELECT count(*) INTO _hypertable_count
    FROM timescaledb_information.hypertables
    WHERE hypertable_schema = 'public' AND hypertable_name = 'positions';
    IF _hypertable_count = 0 THEN
        RAISE EXCEPTION 'positions is not a hypertable — create_hypertable() may have failed silently';
    END IF;

    -- 3. Column assertions, one per critical column
    SELECT data_type INTO _col_type
    FROM information_schema.columns
    WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'device_id';
    IF _col_type IS DISTINCT FROM 'text' THEN
        RAISE EXCEPTION 'positions.device_id expected type text, found %', coalesce(_col_type, 'MISSING');
    END IF;

    SELECT data_type INTO _col_type
    FROM information_schema.columns
    WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'ts';
    IF _col_type IS DISTINCT FROM 'timestamp with time zone' THEN
        RAISE EXCEPTION 'positions.ts expected type "timestamp with time zone", found %', coalesce(_col_type, 'MISSING');
    END IF;

    SELECT data_type INTO _col_type
    FROM information_schema.columns
    WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'ingested_at';
    IF _col_type IS DISTINCT FROM 'timestamp with time zone' THEN
        RAISE EXCEPTION 'positions.ingested_at expected type "timestamp with time zone", found %', coalesce(_col_type, 'MISSING');
    END IF;

    SELECT data_type INTO _col_type
    FROM information_schema.columns
    WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'latitude';
    IF _col_type IS DISTINCT FROM 'double precision' THEN
        RAISE EXCEPTION 'positions.latitude expected type "double precision", found %', coalesce(_col_type, 'MISSING');
    END IF;

    SELECT data_type INTO _col_type
    FROM information_schema.columns
    WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'longitude';
    IF _col_type IS DISTINCT FROM 'double precision' THEN
        RAISE EXCEPTION 'positions.longitude expected type "double precision", found %', coalesce(_col_type, 'MISSING');
    END IF;

    SELECT data_type INTO _col_type
    FROM information_schema.columns
    WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'altitude';
    IF _col_type IS DISTINCT FROM 'real' THEN
        RAISE EXCEPTION 'positions.altitude expected type real, found %', coalesce(_col_type, 'MISSING');
    END IF;

    SELECT data_type INTO _col_type
    FROM information_schema.columns
    WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'angle';
    IF _col_type IS DISTINCT FROM 'real' THEN
        RAISE EXCEPTION 'positions.angle expected type real, found %', coalesce(_col_type, 'MISSING');
    END IF;

    SELECT data_type INTO _col_type
    FROM information_schema.columns
    WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'speed';
    IF _col_type IS DISTINCT FROM 'real' THEN
        RAISE EXCEPTION 'positions.speed expected type real, found %', coalesce(_col_type, 'MISSING');
    END IF;

    SELECT data_type INTO _col_type
    FROM information_schema.columns
    WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'satellites';
    IF _col_type IS DISTINCT FROM 'smallint' THEN
        RAISE EXCEPTION 'positions.satellites expected type smallint, found %', coalesce(_col_type, 'MISSING');
    END IF;

    SELECT data_type INTO _col_type
    FROM information_schema.columns
    WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'priority';
    IF _col_type IS DISTINCT FROM 'smallint' THEN
        RAISE EXCEPTION 'positions.priority expected type smallint, found %', coalesce(_col_type, 'MISSING');
    END IF;

    SELECT data_type INTO _col_type
    FROM information_schema.columns
    WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'codec';
    IF _col_type IS DISTINCT FROM 'text' THEN
        RAISE EXCEPTION 'positions.codec expected type text, found %', coalesce(_col_type, 'MISSING');
    END IF;

    SELECT data_type INTO _col_type
    FROM information_schema.columns
    WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'attributes';
    IF _col_type IS DISTINCT FROM 'jsonb' THEN
        RAISE EXCEPTION 'positions.attributes expected type jsonb, found %', coalesce(_col_type, 'MISSING');
    END IF;

    -- 4. Unique index on (device_id, ts)
    IF NOT EXISTS (
        SELECT 1 FROM pg_indexes
        WHERE schemaname = 'public'
            AND tablename = 'positions'
            AND indexname = 'positions_device_ts'
    ) THEN
        RAISE EXCEPTION 'unique index positions_device_ts is missing from positions';
    END IF;

    -- 5. Descending ts index
    IF NOT EXISTS (
        SELECT 1 FROM pg_indexes
        WHERE schemaname = 'public'
            AND tablename = 'positions'
            AND indexname = 'positions_ts'
    ) THEN
        RAISE EXCEPTION 'index positions_ts is missing from positions';
    END IF;
END $$;
@@ -0,0 +1,106 @@
-- 003_faulty_column.sql
-- Adds the operator-controlled faulty flag to the positions hypertable,
-- plus the partial index that optimises the processor's hot-path read.
--
-- Why this is a separate file from 002:
-- The faulty column is an operator-plane concern layered on top of the
-- hypertable's initial shape. Keeping it in its own migration makes the
-- evolution visible in git history — an operator or analyst reading the
-- migration log can see that positions started as a pure telemetry store
-- and the quality-control flag was added as a deliberate, dated step.
-- It also keeps migration 002 as the authoritative "what processor writes"
-- definition, uncluttered by the downstream read/flag concern.
--
-- Column semantics:
-- faulty BOOLEAN NOT NULL DEFAULT FALSE
-- The column is operator-controlled; it is NEVER set by the processor
-- writer. The processor always inserts rows with faulty = FALSE (via the
-- DEFAULT; the column is intentionally omitted from INSERT statements).
-- A track operator flips the flag through Directus when a recorded
-- position is unrealistic (jumpy GPS, impossible speed/coordinate).
-- When the flag is set, Directus emits a webhook to the
-- recompute:requests Redis stream; the processor re-evaluates any
-- entry_penalties whose window overlaps the flagged position's timestamp.
--
-- Index strategy:
-- positions_faulty_idx is a PARTIAL index covering only rows where
-- faulty = FALSE. This matches the processor's hot-path read pattern:
-- all evaluators (peak-speed, crossing detection, replay recompute) filter
-- WHERE faulty = FALSE. The partial index is smaller than a full index,
-- fits better in shared_buffers, and is never consulted for operator
-- queries that explicitly look at faulty rows — those use the broader
-- positions_device_ts index from migration 002.
ALTER TABLE positions
    ADD COLUMN IF NOT EXISTS faulty boolean NOT NULL DEFAULT FALSE;
-- Partial index: covers the processor's standard read path (faulty = FALSE).
-- Column order (device_id, ts DESC) supports per-device range queries
-- returning most-recent-first, which is the dominant access pattern for
-- peak-speed evaluation and crossing detection.
CREATE INDEX IF NOT EXISTS positions_faulty_idx
    ON positions (device_id, ts DESC)
    WHERE faulty = FALSE;
-- -------------------------------------------------------------------------
-- Assertion block: verify the column and index are present with the expected
-- shape. Catches drift where stage already has a faulty column but with a
-- different type or a missing DEFAULT (ADD COLUMN IF NOT EXISTS is a no-op
-- against an existing column regardless of its definition).
-- -------------------------------------------------------------------------
DO $$ DECLARE
    _col_type    text;
    _col_notnull boolean;
    _col_default text;
BEGIN
    -- 1. Column exists with correct type
    SELECT data_type, is_nullable = 'NO', column_default
    INTO _col_type, _col_notnull, _col_default
    FROM information_schema.columns
    WHERE table_schema = 'public'
        AND table_name = 'positions'
        AND column_name = 'faulty';
    IF _col_type IS NULL THEN
        RAISE EXCEPTION 'positions.faulty column is missing after migration 003';
    END IF;
    IF _col_type IS DISTINCT FROM 'boolean' THEN
        RAISE EXCEPTION 'positions.faulty expected type boolean, found %', _col_type;
    END IF;
    IF NOT _col_notnull THEN
        RAISE EXCEPTION 'positions.faulty must be NOT NULL but is nullable — schema drift';
    END IF;
    -- DEFAULT is stored as a normalised expression string; Postgres represents
    -- FALSE as 'false' in information_schema (lower-case, no quotes).
    IF _col_default IS DISTINCT FROM 'false' THEN
        RAISE EXCEPTION 'positions.faulty expected DEFAULT false, found %', coalesce(_col_default, 'NULL (no default)');
    END IF;

    -- 2. Partial index exists
    IF NOT EXISTS (
        SELECT 1 FROM pg_indexes
        WHERE schemaname = 'public'
            AND tablename = 'positions'
            AND indexname = 'positions_faulty_idx'
    ) THEN
        RAISE EXCEPTION 'partial index positions_faulty_idx is missing from positions';
    END IF;

    -- 3. Partial index has the expected WHERE predicate
    -- pg_indexes.indexdef includes the full CREATE INDEX statement as text;
    -- the predicate appears as "WHERE (faulty = false)".
    IF NOT EXISTS (
        SELECT 1 FROM pg_indexes
        WHERE schemaname = 'public'
            AND tablename = 'positions'
            AND indexname = 'positions_faulty_idx'
            AND indexdef ILIKE '%where (faulty = false)%'
    ) THEN
        RAISE EXCEPTION 'positions_faulty_idx exists but does not have the expected predicate "WHERE (faulty = false)" — check indexdef in pg_indexes';
    END IF;
END $$;