Files
directus/db-init/002_positions_hypertable.sql
julian 25a9731070 Task 1.3 — Initial migrations
Three SQL files under db-init/ create the schema processor writes
against. All three apply cleanly via apply-db-init.sh, are idempotent
on re-run, and end with assertion blocks that catch silent
schema drift.

001_extensions.sql — registers timescaledb on the directus database.
  PostGIS deferred to Phase 2 (per Plan A). The timescaledb-ha image
  pre-creates the extension at DB init, so the IF NOT EXISTS guard
  fires as a NOTICE — expected and harmless.

002_positions_hypertable.sql — positions hypertable, exact
  column-by-column match against processor/src/db/migrations/0001_positions.sql.

  Cross-checking against processor surfaced 8 divergences from the
  original task spec; processor wins in every case (it is the writer
  and is in production). The corrections:

    - added ingested_at timestamptz NOT NULL DEFAULT now()
    - added codec text NOT NULL
    - altitude/angle/speed: real NOT NULL (not DOUBLE PRECISION nullable)
    - satellites/priority: NOT NULL
    - removed attributes DEFAULT '{}'::jsonb (processor always writes)
    - replaced PRIMARY KEY with UNIQUE INDEX positions_device_ts
      (idiomatic for TimescaleDB hypertables)
    - chunk interval 1 day, not 7 days
    - two indexes (positions_device_ts + positions_ts), not one composite

  Without these corrections every processor INSERT would have failed
  with NOT NULL violations. Spec deliverables section updated to
  reflect the correct shape so future readers see the right schema.

003_faulty_column.sql — adds the operator-controlled faulty boolean
  flag plus the partial index positions_faulty_idx ON (device_id,
  ts DESC) WHERE faulty = FALSE. The column is set only via Directus
  admin (Phase 4 permissions); processor's writer never touches it.
  The partial index optimises the hot-path read pattern (every
  processor evaluator filters faulty = FALSE); operator queries that
  look at faulty rows specifically use the broader positions_device_ts
  index from 002.

Live-verified 2026-05-01:
  - First apply: 3 applied, 0 skipped, exit 0.
  - Re-run: 0 applied, 3 skipped, exit 0.
  - All 13 columns present with correct types/nullability/defaults.
  - Hypertable registered with 1-day chunk interval.
  - Three expected indexes present.

Non-blocking observation: TimescaleDB's create_hypertable()
auto-created a fourth index (positions_ts_idx) duplicating our
explicit positions_ts. Processor's migration has the same redundancy
so stage already lives with this. Cleanup path documented in the
task spec for Phase 3 hardening (create_default_indexes => FALSE
in the create_hypertable call).

ROADMAP marks 1.3 done; 1.4 next.
2026-05-01 22:52:06 +02:00

274 lines
12 KiB
SQL

-- 002_positions_hypertable.sql
-- Creates the positions hypertable. This is the canonical positions schema
-- as of Phase 1. Future shape changes go through NEW numbered migration files;
-- never edit this file after it has been applied (the runner's checksum guard
-- will detect the edit and halt boot with exit code 2).
--
-- Schema authority: after this migration lands, directus/db-init/ is the
-- canonical definition of this table for operational purposes. The processor
-- service is the sole writer; Directus reads positions and manages the faulty
-- flag (added in migration 003). Do NOT alter this table from the Directus
-- admin UI — hypertable DDL must go through db-init migrations.
--
-- Cross-checked against:
-- processor/src/db/migrations/0001_positions.sql
-- Cross-check date: 2026-05-01
--
-- The processor migration is the ground truth for column names, types, and
-- nullability. Discrepancies between that file and the task spec
-- (03-initial-migrations.md) are documented below. The processor file wins
-- in all cases.
--
-- DIVERGENCES FROM 03-initial-migrations.md (task spec) — read before review:
--
-- 1. ingested_at (timestamptz NOT NULL DEFAULT now())
-- Spec: column not listed.
-- Processor migration: present, NOT NULL, DEFAULT now().
-- Resolution: included here to match what processor writes. Omitting it
-- would cause NOT NULL violations on every processor insert.
-- RECOMMENDATION: add ingested_at to 03-initial-migrations.md deliverables.
--
-- 2. altitude / angle / speed — type and nullability
-- Spec: DOUBLE PRECISION, nullable.
-- Processor migration: real NOT NULL (float32, not float64; NOT NULL).
-- Resolution: real NOT NULL matches the processor writer. Using
-- DOUBLE PRECISION would not cause failures (Postgres widens silently on
-- insert) but would waste storage for no gain. Nullable columns would
-- allow NULLs processor never writes and complicate query plans.
-- RECOMMENDATION: update spec to real NOT NULL for these three columns.
--
-- 3. satellites / priority — nullability
-- Spec: SMALLINT (nullable).
-- Processor migration: smallint NOT NULL.
-- Resolution: NOT NULL matches what the processor always writes.
-- RECOMMENDATION: update spec to NOT NULL for these columns.
--
-- 4. codec (text NOT NULL)
-- Spec: column not listed.
-- Processor migration: present, NOT NULL.
-- Resolution: included here. Omitting it causes NOT NULL failures on
-- processor inserts (codec is always populated — e.g. "codec8",
-- "codec8ext", "codec16").
-- RECOMMENDATION: add codec to 03-initial-migrations.md deliverables.
--
-- 5. attributes DEFAULT
-- Spec: JSONB NOT NULL DEFAULT '{}'::jsonb.
-- Processor migration: jsonb NOT NULL (no DEFAULT).
-- Resolution: the task brief says "spec and processor disagree → processor
-- wins." Processor always writes attributes (never relies on a default),
-- so the column is declared NOT NULL with no DEFAULT here. A DEFAULT is
-- harmless but misleading; the processor writer never omits this field.
-- RECOMMENDATION: remove the default from the spec or leave it and
-- acknowledge it is unused by the writer.
--
-- 6. PRIMARY KEY (device_id, ts) vs. UNIQUE INDEX
-- Spec: PRIMARY KEY (device_id, ts).
-- Processor migration: NO primary key; uses a separate
-- CREATE UNIQUE INDEX IF NOT EXISTS positions_device_ts ON positions (device_id, ts).
-- Resolution: TimescaleDB strongly discourages PRIMARY KEY on the
-- partition column (ts) — it requires ts to be part of every unique
-- constraint, which is true here, but the physical enforcement in
-- TimescaleDB is via unique index per chunk, not a table-level PK
-- constraint. The processor migration's approach (unique index, no PK)
-- is idiomatic for hypertables. This migration follows the processor:
-- no PRIMARY KEY, unique index instead.
-- RECOMMENDATION: change spec to use UNIQUE INDEX, not PRIMARY KEY.
--
-- 7. Chunk interval: INTERVAL '1 day' vs. INTERVAL '7 days'
-- Spec: INTERVAL '7 days'.
-- Processor migration: INTERVAL '1 day'.
-- Resolution: processor migration wins. GPS telemetry at 1-60 second
-- intervals from hundreds of devices makes 1-day chunks a better fit
-- for range queries that span hours-to-days. 7-day chunks would create
-- much larger per-chunk indexes and slower chunk exclusion.
-- RECOMMENDATION: change spec to INTERVAL '1 day'.
--
-- 8. Index shape
-- Spec: positions_device_ts_idx ON positions (device_id, ts DESC).
-- Processor migration: positions_device_ts ON positions (device_id, ts)
-- [ascending, no DESC] + positions_ts ON positions (ts DESC).
-- Resolution: two indexes are created here matching the processor
-- migration. The spec's single (device_id, ts DESC) composite is
-- not equivalent — it does not cover the (ts DESC) range-scan pattern
-- used by global timestamp queries.
-- RECOMMENDATION: update spec to list both indexes.
CREATE TABLE IF NOT EXISTS positions (
device_id text NOT NULL,
ts timestamptz NOT NULL,
ingested_at timestamptz NOT NULL DEFAULT now(),
latitude double precision NOT NULL,
longitude double precision NOT NULL,
altitude real NOT NULL,
angle real NOT NULL,
speed real NOT NULL,
satellites smallint NOT NULL,
priority smallint NOT NULL,
codec text NOT NULL,
attributes jsonb NOT NULL
);
-- Convert to a TimescaleDB hypertable partitioned by event time (ts).
-- chunk_time_interval = 1 day: appropriate for GPS telemetry where queries
-- span hours-to-days and devices send at 1-60 second intervals. Tunable
-- in a future migration but NOT via editing this file (checksum guard).
-- if_not_exists = TRUE: no-op if a stage environment already has the
-- hypertable; the chunk interval cannot be retroactively changed via this
-- call — that is an accepted known divergence documented in 03-initial-
-- migrations.md under Risks.
SELECT create_hypertable(
'positions',
'ts',
if_not_exists => TRUE,
chunk_time_interval => INTERVAL '1 day'
);
-- Unique index enforcing the natural key for idempotent upserts.
-- The processor writer uses ON CONFLICT (device_id, ts) DO NOTHING.
-- TimescaleDB's idiomatic pattern is a unique index (not a PRIMARY KEY
-- constraint) on the hypertable partition column — see divergence note 6.
CREATE UNIQUE INDEX IF NOT EXISTS positions_device_ts
ON positions (device_id, ts);
-- Descending ts index for range queries scanning the most recent positions
-- first (e.g. "latest N positions" queries and time-bounded aggregations).
CREATE INDEX IF NOT EXISTS positions_ts
ON positions (ts DESC);
-- -------------------------------------------------------------------------
-- Assertion block: verify the table and its shape after the statements above.
-- If any assertion fails, RAISE EXCEPTION halts boot immediately. The operator
-- gets an actionable error message naming the offending column/constraint.
-- This catches the case where a stage environment has the table but with
-- subtly different column types (the CREATE TABLE IF NOT EXISTS above is a
-- no-op against an existing table — silent drift without this block).
-- -------------------------------------------------------------------------
DO $$ DECLARE
_hypertable_count int;
_col_type text;
BEGIN
-- 1. Table exists
IF NOT EXISTS (
SELECT 1 FROM information_schema.tables
WHERE table_schema = 'public' AND table_name = 'positions'
) THEN
RAISE EXCEPTION 'positions table does not exist after migration 002';
END IF;
-- 2. Hypertable registered
SELECT count(*) INTO _hypertable_count
FROM timescaledb_information.hypertables
WHERE hypertable_schema = 'public' AND hypertable_name = 'positions';
IF _hypertable_count = 0 THEN
RAISE EXCEPTION 'positions is not a hypertable — create_hypertable() may have failed silently';
END IF;
-- 3. Column assertions — one per critical column
SELECT data_type INTO _col_type
FROM information_schema.columns
WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'device_id';
IF _col_type IS DISTINCT FROM 'text' THEN
RAISE EXCEPTION 'positions.device_id expected type text, found %', coalesce(_col_type, 'MISSING');
END IF;
SELECT data_type INTO _col_type
FROM information_schema.columns
WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'ts';
IF _col_type IS DISTINCT FROM 'timestamp with time zone' THEN
RAISE EXCEPTION 'positions.ts expected type "timestamp with time zone", found %', coalesce(_col_type, 'MISSING');
END IF;
SELECT data_type INTO _col_type
FROM information_schema.columns
WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'ingested_at';
IF _col_type IS DISTINCT FROM 'timestamp with time zone' THEN
RAISE EXCEPTION 'positions.ingested_at expected type "timestamp with time zone", found %', coalesce(_col_type, 'MISSING');
END IF;
SELECT data_type INTO _col_type
FROM information_schema.columns
WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'latitude';
IF _col_type IS DISTINCT FROM 'double precision' THEN
RAISE EXCEPTION 'positions.latitude expected type "double precision", found %', coalesce(_col_type, 'MISSING');
END IF;
SELECT data_type INTO _col_type
FROM information_schema.columns
WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'longitude';
IF _col_type IS DISTINCT FROM 'double precision' THEN
RAISE EXCEPTION 'positions.longitude expected type "double precision", found %', coalesce(_col_type, 'MISSING');
END IF;
SELECT data_type INTO _col_type
FROM information_schema.columns
WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'altitude';
IF _col_type IS DISTINCT FROM 'real' THEN
RAISE EXCEPTION 'positions.altitude expected type real, found %', coalesce(_col_type, 'MISSING');
END IF;
SELECT data_type INTO _col_type
FROM information_schema.columns
WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'angle';
IF _col_type IS DISTINCT FROM 'real' THEN
RAISE EXCEPTION 'positions.angle expected type real, found %', coalesce(_col_type, 'MISSING');
END IF;
SELECT data_type INTO _col_type
FROM information_schema.columns
WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'speed';
IF _col_type IS DISTINCT FROM 'real' THEN
RAISE EXCEPTION 'positions.speed expected type real, found %', coalesce(_col_type, 'MISSING');
END IF;
SELECT data_type INTO _col_type
FROM information_schema.columns
WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'satellites';
IF _col_type IS DISTINCT FROM 'smallint' THEN
RAISE EXCEPTION 'positions.satellites expected type smallint, found %', coalesce(_col_type, 'MISSING');
END IF;
SELECT data_type INTO _col_type
FROM information_schema.columns
WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'priority';
IF _col_type IS DISTINCT FROM 'smallint' THEN
RAISE EXCEPTION 'positions.priority expected type smallint, found %', coalesce(_col_type, 'MISSING');
END IF;
SELECT data_type INTO _col_type
FROM information_schema.columns
WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'codec';
IF _col_type IS DISTINCT FROM 'text' THEN
RAISE EXCEPTION 'positions.codec expected type text, found %', coalesce(_col_type, 'MISSING');
END IF;
SELECT data_type INTO _col_type
FROM information_schema.columns
WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'attributes';
IF _col_type IS DISTINCT FROM 'jsonb' THEN
RAISE EXCEPTION 'positions.attributes expected type jsonb, found %', coalesce(_col_type, 'MISSING');
END IF;
-- 4. Unique index on (device_id, ts)
IF NOT EXISTS (
SELECT 1 FROM pg_indexes
WHERE schemaname = 'public'
AND tablename = 'positions'
AND indexname = 'positions_device_ts'
) THEN
RAISE EXCEPTION 'unique index positions_device_ts is missing from positions';
END IF;
-- 5. Descending ts index
IF NOT EXISTS (
SELECT 1 FROM pg_indexes
WHERE schemaname = 'public'
AND tablename = 'positions'
AND indexname = 'positions_ts'
) THEN
RAISE EXCEPTION 'index positions_ts is missing from positions';
END IF;
END $$;