Files
directus/db-init/003_faulty_column.sql
T
julian 25a9731070 Task 1.3 — Initial migrations
Three SQL files under db-init/ create the schema processor writes
against. All three apply cleanly via apply-db-init.sh, are idempotent
on re-run, and end with assertion blocks that catch silent
schema drift.

001_extensions.sql — registers timescaledb on the directus database.
  PostGIS deferred to Phase 2 (per Plan A). The timescaledb-ha image
  pre-creates the extension at DB init, so the IF NOT EXISTS guard
  fires as a NOTICE — expected and harmless.

002_positions_hypertable.sql — positions hypertable, exact
  column-by-column match against processor/src/db/migrations/0001_positions.sql.

  Cross-checking against processor surfaced 8 divergences from the
  original task spec; processor wins in every case (it is the writer
  and is in production). The corrections:

    - added ingested_at timestamptz NOT NULL DEFAULT now()
    - added codec text NOT NULL
    - altitude/angle/speed: real NOT NULL (not DOUBLE PRECISION nullable)
    - satellites/priority: NOT NULL
    - removed attributes DEFAULT '{}'::jsonb (processor always writes)
    - replaced PRIMARY KEY with UNIQUE INDEX positions_device_ts
      (idiomatic for TimescaleDB hypertables)
    - chunk interval 1 day, not 7 days
    - two indexes (positions_device_ts + positions_ts), not one composite

  Without these corrections every processor INSERT would have failed
  with NOT NULL violations. Spec deliverables section updated to
  reflect the correct shape so future readers see the right schema.

003_faulty_column.sql — adds the operator-controlled faulty boolean
  flag plus the partial index positions_faulty_idx ON (device_id,
  ts DESC) WHERE faulty = FALSE. The column is set only via Directus
  admin (Phase 4 permissions); processor's writer never touches it.
  The partial index optimises the hot-path read pattern (every
  processor evaluator filters faulty = FALSE); operator queries that
  look at faulty rows specifically use the broader positions_device_ts
  index from 002.

Live-verified 2026-05-01:
  - First apply: 3 applied, 0 skipped, exit 0.
  - Re-run: 0 applied, 3 skipped, exit 0.
  - All 13 columns present with correct types/nullability/defaults.
  - Hypertable registered with 1-day chunk interval.
  - Three expected indexes present.

Non-blocking observation: TimescaleDB's create_hypertable()
auto-created a fourth index (positions_ts_idx) duplicating our
explicit positions_ts. Processor's migration has the same redundancy
so stage already lives with this. Cleanup path documented in the
task spec for Phase 3 hardening (create_default_indexes => FALSE
in the create_hypertable call).

ROADMAP marks 1.3 done; 1.4 next.
2026-05-01 22:52:06 +02:00

107 lines
4.7 KiB
SQL

-- 003_faulty_column.sql
-- Adds the operator-controlled faulty flag to the positions hypertable,
-- plus the partial index that optimises the processor's hot-path read.
--
-- Why this is a separate file from 002:
-- The faulty column is an operator-plane concern layered on top of the
-- hypertable's initial shape. Keeping it in its own migration makes the
-- evolution visible in git history — an operator or analyst reading the
-- migration log can see that positions started as a pure telemetry store
-- and the quality-control flag was added as a deliberate, dated step.
-- It also keeps migration 002 as the authoritative "what processor writes"
-- definition, uncluttered by the downstream read/flag concern.
--
-- Column semantics:
-- faulty BOOLEAN NOT NULL DEFAULT FALSE
-- The column is operator-controlled; it is NEVER set by the [[processor]]
-- writer. The processor always inserts rows with faulty = FALSE (via the
-- DEFAULT — the column is intentionally omitted from INSERT statements).
-- A track operator flips the flag through [[directus]] when a recorded
-- position is unrealistic (jumpy GPS, impossible speed/coordinate).
-- When the flag is set, Directus emits a webhook to the
-- recompute:requests Redis stream; the processor re-evaluates any
-- entry_penalties whose window overlaps the flagged position's timestamp.
--
-- Index strategy:
-- positions_faulty_idx is a PARTIAL index covering only rows where
-- faulty = FALSE. This matches the processor's hot-path read pattern:
-- all evaluators (peak-speed, crossing detection, replay recompute) filter
-- WHERE faulty = FALSE. The partial index is smaller than a full index,
-- fits better in shared_buffers, and is never consulted for operator
-- queries that explicitly look at faulty rows — those use the broader
-- positions_device_ts index from migration 002.
ALTER TABLE positions
ADD COLUMN IF NOT EXISTS faulty boolean NOT NULL DEFAULT FALSE;
-- Partial index: covers the processor's standard read path (faulty = FALSE).
-- Column order (device_id, ts DESC) supports per-device range queries
-- returning most-recent-first, which is the dominant access pattern for
-- peak-speed evaluation and crossing detection.
CREATE INDEX IF NOT EXISTS positions_faulty_idx
ON positions (device_id, ts DESC)
WHERE faulty = FALSE;
-- -------------------------------------------------------------------------
-- Assertion block: verify the column and index are present with the expected
-- shape. Catches drift where stage already has a faulty column but with a
-- different type or a missing DEFAULT (ADD COLUMN IF NOT EXISTS is a no-op
-- against an existing column regardless of its definition).
-- -------------------------------------------------------------------------
DO $$ DECLARE
_col_type text;
_col_notnull boolean;
_col_default text;
BEGIN
-- 1. Column exists with correct type
SELECT data_type, is_nullable = 'NO', column_default
INTO _col_type, _col_notnull, _col_default
FROM information_schema.columns
WHERE table_schema = 'public'
AND table_name = 'positions'
AND column_name = 'faulty';
IF _col_type IS NULL THEN
RAISE EXCEPTION 'positions.faulty column is missing after migration 003';
END IF;
IF _col_type IS DISTINCT FROM 'boolean' THEN
RAISE EXCEPTION 'positions.faulty expected type boolean, found %', _col_type;
END IF;
IF NOT _col_notnull THEN
RAISE EXCEPTION 'positions.faulty must be NOT NULL but is nullable — schema drift';
END IF;
-- DEFAULT is stored as a normalised expression string; Postgres represents
-- FALSE as 'false' in information_schema (lower-case, no quotes).
IF _col_default IS DISTINCT FROM 'false' THEN
RAISE EXCEPTION 'positions.faulty expected DEFAULT false, found %', coalesce(_col_default, 'NULL (no default)');
END IF;
-- 2. Partial index exists
IF NOT EXISTS (
SELECT 1 FROM pg_indexes
WHERE schemaname = 'public'
AND tablename = 'positions'
AND indexname = 'positions_faulty_idx'
) THEN
RAISE EXCEPTION 'partial index positions_faulty_idx is missing from positions';
END IF;
-- 3. Partial index has the expected WHERE predicate
-- pg_indexes.indexdef includes the full CREATE INDEX statement as text;
-- the predicate appears as "WHERE (faulty = false)".
IF NOT EXISTS (
SELECT 1 FROM pg_indexes
WHERE schemaname = 'public'
AND tablename = 'positions'
AND indexname = 'positions_faulty_idx'
AND indexdef ILIKE '%where (faulty = false)%'
) THEN
RAISE EXCEPTION 'positions_faulty_idx exists but does not have the expected predicate "WHERE (faulty = false)" — check indexdef in pg_indexes';
END IF;
END $$;