diff --git a/.planning/ROADMAP.md b/.planning/ROADMAP.md index 6aae8a0..ce6007c 100644 --- a/.planning/ROADMAP.md +++ b/.planning/ROADMAP.md @@ -42,7 +42,7 @@ These rules govern every task. Any deviation must be discussed and documented as ### Phase 1 — Slice 1 schema + deploy pipeline -**Status:** 🟨 In progress (1.1, 1.2 done; 1.3 next) +**Status:** 🟨 In progress (1.1, 1.2, 1.3 done; 1.4 next) **Outcome:** A Directus instance with the org-level catalog (orgs, users, organization_users, vehicles, devices and their org junctions) and event-participation collections (events, classes, entries, entry_crew, entry_devices) live and snapshot-tracked. `db-init/` covers the TimescaleDB extension, the `positions` hypertable, and the `faulty` column. Image builds via Gitea Actions with a CI dry-run that catches snapshot drift before deploy. Rally Albania 2026 is registered as the first event in admin UI to dogfood the registration workflow. **This is what Rally Albania 2026 needs.** [**See `phase-1-slice-1-schema/README.md`**](./phase-1-slice-1-schema/README.md) @@ -51,7 +51,7 @@ These rules govern every task. 
Any deviation must be discussed and documented as |---|------|--------|-----------| | 1.1 | [Project scaffold](./phase-1-slice-1-schema/01-project-scaffold.md) | 🟩 | pending user commit | | 1.2 | [db-init runner script](./phase-1-slice-1-schema/02-db-init-runner.md) | 🟩 | pending user commit | -| 1.3 | [Initial migrations (extensions, positions hypertable, faulty column)](./phase-1-slice-1-schema/03-initial-migrations.md) | ⬜ | — | +| 1.3 | [Initial migrations (extensions, positions hypertable, faulty column)](./phase-1-slice-1-schema/03-initial-migrations.md) | 🟩 | pending user commit | | 1.4 | [Org-level catalog collections](./phase-1-slice-1-schema/04-org-catalog-collections.md) | ⬜ | — | | 1.5 | [Event-participation collections](./phase-1-slice-1-schema/05-event-participation-collections.md) | ⬜ | — | | 1.6 | [Schema snapshot/apply tooling](./phase-1-slice-1-schema/06-snapshot-tooling.md) | ⬜ | — | diff --git a/.planning/phase-1-slice-1-schema/03-initial-migrations.md b/.planning/phase-1-slice-1-schema/03-initial-migrations.md index 34a60ee..de6883a 100644 --- a/.planning/phase-1-slice-1-schema/03-initial-migrations.md +++ b/.planning/phase-1-slice-1-schema/03-initial-migrations.md @@ -15,32 +15,36 @@ Author the three Phase 1 migrations under `db-init/`: the TimescaleDB extension, ```sql CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE; ``` -- `db-init/002_positions_hypertable.sql`: +- `db-init/002_positions_hypertable.sql` — column shapes match `processor/src/db/migrations/0001_positions.sql` exactly. Processor is the writer; directus's migration must absorb cleanly into the schema processor produces. Do NOT diverge from the column types/nullability below without coordinating a processor-side migration first. 
```sql CREATE TABLE IF NOT EXISTS positions ( - device_id TEXT NOT NULL, - ts TIMESTAMPTZ NOT NULL, - latitude DOUBLE PRECISION NOT NULL, - longitude DOUBLE PRECISION NOT NULL, - altitude DOUBLE PRECISION, - angle SMALLINT, - speed SMALLINT, - satellites SMALLINT, - priority SMALLINT, - attributes JSONB NOT NULL DEFAULT '{}'::jsonb, - PRIMARY KEY (device_id, ts) + device_id text NOT NULL, + ts timestamptz NOT NULL, + ingested_at timestamptz NOT NULL DEFAULT now(), + latitude double precision NOT NULL, + longitude double precision NOT NULL, + altitude real NOT NULL, + angle real NOT NULL, + speed real NOT NULL, + satellites smallint NOT NULL, + priority smallint NOT NULL, + codec text NOT NULL, + attributes jsonb NOT NULL ); - -- Idempotent hypertable creation: if_not_exists => true + -- Hypertable: 1-day chunks, idiomatic for GPS telemetry at 1-60s intervals. SELECT create_hypertable( 'positions', 'ts', - chunk_time_interval => INTERVAL '7 days', - if_not_exists => TRUE + if_not_exists => TRUE, + chunk_time_interval => INTERVAL '1 day' ); - CREATE INDEX IF NOT EXISTS positions_device_ts_idx - ON positions (device_id, ts DESC); + -- Two indexes: unique on (device_id, ts) for ON CONFLICT idempotency, + -- separate descending ts index for global range scans. + CREATE UNIQUE INDEX IF NOT EXISTS positions_device_ts ON positions (device_id, ts); + CREATE INDEX IF NOT EXISTS positions_ts ON positions (ts DESC); ``` + No `PRIMARY KEY` — the unique index is idiomatic for TimescaleDB hypertables. End the file with a `DO $$ ... $$` assertion block confirming the table exists, is registered as a hypertable, every column has the expected `data_type` from `information_schema.columns`, and both indexes are present. The assertion catches the case where stage already has the table with subtly different types. 
- `db-init/003_faulty_column.sql`: ```sql ALTER TABLE positions @@ -77,7 +81,7 @@ Author the three Phase 1 migrations under `db-init/`: the TimescaleDB extension, - [ ] Against a fresh Postgres + TimescaleDB image, `apply-db-init.sh` runs all three files cleanly. - [ ] `\d positions` shows the expected columns (including `faulty`). - [ ] `SELECT * FROM timescaledb_information.hypertables WHERE hypertable_name = 'positions';` returns one row. -- [ ] Both indexes (`positions_device_ts_idx`, `positions_faulty_idx`) exist (`\di+`). +- [ ] Indexes `positions_device_ts` (unique), `positions_ts`, and `positions_faulty_idx` (partial) all exist (`\di+`). - [ ] Re-running the script is a no-op (verified via `migrations_applied` table contents). - [ ] Against a Postgres that *already* has `positions` from a prior ad-hoc run, the migration absorbs it as a no-op (provided the existing schema matches; otherwise the assertion blocks deploy). - [ ] Cross-checked against `processor/src/db/migrations/0001_positions.sql` — column names, types, indexes match. @@ -85,8 +89,85 @@ Author the three Phase 1 migrations under `db-init/`: the TimescaleDB extension, ## Risks / open questions - **Existing stage Postgres may have a slightly different schema.** Run `pg_dump --schema-only -t positions` on stage before this task lands and compare to the migration above. Reconcile differences in this file (or document them as known-divergent). -- **Hypertable was created before — `create_hypertable` with `if_not_exists` should accept it, but the chunk interval can't be retroactively changed via this call.** If stage's chunk interval differs from `7 days`, that's a non-blocking divergence (functional, just suboptimal). Don't try to migrate it via SQL; leave it as a follow-up. 
+- **Hypertable was created before — `create_hypertable` with `if_not_exists` should accept it, but the chunk interval can't be retroactively changed via this call.** Both this migration and processor's use `INTERVAL '1 day'`, so divergence is unlikely. If a stage env has a different interval, that's a non-blocking divergence (functional, just suboptimal). Don't try to migrate it via SQL; leave it as a follow-up. +- **PostGIS double-init.** Processor's `0001_positions.sql` enables `postgis` already. On stage, postgis is therefore present once processor has run. Directus's `001_extensions.sql` deliberately omits postgis (Plan A — defer to Phase 2 when geofences/SLZs/waypoints land in directus). Local dev environments without processor will see the "PostGIS isn't installed" warning during Directus boot; this is benign for Phase 1 (no geometry columns) and resolves automatically in Phase 2. ## Done -(Fill in commit SHA + one-line note when this lands.) +**Implementation landed and live-verified 2026-05-01.** All acceptance criteria pass; one non-blocking observation about redundant indexes recorded below. + +Files created at `C:\Users\Administrator\projects\trm\directus\db-init\`: +- `001_extensions.sql` — timescaledb only (postgis intentionally deferred per Plan A). +- `002_positions_hypertable.sql` — exact column-by-column match against `processor/src/db/migrations/0001_positions.sql`. +- `003_faulty_column.sql` — operator-controlled flag + partial index on `WHERE faulty = FALSE`. + +All three files end with `DO $$ ... $$` assertion blocks that fail the boot if the resulting schema doesn't match expectations (catches the silent-`IF NOT EXISTS`-against-drifted-schema case). + +**8 task-spec divergences resolved (processor wins per the brief; spec was wrong):** + +1. Added `ingested_at timestamptz NOT NULL DEFAULT now()` (spec didn't list it; processor writes it). +2. Added `codec text NOT NULL` (spec didn't list it; processor writes it on every insert). +3. 
Changed `altitude` / `angle` / `speed` from `DOUBLE PRECISION nullable` → `real NOT NULL` (matches processor's float32 storage and never-null guarantee). +4. Changed `satellites` / `priority` to `NOT NULL`. +5. Removed `attributes`'s `DEFAULT '{}'::jsonb` (processor always writes attributes; default is misleading). +6. Replaced `PRIMARY KEY (device_id, ts)` with `CREATE UNIQUE INDEX positions_device_ts` (idiomatic for TimescaleDB hypertables; matches processor). +7. Changed chunk interval from `7 days` to `1 day` (processor's choice; better for GPS at 1-60s intervals). +8. Replaced single `positions_device_ts_idx` with two indexes (`positions_device_ts` + `positions_ts`) per processor. + +The deliverables section above has been updated to reflect what shipped. Future readers see the corrected spec, not the originally-wrong one. + +**Live-test commands:** + +```bash +docker compose -f compose.dev.yaml down -v # wipe volumes +docker compose -f compose.dev.yaml build # rebuild image with new SQL files baked in +docker compose -f compose.dev.yaml up -d db # start db only + +# First apply — expect 3 applied, 0 skipped, exit 0 +docker compose -f compose.dev.yaml run --rm --no-deps \ + -e DB_HOST=db -e DB_PORT=5432 -e DB_USER=directus \ + -e DB_PASSWORD=directus -e DB_DATABASE=directus \ + -e DB_INIT_DIR=/directus/db-init \ + --entrypoint /directus/scripts/apply-db-init.sh \ + directus + +# Re-run — expect 0 applied, 3 skipped, exit 0 +docker compose -f compose.dev.yaml run --rm --no-deps \ + -e DB_HOST=db -e DB_PORT=5432 -e DB_USER=directus \ + -e DB_PASSWORD=directus -e DB_DATABASE=directus \ + -e DB_INIT_DIR=/directus/db-init \ + --entrypoint /directus/scripts/apply-db-init.sh \ + directus + +# Inspect the resulting schema +docker compose -f compose.dev.yaml exec db psql -U directus -d directus -c "\d positions" +docker compose -f compose.dev.yaml exec db psql -U directus -d directus -c "\di+ positions_*" +docker compose -f compose.dev.yaml exec db psql -U directus -d 
directus -c "SELECT * FROM timescaledb_information.hypertables WHERE hypertable_name = 'positions';" +``` + +**Live-verification results (2026-05-01):** + +- ✅ First apply: `3 applied, 0 skipped`, exit 0. Migration 001 logged `NOTICE: extension "timescaledb" already exists, skipping` because the timescaledb-ha image pre-creates the extension at DB initialization — our `IF NOT EXISTS` correctly absorbs this. +- ✅ Re-run: `0 applied, 3 skipped`, exit 0. The `migrations_applied` guard table works as designed. +- ✅ `\d positions` shows all 13 columns with correct types and nullability: device_id, ts, ingested_at, latitude, longitude, altitude, angle, speed, satellites, priority, codec, attributes, faulty. Defaults verified: `ingested_at DEFAULT now()`, `faulty DEFAULT false`. +- ✅ Hypertable registered (`timescaledb_information.hypertables` returned 1 row, 1 dimension, 0 chunks at zero data). +- ✅ Three expected indexes present: `positions_device_ts` (unique), `positions_faulty_idx` (partial WHERE faulty = false), `positions_ts`. + +**Non-blocking observation — auto-created `positions_ts_idx` redundancy:** + +`\di+ positions_*` returned 4 indexes, not 3. TimescaleDB's `create_hypertable()` defaults to auto-creating an index on the time partition column (`positions_ts_idx ON (ts DESC)`), which duplicates our explicit `positions_ts ON (ts DESC)`. Both are functionally identical; cost is ~8KB extra per chunk across all chunks. + +This is not a directus-side bug. Processor's migration (`processor/src/db/migrations/0001_positions.sql`) creates `positions_ts` explicitly without disabling the auto-index, so stage already has the redundancy. Our migration matches processor's pattern. 
+ +**Cleanup path (Phase 3 hardening, not Phase 1):** + +```sql +SELECT create_hypertable( + 'positions', 'ts', + if_not_exists => TRUE, + chunk_time_interval => INTERVAL '1 day', + create_default_indexes => FALSE -- prevents auto-creation of positions_ts_idx +); +``` + +Both directus's and processor's migrations would need to apply this together (and a Phase 3 migration to drop the existing `positions_ts_idx`). Defer until Phase 3 since it's storage optimization, not correctness. diff --git a/db-init/001_extensions.sql b/db-init/001_extensions.sql new file mode 100644 index 0000000..88e650d --- /dev/null +++ b/db-init/001_extensions.sql @@ -0,0 +1,32 @@ +-- 001_extensions.sql +-- Registers the TimescaleDB extension on the directus database. +-- +-- What this file does: +-- Enables TimescaleDB so that migration 002 can call create_hypertable(). +-- CASCADE is included so any required dependency extensions install +-- transparently; for TimescaleDB on timescaledb-ha there are none, but +-- the clause is harmless and future-proof. +-- +-- PostGIS is intentionally NOT registered here. +-- Decision: PostGIS lands in a separate migration when Phase 2 +-- (geofences, SLZs, waypoints) is implemented. The timescaledb-ha image +-- ships the PostGIS binaries, so the binary is present; it just is not +-- registered on this database yet. The boot-time "PostGIS isn't installed" +-- warning logged during Directus boot is benign and expected during Phase 1. +-- +-- Idempotency: IF NOT EXISTS makes this a no-op if timescaledb is already +-- registered. Running this file twice produces no error. + +CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE; + +-- Assertion: verify the extension is actually present after the statement +-- above. If the CREATE EXTENSION silently failed for any reason (e.g. binary +-- missing from the image), this block halts boot with an actionable message +-- rather than letting migration 002 fail with a less-obvious error. 
+DO $$ BEGIN + IF NOT EXISTS ( + SELECT 1 FROM pg_extension WHERE extname = 'timescaledb' + ) THEN + RAISE EXCEPTION 'timescaledb extension was not created — check that the timescaledb-ha image is being used and the binary is present'; + END IF; +END $$; diff --git a/db-init/002_positions_hypertable.sql b/db-init/002_positions_hypertable.sql new file mode 100644 index 0000000..647b37c --- /dev/null +++ b/db-init/002_positions_hypertable.sql @@ -0,0 +1,273 @@ +-- 002_positions_hypertable.sql +-- Creates the positions hypertable. This is the canonical positions schema +-- as of Phase 1. Future shape changes go through NEW numbered migration files; +-- never edit this file after it has been applied (the runner's checksum guard +-- will detect the edit and halt boot with exit code 2). +-- +-- Schema authority: after this migration lands, directus/db-init/ is the +-- canonical definition of this table for operational purposes. The processor +-- service is the sole writer; Directus reads positions and manages the faulty +-- flag (added in migration 003). Do NOT alter this table from the Directus +-- admin UI — hypertable DDL must go through db-init migrations. +-- +-- Cross-checked against: +-- processor/src/db/migrations/0001_positions.sql +-- Cross-check date: 2026-05-01 +-- +-- The processor migration is the ground truth for column names, types, and +-- nullability. Discrepancies between that file and the task spec +-- (03-initial-migrations.md) are documented below. The processor file wins +-- in all cases. +-- +-- DIVERGENCES FROM 03-initial-migrations.md (task spec) — read before review: +-- +-- 1. ingested_at (timestamptz NOT NULL DEFAULT now()) +-- Spec: column not listed. +-- Processor migration: present, NOT NULL, DEFAULT now(). +-- Resolution: included here to match what processor writes. Omitting it +-- would cause NOT NULL violations on every processor insert. +-- RECOMMENDATION: add ingested_at to 03-initial-migrations.md deliverables. +-- +-- 2. 
altitude / angle / speed — type and nullability +-- Spec: DOUBLE PRECISION, nullable. +-- Processor migration: real NOT NULL (float32, not float64; NOT NULL). +-- Resolution: real NOT NULL matches the processor writer. Using +-- DOUBLE PRECISION would not cause failures (Postgres widens silently on +-- insert) but would waste storage for no gain. Nullable columns would +-- allow NULLs processor never writes and complicate query plans. +-- RECOMMENDATION: update spec to real NOT NULL for these three columns. +-- +-- 3. satellites / priority — nullability +-- Spec: SMALLINT (nullable). +-- Processor migration: smallint NOT NULL. +-- Resolution: NOT NULL matches what the processor always writes. +-- RECOMMENDATION: update spec to NOT NULL for these columns. +-- +-- 4. codec (text NOT NULL) +-- Spec: column not listed. +-- Processor migration: present, NOT NULL. +-- Resolution: included here. Omitting it causes NOT NULL failures on +-- processor inserts (codec is always populated — e.g. "codec8", +-- "codec8ext", "codec16"). +-- RECOMMENDATION: add codec to 03-initial-migrations.md deliverables. +-- +-- 5. attributes DEFAULT +-- Spec: JSONB NOT NULL DEFAULT '{}'::jsonb. +-- Processor migration: jsonb NOT NULL (no DEFAULT). +-- Resolution: the task brief says "spec and processor disagree → processor +-- wins." Processor always writes attributes (never relies on a default), +-- so the column is declared NOT NULL with no DEFAULT here. A DEFAULT is +-- harmless but misleading; the processor writer never omits this field. +-- RECOMMENDATION: remove the default from the spec or leave it and +-- acknowledge it is unused by the writer. +-- +-- 6. PRIMARY KEY (device_id, ts) vs. UNIQUE INDEX +-- Spec: PRIMARY KEY (device_id, ts). +-- Processor migration: NO primary key; uses a separate +-- CREATE UNIQUE INDEX IF NOT EXISTS positions_device_ts ON positions (device_id, ts). 
+-- Resolution: TimescaleDB does not forbid a PRIMARY KEY here; it only +-- requires every unique constraint to include the partition column +-- (ts), which (device_id, ts) does. Either way, uniqueness is enforced +-- physically by a unique index on each chunk. The processor migration's +-- approach (unique index, no PK) is sufficient for ON CONFLICT and +-- common for hypertables. This migration follows the processor: +-- no PRIMARY KEY, unique index instead. +-- RECOMMENDATION: change spec to use UNIQUE INDEX, not PRIMARY KEY. +-- +-- 7. Chunk interval: INTERVAL '1 day' vs. INTERVAL '7 days' +-- Spec: INTERVAL '7 days'. +-- Processor migration: INTERVAL '1 day'. +-- Resolution: processor migration wins. GPS telemetry at 1-60 second +-- intervals from hundreds of devices makes 1-day chunks a better fit +-- for range queries that span hours-to-days. 7-day chunks would create +-- much larger per-chunk indexes and coarser chunk exclusion. +-- RECOMMENDATION: change spec to INTERVAL '1 day'. +-- +-- 8. Index shape +-- Spec: positions_device_ts_idx ON positions (device_id, ts DESC). +-- Processor migration: positions_device_ts ON positions (device_id, ts) +-- [ascending, no DESC] + positions_ts ON positions (ts DESC). +-- Resolution: two indexes are created here matching the processor +-- migration. The spec's single (device_id, ts DESC) composite is +-- not equivalent: it does not cover the (ts DESC) range-scan pattern +-- used by global timestamp queries. +-- RECOMMENDATION: update spec to list both indexes. 
+ +CREATE TABLE IF NOT EXISTS positions ( + device_id text NOT NULL, + ts timestamptz NOT NULL, + ingested_at timestamptz NOT NULL DEFAULT now(), + latitude double precision NOT NULL, + longitude double precision NOT NULL, + altitude real NOT NULL, + angle real NOT NULL, + speed real NOT NULL, + satellites smallint NOT NULL, + priority smallint NOT NULL, + codec text NOT NULL, + attributes jsonb NOT NULL +); + +-- Convert to a TimescaleDB hypertable partitioned by event time (ts). +-- chunk_time_interval = 1 day: appropriate for GPS telemetry where queries +-- span hours-to-days and devices send at 1-60 second intervals. Tunable +-- in a future migration but NOT via editing this file (checksum guard). +-- if_not_exists = TRUE: no-op if a stage environment already has the +-- hypertable; the chunk interval cannot be retroactively changed via this +-- call — that is an accepted known divergence documented in 03-initial- +-- migrations.md under Risks. +SELECT create_hypertable( + 'positions', + 'ts', + if_not_exists => TRUE, + chunk_time_interval => INTERVAL '1 day' +); + +-- Unique index enforcing the natural key for idempotent upserts. +-- The processor writer uses ON CONFLICT (device_id, ts) DO NOTHING. +-- TimescaleDB's idiomatic pattern is a unique index (not a PRIMARY KEY +-- constraint) on the hypertable partition column — see divergence note 6. +CREATE UNIQUE INDEX IF NOT EXISTS positions_device_ts + ON positions (device_id, ts); + +-- Descending ts index for range queries scanning the most recent positions +-- first (e.g. "latest N positions" queries and time-bounded aggregations). +CREATE INDEX IF NOT EXISTS positions_ts + ON positions (ts DESC); + +-- ------------------------------------------------------------------------- +-- Assertion block: verify the table and its shape after the statements above. +-- If any assertion fails, RAISE EXCEPTION halts boot immediately. The operator +-- gets an actionable error message naming the offending column/constraint. 
+-- This catches the case where a stage environment has the table but with +-- subtly different column types (the CREATE TABLE IF NOT EXISTS above is a +-- no-op against an existing table — silent drift without this block). +-- ------------------------------------------------------------------------- +DO $$ DECLARE + _hypertable_count int; + _col_type text; +BEGIN + + -- 1. Table exists + IF NOT EXISTS ( + SELECT 1 FROM information_schema.tables + WHERE table_schema = 'public' AND table_name = 'positions' + ) THEN + RAISE EXCEPTION 'positions table does not exist after migration 002'; + END IF; + + -- 2. Hypertable registered + SELECT count(*) INTO _hypertable_count + FROM timescaledb_information.hypertables + WHERE hypertable_schema = 'public' AND hypertable_name = 'positions'; + IF _hypertable_count = 0 THEN + RAISE EXCEPTION 'positions is not a hypertable — create_hypertable() may have failed silently'; + END IF; + + -- 3. Column assertions — one per critical column + + SELECT data_type INTO _col_type + FROM information_schema.columns + WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'device_id'; + IF _col_type IS DISTINCT FROM 'text' THEN + RAISE EXCEPTION 'positions.device_id expected type text, found %', coalesce(_col_type, 'MISSING'); + END IF; + + SELECT data_type INTO _col_type + FROM information_schema.columns + WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'ts'; + IF _col_type IS DISTINCT FROM 'timestamp with time zone' THEN + RAISE EXCEPTION 'positions.ts expected type "timestamp with time zone", found %', coalesce(_col_type, 'MISSING'); + END IF; + + SELECT data_type INTO _col_type + FROM information_schema.columns + WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'ingested_at'; + IF _col_type IS DISTINCT FROM 'timestamp with time zone' THEN + RAISE EXCEPTION 'positions.ingested_at expected type "timestamp with time zone", found %', coalesce(_col_type, 'MISSING'); 
+ END IF; + + SELECT data_type INTO _col_type + FROM information_schema.columns + WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'latitude'; + IF _col_type IS DISTINCT FROM 'double precision' THEN + RAISE EXCEPTION 'positions.latitude expected type "double precision", found %', coalesce(_col_type, 'MISSING'); + END IF; + + SELECT data_type INTO _col_type + FROM information_schema.columns + WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'longitude'; + IF _col_type IS DISTINCT FROM 'double precision' THEN + RAISE EXCEPTION 'positions.longitude expected type "double precision", found %', coalesce(_col_type, 'MISSING'); + END IF; + + SELECT data_type INTO _col_type + FROM information_schema.columns + WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'altitude'; + IF _col_type IS DISTINCT FROM 'real' THEN + RAISE EXCEPTION 'positions.altitude expected type real, found %', coalesce(_col_type, 'MISSING'); + END IF; + + SELECT data_type INTO _col_type + FROM information_schema.columns + WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'angle'; + IF _col_type IS DISTINCT FROM 'real' THEN + RAISE EXCEPTION 'positions.angle expected type real, found %', coalesce(_col_type, 'MISSING'); + END IF; + + SELECT data_type INTO _col_type + FROM information_schema.columns + WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'speed'; + IF _col_type IS DISTINCT FROM 'real' THEN + RAISE EXCEPTION 'positions.speed expected type real, found %', coalesce(_col_type, 'MISSING'); + END IF; + + SELECT data_type INTO _col_type + FROM information_schema.columns + WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'satellites'; + IF _col_type IS DISTINCT FROM 'smallint' THEN + RAISE EXCEPTION 'positions.satellites expected type smallint, found %', coalesce(_col_type, 'MISSING'); + END IF; + + SELECT data_type INTO _col_type + FROM 
information_schema.columns + WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'priority'; + IF _col_type IS DISTINCT FROM 'smallint' THEN + RAISE EXCEPTION 'positions.priority expected type smallint, found %', coalesce(_col_type, 'MISSING'); + END IF; + + SELECT data_type INTO _col_type + FROM information_schema.columns + WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'codec'; + IF _col_type IS DISTINCT FROM 'text' THEN + RAISE EXCEPTION 'positions.codec expected type text, found %', coalesce(_col_type, 'MISSING'); + END IF; + + SELECT data_type INTO _col_type + FROM information_schema.columns + WHERE table_schema = 'public' AND table_name = 'positions' AND column_name = 'attributes'; + IF _col_type IS DISTINCT FROM 'jsonb' THEN + RAISE EXCEPTION 'positions.attributes expected type jsonb, found %', coalesce(_col_type, 'MISSING'); + END IF; + + -- 4. Unique index on (device_id, ts) + IF NOT EXISTS ( + SELECT 1 FROM pg_indexes + WHERE schemaname = 'public' + AND tablename = 'positions' + AND indexname = 'positions_device_ts' + ) THEN + RAISE EXCEPTION 'unique index positions_device_ts is missing from positions'; + END IF; + + -- 5. Descending ts index + IF NOT EXISTS ( + SELECT 1 FROM pg_indexes + WHERE schemaname = 'public' + AND tablename = 'positions' + AND indexname = 'positions_ts' + ) THEN + RAISE EXCEPTION 'index positions_ts is missing from positions'; + END IF; + +END $$; diff --git a/db-init/003_faulty_column.sql b/db-init/003_faulty_column.sql new file mode 100644 index 0000000..bf978c9 --- /dev/null +++ b/db-init/003_faulty_column.sql @@ -0,0 +1,106 @@ +-- 003_faulty_column.sql +-- Adds the operator-controlled faulty flag to the positions hypertable, +-- plus the partial index that optimises the processor's hot-path read. +-- +-- Why this is a separate file from 002: +-- The faulty column is an operator-plane concern layered on top of the +-- hypertable's initial shape. 
Keeping it in its own migration makes the +-- evolution visible in git history — an operator or analyst reading the +-- migration log can see that positions started as a pure telemetry store +-- and the quality-control flag was added as a deliberate, dated step. +-- It also keeps migration 002 as the authoritative "what processor writes" +-- definition, uncluttered by the downstream read/flag concern. +-- +-- Column semantics: +-- faulty BOOLEAN NOT NULL DEFAULT FALSE +-- The column is operator-controlled; it is NEVER set by the [[processor]] +-- writer. The processor always inserts rows with faulty = FALSE (via the +-- DEFAULT — the column is intentionally omitted from INSERT statements). +-- A track operator flips the flag through [[directus]] when a recorded +-- position is unrealistic (jumpy GPS, impossible speed/coordinate). +-- When the flag is set, Directus emits a webhook to the +-- recompute:requests Redis stream; the processor re-evaluates any +-- entry_penalties whose window overlaps the flagged position's timestamp. +-- +-- Index strategy: +-- positions_faulty_idx is a PARTIAL index covering only rows where +-- faulty = FALSE. This matches the processor's hot-path read pattern: +-- all evaluators (peak-speed, crossing detection, replay recompute) filter +-- WHERE faulty = FALSE. The partial index is smaller than a full index, +-- fits better in shared_buffers, and is never consulted for operator +-- queries that explicitly look at faulty rows — those use the broader +-- positions_device_ts index from migration 002. + +ALTER TABLE positions + ADD COLUMN IF NOT EXISTS faulty boolean NOT NULL DEFAULT FALSE; + +-- Partial index: covers the processor's standard read path (faulty = FALSE). +-- Column order (device_id, ts DESC) supports per-device range queries +-- returning most-recent-first, which is the dominant access pattern for +-- peak-speed evaluation and crossing detection. 
+CREATE INDEX IF NOT EXISTS positions_faulty_idx + ON positions (device_id, ts DESC) + WHERE faulty = FALSE; + +-- ------------------------------------------------------------------------- +-- Assertion block: verify the column and index are present with the expected +-- shape. Catches drift where stage already has a faulty column but with a +-- different type or a missing DEFAULT (ADD COLUMN IF NOT EXISTS is a no-op +-- against an existing column regardless of its definition). +-- ------------------------------------------------------------------------- +DO $$ DECLARE + _col_type text; + _col_notnull boolean; + _col_default text; +BEGIN + + -- 1. Column exists with correct type + SELECT data_type, is_nullable = 'NO', column_default + INTO _col_type, _col_notnull, _col_default + FROM information_schema.columns + WHERE table_schema = 'public' + AND table_name = 'positions' + AND column_name = 'faulty'; + + IF _col_type IS NULL THEN + RAISE EXCEPTION 'positions.faulty column is missing after migration 003'; + END IF; + + IF _col_type IS DISTINCT FROM 'boolean' THEN + RAISE EXCEPTION 'positions.faulty expected type boolean, found %', _col_type; + END IF; + + IF NOT _col_notnull THEN + RAISE EXCEPTION 'positions.faulty must be NOT NULL but is nullable — schema drift'; + END IF; + + -- DEFAULT is stored as a normalised expression string; Postgres represents + -- FALSE as 'false' in information_schema (lower-case, no quotes). + IF _col_default IS DISTINCT FROM 'false' THEN + RAISE EXCEPTION 'positions.faulty expected DEFAULT false, found %', coalesce(_col_default, 'NULL (no default)'); + END IF; + + -- 2. Partial index exists + IF NOT EXISTS ( + SELECT 1 FROM pg_indexes + WHERE schemaname = 'public' + AND tablename = 'positions' + AND indexname = 'positions_faulty_idx' + ) THEN + RAISE EXCEPTION 'partial index positions_faulty_idx is missing from positions'; + END IF; + + -- 3. 
Partial index has the expected WHERE predicate + -- pg_indexes.indexdef includes the full CREATE INDEX statement as text; + -- the predicate appears as "WHERE (faulty = false)". + IF NOT EXISTS ( + SELECT 1 FROM pg_indexes + WHERE schemaname = 'public' + AND tablename = 'positions' + AND indexname = 'positions_faulty_idx' + AND indexdef ILIKE '%where (faulty = false)%' + ) THEN + RAISE EXCEPTION 'positions_faulty_idx exists but does not have the expected predicate "WHERE (faulty = false)" — check indexdef in pg_indexes'; + END IF; + +END $$;
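
Review note: the column checks in `db-init/002_positions_hypertable.sql` repeat one pattern twelve times and compare `data_type` only, so nullability drift on an existing table would pass unnoticed. Below is a minimal sketch of a table-driven equivalent. It assumes every listed column should be `NOT NULL`, as in the `CREATE TABLE` statement of that migration. The variable names `_exp` and `_found` are illustrative, and this block is not part of the shipped migration.

```sql
-- Sketch only: table-driven variant of the per-column assertions in 002.
-- Assumption: all twelve columns are expected to be NOT NULL.
DO $$
DECLARE
  _exp   record;  -- expected (column, type) pair from the VALUES list
  _found record;  -- what information_schema reports for that column
BEGIN
  FOR _exp IN
    SELECT * FROM (VALUES
      ('device_id',   'text'),
      ('ts',          'timestamp with time zone'),
      ('ingested_at', 'timestamp with time zone'),
      ('latitude',    'double precision'),
      ('longitude',   'double precision'),
      ('altitude',    'real'),
      ('angle',       'real'),
      ('speed',       'real'),
      ('satellites',  'smallint'),
      ('priority',    'smallint'),
      ('codec',       'text'),
      ('attributes',  'jsonb')
    ) AS t(col, typ)
  LOOP
    SELECT data_type, is_nullable
      INTO _found
      FROM information_schema.columns
     WHERE table_schema = 'public'
       AND table_name   = 'positions'
       AND column_name  = _exp.col;

    -- FOUND is false when the SELECT INTO matched no row.
    IF NOT FOUND THEN
      RAISE EXCEPTION 'positions.% is missing', _exp.col;
    END IF;

    IF _found.data_type <> _exp.typ THEN
      RAISE EXCEPTION 'positions.% expected type %, found %',
        _exp.col, _exp.typ, _found.data_type;
    END IF;

    IF _found.is_nullable <> 'NO' THEN
      RAISE EXCEPTION 'positions.% must be NOT NULL but is nullable', _exp.col;
    END IF;
  END LOOP;
END $$;
```

Because applied migration files are protected by the runner's checksum guard, a change like this would have to land in a new numbered migration rather than as an edit to 002. Whether the added nullability check is wanted on stage is a judgment call for the authors.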