11 Commits

Author SHA1 Message Date
julian e01abfef27 Split db-init into pre-schema and post-schema phases
CI dry-run revealed an architectural ordering bug: db-init/004 and
db-init/005 ALTER TABLE the Directus-managed tables (organization_users,
events, etc.), but db-init runs BEFORE schema-apply creates those
tables. On a fresh CI Postgres this fails with "relation does not
exist." Local dev never tripped this because we'd created the tables
via MCP first.

Fix: introduce a post-schema migration phase. Two db-init runs in the
entrypoint, with schema-apply in between:

  1. apply-db-init.sh   db-init/        → positions hypertable + faulty
                                          column (tables Directus does
                                          NOT manage)
  2. schema-apply.sh                    → creates Directus-managed tables
                                          from snapshots/schema.yaml
  3. apply-db-init.sh   db-init-post/   → composite UNIQUE constraints on
                                          the Directus-managed tables
  4. directus bootstrap
  5. directus start
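
A minimal sketch of the five-step ordering above. It assumes the
post-schema directory is baked in at /directus/db-init-post (the commit
only says the Dockerfile copies db-init-post into the image) and that
apply-db-init.sh picks its directory from DB_INIT_DIR. The step helper
and the DRY_RUN switch are hypothetical, added so the order can be
previewed without a database:

```shell
# Hypothetical helper: print a log marker, then run (or just preview) a step.
step() {
  local label="$1"
  shift
  echo "[entrypoint] ${label}"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "  would run: $*"
  else
    "$@"
  fi
}

# Steps are chained with && so the first failing step stops the sequence.
boot_sequence() {
  step "1/5 db-init (pre-schema)" \
    env DB_INIT_DIR=/directus/db-init /directus/scripts/apply-db-init.sh &&
  step "2/5 schema apply" \
    /directus/scripts/schema-apply.sh &&
  step "3/5 db-init (post-schema)" \
    env DB_INIT_DIR=/directus/db-init-post /directus/scripts/apply-db-init.sh &&
  step "4/5 directus bootstrap" \
    node /directus/cli.js bootstrap &&
  step "5/5 directus start" \
    pm2-runtime start ecosystem.config.cjs
}
```

Calling boot_sequence with DRY_RUN unset prints the five markers in
order; DRY_RUN=0 would execute the steps. The sketch leaves out the exec
that a real entrypoint would put in front of the final pm2-runtime step.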

Files moved:
  db-init/004_junction_unique_constraints.sql →
    db-init-post/001_junction_unique_constraints.sql
  db-init/005_event_participation_unique_constraints.sql →
    db-init-post/002_event_participation_unique_constraints.sql

Each ALTER TABLE in the post-schema migrations is now wrapped in a
pg_constraint existence guard for idempotency. This handles the dev DB
where the constraints already exist (from the original 004/005 runs +
the manual psql recovery during task 1.5's destructive-apply
incident). Old 004/005 rows in migrations_applied become orphans —
harmless.
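
One way the existence guard could look, as a sketch rather than the
committed SQL. The helper name guarded_unique_sql and any constraint
name passed to it are hypothetical (the commit does not name the
constraints); pg_constraint, conname and conrelid are the standard
PostgreSQL catalog names:

```shell
# Emit an idempotent ALTER TABLE: the constraint is added only when
# pg_constraint has no row with that name for the given table.
guarded_unique_sql() {
  local table="$1" constraint="$2" columns="$3"
  # \$\$ keeps the heredoc from expanding $$ into the shell's PID.
  cat <<SQL
DO \$\$
BEGIN
  IF NOT EXISTS (
    SELECT 1 FROM pg_constraint
    WHERE conname = '${constraint}'
      AND conrelid = '${table}'::regclass
  ) THEN
    ALTER TABLE ${table}
      ADD CONSTRAINT ${constraint} UNIQUE (${columns});
  END IF;
END
\$\$;
SQL
}
```

Usage, with a made-up constraint name:
`guarded_unique_sql organization_users organization_users_org_user_uq
"organization_id, user_id" | psql -v ON_ERROR_STOP=1`. Running it twice
leaves the second run a no-op, which is the property the dev DB needs.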

Updates:
- Dockerfile: COPY db-init-post into the image
- entrypoint.sh: 4-step → 5-step flow with the post-schema run between
  schema-apply and bootstrap
- .gitea/workflows/build.yml: dry-run chains all three pre-boot scripts
  (pre-schema → schema-apply → post-schema); path filter includes
  db-init-post/**
- Task specs 1.4 and 1.5 Done sections: updated to reference the new
  db-init-post/ path (db-init/004 → db-init-post/001, etc.)

The reusable runner script (apply-db-init.sh) didn't need to change —
it already accepts DB_INIT_DIR and uses just the basename for the
guard-table key. The two phases share migrations_applied; filenames
don't collide because pre-schema and post-schema use distinct
descriptive names.
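
Because the guard-table key is the basename alone, a collision between
the two directories would make one phase silently skip a file. A small
check along these lines could confirm there is none; the function name
is hypothetical and this is not part of apply-db-init.sh:

```shell
# Print every *.sql basename that appears in more than one of the given
# directories. Basenames are unique within a directory, so any duplicate
# in the combined list is a cross-directory collision.
basename_collisions() {
  local dir f
  for dir in "$@"; do
    for f in "$dir"/*.sql; do
      [ -e "$f" ] && basename "$f"
    done
  done | sort | uniq -d
}
```

`basename_collisions db-init db-init-post` printing nothing means the
two phases can keep sharing migrations_applied.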

Phase 1 is still "done" — this is a Phase 1 architectural correction
exposed by the CI dry-run, not a new task.
2026-05-02 10:48:06 +02:00
julian ec119af274 Fix CI port collision — remap throwaway Postgres to host port 15432
The runner host typically has another Postgres listening on 5432
(local dev stack, stage instance, etc.), which made the services:
postgres container fail at start with "port already allocated."

Remap the host-side port from 5432:5432 to 15432:5432. The service
container still listens on 5432 internally; only the runner host
binding changes. Dry-run's DB_PORT updated to 15432 to match.

--network host semantics preserved: DB_HOST=localhost reaches the
service on the runner's loopback at the new port.

Why we still need a Postgres container at all: the dry-run gate
applies db-init/*.sql migrations and the directus schema snapshot
against a real DB to catch breakage before pushing the image. No
Postgres = no validation = the gate is bypassed.

Inline comment in the workflow now explains the choice; task spec's
Done section captures the correction so future readers don't
re-discover this.
2026-05-02 10:38:26 +02:00
julian 57624cb997 Task 1.9 — Rally Albania 2026 dogfood seed (Phase 1 complete)
Pre-seed landed via the directus-local MCP server. Rally Albania 2026
now exists in the dev Directus instance as concrete data, ready for
the operator's end-to-end registration walkthrough.

Seeded:
- Organization "Motorsport Club Albania" (slug msc-albania).
- Event "Rally Albania 2026" — discipline rally, 06-06 to 06-13.
- 18 classes from §2.2–§2.5 of the regs:
    M-1..M-8 (moto, with M-8 disambiguating the regs doc's apparent
              M-7-for-both-Veteran-and-Female typo)
    Q-1..Q-3 (quad)
    C-1, C-2, C-A, C-3 (car)
    S-1..S-3 (SSV)
- Test vehicle: 1998 Toyota Land Cruiser 70, plate AA-001-AA, 4500cc.
- Test devices: FMB920 chassis + FMB920 dash backup + FMB003 panic
  button. Plausible IMEIs (Teltonika TAC range).
- Junction rows: organization_vehicles (1), organization_devices (3).

Deliberately NOT seeded — left for operator's manual admin-UI
walkthrough as the dogfood acceptance test:
- organization_users row (admin in MSC Albania as race-director)
- entry row (Toyota in C-2, race_number 301, status registered)
- entry_crew row (admin as pilot)
- entry_devices rows × 3 (chassis + backup vehicle-mounted, body
  device assigned_user_id = admin)

This split validates the schema two ways: programmatic creation works
(via MCP), and the admin UI exposes the same collections with working
dropdowns / required-field validation / composite-unique enforcement.

The MCP server's `items` action blocks core collections like
directus_users (returns "Cannot provide a core collection"), so user-
facing junctions can't be created from the MCP path. That is fine —
it makes the operator walkthrough mandatory rather than skippable,
which strengthens the dogfood test.

---

Phase 1 complete (8/9 → 9/9). Status flips to 🟩 in ROADMAP.

Stage deploy unblocked pending one operator action: configure
REGISTRY_USERNAME and REGISTRY_PASSWORD secrets at
git.dev.microservices.al/trm/directus → Settings → Secrets. Without
those, task 1.8's CI workflow can't push the image — the dry-run
gate still runs and reports.

Project memory at .claude/projects/.../project_rally_albania_2026.md
updated to reflect Phase 1 completion and the seed state.
2026-05-02 10:29:22 +02:00
julian 0f89fea913 Task 1.8 — Gitea CI dry-run workflow
.gitea/workflows/build.yml builds the directus image on path-filtered
pushes to main and validates the boot pipeline against a throwaway
Postgres before pushing the image to the registry. The dry-run is the
gate that catches snapshot drift, broken db-init scripts, or
incompatible schema changes before they reach stage.

Workflow shape (mirrors processor's CI but tailored to Directus):
- Path filter: snapshots/, db-init/, extensions/, scripts/,
  entrypoint.sh, Dockerfile, the workflow file itself.
  Docs-only commits (.planning/, README.md, compose.dev.yaml,
  package.json) do NOT trigger CI.
- Throwaway Postgres via services: block, pinned to the same
  timescale/timescaledb-ha:pg16.6-ts2.17.2-all tag as compose.dev.yaml.
- Plain `docker build` (NOT build-push-action) so the image stays in
  the local daemon for the subsequent docker run dry-run.
- Dry-run: --network host + --entrypoint bash to override the upstream
  entrypoint and run only apply-db-init.sh && schema-apply.sh.
  Skips bootstrap and pm2-runtime — the schema apply is the gate.
- Two image tags: :main (mutable) and :<sha> (immutable).
- Optional Portainer webhook gated on secret presence; curl -fsS so a
  misconfigured URL fails the step explicitly.
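
The dry-run step described above could be sketched as follows. The
function name, the image argument and the DOCKER override are
hypothetical; only DB_HOST and DB_PORT are shown, and whatever other
DB_* variables the scripts need are left out:

```shell
# Run only the two pre-boot scripts inside the freshly built image.
# DOCKER can be set to "echo" to preview the command instead of running it.
run_dry_run() {
  local image="$1" db_port="${2:-5432}"
  "${DOCKER:-docker}" run --rm --network host \
    -e DB_HOST=localhost -e "DB_PORT=${db_port}" \
    --entrypoint bash "$image" \
    -c '/directus/scripts/apply-db-init.sh && /directus/scripts/schema-apply.sh'
}
```

A non-zero exit from either script becomes the exit code of docker run,
which is what lets the workflow halt before the push step.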

Spec corrections folded in (the spec's draft had two contradictions
that would have failed at runtime):
1. DB_HOST=localhost (not 'postgres'). With --network host, service
   containers are reachable on the runner's loopback by their port
   mapping, NOT by service name. Service-name resolution only works
   for containers attached to the job's Docker network; a container
   started with --network host is not on it.
2. health-retries 20 (not 10). timescaledb-ha:*-all does more init
   work at boot than vanilla postgres; 50s isn't always enough.

Operator action required in the Gitea repo Settings before first run:
configure REGISTRY_USERNAME and REGISTRY_PASSWORD secrets (required for
push); optionally PORTAINER_WEBHOOK_URL (for auto-deploy).

Live verification deferred to first relevant commit. Documented in the
task spec's Done section: positive (clean snapshot → push succeeds)
and negative (malformed snapshot → halt before push) cases to validate
once CI runs.

ROADMAP marks 1.8 done. Phase 1 progress: 8/9 tasks complete (1.1–1.8);
only 1.9 (Rally Albania 2026 dogfood seed) remains before Phase 1 ships.
2026-05-02 10:04:39 +02:00
julian 52524eb72d Task 1.5 — Event-participation collections
Five collections + 10 relations + 5 composite unique constraints,
captured into snapshots/schema.yaml (now 105 KB, up from 53 KB).

Collections:
- events       — 11 fields incl. organization_id M2O, discipline enum
                  (rally / time-trial / regatta / trail-run / hike),
                  starts_at/ends_at required.
- classes      — 8 fields incl. event_id M2O, code unique within event.
- entries      — 11 fields incl. event_id/vehicle_id (nullable for foot
                  races) /class_id M2O, race_number, status enum with
                  8 lifecycle values, archive on `withdrawn`.
                  team_id deliberately omitted (Phase 2+).
- entry_crew   — junction with role enum
                  (pilot/co-pilot/navigator/mechanic/rider/runner/hiker).
- entry_devices — junction with optional assigned_user_id (panic button
                  body-wear); ON DELETE SET NULL on that field since
                  user removal shouldn't block the device record.

10 M2O relations wired, all ON DELETE RESTRICT except
entry_devices.assigned_user_id (SET NULL).

db-init/005_event_participation_unique_constraints.sql adds composite
UNIQUE on:
  events (organization_id, slug)
  classes (event_id, code)
  entries (event_id, race_number)
  entry_crew (entry_id, user_id)
  entry_devices (entry_id, device_id)

---

Destructive-apply incident (recovered):

First attempt at this task hit a real foot-gun. After creating the 5
collections via MCP, we ran `compose build && up -d`. The image rebuild
baked in the snapshot from task 1.4 (only 7 collections). Boot's
schema-apply step ran `directus schema apply --yes` against that stale
snapshot, saw the 5 new collections in the DB but not in the snapshot,
and DELETED THEM, taking the constraints with them.

Recovery: re-created the 5 collections + 10 relations via MCP, ran the
ALTER TABLE statements directly via psql to restore the constraints,
ran schema:snapshot BEFORE any further restart so the YAML reflects
the live state. Documented the operator rule (never rebuild with
uncommitted schema changes) inline in the task spec and in the
directus wiki entity page (separate commit in trm/docs).

Phase 3 hardening on the radar: DIRECTUS_SCHEMA_APPLY_MODE env var
with auto/dry-run/skip modes so dev environments default to non-
destructive behavior.
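
A possible shape for that switch, purely as a sketch: the variable is
not implemented yet, and the meaning given to each mode here (auto =
preview then apply, dry-run = preview only, skip = nothing) is an
assumption, as is the function name:

```shell
# Map a DIRECTUS_SCHEMA_APPLY_MODE value to the actions a boot script
# would take. Prints the plan; it does not touch the database.
schema_apply_plan() {
  case "${1:-auto}" in
    auto)    echo "preview apply" ;;   # today's behaviour: dry-run, then apply
    dry-run) echo "preview" ;;         # show the diff, change nothing
    skip)    echo "none" ;;            # leave the database untouched
    *)
      echo "unknown DIRECTUS_SCHEMA_APPLY_MODE: $1" >&2
      return 2
      ;;
  esac
}
```

A dev compose file could then set DIRECTUS_SCHEMA_APPLY_MODE=dry-run so
a rebuild with a stale snapshot reports the pending deletions instead of
performing them.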

ROADMAP marks 1.5 done. Phase 1 progress: 7/9 tasks complete (1.1–1.7);
1.8, 1.9 remain.
2026-05-02 09:55:17 +02:00
julian 6f376a479f Task 1.4 — Org-level catalog collections
Seven collections + 3 directus_users custom fields, captured as
snapshots/schema.yaml (53 KB, 2,159 lines).

Collections:
- organizations          — UUID PK, name, slug UNIQUE
- vehicles               — UUID PK, make/model required, year/cc/vin/plate optional
- devices                — UUID PK, imei UNIQUE, model required
- organization_users     — junction with role enum (org-admin, race-director,
                            marshal, timekeeper, participant, viewer)
- organization_vehicles  — junction with registered_at
- organization_devices   — junction with registered_at
- directus_users         — extended with phone, birth_date, nationality

Six M2O relations on the junctions, all ON DELETE RESTRICT (matching
the schema-draft decision: deletion of an org/vehicle/device/user
requires explicit cleanup of dependents).

db-init/004_junction_unique_constraints.sql adds the composite UNIQUE
constraints on the three junctions:
  organization_users  (organization_id, user_id)
  organization_vehicles (organization_id, vehicle_id)
  organization_devices (organization_id, device_id)

Composite uniqueness lives in db-init rather than the Directus snapshot
because Directus's snapshot YAML format only captures single-column
unique constraints (the field-level is_unique flag). The migration file
documents the split inline.

Driven via the directus-local MCP server rather than admin-UI clicking
— programmatic create-collection/create-field/create-relation calls
against the running Directus instance, then `pnpm run schema:snapshot`
to capture the canonical YAML.

Live-verified: db-init/004 applies cleanly on container restart
(0 rows in the empty junctions, no constraint violations); schema-apply
against a snapshot-empty boot still skips correctly; all seven new
collections show up in the admin UI's data model navigation.

Snapshot includes positions and migrations_applied as auto-discovered
ghost entries (Directus introspects all public-schema tables). Harmless
— db-init creates them before schema-apply runs, so snapshot apply just
finds them already present.

ROADMAP marks 1.4 done. Phase 1 progress: 6/9 tasks complete (1.1, 1.2,
1.3, 1.4, 1.6, 1.7); 1.5, 1.8, 1.9 remain.
2026-05-02 09:41:01 +02:00
julian e22d9d489a Tasks 1.6 + 1.7 — schema tooling + real entrypoint flow
Two parallel tasks landing together. The boot pipeline is now wired
end-to-end: db-init → schema apply → directus bootstrap → pm2-runtime.
Live-verified by booting a fresh compose stack to a serving Directus
admin UI on :8055.

Task 1.6 — snapshot tooling:
- scripts/schema-snapshot.sh — host-side, dev-time. Verifies docker
  is on PATH and the directus compose service is running, runs
  `node /directus/cli.js schema snapshot --yes` inside the container,
  copies the YAML out to ./snapshots/schema.yaml. Used after admin-UI
  schema changes to capture the new state for git commit.
- scripts/schema-apply.sh — image-side, boot-time. Reads
  /directus/snapshots/schema.yaml, runs a dry-run preview, then
  applies. Gracefully skips when the snapshot is absent or whitespace-
  only (Phase 1 first-boot path before tasks 1.4/1.5 produce
  collections). SNAPSHOT_PATH env var override for CI flexibility.
- snapshots/README.md — lifecycle doc; warns against hand-editing.
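
The graceful-skip check in schema-apply.sh might look roughly like
this. It is a sketch under the description above, not the script
itself, and the function name is hypothetical:

```shell
# Succeed (exit 0) when the snapshot is missing or contains nothing but
# whitespace; fail (exit 1) when it has any real content.
snapshot_is_blank() {
  local path="$1"
  [ ! -f "$path" ] || ! grep -q '[^[:space:]]' "$path"
}
```

A boot script would call it as
`snapshot_is_blank "${SNAPSHOT_PATH:-/directus/snapshots/schema.yaml}"`
and log a skip message instead of invoking the Directus CLI.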

Task 1.7 — real entrypoint flow:
- entrypoint.sh rewritten from Phase 1.1's placeholder to the
  4-step boot per ROADMAP design rule #3:
    1/4 db-init          → /directus/scripts/apply-db-init.sh
    2/4 schema apply     → /directus/scripts/schema-apply.sh
    3/4 directus bootstrap → node /directus/cli.js bootstrap
    4/4 directus start   → exec pm2-runtime start ecosystem.config.cjs
  set -euo pipefail halts boot on any step's non-zero exit. Each step
  emits a [entrypoint] log marker so an operator reading container
  logs sees which step failed.

Bug found and fixed during live verification:
- Both 1.6 scripts initially called bare `directus schema ...` as if
  the CLI were on PATH. Upstream directus/directus:11.17.4 does NOT
  expose `directus` on PATH — invocation is via `node /directus/cli.js`,
  same pattern as the entrypoint's bootstrap step. Both scripts
  corrected. Also added -T to docker compose exec in schema-snapshot.sh
  so the script works in non-TTY contexts (CI).
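
For illustration, the corrected host-side invocation could be sketched
like this. The in-container path /tmp/schema.yaml, the function name
and the DOCKER override are assumptions; the compose service name
directus and the cli.js path come from the notes above:

```shell
# Snapshot inside the running container, then copy the YAML to the host.
# -T disables TTY allocation so the call also works without a terminal.
snapshot_schema() {
  local out="${1:-./snapshots/schema.yaml}" tmp=/tmp/schema.yaml
  "${DOCKER:-docker}" compose exec -T directus \
    node /directus/cli.js schema snapshot --yes "$tmp" &&
  "${DOCKER:-docker}" compose cp "directus:${tmp}" "$out"
}
```

Setting DOCKER=echo prints the two commands instead of running them,
which is handy for checking the arguments.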

Phase 5 follow-up (non-blocking) flagged in 07's Done section: Directus
warns "Collection 'positions' doesn't have a primary key column and
will be ignored". The positions table uses UNIQUE INDEX (device_id, ts)
matching processor's pattern, not a PK constraint. Means positions is
not auto-registered as a Directus collection — fine for Phase 1, but
the operator faulty-flag workflow will need a custom endpoint or
manual collection registration in Phase 5.

ROADMAP marks 1.6 + 1.7 done. Phase 1 progress: 5/9 tasks complete
(1.1, 1.2, 1.3, 1.6, 1.7); 1.4, 1.5, 1.8, 1.9 remain.
2026-05-02 09:40:53 +02:00
julian 25a9731070 Task 1.3 — Initial migrations
Three SQL files under db-init/ create the schema processor writes
against. All three apply cleanly via apply-db-init.sh, are idempotent
on re-run, and end with assertion blocks that catch silent
schema drift.

001_extensions.sql — registers timescaledb on the directus database.
  PostGIS deferred to Phase 2 (per Plan A). The timescaledb-ha image
  pre-creates the extension at DB init, so the IF NOT EXISTS guard
  fires as a NOTICE — expected and harmless.

002_positions_hypertable.sql — positions hypertable, exact
  column-by-column match against processor/src/db/migrations/0001_positions.sql.

  Cross-checking against processor surfaced 8 divergences from the
  original task spec; processor wins in every case (it is the writer
  and is in production). The corrections:

    - added ingested_at timestamptz NOT NULL DEFAULT now()
    - added codec text NOT NULL
    - altitude/angle/speed: real NOT NULL (not DOUBLE PRECISION nullable)
    - satellites/priority: NOT NULL
    - removed attributes DEFAULT '{}'::jsonb (processor always writes)
    - replaced PRIMARY KEY with UNIQUE INDEX positions_device_ts
      (idiomatic for TimescaleDB hypertables)
    - chunk interval 1 day, not 7 days
    - two indexes (positions_device_ts + positions_ts), not one composite

  Without these corrections every processor INSERT would have failed
  with NOT NULL violations. Spec deliverables section updated to
  reflect the correct shape so future readers see the right schema.

003_faulty_column.sql — adds the operator-controlled faulty boolean
  flag plus the partial index positions_faulty_idx ON (device_id,
  ts DESC) WHERE faulty = FALSE. The column is set only via Directus
  admin (Phase 4 permissions); processor's writer never touches it.
  The partial index optimises the hot-path read pattern (every
  processor evaluator filters faulty = FALSE); operator queries that
  look at faulty rows specifically use the broader positions_device_ts
  index from 002.
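
A sketch of what such a migration could contain, wrapped in a shell
function so it can be piped to psql. The NOT NULL DEFAULT FALSE on the
column and the function name are assumptions; the index name, columns
and predicate are taken from the description above:

```shell
# Print idempotent SQL for the faulty flag and its partial index.
faulty_column_sql() {
  cat <<'SQL'
ALTER TABLE positions
  ADD COLUMN IF NOT EXISTS faulty boolean NOT NULL DEFAULT FALSE;

-- Partial index: only rows with faulty = FALSE are indexed, so the
-- hot-path reads never have to step over flagged rows.
CREATE INDEX IF NOT EXISTS positions_faulty_idx
  ON positions (device_id, ts DESC)
  WHERE faulty = FALSE;
SQL
}
```

Usage: `faulty_column_sql | psql -v ON_ERROR_STOP=1`.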

Live-verified 2026-05-01:
  - First apply: 3 applied, 0 skipped, exit 0.
  - Re-run: 0 applied, 3 skipped, exit 0.
  - All 13 columns present with correct types/nullability/defaults.
  - Hypertable registered with 1-day chunk interval.
  - Three expected indexes present.

Non-blocking observation: TimescaleDB's create_hypertable()
auto-created a fourth index (positions_ts_idx) duplicating our
explicit positions_ts. Processor's migration has the same redundancy
so stage already lives with this. Cleanup path documented in the
task spec for Phase 3 hardening (create_default_indexes => FALSE
in the create_hypertable call).
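
The cleanup path could be sketched like this, assuming ts is the time
column and using TimescaleDB's create_hypertable named parameters. The
column list and sort order of positions_ts are guesses, and the function
name is hypothetical:

```shell
# Print SQL that registers the hypertable without TimescaleDB's default
# time index, then creates the two explicit indexes by hand.
hypertable_sql() {
  cat <<'SQL'
SELECT create_hypertable(
  'positions', 'ts',
  chunk_time_interval    => INTERVAL '1 day',
  create_default_indexes => FALSE,
  if_not_exists          => TRUE
);

CREATE UNIQUE INDEX IF NOT EXISTS positions_device_ts
  ON positions (device_id, ts);
CREATE INDEX IF NOT EXISTS positions_ts
  ON positions (ts DESC);
SQL
}
```

With create_default_indexes off, positions_ts_idx is never created, so
only the explicit indexes remain.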

ROADMAP marks 1.3 done; 1.4 next.
2026-05-01 22:52:06 +02:00
julian dec2d190ce Task 1.2 — db-init runner script
scripts/apply-db-init.sh implements the boot-time runner that walks
db-init/*.sql in numeric-prefix order, applies each via psql, and
records successful applications in a migrations_applied guard table
so re-runs are no-ops.

All 7 acceptance criteria pass live against the dev compose stack:
empty dir, missing env var, apply, idempotent re-run, checksum
mismatch, filename collision, broken SQL.
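
The apply / skip / checksum-mismatch decisions can be illustrated
without a database. In this sketch a plain text file of
"<basename> <sha256>" lines stands in for the migrations_applied table;
the function name, the file format and the choice of sha256 are
assumptions, not what apply-db-init.sh necessarily does:

```shell
# For each *.sql file in a directory, print what a runner would do.
# Returns 1 at the first file whose content no longer matches the
# checksum recorded for it.
plan_migrations() {
  local dir="$1" guard_file="$2" file name sum recorded
  # Glob results are sorted, so zero-padded prefixes (001_, 002_, ...)
  # come out in numeric order.
  for file in "$dir"/*.sql; do
    [ -e "$file" ] || continue          # empty directory: nothing to do
    name="$(basename "$file")"
    sum="$(sha256sum "$file" | cut -d' ' -f1)"
    recorded=""
    if [ -f "$guard_file" ]; then
      recorded="$(awk -v n="$name" '$1 == n { print $2 }' "$guard_file")"
    fi
    if [ -z "$recorded" ]; then
      echo "apply $name"
    elif [ "$recorded" = "$sum" ]; then
      echo "skip $name"
    else
      echo "mismatch $name"
      return 1
    fi
  done
}
```

A real runner would execute psql where this prints "apply" and insert
the basename and checksum into the guard table only after psql succeeds.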

Two retroactive Dockerfile corrections folded in (exposed by the
first live-test attempt of 1.2's script):

1. apk add bash. The directus/directus:11.17.4 base is Alpine and
   ships ash via BusyBox, not bash. The script uses bash-specific
   features (associative arrays, [[ ]], mapfile, BASH_REMATCH) and
   fails at line 69 in sh.

2. .gitattributes added at repo root forcing LF on *.sh, *.sql,
   *.yaml, *.yml. Without it, Windows checkouts with core.autocrlf=true
   (the Git-for-Windows default) silently inject CRLF, causing
   "bad interpreter: /usr/bin/env bash^M" inside the Linux container.
   This failure mode only manifests in the container.
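
Since the CRLF problem is invisible on the host, a quick check like the
following can catch it before a build. It is a hypothetical helper, not
something this commit adds:

```shell
# Print the names of the given files that contain a carriage return.
# grep -l lists matching files only; "|| true" keeps a clean run at exit 0.
crlf_files() {
  grep -l "$(printf '\r')" "$@" || true
}
```

For example, `crlf_files $(git ls-files '*.sh' '*.sql' '*.yaml' '*.yml')`
should print nothing once .gitattributes is in effect and the files have
been re-checked-out.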

Both corrections are documented in 01-project-scaffold.md's Done
section; 02-db-init-runner.md's Done section captures the live-test
results, the corrected docker compose run --entrypoint commands, and
the gotcha about compose env defaults masking missing-env-var tests.

ROADMAP marks 1.2 done; 1.3 next.
2026-05-01 22:35:17 +02:00
julian 387c3c4cfa Task 1.1 — Project scaffold
Phase 1 task 1.1 lands. Directus 11.17.4 boots locally end-to-end
against a TimescaleDB+PostGIS container; admin UI serves at :8055,
admin bootstrap from env vars works, named volumes preserve data
across down/up cycles.

Scaffold:
- Dockerfile — FROM directus/directus:11.17.4. Pre-installs
  postgresql16-client (ahead of task 1.2's db-init runner needing psql).
  Bakes in /directus/snapshots, /directus/db-init, /directus/scripts,
  /directus/extensions, /directus/entrypoint.sh.
- compose.dev.yaml — db (timescale/timescaledb-ha:pg16.6-ts2.17.2-all)
  + directus (local build), healthchecks, named volumes
  directus-pg-data + directus-uploads.
- entrypoint.sh — placeholder using upstream's actual flow
  (node cli.js bootstrap && pm2-runtime start ecosystem.config.cjs);
  the real db-init -> schema apply -> start wrapper lands in task 1.7.
- package.json — scripts-only (dev, dev:down, dev:reset,
  schema:snapshot, schema:apply, db:init), no runtime deps.
- .env.example — sectioned, fully documented, KEY/SECRET marked
  required with generation hints.
- .gitignore, .dockerignore — match the processor service conventions.
- snapshots/, db-init/, scripts/, extensions/ — empty with .gitkeep,
  filled by later Phase 1 tasks (1.3, 1.6) and Phase 5.

Lessons locked in (verified against a real pnpm dev boot):
- timescale/timescaledb-ha:pg16-latest does NOT exist on Docker Hub.
  Pin a concrete version (we used pg16.6-ts2.17.2-all).
- This image's data directory is /home/postgres/pgdata/data, not
  /pgdata or /var/lib/postgresql/data. PGDATA env var and the volume
  mount must both target it.
- The -all variant bundles PostGIS binaries but the extension is not
  auto-created on the directus database; CREATE EXTENSION lands in
  Phase 2 alongside the geofences/SLZs/waypoints collections.
- The upstream image's CMD is bootstrap + pm2-runtime, not a simple
  cli.js start. Bypassing pm2 would lose crash recovery.

These corrections folded into 01-project-scaffold.md (deliverable line
+ Done section), 08-gitea-ci-dryrun.md (CI service tag), and the
inline comments in compose.dev.yaml so future implementers don't
re-discover them.

Status: ROADMAP marks 1.1 done, Phase 1 in progress, 1.2 next.
2026-05-01 21:29:13 +02:00
julian a8e808e71c Scaffold directus service planning structure
Initial commit. Establishes the .planning/ tree mirroring processor's
shape (ROADMAP.md as nav hub + per-phase folders with READMEs and
granular task files).

Six phases:

1. Slice 1 schema + deploy pipeline — what Rally Albania 2026 needs.
   Org catalog (orgs, users, vehicles, devices) + event participation
   (events, classes, entries, entry_crew, entry_devices). db-init/
   for the positions hypertable + faulty column. snapshot/apply
   tooling. Gitea CI dry-run. Dogfood seed of Rally Albania 2026.
   Nine task files with full Goal / Deliverables / Specification /
   Acceptance criteria / Risks / Done sections.

2. Course definition — stages, segments, geofences, waypoints, SLZs.
   PostGIS extension introduced here.

3. Timing & penalty tables — co-developed with processor Phase 2.
   entry_segment_starts, entry_crossings, entry_penalties,
   stage_results, penalty_formulas.

4. Permissions & policies — Directus 11 dynamic-filter Policies per
   logical role. Deployment-time work, deferred to keep early phases
   focused on the data model.

5. Custom extensions — TypeScript hooks/endpoints implementing the
   cross-plane workflows the schema implies (faulty-flag → Redis
   stream emit, stage-open materializer, etc.).

6. Future / optional — retroactivity preview UI, command-routing
   Flows, audit trails, federation rule import. Not committed.

Non-negotiable design rules captured in ROADMAP.md: schema authority
in Directus + snapshot-as-code + db-init for non-Directus DDL +
sequential idempotent migrations + entrypoint apply order + no
application logic in Flows + permissions deferred to Phase 4.

Architectural anchors point at the wiki at ../docs/wiki/ — the schema
draft, the Rally Albania 2025 source page, plus the existing
processor/postgres-timescaledb/live-channel pages. Each task file
calls out the wiki refs an implementing agent should read first.

README.md mirrors the processor service README structure: quick start,
local Docker test, prod/stage deployment notes, env vars, CI behavior.
2026-05-01 20:42:44 +02:00