5 Commits

Author SHA1 Message Date
julian ef8bd91d77 Reorder boot: bootstrap before schema-apply (and harden schema-apply)
Second CI dry-run failure exposed two more issues:

1. Schema-apply runs against a fresh Postgres → fails with "Directus
   isn't installed on this database. Please run 'directus bootstrap'
   first."  Bootstrap is what creates Directus's system tables; schema
   apply requires those tables to exist.  Local dev never tripped this
   because bootstrap had been done in earlier sessions.

2. `node cli.js schema apply` printed an ERROR but exited 0 in the
   not-installed case.  schema-apply.sh trusted the exit code,
   reported "schema apply complete," and the chain continued — until
   the post-schema migration tried to ALTER TABLE on user tables that
   never got created.

Fixes:

- entrypoint.sh: reorder steps from
    pre-schema → schema-apply → post-schema → bootstrap → start
  to
    pre-schema → bootstrap → schema-apply → post-schema → start
  Bootstrap is idempotent ("Database already initialized, skipping
  install" on warm DB) so adding it earlier costs nothing on warm
  boots and unblocks fresh boots.

- .gitea/workflows/build.yml: dry-run chain updated to mirror the new
  entrypoint order. Bootstrap is now part of the pre-boot validation,
  not skipped for speed. CI dry-run now genuinely covers the same path
  the production entrypoint takes (minus the final pm2-runtime step,
  which doesn't add validation value).

- scripts/schema-apply.sh: defense in depth. After the apply call
  succeeds (exit 0), grep the output for ' ERROR: ' and fail loudly if
  found. Catches the silent-failure pattern Directus's CLI exhibits
  when bootstrap hasn't run. Error message names the likely cause
  (schema-apply before bootstrap) for fast operator triage.

This is the second Phase 1 architectural correction exposed by the CI
dry-run gate. The gate is paying for itself in the very first PR it
runs against.
2026-05-02 10:51:39 +02:00
julian e01abfef27 Split db-init into pre-schema and post-schema phases
CI dry-run revealed an architectural ordering bug: db-init/004 and
db-init/005 ALTER TABLE the Directus-managed tables (organization_users,
events, etc.), but db-init runs BEFORE schema-apply creates those
tables. On a fresh CI Postgres this fails with "relation does not
exist." Local dev never tripped this because we'd created the tables
via MCP first.

Fix: introduce a post-schema migration phase. Two db-init runs in the
entrypoint, with schema-apply in between:

  1. apply-db-init.sh   db-init/        → positions hypertable + faulty
                                          column (tables Directus does
                                          NOT manage)
  2. schema-apply.sh                    → creates Directus-managed tables
                                          from snapshots/schema.yaml
  3. apply-db-init.sh   db-init-post/   → composite UNIQUE constraints on
                                          the Directus-managed tables
  4. directus bootstrap
  5. directus start

Files moved:
  db-init/004_junction_unique_constraints.sql →
    db-init-post/001_junction_unique_constraints.sql
  db-init/005_event_participation_unique_constraints.sql →
    db-init-post/002_event_participation_unique_constraints.sql

Each ALTER TABLE in the post-schema migrations is now wrapped in a
pg_constraint existence guard for idempotency. This handles the dev DB
where the constraints already exist (from the original 004/005 runs +
the manual psql recovery during task 1.5's destructive-apply
incident). Old 004/005 rows in migrations_applied become orphans —
harmless.

Updates:
- Dockerfile: COPY db-init-post into the image
- entrypoint.sh: 4-step → 5-step flow with the post-schema run between
  schema-apply and bootstrap
- .gitea/workflows/build.yml: dry-run chains all three pre-boot scripts
  (pre-schema → schema-apply → post-schema); path filter includes
  db-init-post/**
- Task specs 1.4 and 1.5 Done sections: updated to reference the new
  db-init-post/ path (db-init/004 → db-init-post/001, etc.)

The reusable runner script (apply-db-init.sh) didn't need to change —
it already accepts DB_INIT_DIR and uses just the basename for the
guard-table key. The two phases share migrations_applied; filenames
don't collide because pre-schema and post-schema use distinct
descriptive names.

Phase 1 is still "done" — this is a Phase 1 architectural correction
exposed by the CI dry-run, not a new task.
2026-05-02 10:48:06 +02:00
julian e22d9d489a Tasks 1.6 + 1.7 — schema tooling + real entrypoint flow
Two parallel tasks landing together. The boot pipeline is now wired
end-to-end: db-init → schema apply → directus bootstrap → pm2-runtime.
Live-verified by booting a fresh compose stack to a serving Directus
admin UI on :8055.

Task 1.6 — snapshot tooling:
- scripts/schema-snapshot.sh — host-side, dev-time. Verifies docker
  is on PATH and the directus compose service is running, runs
  `node /directus/cli.js schema snapshot --yes` inside the container,
  copies the YAML out to ./snapshots/schema.yaml. Used after admin-UI
  schema changes to capture the new state for git commit.
- scripts/schema-apply.sh — image-side, boot-time. Reads
  /directus/snapshots/schema.yaml, runs a dry-run preview, then
  applies. Gracefully skips when the snapshot is absent or whitespace-
  only (Phase 1 first-boot path before tasks 1.4/1.5 produce
  collections). SNAPSHOT_PATH env var override for CI flexibility.
- snapshots/README.md — lifecycle doc; warns against hand-editing.

Task 1.7 — real entrypoint flow:
- entrypoint.sh rewritten from Phase 1.1's placeholder to the
  4-step boot per ROADMAP design rule #3:
    1/4 db-init          → /directus/scripts/apply-db-init.sh
    2/4 schema apply     → /directus/scripts/schema-apply.sh
    3/4 directus bootstrap → node /directus/cli.js bootstrap
    4/4 directus start   → exec pm2-runtime start ecosystem.config.cjs
  set -euo pipefail halts boot on any step's non-zero exit. Each step
  emits a [entrypoint] log marker so an operator reading container
  logs sees which step failed.

Bug found and fixed during live verification:
- Both 1.6 scripts initially called bare `directus schema ...` as if
  the CLI were on PATH. Upstream directus/directus:11.17.4 does NOT
  expose `directus` on PATH — invocation is via `node /directus/cli.js`,
  same pattern as the entrypoint's bootstrap step. Both scripts
  corrected. Also added -T to docker compose exec in schema-snapshot.sh
  so the script works in non-TTY contexts (CI).

Phase 5 follow-up (non-blocking) flagged in 07's Done section: Directus
warns "Collection 'positions' doesn't have a primary key column and
will be ignored". The positions table uses UNIQUE INDEX (device_id, ts)
matching processor's pattern, not a PK constraint. Means positions is
not auto-registered as a Directus collection — fine for Phase 1, but
the operator faulty-flag workflow will need a custom endpoint or
manual collection registration in Phase 5.

ROADMAP marks 1.6 + 1.7 done. Phase 1 progress: 5/9 tasks complete
(1.1, 1.2, 1.3, 1.6, 1.7); 1.4, 1.5, 1.8, 1.9 remain.
2026-05-02 09:40:53 +02:00
julian dec2d190ce Task 1.2 — db-init runner script
scripts/apply-db-init.sh implements the boot-time runner that walks
db-init/*.sql in numeric-prefix order, applies each via psql, and
records successful applications in a migrations_applied guard table
so re-runs are no-ops.

All 7 acceptance criteria pass live against the dev compose stack:
empty dir, missing env var, apply, idempotent re-run, checksum
mismatch, filename collision, broken SQL.

Two retroactive Dockerfile corrections folded in (exposed by the
first live-test attempt of 1.2's script):

1. apk add bash. The directus/directus:11.17.4 base is Alpine and
   ships ash via BusyBox, not bash. The script uses bash-specific
   features (associative arrays, [[ ]], mapfile, BASH_REMATCH) and
   fails at line 69 in sh.

2. .gitattributes added at repo root forcing LF on *.sh, *.sql,
   *.yaml, *.yml. Without it, Windows checkouts with core.autocrlf=true
   (the Git-for-Windows default) silently inject CRLF, causing
   "bad interpreter: /usr/bin/env bash^M" inside the Linux container.
   This failure mode only manifests in the container.

Both corrections are documented in 01-project-scaffold.md's Done
section; 02-db-init-runner.md's Done section captures the live-test
results, the corrected docker compose run --entrypoint commands, and
the gotcha about compose env defaults masking missing-env-var tests.

ROADMAP marks 1.2 done; 1.3 next.
2026-05-01 22:35:17 +02:00
julian 387c3c4cfa Task 1.1 — Project scaffold
Phase 1 task 1.1 lands. Directus 11.17.4 boots locally end-to-end
against a TimescaleDB+PostGIS container; admin UI serves at :8055,
admin bootstrap from env vars works, named volumes preserve data
across down/up cycles.

Scaffold:
- Dockerfile — FROM directus/directus:11.17.4. Pre-installs
  postgresql16-client (ahead of task 1.2's db-init runner needing psql).
  Bakes in /directus/snapshots, /directus/db-init, /directus/scripts,
  /directus/extensions, /directus/entrypoint.sh.
- compose.dev.yaml — db (timescale/timescaledb-ha:pg16.6-ts2.17.2-all)
  + directus (local build), healthchecks, named volumes
  directus-pg-data + directus-uploads.
- entrypoint.sh — placeholder using upstream's actual flow
  (node cli.js bootstrap && pm2-runtime start ecosystem.config.cjs);
  the real db-init -> schema apply -> start wrapper lands in task 1.7.
- package.json — scripts-only (dev, dev:down, dev:reset,
  schema:snapshot, schema:apply, db:init), no runtime deps.
- .env.example — sectioned, fully documented, KEY/SECRET marked
  required with generation hints.
- .gitignore, .dockerignore — match the processor service conventions.
- snapshots/, db-init/, scripts/, extensions/ — empty with .gitkeep,
  filled by later Phase 1 tasks (1.3, 1.6) and Phase 5.

Lessons locked in (against the empirical pnpm dev boot):
- timescale/timescaledb-ha:pg16-latest does NOT exist on Docker Hub.
  Pin a concrete version (we used pg16.6-ts2.17.2-all).
- This image's data directory is /home/postgres/pgdata/data, not
  /pgdata or /var/lib/postgresql/data. PGDATA env var and the volume
  mount must both target it.
- The -all variant bundles PostGIS binaries but the extension is not
  auto-created on the directus database; CREATE EXTENSION lands in
  Phase 2 alongside the geofences/SLZs/waypoints collections.
- The upstream image's CMD is bootstrap + pm2-runtime, not a simple
  cli.js start. Bypassing pm2 would lose crash recovery.

These corrections folded into 01-project-scaffold.md (deliverable line
+ Done section), 08-gitea-ci-dryrun.md (CI service tag), and the
inline comments in compose.dev.yaml so future implementers don't
re-discover them.

Status: ROADMAP marks 1.1 done, Phase 1 in progress, 1.2 next.
2026-05-01 21:29:13 +02:00