Commit Graph

4 Commits

Author SHA1 Message Date
julian ef8bd91d77 Reorder boot: bootstrap before schema-apply (and harden schema-apply)
Second CI dry-run failure exposed two more issues:

1. Schema-apply runs against a fresh Postgres → fails with "Directus
   isn't installed on this database. Please run 'directus bootstrap'
   first."  Bootstrap is what creates Directus's system tables; schema
   apply requires those tables to exist.  Local dev never tripped this
   because bootstrap had been done in earlier sessions.

2. `node cli.js schema apply` printed an ERROR but exited 0 in the
   not-installed case.  schema-apply.sh trusted the exit code,
   reported "schema apply complete," and the chain continued — until
   the post-schema migration tried to ALTER TABLE on user tables that
   never got created.

Fixes:

- entrypoint.sh: reorder steps from
    pre-schema → schema-apply → post-schema → bootstrap → start
  to
    pre-schema → bootstrap → schema-apply → post-schema → start
  Bootstrap is idempotent ("Database already initialized, skipping
  install" on warm DB) so adding it earlier costs nothing on warm
  boots and unblocks fresh boots.

- .gitea/workflows/build.yml: dry-run chain updated to mirror the new
  entrypoint order. Bootstrap is now part of the pre-boot validation,
  not skipped for speed. CI dry-run now genuinely covers the same path
  the production entrypoint takes (minus the final pm2-runtime step,
  which doesn't add validation value).

- scripts/schema-apply.sh: defense in depth. After the apply call
  succeeds (exit 0), grep the output for ' ERROR: ' and fail loudly if
  found. Catches the silent-failure pattern Directus's CLI exhibits
  when bootstrap hasn't run. Error message names the likely cause
  (schema-apply before bootstrap) for fast operator triage.

This is the second Phase 1 architectural correction exposed by the CI
dry-run gate. The gate is paying for itself in the very first PR it
runs against.
2026-05-02 10:51:39 +02:00
julian e01abfef27 Split db-init into pre-schema and post-schema phases
CI dry-run revealed an architectural ordering bug: db-init/004 and
db-init/005 ALTER TABLE the Directus-managed tables (organization_users,
events, etc.), but db-init runs BEFORE schema-apply creates those
tables. On a fresh CI Postgres this fails with "relation does not
exist." Local dev never tripped this because we'd created the tables
via MCP first.

Fix: introduce a post-schema migration phase. Two db-init runs in the
entrypoint, with schema-apply in between:

  1. apply-db-init.sh   db-init/        → positions hypertable + faulty
                                          column (tables Directus does
                                          NOT manage)
  2. schema-apply.sh                    → creates Directus-managed tables
                                          from snapshots/schema.yaml
  3. apply-db-init.sh   db-init-post/   → composite UNIQUE constraints on
                                          the Directus-managed tables
  4. directus bootstrap
  5. directus start

Files moved:
  db-init/004_junction_unique_constraints.sql →
    db-init-post/001_junction_unique_constraints.sql
  db-init/005_event_participation_unique_constraints.sql →
    db-init-post/002_event_participation_unique_constraints.sql

Each ALTER TABLE in the post-schema migrations is now wrapped in a
pg_constraint existence guard for idempotency. This handles the dev DB
where the constraints already exist (from the original 004/005 runs +
the manual psql recovery during task 1.5's destructive-apply
incident). Old 004/005 rows in migrations_applied become orphans —
harmless.

Updates:
- Dockerfile: COPY db-init-post into the image
- entrypoint.sh: 4-step → 5-step flow with the post-schema run between
  schema-apply and bootstrap
- .gitea/workflows/build.yml: dry-run chains all three pre-boot scripts
  (pre-schema → schema-apply → post-schema); path filter includes
  db-init-post/**
- Task specs 1.4 and 1.5 Done sections: updated to reference the new
  db-init-post/ path (db-init/004 → db-init-post/001, etc.)

The reusable runner script (apply-db-init.sh) didn't need to change —
it already accepts DB_INIT_DIR and uses just the basename for the
guard-table key. The two phases share migrations_applied; filenames
don't collide because pre-schema and post-schema use distinct
descriptive names.

Phase 1 is still "done" — this is a Phase 1 architectural correction
exposed by the CI dry-run, not a new task.
2026-05-02 10:48:06 +02:00
julian ec119af274 Fix CI port collision — remap throwaway Postgres to host port 15432
The runner host typically has another Postgres listening on 5432
(local dev stack, stage instance, etc.), which made the services:
postgres container fail at start with "port already allocated."

Remap the host-side port from 5432:5432 to 15432:5432. The service
container still listens on 5432 internally; only the runner host
binding changes. Dry-run's DB_PORT updated to 15432 to match.

--network host semantics preserved: DB_HOST=localhost reaches the
service on the runner's loopback at the new port.

Why we still need a Postgres container at all: the dry-run gate
applies db-init/*.sql migrations and the directus schema snapshot
against a real DB to catch breakage before pushing the image. No
Postgres = no validation = the gate is bypassed.

Inline comment in the workflow now explains the choice; task spec's
Done section captures the correction so future readers don't
re-discover this.
2026-05-02 10:38:26 +02:00
julian 0f89fea913 Task 1.8 — Gitea CI dry-run workflow
.gitea/workflows/build.yml builds the directus image on path-filtered
pushes to main and validates the boot pipeline against a throwaway
Postgres before pushing the image to the registry. The dry-run is the
gate that catches snapshot drift, broken db-init scripts, or
incompatible schema changes before they reach stage.

Workflow shape (mirrors processor's CI but tailored to Directus):
- Path filter: snapshots/, db-init/, extensions/, scripts/,
  entrypoint.sh, Dockerfile, the workflow file itself.
  Docs-only commits (.planning/, README.md, compose.dev.yaml,
  package.json) do NOT trigger CI.
- Throwaway Postgres via services: block, pinned to the same
  timescale/timescaledb-ha:pg16.6-ts2.17.2-all tag as compose.dev.yaml.
- Plain `docker build` (NOT build-push-action) so the image stays in
  the local daemon for the subsequent docker run dry-run.
- Dry-run: --network host + --entrypoint bash to override the upstream
  entrypoint and run only apply-db-init.sh && schema-apply.sh.
  Skips bootstrap and pm2-runtime — the schema apply is the gate.
- Two image tags: :main (mutable) and :<sha> (immutable).
- Optional Portainer webhook gated on secret presence; curl -fsS so a
  misconfigured URL fails the step explicitly.

Spec corrections folded in (the spec's draft had two contradictions
that would have failed at runtime):
1. DB_HOST=localhost (not 'postgres'). With --network host, service
   containers are reachable on the runner's loopback by their port
   mapping, NOT by service name. Service-name resolution requires the
   default bridge network; --network host overrides it.
2. health-retries 20 (not 10). timescaledb-ha:*-all does more init
   work at boot than vanilla postgres; 50s isn't always enough.

Operator action required in the Gitea repo Settings before first run:
configure REGISTRY_USERNAME and REGISTRY_PASSWORD secrets (required for
push); optionally PORTAINER_WEBHOOK_URL (for auto-deploy).

Live verification deferred to first relevant commit. Documented in the
task spec's Done section: positive (clean snapshot → push succeeds)
and negative (malformed snapshot → halt before push) cases to validate
once CI runs.

ROADMAP marks 1.8 done. Phase 1 progress: 8/9 tasks complete (1.1–1.8);
only 1.9 (Rally Albania 2026 dogfood seed) remains before Phase 1 ships.
2026-05-02 10:04:39 +02:00