Second CI dry-run failure exposed two more issues:
1. Schema-apply runs against a fresh Postgres → fails with "Directus
isn't installed on this database. Please run 'directus bootstrap'
first." Bootstrap is what creates Directus's system tables; schema
apply requires those tables to exist. Local dev never tripped this
because bootstrap had been done in earlier sessions.
2. `node cli.js schema apply` printed an ERROR but exited 0 in the
not-installed case. schema-apply.sh trusted the exit code,
reported "schema apply complete," and the chain continued — until
the post-schema migration tried to ALTER TABLE on user tables that
never got created.
Fixes:
- entrypoint.sh: reorder steps from
pre-schema → schema-apply → post-schema → bootstrap → start
to
pre-schema → bootstrap → schema-apply → post-schema → start
Bootstrap is idempotent ("Database already initialized, skipping
install" on warm DB) so adding it earlier costs nothing on warm
boots and unblocks fresh boots.
- .gitea/workflows/build.yml: dry-run chain updated to mirror the new
entrypoint order. Bootstrap is now part of the pre-boot validation,
not skipped for speed. CI dry-run now genuinely covers the same path
the production entrypoint takes (minus the final pm2-runtime step,
which doesn't add validation value).
- scripts/schema-apply.sh: defense in depth. After the apply call
succeeds (exit 0), grep the output for ' ERROR: ' and fail loudly if
found. Catches the silent-failure pattern Directus's CLI exhibits
when bootstrap hasn't run. Error message names the likely cause
(schema-apply before bootstrap) for fast operator triage.
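The shape of that check can be sketched as follows (the function name and
messages are illustrative, not the script's actual text; only the
`' ERROR: '` marker comes from the observed Directus CLI output — in
schema-apply.sh the argument would be the captured output of the real
`node cli.js schema apply` call):

```shell
# Illustrative sketch of the defense-in-depth check in schema-apply.sh.
check_apply_output() {
  # $1: captured stdout/stderr of `node cli.js schema apply`, which can
  # print an ERROR line and still exit 0 when bootstrap hasn't run.
  if grep -q ' ERROR: ' <<<"$1"; then
    echo "schema apply printed ERROR despite exit 0 (likely run before bootstrap)" >&2
    return 1
  fi
  echo "schema apply complete"
}
```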
This is the second Phase 1 architectural correction exposed by the CI
dry-run gate. The gate is paying for itself in the very first PR it
runs against.
CI dry-run revealed an architectural ordering bug: db-init/004 and
db-init/005 ALTER TABLE the Directus-managed tables (organization_users,
events, etc.), but db-init runs BEFORE schema-apply creates those
tables. On a fresh CI Postgres this fails with "relation does not
exist." Local dev never tripped this because we'd created the tables
via MCP first.
Fix: introduce a post-schema migration phase. Two db-init runs in the
entrypoint, with schema-apply in between:
1. apply-db-init.sh db-init/       → positions hypertable + faulty column
                                     (tables Directus does NOT manage)
2. schema-apply.sh                 → creates Directus-managed tables from
                                     snapshots/schema.yaml
3. apply-db-init.sh db-init-post/  → composite UNIQUE constraints on the
                                     Directus-managed tables
4. directus bootstrap
5. directus start
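The flow introduced here can be sketched as follows (the `run` wrapper is
illustrative so the ordering is visible without a live database; in
entrypoint.sh each line would execute the real command):

```shell
# Illustrative ordering sketch of entrypoint.sh; `run` just echoes here.
run() { echo "step: $*"; }           # the real entrypoint runs "$@" instead

run apply-db-init.sh db-init/        # 1. pre-schema: tables Directus doesn't manage
run schema-apply.sh                  # 2. Directus-managed tables from the snapshot
run apply-db-init.sh db-init-post/   # 3. post-schema: constraints on those tables
run directus bootstrap               # 4. idempotent on a warm DB
run directus start                   # 5. hand off to Directus
```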
Files moved:
db-init/004_junction_unique_constraints.sql →
db-init-post/001_junction_unique_constraints.sql
db-init/005_event_participation_unique_constraints.sql →
db-init-post/002_event_participation_unique_constraints.sql
Each ALTER TABLE in the post-schema migrations is now wrapped in a
pg_constraint existence guard for idempotency. This handles the dev DB
where the constraints already exist (from the original 004/005 runs +
the manual psql recovery during task 1.5's destructive-apply
incident). Old 004/005 rows in migrations_applied become orphans —
harmless.
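A guard of that shape looks roughly like this (the constraint, table, and
column names here are illustrative, not the repo's actual ones; only the
pg_constraint lookup pattern is what the text describes):

```sql
DO $$
BEGIN
  IF NOT EXISTS (
    SELECT 1 FROM pg_constraint
    WHERE  conname  = 'organization_users_org_user_key'   -- illustrative name
    AND    conrelid = 'organization_users'::regclass
  ) THEN
    ALTER TABLE organization_users
      ADD CONSTRAINT organization_users_org_user_key
      UNIQUE (organization_id, user_id);                  -- illustrative columns
  END IF;
END
$$;
```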
Updates:
- Dockerfile: COPY db-init-post into the image
- entrypoint.sh: 4-step → 5-step flow with the post-schema run between
schema-apply and bootstrap
- .gitea/workflows/build.yml: dry-run chains all three pre-boot scripts
(pre-schema → schema-apply → post-schema); path filter includes
db-init-post/**
- Task specs 1.4 and 1.5 Done sections: updated to reference the new
db-init-post/ path (db-init/004 → db-init-post/001, etc.)
The reusable runner script (apply-db-init.sh) didn't need to change —
it already accepts DB_INIT_DIR and uses just the basename for the
guard-table key. The two phases share migrations_applied; filenames
don't collide because pre-schema and post-schema use distinct
descriptive names.
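The basename keying can be sketched as (the function name is an
assumption; the source only states that the basename is the guard-table
key):

```shell
# Sketch: apply-db-init.sh records applied migrations by basename, so the
# same filename in db-init/ and db-init-post/ would collide in
# migrations_applied — hence the distinct descriptive names per phase.
migration_key() { basename "$1"; }
```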
Phase 1 is still "done" — this is a Phase 1 architectural correction
exposed by the CI dry-run, not a new task.
The runner host typically has another Postgres listening on 5432 (local
dev stack, stage instance, etc.), which made the `services:` postgres
container fail at start with "port already allocated."
Remap the host-side port from 5432:5432 to 15432:5432. The service
container still listens on 5432 internally; only the runner host
binding changes. Dry-run's DB_PORT updated to 15432 to match.
--network host semantics preserved: DB_HOST=localhost reaches the
service on the runner's loopback at the new port.
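The remapped `services:` block looks roughly like this (field layout per
the Actions-style service-container syntax; everything except the image
tag and the port pair is an assumption):

```yaml
services:
  postgres:
    image: timescale/timescaledb-ha:pg16.6-ts2.17.2-all
    ports:
      - 15432:5432   # host:container; only the host side changed
```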
Why we still need a Postgres container at all: the dry-run gate
applies db-init/*.sql migrations and the directus schema snapshot
against a real DB to catch breakage before pushing the image. No
Postgres = no validation = the gate is bypassed.
Inline comment in the workflow now explains the choice; task spec's
Done section captures the correction so future readers don't
re-discover this.
.gitea/workflows/build.yml builds the directus image on path-filtered
pushes to main and validates the boot pipeline against a throwaway
Postgres before pushing the image to the registry. The dry-run is the
gate that catches snapshot drift, broken db-init scripts, or
incompatible schema changes before they reach stage.
Workflow shape (mirrors processor's CI but tailored to Directus):
- Path filter: snapshots/, db-init/, extensions/, scripts/,
entrypoint.sh, Dockerfile, the workflow file itself.
Docs-only commits (.planning/, README.md, compose.dev.yaml,
package.json) do NOT trigger CI.
- Throwaway Postgres via services: block, pinned to the same
timescale/timescaledb-ha:pg16.6-ts2.17.2-all tag as compose.dev.yaml.
- Plain `docker build` (NOT build-push-action) so the image stays in
the local daemon for the subsequent docker run dry-run.
- Dry-run: --network host + --entrypoint bash to override the upstream
entrypoint and run only apply-db-init.sh && schema-apply.sh.
Skips bootstrap and pm2-runtime — the schema apply is the gate.
- Two image tags: :main (mutable) and :<sha> (immutable).
- Optional Portainer webhook gated on secret presence; curl -fsS so a
misconfigured URL fails the step explicitly.
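The dry-run step can be sketched as a workflow fragment (step name, image
reference, and script paths are assumptions beyond what the bullets
state):

```yaml
- name: dry-run boot pipeline
  run: |
    docker run --rm --network host \
      -e DB_HOST=localhost \
      --entrypoint bash "$IMAGE:main" \
      -c "apply-db-init.sh && schema-apply.sh"
```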
Spec corrections folded in (the spec's draft had two contradictions
that would have failed at runtime):
1. DB_HOST=localhost (not 'postgres'). With --network host, service
containers are reachable on the runner's loopback by their port
mapping, NOT by service name. Service-name resolution requires the
default bridge network; --network host overrides it.
2. health-retries 20 (not 10). timescaledb-ha:*-all does more init
work at boot than vanilla postgres; 50s isn't always enough.
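In service-container syntax those corrections land roughly as follows
(the health command and 5s interval are assumptions — the interval is
inferred from "50s" being 10 retries; only `--health-retries 20` and the
image tag come from the text):

```yaml
services:
  postgres:
    image: timescale/timescaledb-ha:pg16.6-ts2.17.2-all
    options: >-
      --health-cmd pg_isready
      --health-interval 5s
      --health-retries 20
```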
Operator action required in the Gitea repo Settings before first run:
configure REGISTRY_USERNAME and REGISTRY_PASSWORD secrets (required for
push); optionally PORTAINER_WEBHOOK_URL (for auto-deploy).
Live verification deferred to first relevant commit. Documented in the
task spec's Done section: positive (clean snapshot → push succeeds)
and negative (malformed snapshot → halt before push) cases to validate
once CI runs.
ROADMAP marks 1.8 done. Phase 1 progress: 8/9 tasks complete (1.1–1.8);
only 1.9 (Rally Albania 2026 dogfood seed) remains before Phase 1 ships.