Tasks 1.6 + 1.7 — schema tooling + real entrypoint flow

Two parallel tasks landing together. The boot pipeline is now wired
end-to-end: db-init → schema apply → directus bootstrap → pm2-runtime.
Live-verified by booting a fresh compose stack to a serving Directus
admin UI on :8055.

Task 1.6 — snapshot tooling:
- scripts/schema-snapshot.sh — host-side, dev-time. Verifies docker
  is on PATH and the directus compose service is running, runs
  `node /directus/cli.js schema snapshot --yes` inside the container,
  copies the YAML out to ./snapshots/schema.yaml. Used after admin-UI
  schema changes to capture the new state for git commit.
- scripts/schema-apply.sh — image-side, boot-time. Reads
  /directus/snapshots/schema.yaml, runs a dry-run preview, then
  applies. Gracefully skips when the snapshot is absent or whitespace-
  only (Phase 1 first-boot path before tasks 1.4/1.5 produce
  collections). SNAPSHOT_PATH env var override for CI flexibility.
- snapshots/README.md — lifecycle doc; warns against hand-editing.
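
The skip rule in scripts/schema-apply.sh can be sketched roughly as follows. This is a minimal illustration, not the committed script: the function name `snapshot_is_applicable` is hypothetical, and the real dry-run and apply calls are replaced by `echo` so the sketch runs outside the image.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: returns 0 when the snapshot should be applied,
# 1 when it should be skipped.
snapshot_is_applicable() {
  local path="$1"
  # Absent file: nothing to apply yet (Phase 1 first boot).
  [[ -f "$path" ]] || return 1
  # Whitespace-only file: grep finds no non-space character.
  grep -q '[^[:space:]]' "$path" || return 1
  return 0
}

# SNAPSHOT_PATH overrides the default location, e.g. in CI.
SNAPSHOT_PATH="${SNAPSHOT_PATH:-/directus/snapshots/schema.yaml}"

if snapshot_is_applicable "$SNAPSHOT_PATH"; then
  # The real script would run its dry-run preview here, then apply.
  echo "[schema-apply] would apply ${SNAPSHOT_PATH}"
else
  echo "[schema-apply] snapshot not found or empty, skipping"
fi
```

Keeping the check in one function means the absent and whitespace-only cases share a single skip path, so the entrypoint sees exit status 0 in both.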

Task 1.7 — real entrypoint flow:
- entrypoint.sh rewritten from Phase 1.1's placeholder to the
  4-step boot per ROADMAP design rule #3:
    1/4 db-init          → /directus/scripts/apply-db-init.sh
    2/4 schema apply     → /directus/scripts/schema-apply.sh
    3/4 directus bootstrap → node /directus/cli.js bootstrap
    4/4 directus start   → exec pm2-runtime start ecosystem.config.cjs
  set -euo pipefail halts boot on any step's non-zero exit. Each step
  emits a [entrypoint] log marker so an operator reading container
  logs sees which step failed.
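
A small sketch of how the log markers and `set -e` interact. The `run_step` helper and the `true`/`false` stand-ins are assumptions made for illustration; the real entrypoint calls the scripts listed above.

```shell
#!/usr/bin/env bash
set -euo pipefail

log() { printf '[entrypoint] %s\n' "$*"; }

# Hypothetical helper: log the step marker, then run the step's command.
run_step() {
  local label="$1"
  shift
  log "step ${label}"
  "$@"
}

# Stand-in boot sequence: `true` and `false` replace the real scripts so the
# sketch runs anywhere. Step 2/4 fails on purpose.
boot() {
  run_step "1/4: db-init" true
  run_step "2/4: directus schema apply" false
  run_step "3/4: directus bootstrap" true
}

# Run the sequence in a subshell with errexit on, and capture its status.
# errexit is switched off in the parent only so the demo survives the failure.
set +e
( set -e; boot )
status=$?
set -e
log "boot exited with status ${status}"
```

The last marker printed before the exit is the one for step 2/4, which is how an operator would identify the failing step from the container logs.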

Bug found and fixed during live verification:
- Both 1.6 scripts initially called bare `directus schema ...` as if
  the CLI were on PATH. Upstream directus/directus:11.17.4 does NOT
  expose `directus` on PATH — invocation is via `node /directus/cli.js`,
  same pattern as the entrypoint's bootstrap step. Both scripts
  corrected. Also added -T to docker compose exec in schema-snapshot.sh
  so the script works in non-TTY contexts (CI).
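
The corrected invocation pattern looks roughly like this. The function names are hypothetical, and the in-container output path `/tmp/schema.yaml` is a placeholder rather than the path the committed script uses; the compose file and service name are the ones named elsewhere in this commit.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Image-side: call the CLI through node, since `directus` is not on PATH
# in the upstream image.
directus_cli() {
  node /directus/cli.js "$@"
}

# Host-side: -T disables pseudo-TTY allocation, so the call also works
# where no terminal is attached (CI runners, cron).
snapshot_in_container() {
  docker compose -f compose.dev.yaml exec -T directus \
    node /directus/cli.js schema snapshot --yes /tmp/schema.yaml
}
```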

Phase 5 follow-up (non-blocking) flagged in 07's Done section: Directus
warns "Collection 'positions' doesn't have a primary key column and
will be ignored". The positions table uses UNIQUE INDEX (device_id, ts)
matching processor's pattern, not a PK constraint. Means positions is
not auto-registered as a Directus collection — fine for Phase 1, but
the operator faulty-flag workflow will need a custom endpoint or
manual collection registration in Phase 5.

ROADMAP marks 1.6 + 1.7 done. Phase 1 progress: 5/9 tasks complete
(1.1, 1.2, 1.3, 1.6, 1.7); 1.4, 1.5, 1.8, 1.9 remain.
2026-05-01 23:14:28 +02:00
parent 25a9731070
commit e22d9d489a
7 changed files with 538 additions and 22 deletions
@@ -82,4 +82,67 @@ Build a production-ready Directus image that bakes in the snapshot, db-init migr
## Done
(Fill in commit SHA + one-line note when this lands.)
Pending commit by user. `entrypoint.sh` replaced with production boot flow 2026-05-01.
**Deliverables produced:**
- `entrypoint.sh` — full boot flow: db-init → schema apply → bootstrap → pm2-runtime start. Mode `100755` preserved.
**Scope boundary honored:**
- Only `entrypoint.sh` was modified. `Dockerfile`, `compose.dev.yaml`, `package.json`, `apply-db-init.sh`, and everything under `scripts/`, `db-init/`, and `snapshots/` were untouched (parallel agent boundary for task 1.6).
**Deviations from task 1.7 spec:**
The task spec (`07-image-and-dockerfile.md`) shows a naive entrypoint with `exec /directus/cli.js start` as the final command. This was superseded by the implementation brief's explicit requirement (also recorded in task 1.1's Done section) to use `node /directus/cli.js bootstrap && pm2-runtime start /directus/ecosystem.config.cjs`, which is the upstream image's actual CMD. The final entrypoint:
1. Calls `bootstrap` as a discrete step 3 (after schema apply), then
2. Uses `exec pm2-runtime start /directus/ecosystem.config.cjs` as step 4.
This matches the apply order in ROADMAP design rule #3 and preserves pm2's crash recovery and signal handling. `exec` replaces the bash process with pm2-runtime, so SIGTERM from `docker stop` is delivered to pm2 directly instead of to an intermediate bash process.
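
The effect of `exec` can be observed outside the container with a short, self-contained sketch. It uses nested `bash -c` calls as stand-ins for the entrypoint and pm2-runtime, which is an assumption made purely for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# The outer shell prints its PID, then execs a second shell that prints its
# own PID. exec replaces the process image instead of forking a child, so
# both lines carry the same PID.
pids="$(bash -c 'echo "$$"; exec bash -c "echo \$\$"')"

before="$(printf '%s\n' "$pids" | sed -n 1p)"
after="$(printf '%s\n' "$pids" | sed -n 2p)"

if [[ "$before" == "$after" ]]; then
  echo "exec kept the PID"
else
  echo "PID changed: ${before} -> ${after}"
fi
```

In the entrypoint the same mechanism means pm2-runtime inherits the entrypoint's PID, so signals sent to the container's main process land on pm2 rather than on a shell that would have to forward them.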
**Static acceptance criteria (passed):**
- File path: `C:\Users\Administrator\projects\trm\directus\entrypoint.sh`
- Shebang: `#!/usr/bin/env bash`
- `set -euo pipefail` present (line 22)
- `log()` helper uses `printf` — no trailing newline issues
- Apply order: db-init (1/4) → schema apply (2/4) → bootstrap (3/4) → pm2-runtime (4/4)
- `exec pm2-runtime` — bash process replaced; signals reach pm2 directly
- File mode: `100755` confirmed via `git ls-files -s entrypoint.sh` before and after staging
**Parallel agent status (task 1.6):**
`scripts/schema-apply.sh` was NOT present when this task ran; only `scripts/apply-db-init.sh` and `scripts/schema-snapshot.sh` existed in `scripts/`. Step 2/4 of the entrypoint calls `/directus/scripts/schema-apply.sh`. If that script is missing, bash cannot execute the path (ENOENT), prints `No such file or directory`, and the command returns status 127; `set -e` then aborts the entrypoint at that line, so steps 3/4 and 4/4 never run. This means the full boot sequence **cannot be live-tested until task 1.6's `schema-apply.sh` lands**. The implementation is correct; the missing dependency is a parallel-agent timing issue, not a bug.
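
The failure mode for a missing script can be reproduced in isolation. This is a sketch with a deliberately nonexistent placeholder path, not the actual entrypoint:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Run a child shell that mimics the entrypoint's options and calls a script
# that does not exist. `|| status=$?` captures the child's exit status
# without tripping errexit in this parent shell.
status=0
bash -c 'set -euo pipefail; /nonexistent/schema-apply.sh; echo "not reached"' \
  2>/dev/null || status=$?

echo "exit status: ${status}"   # prints "exit status: 127"
```

Status 127 is bash's code for a command that could not be found; a file that exists but is not executable would give 126 instead.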
**Acceptance criteria — live testing deferred:**
Live acceptance criteria (Docker boot, curl health check, restart verification) cannot be completed until `scripts/schema-apply.sh` is produced by task 1.6. Re-run the full acceptance suite after both task 1.6 and 1.7 PRs land:
- `docker compose -f compose.dev.yaml down -v`
- `docker compose -f compose.dev.yaml build`
- `docker compose -f compose.dev.yaml up -d`
- Watch for: `[entrypoint] step 1/4` → `[db-init]` output → `[entrypoint] step 2/4` → schema-apply log → `[entrypoint] step 3/4` → bootstrap log → `[entrypoint] step 4/4` → PM2 startup → server at `:8055`
- `curl http://localhost:8055/server/health` → 200
- `docker compose -f compose.dev.yaml restart directus` → clean re-boot with "already initialized" paths
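
The health check step can be wrapped in a polling loop so the suite does not race the boot sequence. The sketch below assumes `curl` is available on the host; the function name `wait_for_health` and its defaults are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: poll a URL until it answers with a success status.
# Arguments: url, max attempts (default 30), delay in seconds (default 2).
wait_for_health() {
  local url="$1" attempts="${2:-30}" delay="${3:-2}"
  local i
  for ((i = 1; i <= attempts; i++)); do
    # -f turns HTTP errors into a non-zero exit, -sS stays quiet except
    # for real errors, -o /dev/null discards the response body.
    if curl -fsS -o /dev/null "$url"; then
      echo "healthy after ${i} attempt(s)"
      return 0
    fi
    sleep "$delay"
  done
  echo "not healthy after ${attempts} attempts" >&2
  return 1
}

# Usage after `docker compose -f compose.dev.yaml up -d`:
#   wait_for_health "http://localhost:8055/server/health"
```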
**Live-verification result (2026-05-01) — all four steps fired in order, server up at :8055:**
```
[entrypoint] step 1/4: db-init → 3 applied, 0 skipped
[entrypoint] step 2/4: directus schema apply → snapshot not found, skipping (correct for Phase 1)
[entrypoint] step 3/4: directus bootstrap → system tables created, first admin role + user added
[entrypoint] step 4/4: directus start (pm2-runtime)
PM2 log: App [directus:0] online
Server started at http://0.0.0.0:8055
```
**Bug fix during live verification:** the parallel `schema-apply.sh` invoked `directus` as if it were on PATH. The upstream image does NOT expose `directus` on PATH — invocation is via `node /directus/cli.js`. See task 1.6's Done section for the fix detail. Entrypoint itself was unaffected; only `schema-apply.sh` needed the change.
**Phase 5 follow-up note (not blocking Phase 1):**
Boot logs include `WARN: Collection "positions" doesn't have a primary key column and will be ignored` three times in total (during the bootstrap migrations and once at startup). Directus auto-discovers tables in the public schema and tries to register them as collections, but skips ones without a PRIMARY KEY constraint. The positions table uses `UNIQUE INDEX (device_id, ts)` instead of a PK (matching processor's pattern, see task 1.3 Done). Result: positions is **not** auto-registered as a Directus collection, so the cross-plane operator workflow (operator flips `faulty` flag via admin UI) cannot use the auto-collection path.
This is acceptable for Phase 1 (no operator UI yet). Phase 5 (custom extensions) needs a different mechanism for the faulty-flag workflow:
- **Option A**: a custom Directus endpoint (`POST /positions/:id/flag-faulty`) that performs the UPDATE directly via the database service. Bypasses Directus's collection abstraction; thin wrapper around SQL.
- **Option B**: register positions in `directus_collections` manually with a composite primary key configured (`device_id`, `ts`). Some Directus versions support this; verify against 11.17.4.
- **Option C**: add an `id BIGSERIAL PRIMARY KEY` surrogate column to positions. Cleanest for Directus, but introduces a column processor doesn't write and slightly increases per-row storage.
Phase 5's task file should pin one of these options before extension work begins.