Files
directus/.planning/phase-1-slice-1-schema/07-image-and-dockerfile.md
T
julian e22d9d489a Tasks 1.6 + 1.7 — schema tooling + real entrypoint flow
Two parallel tasks landing together. The boot pipeline is now wired
end-to-end: db-init → schema apply → directus bootstrap → pm2-runtime.
Live-verified by booting a fresh compose stack to a serving Directus
admin UI on :8055.

Task 1.6 — snapshot tooling:
- scripts/schema-snapshot.sh — host-side, dev-time. Verifies docker
  is on PATH and the directus compose service is running, runs
  `node /directus/cli.js schema snapshot --yes` inside the container,
  copies the YAML out to ./snapshots/schema.yaml. Used after admin-UI
  schema changes to capture the new state for git commit.
- scripts/schema-apply.sh — image-side, boot-time. Reads
  /directus/snapshots/schema.yaml, runs a dry-run preview, then
  applies. Gracefully skips when the snapshot is absent or whitespace-
  only (Phase 1 first-boot path before tasks 1.4/1.5 produce
  collections). SNAPSHOT_PATH env var override for CI flexibility.
- snapshots/README.md — lifecycle doc; warns against hand-editing.

Task 1.7 — real entrypoint flow:
- entrypoint.sh rewritten from Phase 1.1's placeholder to the
  4-step boot per ROADMAP design rule #3:
    1/4 db-init          → /directus/scripts/apply-db-init.sh
    2/4 schema apply     → /directus/scripts/schema-apply.sh
    3/4 directus bootstrap → node /directus/cli.js bootstrap
    4/4 directus start   → exec pm2-runtime start ecosystem.config.cjs
  set -euo pipefail halts boot on any step's non-zero exit. Each step
  emits a [entrypoint] log marker so an operator reading container
  logs sees which step failed.

Bug found and fixed during live verification:
- Both 1.6 scripts initially called bare `directus schema ...` as if
  the CLI were on PATH. Upstream directus/directus:11.17.4 does NOT
  expose `directus` on PATH — invocation is via `node /directus/cli.js`,
  same pattern as the entrypoint's bootstrap step. Both scripts
  corrected. Also added -T to docker compose exec in schema-snapshot.sh
  so the script works in non-TTY contexts (CI).

Phase 5 follow-up (non-blocking) flagged in 07's Done section: Directus
warns "Collection 'positions' doesn't have a primary key column and
will be ignored". The positions table uses UNIQUE INDEX (device_id, ts)
matching processor's pattern, not a PK constraint. Means positions is
not auto-registered as a Directus collection — fine for Phase 1, but
the operator faulty-flag workflow will need a custom endpoint or
manual collection registration in Phase 5.

ROADMAP marks 1.6 + 1.7 done. Phase 1 progress: 5/9 tasks complete
(1.1, 1.2, 1.3, 1.6, 1.7); 1.4, 1.5, 1.8, 1.9 remain.
2026-05-02 09:40:53 +02:00

149 lines
10 KiB
Markdown

# Task 1.7 — Image build & entrypoint
**Phase:** 1 — Slice 1 schema + deploy pipeline
**Status:** ⬜ Not started
**Depends on:** 1.2, 1.3, 1.6 (need the runner, migrations, and snapshot tooling all in place)
**Wiki refs:** `docs/wiki/entities/directus.md` (Schema management section)
## Goal
Build a production-ready Directus image that bakes in the snapshot, db-init migrations, extensions directory, and entrypoint script. Replace the placeholder entrypoint from 1.1 with the real boot sequence: db-init → schema apply → directus start.
## Deliverables
- `Dockerfile` (replacing the placeholder from 1.1):
```dockerfile
FROM directus/directus:11.5.1 # pin specific patch version
USER root
RUN apk add --no-cache postgresql16-client bash coreutils
USER node
COPY --chown=node:node snapshots/ /directus/snapshots/
COPY --chown=node:node db-init/ /directus/db-init/
COPY --chown=node:node extensions/ /directus/extensions/
COPY --chown=node:node scripts/ /directus/scripts/
COPY --chown=node:node entrypoint.sh /directus/entrypoint.sh
RUN chmod +x /directus/entrypoint.sh /directus/scripts/*.sh
ENTRYPOINT ["/directus/entrypoint.sh"]
```
Adjust `apk` / `apt-get` based on the upstream image's distro. `postgresql-client` is required for `psql` in the db-init runner.
- `entrypoint.sh`:
```sh
#!/usr/bin/env bash
set -euo pipefail
echo "[entrypoint] running db-init"
/directus/scripts/apply-db-init.sh
echo "[entrypoint] applying Directus schema snapshot"
/directus/scripts/schema-apply.sh
echo "[entrypoint] starting Directus"
exec /directus/cli.js start
```
(Verify `/directus/cli.js start` is the correct upstream command for the pinned version. Some versions use `node /directus/server.js`.)
- Update `compose.dev.yaml` so the dev image uses the same Dockerfile (no special path in dev). The local image has identical boot semantics to prod — only env vars differ.
## Specification
- **Pin the Directus version exactly** (e.g. `11.5.1`, not `11`). Version bumps land via PR.
- **Layer ordering for cache friendliness.**
1. `FROM` + apk install (rarely changes).
2. `COPY scripts/` (changes occasionally).
3. `COPY entrypoint.sh` (rarely changes).
4. `COPY db-init/` (changes per migration PR).
5. `COPY snapshots/` (changes per schema PR — most volatile).
6. `COPY extensions/` (Phase 5+).
Putting the most-changed layer last maximizes cache reuse for the rest.
- **`USER node`** for runtime (matches upstream image's non-root convention).
- **Health check.** Add a `HEALTHCHECK` instruction calling `wget -qO- http://localhost:8055/server/ping` (or the upstream's health endpoint), with sensible interval/timeout. Useful in compose and Portainer.
- **Entrypoint failure modes.** If db-init fails → exit, container restarts (Docker will retry). If schema apply fails → same. Both failures should produce clear log lines so an operator looking at Portainer logs can diagnose.
- **No `EXPOSE` change** — the upstream image already exposes `8055`.
- **No `ENV` overrides** for Directus runtime config in the Dockerfile — that's the deployer's concern via env vars at runtime.
## Acceptance criteria
- [ ] `docker build -t trm-directus:dev .` succeeds.
- [ ] Image size is reasonable (< 600 MB; upstream image + tooling).
- [ ] Booting against a fresh Postgres: db-init applies all three migrations, schema apply creates 12 collections, Directus starts and serves on `:8055`.
- [ ] Re-booting against the same Postgres (warm DB): db-init reports "0 applied, 3 skipped", schema apply reports "no changes", Directus starts.
- [ ] Killing Postgres mid-db-init → container exits non-zero with clear error in logs.
- [ ] Killing Postgres mid-schema-apply → container exits non-zero with clear error in logs.
- [ ] HEALTHCHECK reports "healthy" once Directus is serving.
- [ ] `compose.dev.yaml` `directus` service uses the local Dockerfile build and works end-to-end (`pnpm dev:reset` → fresh boot → admin UI loads).
## Risks / open questions
- **Upstream image distro.** Directus's official image has used both Alpine and Debian-based bases over the years. Verify the current 11.x base and adjust `apk` vs `apt-get` accordingly.
- **`/directus/cli.js start` path.** Confirm against the upstream Dockerfile / docs for the pinned version. Bake the right command into entrypoint.sh.
- **Permissions on `/directus/snapshots/` etc.** If the upstream user is `node` (uid 1000), the `--chown=node:node` flag is right. Verify with `docker run --rm trm-directus:dev id`.
## Done
Pending commit by user. `entrypoint.sh` replaced with production boot flow 2026-05-01.
**Deliverables produced:**
- `entrypoint.sh` — full boot flow: db-init → schema apply → bootstrap → pm2-runtime start. Mode `100755` preserved.
**Scope boundary honored:**
- Only `entrypoint.sh` was modified. `Dockerfile`, `compose.dev.yaml`, `package.json`, `apply-db-init.sh`, and everything under `scripts/`, `db-init/`, and `snapshots/` were untouched (parallel agent boundary for task 1.6).
**Deviations from task 1.7 spec:**
The task spec (`07-image-and-dockerfile.md`) shows a naive entrypoint with `exec /directus/cli.js start` as the final command. This was superseded by the implementation brief's explicit requirement (and task 1.1 Done section) to use `node /directus/cli.js bootstrap && pm2-runtime start /directus/ecosystem.config.cjs` — the upstream image's actual CMD. The final entrypoint:
1. Calls `bootstrap` as a discrete step 3 (after schema apply), then
2. Uses `exec pm2-runtime start /directus/ecosystem.config.cjs` as step 4.
This matches the ROADMAP design rule #3 apply order and preserves pm2's crash recovery and signal handling. `exec` replaces the bash process so SIGTERM from `docker stop` reaches pm2 directly without traversal through bash.
**Static acceptance criteria (passed):**
- File path: `C:\Users\Administrator\projects\trm\directus\entrypoint.sh`
- Shebang: `#!/usr/bin/env bash`
- `set -euo pipefail` present (line 22)
- `log()` helper uses `printf` — no trailing newline issues
- Apply order: db-init (1/4) → schema apply (2/4) → bootstrap (3/4) → pm2-runtime (4/4)
- `exec pm2-runtime` — bash process replaced; signals reach pm2 directly
- File mode: `100755` confirmed via `git ls-files -s entrypoint.sh` before and after staging
**Parallel agent status (task 1.6):**
`scripts/schema-apply.sh` was NOT present when this task ran — only `scripts/apply-db-init.sh` and `scripts/schema-snapshot.sh` existed in `scripts/`. Step 2/4 of the entrypoint calls `/directus/scripts/schema-apply.sh`. With `set -euo pipefail`, a missing script causes bash to exit non-zero at that line before attempting execution (the shell resolves the command, finds it executable, then the kernel `exec` fails with ENOENT → bash reports the error and exits 127). This means the full boot sequence **cannot be live-tested until task 1.6's `schema-apply.sh` lands**. The implementation is correct; the missing dependency is a parallel-agent timing issue, not a bug.
**Acceptance criteria — live testing deferred:**
Live acceptance criteria (Docker boot, curl health check, restart verification) cannot be completed until `scripts/schema-apply.sh` is produced by task 1.6. Re-run the full acceptance suite after both task 1.6 and 1.7 PRs land:
- `docker compose -f compose.dev.yaml down -v`
- `docker compose -f compose.dev.yaml build`
- `docker compose -f compose.dev.yaml up -d`
- Watch for: `[entrypoint] step 1/4` → `[db-init]` output → `[entrypoint] step 2/4` → schema-apply log → `[entrypoint] step 3/4` → bootstrap log → `[entrypoint] step 4/4` → PM2 startup → server at `:8055`
- `curl http://localhost:8055/server/health` → 200
- `docker compose -f compose.dev.yaml restart directus` → clean re-boot with "already initialized" paths
**Live-verification result (2026-05-01) — all four steps fired in order, server up at :8055:**
```
[entrypoint] step 1/4: db-init → 3 applied, 0 skipped
[entrypoint] step 2/4: directus schema apply → snapshot not found, skipping (correct for Phase 1)
[entrypoint] step 3/4: directus bootstrap → system tables created, first admin role + user added
[entrypoint] step 4/4: directus start (pm2-runtime)
PM2 log: App [directus:0] online
Server started at http://0.0.0.0:8055
```
**Bug fix during live verification:** the parallel `schema-apply.sh` invoked `directus` as if it were on PATH. The upstream image does NOT expose `directus` on PATH — invocation is via `node /directus/cli.js`. See task 1.6's Done section for the fix detail. Entrypoint itself was unaffected; only `schema-apply.sh` needed the change.
**Phase 5 follow-up note (not blocking Phase 1):**
Boot logs include `WARN: Collection "positions" doesn't have a primary key column and will be ignored` — three times (during bootstrap migrations + once at startup). Directus auto-discovers tables in the public schema and tries to register them as collections, but skips ones without a PRIMARY KEY constraint. The positions table uses `UNIQUE INDEX (device_id, ts)` instead of a PK (matching processor's pattern, see task 1.3 Done). Result: positions is **not** auto-registered as a Directus collection, so the cross-plane operator workflow (operator flips `faulty` flag via admin UI) cannot use the auto-collection path.
This is acceptable for Phase 1 (no operator UI yet). Phase 5 (custom extensions) needs a different mechanism for the faulty-flag workflow:
- **Option A**: a custom Directus endpoint (`POST /positions/:id/flag-faulty`) that performs the UPDATE directly via the database service. Bypasses Directus's collection abstraction; thin wrapper around SQL.
- **Option B**: register positions in `directus_collections` manually with a composite primary key configured (`device_id`, `ts`). Some Directus versions support this; verify against 11.17.4.
- **Option C**: add an `id BIGSERIAL PRIMARY KEY` surrogate column to positions. Cleanest for Directus, but introduces a column processor doesn't write and slightly increases per-row storage.
Phase 5's task file should pin one of these options before extension work begins.