From f9b96efc6bc5d094c8dfbc030691e2c4884cb31d Mon Sep 17 00:00:00 2001 From: Julian Cuni Date: Sat, 2 May 2026 12:20:13 +0200 Subject: [PATCH] Document directus deployment + internal-only network model MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit trm/directus Phase 1 image is on the registry; trm/deploy's compose.yaml has been extended with a directus service block that shares the existing postgres service with processor (different tables, no contention). Bringing the architecture wiki up to date. wiki/entities/directus.md updates: - New "Deployment" section: links to the deploy compose, names the shared-Postgres model with processor, spells out the 5-step boot pipeline (db-init pre-schema → bootstrap → schema apply → db-init post-schema → start), notes first-boot (~60-90 s) vs warm-boot (~10 s) timing, points at deploy/README.md's first-deploy checklist. - New "Network exposure" subsection: directus is internal-only on stage / prod (expose: 8055 not ports:). A reverse proxy on the host or attached to trm_default terminates TLS and forwards the public domain to http://directus:8055. The asymmetry with tcp-ingestion (which must host-publish for GPS devices) is named. The dev compose's deliberate divergence (host-publishes 8055 for local iteration) is noted. - Schema management section: db-init split into pre-schema (db-init/) and post-schema (db-init-post/) phases. Post-schema landed because the composite UNIQUE constraints target Directus-managed tables that don't exist until schema apply runs. Both phases run via the same apply-db-init.sh with DB_INIT_DIR overridden between calls. - Destructive-apply hazard callout: corrected entrypoint step reference (now step 3/5, not 2/4) after the bootstrap-before-apply reorder that landed during CI iterations. log.md entry records the three CI iterations that surfaced three distinct production-breaking bugs (port collision; ordering + silent ERROR exit; ghost-collection apply conflict) — all caught by the dry-run gate before reaching stage. Ghost-collection stripping is now automated in scripts/schema-snapshot.sh so future captures don't regress. --- log.md | 11 +++++++++++ wiki/entities/directus.md | 39 ++++++++++++++++++++++++++++++++++++--- 2 files changed, 47 insertions(+), 3 deletions(-) diff --git a/log.md b/log.md index 7b9b045..49ff22f 100644 --- a/log.md +++ b/log.md @@ -97,3 +97,14 @@ Wired the source into [[directus-schema-draft]] (added to `sources:` frontmatter Open follow-ups flagged on the source page: §12.11 SLZ formula lives in the Supplementary Regulations (not the general regs), so we shouldn't hardcode a default; M-7 numbering bug (Veteran and Female driver share the code — likely a typo); neutralization zones (§8.12) not yet modeled in the schema. Index updated: new source row. No new entity/concept pages created — the doc supports existing pages rather than introducing new domain objects. + +## [2026-05-02] note | Directus deployment wired; entity page updated + +`trm/directus` Phase 1 shipped its image to the registry and the `trm/deploy` `compose.yaml` was extended with a `directus` service block (sharing the existing `postgres` service with [[processor]]). 
Updated [[directus]] entity page to reflect operational reality: + +- New "Deployment" section: links to the deploy compose, explains the shared-Postgres model with [[processor]], spells out the 5-step boot pipeline (db-init pre-schema → bootstrap → schema apply → db-init post-schema → start), notes first-boot vs warm-boot timing. +- Schema management section: db-init split into pre-schema (`db-init/`) and post-schema (`db-init-post/`) phases. Post-schema landed because the composite UNIQUE constraints target Directus-managed tables that don't exist until schema apply runs. +- Destructive-apply hazard callout: corrected entrypoint step reference (now step 3/5, not 2/4) after the bootstrap-before-apply reorder. +- New "Network exposure" subsection inside Deployment: directus is internal-only on stage / prod (`expose: 8055` not `ports:`). A reverse proxy (Traefik / Caddy / nginx) on the host or attached to `trm_default` terminates TLS and forwards the public domain to `http://directus:8055`. The asymmetry with [[tcp-ingestion]] (which must host-publish for GPS devices) is named, and the dev compose's deliberate divergence is noted. + +Three CI iterations on the directus repo's first push exposed three distinct production-breaking bugs (port collision; bootstrap-before-apply ordering + silent ERROR exit; ghost-collection apply conflict). The dry-run gate caught all of them before the image touched stage. The "ghost-collection" stripping is now automated in `scripts/schema-snapshot.sh` so future captures don't regress. diff --git a/wiki/entities/directus.md b/wiki/entities/directus.md index ac788d5..ee8506f 100644 --- a/wiki/entities/directus.md +++ b/wiki/entities/directus.md @@ -2,7 +2,7 @@ title: Directus type: entity created: 2026-04-30 -updated: 2026-05-01 +updated: 2026-05-02 sources: [gps-tracking-architecture, teltonika-ingestion-architecture] tags: [service, business-plane, api] --- @@ -61,13 +61,16 @@ See [[live-channel-architecture]] for the full design, including why this split Schema changes flow through Directus's native snapshot mechanism, kept under git. Two artifact directories: - **`snapshots/schema.yaml`** — Directus collections, fields, relations. Generated locally with `directus schema snapshot`. Applied at container startup with `directus schema apply --yes`. Idempotent — applies only the diff against the running DB. -- **`db-init/*.sql`** — schema Directus does not manage: the [[postgres-timescaledb]] positions hypertable, the `faulty` column, indexes that need PostGIS-specific syntax, or any DDL that predates Directus knowing about a collection. Numbered (`001_`, `002_`, …) and applied by a sidecar container or one-shot job ahead of `directus schema apply`. Tracked via a `migrations_applied` guard table to skip already-run files. +- **`db-init/*.sql`** (pre-schema) — schema Directus does not manage and that needs to exist *before* `directus schema apply` runs: the [[postgres-timescaledb]] positions hypertable, the `faulty` column, future PostGIS extension. Numbered (`001_`, `002_`, …). +- **`db-init-post/*.sql`** (post-schema) — DDL that targets Directus-managed tables and therefore must run *after* schema apply has created them: composite UNIQUE constraints (which the snapshot YAML format cannot capture). Numbered independently; the runner's `migrations_applied` guard table is shared. + +Both phases run via the same `apply-db-init.sh` script with `DB_INIT_DIR` overridden between calls. 
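+
+A minimal sketch of the two-phase invocation (the script location, `DB_INIT_DIR` values, and exact CLI flags are illustrative assumptions, not copied from `entrypoint.sh`):
+
+```sh
+# Pre-schema phase: DDL that must exist before the snapshot is applied
+DB_INIT_DIR=db-init ./scripts/apply-db-init.sh
+
+# Directus system tables, then the user collections from the snapshot
+npx directus bootstrap
+npx directus schema apply --yes ./snapshots/schema.yaml
+
+# Post-schema phase: constraints on the collections the snapshot just created
+DB_INIT_DIR=db-init-post ./scripts/apply-db-init.sh
+```
+
+The shared `migrations_applied` guard table makes re-running either phase a no-op, which is part of what keeps warm boots cheap.
+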
+Each migration is wrapped in idempotent guards (`IF NOT EXISTS` / `pg_constraint` checks) so it's safe to run against environments where the constraint was already applied out-of-band. Local dev edits the schema in the admin UI, then snapshots before commit. CI builds the image with both directories baked in, spins a throwaway Postgres, and dry-runs `apply` to catch breakage before deploy. Production (Portainer) runs the same apply at container start; multi-env separation is a connection string, not different artifacts. This treats `schema.yaml` as the source of truth and the admin UI as its editor. Don't hand-edit `schema.yaml`; round-trip through the UI to keep the format consistent. -> **⚠️ Destructive-apply hazard.** `directus schema apply --yes` enforces the snapshot as the single source of truth: anything in the running DB that is *not* in the snapshot gets **deleted** during apply. This is correct for fresh-environment provisioning and prod, but a foot-gun during active schema development. The boot pipeline runs apply on every container start (entrypoint step 2/4 — see [[processor]] for the analogous staged-apply pattern). +> **⚠️ Destructive-apply hazard.** `directus schema apply --yes` enforces the snapshot as the single source of truth: anything in the running DB that is *not* in the snapshot gets **deleted** during apply. This is correct for fresh-environment provisioning and prod, but a foot-gun during active schema development. The boot pipeline runs apply on every container start (entrypoint step 3/5 — pre-schema db-init → bootstrap → schema apply → post-schema db-init → start; see [[processor]] for the analogous staged-apply pattern). > > **Operator rule:** *Never restart or rebuild the Directus container while there are uncommitted schema changes.* The flow is always: change in admin UI / via MCP → `pnpm run schema:snapshot` → commit → only then rebuild/restart. > @@ -77,6 +80,36 @@ This treats `schema.yaml` as the source of truth and the admin UI as its editor. Directus owns the `commands` collection and is the **single auth surface** for outbound device commands. The SPA inserts command rows; a Directus Flow routes them via Redis to the Ingestion instance holding the device's socket. See [[phase-2-commands]]. +## Deployment + +Wired into the platform stack at [`trm/deploy`](https://git.dev.microservices.al/trm/deploy)'s `compose.yaml` alongside [[redis-streams]] (the `redis` service), [[tcp-ingestion]], [[processor]], and [[postgres-timescaledb]] (the shared `postgres` service). Image built and pushed by [`trm/directus`](https://git.dev.microservices.al/trm/directus)'s Gitea workflow on every push to `main` that touches `snapshots/`, `db-init/`, `db-init-post/`, `extensions/`, `scripts/`, `entrypoint.sh`, `Dockerfile`, or the workflow file itself. CI dry-run gate validates the full boot pipeline against a throwaway Postgres before the image is published. + +Directus and [[processor]] share the same Postgres instance — different tables, no contention. Schema authority is split (positions hypertable owned by [[processor]]'s migration runner, everything else by Directus's snapshot), but the database is one. See [[postgres-timescaledb]] for the writer-side split. + +### Boot pipeline (5 steps) + +``` +1. db-init pre-schema → positions hypertable, faulty column, timescaledb extension +2. directus bootstrap → installs Directus system tables, seeds first admin if empty +3. directus schema apply → creates user collections from snapshots/schema.yaml +4. 
db-init post-schema → composite UNIQUE constraints on the user collections +5. pm2-runtime start → server up at :8055 +``` + +Steps 2–3 must be in this order: schema apply requires bootstrap to have created `directus_collections` first. Step 4 must run after step 3: the constraints reference tables Directus just created. The CI dry-run runs steps 1–4 (skips step 5 — pm2 boot adds time, tests nothing new beyond what 1–4 already validated). + +First boot on a fresh DB takes ~60–90 s (most of it is Directus's internal migrations during step 2). Warm boots are ~10 s — every step is idempotent. + +### Network exposure + +Internal-only on the deploy stack. The container exposes `:8055` to the `trm_default` Compose network but is **not** host-published. A reverse proxy (Traefik / Caddy / nginx) running on the host or attached to the same network terminates TLS and forwards the public domain to `http://directus:8055`. The proxy itself is not part of the trm stack — add it as a sibling Portainer stack or run it on the host. The admin UI is a privileged surface (full CRUD + permission policies + Flow execution), so exposing it directly on the host is deliberately avoided. The asymmetry is [[tcp-ingestion]] — GPS devices connect to it directly, so its TCP port must be host-published. + +The dev compose in `trm/directus` (`compose.dev.yaml`) does host-publish `:8055` for local iteration. Stage / prod do not. A quick sanity check of this split is sketched at the end of this page. + +### First-deploy operator checklist + +Lives in `deploy/README.md`'s "First-deploy checklist" section. It walks the operator through generating per-environment `KEY` / `SECRET` / admin-user secrets, setting Portainer stack env vars, watching the boot logs, and verifying that the 12 user collections landed via the admin UI. The schema-as-code rule (no admin-UI schema edits on stage — they'll be DROPPED on next rebuild) is restated where it matters. + ## Failure mode Crash → telemetry continues to flow into the database; admin UI and SPA are unavailable; no telemetry is lost. See [[failure-domains]].
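+
+A quick way to sanity-check the network-exposure split from the deploy host (a sketch: it assumes the `trm_default` network name used above and Directus's `/server/health` endpoint):
+
+```sh
+# From the host: :8055 is not published on stage/prod, so this should fail to connect
+curl -sS --max-time 3 http://localhost:8055/server/health || echo "not host-published (expected)"
+
+# From a container attached to the stack network, the service name resolves and the API answers
+docker run --rm --network trm_default curlimages/curl -sS http://directus:8055/server/health
+```
+
+On the dev compose the first command succeeds instead, since `compose.dev.yaml` host-publishes the port.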