julian f9b96efc6b Document directus deployment + internal-only network model
trm/directus Phase 1 image is on the registry; trm/deploy's
compose.yaml has been extended with a directus service block that
shares the existing postgres service with processor (different
tables, no contention). Bringing the architecture wiki up to date.

wiki/entities/directus.md updates:

- New "Deployment" section: links to the deploy compose, names the
  shared-Postgres model with processor, spells out the 5-step boot
  pipeline (db-init pre-schema → bootstrap → schema apply →
  db-init post-schema → start), notes first-boot (~60-90 s) vs
  warm-boot (~10 s) timing, points at deploy/README.md's first-deploy
  checklist.

- New "Network exposure" subsection: directus is internal-only on
  stage / prod (expose: 8055 not ports:). A reverse proxy on the
  host or attached to trm_default terminates TLS and forwards the
  public domain to http://directus:8055. The asymmetry with
  tcp-ingestion (which must host-publish for GPS devices) is named.
  The dev compose's deliberate divergence (host-publishes 8055 for
  local iteration) is noted.

- Schema management section: db-init split into pre-schema (db-init/)
  and post-schema (db-init-post/) phases. Post-schema landed because
  the composite UNIQUE constraints target Directus-managed tables
  that don't exist until schema apply runs. Both phases run via the
  same apply-db-init.sh with DB_INIT_DIR overridden between calls.

- Destructive-apply hazard callout: corrected entrypoint step
  reference (now step 3/5, not 2/4) after the bootstrap-before-apply
  reorder that landed during CI iterations.

log.md entry records the three CI iterations that surfaced three
distinct production-breaking bugs (port collision; ordering + silent
ERROR exit; ghost-collection apply conflict) — all caught by the
dry-run gate before reaching stage. Ghost-collection stripping is
now automated in scripts/schema-snapshot.sh so future captures
don't regress.
2026-05-02 12:20:13 +02:00


title: Directus
type: entity
created: 2026-04-30
updated: 2026-05-02
sources: gps-tracking-architecture, teltonika-ingestion-architecture
tags: service, business-plane, api

Directus

The business plane. Owns the relational schema, exposes it through auto-generated REST/GraphQL APIs, enforces role-based permissions, and provides the admin UI for back-office users.

What Directus owns

  • Schema management — collections, fields, relations, migrations.
  • API generation — REST and GraphQL endpoints, no boilerplate.
  • Authentication and authorization — users, roles, permissions, JWT issuance.
  • Real-time — WebSocket subscriptions on collections for live UIs.
  • Workflow automation — Flows for orchestrating side effects (notifications, integrations).
  • Admin UI — complete back-office interface for operators.

What Directus is NOT

Not in the telemetry hot path. Does not accept device connections, run a geofence engine, or hold per-device runtime state. Mixing those responsibilities into the same process would couple deployment lifecycles and contaminate failure domains. See plane-separation.

Schema ownership vs. write access

Directus is the schema owner even though processor writes directly to the database. New tables, columns, and relations are defined through Directus. Reasons:

  • Auto-generated admin UI and APIs are derived from the schema Directus knows about. Tables created outside Directus are invisible to it.
  • Permissions are configured per-collection in Directus.
  • Audit columns (created_at, updated_at, user_created) follow Directus conventions; bypassing them inconsistently leads to subtle UI bugs.

This is a normal Directus deployment pattern — it does not require sole write access, only schema authority.

Extensions

Used for things that genuinely belong in the business layer:

  • Hooks that react to data changes (e.g. on event-write, trigger a notification Flow).
  • Custom endpoints for permission-gated, audited operations that are not throughput-critical.
  • Custom admin UI panels for back-office workflows (data review, manual overrides, bulk ops).
  • Flows for declarative orchestration.

Not used for long-running listeners, persistent network sockets, or anything in the telemetry hot path.

Real-time delivery

Directus's WebSocket subscriptions push live data to the react-spa for writes that go through Directus's own services (REST, GraphQL, Admin UI, Flows, custom endpoints). The mechanism is action hooks (action('items.create', ...)) firing from the ItemsService, not Postgres-level change detection.

This means direct database writes from processor are not visible to Directus's subscription system. The platform handles this with two cleanly-separated WebSocket channels:

  • directus's WebSocket — broadcasts business-plane events: timing edits, configuration changes, manual entries, anything operators do through the admin UI or via directus's API.
  • processor's WebSocket — broadcasts the high-volume telemetry firehose: live position updates fanned out from redis-streams directly to subscribed react-spa clients. Authentication uses Directus-issued JWTs; per-subscription authorization delegates to Directus once at subscribe time.

See live-channel-architecture for the full design, including why this split is preferable to routing telemetry writes through directus's API or running a bridging extension inside directus.

Schema management — snapshot/apply pipeline

Schema changes flow through Directus's native snapshot mechanism, kept under git. Two artifact directories:

  • snapshots/schema.yaml — Directus collections, fields, relations. Generated locally with directus schema snapshot. Applied at container startup with directus schema apply --yes. Idempotent — applies only the diff against the running DB.
  • db-init/*.sql (pre-schema) — schema Directus does not manage and that needs to exist before directus schema apply runs: the postgres-timescaledb positions hypertable, the faulty column, future PostGIS extension. Numbered (001_, 002_, …).
  • db-init-post/*.sql (post-schema) — DDL that targets Directus-managed tables and therefore must run after schema apply has created them: composite UNIQUE constraints (which the snapshot YAML format cannot capture). Numbered independently; the runner's migrations_applied guard table is shared.

Both phases run via the same apply-db-init.sh script with DB_INIT_DIR overridden between calls. Each migration is wrapped in idempotent guards (IF NOT EXISTS / pg_constraint checks), so it is safe to re-run in environments where the constraint was already applied out-of-band.
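A minimal sketch of the runner's shape, with the psql calls stubbed so it stays self-contained (DB_INIT_DIR and the shared migrations_applied guard table are described above; everything else, including the function names, is illustrative):

```shell
# Runner loop sketch: apply each numbered .sql once, in filename order.
# Stubs stand in for psql; the real script records applied filenames in
# the shared migrations_applied table, not in a local file.
already_applied() { grep -qx "$1" applied.list 2>/dev/null; }
run_sql_file()    { :; }  # real version: psql -v ON_ERROR_STOP=1 -f "$1"
record_applied()  { echo "$1" >> applied.list; }

apply_db_init() {
  dir=$1                          # db-init or db-init-post
  for f in "$dir"/[0-9]*.sql; do
    [ -e "$f" ] || continue       # empty directory: nothing to do
    name=$(basename "$f")
    if already_applied "$name"; then
      echo "skip  $name"
    else
      run_sql_file "$f"
      record_applied "$name"
      echo "apply $name"
    fi
  done
}
```

The two phases then differ only in the directory argument: pre-schema runs against db-init before schema apply, post-schema against db-init-post after it.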

Local dev edits the schema in the admin UI, then snapshots before commit. CI builds the image with both directories baked in, spins up a throwaway Postgres, and dry-runs the apply to catch breakage before deploy. Production (Portainer) runs the same apply at container start; multi-env separation is a connection string, not different artifacts.

This treats schema.yaml as the source of truth and the admin UI as its editor. Don't hand-edit schema.yaml; round-trip through the UI to keep the format consistent.

⚠️ Destructive-apply hazard. directus schema apply --yes enforces the snapshot as the single source of truth: anything in the running DB that is not in the snapshot gets deleted during apply. This is correct for fresh-environment provisioning and prod, but a foot-gun during active schema development. The boot pipeline runs apply on every container start (entrypoint step 3/5 — pre-schema db-init → bootstrap → schema apply → post-schema db-init → start; see processor for the analogous staged-apply pattern).

Operator rule: Never restart or rebuild the Directus container while there are uncommitted schema changes. The flow is always: change in admin UI / via MCP → pnpm run schema:snapshot → commit → only then rebuild/restart.
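The rule could even be automated as a small pre-restart guard (a hypothetical sketch; no such script is described as existing in the repo):

```shell
# Hypothetical guard: refuse to rebuild/restart while the snapshot has
# uncommitted changes, enforcing the operator rule above.
safe_restart() {
  if [ -n "$(git status --porcelain -- snapshots/schema.yaml)" ]; then
    echo "refusing: snapshots/schema.yaml has uncommitted changes" >&2
    echo "run 'pnpm run schema:snapshot' and commit first" >&2
    return 1
  fi
  echo "snapshot clean; ok to rebuild/restart"
  # real version would continue with the actual rebuild, e.g.:
  # docker compose up -d --build directus
}
```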

A real incident hit this during Phase 1 task 1.5: 5 newly-created collections were destroyed by a rebuild because the baked-in snapshot was stale. Recovery was straightforward in dev (recreate via MCP, snapshot, commit) but would be data-loss in prod. CI dry-run (Phase 1 task 1.8) catches snapshot drift before it reaches stage. A long-term mitigation — DIRECTUS_SCHEMA_APPLY_MODE env var with auto / dry-run / skip modes — is on the Phase 3 hardening roadmap.

Phase 2 role

Directus owns the commands collection and is the single auth surface for outbound device commands. The SPA inserts command rows; a Directus Flow routes them via Redis to the Ingestion instance holding the device's socket. See phase-2-commands.

Deployment

Wired into the platform stack at trm/deploy's compose.yaml alongside redis-streams (the redis service), tcp-ingestion, processor, and postgres-timescaledb (the shared postgres service). Image built and pushed by trm/directus's Gitea workflow on every push to main that touches snapshots/, db-init/, db-init-post/, extensions/, scripts/, entrypoint.sh, Dockerfile, or the workflow file itself. CI dry-run gate validates the full boot pipeline against a throwaway Postgres before the image is published.

Directus and processor share the same Postgres instance — different tables, no contention. Schema authority is split (positions hypertable owned by processor's migration runner, everything else by Directus's snapshot), but the database is one. See postgres-timescaledb for the writer-side split.

Boot pipeline (5 steps)

1. db-init pre-schema    → positions hypertable, faulty column, timescaledb extension
2. directus bootstrap    → installs Directus system tables, seeds first admin if empty
3. directus schema apply → creates user collections from snapshots/schema.yaml
4. db-init post-schema   → composite UNIQUE constraints on the user collections
5. pm2-runtime start     → server up at :8055

Steps 2-3 must run in this order: schema apply requires bootstrap to have created directus_collections first. Step 4 must run after step 3: the constraints reference tables Directus just created. The CI dry-run runs steps 1-4 and skips step 5 (pm2 boot adds time and tests nothing new beyond what steps 1-4 already validated).

First boot on a fresh DB takes ~60-90 s (most of it is Directus's internal migrations during step 2). Warm boots are ~10 s — every step is idempotent.
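The step ordering above can be sketched as an entrypoint skeleton (step bodies stubbed so the sketch is self-contained; the real entrypoint.sh shells out to apply-db-init.sh, the Directus CLI, and pm2-runtime, and the CI_DRY_RUN flag name here is an assumption):

```shell
# Boot-order skeleton. Only the 5-step order and the CI cutoff after
# step 4 are from this page; step bodies and CI_DRY_RUN are illustrative.
step() { echo "step $1/5: $2"; }

boot() {
  step 1 "db-init pre-schema (DB_INIT_DIR=db-init)"
  step 2 "directus bootstrap"
  step 3 "directus schema apply --yes snapshots/schema.yaml"
  step 4 "db-init post-schema (DB_INIT_DIR=db-init-post)"
  if [ "${CI_DRY_RUN:-0}" = "1" ]; then
    echo "dry-run gate: stopping after step 4"
    return 0
  fi
  step 5 "pm2-runtime start (serve on :8055)"
}
```

In CI the gate would run with CI_DRY_RUN=1, exercising steps 1-4 and skipping the server start.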

Network exposure

Internal-only on the deploy stack. The container exposes :8055 to the trm_default Compose network but is not host-published. A reverse proxy (Traefik / Caddy / nginx) running on the host or attached to the same network terminates TLS and forwards the public domain to http://directus:8055. The proxy itself is not part of the trm stack — add it as a sibling Portainer stack or run it on the host. Direct host exposure of an admin UI is a privileged surface (full CRUD + permission policies + Flow execution) and is deliberately avoided. tcp-ingestion is the one asymmetry: GPS devices connect to it directly, so its TCP port must be host-published.

The dev compose in trm/directus (compose.dev.yaml) does host-publish :8055 for local iteration. Stage / prod do not.
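In compose terms the difference reads roughly like this (illustrative fragments, not the real files; only the service name, the port, and which file host-publishes are from this page):

```yaml
# trm/deploy compose.yaml (stage/prod): internal-only
services:
  directus:
    expose:
      - "8055"        # reachable as http://directus:8055 on trm_default only
---
# trm/directus compose.dev.yaml (dev): host-published for local iteration
services:
  directus:
    ports:
      - "8055:8055"   # http://localhost:8055
```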

First-deploy operator checklist

Lives in deploy/README.md's "First-deploy checklist" section. Generates per-environment KEY / SECRET / admin-user secrets, sets Portainer stack env vars, watches the boot logs, verifies the 12 user collections landed via the admin UI. The schema-as-code rule (no admin-UI schema edits on stage — they'll be DROPPED on next rebuild) is restated where it matters.
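Generating the per-environment secrets might look like the following (KEY and SECRET are standard Directus env vars; whether the checklist uses openssl, and the exact admin-user variable names, are assumptions, so defer to deploy/README.md):

```shell
# Per-environment secret generation sketch. KEY and SECRET are the
# standard Directus config vars; deploy/README.md is authoritative for
# the full list, including the admin-user variables.
KEY=$(openssl rand -hex 16)      # instance key, unique per environment
SECRET=$(openssl rand -hex 32)   # signs the JWTs Directus issues
echo "KEY=$KEY"
echo "SECRET=$SECRET"
# Set these as Portainer stack env vars; never commit them to the repo.
```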

Failure mode

Crash → telemetry continues to flow into the database; admin UI and SPA are unavailable; no telemetry is lost. See failure-domains.