julian f9b96efc6b Document directus deployment + internal-only network model
trm/directus Phase 1 image is on the registry; trm/deploy's
compose.yaml has been extended with a directus service block that
shares the existing postgres service with processor (different
tables, no contention). Bringing the architecture wiki up to date.

wiki/entities/directus.md updates:

- New "Deployment" section: links to the deploy compose, names the
  shared-Postgres model with processor, spells out the 5-step boot
  pipeline (db-init pre-schema → bootstrap → schema apply →
  db-init post-schema → start), notes first-boot (~60-90 s) vs
  warm-boot (~10 s) timing, points at deploy/README.md's first-deploy
  checklist.

- New "Network exposure" subsection: directus is internal-only on
  stage / prod (expose: 8055 not ports:). A reverse proxy on the
  host or attached to trm_default terminates TLS and forwards the
  public domain to http://directus:8055. The asymmetry with
  tcp-ingestion (which must host-publish for GPS devices) is named.
  The dev compose's deliberate divergence (host-publishes 8055 for
  local iteration) is noted.

- Schema management section: db-init split into pre-schema (db-init/)
  and post-schema (db-init-post/) phases. Post-schema landed because
  the composite UNIQUE constraints target Directus-managed tables
  that don't exist until schema apply runs. Both phases run via the
  same apply-db-init.sh with DB_INIT_DIR overridden between calls.

- Destructive-apply hazard callout: corrected entrypoint step
  reference (now step 3/5, not 2/4) after the bootstrap-before-apply
  reorder that landed during CI iterations.

log.md entry records the three CI iterations that surfaced three
distinct production-breaking bugs (port collision; ordering + silent
ERROR exit; ghost-collection apply conflict) — all caught by the
dry-run gate before reaching stage. Ghost-collection stripping is
now automated in scripts/schema-snapshot.sh so future captures
don't regress.

---
title: Directus
type: entity
created: 2026-04-30
updated: 2026-05-02
sources: [gps-tracking-architecture, teltonika-ingestion-architecture]
tags: [service, business-plane, api]
---
# Directus
The **business plane**. Owns the relational schema, exposes it through auto-generated REST/GraphQL APIs, enforces role-based permissions, and provides the admin UI for back-office users.
## What Directus owns
- **Schema management** — collections, fields, relations, migrations.
- **API generation** — REST and GraphQL endpoints, no boilerplate.
- **Authentication and authorization** — users, roles, permissions, JWT issuance.
- **Real-time** — WebSocket subscriptions on collections for live UIs.
- **Workflow automation** — Flows for orchestrating side effects (notifications, integrations).
- **Admin UI** — complete back-office interface for operators.
## What Directus is NOT
Not in the telemetry hot path. Does not accept device connections, run a geofence engine, or hold per-device runtime state. Mixing those responsibilities into the same process would couple deployment lifecycles and contaminate failure domains. See [[plane-separation]].
## Schema ownership vs. write access
Directus is the schema **owner** even though [[processor]] writes directly to the database. New tables, columns, and relations are defined through Directus. Reasons:
- Auto-generated admin UI and APIs are derived from the schema Directus knows about. Tables created outside Directus are invisible to it.
- Permissions are configured per-collection in Directus.
- Audit columns (created_at, updated_at, user_created) follow Directus conventions; bypassing them inconsistently leads to subtle UI bugs.
This is a normal Directus deployment pattern — it does not require sole write access, only schema authority.
## Extensions
Used for things that genuinely belong in the business layer:
- **Hooks** that react to data changes (e.g. on event-write, trigger a notification Flow).
- **Custom endpoints** for permission-gated, audited operations that are not throughput-critical.
- **Custom admin UI panels** for back-office workflows (data review, manual overrides, bulk ops).
- **Flows** for declarative orchestration.
**Not** used for long-running listeners, persistent network sockets, or anything in the telemetry hot path.
## Real-time delivery
Directus's WebSocket subscriptions push live data to the [[react-spa]] **for writes that go through Directus's own services** (REST, GraphQL, Admin UI, Flows, custom endpoints). The mechanism is action hooks (`action('items.create', ...)`) firing from the `ItemsService`, not Postgres-level change detection.
This means **direct database writes from [[processor]] are not visible** to Directus's subscription system. The platform handles this with two cleanly-separated WebSocket channels:
- **[[directus]]'s WebSocket** — broadcasts business-plane events: timing edits, configuration changes, manual entries, anything operators do through the admin UI or via [[directus]]'s API.
- **[[processor]]'s WebSocket** — broadcasts the high-volume telemetry firehose: live position updates fanned out from [[redis-streams]] directly to subscribed [[react-spa]] clients. Authentication uses Directus-issued JWTs; per-subscription authorization delegates to Directus once at subscribe time.
See [[live-channel-architecture]] for the full design, including why this split is preferable to routing telemetry writes through [[directus]]'s API or running a bridging extension inside [[directus]].
## Schema management — snapshot/apply pipeline
Schema changes flow through Directus's native snapshot mechanism, kept under git. Two artifact directories:
- **`snapshots/schema.yaml`** — Directus collections, fields, relations. Generated locally with `directus schema snapshot`. Applied at container startup with `directus schema apply --yes`. Idempotent — applies only the diff against the running DB.
- **`db-init/*.sql`** (pre-schema) — schema Directus does not manage and that needs to exist *before* `directus schema apply` runs: the [[postgres-timescaledb]] positions hypertable, the `faulty` column, future PostGIS extension. Numbered (`001_`, `002_`, …).
- **`db-init-post/*.sql`** (post-schema) — DDL that targets Directus-managed tables and therefore must run *after* schema apply has created them: composite UNIQUE constraints (which the snapshot YAML format cannot capture). Numbered independently; the runner's `migrations_applied` guard table is shared.
Both phases run via the same `apply-db-init.sh` script with `DB_INIT_DIR` overridden between calls. Each migration is wrapped in idempotent guards (`IF NOT EXISTS` / `pg_constraint` checks), so the runner is safe to point at environments where the constraint was already applied out-of-band.
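Roughly, the two invocations look like this (paths are illustrative; the real call sites live in `entrypoint.sh`):

```bash
# Pre-schema phase: objects Directus does not manage (positions hypertable, faulty column, extensions).
DB_INIT_DIR=db-init ./scripts/apply-db-init.sh

# ... directus bootstrap and directus schema apply run in between ...

# Post-schema phase: constraints on the collections that schema apply has just created.
DB_INIT_DIR=db-init-post ./scripts/apply-db-init.sh
```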
Local dev edits the schema in the admin UI, then snapshots before commit. CI builds the image with both directories baked in, spins up a throwaway Postgres, and dry-runs `apply` to catch breakage before deploy. Production (Portainer) runs the same apply at container start; multi-env separation is a connection string, not different artifacts.
This treats `schema.yaml` as the source of truth and the admin UI as its editor. Don't hand-edit `schema.yaml`; round-trip through the UI to keep the format consistent.
> **⚠️ Destructive-apply hazard.** `directus schema apply --yes` enforces the snapshot as the single source of truth: anything in the running DB that is *not* in the snapshot gets **deleted** during apply. This is correct for fresh-environment provisioning and prod, but a foot-gun during active schema development. The boot pipeline runs apply on every container start (entrypoint step 3/5 — pre-schema db-init → bootstrap → schema apply → post-schema db-init → start; see [[processor]] for the analogous staged-apply pattern).
>
> **Operator rule:** *Never restart or rebuild the Directus container while there are uncommitted schema changes.* The flow is always: change in admin UI / via MCP → `pnpm run schema:snapshot` → commit → only then rebuild/restart.
>
> A real incident hit this during Phase 1 task 1.5: 5 newly-created collections were destroyed by a rebuild because the baked-in snapshot was stale. Recovery was straightforward in dev (recreate via MCP, snapshot, commit) but would be data-loss in prod. CI dry-run (Phase 1 task 1.8) catches snapshot drift before it reaches stage. A long-term mitigation — `DIRECTUS_SCHEMA_APPLY_MODE` env var with `auto` / `dry-run` / `skip` modes — is on the Phase 3 hardening roadmap.
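In practice the safe loop during active schema development is (assuming the `schema:snapshot` script wraps `directus schema snapshot` against the local instance):

```bash
# 1. Make the schema change in the local admin UI or via MCP.
# 2. Capture it into the git-tracked snapshot:
pnpm run schema:snapshot
# 3. Commit the snapshot (plus any new db-init-post migration) before anything restarts:
git add snapshots/schema.yaml db-init-post/
git commit -m "schema: describe the change"
# 4. Only now rebuild/restart the container; schema apply will find nothing to delete.
```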
## Phase 2 role
Directus owns the `commands` collection and is the **single auth surface** for outbound device commands. The SPA inserts command rows; a Directus Flow routes them via Redis to the Ingestion instance holding the device's socket. See [[phase-2-commands]].
## Deployment
Wired into the platform stack at [`trm/deploy`](https://git.dev.microservices.al/trm/deploy)'s `compose.yaml` alongside [[redis-streams]] (the `redis` service), [[tcp-ingestion]], [[processor]], and [[postgres-timescaledb]] (the shared `postgres` service). Image built and pushed by [`trm/directus`](https://git.dev.microservices.al/trm/directus)'s Gitea workflow on every push to `main` that touches `snapshots/`, `db-init/`, `db-init-post/`, `extensions/`, `scripts/`, `entrypoint.sh`, `Dockerfile`, or the workflow file itself. CI dry-run gate validates the full boot pipeline against a throwaway Postgres before the image is published.
Directus and [[processor]] share the same Postgres instance — different tables, no contention. Schema authority is split (positions hypertable owned by [[processor]]'s migration runner, everything else by Directus's snapshot), but the database is one. See [[postgres-timescaledb]] for the writer-side split.
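A quick way to see the split inside the one database (the service name and env-var names used by the deploy compose are assumed here):

```bash
# Directus-owned system tables and snapshot-managed collections sit next to the
# processor-owned positions hypertable, all in the same database.
docker compose exec postgres \
  psql -U "$POSTGRES_USER" -d "$POSTGRES_DB" \
  -c '\dt directus_*' \
  -c '\d+ positions'
```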
### Boot pipeline (5 steps)
```
1. db-init pre-schema → positions hypertable, faulty column, timescaledb extension
2. directus bootstrap → installs Directus system tables, seeds first admin if empty
3. directus schema apply → creates user collections from snapshots/schema.yaml
4. db-init post-schema → composite UNIQUE constraints on the user collections
5. pm2-runtime start → server up at :8055
```
Steps 2–3 must run in this order: schema apply requires bootstrap to have created `directus_collections` first. Step 4 must run after step 3: the constraints reference tables Directus just created. The CI dry-run runs steps 1–4 (skips step 5 — pm2 boot adds time, tests nothing new beyond what steps 1–4 already validated).
First boot on a fresh DB takes ~60–90 s (most of it is Directus's internal migrations during step 2). Warm boots are ~10 s — every step is idempotent.
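Condensed, `entrypoint.sh` amounts to something like this (command spellings and file paths are illustrative, not a verbatim copy of the script):

```bash
#!/bin/sh
set -e

# 1. Pre-schema db-init: positions hypertable, faulty column, timescaledb extension.
DB_INIT_DIR=db-init ./scripts/apply-db-init.sh

# 2. Bootstrap: installs the directus_* system tables, seeds the first admin on an empty DB.
npx directus bootstrap

# 3. Schema apply: enforces snapshots/schema.yaml as the single source of truth (destructive).
npx directus schema apply --yes ./snapshots/schema.yaml

# 4. Post-schema db-init: composite UNIQUE constraints on the collections step 3 just created.
DB_INIT_DIR=db-init-post ./scripts/apply-db-init.sh

# 5. Hand off to the server (the CI dry-run stops before this step).
exec pm2-runtime start ecosystem.config.cjs   # config filename assumed from the upstream image
```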
### Network exposure
Internal-only on the deploy stack. The container exposes `:8055` to the `trm_default` Compose network but is **not** host-published. A reverse proxy (Traefik / Caddy / nginx) running on the host or attached to the same network terminates TLS and forwards the public domain to `http://directus:8055`. The proxy itself is not part of the trm stack — add it as a sibling Portainer stack or run it on the host. Direct host exposure of an admin UI is a privileged surface (full CRUD + permission policies + Flow execution) and is deliberately avoided. The asymmetry is [[tcp-ingestion]]: GPS devices connect to it directly, so its TCP port must be host-published.
The dev compose in `trm/directus` (`compose.dev.yaml`) does host-publish `:8055` for local iteration. Stage / prod do not.
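A sanity check of the exposure model after a stage deploy (`/server/health` is Directus's built-in health endpoint; the network and service names are the deploy stack's):

```bash
# From the host: 8055 is not published, so the connection should be refused on stage/prod.
curl -sf http://localhost:8055/server/health \
  && echo "UNEXPECTED: directus is host-published" \
  || echo "ok: internal-only"

# From inside trm_default the service name resolves and the API answers:
docker run --rm --network trm_default curlimages/curl:latest \
  -sf http://directus:8055/server/health
```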
### First-deploy operator checklist
Lives in `deploy/README.md`'s "First-deploy checklist" section. It covers generating the per-environment `KEY` / `SECRET` / admin-user secrets, setting the Portainer stack env vars, watching the boot logs, and verifying via the admin UI that the 12 user collections landed. The schema-as-code rule (no admin-UI schema edits on stage — they'll be DROPPED on next rebuild) is restated where it matters.
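A hypothetical way to generate the per-environment secrets before filling in the Portainer env vars (the authoritative variable list and naming lives in `deploy/README.md`):

```bash
# KEY identifies the instance, SECRET signs JWTs; both must be unique per environment.
KEY="$(uuidgen)"
SECRET="$(openssl rand -base64 32)"
ADMIN_PASSWORD="$(openssl rand -base64 24)"
printf 'KEY=%s\nSECRET=%s\nADMIN_PASSWORD=%s\n' "$KEY" "$SECRET" "$ADMIN_PASSWORD"
```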
## Failure mode
Crash → telemetry continues to flow into the database; admin UI and SPA are unavailable; no telemetry is lost. See [[failure-domains]].