---
title: Live channel architecture
type: concept
created: 2026-05-01
updated: 2026-05-03
sources: []
tags: [architecture, realtime, websocket, telemetry-plane, decision]
---

# Live channel architecture

How live position data reaches the [[react-spa]] without violating [[plane-separation]] or coupling to [[directus]]'s failure domain.

## The question

The SPA needs sub-second updates of device positions for live race views. Three things are non-negotiable:

1. The [[processor]] hot path stays direct-to-database: no API hop, no event-loop pressure on Directus.
2. [[directus]] is not in the telemetry hot path (per [[plane-separation]]).
3. The live channel must be authenticated and authorization-aware: only users with permission to see an event's positions get pushed updates.

The naïve assumption is that [[directus]]'s built-in WebSocket subscriptions cover this. They do not. **Directus's subscription system only fires events for writes that go through its own `ItemsService`** (REST/GraphQL/Admin UI mutations). Direct `INSERT`s from the [[processor]] are invisible to subscribers, as verified against Directus's documentation and source. The bridging assumption was wrong. This page documents how the platform actually delivers live positions.

## Options considered

| Option | Live channel works | Hot path stays fast | Plane separation | Failure domain |
|---|---|---|---|---|
| Route Processor writes through Directus REST | Yes (Directus broadcasts own writes) | Compromised: every write goes through the Directus event loop | Compromised | Coupled: Directus down blocks ingestion |
| Bridge extension inside Directus (Redis → `WebSocketService.broadcast`) | Yes | Compromised: Directus runs the firehose consumer | Compromised | Coupled: Directus crash kills live channel |
| **Processor exposes its own WebSocket endpoint** (chosen) | Yes | Preserved | Preserved | Decoupled: Directus down blocks only new authorizations |

The third option wins because it preserves the architectural invariants that motivated [[plane-separation]] in the first place, while still leaning on [[directus]] for authentication and authorization.

## Chosen design

Two cleanly separated WebSocket channels, each playing to its strength:

```
┌─ Telemetry plane ─────────────────────────┐ ┌─ Business plane ──────────────────────┐
│                                           │ │                                       │
│ Device → tcp-ingestion → Redis            │ │ SPA admin action                      │
│                            ↓              │ │        ↓                              │
│                        Processor          │ │ Directus REST                         │
│                         ↙     ↘           │ │        ↓                              │
│                Postgres     Processor's   │ │ Postgres + Directus's WebSocket       │
│                              WebSocket    │ │        ↓                              │
│                                  ↓        │ │ SPA (admin UI,                        │
│                                 SPA       │ │      leaderboard refresh,             │
│                             (live map)    │ │      timing edits)                    │
└───────────────────────────────────────────┘ └───────────────────────────────────────┘
```

- **High-volume telemetry** (positions): the Processor writes directly to Postgres and *also* fans out the same records to subscribed SPA clients over its own WebSocket endpoint. Stays in the telemetry plane end-to-end.
- **Low-volume domain events** (timing records, stage results, manual entries, configuration): written via Directus's REST API; Directus's built-in subscription system broadcasts them through its WebSocket. Stays in the business plane.

Each kind of data takes the path that fits it. No bridges, no extensions inside Directus.

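
The telemetry-plane half of the design can be sketched as a handler that passes each position to two independent sinks. This is a minimal illustration, not the Processor's actual code: the `PositionRecord` shape, the `Sink` type, and `handlePosition` are hypothetical names, and real sinks (a Postgres client, a WebSocket broadcaster) would be asynchronous.

```typescript
// Hypothetical record shape; the real schema is not specified on this page.
interface PositionRecord {
  eventId: string;
  deviceId: string;
  lat: number;
  lon: number;
  recordedAt: number; // epoch milliseconds
}

// A sink is anything that accepts a record: the Postgres writer or the
// WebSocket fan-out. Synchronous callbacks keep the sketch short.
type Sink = (record: PositionRecord) => void;

interface SinkResult {
  persisted: boolean;
  broadcast: boolean;
}

// Hands the same record to both sinks. A failure in one sink is reported
// in the result but does not stop the other sink from running.
function handlePosition(
  record: PositionRecord,
  persist: Sink,
  fanOut: Sink,
): SinkResult {
  const attempt = (sink: Sink): boolean => {
    try {
      sink(record);
      return true;
    } catch {
      return false;
    }
  };
  const broadcast = attempt(fanOut);
  const persisted = attempt(persist);
  return { persisted, broadcast };
}
```

Neither sink calls Directus, which is the point of the design: the record never leaves the telemetry plane on its way to the database or to the live map.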
## Authorization flow

The Processor's WebSocket endpoint validates connections through Directus, but never asks Directus per record. The handshake is **cookie-based and same-origin**; see [[processor-ws-contract]] §"Auth handshake" for the wire-level spec.

```
1. SPA opens wss:///ws-live (relative URL; same origin as Directus).
   Browser auto-attaches the httpOnly Directus session cookie.

2. Processor reads the entire Cookie header from the upgrade request
   and forwards it to Directus GET /users/me.
     200     → bind the connection to (id, role).
     401/403 → close the socket with code 4401 (unauthorized).

3. SPA sends {type: 'subscribe', topic: 'event:'}.

4. Processor checks the user's organization_users membership against
   the event's organization_id (one cached lookup per event).
     200 → store {client → topic}; reply with the latest-position snapshot.
     403 → reply with {type: 'error', code: 'forbidden'}.

5. For every position arriving on Redis, match against in-memory
   subscriptions and push to matched clients.
   Zero Directus calls in the hot path.
```

Connection-time auth is amortized over session lifetime. Permission re-checks happen on subscription change, not on every record. The hot path is bounded by `O(positions × subscribed-clients-per-event)` and runs entirely on the Processor's event loop with in-memory state.

> Earlier revisions of this page described JWT-in-URL auth. That predated [[react-spa]]'s switch to Directus SDK session-mode auth (see log entry 2026-05-02 "Auth-mode wiki realignment"). The current implementation is cookie-based; tokens never appear in WebSocket URLs (which would land them in proxy logs).

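
Steps 2 to 5 of the flow above can be sketched as an in-memory registry. This is a rough illustration under stated assumptions, not the contract in [[processor-ws-contract]]: `LiveRegistry`, `closeCodeFor`, the `Position` shape, and the `canView` callback (standing in for the cached `organization_users` check) are hypothetical, the topic string is assumed to be `event:` followed by the event ID, and closing with 1011 on any status other than 200/401/403 is a choice made only for this sketch.

```typescript
type ClientId = string;

// Hypothetical position shape; the wire format lives in the contract page.
interface Position {
  eventId: string;
  deviceId: string;
  lat: number;
  lon: number;
}

type SubscribeReply =
  | { type: 'snapshot'; topic: string; positions: Position[] }
  | { type: 'error'; code: 'forbidden' };

// Step 2: map the Directus /users/me status to a WebSocket close code.
// null means the connection stays open. 4401 comes from the flow above;
// 1011 for any other status is an assumption of this sketch.
function closeCodeFor(status: number): number | null {
  if (status === 200) return null;
  if (status === 401 || status === 403) return 4401;
  return 1011;
}

class LiveRegistry {
  private readonly clientsByTopic = new Map<string, Set<ClientId>>();
  private readonly latestByEvent = new Map<string, Map<string, Position>>();
  private readonly canView: (userId: string, eventId: string) => boolean;

  // canView stands in for the cached organization membership lookup.
  constructor(canView: (userId: string, eventId: string) => boolean) {
    this.canView = canView;
  }

  // Steps 3 and 4: authorize once per subscription, then remember the client
  // and answer with the latest known position of every device in the event.
  subscribe(clientId: ClientId, userId: string, eventId: string): SubscribeReply {
    if (!this.canView(userId, eventId)) {
      return { type: 'error', code: 'forbidden' };
    }
    const topic = `event:${eventId}`;
    const clients = this.clientsByTopic.get(topic) ?? new Set<ClientId>();
    clients.add(clientId);
    this.clientsByTopic.set(topic, clients);
    const latest = this.latestByEvent.get(eventId);
    return {
      type: 'snapshot',
      topic,
      positions: latest ? Array.from(latest.values()) : [],
    };
  }

  // Step 5: in-memory match only. Returns the clients to push to and
  // updates the snapshot state; no permission check happens here.
  publish(position: Position): ClientId[] {
    const latest =
      this.latestByEvent.get(position.eventId) ?? new Map<string, Position>();
    latest.set(position.deviceId, position);
    this.latestByEvent.set(position.eventId, latest);
    const clients = this.clientsByTopic.get(`event:${position.eventId}`);
    return clients ? Array.from(clients) : [];
  }
}
```

Authorization runs only in `subscribe`, so `publish` reduces to two map lookups per record. That is what keeps the hot path free of Directus calls.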
## Failure modes

| Failure | Effect on durable storage | Effect on live channel |
|---|---|---|
| Processor crashes | Records pile up in Redis; Phase 3 [[failure-domains]] resumption picks them up | Live channel dies until recovery |
| Directus crashes | Unaffected (Processor writes direct to DB) | Existing connections keep working with cached permissions; **new subscriptions cannot be authorized** |
| Postgres crashes | Writes block; Redis buffers up to `MAXLEN` | Unaffected: fan-out is independent of DB state |
| Redis crashes | Whole pipeline stops | Stops with the pipeline |

The Directus-down case is the architecturally important one. Had writes been routed through Directus, ingestion would block. The chosen design keeps ingestion alive and only loses the ability to authorize *new* subscriptions, a much gentler failure.

## Multi-instance Processor

Phase 3 of [[processor]] adds a second instance for HA. Each instance has its own connected SPA clients. A position arriving on instance A wouldn't naturally reach a client connected to instance B unless the broadcast path crosses instances.

The clean shape: each Processor reads the [[redis-streams]] stream through **two consumer groups**:

- `processor`: the durable-write group (work-split: only one instance handles each record for the DB write).
- `live-broadcast-{instance_id}`: a per-instance fan-out group (every instance reads every record for fan-out).

DB writes deduplicate by virtue of the consumer-group split; live broadcast deduplicates by virtue of clients being connected to exactly one instance. The Processor's [[redis-streams]] consumer code structure should anticipate this even at single-instance pilot scale.

## Scale considerations

At pilot scale (≤500 devices per event, tens of viewers), the dominant costs are:

- **Connection-time auth round-trips to Directus**: a few hundred per minute at peak (race start). Trivial.
- **In-memory subscription matching**: `O(records × subscribers)`; 500 records/sec × 20 subscribers per event gives ~10k messages/sec of fan-out. Sustainable on Node.

When this becomes wrong:

- Sustained > ~10k WebSocket messages/sec total → consider sharding the broadcast path or extracting to a dedicated gateway service.
- Connection-time auth becomes a thundering herd at race start with thousands of viewers → cache the `/users/me` validation result for the connection's lifetime and shorten the Directus permission check via a token-with-scope pattern. Pilot scale doesn't need this; revisit when measured.
- Multi-data-center deployment → revisit the consumer-group fan-out strategy; per-region broadcast may be cleaner than global.

The escape hatch is well-defined: lift the WebSocket endpoint code out of the Processor into a standalone service that subscribes to the same `live-broadcast-*` consumer group. The Redis-stream-in / WebSocket-out contract doesn't change; only the host process does.

## What this means for adjacent components

- [[processor]] grows a public-facing WebSocket endpoint in addition to its existing Redis consumer and Postgres writer.
- [[directus]] keeps its built-in WebSocket subscriptions for tables it writes to. Its real-time delivery section no longer claims to broadcast direct writes from [[processor]]; that was a documented mistake, corrected in this revision.
- [[react-spa]] connects to two WebSocket endpoints: Directus at `/ws-business` for admin/business updates, Processor at `/ws-live` for the live position firehose. Both use the same-origin httpOnly Directus session cookie, so the live channel needs no separate auth artifact. Consumer-side throughput discipline (rAF coalescing of incoming positions before reducer dispatch) is documented in [[maps-architecture]]; without it, the per-message dispatch pattern observed in [[traccar-maps-architecture]] cascades through selectors and `setData` at every position arrival.
- The deploy stack publishes the Processor's WebSocket port (with TLS termination at a reverse proxy in front).

## Why not a single WebSocket endpoint

It would be tempting to fold everything into a single SPA-facing WebSocket, served by either Processor or Directus. Both fail:

- **Single Processor WebSocket** would require Processor to broadcast Directus-managed events, meaning Processor needs to subscribe to Directus's writes. That is exactly the problem we're avoiding for positions, in reverse.
- **Single Directus WebSocket** is the bridge-extension option; it loses plane separation.

Two endpoints, each serving the writes its plane manages, is the architecturally honest answer.

## Open questions

- **Auth caching strategy.** Currently every WebSocket connection round-trips to Directus's `/users/me` (~20ms over the internal network) to validate the forwarded session cookie. At pilot scale (≤500 viewers, low reconnect rate) this is trivial. Caching the validation per-connection-lifetime is the cheap optimisation; a stateless verification path (shared signing secret) is the heavier one. Defer until measurements demand it.
- **Subscription model.** Per-event, per-stage, per-organization, or arbitrary filter expressions? The simplest pilot model is "subscribe to one event by ID"; extensions land when SPA UX demands them.
- **Permission staleness.** If a user is removed from an organization mid-session, do their existing subscriptions silently keep delivering until reconnect? Either re-validate periodically, or accept "trust the session" for pilot.
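
To make the "re-validate periodically" option from the permission-staleness question concrete, one possible shape is a sweep over active subscriptions that re-runs the membership check only for entries older than a threshold. Nothing here is decided: `ActiveSubscription`, `revalidate`, `maxAgeMs`, and the `canView` callback are hypothetical names for illustration only.

```typescript
// Hypothetical bookkeeping for one active subscription.
interface ActiveSubscription {
  clientId: string;
  userId: string;
  eventId: string;
  lastCheckedAt: number; // epoch ms of the last successful permission check
}

interface SweepResult {
  kept: ActiveSubscription[];
  revoked: ActiveSubscription[];
}

// Re-checks only subscriptions whose last check is at least maxAgeMs old.
// Fresh entries are kept untouched; stale entries are either refreshed
// (lastCheckedAt = now) or moved to the revoked list.
function revalidate(
  subscriptions: ActiveSubscription[],
  now: number,
  maxAgeMs: number,
  canView: (userId: string, eventId: string) => boolean,
): SweepResult {
  const kept: ActiveSubscription[] = [];
  const revoked: ActiveSubscription[] = [];
  for (const sub of subscriptions) {
    if (now - sub.lastCheckedAt < maxAgeMs) {
      kept.push(sub); // still fresh, no lookup needed
    } else if (canView(sub.userId, sub.eventId)) {
      kept.push({ ...sub, lastCheckedAt: now });
    } else {
      revoked.push(sub);
    }
  }
  return { kept, revoked };
}
```

Running such a sweep on a timer would bound staleness to roughly `maxAgeMs` plus the timer interval, at the cost of one membership lookup per stale subscription per sweep. Accepting "trust the session" avoids that cost entirely.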