---
title: Live channel architecture
type: concept
created: 2026-05-01
updated: 2026-05-01
sources: []
tags: [architecture, realtime, websocket, telemetry-plane, decision]
---

# Live channel architecture

How live position data reaches the [[react-spa]] without violating [[plane-separation]] or coupling to [[directus]]'s failure domain.

## The question

The SPA needs sub-second updates of device positions for live race views. Three things are non-negotiable:

1. The [[processor]] hot path stays direct-to-database — no API hop, no event-loop pressure on Directus.
2. [[directus]] is not in the telemetry hot path (per [[plane-separation]]).
3. The live channel must be authenticated and authorization-aware — only users with permission to see an event's positions get pushed updates.

The naïve assumption is that [[directus]]'s built-in WebSocket subscriptions cover this. They do not. **Directus's subscription system only fires events for writes that go through its own `ItemsService`** (REST/GraphQL/Admin UI mutations). Direct `INSERT`s from the [[processor]] are invisible to subscribers — verified against Directus's documentation and source. The bridging assumption was wrong. This page documents how the platform actually delivers live positions.

## Options considered

| Option | Live channel works | Hot path stays fast | Plane separation | Failure domain |
|---|---|---|---|---|
| Route Processor writes through Directus REST | Yes (Directus broadcasts own writes) | Compromised — every write through Directus event loop | Compromised | Coupled — Directus down blocks ingestion |
| Bridge extension inside Directus (Redis → `WebSocketService.broadcast`) | Yes | Compromised — Directus runs the firehose consumer | Compromised | Coupled — Directus crash kills live channel |
| **Processor exposes its own WebSocket endpoint** (chosen) | Yes | Preserved | Preserved | Decoupled — Directus down blocks only new authorizations |

Option 3 wins because it preserves the architectural invariants that motivated [[plane-separation]] in the first place, while still leaning on [[directus]] for authentication and authorization.

## Chosen design

Two cleanly separated WebSocket channels, each playing to its strength:

```
┌─ Telemetry plane ──────────────────────────┐  ┌─ Business plane ───────────────────┐
│                                            │  │                                    │
│  Device → tcp-ingestion → Redis            │  │  SPA admin action                  │
│                             ↓              │  │        ↓                           │
│                         Processor          │  │  Directus REST                     │
│                        ↙         ↘         │  │        ↓                           │
│                Postgres       Processor's  │  │  Postgres + Directus's WebSocket   │
│                                WebSocket   │  │        ↓                           │
│                                    ↓       │  │  SPA (admin UI,                    │
│                                   SPA      │  │       leaderboard refresh,         │
│                               (live map)   │  │       timing edits)                │
└────────────────────────────────────────────┘  └────────────────────────────────────┘
```

- **High-volume telemetry** (positions): the Processor writes directly to Postgres and *also* fans out the same records to subscribed SPA clients over its own WebSocket endpoint. Stays in the telemetry plane end-to-end.
- **Low-volume domain events** (timing records, stage results, manual entries, configuration): written via Directus's REST API; Directus's built-in subscription system broadcasts them through its WebSocket. Stays in the business plane.

Each kind of data takes the path that fits it. No bridges, no extensions inside Directus.

## Authorization flow

The Processor's WebSocket endpoint validates connections through Directus, but never asks Directus per record.

```
1. SPA opens wss://processor.../live with a Directus-issued JWT.
2. Processor validates the JWT (round-trip to Directus's /users/me, or local
   verification with Directus's signing secret). Failure → close socket.
3. SPA sends {type: 'subscribe', event_id: 42}.
4. Processor calls Directus once: GET /items/events/42 with the user's token.
   200 → allow subscription, store {client → event_id} in memory.
   403 → reject subscription with a clear error.
5. For every position arriving on Redis, match against in-memory subscriptions
   and push to matched clients. Zero Directus calls in the hot path.
```

Connection-time auth is amortized over session lifetime. Permission re-checks happen on subscription change, not on every record. The hot path is bounded by `O(positions × subscribed-clients-per-event)` and runs entirely on the Processor's event loop with in-memory state.
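A minimal sketch of steps 2–5, assuming a Node Processor using the `ws` package and Node's global `fetch`. The port, the `DIRECTUS_URL` variable, the token-in-query-string convention, and the message shapes are illustrative assumptions, not the actual Processor code:

```typescript
// Hedged sketch only: the `ws` server, ?access_token=... convention, and
// message shapes are assumptions layered on the flow described above.
import { WebSocketServer, WebSocket } from 'ws';

const DIRECTUS_URL = process.env.DIRECTUS_URL ?? 'http://directus:8055'; // assumed

// In-memory subscription state: event_id → sockets subscribed to it.
const subscriptions = new Map<number, Set<WebSocket>>();

const wss = new WebSocketServer({ port: 8081 }); // port is illustrative

wss.on('connection', async (socket, request) => {
  // Step 2: validate the Directus-issued JWT once, at connection time.
  const token = new URL(request.url ?? '/', 'http://localhost').searchParams.get('access_token');
  const me = token
    ? await fetch(`${DIRECTUS_URL}/users/me`, {
        headers: { Authorization: `Bearer ${token}` },
      }).catch(() => null)
    : null;
  if (!me?.ok) return socket.close(4401, 'invalid token');

  socket.on('message', async (raw) => {
    let msg: { type?: string; event_id?: number };
    try { msg = JSON.parse(raw.toString()); } catch { return; }
    if (msg.type !== 'subscribe' || typeof msg.event_id !== 'number') return;

    // Step 4: one Directus permission check per subscription, not per record.
    const check = await fetch(`${DIRECTUS_URL}/items/events/${msg.event_id}`, {
      headers: { Authorization: `Bearer ${token}` },
    }).catch(() => null);
    if (!check?.ok) {
      return socket.send(JSON.stringify({ type: 'error', reason: 'not authorized' }));
    }

    let clients = subscriptions.get(msg.event_id);
    if (!clients) subscriptions.set(msg.event_id, (clients = new Set()));
    clients.add(socket);
  });

  socket.on('close', () => {
    for (const clients of subscriptions.values()) clients.delete(socket);
  });
});

// Step 5: the hot path. Called for each position consumed from Redis;
// in-memory matching only, zero Directus calls.
export function fanOut(position: { event_id: number }) {
  const clients = subscriptions.get(position.event_id);
  if (!clients) return;
  const payload = JSON.stringify({ type: 'position', data: position });
  for (const client of clients) {
    if (client.readyState === WebSocket.OPEN) client.send(payload);
  }
}
```

The key property is visible in the shape of the code: Directus appears only in connection and subscription handling, never in `fanOut`.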
## Failure modes

| Failure | Effect on durable storage | Effect on live channel |
|---|---|---|
| Processor crashes | Records pile up in Redis; Phase 3 [[failure-domains]] resumption picks them up | Live channel dies until recovery |
| Directus crashes | Unaffected (Processor writes direct to DB) | Existing connections keep working with cached permissions; **new subscriptions cannot be authorized** |
| Postgres crashes | Writes block; Redis buffers up to `MAXLEN` | Unaffected — fan-out is independent of DB state |
| Redis crashes | Whole pipeline stops | — |

The Directus-down case is the architecturally important one. Routing writes through Directus would mean ingestion blocks. The chosen design keeps ingestion alive and only loses the ability to authorize *new* subscriptions — a much gentler failure.

## Multi-instance Processor

Phase 3 of [[processor]] adds a second instance for HA. Each instance has its own connected SPA clients. A position arriving on instance A wouldn't naturally reach a client connected to instance B unless the broadcast path crosses instances.

The clean shape: each Processor reads the [[redis-streams]] stream on **two consumer groups**:

- `processor` — the durable-write group (work-split: only one instance handles each record for the DB write).
- `live-broadcast-{instance_id}` — a per-instance fan-out group (every instance reads every record for fan-out).

DB writes stay duplicate-free because the shared consumer group hands each record to exactly one instance; the live broadcast stays duplicate-free because each client is connected to exactly one instance. The Processor's [[redis-streams]] consumer code structure should anticipate this even at single-instance pilot scale.
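A sketch of the two-group split, assuming the node-redis v4 client; the stream key, group names, consumer names, and handler bodies are placeholders:

```typescript
// Hedged sketch: node-redis v4 client; stream key and names are assumptions.
import { createClient } from 'redis';

const STREAM = 'positions';                          // assumed stream key
const INSTANCE_ID = process.env.INSTANCE_ID ?? 'a';  // assumed instance name

// One blocking reader per consumer group, each on its own connection
// (a blocking XREADGROUP would otherwise stall a shared connection).
async function consume(
  group: string,
  handle: (fields: Record<string, string>) => Promise<void>,
) {
  const client = createClient();
  await client.connect();
  // Create the group if missing; tolerate BUSYGROUP when it already exists.
  await client.xGroupCreate(STREAM, group, '0', { MKSTREAM: true }).catch(() => {});

  for (;;) {
    const batches = await client.xReadGroup(
      group,
      `consumer-${INSTANCE_ID}`,
      { key: STREAM, id: '>' },
      { COUNT: 100, BLOCK: 5000 },
    );
    for (const { messages } of batches ?? []) {
      for (const { id, message } of messages) {
        await handle(message);
        await client.xAck(STREAM, group, id); // ack only after handling
      }
    }
  }
}

// Durable-write group: one name shared by all instances, so Redis hands each
// record to exactly one instance for the Postgres write.
void consume('processor', async (fields) => {
  /* write to Postgres */
});

// Fan-out group: per-instance name, so every instance sees every record and
// broadcasts to its own connected clients.
void consume(`live-broadcast-${INSTANCE_ID}`, async (fields) => {
  /* fan out to this instance's subscribed sockets */
});
```

At single-instance pilot scale both groups still run; the `processor` group simply never has a second member to split work with, which is what lets this structure carry over unchanged.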
## Scale considerations

At pilot scale (≤500 devices per event, tens of viewers), the dominant costs are:

- **Connection-time auth round-trips to Directus** — a few hundred per minute at peak (race start). Trivial.
- **In-memory subscription matching** — `O(records × subscribers)`; for 500 records/sec × 20 subscribers per event, ~10k messages/sec fan-out. Comfortably sustainable on Node.

When this becomes wrong:

- Sustained > ~10k WebSocket messages/sec total → consider sharding the broadcast path or extracting it to a dedicated gateway service.
- Connection-time auth becomes a thundering herd at race start with thousands of viewers → cache JWT verification locally and shorten the Directus permission check via a token-with-scope pattern.
- Multi-data-center deployment → revisit the consumer-group fan-out strategy; per-region broadcast may be cleaner than global.

The escape hatch is well-defined: lift the WebSocket endpoint code out of the Processor into a standalone service that subscribes to the same `live-broadcast-*` consumer group. The Redis-stream-in / WebSocket-out contract doesn't change; only the host process does.

## What this means for adjacent components

- [[processor]] grows a public-facing WebSocket endpoint in addition to its existing Redis consumer and Postgres writer.
- [[directus]] keeps its built-in WebSocket subscriptions for the tables it writes to. Its real-time delivery section no longer claims to broadcast direct writes from [[processor]] — that's a documented mistake corrected in this revision.
- [[react-spa]] connects to two WebSocket endpoints: Directus for admin/business updates, the Processor for the live position firehose. Same JWT-based auth on both. Consumer-side throughput discipline (rAF coalescing of incoming positions before reducer dispatch) is documented in [[maps-architecture]] — without it, the per-message dispatch pattern observed in [[traccar-maps-architecture]] cascades through selectors and `setData` on every position arrival.
- The deploy stack publishes the Processor's WebSocket port (with TLS termination at a reverse proxy in front).

## Why not a single WebSocket endpoint

It would be tempting to fold everything into a single SPA-facing WebSocket — either the Processor's or Directus's. Both fail:

- **Single Processor WebSocket** would require the Processor to broadcast Directus-managed events, meaning the Processor needs to subscribe to Directus's writes — which is exactly the problem we're avoiding for positions, in reverse.
- **Single Directus WebSocket** is the bridge-extension option; it loses plane separation.

Two endpoints, each serving the writes its plane manages, is the architecturally honest answer.

## Open questions

- **JWT validation strategy.** Round-trip to Directus's `/users/me` (no shared secret, ~20ms per connection) vs. local verification with Directus's signing key (no round-trip, but a secret to share). Pilot can start with the round-trip; revisit if connection rates climb. Both variants are sketched after this list.
- **Subscription model.** Per-event, per-stage, per-organization, or arbitrary filter expressions? The simplest pilot model is "subscribe to one event by ID"; extensions land when SPA UX demands them.
- **Permission staleness.** If a user is removed from an organization mid-session, do their existing subscriptions silently keep delivering until reconnect? Either re-validate periodically, or accept "trust the session" for the pilot.
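For the first open question, a sketch of both validation variants, assuming Node's global `fetch` and the `jsonwebtoken` package. Whether the deployed Directus signs access tokens with its `SECRET` value (HS256 by default) should be confirmed before relying on the local variant:

```typescript
// Hedged sketch of the two JWT validation strategies; env var names are
// illustrative, and the HS256/SECRET assumption must match the deployment.
import jwt from 'jsonwebtoken';

const DIRECTUS_URL = process.env.DIRECTUS_URL ?? 'http://directus:8055';

// Option A: round-trip. No shared secret; costs one HTTP call per connection.
export async function validateByRoundTrip(token: string): Promise<boolean> {
  const res = await fetch(`${DIRECTUS_URL}/users/me`, {
    headers: { Authorization: `Bearer ${token}` },
  }).catch(() => null);
  return res?.ok ?? false;
}

// Option B: local verification. No round-trip, but Directus's signing secret
// has to be shared with the Processor and rotated in both places.
export function validateLocally(token: string): boolean {
  try {
    jwt.verify(token, process.env.DIRECTUS_SECRET ?? ''); // throws on bad signature or expiry
    return true;
  } catch {
    return false;
  }
}
```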