---
title: Live channel architecture
type: concept
created: 2026-05-01
updated: 2026-05-03
sources: []
tags: [architecture, realtime, websocket, telemetry-plane, decision]
---

# Live channel architecture

How live position data reaches the [[react-spa]] without violating [[plane-separation]] or coupling to [[directus]]'s failure domain.

## The question

The SPA needs sub-second updates of device positions for live race views. Three things are non-negotiable:

1. The [[processor]] hot path stays direct-to-database: no API hop, no event-loop pressure on Directus.
2. [[directus]] is not in the telemetry hot path (per [[plane-separation]]).
3. The live channel must be authenticated and authorization-aware: only users with permission to see an event's positions get pushed updates.

The naïve assumption is that [[directus]]'s built-in WebSocket subscriptions cover this. They do not. **Directus's subscription system only fires events for writes that go through its own `ItemsService`** (REST/GraphQL/Admin UI mutations). Direct `INSERT`s from the [[processor]] are invisible to subscribers, as verified against Directus's documentation and source. The bridging assumption was wrong.

This page documents how the platform actually delivers live positions.

## Options considered

| Option | Live channel works | Hot path stays fast | Plane separation | Failure domain |
|---|---|---|---|---|
| Route Processor writes through Directus REST | Yes (Directus broadcasts own writes) | Compromised: every write goes through the Directus event loop | Compromised | Coupled: Directus down blocks ingestion |
| Bridge extension inside Directus (Redis → `WebSocketService.broadcast`) | Yes | Compromised: Directus runs the firehose consumer | Compromised | Coupled: Directus crash kills live channel |
| **Processor exposes its own WebSocket endpoint** (chosen) | Yes | Preserved | Preserved | Decoupled: Directus down blocks only new authorizations |

Option 3 (the Processor's own WebSocket endpoint) wins because it preserves the architectural invariants that motivated [[plane-separation]] in the first place, while still leaning on [[directus]] for authentication and authorization.

## Chosen design

Two cleanly separated WebSocket channels, each playing to its strength:

```
┌─ Telemetry plane ─────────────────────────┐  ┌─ Business plane ──────────────────────┐
│                                           │  │                                       │
│  Device → tcp-ingestion → Redis           │  │  SPA admin action                     │
│                             ↓             │  │        ↓                              │
│                         Processor         │  │  Directus REST                        │
│                        ↙         ↘        │  │        ↓                              │
│                Postgres      Processor's  │  │  Postgres + Directus's WebSocket      │
│                               WebSocket   │  │        ↓                              │
│                                   ↓       │  │  SPA (admin UI,                       │
│                                  SPA      │  │       leaderboard refresh,            │
│                              (live map)   │  │       timing edits)                   │
└───────────────────────────────────────────┘  └───────────────────────────────────────┘
```

- **High-volume telemetry** (positions): the Processor writes directly to Postgres and *also* fans out the same records to subscribed SPA clients over its own WebSocket endpoint. Stays in the telemetry plane end-to-end.
- **Low-volume domain events** (timing records, stage results, manual entries, configuration): written via Directus's REST API; Directus's built-in subscription system broadcasts them through its WebSocket. Stays in the business plane.

Each kind of data takes the path that fits it. No bridges, no extensions inside Directus.

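As a rough illustration of the telemetry-plane bullet above, the sketch below pushes one record down both paths. The `DevicePosition` fields and the two injected callbacks (`broadcast`, `writeToPostgres`) are hypothetical names chosen for this example, not the Processor's actual API, and the real write path is asynchronous. It only tries to show that a failed durable write need not stop fan-out.

```typescript
// Hypothetical shape of a decoded position record; field names are assumptions.
interface DevicePosition {
  deviceId: string;
  eventId: string;
  lat: number;
  lon: number;
  ts: number; // epoch milliseconds
}

// Both sinks are injected so the sketch stays independent of any
// Postgres or WebSocket library.
interface TelemetrySinks {
  broadcast: (p: DevicePosition) => void;       // live path
  writeToPostgres: (p: DevicePosition) => void; // durable path
}

interface DispatchResult {
  broadcasted: boolean;
  persisted: boolean;
}

// Send one record down both telemetry-plane paths. A failure on one path
// is reported in the result but does not prevent the other path from running.
function dispatchPosition(p: DevicePosition, sinks: TelemetrySinks): DispatchResult {
  const result: DispatchResult = { broadcasted: false, persisted: false };
  try {
    sinks.broadcast(p);
    result.broadcasted = true;
  } catch {
    // Live delivery is treated as best-effort in this sketch.
  }
  try {
    sinks.writeToPostgres(p);
    result.persisted = true;
  } catch {
    // Retry and buffering are out of scope here; the page says Redis
    // buffers records while Postgres is unavailable.
  }
  return result;
}
```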
## Authorization flow

The Processor's WebSocket endpoint validates connections through Directus, but never asks Directus per record. The handshake is **cookie-based and same-origin**; see [[processor-ws-contract]] §"Auth handshake" for the wire-level spec.

```
1. SPA opens wss://<origin>/ws-live (relative URL; same origin as Directus).
   Browser auto-attaches the httpOnly Directus session cookie.
2. Processor reads the entire Cookie header from the upgrade request and
   forwards it to Directus GET /users/me.
     200 → bind the connection to (id, role).
     401/403 → close the socket with code 4401 (unauthorized).
3. SPA sends {type: 'subscribe', topic: 'event:<uuid>'}.
4. Processor checks the user's organization_users membership against the
   event's organization_id (one cached lookup per event).
     200 → store {client → topic}; reply with the latest-position snapshot.
     403 → reply with {type: 'error', code: 'forbidden'}.
5. For every position arriving on Redis, match against in-memory subscriptions
   and push to matched clients. Zero Directus calls in the hot path.
```

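The connection-time part of the flow (steps 1 and 2) might look roughly like this. It is a sketch under assumptions: `buildValidationHeaders` and `decideHandshake` are hypothetical helpers, the `{ data: { id, role } }` response envelope is assumed rather than taken from [[processor-ws-contract]], and treating every non-200 status as a 4401 close is a simplification.

```typescript
// User fields the Processor binds to a connection in step 2.
interface BoundUser {
  id: string;
  role: string;
}

type HandshakeDecision =
  | { kind: "accept"; user: BoundUser }
  | { kind: "close"; code: number; reason: string };

// 4401 is the application-level close code named in the flow above.
const CLOSE_UNAUTHORIZED = 4401;

// Build the headers for the Directus GET /users/me call by forwarding the
// Cookie header from the upgrade request unchanged.
function buildValidationHeaders(cookieHeader: string | undefined): Record<string, string> | null {
  if (!cookieHeader || cookieHeader.trim() === "") return null; // nothing to forward
  return { Cookie: cookieHeader };
}

// Map the /users/me response onto an accept-or-close decision.
// `body` is the parsed JSON; the `data` envelope is an assumption of this sketch.
function decideHandshake(status: number, body: unknown): HandshakeDecision {
  if (status === 200) {
    const data = (body as { data?: { id?: unknown; role?: unknown } } | null)?.data;
    const id = data?.id;
    const role = data?.role;
    if (typeof id === "string" && typeof role === "string") {
      return { kind: "accept", user: { id, role } };
    }
    return { kind: "close", code: CLOSE_UNAUTHORIZED, reason: "malformed user payload" };
  }
  // 401/403 per the flow; this sketch also fails closed on any other status.
  return { kind: "close", code: CLOSE_UNAUTHORIZED, reason: "unauthorized" };
}
```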
Connection-time auth is amortized over the session lifetime. Permission re-checks happen on subscription change, not on every record. The hot path costs `O(positions × subscribed-clients-per-event)` and runs entirely on the Processor's event loop with in-memory state.

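Steps 3 to 5 reduce to an in-memory map from topic to connected clients. The following sketch shows one possible shape, assuming a hypothetical `SubscriptionRegistry` class, a `position` message type, and a per-client `send` callback; none of these names come from the wire-level spec.

```typescript
// Minimal position shape; field names are assumptions for this sketch.
interface LivePosition {
  eventId: string;
  deviceId: string;
  lat: number;
  lon: number;
}

type Send = (message: string) => void;

class SubscriptionRegistry {
  // topic ("event:<uuid>") → clientId → send callback
  private readonly byTopic = new Map<string, Map<string, Send>>();

  // Called only after the permission check in step 4 has passed.
  subscribe(clientId: string, topic: string, send: Send): void {
    let clients = this.byTopic.get(topic);
    if (!clients) {
      clients = new Map<string, Send>();
      this.byTopic.set(topic, clients);
    }
    clients.set(clientId, send);
  }

  // Drop a client from every topic, for example when its socket closes.
  removeClient(clientId: string): void {
    this.byTopic.forEach((clients, topic) => {
      clients.delete(clientId);
      if (clients.size === 0) this.byTopic.delete(topic);
    });
  }

  // Hot path: one map lookup, then one send per subscribed client.
  // Returns the number of clients the position was pushed to.
  publish(position: LivePosition): number {
    const clients = this.byTopic.get(`event:${position.eventId}`);
    if (!clients) return 0;
    const payload = JSON.stringify({ type: "position", ...position }); // serialize once
    clients.forEach((send) => send(payload));
    return clients.size;
  }
}
```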
> Earlier revisions of this page described JWT-in-URL auth. That predated [[react-spa]]'s switch to Directus SDK session-mode auth (see log entry 2026-05-02 "Auth-mode wiki realignment"). The current implementation is cookie-based; tokens never appear in WebSocket URLs (which would land them in proxy logs).

## Failure modes

| Failure | Effect on durable storage | Effect on live channel |
|---|---|---|
| Processor crashes | Records pile up in Redis; Phase 3 [[failure-domains]] resumption picks them up | Live channel dies until recovery |
| Directus crashes | Unaffected (Processor writes direct to DB) | Existing connections keep working with cached permissions; **new subscriptions cannot be authorized** |
| Postgres crashes | Writes block; Redis buffers up to `MAXLEN` | Unaffected: fan-out is independent of DB state |
| Redis crashes | Whole pipeline stops | Stops as well (no positions arrive to fan out) |

The Directus-down case is the architecturally important one. Routing writes through Directus would mean ingestion blocks whenever Directus is down. The chosen design keeps ingestion alive and only loses the ability to authorize *new* subscriptions, a much gentler failure.

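The Directus-down row can be illustrated with a small authorizer that answers from its cache when it can and reports `unavailable` otherwise. This is a sketch, not the implemented behavior: `CachedAuthorizer`, the `MembershipLookup` callback, and the per-user cache key are assumptions made for the example.

```typescript
type AuthzOutcome = "allowed" | "forbidden" | "unavailable";

// Hypothetical membership check backed by Directus. It returns a boolean
// answer, or null when Directus cannot be reached.
type MembershipLookup = (userId: string, eventId: string) => boolean | null;

class CachedAuthorizer {
  private readonly cache = new Map<string, boolean>();
  private readonly lookup: MembershipLookup;

  constructor(lookup: MembershipLookup) {
    this.lookup = lookup;
  }

  authorize(userId: string, eventId: string): AuthzOutcome {
    const key = `${userId}|${eventId}`;
    const cached = this.cache.get(key);
    // Cached answers keep working while Directus is down.
    if (cached !== undefined) return cached ? "allowed" : "forbidden";

    const answer = this.lookup(userId, eventId);
    if (answer === null) return "unavailable"; // new subscription cannot be authorized
    this.cache.set(key, answer);
    return answer ? "allowed" : "forbidden";
  }
}
```

Whether negative answers should be cached at all is a design choice this sketch does not settle.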
## Multi-instance Processor

Phase 3 of [[processor]] adds a second instance for HA. Each instance has its own connected SPA clients. A position arriving on instance A wouldn't naturally reach a client connected to instance B unless the broadcast path crosses instances.

The clean shape: each Processor reads the [[redis-streams]] stream through **two consumer groups**:

- `processor`: the durable-write group (work-split: only one instance handles each record for the DB write).
- `live-broadcast-{instance_id}`: a per-instance fan-out group (every instance reads every record for fan-out).

DB writes are not duplicated because the shared `processor` group delivers each record to exactly one instance; live broadcasts are not duplicated because each client is connected to exactly one instance. The Processor's [[redis-streams]] consumer code structure should anticipate this even at single-instance pilot scale.

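To see why two groups give both properties, the sketch below replaces Redis with a tiny in-memory stand-in. `FakeStream` is hypothetical and models only the consumer-group property that matters here: within one group, each entry is delivered to a single reader. The group names follow the list above.

```typescript
// In-memory stand-in for a Redis stream with consumer groups.
class FakeStream {
  private readonly entries: string[] = [];
  private readonly cursors = new Map<string, number>(); // group → next unread index

  add(entry: string): void {
    this.entries.push(entry);
  }

  // Rough analogue of reading new entries for a group with XREADGROUP:
  // whoever reads first within a group consumes the entry for that group.
  readGroup(group: string, count: number): string[] {
    const start = this.cursors.get(group) ?? 0;
    const batch = this.entries.slice(start, start + count);
    this.cursors.set(group, start + batch.length);
    return batch;
  }
}

const DURABLE_GROUP = "processor";
const broadcastGroup = (instanceId: string): string => `live-broadcast-${instanceId}`;

interface InstanceTally {
  dbWrites: string[];
  broadcasts: string[];
}

// One polling round for one Processor instance: a share of the durable
// work from the shared group, plus every record from its own fan-out group.
function pollOnce(stream: FakeStream, instanceId: string, tally: InstanceTally): void {
  tally.dbWrites.push(...stream.readGroup(DURABLE_GROUP, 1));
  tally.broadcasts.push(...stream.readGroup(broadcastGroup(instanceId), 10));
}
```

With two instances polling in turn, each record is written by one of them and broadcast by both.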
## Scale considerations

At pilot scale (≤500 devices per event, tens of viewers), the dominant costs are:

- **Connection-time auth round-trips to Directus**: a few hundred per minute at peak (race start). Trivial.
- **In-memory subscription matching**: `O(records × subscribers)`; at 500 records/sec × 20 subscribers per event, that is ~10k messages/sec of fan-out, a rate Node can sustain.

When these assumptions stop holding:

- Sustained > ~10k WebSocket messages/sec total → consider sharding the broadcast path or extracting to a dedicated gateway service.
- Connection-time auth becomes a thundering herd at race start with thousands of viewers → cache the `/users/me` validation result for the connection's lifetime and shorten the Directus permission check via a token-with-scope pattern. Pilot scale doesn't need this; revisit when measured.
- Multi-data-center deployment → revisit the consumer-group fan-out strategy; per-region broadcast may be cleaner than global.

The escape hatch is well defined: lift the WebSocket endpoint code out of the Processor into a standalone service that subscribes to the same `live-broadcast-*` consumer group. The Redis-stream-in / WebSocket-out contract doesn't change; only the host process does.

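The pilot fan-out estimate in this section is a single multiplication, sketched here with hypothetical helper names. Treating the ~10k messages/sec figure as a hard constant is a simplification of what the text calls a point to revisit.

```typescript
// Outbound WebSocket messages per second for one event:
// every incoming record is pushed once to every subscriber.
function fanOutRate(recordsPerSec: number, subscribers: number): number {
  return recordsPerSec * subscribers;
}

// Approximate threshold from the list above, used here as a fixed number.
const REVISIT_THRESHOLD_PER_SEC = 10_000;

function exceedsBudget(recordsPerSec: number, subscribers: number): boolean {
  return fanOutRate(recordsPerSec, subscribers) > REVISIT_THRESHOLD_PER_SEC;
}

// Pilot figures from this section: 500 records/sec and 20 subscribers.
const pilotRate = fanOutRate(500, 20); // 10_000 messages/sec
```

On these numbers the pilot estimate sits right at the threshold, so doubling the subscriber count would already cross it.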
## What this means for adjacent components

- [[processor]] grows a public-facing WebSocket endpoint in addition to its existing Redis consumer and Postgres writer.
- [[directus]] keeps its built-in WebSocket subscriptions for tables it writes to. Its real-time delivery section no longer claims to broadcast direct writes from [[processor]]; that was a documented mistake, corrected in this revision.
- [[react-spa]] connects to two WebSocket endpoints: Directus at `/ws-business` for admin/business updates, and the Processor at `/ws-live` for the live position firehose. Both use the same-origin httpOnly Directus session cookie, so the live channel needs no separate auth artifact. Consumer-side throughput discipline (rAF coalescing of incoming positions before reducer dispatch) is documented in [[maps-architecture]]; without it, the per-message dispatch pattern observed in [[traccar-maps-architecture]] cascades through selectors and `setData` at every position arrival.
- The deploy stack publishes the Processor's WebSocket port (with TLS termination at a reverse proxy in front).

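The rAF coalescing mentioned for [[react-spa]] could be sketched as follows. This is not the code from [[maps-architecture]]: `PositionCoalescer` and its callbacks are hypothetical, and the scheduler is injected so the example runs outside a browser.

```typescript
interface MapPosition {
  deviceId: string;
  lat: number;
  lon: number;
}

// Stands in for requestAnimationFrame so the sketch has no browser dependency.
type Schedule = (callback: () => void) => void;

class PositionCoalescer {
  private readonly latest = new Map<string, MapPosition>(); // newest position per device
  private flushScheduled = false;
  private readonly schedule: Schedule;
  private readonly dispatch: (batch: MapPosition[]) => void;

  constructor(schedule: Schedule, dispatch: (batch: MapPosition[]) => void) {
    this.schedule = schedule;
    this.dispatch = dispatch;
  }

  // Called once per WebSocket message; it only records the position and
  // makes sure a single flush is pending.
  push(position: MapPosition): void {
    this.latest.set(position.deviceId, position);
    if (this.flushScheduled) return;
    this.flushScheduled = true;
    this.schedule(() => this.flush());
  }

  // Runs once per scheduled frame: hand the whole batch to one dispatch.
  private flush(): void {
    this.flushScheduled = false;
    const batch: MapPosition[] = [];
    this.latest.forEach((p) => { batch.push(p); });
    this.latest.clear();
    if (batch.length > 0) this.dispatch(batch);
  }
}
```

In a browser the scheduler would be something like `(cb) => requestAnimationFrame(() => cb())`, and `dispatch` would issue one reducer action per frame instead of one per message.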
## Why not a single WebSocket endpoint

It would be tempting to fold everything into a single SPA-facing WebSocket, hosted by either the Processor or Directus. Both fail:

- **Single Processor WebSocket** would require the Processor to broadcast Directus-managed events, meaning the Processor needs to subscribe to Directus's writes, which is exactly the problem we're avoiding for positions, in reverse.
- **Single Directus WebSocket** is the bridge-extension option; it loses plane separation.

Two endpoints, each serving the writes its plane manages, is the architecturally honest answer.

## Open questions

- **Auth caching strategy.** Currently every WebSocket connection round-trips to Directus's `/users/me` (~20 ms over the internal network) to validate the forwarded session cookie. At pilot scale (≤500 viewers, low reconnect rate) this is trivial. Caching the validation per-connection-lifetime is the cheap optimisation; a stateless verification path (shared signing secret) is the heavier one. Defer until measurements demand it.
- **Subscription model.** Per-event, per-stage, per-organization, or arbitrary filter expressions? The simplest pilot model is "subscribe to one event by ID"; extensions land when SPA UX demands them.
- **Permission staleness.** If a user is removed from an organization mid-session, do their existing subscriptions silently keep delivering until reconnect? Either re-validate periodically, or accept "trust the session" for pilot.
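If the periodic re-validation option is chosen for permission staleness, one possible shape is a sweep over tracked subscriptions. Everything here is an assumption for illustration: the `TrackedSubscription` fields, the `stillAllowed` callback, and the idea of a fixed maximum age.

```typescript
interface TrackedSubscription {
  clientId: string;
  userId: string;
  topic: string;
  validatedAt: number; // epoch ms of the last successful permission check
}

// Re-check subscriptions whose last validation is at least `maxAgeMs` old
// and return the list to keep. `stillAllowed` is a hypothetical permission
// check, for example one backed by Directus.
function revalidate(
  subs: TrackedSubscription[],
  now: number,
  maxAgeMs: number,
  stillAllowed: (userId: string, topic: string) => boolean,
): TrackedSubscription[] {
  const kept: TrackedSubscription[] = [];
  for (const s of subs) {
    if (now - s.validatedAt < maxAgeMs) {
      kept.push(s); // still fresh, no check needed
    } else if (stillAllowed(s.userId, s.topic)) {
      kept.push({ ...s, validatedAt: now }); // refreshed
    }
    // Otherwise the subscription is dropped and delivery to it stops.
  }
  return kept;
}
```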