---
title: Live channel architecture
type: concept
created: 2026-05-01
updated: 2026-05-03
sources: []
tags: [architecture, realtime, websocket, telemetry-plane, decision]
---

# Live channel architecture

How live position data reaches the [[react-spa]] without violating [[plane-separation]] or coupling to [[directus]]'s failure domain.

## The question

The SPA needs sub-second updates of device positions for live race views. Three things are non-negotiable:

1. The [[processor]] hot path stays direct-to-database — no API hop, no event-loop pressure on Directus.
2. [[directus]] is not in the telemetry hot path (per [[plane-separation]]).
3. The live channel must be authenticated and authorization-aware — only users with permission to see an event's positions get pushed updates.

The naïve assumption is that [[directus]]'s built-in WebSocket subscriptions cover this. They do not. **Directus's subscription system only fires events for writes that go through its own `ItemsService`** (REST/GraphQL/Admin UI mutations). Direct `INSERT`s from the [[processor]] are invisible to subscribers; this was verified against Directus's documentation and source. The assumption that Directus could bridge the Processor's writes to the SPA was wrong.

This page documents how the platform actually delivers live positions.

## Options considered

| Option | Live channel works | Hot path stays fast | Plane separation | Failure domain |
|---|---|---|---|---|
| Route Processor writes through Directus REST | Yes (Directus broadcasts own writes) | Compromised — every write through Directus event loop | Compromised | Coupled — Directus down blocks ingestion |
| Bridge extension inside Directus (Redis → `WebSocketService.broadcast`) | Yes | Compromised — Directus runs the firehose consumer | Compromised | Coupled — Directus crash kills live channel |
| **Processor exposes its own WebSocket endpoint** (chosen) | Yes | Preserved | Preserved | Decoupled — Directus down blocks only new authorizations |

Option 3 wins because it preserves the architectural invariants that motivated [[plane-separation]] in the first place, while still leaning on [[directus]] for authentication and authorization.

## Chosen design

Two cleanly separated WebSocket channels, each playing to its strengths:

```
┌─ Telemetry plane ─────────────────────────┐    ┌─ Business plane ──────────────────────┐
│                                           │    │                                       │
│  Device → tcp-ingestion → Redis           │    │  SPA admin action                     │
│                              ↓            │    │                ↓                      │
│                          Processor        │    │           Directus REST               │
│                         ↙        ↘        │    │                ↓                      │
│                  Postgres    Processor's  │    │      Postgres + Directus's WebSocket  │
│                              WebSocket    │    │                ↓                      │
│                                  ↓        │    │           SPA (admin UI,              │
│                              SPA          │    │            leaderboard refresh,       │
│                              (live map)   │    │            timing edits)              │
└───────────────────────────────────────────┘    └───────────────────────────────────────┘
```

- **High-volume telemetry** (positions): the Processor writes directly to Postgres and *also* fans out the same records to subscribed SPA clients over its own WebSocket endpoint. Stays in the telemetry plane end-to-end.
- **Low-volume domain events** (timing records, stage results, manual entries, configuration): written via Directus's REST API; Directus's built-in subscription system broadcasts them through its WebSocket. Stays in the business plane.

Each kind of data takes the path that fits it. No bridges, no extensions inside Directus.

## Authorization flow

The Processor's WebSocket endpoint validates connections through Directus, but never asks Directus per record. The handshake is **cookie-based and same-origin** — see [[processor-ws-contract]] §"Auth handshake" for the wire-level spec.

```
1. SPA opens wss://<origin>/ws-live (relative URL; same origin as Directus).
   Browser auto-attaches the httpOnly Directus session cookie.
2. Processor reads the entire Cookie header from the upgrade request and
   forwards it to Directus GET /users/me.
   200 → bind the connection to (id, role).
   401/403 → close the socket with code 4401 (unauthorized).
3. SPA sends {type: 'subscribe', topic: 'event:<uuid>'}.
4. Processor checks the user's organization_users membership against the
   event's organization_id (one cached lookup per event).
   200 → store {client → topic}; reply with the latest-position snapshot.
   403 → reply with {type: 'error', code: 'forbidden'}.
5. For every position arriving on Redis, match against in-memory subscriptions
   and push to matched clients. Zero Directus calls in the hot path.
```
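Step 2 is the only point where the Processor consults Directus about a connection, so it is worth pinning down. The TypeScript below is a minimal sketch, not the Processor's actual handler: `authorizeUpgrade`, the `HttpGet` type, the `directusUrl` parameter, and the use of close code `1013` (Try Again Later) when Directus is unreachable are all assumptions, and [[processor-ws-contract]] remains the authority on the wire format. HTTP is injected as a function so the decision logic can be exercised without a network or a WebSocket library. The `{ data: { ... } }` envelope is how Directus's REST API wraps its responses.

```typescript
// Minimal sketch of handshake step 2 (all names hypothetical).
// The Processor forwards the browser's Cookie header verbatim to Directus
// and either binds the socket to the returned user or tells the caller to close it.

interface HttpResponse {
  status: number;
  json(): Promise<unknown>;
}

// Injected HTTP GET; in production this could wrap the global fetch.
type HttpGet = (url: string, headers: Record<string, string>) => Promise<HttpResponse>;

type HandshakeResult =
  | { kind: "bound"; userId: string; role: string | null }
  | { kind: "close"; code: number; reason: string };

const CLOSE_UNAUTHORIZED = 4401;    // application close code from the flow above
const CLOSE_TRY_AGAIN_LATER = 1013; // assumption: Directus unreachable or erroring

async function authorizeUpgrade(
  cookieHeader: string | undefined,
  directusUrl: string,
  httpGet: HttpGet,
): Promise<HandshakeResult> {
  const unauthorized: HandshakeResult = { kind: "close", code: CLOSE_UNAUTHORIZED, reason: "unauthorized" };
  const unavailable: HandshakeResult = { kind: "close", code: CLOSE_TRY_AGAIN_LATER, reason: "auth unavailable" };

  // No cookie means no session to validate.
  if (!cookieHeader) return unauthorized;

  // Forward the whole Cookie header; the Processor never parses the session itself.
  // A transport failure (Directus down) resolves to null instead of throwing.
  const res = await httpGet(`${directusUrl}/users/me?fields=id,role`, { cookie: cookieHeader })
    .catch(() => null);
  if (res === null) return unavailable;

  if (res.status === 401 || res.status === 403) return unauthorized;
  if (res.status !== 200) return unavailable;

  // Directus wraps REST payloads in a `data` envelope.
  const body = (await res.json()) as { data?: { id?: string; role?: string | null } };
  const userId = body.data?.id;
  if (!userId) return unauthorized;
  return { kind: "bound", userId, role: body.data?.role ?? null };
}
```

In a real server this would run in the HTTP upgrade handler. A `close` result is delivered by completing the upgrade and then closing with the returned code, because WebSocket close codes only exist once the socket is open.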

Connection-time auth is amortized over session lifetime. Permission re-checks happen on subscription change, not on every record. The hot path is bounded by `O(positions × subscribed-clients-per-event)` and runs entirely on the Processor's event loop with in-memory state.
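Steps 3 to 5 reduce to two in-memory maps, and a sketch makes the hot-path property concrete: every lookup happens in `subscribe`, while `fanOut` only reads a `Map`. It is hypothetical throughout. `LiveHub`, the `Client` and `Position` shapes, the injected `lookupOrg`/`isMember` functions, the `bad_topic` error code, and the `{ type: "position" }` envelope are illustrative, and the latest-position snapshot reply is omitted.

```typescript
// Hypothetical sketch of steps 3-5: permission check on subscribe, in-memory fan-out.

interface Client {
  userId: string;
  send(message: string): void;
}

interface Position {
  eventId: string;
  deviceId: string;
  lat: number;
  lon: number;
  ts: number;
}

type OrgLookup = (eventId: string) => Promise<string>; // event -> organization_id
type MembershipCheck = (userId: string, orgId: string) => Promise<boolean>;

const TOPIC_RE = /^event:([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})$/i;

class LiveHub {
  private readonly byEvent = new Map<string, Set<Client>>();
  private readonly orgByEvent = new Map<string, Promise<string>>();
  private readonly lookupOrg: OrgLookup;
  private readonly isMember: MembershipCheck;

  constructor(lookupOrg: OrgLookup, isMember: MembershipCheck) {
    this.lookupOrg = lookupOrg;
    this.isMember = isMember;
  }

  // Steps 3-4: runs once per subscription change, never per record.
  async subscribe(client: Client, topic: string): Promise<boolean> {
    const match = TOPIC_RE.exec(topic);
    if (!match) {
      client.send(JSON.stringify({ type: "error", code: "bad_topic" }));
      return false;
    }
    const eventId = match[1].toLowerCase();

    // One cached lookup per event; a failed lookup is evicted so it can be retried.
    let org = this.orgByEvent.get(eventId);
    if (!org) {
      org = this.lookupOrg(eventId).catch((err) => {
        this.orgByEvent.delete(eventId);
        throw err;
      });
      this.orgByEvent.set(eventId, org);
    }

    if (!(await this.isMember(client.userId, await org))) {
      client.send(JSON.stringify({ type: "error", code: "forbidden" }));
      return false;
    }
    let subscribers = this.byEvent.get(eventId);
    if (!subscribers) {
      subscribers = new Set<Client>();
      this.byEvent.set(eventId, subscribers);
    }
    subscribers.add(client);
    return true;
  }

  // Called when a socket closes.
  unsubscribeAll(client: Client): void {
    this.byEvent.forEach((subscribers, eventId) => {
      subscribers.delete(client);
      if (subscribers.size === 0) this.byEvent.delete(eventId);
    });
  }

  // Step 5, the hot path: O(subscribers of this event) per record, no I/O.
  fanOut(position: Position): number {
    const subscribers = this.byEvent.get(position.eventId.toLowerCase());
    if (!subscribers) return 0;
    const payload = JSON.stringify({ type: "position", data: position }); // serialize once
    subscribers.forEach((client) => client.send(payload));
    return subscribers.size;
  }
}
```

Caching the organization lookup as a promise means concurrent subscriptions to a cold event share one query rather than racing to issue several.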

> Earlier revisions of this page described JWT-in-URL auth. That predated [[react-spa]]'s switch to Directus SDK session-mode auth (see log entry 2026-05-02 "Auth-mode wiki realignment"). The current implementation is cookie-based; tokens never appear in WebSocket URLs (which would land them in proxy logs).

## Failure modes

| Failure | Effect on durable storage | Effect on live channel |
|---|---|---|
| Processor crashes | Records pile up in Redis; Phase 3 [[failure-domains]] resumption picks them up | Live channel dies until recovery |
| Directus crashes | Unaffected (Processor writes direct to DB) | Existing connections keep working with cached permissions; **new subscriptions cannot be authorized** |
| Postgres crashes | Writes block; Redis buffers up to `MAXLEN` | Unaffected — fan-out is independent of DB state |
| Redis crashes | Whole pipeline stops; nothing new reaches Postgres | Stops too; no records arrive to fan out |

The Directus-down case is the architecturally important one. Routing writes through Directus would mean ingestion blocks. The chosen design keeps ingestion alive and only loses the ability to authorize *new* subscriptions — a much gentler failure.

## Multi-instance Processor

Phase 3 of [[processor]] adds a second instance for HA. Each instance has its own connected SPA clients. A position arriving on instance A wouldn't naturally reach a client connected to instance B unless the broadcast path crosses instances.

The clean shape: each Processor reads the [[redis-streams]] stream on **two consumer groups**:

- `processor` — the durable-write group (work-split: only one instance handles each record for the DB write).
- `live-broadcast-{instance_id}` — a per-instance fan-out group (every instance reads every record for fan-out).

DB writes deduplicate by virtue of the consumer-group split; live broadcast deduplicates by virtue of clients being connected to exactly one instance. The Processor's [[redis-streams]] consumer code structure should anticipate this even at single-instance pilot scale.
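One way to make the consumer code anticipate this is to derive both groups' Redis commands from the instance ID. The sketch below only builds argument arrays in valid `XGROUP CREATE` and `XREADGROUP` syntax, so it stays independent of any particular Redis client. The stream key `telemetry:positions`, the `COUNT`/`BLOCK` defaults, and the start IDs are assumptions. Two choices are worth noting: the live group starts at `$` and reads with `NOACK`, so best-effort fan-out never accumulates a pending-entries list, while the durable group keeps normal acknowledgement so the DB write stays at-least-once.

```typescript
// Hypothetical helper: Redis command argv for the two consumer groups described above.
// Pass each array to the Redis client's raw-command method; nothing here talks to Redis.

const STREAM = "telemetry:positions"; // assumption: the real stream key is not named on this page

interface GroupCommands {
  create: string[]; // XGROUP CREATE; a BUSYGROUP error just means the group already exists
  read: string[];   // XREADGROUP for the poll loop
}

// Shared group: Redis delivers each new entry to one consumer in the group,
// so exactly one instance performs the Postgres write for it.
function durableGroup(instanceId: string, count = 100, blockMs = 2000): GroupCommands {
  return {
    // Start at "0" (assumption) so entries buffered before the group existed are still written.
    create: ["XGROUP", "CREATE", STREAM, "processor", "0", "MKSTREAM"],
    read: ["XREADGROUP", "GROUP", "processor", instanceId,
           "COUNT", String(count), "BLOCK", String(blockMs), "STREAMS", STREAM, ">"],
  };
}

// Per-instance group: each instance owns its group, so every instance sees every entry.
function liveGroup(instanceId: string, count = 100, blockMs = 2000): GroupCommands {
  const group = `live-broadcast-${instanceId}`;
  return {
    // Start at "$": replaying history to live viewers has no value.
    create: ["XGROUP", "CREATE", STREAM, group, "$", "MKSTREAM"],
    // NOACK: fan-out is best-effort, so entries never enter this group's pending list.
    read: ["XREADGROUP", "GROUP", group, instanceId,
           "COUNT", String(count), "BLOCK", String(blockMs), "NOACK", "STREAMS", STREAM, ">"],
  };
}
```

Because `BLOCK` ties up a connection, each read loop needs its own Redis connection. Recovering the durable group's unacknowledged entries after a crash (the Phase 3 resumption in [[failure-domains]]) is a separate `XAUTOCLAIM` or read-from-`0` step and is not shown.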

## Scale considerations

At pilot scale (≤500 devices per event, tens of viewers), the dominant costs are:

- **Connection-time auth round-trips to Directus** — a few hundred per minute peak (race start). Trivial.
- **In-memory subscription matching** — `O(records × subscribers)`; for 500 records/sec × 20 subscribers per event, ~10k messages/sec fan-out. Sustained on Node.

When this becomes wrong:

- Sustained > ~10k WebSocket messages/sec total → consider sharding the broadcast path or extracting to a dedicated gateway service.
- Connection-time auth becomes a thundering herd at race start with thousands of viewers → cache the `/users/me` validation result for the connection's lifetime and shorten the Directus permission check via a token-with-scope pattern. Pilot scale doesn't need this; revisit when measured.
- Multi-data-center deployment → revisit the consumer-group fan-out strategy; per-region broadcast may be cleaner than global.
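If the thundering herd does materialize, one cheap shape for the cache is memoization keyed by session: concurrent handshakes carrying the same Cookie header share one in-flight `/users/me` request, and reconnects within a short TTL skip the round-trip entirely. This is a hypothetical sketch. `SessionCache`, its TTL, and keying on the raw Cookie header are assumptions, and any TTL widens the permission-staleness window listed under Open questions by up to that amount.

```typescript
// Hypothetical: memoize session validation by Cookie header, with a short TTL.
// Concurrent callers with the same key share one in-flight promise (request coalescing).

type Validate<T> = (cookieHeader: string) => Promise<T>;

class SessionCache<T> {
  private readonly entries = new Map<string, { value: Promise<T>; expiresAt: number }>();
  private readonly validate: Validate<T>;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(validate: Validate<T>, ttlMs: number, now: () => number = Date.now) {
    this.validate = validate;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(cookieHeader: string): Promise<T> {
    const hit = this.entries.get(cookieHeader);
    if (hit && hit.expiresAt > this.now()) return hit.value;

    const value = this.validate(cookieHeader);
    this.entries.set(cookieHeader, { value, expiresAt: this.now() + this.ttlMs });
    // Never cache failures: drop the entry so the next handshake retries.
    void value.catch(() => {
      if (this.entries.get(cookieHeader)?.value === value) this.entries.delete(cookieHeader);
    });
    return value;
  }

  // Drop expired entries; call periodically so the map does not grow without bound.
  prune(): void {
    const t = this.now();
    this.entries.forEach((entry, key) => {
      if (entry.expiresAt <= t) this.entries.delete(key);
    });
  }
}
```

Pair it with a `validate` function that rejects on transport errors rather than resolving to an error value, so a Directus outage is never cached.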

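The escape hatch is well-defined: lift the WebSocket endpoint code out of the Processor into a standalone service that reads the stream through its own `live-broadcast-{instance_id}` consumer group, just as each Processor instance does. Joining an existing instance's group would split records between members instead of fanning them out. The Redis-stream-in / WebSocket-out contract doesn't change; only the host process does.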

## What this means for adjacent components

- [[processor]] grows a public-facing WebSocket endpoint in addition to its existing Redis consumer and Postgres writer.
- [[directus]] keeps its built-in WebSocket subscriptions for tables it writes to. Its real-time delivery section no longer claims to broadcast direct writes from [[processor]] — that's a documented mistake corrected in this revision.
- [[react-spa]] connects to two WebSocket endpoints: Directus at `/ws-business` for admin/business updates, Processor at `/ws-live` for live position firehose. Same-origin httpOnly Directus session cookie on both — no separate auth artifact for the live channel. Consumer-side throughput discipline (rAF coalescing of incoming positions before reducer dispatch) is documented in [[maps-architecture]] — without it the per-message dispatch pattern observed in [[traccar-maps-architecture]] cascades through selectors and `setData` at every position arrival.
- The deploy stack publishes the Processor's WebSocket port (with TLS termination at a reverse proxy in front).
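The consumer-side coalescing referenced above can be sketched as follows: keep only the latest position per device between frames and dispatch one batch per animation frame instead of one action per message. This is a hypothetical sketch rather than the [[maps-architecture]] implementation; `PositionCoalescer`, the `LivePosition` shape, the `dispatch` callback, and the injected `schedule` function are assumptions. In the browser `schedule` would be `requestAnimationFrame`; injecting it keeps the sketch testable outside a browser.

```typescript
// Hypothetical SPA-side coalescer: latest position per device, flushed once per frame.

interface LivePosition {
  deviceId: string;
  lat: number;
  lon: number;
  ts: number;
}

type Schedule = (callback: () => void) => unknown; // e.g. requestAnimationFrame

class PositionCoalescer {
  private pending = new Map<string, LivePosition>();
  private scheduled = false;
  private readonly dispatch: (batch: LivePosition[]) => void;
  private readonly schedule: Schedule;

  constructor(dispatch: (batch: LivePosition[]) => void, schedule: Schedule) {
    this.dispatch = dispatch;
    this.schedule = schedule;
  }

  // Called for every WebSocket message; O(1) and never dispatches directly.
  push(position: LivePosition): void {
    const prev = this.pending.get(position.deviceId);
    // Out-of-order guard: keep the newer fix if two arrive within one frame.
    if (!prev || position.ts >= prev.ts) this.pending.set(position.deviceId, position);
    if (!this.scheduled) {
      this.scheduled = true;
      this.schedule(() => this.flush());
    }
  }

  private flush(): void {
    this.scheduled = false;
    if (this.pending.size === 0) return;
    const batch = Array.from(this.pending.values());
    this.pending = new Map<string, LivePosition>();
    this.dispatch(batch); // one reducer action per frame, however many messages arrived
  }
}
```

The socket's `onmessage` handler would parse each message and call `push`; the message envelope itself is defined in [[processor-ws-contract]], not here.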

## Why not a single WebSocket endpoint

It would be tempting to fold everything into a single SPA-facing WebSocket — either Processor or Directus. Both fail:

- **Single Processor WebSocket** would require Processor to broadcast Directus-managed events, meaning Processor needs to subscribe to Directus's writes — which is exactly the problem we're avoiding for positions, in reverse.
- **Single Directus WebSocket** is the bridge-extension option; it loses plane separation.

Two endpoints, each serving the writes its plane manages, is the architecturally honest answer.

## Open questions

- **Auth caching strategy.** Currently every WebSocket connection round-trips to Directus's `/users/me` (~20ms over the internal network) to validate the forwarded session cookie. At pilot scale (≤500 viewers, low reconnect rate) this is trivial. Caching the validation per-connection-lifetime is the cheap optimisation; a stateless verification path (shared signing secret) is the heavier one. Defer until measurements demand it.
- **Subscription model.** Per-event, per-stage, per-organization, or arbitrary filter expressions? The simplest pilot model is "subscribe to one event by ID"; extensions land when SPA UX demands them.
- **Permission staleness.** If a user is removed from an organization mid-session, do their existing subscriptions silently keep delivering until reconnect? Either re-validate periodically, or accept "trust the session" for pilot.
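Of the two options in the last bullet, periodic re-validation is cheap to sketch: on a timer, re-run the membership check for every active subscription and evict those that now fail. Everything here is hypothetical (`revalidate`, the `Subscription` shape, `isMember`, and the `evict` callback). It bounds staleness by the sweep interval, checks each distinct user/organization pair once per sweep, and evicts only on a definite "not a member". If the check itself fails, for example because Directus is down, the subscription is kept, which matches the Directus-down row in the failure-mode table.

```typescript
// Hypothetical periodic re-validation of live subscriptions.

interface Subscription {
  clientId: string;
  userId: string;
  orgId: string;
}

type IsMember = (userId: string, orgId: string) => Promise<boolean>;

async function revalidate(
  subs: Subscription[],
  isMember: IsMember,
  evict: (sub: Subscription) => void,
): Promise<{ kept: number; evicted: number; unknown: number }> {
  // One check per distinct (user, org) pair; a failed check maps to undefined.
  const checks = new Map<string, Promise<boolean | undefined>>();
  const check = (s: Subscription): Promise<boolean | undefined> => {
    const key = `${s.userId}\u0000${s.orgId}`;
    let p = checks.get(key);
    if (!p) {
      p = isMember(s.userId, s.orgId).then((ok) => ok, () => undefined);
      checks.set(key, p);
    }
    return p;
  };

  const outcomes = await Promise.all(subs.map(check));
  let kept = 0;
  let evicted = 0;
  let unknown = 0;
  outcomes.forEach((ok, i) => {
    if (ok === undefined) unknown++; // cannot decide now: keep the subscription
    else if (ok) kept++;
    else {
      evicted++;
      evict(subs[i]); // membership revoked mid-session
    }
  });
  return { kept, evicted, unknown };
}
```

Run from `setInterval`, the interval becomes the explicit staleness bound the pilot accepts, in exchange for one membership check per distinct user/organization pair per sweep.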