docs: update log and wiki entries for Phase 1.5 live broadcast implementation and incident resolution

2026-05-03 19:33:15 +02:00
parent 875327bed7
commit 6ef4e9e9ee
3 changed files with 70 additions and 20 deletions
+18 -12
@@ -2,7 +2,7 @@
title: Live channel architecture
type: concept
created: 2026-05-01
-updated: 2026-05-01
+updated: 2026-05-03
sources: []
tags: [architecture, realtime, websocket, telemetry-plane, decision]
---
@@ -59,22 +59,28 @@ Each kind of data takes the path that fits it. No bridges, no extensions inside
## Authorization flow
-The Processor's WebSocket endpoint validates connections through Directus, but never asks Directus per record.
+The Processor's WebSocket endpoint validates connections through Directus, but never asks Directus per record. The handshake is **cookie-based and same-origin**; see [[processor-ws-contract]] §"Auth handshake" for the wire-level spec.
```
-1. SPA opens wss://processor.../live with a Directus-issued JWT.
-2. Processor validates the JWT (round-trip to Directus's /users/me, or local
-   verification with Directus's signing secret). Failure → close socket.
-3. SPA sends {type: 'subscribe', event_id: 42}.
-4. Processor calls Directus once: GET /items/events/42 with the user's token.
-     200 → allow subscription, store {client → event_id} in memory.
-     403 → reject subscription with a clear error.
+1. SPA opens wss://<origin>/ws-live (relative URL; same origin as Directus).
+   Browser auto-attaches the httpOnly Directus session cookie.
+2. Processor reads the entire Cookie header from the upgrade request and
+   forwards it to Directus GET /users/me.
+     200 → bind the connection to (id, role).
+     401/403 → close the socket with code 4401 (unauthorized).
+3. SPA sends {type: 'subscribe', topic: 'event:<uuid>'}.
+4. Processor checks the user's organization_users membership against the
+   event's organization_id (one cached lookup per event).
+     200 → store {client → topic}; reply with the latest-position snapshot.
+     403 → reply with {type: 'error', code: 'forbidden'}.
5. For every position arriving on Redis, match against in-memory subscriptions
and push to matched clients. Zero Directus calls in the hot path.
```
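As a rough illustration of the decision points in steps 2 and 4 of the cookie-based flow, the sketch below maps a `/users/me` response to a connection outcome and a membership check to a subscribe reply. Every name here (`handshakeOutcome`, `subscribeReply`, the `subscribed` reply type) is hypothetical and not taken from the Processor's code. Closing with 1011 on an unexpected upstream status is an assumption; the flow only specifies 4401 for 401/403.

```typescript
// Sketch only: all names are illustrative, not the Processor's actual API.
type DirectusUser = { id: string; role: string | null };

type HandshakeOutcome =
  | { kind: "bind"; user: DirectusUser }
  | { kind: "close"; code: number };

const CLOSE_UNAUTHORIZED = 4401; // taken from the flow: 401/403 from /users/me
const CLOSE_INTERNAL_ERROR = 1011; // assumption: RFC 6455 "internal error" for any other status

// Step 2: turn the /users/me response into a connection decision.
function handshakeOutcome(status: number, user?: DirectusUser): HandshakeOutcome {
  if (status === 200 && user) return { kind: "bind", user };
  if (status === 401 || status === 403) return { kind: "close", code: CLOSE_UNAUTHORIZED };
  return { kind: "close", code: CLOSE_INTERNAL_ERROR };
}

type SubscribeReply =
  | { type: "subscribed"; topic: string } // hypothetical success shape
  | { type: "error"; code: "forbidden" };

// Step 4: allow the subscription only when the user belongs to the event's organization.
function subscribeReply(
  topic: string,
  userOrgIds: ReadonlySet<string>,
  eventOrgId: string,
): SubscribeReply {
  return userOrgIds.has(eventOrgId)
    ? { type: "subscribed", topic }
    : { type: "error", code: "forbidden" };
}
```

Keeping these two decisions as pure functions would let the upgrade handler and the message handler stay thin; the actual network call to Directus and the cached event lookup sit outside this sketch.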
Connection-time auth is amortized over session lifetime. Permission re-checks happen on subscription change, not on every record. The hot path is bounded by `O(positions × subscribed-clients-per-event)` and runs entirely on the Processor's event loop with in-memory state.
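The in-memory matching described above could look roughly like the following. This is a minimal sketch, assuming a topic-keyed map of client IDs; `SubscriptionRegistry`, the `Position` shape, and the use of string client IDs are illustrative assumptions rather than the Processor's actual data structures. Only the `event:<uuid>` topic format comes from the flow.

```typescript
// Sketch only: hypothetical types and names.
type Position = { eventId: string; deviceId: string; lat: number; lon: number };

class SubscriptionRegistry {
  // topic ("event:<uuid>") -> IDs of clients subscribed to it
  private readonly clientsByTopic = new Map<string, Set<string>>();

  subscribe(clientId: string, topic: string): void {
    let clients = this.clientsByTopic.get(topic);
    if (!clients) {
      clients = new Set<string>();
      this.clientsByTopic.set(topic, clients);
    }
    clients.add(clientId);
  }

  unsubscribe(clientId: string, topic: string): void {
    const clients = this.clientsByTopic.get(topic);
    if (!clients) return;
    clients.delete(clientId);
    // Drop empty topics so the map does not grow without bound.
    if (clients.size === 0) this.clientsByTopic.delete(topic);
  }

  // Hot path: one map lookup per position, then one entry per subscribed client.
  // No Directus call happens here.
  recipients(position: Position): string[] {
    const clients = this.clientsByTopic.get(`event:${position.eventId}`);
    return clients ? Array.from(clients) : [];
  }
}
```

With this shape, the per-position work is proportional to the number of clients subscribed to that position's event, which is the bound stated above.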
+> Earlier revisions of this page described JWT-in-URL auth. That predated [[react-spa]]'s switch to Directus SDK session-mode auth (see log entry 2026-05-02 "Auth-mode wiki realignment"). The current implementation is cookie-based; tokens never appear in WebSocket URLs (which would land them in proxy logs).
## Failure modes
| Failure | Effect on durable storage | Effect on live channel |
@@ -107,7 +113,7 @@ At pilot scale (≤500 devices per event, tens of viewers), the dominant costs a
When this becomes wrong:
- Sustained > ~10k WebSocket messages/sec total → consider sharding the broadcast path or extracting to a dedicated gateway service.
-- Connection-time auth becomes a thundering herd at race start with thousands of viewers → cache JWT verification locally and shorten the Directus permission check via a token-with-scope pattern.
+- Connection-time auth becomes a thundering herd at race start with thousands of viewers → cache the `/users/me` validation result for the connection's lifetime and shorten the Directus permission check via a token-with-scope pattern. Pilot scale doesn't need this; revisit when measured.
- Multi-data-center deployment → revisit the consumer-group fan-out strategy; per-region broadcast may be cleaner than global.
The escape hatch is well-defined: lift the WebSocket endpoint code out of the Processor into a standalone service that subscribes to the same `live-broadcast-*` consumer group. The Redis-stream-in / WebSocket-out contract doesn't change; only the host process does.
@@ -116,7 +122,7 @@ The escape hatch is well-defined: lift the WebSocket endpoint code out of the Pr
- [[processor]] grows a public-facing WebSocket endpoint in addition to its existing Redis consumer and Postgres writer.
- [[directus]] keeps its built-in WebSocket subscriptions for tables it writes to. Its real-time delivery section no longer claims to broadcast direct writes from [[processor]] — that's a documented mistake corrected in this revision.
-- [[react-spa]] connects to two WebSocket endpoints: Directus for admin/business updates, Processor for live position firehose. Same JWT-based auth on both. Consumer-side throughput discipline (rAF coalescing of incoming positions before reducer dispatch) is documented in [[maps-architecture]]; without it the per-message dispatch pattern observed in [[traccar-maps-architecture]] cascades through selectors and `setData` at every position arrival.
+- [[react-spa]] connects to two WebSocket endpoints: Directus at `/ws-business` for admin/business updates, Processor at `/ws-live` for live position firehose. Same-origin httpOnly Directus session cookie on both, with no separate auth artifact for the live channel. Consumer-side throughput discipline (rAF coalescing of incoming positions before reducer dispatch) is documented in [[maps-architecture]]; without it the per-message dispatch pattern observed in [[traccar-maps-architecture]] cascades through selectors and `setData` at every position arrival.
- The deploy stack publishes the Processor's WebSocket port (with TLS termination at a reverse proxy in front).
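The rAF coalescing mentioned for [[react-spa]] can be sketched as below. This is an illustration, not the implementation documented in [[maps-architecture]]: `PositionCoalescer`, the `LivePosition` shape, and the choice to keep only the newest position per device within a frame are all assumptions. The scheduler is injected so that a browser can pass a `requestAnimationFrame` wrapper while other environments pass a timer.

```typescript
// Sketch only: hypothetical names; "newest position per device wins" is an assumption.
type LivePosition = { deviceId: string; lat: number; lon: number; ts: number };

class PositionCoalescer {
  private pending = new Map<string, LivePosition>();
  private scheduled = false;
  private readonly dispatch: (batch: LivePosition[]) => void;
  private readonly schedule: (flush: () => void) => void;

  constructor(
    dispatch: (batch: LivePosition[]) => void,
    schedule: (flush: () => void) => void,
  ) {
    this.dispatch = dispatch;
    this.schedule = schedule;
  }

  // Called once per incoming WebSocket message.
  push(position: LivePosition): void {
    this.pending.set(position.deviceId, position);
    if (!this.scheduled) {
      this.scheduled = true;
      this.schedule(() => this.flush());
    }
  }

  // Called once per scheduled frame: a single dispatch for everything buffered.
  private flush(): void {
    this.scheduled = false;
    const batch = Array.from(this.pending.values());
    this.pending = new Map();
    if (batch.length > 0) this.dispatch(batch);
  }
}
```

In a browser the scheduler argument could be `(flush) => requestAnimationFrame(() => flush())`, so the reducer sees at most one batch per frame regardless of how many positions arrive.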
## Why not a single WebSocket endpoint
@@ -130,6 +136,6 @@ Two endpoints, each serving the writes its plane manages, is the architecturally
## Open questions
-- **JWT validation strategy.** Round-trip to Directus's `/users/me` (no shared secret, ~20ms per connection) vs. local verification with Directus's signing key (no round-trip, but a secret to share). Pilot can start with round-trip; revisit if connection rates climb.
+- **Auth caching strategy.** Currently every WebSocket connection round-trips to Directus's `/users/me` (~20ms over the internal network) to validate the forwarded session cookie. At pilot scale (≤500 viewers, low reconnect rate) this is trivial. Caching the validation per-connection-lifetime is the cheap optimisation; a stateless verification path (shared signing secret) is the heavier one. Defer until measurements demand it.
- **Subscription model.** Per-event, per-stage, per-organization, or arbitrary filter expressions? The simplest pilot model is "subscribe to one event by ID"; extensions land when SPA UX demands them.
- **Permission staleness.** If a user is removed from an organization mid-session, do their existing subscriptions silently keep delivering until reconnect? Either re-validate periodically, or accept "trust the session" for pilot.