---
title: Live channel architecture
type: concept
created: 2026-05-01
updated: 2026-05-01
sources: []
tags: [architecture, realtime, websocket, telemetry-plane, decision]
---

# Live channel architecture

How live position data reaches the [[react-spa]] without violating [[plane-separation]] or coupling to [[directus]]'s failure domain.

## The question

The SPA needs sub-second updates of device positions for live race views. Three things are non-negotiable:

1. The [[processor]] hot path stays direct-to-database — no API hop, no event-loop pressure on Directus.
2. [[directus]] is not in the telemetry hot path (per [[plane-separation]]).
3. The live channel must be authenticated and authorization-aware — only users with permission to see an event's positions get pushed updates.

The naïve assumption is that [[directus]]'s built-in WebSocket subscriptions cover this. They do not. **Directus's subscription system only fires events for writes that go through its own `ItemsService`** (REST/GraphQL/Admin UI mutations). Direct `INSERT`s from the [[processor]] are invisible to subscribers — verified against Directus's documentation and source. The bridging assumption was wrong.

This page documents how the platform actually delivers live positions.

## Options considered

| Option | Live channel works | Hot path stays fast | Plane separation | Failure domain |
|---|---|---|---|---|
| Route Processor writes through Directus REST | Yes (Directus broadcasts own writes) | Compromised — every write through Directus event loop | Compromised | Coupled — Directus down blocks ingestion |
| Bridge extension inside Directus (Redis → `WebSocketService.broadcast`) | Yes | Compromised — Directus runs the firehose consumer | Compromised | Coupled — Directus crash kills live channel |
| **Processor exposes its own WebSocket endpoint** (chosen) | Yes | Preserved | Preserved | Decoupled — Directus down blocks only new authorizations |

The third option wins because it preserves the architectural invariants that motivated [[plane-separation]] in the first place, while still leaning on [[directus]] for authentication and authorization.

## Chosen design

Two cleanly separated WebSocket channels, each playing to its strength:

```
┌─ Telemetry plane ─────────────────────────┐  ┌─ Business plane ──────────────────────┐
│                                           │  │                                       │
│  Device → tcp-ingestion → Redis           │  │        SPA admin action               │
│                             ↓             │  │               ↓                       │
│                         Processor         │  │         Directus REST                 │
│                          ↙     ↘          │  │               ↓                       │
│                  Postgres    Processor's  │  │    Postgres + Directus's WebSocket    │
│                               WebSocket   │  │               ↓                       │
│                                   ↓       │  │              SPA (admin UI,           │
│                                  SPA      │  │                   leaderboard refresh,│
│                              (live map)   │  │                   timing edits)       │
└───────────────────────────────────────────┘  └───────────────────────────────────────┘
```

- **High-volume telemetry** (positions): the Processor writes directly to Postgres and *also* fans out the same records to subscribed SPA clients over its own WebSocket endpoint. Stays in the telemetry plane end-to-end.
- **Low-volume domain events** (timing records, stage results, manual entries, configuration): written via Directus's REST API; Directus's built-in subscription system broadcasts them through its WebSocket. Stays in the business plane.

Each kind of data takes the path that fits it. No bridges, no extensions inside Directus.

## Authorization flow

The Processor's WebSocket endpoint validates connections through Directus, but never asks Directus per record.

```
1. SPA opens wss://processor.../live with a Directus-issued JWT.
2. Processor validates the JWT (round-trip to Directus's /users/me, or local
   verification with Directus's signing secret). Failure → close socket.
3. SPA sends {type: 'subscribe', event_id: 42}.
4. Processor calls Directus once: GET /items/events/42 with the user's token.
     200 → allow subscription, store {client → event_id} in memory.
     403 → reject subscription with a clear error.
5. For every position arriving on Redis, match against in-memory subscriptions
   and push to matched clients. Zero Directus calls in the hot path.
```

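Steps 3 and 4 reduce to three actions: parse the message, make one permission call, and insert into an in-memory map. The sketch below is illustrative TypeScript, not the Processor's code. The `{type: 'subscribe', event_id}` shape follows step 3. The names `parseClientMessage` and `handleSubscribe`, the reply shapes, and the injected `checkAccess` callback are all hypothetical; `checkAccess` stands in for `GET /items/events/:id` sent with the user's token.

```typescript
// Client → Processor message from step 3 of the flow.
type SubscribeMessage = { type: "subscribe"; event_id: number };

// Hypothetical Processor → client replies.
type SubscribeReply =
  | { type: "subscribed"; event_id: number }
  | { type: "error"; event_id: number | null; reason: string };

// Parse one WebSocket text frame; malformed input is rejected before any Directus call.
function parseClientMessage(raw: string): SubscribeMessage | null {
  let msg: unknown;
  try {
    msg = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof msg !== "object" || msg === null) return null;
  const { type, event_id: eventId } = msg as { type?: unknown; event_id?: unknown };
  if (type !== "subscribe" || typeof eventId !== "number" || !Number.isInteger(eventId)) {
    return null;
  }
  return { type: "subscribe", event_id: eventId };
}

// In-memory {client → event ids}, as in step 4.
const subscriptions = new Map<string, Set<number>>();

// Step 4: exactly one permission check per subscribe request, never per position.
// `checkAccess` is an assumed callback returning the HTTP status of
// GET /items/events/:id made with the user's token.
async function handleSubscribe(
  clientId: string,
  raw: string,
  checkAccess: (eventId: number) => Promise<number>,
): Promise<SubscribeReply> {
  const msg = parseClientMessage(raw);
  if (msg === null) return { type: "error", event_id: null, reason: "malformed message" };

  const status = await checkAccess(msg.event_id);
  if (status !== 200) {
    return { type: "error", event_id: msg.event_id, reason: `not authorized (HTTP ${status})` };
  }

  const events = subscriptions.get(clientId) ?? new Set<number>();
  events.add(msg.event_id);
  subscriptions.set(clientId, events);
  return { type: "subscribed", event_id: msg.event_id };
}
```

Putting the Directus call behind an injected function keeps the handler testable. It also keeps the choice between round-trip and local JWT validation, listed under open questions, outside the subscribe logic.
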
Connection-time auth is amortized over session lifetime. Permission re-checks happen on subscription change, not on every record. The hot path is bounded by `O(positions × subscribed-clients-per-event)` and runs entirely on the Processor's event loop with in-memory state.

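One way to keep step 5 within that bound is to index subscriptions by event rather than by client. A position then touches only its own event's subscribers. This is a minimal sketch, assuming each record read from Redis carries an `event_id`. `Position`, `SubscriptionIndex`, and the `send` callback are hypothetical; `send` is whatever writes a frame to a client's socket.

```typescript
// Hypothetical position record as read off the Redis stream.
interface Position {
  event_id: number;
  device_id: string;
  lat: number;
  lon: number;
  ts: number;
}

// Writes one frame to one client's socket (transport left abstract).
type Send = (clientId: string, payload: string) => void;

// event_id → subscribed client ids. Indexing by event means a position only
// visits the clients of its own event: O(subscribers of that event) per record.
class SubscriptionIndex {
  private byEvent = new Map<number, Set<string>>();

  subscribe(clientId: string, eventId: number): void {
    const clients = this.byEvent.get(eventId) ?? new Set<string>();
    clients.add(clientId);
    this.byEvent.set(eventId, clients);
  }

  unsubscribe(clientId: string, eventId: number): void {
    const clients = this.byEvent.get(eventId);
    if (!clients) return;
    clients.delete(clientId);
    if (clients.size === 0) this.byEvent.delete(eventId);
  }

  // On disconnect: drop the client from every event it was subscribed to.
  removeClient(clientId: string): void {
    this.byEvent.forEach((_clients, eventId) => this.unsubscribe(clientId, eventId));
  }

  // Step 5: serialize once, then push to every subscriber of the record's event.
  // Returns the number of frames sent.
  publish(pos: Position, send: Send): number {
    const clients = this.byEvent.get(pos.event_id);
    if (!clients) return 0;
    const payload = JSON.stringify({ type: "position", ...pos });
    clients.forEach((clientId) => send(clientId, payload));
    return clients.size;
  }
}
```

The payload is serialized once per record rather than once per recipient. That keeps the per-recipient cost down to a socket write.
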
## Failure modes

| Failure | Effect on durable storage | Effect on live channel |
|---|---|---|
| Processor crashes | Records pile up in Redis; Phase 3 [[failure-domains]] resumption picks them up | Live channel dies until recovery |
| Directus crashes | Unaffected (Processor writes direct to DB) | Existing connections keep working with cached permissions; **new subscriptions cannot be authorized** |
| Postgres crashes | Writes block; Redis buffers up to `MAXLEN` | Unaffected — fan-out is independent of DB state |
| Redis crashes | Whole pipeline stops | Stops too: the fan-out reads positions from Redis |

The Directus-down case is the architecturally important one. Routing writes through Directus would mean ingestion blocks. The chosen design keeps ingestion alive and only loses the ability to authorize *new* subscriptions — a much gentler failure.

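The gentler failure depends on how a subscribe handler treats a failed permission call. It rejects only the new request and marks it retryable; neither existing subscriptions nor the fan-out path depend on the call. This is a minimal sketch. `AuthOutcome`, `decideSubscription`, and the `retryable` flag are hypothetical, and treating 5xx responses the same as an unreachable Directus is an assumption, not documented behaviour.

```typescript
// Result of the one-off permission call made for a new subscription.
type AuthOutcome =
  | { kind: "response"; status: number } // Directus answered
  | { kind: "unreachable" }; // connection refused, timeout, DNS failure

type SubscribeDecision =
  | { action: "allow" }
  | { action: "reject"; retryable: boolean; reason: string };

function decideSubscription(outcome: AuthOutcome): SubscribeDecision {
  if (outcome.kind === "unreachable") {
    // Directus is down: only this new subscription fails, and the client may retry.
    return { action: "reject", retryable: true, reason: "authorization unavailable" };
  }
  if (outcome.status === 200) return { action: "allow" };
  if (outcome.status >= 500) {
    // Assumption: a Directus-side error is treated like Directus being down.
    return { action: "reject", retryable: true, reason: `authorization error (HTTP ${outcome.status})` };
  }
  // 401/403 and other client errors are a definite "no"; retrying will not help.
  return { action: "reject", retryable: false, reason: `not authorized (HTTP ${outcome.status})` };
}
```
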
## Multi-instance Processor

Phase 3 of [[processor]] adds a second instance for HA. Each instance has its own connected SPA clients. A position arriving on instance A wouldn't naturally reach a client connected to instance B unless the broadcast path crosses instances.

The clean shape: each Processor reads the [[redis-streams]] stream on **two consumer groups**:

- `processor` — the durable-write group (work-split: only one instance handles each record for the DB write).
- `live-broadcast-{instance_id}` — a per-instance fan-out group (every instance reads every record for fan-out).

DB writes deduplicate by virtue of the consumer-group split; live broadcast deduplicates by virtue of clients being connected to exactly one instance. The Processor's [[redis-streams]] consumer code structure should anticipate this even at single-instance pilot scale.

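The two-group layout relies on one delivery property, and a small in-memory model can check it. This is a simulation, not Redis client code. `InMemoryStream` only mimics the consumer-group rule: each entry goes to exactly one consumer within a group, while every group sees every entry. In Redis terms, these are groups created with `XGROUP CREATE` and read with `XREADGROUP`. Here, instances take turns polling to stand in for concurrent reads; in Redis, which consumer receives a given entry depends on read timing. `simulate` and both group names are illustrative.

```typescript
// In-memory model of Redis Streams consumer-group delivery (simulation only).
class InMemoryStream<T> {
  private entries: T[] = [];
  // group name → index of the group's next undelivered entry
  private cursors = new Map<string, number>();

  append(entry: T): void {
    this.entries.push(entry);
  }

  createGroup(group: string): void {
    if (!this.cursors.has(group)) this.cursors.set(group, 0);
  }

  // Hands the group's next undelivered entry to whichever consumer asks, so
  // entries are split among consumers of one group, yet every group sees all entries.
  readGroup(group: string): T | undefined {
    const cursor = this.cursors.get(group);
    if (cursor === undefined) throw new Error(`no such group: ${group}`);
    if (cursor >= this.entries.length) return undefined;
    this.cursors.set(group, cursor + 1);
    return this.entries[cursor];
  }
}

// Each instance reads the shared "processor" group (DB writes) and its own
// "live-broadcast-{instance_id}" group (fan-out to its own clients).
function simulate(instanceIds: string[], records: number[]) {
  const stream = new InMemoryStream<number>();
  stream.createGroup("processor");
  instanceIds.forEach((id) => stream.createGroup(`live-broadcast-${id}`));
  records.forEach((r) => stream.append(r));

  const dbWrites: number[] = [];
  const broadcasts = new Map<string, number[]>();
  instanceIds.forEach((id) => broadcasts.set(id, []));

  // Instances take turns polling, standing in for concurrent XREADGROUP calls.
  let progressed = true;
  while (progressed) {
    progressed = false;
    for (const id of instanceIds) {
      const toWrite = stream.readGroup("processor");
      if (toWrite !== undefined) {
        dbWrites.push(toWrite);
        progressed = true;
      }
      const toSend = stream.readGroup(`live-broadcast-${id}`);
      if (toSend !== undefined) {
        broadcasts.get(id)?.push(toSend);
        progressed = true;
      }
    }
  }
  return { dbWrites, broadcasts };
}
```

With two instances and six records, every record is written exactly once across the pair, while each instance broadcasts all six to its own clients.
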
## Scale considerations

At pilot scale (≤500 devices per event, tens of viewers), the dominant costs are:

- **Connection-time auth round-trips to Directus** — a few hundred per minute peak (race start). Trivial.
- **In-memory subscription matching** — `O(records × subscribers)`; for 500 records/sec × 20 subscribers per event, ~10k messages/sec fan-out. Sustainable on Node.

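The fan-out figure is the per-event position rate multiplied by the number of subscribers to that event. Take the 500 records/sec above as $R_{\text{in}}$ and $S = 20$ subscribers. That rate amounts to roughly one fix per device per second at 500 devices, which is an assumption.

$$
R_{\text{out}} = R_{\text{in}} \times S = 500\,\tfrac{\text{records}}{\text{s}} \times 20 = 10^{4}\,\tfrac{\text{messages}}{\text{s}}
$$

Because the cost is a product, doubling either the viewer count or the position rate doubles the fan-out.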
When this becomes wrong:

- Sustained > ~10k WebSocket messages/sec total → consider sharding the broadcast path or extracting to a dedicated gateway service.
- Connection-time auth becomes a thundering herd at race start with thousands of viewers → cache JWT verification locally and shorten the Directus permission check via a token-with-scope pattern.
- Multi-data-center deployment → revisit the consumer-group fan-out strategy; per-region broadcast may be cleaner than global.

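Caching verification results locally is the cheaper of the two thundering-herd mitigations. Repeated reconnects with the same token skip the round trip, and simultaneous ones share a single in-flight check. This is a minimal sketch. `VerificationCache`, `VerifiedUser`, the injected `verify` callback, the TTL, and the clock are hypothetical. Capping cache lifetime at the token's own expiry is an assumed policy, not something Directus prescribes.

```typescript
interface VerifiedUser {
  userId: string;
  expiresAtMs: number; // when the token itself expires (e.g. from a JWT `exp` claim)
}

// Assumed verifier: a /users/me round trip or local signature check.
type Verify = (token: string) => Promise<VerifiedUser>;

class VerificationCache {
  private results = new Map<string, { user: VerifiedUser; validUntilMs: number }>();
  private inFlight = new Map<string, Promise<VerifiedUser>>();
  private verify: Verify;
  private ttlMs: number;
  private now: () => number;

  constructor(verify: Verify, ttlMs: number, now: () => number = () => Date.now()) {
    this.verify = verify;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  async get(token: string): Promise<VerifiedUser> {
    const hit = this.results.get(token);
    if (hit && this.now() < hit.validUntilMs) return hit.user;
    this.results.delete(token);

    // Collapse concurrent lookups for one token into a single verification.
    // A rejected verification is not cached; every waiter sees the error.
    let pending = this.inFlight.get(token);
    if (!pending) {
      pending = this.verify(token).finally(() => this.inFlight.delete(token));
      this.inFlight.set(token, pending);
    }
    const user = await pending;
    // Cache for the TTL, but never beyond the token's own expiry.
    this.results.set(token, { user, validUntilMs: Math.min(this.now() + this.ttlMs, user.expiresAtMs) });
    return user;
  }
}
```
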
The escape hatch is well-defined: lift the WebSocket endpoint code out of the Processor into a standalone service that subscribes to the same `live-broadcast-*` consumer group. The Redis-stream-in / WebSocket-out contract doesn't change; only the host process does.

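Making the contract explicit in code keeps the move mechanical. The fan-out core depends only on a stream-in interface and a WebSocket-out interface, and each host supplies its own wiring. This is a minimal sketch. `PositionRecord`, `PositionSource`, `LiveSink`, and `pump` are hypothetical names, and the synchronous batch read simplifies a real `XREADGROUP` loop.

```typescript
// Hypothetical record shape carried on the stream.
interface PositionRecord {
  event_id: number;
  payload: string; // already-serialized position
}

// Stream-in: whatever reads this host's live-broadcast-* consumer group.
interface PositionSource {
  readBatch(max: number): PositionRecord[];
}

// WebSocket-out: whatever delivers a frame to the subscribers of an event.
interface LiveSink {
  deliver(eventId: number, payload: string): void;
}

// Host-agnostic core: the same code runs inside the Processor or in a
// standalone gateway; only the source and sink wiring differ per host.
function pump(source: PositionSource, sink: LiveSink, maxBatch = 100): number {
  const batch = source.readBatch(maxBatch);
  for (const rec of batch) sink.deliver(rec.event_id, rec.payload);
  return batch.length;
}
```
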
## What this means for adjacent components

- [[processor]] grows a public-facing WebSocket endpoint in addition to its existing Redis consumer and Postgres writer.
- [[directus]] keeps its built-in WebSocket subscriptions for tables it writes to. Its real-time delivery section no longer claims to broadcast direct writes from [[processor]] — that's a documented mistake corrected in this revision.
- [[react-spa]] connects to two WebSocket endpoints: Directus for admin/business updates, Processor for live position firehose. Same JWT-based auth on both. Consumer-side throughput discipline (rAF coalescing of incoming positions before reducer dispatch) is documented in [[maps-architecture]] — without it the per-message dispatch pattern observed in [[traccar-maps-architecture]] cascades through selectors and `setData` at every position arrival.
- The deploy stack publishes the Processor's WebSocket port (with TLS termination at a reverse proxy in front).

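The consumer-side coalescing mentioned above can be sketched as a buffer that accepts every WebSocket message and dispatches at most once per animation frame. This illustrates the idea only; the SPA's own design is documented in [[maps-architecture]]. `PositionCoalescer`, `LivePosition`, and the injected `schedule` and `flush` callbacks are hypothetical. In the browser, `schedule` would wrap `requestAnimationFrame` and `flush` would be the single reducer dispatch. Nothing is dropped; messages are only batched.

```typescript
interface LivePosition {
  device_id: string;
  lat: number;
  lon: number;
  ts: number;
}

// Injected so the sketch runs outside a browser; in the SPA this would be
// (cb) => { requestAnimationFrame(() => cb()); }
type Schedule = (cb: () => void) => void;

class PositionCoalescer {
  private buffer: LivePosition[] = [];
  private scheduled = false;
  private schedule: Schedule;
  private flush: (batch: LivePosition[]) => void;

  constructor(schedule: Schedule, flush: (batch: LivePosition[]) => void) {
    this.schedule = schedule;
    this.flush = flush;
  }

  // Per WebSocket message: append, and schedule a drain only if none is pending.
  push(pos: LivePosition): void {
    this.buffer.push(pos);
    if (!this.scheduled) {
      this.scheduled = true;
      this.schedule(() => this.drain());
    }
  }

  // Once per frame: hand everything received since the last frame to a single
  // dispatch, so selectors and setData run once per frame, not once per message.
  private drain(): void {
    this.scheduled = false;
    const batch = this.buffer;
    this.buffer = [];
    if (batch.length > 0) this.flush(batch);
  }
}
```
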
## Why not a single WebSocket endpoint

It would be tempting to fold everything into a single SPA-facing WebSocket — either Processor or Directus. Both fail:

- **Single Processor WebSocket** would require Processor to broadcast Directus-managed events, meaning Processor needs to subscribe to Directus's writes — which is exactly the problem we're avoiding for positions, in reverse.
- **Single Directus WebSocket** is the bridge-extension option; it loses plane separation.

Two endpoints, each serving the writes its plane manages, is the architecturally honest answer.

## Open questions

- **JWT validation strategy.** Round-trip to Directus's `/users/me` (no shared secret, ~20ms per connection) vs. local verification with Directus's signing key (no round-trip, but a secret to share). Pilot can start with round-trip; revisit if connection rates climb.
- **Subscription model.** Per-event, per-stage, per-organization, or arbitrary filter expressions? The simplest pilot model is "subscribe to one event by ID"; extensions land when SPA UX demands them.
- **Permission staleness.** If a user is removed from an organization mid-session, do their existing subscriptions silently keep delivering until reconnect? Either re-validate periodically, or accept "trust the session" for pilot.
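If periodic re-validation is chosen over "trust the session", the sweep can reuse the subscribe-time permission check. This is a minimal sketch. `Subscription`, `Recheck`, `revalidate`, and the `check` callback are hypothetical. Keeping subscriptions whose re-check is `"unavailable"` is an assumption, chosen to match the cached-permission behaviour in the failure-modes table.

```typescript
interface Subscription {
  clientId: string;
  eventId: number;
}

// Outcome of re-running the subscribe-time permission check.
type Recheck = "allowed" | "denied" | "unavailable";

// One sweep: re-check every live subscription and return the ones to drop.
// "unavailable" keeps the subscription, mirroring the cached-permission
// behaviour while Directus is down. `check` should map network errors to
// "unavailable" rather than reject, since one rejection fails the whole sweep.
async function revalidate(
  subs: Subscription[],
  check: (sub: Subscription) => Promise<Recheck>,
): Promise<Subscription[]> {
  const outcomes = await Promise.all(subs.map((sub) => check(sub)));
  return subs.filter((_sub, i) => outcomes[i] === "denied");
}
```

A sweep costs one Directus call per live subscription, so its interval trades staleness against load on Directus. In the Processor it might run on a timer such as `setInterval`, with the returned subscriptions removed from the in-memory subscription state.
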