docs/wiki/synthesis/processor-ws-contract.md
julian f92595a62a docs: TRACCAR ingest + processor-ws-contract synthesis + auth-mode realignment
Catches up the wiki with several pieces of work accumulated during this
session.

INGEST: TRACCAR_MAPS_ARCHITECTURE.md
- raw/TRACCAR_MAPS_ARCHITECTURE.md (source doc, read-only).
- wiki/sources/traccar-maps-architecture.md — TL;DR + key claims +
  notable quotes + TRM divergences (PostGIS-native GeoJSON, rAF
  coalescer, Zustand, longer trail, racing sprite set).
- wiki/concepts/maps-architecture.md — distilled patterns for the SPA's
  map subsystem: singleton MapLibre + side-effect-only Map* components +
  two GeoJSON sources + style-swap mapReady gate + sprite preload + WS-
  to-map data flow (with rAF coalescer) + geofence editing + camera
  control trio.
- wiki/entities/react-spa.md — corrected the "talks exclusively to
  Directus" contradiction with [[live-channel-architecture]] (SPA
  connects to two endpoints — Directus + Processor); locked stack (raw
  MapLibre over react-map-gl, Zustand over Redux); added Auth section.
- wiki/concepts/live-channel-architecture.md — single sentence cross-
  referencing [[maps-architecture]] for consumer-side throughput
  discipline.
- index.md — Sources + Concepts entries.

SYNTHESIS: processor-ws-contract
- wiki/synthesis/processor-ws-contract.md — wire-level spec for the
  live-position WebSocket: endpoint, transport, auth handshake,
  subscribe/snapshot/streaming/unsubscribe protocol, reconnect, multi-
  instance behaviour, connection limits, versioning, open questions.
  Implementation-agnostic; the producer is cookie-name-agnostic so the
  spec doesn't pin to a specific Directus auth mode.
- index.md — Synthesis entry.

AUTH-MODE REALIGNMENT (cookie -> session)
- SPA implementation surfaced that Directus SDK 'cookie' mode doesn't
  survive a hard reload cleanly. Switched the SPA to 'session' mode
  (separate commit in trm/spa). Wiki updates here:
- wiki/entities/react-spa.md §Auth pattern — describes session mode
  (single httpOnly session cookie, no separate access token, no
  /auth/refresh dance). Added "Mode choice context" note.
- wiki/synthesis/processor-ws-contract.md §Auth handshake — emphasises
  the producer is cookie-name-agnostic; reframed "Cookie refresh while
  connected" as "Session expiry while connected".

Plus all the chronological log.md entries documenting the above plus
Phase 1.5 planning, SPA Phase 1 planning, and stage verify+seed work
from earlier in the session.

Skipped from this commit: .claude/agent-memory/* (user-local agent
state, not project content); .gitignore (already-modified by user
outside this session's scope).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 18:15:09 +02:00


---
title: Processor WebSocket contract
type: synthesis
created: 2026-05-02
updated: 2026-05-02
sources: [gps-tracking-architecture, traccar-maps-architecture]
tags: [websocket, protocol, contract, telemetry-plane, decision]
---
# Processor WebSocket contract
The wire-level specification of the WebSocket endpoint that fans live position updates from [[processor]] (or its eventual replacement gateway — see Implementation status) to [[react-spa]] clients. Both sides build against this contract; changes require a coordinated update on both sides.
This page is the protocol spec. The architectural rationale lives in [[live-channel-architecture]]; the consumer-side rendering pattern in [[maps-architecture]]; the inheritance from a working production reference in [[traccar-maps-architecture]].
## Implementation status
**Planned as `processor` Phase 1.5 — Live broadcast.** Six tasks in `trm/processor/.planning/phase-1-5-live-broadcast/`: WS server scaffold + heartbeat, cookie auth handshake, subscription registry & per-event authorization, broadcast consumer group & fan-out, snapshot-on-subscribe, integration test. Status ⬜ Not started; sequenced as 1.5.1 → 1.5.2 → 1.5.3 → (1.5.4 ‖ 1.5.5) → 1.5.6.
The endpoint is hosted *inside* the Processor process (as [[processor]] and [[live-channel-architecture]] specify). Lifting it into a separate `live-gateway` service is the documented escape hatch in [[live-channel-architecture]] §"Scale considerations" if sustained > 10k WS messages/sec demands it — not the starting point.
This contract is implementation-agnostic in the sense that the wire format wouldn't change if we ever did lift the endpoint out — only the host process would. SPA work can build against the contract independently of the Processor task sequence as long as it doesn't ship to stage before Phase 1.5 lands.
## Endpoint
```
wss://<one-public-origin>/processor/ws
```
Served behind the same reverse proxy that fronts [[directus]] and the [[react-spa]] static bundle. **Single origin is non-negotiable** — same-origin is what allows the auth cookie to flow with the WebSocket upgrade request (see Auth handshake below).
The path `/processor/ws` is illustrative; final path determined by the proxy routing rules. Whatever it is, the SPA reaches it as a relative URL, never a cross-origin URL.
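As a rough illustration of the same-origin rule, the SPA could derive the socket URL from the page's own location instead of hard-coding a host. `buildLiveSocketUrl` and `PageLocation` are hypothetical names, and `/processor/ws` is only the illustrative path used on this page.

```typescript
// Minimal subset of window.location, so the helper can be exercised outside a browser.
interface PageLocation {
  protocol: string; // "https:" or "http:"
  host: string; // hostname plus optional port
}

// Hypothetical helper: derives the same-origin WebSocket URL for the live channel.
// The default path is the illustrative one from this page, not a confirmed route.
function buildLiveSocketUrl(loc: PageLocation, path: string = "/processor/ws"): string {
  const scheme = loc.protocol === "https:" ? "wss:" : "ws:";
  return `${scheme}//${loc.host}${path}`;
}
```

In a browser this would be called as `new WebSocket(buildLiveSocketUrl(window.location))`, which keeps the host out of the SPA's configuration entirely.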
## Transport
- **Protocol:** WebSocket (RFC 6455) over TLS at the edge. Internal hop from the proxy to the producer is plain WS on the `trm_default` Compose network.
- **Subprotocol:** none required. Future versions may add a `Sec-WebSocket-Protocol` of `trm.live.v1` if we need to negotiate versions; for now the path is the version.
- **Frame format:** text frames, JSON-encoded. No binary frames. (If we ever need to ship raw position bytes for a high-frequency optimisation, that's a v2 concern.)
- **Heartbeat:** the producer sends a ping every 30 s; the consumer responds. The consumer enforces its own liveness with a `setInterval` check: if more than 60 s have passed since the last message, it reconnects.
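The consumer-side liveness rule can be sketched as a small tracker that the SPA feeds from its message handler. `LivenessTracker` is a hypothetical name; only the 60 s threshold comes from this page. Browser scripts cannot observe protocol-level ping frames, so this sketch counts only messages delivered to the socket's `message` handler; whether the heartbeat is visible there is not specified here.

```typescript
// Hypothetical consumer-side liveness tracker. Time is passed in explicitly
// so the logic can be tested without timers.
class LivenessTracker {
  private lastMessageAt: number;
  private readonly staleAfterMs: number;

  constructor(now: number, staleAfterMs: number = 60000) {
    this.lastMessageAt = now;
    this.staleAfterMs = staleAfterMs;
  }

  // Call from the socket's message handler.
  noteMessage(now: number): void {
    this.lastMessageAt = now;
  }

  // True once the silence exceeds the threshold and the client should reconnect.
  isStale(now: number): boolean {
    return now - this.lastMessageAt > this.staleAfterMs;
  }
}
```

A periodic check such as `setInterval(() => { if (tracker.isStale(Date.now())) reconnect(); }, 5000)` would drive it, where `reconnect` is whatever the SPA uses to reopen the socket. Using `Date.now()` here is fine because it measures local silence, not position time.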
## Auth handshake
Cookie-based, same-origin, validated against [[directus]] once at connection time. The SPA uses the Directus SDK in session mode (see [[react-spa]] §"Auth pattern"); the producer is cookie-name-agnostic and just forwards whatever cookie header the upgrade carries.
```
1. Browser opens WebSocket to wss://<origin>/processor/ws.
   Same-origin → browser automatically attaches the httpOnly session cookie
   issued by Directus's /auth/login (session mode).
2. Producer reads the entire Cookie header from the upgrade request.
   Sends GET /users/me to Directus, forwarding the header verbatim.
   200 → user identity (id, role, etc.) is bound to the connection.
   401/403 → close the WebSocket with code 4401 (unauthorized).
3. Connection is now authenticated. The producer holds (connectionId → user)
   in memory. No further per-message auth.
```
Implementation notes:
- **Cookie validation cache.** `/users/me` round-trip per connection is fine at pilot scale (≤500 viewers). At higher scale, cache the validation result for the connection's lifetime; on logout / session expiry the SPA reconnects, which re-validates.
- **No JWT in URL.** Don't pass tokens in query strings — they end up in proxy logs. Cookie is the only credential.
- **Why cookie not Authorization header.** Browsers don't let you set Authorization on a WebSocket upgrade. Cookies flow automatically. Same-origin is what makes this work.
- **Cookie-name-agnostic.** The producer never parses individual cookies; it forwards the whole header to `/users/me` and lets Directus identify the session. This keeps the producer working unchanged if Directus's cookie name or auth-mode default ever changes.
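The handshake above might be split into two pure steps on the producer side: build the `/users/me` request with the Cookie header forwarded untouched, then turn the response into an accept or close decision. All names here are hypothetical. The `{ data: ... }` response envelope and the `id`/`role` fields are assumptions about what Directus returns, and failing closed with 4401 on statuses other than 401/403 is also an assumption, since this page only specifies those two.

```typescript
// Identity bound to a connection; the field list is an assumption.
interface ConnectionUser {
  id: string;
  role?: string;
}

type AuthDecision =
  | { ok: true; user: ConnectionUser }
  | { ok: false; closeCode: number };

// Builds the validation request. The Cookie header is forwarded verbatim and never parsed.
function buildUsersMeRequest(directusBaseUrl: string, cookieHeader: string) {
  return {
    url: `${directusBaseUrl.replace(/\/+$/, "")}/users/me`,
    headers: { cookie: cookieHeader },
  };
}

// Maps the validation response to a decision. Anything other than a
// well-formed 200 is treated as unauthorized (assumption: fail closed).
function decideAuth(status: number, body: unknown): AuthDecision {
  if (status === 200) {
    const data = (body as { data?: ConnectionUser } | null)?.data;
    if (data && typeof data.id === "string") return { ok: true, user: data };
  }
  return { ok: false, closeCode: 4401 };
}
```

Keeping the decision separate from the HTTP call means the same logic applies unchanged if the producer later caches validation results.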
## Subscription model
After authentication, the SPA subscribes to event-scoped topics. One connection can hold multiple subscriptions; per-event authorization is checked once at subscribe time.
### Topic format
```
event:<eventId>
```
`<eventId>` is the UUID of an `events` row. Authorization: the user must have a record in `organization_users` for the event's organization (any role). Phase 4 of [[directus]] (permissions) will tighten this; for now membership is enough.
Future topic shapes (not in v1):
- `device:<deviceId>` — single-device follow.
- `entry:<entryId>` — follow a specific competitor across stages.
- `org:<orgId>` — broad org-wide watch (admin-only).
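The protocol is forward-compatible: the `topic` field is an arbitrary string, so new topic shapes can be added without changing the message format. The producer rejects shapes it does not recognise with an `error` message carrying the code `unknown-topic`.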
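A minimal sketch of how a producer might classify topics in v1, assuming only the `event:<eventId>` shape is recognised and everything else maps to `unknown-topic`. `parseTopic` and `ParsedTopic` are hypothetical names, and the UUID check is deliberately loose.

```typescript
type ParsedTopic =
  | { kind: "event"; eventId: string }
  | { kind: "unknown" };

// Loose UUID shape check (8-4-4-4-12 hex digits); it does not validate version bits.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function parseTopic(topic: string): ParsedTopic {
  const sep = topic.indexOf(":");
  if (sep === -1) return { kind: "unknown" };
  const prefix = topic.slice(0, sep);
  const id = topic.slice(sep + 1);
  if (prefix === "event" && UUID_RE.test(id)) return { kind: "event", eventId: id };
  // device:, entry: and org: are reserved for later versions and rejected in v1.
  return { kind: "unknown" };
}
```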
### Subscribe
```json
// Client → Server
{
  "type": "subscribe",
  "topic": "event:ada60b3d-b29f-4017-b702-cd6b700f9f6c",
  "id": "client-correlation-id-1"
}
```
`id` is optional; if present, the server echoes it on the response so the client can correlate.
### Server response — subscribed
```json
// Server → Client
{
  "type": "subscribed",
  "topic": "event:ada60b3d-b29f-4017-b702-cd6b700f9f6c",
  "id": "client-correlation-id-1",
  "snapshot": [
    { "deviceId": "cbed320e...", "lat": 41.327, "lon": 19.819, "ts": 1714654800000, "speed": 42.3, "course": 187, "accuracy": 5.0, "attributes": {} },
    { "deviceId": "f6114c7e...", "lat": 41.328, "lon": 19.820, "ts": 1714654799000, "speed": 38.1, "course": 184, "accuracy": 4.5, "attributes": {} }
  ]
}
```
The snapshot is the **latest known position per device** registered to the event (via `entry_devices` → `entries` → `events`). Without it, the SPA opens to a blank map until devices report, which feels broken.
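On the consumer side, the snapshot can seed a latest-position-per-device store that the streaming `position` updates (described later on this page) then keep current. This is a sketch with hypothetical names. Keeping the newer fix by `ts` when messages arrive out of order is an assumption, not something this contract mandates.

```typescript
interface Position {
  deviceId: string;
  lat: number;
  lon: number;
  ts: number; // epoch milliseconds from the device's GPS fix
  speed?: number;
  course?: number;
  accuracy?: number;
  attributes?: Record<string, unknown>;
}

// Hypothetical store: one entry per device, seeded from the snapshot.
class LatestPositions {
  private readonly byDevice = new Map<string, Position>();

  // Replaces all state with the snapshot from a `subscribed` response.
  applySnapshot(snapshot: Position[]): void {
    this.byDevice.clear();
    for (const p of snapshot) this.byDevice.set(p.deviceId, p);
  }

  // Keeps the newer fix by `ts`, so a late message cannot move a marker backwards.
  applyUpdate(p: Position): void {
    const current = this.byDevice.get(p.deviceId);
    if (!current || p.ts >= current.ts) this.byDevice.set(p.deviceId, p);
  }

  latest(deviceId: string): Position | undefined {
    return this.byDevice.get(deviceId);
  }

  count(): number {
    return this.byDevice.size;
  }
}
```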
### Server response — error
```json
// Server → Client
{
  "type": "error",
  "topic": "event:ada60b3d-b29f-4017-b702-cd6b700f9f6c",
  "id": "client-correlation-id-1",
  "code": "forbidden",
  "message": "User does not belong to the event's organization."
}
```
Error codes (initial set; extensible):
| Code | Meaning |
|---|---|
| `forbidden` | User authenticated but not authorized for this topic. |
| `not-found` | Topic refers to a non-existent entity (event id has no row). |
| `unknown-topic` | Topic format not recognised. |
| `rate-limited` | Subscribe rate exceeded (Phase 3 hardening; reserved). |
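To show how the first three codes in the table could arise, here is a sketch of a subscribe-time check. The `Directory` maps are hypothetical in-memory stand-ins for the `events` and `organization_users` lookups, and the order of the checks (existence before membership) is an assumption.

```typescript
type SubscribeErrorCode = "forbidden" | "not-found" | "unknown-topic";

type SubscribeOutcome =
  | { ok: true; eventId: string }
  | { ok: false; code: SubscribeErrorCode };

// Hypothetical stand-ins for the database lookups.
interface Directory {
  eventOrg: Map<string, string>; // eventId -> organizationId
  memberships: Map<string, Set<string>>; // userId -> organizationIds
}

function authorizeSubscribe(dir: Directory, userId: string, topic: string): SubscribeOutcome {
  const prefix = "event:";
  if (topic.indexOf(prefix) !== 0) return { ok: false, code: "unknown-topic" };
  const eventId = topic.slice(prefix.length);

  // No matching events row.
  const orgId = dir.eventOrg.get(eventId);
  if (orgId === undefined) return { ok: false, code: "not-found" };

  // Any role in the event's organization is sufficient in v1.
  const orgs = dir.memberships.get(userId);
  if (!orgs || !orgs.has(orgId)) return { ok: false, code: "forbidden" };

  return { ok: true, eventId };
}
```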
### Streaming updates
After `subscribed`, the server pushes one message per position-of-interest:
```json
// Server → Client
{
  "type": "position",
  "topic": "event:ada60b3d-b29f-4017-b702-cd6b700f9f6c",
  "deviceId": "cbed320e-1e94-488a-93c3-41060fcb06bc",
  "lat": 41.32791,
  "lon": 19.81947,
  "ts": 1714654801000,
  "speed": 42.5,
  "course": 188,
  "accuracy": 5.0,
  "attributes": {}
}
```
Field semantics:
| Field | Type | Required | Notes |
|---|---|---|---|
| `type` | `"position"` | yes | Discriminator. |
| `topic` | string | yes | Echoes the subscription. Allows multiplexing on one connection. |
| `deviceId` | uuid | yes | The `devices.id` (not the IMEI). SPA looks up device → entry → vehicle/crew via TanStack Query against [[directus]]. |
| `lat` / `lon` | number (degrees, WGS84) | yes | GPS coordinates. **Coordinate order in JSON is `lat`/`lon`** (not `[lon,lat]` GeoJSON ordering — that conversion happens in the SPA). |
| `ts` | number (epoch milliseconds, UTC) | yes | Authoritative timestamp from the device's GPS fix. **Always use this, never `Date.now()` on the client.** |
| `speed` | number (km/h) | optional | Omitted if device reports speed=0 with invalid GPS fix (per [[teltonika]] convention). |
| `course` | number (degrees, 0=N, clockwise) | optional | Heading. Omitted if unknown. |
| `accuracy` | number (metres) | optional | Position accuracy radius for the [[react-spa]]'s accuracy-circle layer. |
| `attributes` | object | optional, default `{}` | The decoded IO bag. Phase 1 ships the raw IO map; Phase 2 of [[processor]] adds named attributes per [[io-element-bag]]. SPA must tolerate empty / unknown shapes. |
The producer should **omit fields rather than send `null`** for absent values. Reduces JSON size and removes ambiguity (null = "we don't know" vs missing = "device didn't report").
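Because the wire format uses separate `lat`/`lon` fields while GeoJSON expects `[lon, lat]`, the SPA-side conversion is easy to get backwards. The following sketch shows one way to do it while preserving the omit-rather-than-null rule. `toPointFeature` and the choice of feature properties are hypothetical.

```typescript
interface PositionMessage {
  type: "position";
  topic: string;
  deviceId: string;
  lat: number;
  lon: number;
  ts: number;
  speed?: number;
  course?: number;
  accuracy?: number;
  attributes?: Record<string, unknown>;
}

interface PointFeature {
  type: "Feature";
  geometry: { type: "Point"; coordinates: [number, number] };
  properties: Record<string, unknown>;
}

function toPointFeature(msg: PositionMessage): PointFeature {
  const properties: Record<string, unknown> = { deviceId: msg.deviceId, ts: msg.ts };
  // Optional fields are copied only when present, so "absent" stays absent.
  if (msg.speed !== undefined) properties["speed"] = msg.speed;
  if (msg.course !== undefined) properties["course"] = msg.course;
  if (msg.accuracy !== undefined) properties["accuracy"] = msg.accuracy;
  return {
    type: "Feature",
    // GeoJSON order is [longitude, latitude], the reverse of the wire field order.
    geometry: { type: "Point", coordinates: [msg.lon, msg.lat] },
    properties,
  };
}
```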
### Unsubscribe
```json
// Client → Server
{
  "type": "unsubscribe",
  "topic": "event:ada60b3d-b29f-4017-b702-cd6b700f9f6c",
  "id": "client-correlation-id-2"
}
```
Server response:
```json
// Server → Client
{
  "type": "unsubscribed",
  "topic": "event:ada60b3d-b29f-4017-b702-cd6b700f9f6c",
  "id": "client-correlation-id-2"
}
```
The connection stays open with whatever other subscriptions are active. Closing the WebSocket is the cleanup-everything path.
## Reconnect semantics
The client reconnects on any close other than code 4401. Backoff doubles from 1s (1s, 2s, 4s, 8s, 16s) and then holds steady at a 30s cap.
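The backoff schedule and the 4401 exception can be expressed as two small functions. The names are hypothetical; jitter is not part of the schedule described here, so none is added.

```typescript
const BACKOFF_STEPS_MS: readonly number[] = [1000, 2000, 4000, 8000, 16000];
const BACKOFF_CAP_MS = 30000;

// attempt is 0-based: 0 -> 1 s, 4 -> 16 s, 5 and beyond -> 30 s.
function reconnectDelayMs(attempt: number): number {
  const index = Math.max(0, Math.floor(attempt));
  return BACKOFF_STEPS_MS[index] ?? BACKOFF_CAP_MS;
}

// 4401 means the session was rejected; retrying with the same cookie would fail again.
function shouldReconnect(closeCode: number): boolean {
  return closeCode !== 4401;
}
```

A client would typically reset the attempt counter to 0 once a connection has opened successfully, though that reset policy is an assumption.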
On reconnect, the client **must re-subscribe to all previously-active topics**. The server treats reconnect as a fresh connection; subscription state lives in memory only.
The server should accept reconnects from the same user without rate-limiting at pilot scale. Phase 3 may add a per-user concurrent-connection cap.
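Since the server forgets subscriptions when a connection closes, the client needs its own record of what it wants to be subscribed to. A minimal sketch, with hypothetical names and a hypothetical `resub-N` correlation id scheme:

```typescript
interface SubscribeMessage {
  type: "subscribe";
  topic: string;
  id: string;
}

// Hypothetical client-side record of the topics the UI currently wants.
class DesiredSubscriptions {
  private readonly topics = new Set<string>();
  private nextId = 1;

  add(topic: string): void {
    this.topics.add(topic);
  }

  remove(topic: string): void {
    this.topics.delete(topic);
  }

  // Messages to send as soon as a new or re-established socket opens.
  replay(): SubscribeMessage[] {
    return Array.from(this.topics).map((topic): SubscribeMessage => ({
      type: "subscribe",
      topic,
      id: `resub-${this.nextId++}`,
    }));
  }
}
```

On every `open` event the client would serialise each message from `replay()` with `JSON.stringify` and send it; each `subscribed` response then carries a fresh snapshot.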
## Multi-instance behaviour
When [[processor]] (or the gateway service) runs more than one replica:
- Each instance reads the [[redis-streams]] telemetry stream on **two consumer groups**:
  - `processor` — the durable-write group (work-split: only one instance handles each record for the DB write).
  - `live-broadcast-{instance_id}` — a per-instance fan-out group (every instance reads every record for fan-out).
- Connected clients are bound to one instance via the load balancer; that instance fans out to its own clients only. No cross-instance broadcasting needed.
- The reconnect is what handles instance failure — client reconnects, gets re-load-balanced to a healthy instance, re-subscribes.
This design is documented in [[live-channel-architecture]] §"Multi-instance Processor".
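The difference between the shared group and the per-instance groups can be illustrated with a toy model. This is not Redis client code: `simulateDelivery` is hypothetical, and the round-robin assignment stands in for the real behaviour, where each record in a shared group goes to whichever single consumer reads it.

```typescript
// Group names an instance would join, following the naming on this page.
function consumerGroupsFor(instanceId: string): { durable: string; fanOut: string } {
  return { durable: "processor", fanOut: `live-broadcast-${instanceId}` };
}

// Toy delivery model for a list of record ids across several instances.
function simulateDelivery(records: string[], instanceIds: string[]) {
  const durable: Record<string, string[]> = {};
  const fanOut: Record<string, string[]> = {};
  for (const id of instanceIds) {
    durable[id] = [];
    fanOut[id] = [];
  }
  records.forEach((rec, i) => {
    // Shared group: each record reaches exactly one instance (round-robin is a simplification).
    const owner = instanceIds[i % instanceIds.length];
    if (owner !== undefined) durable[owner]?.push(rec);
    // Per-instance groups: every instance sees every record.
    for (const id of instanceIds) fanOut[id]?.push(rec);
  });
  return { durable, fanOut };
}
```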
## Connection limits and back-pressure
Pilot-scale targets (subject to revision after first dogfood):
| Metric | Target |
|---|---|
| Concurrent connections per instance | 100 |
| Subscriptions per connection | 4 (one event + room for future per-device follow) |
| Position messages per second per connection | ≤ 500 (race start with 500 devices reporting at 1Hz) |
| End-to-end latency (Redis stream → client) | p95 < 500ms |
| Reconnect storm tolerance | 200 reconnects/sec for 5 seconds (race start surge) |
If a slow consumer can't drain its queue, the server **drops oldest position messages** for that connection (per-device; latest position is always preserved). Position data is always-fresh — backlog isn't valuable. Only `subscribed`/`unsubscribed`/`error` control messages are guaranteed delivery.
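One way to get this drop policy is a per-connection queue that coalesces positions by device and keeps control messages in a separate list that is never trimmed. The sketch below uses hypothetical names and simplified frame types; it is one possible reading of the policy, not the planned implementation.

```typescript
interface PositionFrame {
  type: "position";
  topic: string;
  deviceId: string;
  ts: number;
}

interface ControlFrame {
  type: "subscribed" | "unsubscribed" | "error";
  topic: string;
}

type Frame = PositionFrame | ControlFrame;

class OutboundQueue {
  private readonly control: ControlFrame[] = [];
  // At most one pending position per topic and device.
  private readonly positions = new Map<string, PositionFrame>();

  pushControl(frame: ControlFrame): void {
    this.control.push(frame);
  }

  pushPosition(frame: PositionFrame): void {
    const key = `${frame.topic}|${frame.deviceId}`;
    // A newer fix replaces the pending one for the same device; the older fix is dropped.
    this.positions.delete(key);
    this.positions.set(key, frame);
  }

  // Returns up to `max` frames. Control frames drain first and are never dropped.
  drain(max: number): Frame[] {
    const out: Frame[] = this.control.splice(0, max);
    for (const [key, frame] of Array.from(this.positions.entries())) {
      if (out.length >= max) break;
      out.push(frame);
      this.positions.delete(key);
    }
    return out;
  }
}
```

With this shape the pending position backlog is bounded by the number of devices on the connection's topics, regardless of how slowly the client drains.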
## Versioning
This is `v1`. Breaking changes (renaming fields, changing semantics) require:
1. New endpoint path (`/processor/ws/v2`).
2. Update this synthesis page to document both versions.
3. Deprecation window: v1 stays online for ≥ one full event cycle after v2 lands.
Non-breaking additions (new optional fields, new message types, new error codes) ship in v1 without ceremony — both sides should ignore unknown fields and unknown `type` values.
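The tolerance rule for non-breaking additions can be sketched as a client-side frame parser that passes unknown fields through and silently drops unknown `type` values. Names are hypothetical, and ignoring malformed JSON (instead of closing the connection) is an assumption.

```typescript
type KnownType = "subscribed" | "unsubscribed" | "error" | "position";

const KNOWN_TYPES: readonly string[] = ["subscribed", "unsubscribed", "error", "position"];

interface Envelope {
  type: KnownType;
  [field: string]: unknown; // unknown fields are carried along, never rejected
}

// Returns the parsed message, or null when the frame should be ignored.
function parseServerFrame(raw: string): Envelope | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null; // assumption: malformed frames are ignored
  }
  if (typeof value !== "object" || value === null) return null;
  const type = (value as { type?: unknown }).type;
  if (typeof type !== "string" || KNOWN_TYPES.indexOf(type) === -1) return null;
  return value as Envelope;
}
```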
## Open questions
- **Session expiry while connected.** Directus session cookies have a finite lifetime. The WebSocket connection's already-validated identity is unaffected for as long as the connection stays open — the producer authorised once at upgrade and doesn't re-check. If the session expires server-side, the SPA's next REST call (or its periodic `/users/me` ping, if added) will fail with 401, the SPA will redirect to login, and on re-login the SPA reconnects the WebSocket — which re-validates. Pilot answer: producer never re-validates mid-connection. Phase 3 hardening can revisit if real-world session durations make this feel wrong.
- **Device-to-event resolution snapshot freshness.** The snapshot includes "every device registered to the event"; that registration set may change while a client is subscribed. Initial answer: subscription holds the registration set captured at subscribe time; new entries added mid-event don't appear until the client reconnects. Acceptable for pilot.
- **Faulty-flag visibility.** When an operator flips a position's `faulty=true` flag in [[directus]], should the live channel emit a correction? Current answer: no — faulty flagging is post-hoc operator review, not a live concern. Live map shows whatever was streamed at the time. The recompute pipeline ([[processor]] faulty position handling) corrects derived data, not the live history.
- **Replay-mode endpoint.** Out of v1 scope. A future `event:<id>:replay` topic could stream historical positions at a chosen speed. Defer.
## Cross-references
- [[live-channel-architecture]] — architectural rationale and dual-channel design.
- [[processor]] — the entity nominally hosting this endpoint (subject to the Implementation status note above).
- [[react-spa]] — the consumer.
- [[maps-architecture]] — consumer-side throughput discipline (rAF coalescer) that this contract is consumed through.
- [[traccar-maps-architecture]] — the working production reference whose WS contract shape this draws from (with refinements for our needs).
- [[directus]] — auth source (cookie validator) and the data source for event/device/org metadata the SPA looks up alongside the live stream.