---
title: Processor WebSocket contract
type: synthesis
created: 2026-05-02
updated: 2026-05-02
sources: [gps-tracking-architecture, traccar-maps-architecture]
tags: [websocket, protocol, contract, telemetry-plane, decision]
---

# Processor WebSocket contract

The wire-level specification of the WebSocket endpoint that fans live position updates from [[processor]] (or its eventual replacement gateway; see Implementation status) to [[react-spa]] clients. Both sides build against this contract; changes require a coordinated update on both sides.

This page is the protocol spec. The architectural rationale lives in [[live-channel-architecture]]; the consumer-side rendering pattern in [[maps-architecture]]; the inheritance from a working production reference in [[traccar-maps-architecture]].

## Implementation status

**Planned as `processor` Phase 1.5: Live broadcast.** Six tasks in `trm/processor/.planning/phase-1-5-live-broadcast/`: WS server scaffold + heartbeat, cookie auth handshake, subscription registry & per-event authorization, broadcast consumer group & fan-out, snapshot-on-subscribe, integration test. Status ⬜ Not started; sequenced as 1.5.1 → 1.5.2 → 1.5.3 → (1.5.4 ‖ 1.5.5) → 1.5.6.

The endpoint is hosted *inside* the Processor process (as [[processor]] and [[live-channel-architecture]] specify). Lifting it into a separate `live-gateway` service is the documented escape hatch in [[live-channel-architecture]] §"Scale considerations" if sustained > 10k WS messages/sec demands it; it is not the starting point.

This contract is implementation-agnostic in the sense that the wire format wouldn't change if we ever did lift the endpoint out; only the host process would. SPA work can build against the contract independently of the Processor task sequence as long as it doesn't ship to stage before Phase 1.5 lands.

## Endpoint

```
wss://<one-public-origin>/processor/ws
```

Served behind the same reverse proxy that fronts [[directus]] and the [[react-spa]] static bundle. **Single origin is non-negotiable**: same-origin is what allows the auth cookie to flow with the WebSocket upgrade request (see Auth handshake below).

The path `/processor/ws` is illustrative; the final path is determined by the proxy routing rules. Whatever it is, the SPA reaches it as a relative URL, never a cross-origin URL.

## Transport

- **Protocol:** WebSocket (RFC 6455) over TLS at the edge. Internal hop from the proxy to the producer is plain WS on the `trm_default` Compose network.
- **Subprotocol:** none required. Future versions may add a `Sec-WebSocket-Protocol` of `trm.live.v1` if we need to negotiate versions; for now the path is the version.
- **Frame format:** text frames, JSON-encoded. No binary frames. (If we ever need to ship raw position bytes for a high-frequency optimisation, that's a v2 concern.)
- **Heartbeat:** the producer sends a ping every 30 s; the consumer responds. Consumer-side liveness is enforced by `setInterval` checking time-since-last-message > 60s ⇒ reconnect.

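The consumer-side liveness rule in the heartbeat bullet could be sketched as below. This is a minimal illustration, not the SPA's actual code: the class name `LivenessWatchdog`, the injectable clock, and the polling interval in the usage note are all assumptions; only the 60 s threshold comes from the contract.

```typescript
// Hypothetical helper; only the 60 s threshold is taken from the contract.
const STALE_AFTER_MS = 60_000;

class LivenessWatchdog {
  private lastMessageAt: number;

  // The clock is injectable so the logic can be exercised without real time passing.
  constructor(private readonly now: () => number = Date.now) {
    this.lastMessageAt = this.now();
  }

  /** Call from the socket's message handler for every inbound message. */
  touch(): void {
    this.lastMessageAt = this.now();
  }

  /** True once more than 60 s have passed since the last inbound message. */
  isStale(): boolean {
    return this.now() - this.lastMessageAt > STALE_AFTER_MS;
  }
}
```

In the SPA this would typically be polled from a timer, for example `setInterval(() => { if (watchdog.isStale()) socket.close(); }, 5_000)`, with the close handler driving the reconnect. Browsers answer protocol-level ping frames automatically and do not surface them to script, so `touch()` only ever sees application messages; whether the producer's heartbeat is visible to this check depends on how the ping is implemented, which this page does not pin down.
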
## Auth handshake

Cookie-based, same-origin, validated against [[directus]] once at connection time. The SPA uses the Directus SDK in session mode (see [[react-spa]] §"Auth pattern"); the producer is cookie-name-agnostic and just forwards whatever cookie header the upgrade carries.

```
1. Browser opens WebSocket to wss://<origin>/processor/ws.
   Same-origin → browser automatically attaches the httpOnly session cookie
   issued by Directus's /auth/login (session mode).

2. Producer reads the entire Cookie header from the upgrade request.
   GET /users/me to Directus, forwarding the header verbatim.
   200 → user identity (id, role, etc.) is bound to the connection.
   401/403 → close the WebSocket with code 4401 (unauthorized).

3. Connection is now authenticated. The producer holds (connectionId → user)
   in memory. No further per-message auth.
```

Implementation notes:

- **Cookie validation cache.** `/users/me` round-trip per connection is fine at pilot scale (≤500 viewers). At higher scale, cache the validation result for the connection's lifetime; on logout / session expiry the SPA reconnects, which re-validates.
- **No JWT in URL.** Don't pass tokens in query strings; they end up in proxy logs. Cookie is the only credential.
- **Why cookie not Authorization header.** Browsers don't let you set Authorization on a WebSocket upgrade. Cookies flow automatically. Same-origin is what makes this work.
- **Cookie-name-agnostic.** The producer never parses individual cookies; it forwards the whole header to `/users/me` and lets Directus identify the session. This keeps the producer working unchanged if Directus's cookie name or auth-mode default ever changes.

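Step 2 of the handshake might look roughly like this on the producer side. It is a sketch under stated assumptions: `FetchLike`, `directusBaseUrl`, and both function names are hypothetical, and the HTTP client is injected rather than tied to any particular library. Only the `GET /users/me` call, the 200 and 401/403 outcomes, and close code 4401 come from the contract.

```typescript
// Sketch only: names are illustrative, not the Processor's real API.
const CLOSE_UNAUTHORIZED = 4401; // from the contract

type AuthOutcome =
  | { ok: true; user: unknown }
  | { ok: false; closeCode: number };

type FetchLike = (
  url: string,
  init: { method: "GET"; headers: Record<string, string> },
) => Promise<{ status: number; json(): Promise<unknown> }>;

// Assumption: any status other than 200 is treated like 401/403. The contract
// does not say how a Directus outage (5xx) should be reported to the client.
function outcomeForStatus(status: number, body: unknown): AuthOutcome {
  return status === 200
    ? { ok: true, user: body }
    : { ok: false, closeCode: CLOSE_UNAUTHORIZED };
}

async function authenticateUpgrade(
  cookieHeader: string | undefined,
  directusBaseUrl: string,
  fetchFn: FetchLike,
): Promise<AuthOutcome> {
  // Assumption: an upgrade with no Cookie header is rejected without a round-trip.
  if (!cookieHeader) return { ok: false, closeCode: CLOSE_UNAUTHORIZED };
  const res = await fetchFn(`${directusBaseUrl}/users/me`, {
    method: "GET",
    headers: { cookie: cookieHeader }, // forwarded verbatim, never parsed
  });
  const body = res.status === 200 ? await res.json() : undefined;
  return outcomeForStatus(res.status, body);
}
```

The response body is passed through untouched because the exact shape of the identity the producer keeps per connection is not specified on this page.
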
## Subscription model

After authentication, the SPA subscribes to event-scoped topics. One connection can hold multiple subscriptions; per-event authorization is checked once at subscribe time.

### Topic format

```
event:<eventId>
```

`<eventId>` is the UUID of an `events` row. Authorization: the user must have a record in `organization_users` for the event's organization (any role). Phase 4 of [[directus]] (permissions) will tighten this; for now membership is enough.

Future topic shapes (not in v1):

- `device:<deviceId>`: single-device follow.
- `entry:<entryId>`: follow a specific competitor across stages.
- `org:<orgId>`: broad org-wide watch (admin-only).

The protocol is forward-compatible: the `topic` field accepts any string, so new shapes can be added without changing the message format. The producer rejects shapes it does not recognise with an `error` message whose `code` is `unknown-topic`.

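A v1 topic parser could be as small as the sketch below. The function and type names are hypothetical, and treating a malformed UUID as `unknown-topic` (rather than `not-found`) is an assumption; the contract only says unrecognised shapes are rejected with that code.

```typescript
// Hypothetical parser for the v1 topic grammar `event:<eventId>`.
type ParsedTopic =
  | { ok: true; kind: "event"; eventId: string }
  | { ok: false; code: "unknown-topic" };

const UUID_RE =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function parseTopic(topic: string): ParsedTopic {
  const sep = topic.indexOf(":");
  const kind = sep === -1 ? "" : topic.slice(0, sep);
  const id = sep === -1 ? "" : topic.slice(sep + 1);
  // Only `event:` is recognised in v1; `device:`, `entry:` and `org:` are future shapes.
  if (kind === "event" && UUID_RE.test(id)) {
    return { ok: true, kind: "event", eventId: id };
  }
  return { ok: false, code: "unknown-topic" };
}
```

Adding a future shape then means adding one more branch; the message envelope itself does not change.
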
### Subscribe

```json
// Client → Server
{
  "type": "subscribe",
  "topic": "event:ada60b3d-b29f-4017-b702-cd6b700f9f6c",
  "id": "client-correlation-id-1"
}
```

`id` is optional; if present, the server echoes it on the response so the client can correlate.

### Server response — subscribed

```json
// Server → Client
{
  "type": "subscribed",
  "topic": "event:ada60b3d-b29f-4017-b702-cd6b700f9f6c",
  "id": "client-correlation-id-1",
  "snapshot": [
    { "deviceId": "cbed320e...", "lat": 41.327, "lon": 19.819, "ts": 1714654800000, "speed": 42.3, "course": 187, "accuracy": 5.0, "attributes": {} },
    { "deviceId": "f6114c7e...", "lat": 41.328, "lon": 19.820, "ts": 1714654799000, "speed": 38.1, "course": 184, "accuracy": 4.5, "attributes": {} }
  ]
}
```

The snapshot is the **latest known position per device** registered to the event (via `entry_devices` → `entries` → `events`). Without it, the SPA opens to an empty map until devices report, which feels broken.

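On the consumer side, the snapshot and the later stream can feed the same latest-per-device store. The sketch below is illustrative: `Position` and `applyPositions` are hypothetical names, and ignoring a message whose `ts` is older than the stored one is an assumption, since the contract does not say how out-of-order positions should be handled.

```typescript
// Hypothetical client-side store: one entry per device, keyed by deviceId.
interface Position {
  deviceId: string;
  lat: number;
  lon: number;
  ts: number; // epoch milliseconds, UTC, from the device's GPS fix
  speed?: number;
  course?: number;
  accuracy?: number;
  attributes?: Record<string, unknown>;
}

function applyPositions(
  latest: Map<string, Position>,
  incoming: readonly Position[],
): Map<string, Position> {
  for (const p of incoming) {
    const known = latest.get(p.deviceId);
    // Assumption: keep whichever position carries the newer device timestamp.
    if (!known || p.ts >= known.ts) latest.set(p.deviceId, p);
  }
  return latest;
}
```

The `snapshot` array from `subscribed` seeds the map in one call, and each subsequent `position` message can be applied as a one-element array.
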
### Server response — error

```json
// Server → Client
{
  "type": "error",
  "topic": "event:ada60b3d-b29f-4017-b702-cd6b700f9f6c",
  "id": "client-correlation-id-1",
  "code": "forbidden",
  "message": "User does not belong to the event's organization."
}
```

Error codes (initial set; extensible):

| Code | Meaning |
|---|---|
| `forbidden` | User authenticated but not authorized for this topic. |
| `not-found` | Topic refers to a non-existent entity (event id has no row). |
| `unknown-topic` | Topic format not recognised. |
| `rate-limited` | Subscribe rate exceeded (Phase 3 hardening; reserved). |

### Streaming updates

After `subscribed`, the server pushes one message per position-of-interest:

```json
// Server → Client
{
  "type": "position",
  "topic": "event:ada60b3d-b29f-4017-b702-cd6b700f9f6c",
  "deviceId": "cbed320e-1e94-488a-93c3-41060fcb06bc",
  "lat": 41.32791,
  "lon": 19.81947,
  "ts": 1714654801000,
  "speed": 42.5,
  "course": 188,
  "accuracy": 5.0,
  "attributes": {}
}
```

Field semantics:

| Field | Type | Required | Notes |
|---|---|---|---|
| `type` | `"position"` | yes | Discriminator. |
| `topic` | string | yes | Echoes the subscription. Allows multiplexing on one connection. |
| `deviceId` | uuid | yes | The `devices.id` (not the IMEI). SPA looks up device → entry → vehicle/crew via TanStack Query against [[directus]]. |
| `lat` / `lon` | number (degrees, WGS84) | yes | GPS coordinates. **Coordinate order in JSON is `lat`/`lon`** (not `[lon,lat]` GeoJSON ordering; that conversion happens in the SPA). |
| `ts` | number (epoch milliseconds, UTC) | yes | Authoritative timestamp from the device's GPS fix. **Always use this, never `Date.now()` on the client.** |
| `speed` | number (km/h) | optional | Omitted if device reports speed=0 with invalid GPS fix (per [[teltonika]] convention). |
| `course` | number (degrees, 0=N, clockwise) | optional | Heading. Omitted if unknown. |
| `accuracy` | number (metres) | optional | Position accuracy radius for the [[react-spa]]'s accuracy-circle layer. |
| `attributes` | object | optional, default `{}` | The decoded IO bag. Phase 1 ships the raw IO map; Phase 2 of [[processor]] adds named attributes per [[io-element-bag]]. SPA must tolerate empty / unknown shapes. |

The producer should **omit fields rather than send `null`** for absent values. This reduces JSON size and removes ambiguity (null = "we don't know" vs missing = "device didn't report").

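The `lat`/`lon` to GeoJSON conversion that the table assigns to the SPA could be sketched as follows. The type and function names are hypothetical; the field names and the `[lon, lat]` coordinate order are the only parts taken from this page and from GeoJSON itself.

```typescript
// Hypothetical SPA-side conversion from a wire message to a GeoJSON point.
interface PositionMessage {
  type: "position";
  topic: string;
  deviceId: string;
  lat: number;
  lon: number;
  ts: number;
  speed?: number;
  course?: number;
  accuracy?: number;
  attributes?: Record<string, unknown>;
}

interface PointFeature {
  type: "Feature";
  geometry: { type: "Point"; coordinates: [number, number] };
  properties: Record<string, unknown>;
}

function toPointFeature(msg: PositionMessage): PointFeature {
  const { deviceId, lat, lon, ts, speed, course, accuracy } = msg;
  return {
    type: "Feature",
    // GeoJSON positions are [longitude, latitude]: the reverse of the wire order.
    geometry: { type: "Point", coordinates: [lon, lat] },
    // Optional fields stay undefined when the producer omitted them.
    properties: { deviceId, ts, speed, course, accuracy },
  };
}
```

Properties left `undefined` are dropped by `JSON.stringify`, which matches the contract's "omit rather than `null`" rule if the feature is ever serialised again.
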
### Unsubscribe

```json
// Client → Server
{
  "type": "unsubscribe",
  "topic": "event:ada60b3d-b29f-4017-b702-cd6b700f9f6c",
  "id": "client-correlation-id-2"
}
```

Server response:

```json
// Server → Client
{
  "type": "unsubscribed",
  "topic": "event:ada60b3d-b29f-4017-b702-cd6b700f9f6c",
  "id": "client-correlation-id-2"
}
```

The connection stays open with whatever other subscriptions are active. Closing the WebSocket is the cleanup-everything path.

## Reconnect semantics

The client reconnects on any close other than code 4401. Backoff: 1s, 2s, 4s, 8s, 16s, then a steady 30s (the cap).

On reconnect, the client **must re-subscribe to all previously-active topics**. The server treats reconnect as a fresh connection; subscription state lives in memory only.

The server should accept reconnects from the same user without rate-limiting at pilot scale. Phase 3 may add a per-user concurrent-connection cap.

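The backoff schedule above fits in one pure function. This is a sketch: the function name, the zero-based `attempt` counter, and returning `null` to mean "do not retry" are assumptions; the delays and the 4401 exception are from the contract.

```typescript
// Hypothetical helper for the client's reconnect loop.
const NO_RETRY_CLOSE_CODE = 4401; // unauthorized: reconnecting would fail the same way
const BACKOFF_MS: readonly number[] = [1_000, 2_000, 4_000, 8_000, 16_000];
const BACKOFF_CAP_MS = 30_000;

/**
 * Delay before reconnect attempt number `attempt` (0-based),
 * or null when the close code means the client should not retry.
 */
function reconnectDelayMs(attempt: number, closeCode: number): number | null {
  if (closeCode === NO_RETRY_CLOSE_CODE) return null;
  // Past the end of the table every attempt waits the 30 s cap.
  return BACKOFF_MS[attempt] ?? BACKOFF_CAP_MS;
}
```

The caller would reset `attempt` to 0 once a connection has been re-established and its topics re-subscribed.
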
## Multi-instance behaviour

When [[processor]] (or the gateway service) runs more than one replica:

- Each instance reads the [[redis-streams]] telemetry stream on **two consumer groups**:
  - `processor`: the durable-write group (work-split: only one instance handles each record for the DB write).
  - `live-broadcast-{instance_id}`: a per-instance fan-out group (every instance reads every record for fan-out).
- Connected clients are bound to one instance via the load balancer; that instance fans out to its own clients only. No cross-instance broadcasting needed.
- The reconnect is what handles instance failure: client reconnects, gets re-load-balanced to a healthy instance, re-subscribes.

This design is documented in [[live-channel-architecture]] §"Multi-instance Processor".

## Connection limits and back-pressure

Pilot-scale targets (subject to revision after first dogfood):

| Metric | Target |
|---|---|
| Concurrent connections per instance | 100 |
| Subscriptions per connection | 4 (one event + room for future per-device follow) |
| Position messages per second per connection | ≤ 500 (race start with 500 devices reporting at 1Hz) |
| End-to-end latency (Redis stream → client) | p95 < 500ms |
| Reconnect storm tolerance | 200 reconnects/sec for 5 seconds (race start surge) |

If a slow consumer can't drain its queue, the server **drops oldest position messages** for that connection (per-device; latest position is always preserved). Position data is only valuable while fresh; a backlog isn't. Only `subscribed`/`unsubscribed`/`error` control messages have guaranteed delivery.

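One possible reading of the drop policy is a per-connection queue that coalesces positions by device and never drops control messages. The sketch below is an assumption-heavy illustration: `OutboundQueue` is a hypothetical name, it coalesces on every enqueue rather than only when the queue is full, and it sends pending control messages ahead of positions on drain, none of which the contract prescribes.

```typescript
// Hypothetical per-connection outbound buffer.
type ControlMessage = {
  type: "subscribed" | "unsubscribed" | "error";
  topic: string;
};
type PositionMessage = {
  type: "position";
  topic: string;
  deviceId: string;
  ts: number;
};
type Outbound = ControlMessage | PositionMessage;

class OutboundQueue {
  private control: ControlMessage[] = [];
  private positions = new Map<string, PositionMessage>();

  enqueue(msg: Outbound): void {
    if (msg.type === "position") {
      // A newer position replaces any unsent one for the same topic and device,
      // which is how the older message gets dropped.
      const key = `${msg.topic}|${msg.deviceId}`;
      this.positions.delete(key);
      this.positions.set(key, msg);
    } else {
      this.control.push(msg); // control messages are never dropped
    }
  }

  /** Returns everything pending and empties the queue. */
  drain(): Outbound[] {
    const out: Outbound[] = [...this.control, ...this.positions.values()];
    this.control = [];
    this.positions.clear();
    return out;
  }
}
```

With this shape the pending buffer is bounded by the number of devices in the subscribed events rather than by how far behind the consumer is.
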
## Versioning

This is `v1`. Breaking changes (renaming fields, changing semantics) require:

1. New endpoint path (`/processor/ws/v2`).
2. Update this synthesis page to document both versions.
3. Deprecation window: v1 stays online for ≥ one full event cycle after v2 lands.

Non-breaking additions (new optional fields, new message types, new error codes) ship in v1 without ceremony; both sides should ignore unknown fields and unknown `type` values.

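The "ignore unknown `type` values" rule could be implemented on the client as a tolerant dispatcher. This is a sketch with hypothetical names (`dispatch`, `Handler`); silently ignoring frames that are not valid JSON objects is an additional assumption, not something the contract states.

```typescript
// Hypothetical client-side dispatcher for server-to-client messages.
const KNOWN_TYPES = ["subscribed", "unsubscribed", "error", "position"] as const;
type KnownType = (typeof KNOWN_TYPES)[number];
type Handler = (msg: Record<string, unknown>) => void;

function isKnownType(t: unknown): t is KnownType {
  return typeof t === "string" && (KNOWN_TYPES as readonly string[]).includes(t);
}

/** Returns true if the frame was a recognised message type, false if it was ignored. */
function dispatch(
  raw: string,
  handlers: Partial<Record<KnownType, Handler>>,
): boolean {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return false; // assumption: malformed frames are ignored rather than fatal
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return false;
  }
  const msg = parsed as Record<string, unknown>;
  const type = msg.type;
  if (!isKnownType(type)) return false; // unknown `type`: ignore, per the versioning rule
  // Unknown fields ride along in `msg` untouched; handlers read only what they know.
  handlers[type]?.(msg);
  return true;
}
```

A message type added later in v1 then passes through older clients without breaking them.
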
## Open questions

- **Session expiry while connected.** Directus session cookies have a finite lifetime. The WebSocket connection's already-validated identity is unaffected for as long as the connection stays open: the producer authorised once at upgrade and doesn't re-check. If the session expires server-side, the SPA's next REST call (or its periodic `/users/me` ping, if added) will fail with 401, the SPA will redirect to login, and on re-login the SPA reconnects the WebSocket, which re-validates. Pilot answer: producer never re-validates mid-connection. Phase 3 hardening can revisit if real-world session durations make this feel wrong.
- **Device-to-event resolution snapshot freshness.** The snapshot includes "every device registered to the event"; that registration set may change while a client is subscribed. Initial answer: subscription holds the registration set captured at subscribe time; new entries added mid-event don't appear until the client reconnects. Acceptable for pilot.
- **Faulty-flag visibility.** When an operator flips a position's `faulty=true` flag in [[directus]], should the live channel emit a correction? Current answer: no. Faulty flagging is post-hoc operator review, not a live concern. Live map shows whatever was streamed at the time. The recompute pipeline ([[processor]] faulty position handling) corrects derived data, not the live history.
- **Replay-mode endpoint.** Out of v1 scope. A future `event:<id>:replay` topic could stream historical positions at a chosen speed. Defer.

## Cross-references

- [[live-channel-architecture]]: architectural rationale and dual-channel design.
- [[processor]]: the entity nominally hosting this endpoint (subject to the Implementation status note above).
- [[react-spa]]: the consumer.
- [[maps-architecture]]: consumer-side throughput discipline (rAF coalescer) that this contract is consumed through.
- [[traccar-maps-architecture]]: the working production reference whose WS contract shape this draws from (with refinements for our needs).
- [[directus]]: auth source (cookie validator) and the data source for event/device/org metadata the SPA looks up alongside the live stream.