docs/wiki/synthesis/processor-ws-contract.md
julian f92595a62a docs: TRACCAR ingest + processor-ws-contract synthesis + auth-mode realignment
Catches up the wiki with several pieces of work accumulated during this
session.

INGEST: TRACCAR_MAPS_ARCHITECTURE.md
- raw/TRACCAR_MAPS_ARCHITECTURE.md (source doc, read-only).
- wiki/sources/traccar-maps-architecture.md — TL;DR + key claims +
  notable quotes + TRM divergences (PostGIS-native GeoJSON, rAF
  coalescer, Zustand, longer trail, racing sprite set).
- wiki/concepts/maps-architecture.md — distilled patterns for the SPA's
  map subsystem: singleton MapLibre + side-effect-only Map* components +
  two GeoJSON sources + style-swap mapReady gate + sprite preload + WS-
  to-map data flow (with rAF coalescer) + geofence editing + camera
  control trio.
- wiki/entities/react-spa.md — corrected the "talks exclusively to
  Directus" contradiction with [[live-channel-architecture]] (SPA
  connects to two endpoints — Directus + Processor); locked stack (raw
  MapLibre over react-map-gl, Zustand over Redux); added Auth section.
- wiki/concepts/live-channel-architecture.md — single sentence cross-
  referencing [[maps-architecture]] for consumer-side throughput
  discipline.
- index.md — Sources + Concepts entries.

SYNTHESIS: processor-ws-contract
- wiki/synthesis/processor-ws-contract.md — wire-level spec for the
  live-position WebSocket: endpoint, transport, auth handshake,
  subscribe/snapshot/streaming/unsubscribe protocol, reconnect, multi-
  instance behaviour, connection limits, versioning, open questions.
  Implementation-agnostic; the producer is cookie-name-agnostic so the
  spec doesn't pin to a specific Directus auth mode.
- index.md — Synthesis entry.

AUTH-MODE REALIGNMENT (cookie -> session)
- SPA implementation surfaced that Directus SDK 'cookie' mode doesn't
  survive a hard reload cleanly. Switched the SPA to 'session' mode
  (separate commit in trm/spa). Wiki updates here:
- wiki/entities/react-spa.md §Auth pattern — describes session mode
  (single httpOnly session cookie, no separate access token, no
  /auth/refresh dance). Added "Mode choice context" note.
- wiki/synthesis/processor-ws-contract.md §Auth handshake — emphasises
  the producer is cookie-name-agnostic; reframed "Cookie refresh while
  connected" as "Session expiry while connected".

Plus all the chronological log.md entries documenting the above plus
Phase 1.5 planning, SPA Phase 1 planning, and stage verify+seed work
from earlier in the session.

Skipped from this commit: .claude/agent-memory/* (user-local agent
state, not project content); .gitignore (already-modified by user
outside this session's scope).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 18:15:09 +02:00


title: Processor WebSocket contract
type: synthesis
created: 2026-05-02
updated: 2026-05-02
sources: gps-tracking-architecture, traccar-maps-architecture
tags: websocket, protocol, contract, telemetry-plane, decision
Processor WebSocket contract

The wire-level specification of the WebSocket endpoint that fans live position updates from processor (or its eventual replacement gateway — see Implementation status) to react-spa clients. Both sides build against this contract; changes require a coordinated update on both sides.

This page is the protocol spec. The architectural rationale lives in live-channel-architecture; the consumer-side rendering pattern in maps-architecture; the inheritance from a working production reference in traccar-maps-architecture.

Implementation status

Planned as processor Phase 1.5 — Live broadcast. Six tasks in trm/processor/.planning/phase-1-5-live-broadcast/: WS server scaffold + heartbeat, cookie auth handshake, subscription registry & per-event authorization, broadcast consumer group & fan-out, snapshot-on-subscribe, integration test. Status Not started; sequenced as 1.5.1 → 1.5.2 → 1.5.3 → (1.5.4 ‖ 1.5.5) → 1.5.6.

The endpoint is hosted inside the Processor process (as processor and live-channel-architecture specify). Lifting it into a separate live-gateway service is the documented escape hatch in live-channel-architecture §"Scale considerations" if sustained > 10k WS messages/sec demands it — not the starting point.

This contract is implementation-agnostic in the sense that the wire format wouldn't change if we ever did lift the endpoint out — only the host process would. SPA work can build against the contract independently of the Processor task sequence as long as it doesn't ship to stage before Phase 1.5 lands.

Endpoint

wss://<one-public-origin>/processor/ws

Served behind the same reverse proxy that fronts directus and the react-spa static bundle. Single origin is non-negotiable — same-origin is what allows the auth cookie to flow with the WebSocket upgrade request (see Auth handshake below).

The path /processor/ws is illustrative; the final path is determined by the proxy routing rules. Whatever it is, the SPA reaches it as a relative URL, never a cross-origin URL.
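As a rough illustration of the same-origin rule, the SPA could derive the socket URL from the page's own location instead of hard-coding a host. This is a sketch only: `liveSocketUrl` and `PageLocation` are hypothetical names, and the default path simply mirrors the illustrative /processor/ws.

```typescript
// Hypothetical helper (not part of the contract): derives the WebSocket URL
// from the page's own location so the socket can never point cross-origin.
interface PageLocation {
  protocol: string; // e.g. "https:"
  host: string; // host plus optional port, e.g. "example.org:8443"
}

function liveSocketUrl(loc: PageLocation, path: string = "/processor/ws"): string {
  // https pages use wss; plain http (local development) falls back to ws.
  const scheme = loc.protocol === "https:" ? "wss:" : "ws:";
  return `${scheme}//${loc.host}${path}`;
}

// In the browser, window.location satisfies PageLocation:
//   const socket = new WebSocket(liveSocketUrl(window.location));
```

Because the host always comes from the page itself, a change to the proxy routing only touches the path argument.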

Transport

  • Protocol: WebSocket (RFC 6455) over TLS at the edge. Internal hop from the proxy to the producer is plain WS on the trm_default Compose network.
  • Subprotocol: none required. Future versions may add a Sec-WebSocket-Protocol of trm.live.v1 if we need to negotiate versions; for now the path is the version.
  • Frame format: text frames, JSON-encoded. No binary frames. (If we ever need to ship raw position bytes for a high-frequency optimisation, that's a v2 concern.)
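  • Heartbeat: the producer sends a ping every 30 s; the consumer responds. The consumer enforces liveness with a setInterval check: if more than 60 s have passed since the last received message, it reconnects. If the ping is a protocol-level frame, the browser answers it itself and page JavaScript never sees it, so this check counts only application messages.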
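The consumer-side liveness rule can be sketched as a small tracker with an injected clock, which keeps the logic testable without a real socket. `LivenessTracker` is a hypothetical name, and the 5 s polling interval in the wiring comment is an assumption; only the 60 s threshold comes from the text above.

```typescript
// Hypothetical consumer-side liveness tracker. Timestamps are passed in
// explicitly so the logic can be exercised without timers or a socket.
class LivenessTracker {
  private readonly timeoutMs: number;
  private lastMessageAt: number;

  constructor(timeoutMs: number, now: number) {
    this.timeoutMs = timeoutMs;
    this.lastMessageAt = now;
  }

  // Call from the socket's message handler for every inbound message.
  noteMessage(now: number): void {
    this.lastMessageAt = now;
  }

  // True once more than timeoutMs has elapsed since the last inbound message.
  isStale(now: number): boolean {
    return now - this.lastMessageAt > this.timeoutMs;
  }
}

// Wiring sketch (browser), assuming `socket` is an open WebSocket:
//   const tracker = new LivenessTracker(60000, Date.now());
//   socket.onmessage = () => tracker.noteMessage(Date.now());
//   setInterval(() => { if (tracker.isStale(Date.now())) socket.close(); }, 5000);
```

Closing the socket hands control to the normal reconnect path, so the tracker itself never needs to know about backoff.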

Auth handshake

Cookie-based, same-origin, validated against directus once at connection time. The SPA uses the Directus SDK in session mode (see react-spa §"Auth pattern"); the producer is cookie-name-agnostic and just forwards whatever cookie header the upgrade carries.

1. Browser opens WebSocket to wss://<origin>/processor/ws.
   Same-origin → browser automatically attaches the httpOnly session cookie
   issued by Directus's /auth/login (session mode).

2. Producer reads the entire Cookie header from the upgrade request.
   GET /users/me to Directus, forwarding the header verbatim.
   200 → user identity (id, role, etc.) is bound to the connection.
   401/403 → close the WebSocket with code 4401 (unauthorized).

3. Connection is now authenticated. The producer holds (connectionId → user)
   in memory. No further per-message auth.

Implementation notes:

  • Cookie validation cache. /users/me round-trip per connection is fine at pilot scale (≤500 viewers). At higher scale, cache the validation result for the connection's lifetime; on logout / session expiry the SPA reconnects, which re-validates.
  • No JWT in URL. Don't pass tokens in query strings — they end up in proxy logs. Cookie is the only credential.
  • Why cookie not Authorization header. Browsers don't let you set Authorization on a WebSocket upgrade. Cookies flow automatically. Same-origin is what makes this work.
  • Cookie-name-agnostic. The producer never parses individual cookies; it forwards the whole header to /users/me and lets Directus identify the session. This keeps the producer working unchanged if Directus's cookie name or auth-mode default ever changes.
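The handshake steps and notes above could look roughly like this on the producer side. It is a sketch under stated assumptions: `authenticateUpgrade` and `UsersMeCall` are hypothetical names, the HTTP call is injected rather than tied to a client library, the response is assumed to use the usual Directus `{ "data": { ... } }` envelope, and mapping non-auth failures to close code 1011 is a choice the contract does not specify.

```typescript
const CLOSE_UNAUTHORIZED = 4401; // from the handshake steps above
const CLOSE_INTERNAL_ERROR = 1011; // RFC 6455 "internal error"; using it here is an assumption

type AuthResult =
  | { ok: true; userId: string }
  | { ok: false; closeCode: number };

// Hypothetical shape of "GET /users/me with this Cookie header".
type UsersMeCall = (cookieHeader: string) => Promise<{ status: number; body: unknown }>;

async function authenticateUpgrade(
  cookieHeader: string | undefined,
  callUsersMe: UsersMeCall,
): Promise<AuthResult> {
  // No cookie at all: nothing to forward, so reject immediately.
  if (!cookieHeader) return { ok: false, closeCode: CLOSE_UNAUTHORIZED };

  // Forward the header verbatim; individual cookies are never parsed.
  const res = await callUsersMe(cookieHeader);
  if (res.status === 401 || res.status === 403) {
    return { ok: false, closeCode: CLOSE_UNAUTHORIZED };
  }
  if (res.status !== 200) return { ok: false, closeCode: CLOSE_INTERNAL_ERROR };

  // Assumed envelope: { data: { id: "...", ... } }.
  const body = res.body;
  if (typeof body === "object" && body !== null) {
    const data = (body as Record<string, unknown>)["data"];
    if (typeof data === "object" && data !== null) {
      const id = (data as Record<string, unknown>)["id"];
      if (typeof id === "string") return { ok: true, userId: id };
    }
  }
  return { ok: false, closeCode: CLOSE_INTERNAL_ERROR };
}
```

Keeping 4401 for genuine auth failures only matters for the client: it is the one close code that suppresses reconnects, so a transient Directus outage should not be reported with it.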

Subscription model

After authentication, the SPA subscribes to event-scoped topics. One connection can hold multiple subscriptions; per-event authorization is checked once at subscribe time.

Topic format

event:<eventId>

<eventId> is the UUID of an events row. Authorization: the user must have a record in organization_users for the event's organization (any role). Phase 4 of directus (permissions) will tighten this; for now membership is enough.

Future topic shapes (not in v1):

  • device:<deviceId> — single-device follow.
  • entry:<entryId> — follow a specific competitor across stages.
  • org:<orgId> — broad org-wide watch (admin-only).

The protocol is forward-compatible: the topic field accepts any string, and the producer rejects shapes it does not recognise with an error message whose code is unknown-topic.
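A minimal topic parser for v1 might look like the following. `parseTopic` is a hypothetical name, and treating a malformed UUID as an unrecognised shape (rather than not-found) is an assumption the spec leaves open.

```typescript
type ParsedTopic = { kind: "event"; eventId: string };

const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Returns the parsed topic, or null when the shape is not recognised.
// A null result is what the producer would answer with code "unknown-topic".
function parseTopic(topic: string): ParsedTopic | null {
  const sep = topic.indexOf(":");
  if (sep < 0) return null;
  const kind = topic.slice(0, sep);
  const id = topic.slice(sep + 1);
  if (kind === "event" && UUID_RE.test(id)) {
    return { kind: "event", eventId: id };
  }
  // device:, entry: and org: are future shapes, so v1 rejects them too.
  return null;
}
```

Adding a future shape then means extending the `ParsedTopic` union and one branch, without touching callers that only handle `event`.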

Subscribe

// Client → Server
{
  "type": "subscribe",
  "topic": "event:ada60b3d-b29f-4017-b702-cd6b700f9f6c",
  "id": "client-correlation-id-1"
}

id is optional; if present, the server echoes it on the response so the client can correlate.

Server response — subscribed

// Server → Client
{
  "type": "subscribed",
  "topic": "event:ada60b3d-b29f-4017-b702-cd6b700f9f6c",
  "id": "client-correlation-id-1",
  "snapshot": [
    { "deviceId": "cbed320e...", "lat": 41.327, "lon": 19.819, "ts": 1714654800000, "speed": 42.3, "course": 187, "accuracy": 5.0, "attributes": {} },
    { "deviceId": "f6114c7e...", "lat": 41.328, "lon": 19.820, "ts": 1714654799000, "speed": 38.1, "course": 184, "accuracy": 4.5, "attributes": {} }
  ]
}

The snapshot is the latest known position per device registered to the event (via entry_devices → entries → events). Without it, the SPA opens to a blank map until devices report, which feels broken.
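On the consumer, the snapshot can seed a per-device store that later position messages keep current. The sketch below is illustrative: `LatestPositions` is a hypothetical name, one store per topic is assumed, and ignoring a position older than the stored one (by ts) is an assumption rather than something the contract mandates.

```typescript
interface Position {
  deviceId: string;
  lat: number;
  lon: number;
  ts: number; // epoch milliseconds, UTC
  speed?: number;
  course?: number;
  accuracy?: number;
  attributes?: Record<string, unknown>;
}

// Hypothetical per-topic store of the latest position per device.
class LatestPositions {
  private readonly byDevice = new Map<string, Position>();

  // Replace all state with the snapshot from a `subscribed` response.
  seed(snapshot: Position[]): void {
    this.byDevice.clear();
    for (const p of snapshot) this.byDevice.set(p.deviceId, p);
  }

  // Apply one position; returns false if it was older than the stored one.
  apply(p: Position): boolean {
    const current = this.byDevice.get(p.deviceId);
    if (current !== undefined && current.ts > p.ts) return false;
    this.byDevice.set(p.deviceId, p);
    return true;
  }

  get(deviceId: string): Position | undefined {
    return this.byDevice.get(deviceId);
  }

  get size(): number {
    return this.byDevice.size;
  }
}
```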

Server response — error

// Server → Client
{
  "type": "error",
  "topic": "event:ada60b3d-b29f-4017-b702-cd6b700f9f6c",
  "id": "client-correlation-id-1",
  "code": "forbidden",
  "message": "User does not belong to the event's organization."
}

Error codes (initial set; extensible):

| Code | Meaning |
| --- | --- |
| forbidden | User authenticated but not authorized for this topic. |
| not-found | Topic refers to a non-existent entity (event id has no row). |
| unknown-topic | Topic format not recognised. |
| rate-limited | Subscribe rate exceeded (Phase 3 hardening; reserved). |
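Putting the subscribe flow and the error codes together, a producer-side handler might be shaped like this. Everything here is a sketch: `handleSubscribe`, `Lookups`, and its three methods are hypothetical, storage access is injected, and checking existence before membership is an assumed ordering.

```typescript
type ErrorCode = "forbidden" | "not-found" | "unknown-topic" | "rate-limited";

interface SubscribeRequest { type: "subscribe"; topic: string; id?: string }

type SubscribeReply =
  | { type: "subscribed"; topic: string; id?: string; snapshot: unknown[] }
  | { type: "error"; topic: string; id?: string; code: ErrorCode; message: string };

// Hypothetical lookups, injected so the sketch stays storage-agnostic.
interface Lookups {
  findEventOrg(eventId: string): Promise<string | null>; // null: no events row
  isOrgMember(userId: string, orgId: string): Promise<boolean>;
  latestPositions(eventId: string): Promise<unknown[]>;
}

// Echo the correlation id only when the client actually sent one.
function echoId(reply: SubscribeReply, id: string | undefined): SubscribeReply {
  if (id !== undefined) reply.id = id;
  return reply;
}

async function handleSubscribe(
  userId: string,
  req: SubscribeRequest,
  db: Lookups,
): Promise<SubscribeReply> {
  const fail = (code: ErrorCode, message: string): SubscribeReply =>
    echoId({ type: "error", topic: req.topic, code, message }, req.id);

  if (!req.topic.startsWith("event:")) {
    return fail("unknown-topic", "Topic format not recognised.");
  }
  const eventId = req.topic.slice("event:".length);

  const orgId = await db.findEventOrg(eventId);
  if (orgId === null) return fail("not-found", "Event does not exist.");

  // v1 rule: any organization_users membership is enough.
  if (!(await db.isOrgMember(userId, orgId))) {
    return fail("forbidden", "User does not belong to the event's organization.");
  }
  const snapshot = await db.latestPositions(eventId);
  return echoId({ type: "subscribed", topic: req.topic, snapshot }, req.id);
}
```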

Streaming updates

After subscribed, the server pushes one message per position-of-interest:

// Server → Client
{
  "type": "position",
  "topic": "event:ada60b3d-b29f-4017-b702-cd6b700f9f6c",
  "deviceId": "cbed320e-1e94-488a-93c3-41060fcb06bc",
  "lat": 41.32791,
  "lon": 19.81947,
  "ts": 1714654801000,
  "speed": 42.5,
  "course": 188,
  "accuracy": 5.0,
  "attributes": {}
}

Field semantics:

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| type | "position" | yes | Discriminator. |
| topic | string | yes | Echoes the subscription. Allows multiplexing on one connection. |
| deviceId | uuid | yes | The devices.id (not the IMEI). SPA looks up device → entry → vehicle/crew via TanStack Query against directus. |
| lat / lon | number (degrees, WGS84) | yes | GPS coordinates. Coordinate order in JSON is lat/lon (not [lon,lat] GeoJSON ordering; that conversion happens in the SPA). |
| ts | number (epoch milliseconds, UTC) | yes | Authoritative timestamp from the device's GPS fix. Always use this, never Date.now() on the client. |
| speed | number (km/h) | optional | Omitted if device reports speed=0 with invalid GPS fix (per teltonika convention). |
| course | number (degrees, 0=N, clockwise) | optional | Heading. Omitted if unknown. |
| accuracy | number (metres) | optional | Position accuracy radius for the react-spa's accuracy-circle layer. |
| attributes | object | optional, default {} | The decoded IO bag. Phase 1 ships the raw IO map; Phase 2 of processor adds named attributes per io-element-bag. SPA must tolerate empty / unknown shapes. |

The producer should omit fields rather than send null for absent values. Reduces JSON size and removes ambiguity (null = "we don't know" vs missing = "device didn't report").
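The field rules translate fairly directly into a TypeScript type with optional properties, plus a runtime guard for the required fields. This is a sketch with hypothetical names (`PositionMessage`, `isPositionMessage`, `toLonLat`); the guard deliberately checks only the required fields so unknown or absent optional fields are tolerated.

```typescript
interface PositionMessage {
  type: "position";
  topic: string;
  deviceId: string;
  lat: number; // degrees, WGS84
  lon: number; // degrees, WGS84
  ts: number; // epoch milliseconds, UTC, from the device's GPS fix
  speed?: number; // km/h; omitted when absent, never null
  course?: number; // degrees, 0 = north, clockwise
  accuracy?: number; // metres
  attributes?: Record<string, unknown>;
}

// Runtime guard: validates only the required fields.
function isPositionMessage(value: unknown): value is PositionMessage {
  if (typeof value !== "object" || value === null) return false;
  const m = value as Record<string, unknown>;
  return (
    m["type"] === "position" &&
    typeof m["topic"] === "string" &&
    typeof m["deviceId"] === "string" &&
    typeof m["lat"] === "number" &&
    typeof m["lon"] === "number" &&
    typeof m["ts"] === "number"
  );
}

// The wire format is lat/lon; GeoJSON wants [lon, lat]. Convert in the SPA.
function toLonLat(p: PositionMessage): [number, number] {
  return [p.lon, p.lat];
}
```

Doing the coordinate swap in exactly one function keeps the lat/lon versus [lon, lat] distinction from leaking into the map layers.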

Unsubscribe

// Client → Server
{
  "type": "unsubscribe",
  "topic": "event:ada60b3d-b29f-4017-b702-cd6b700f9f6c",
  "id": "client-correlation-id-2"
}

Server response:

// Server → Client
{
  "type": "unsubscribed",
  "topic": "event:ada60b3d-b29f-4017-b702-cd6b700f9f6c",
  "id": "client-correlation-id-2"
}

The connection stays open with whatever other subscriptions are active. Closing the WebSocket is the cleanup-everything path.

Reconnect semantics

The client reconnects on any close except code 4401. Backoff doubles from 1 s (1 s, 2 s, 4 s, 8 s, 16 s) and then holds steady at the 30 s cap.

On reconnect, the client must re-subscribe to all previously-active topics. The server treats reconnect as a fresh connection; subscription state lives in memory only.

The server should accept reconnects from the same user without rate-limiting at pilot scale. Phase 3 may add a per-user concurrent-connection cap.
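The reconnect rules above reduce to three small pure functions on the client. The names are hypothetical, and the zero-based `attempt` counter is an assumption about how the caller tracks retries.

```typescript
const CLOSE_UNAUTHORIZED = 4401;
const BASE_DELAY_MS = 1000;
const MAX_DELAY_MS = 30000;

// 4401 means the session is invalid; retrying would only fail again.
function shouldReconnect(closeCode: number): boolean {
  return closeCode !== CLOSE_UNAUTHORIZED;
}

// attempt is zero-based: 0 → 1 s, 1 → 2 s, ... then capped at 30 s.
function reconnectDelayMs(attempt: number): number {
  return Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
}

// The server keeps no state across connections, so every active topic
// must be re-sent as a fresh subscribe frame after reconnecting.
function resubscribeFrames(activeTopics: Iterable<string>): string[] {
  return Array.from(activeTopics, (topic) => JSON.stringify({ type: "subscribe", topic }));
}
```

The client would reset `attempt` to zero once a connection has been re-established and its subscriptions confirmed.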

Multi-instance behaviour

When processor (or the gateway service) runs more than one replica:

  • Each instance reads the redis-streams telemetry stream on two consumer groups:
    • processor — the durable-write group (work-split: only one instance handles each record for the DB write).
    • live-broadcast-{instance_id} — a per-instance fan-out group (every instance reads every record for fan-out).
  • Connected clients are bound to one instance via the load balancer; that instance fans out to its own clients only. No cross-instance broadcasting needed.
  • The reconnect is what handles instance failure — client reconnects, gets re-load-balanced to a healthy instance, re-subscribes.

This design is documented in live-channel-architecture §"Multi-instance Processor".
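To make the two consumer groups concrete, the sketch below builds the Redis command arguments each instance would issue. The stream key, consumer name, COUNT, and BLOCK values are placeholders, the helper names are hypothetical, and starting the fan-out group at `$` (new entries only) is an assumption that fits always-fresh live data. Only the group names come from the text above.

```typescript
interface ConsumerGroups {
  durable: string; // shared by all instances: each record is handled once
  fanOut: string; // unique per instance: every instance sees every record
}

function consumerGroups(instanceId: string): ConsumerGroups {
  return { durable: "processor", fanOut: `live-broadcast-${instanceId}` };
}

// XGROUP CREATE <key> <group> $ MKSTREAM
// "$" starts the group at new entries only; MKSTREAM creates the stream if absent.
function xgroupCreateArgs(streamKey: string, group: string): string[] {
  return ["XGROUP", "CREATE", streamKey, group, "$", "MKSTREAM"];
}

// XREADGROUP GROUP <group> <consumer> COUNT n BLOCK ms STREAMS <key> >
// ">" asks for entries never delivered to any consumer in this group.
function xreadgroupArgs(streamKey: string, group: string, consumer: string): string[] {
  return ["XREADGROUP", "GROUP", group, consumer, "COUNT", "100", "BLOCK", "5000", "STREAMS", streamKey, ">"];
}
```

Because each fan-out group has exactly one member, every instance receives the full stream, while the shared processor group still splits the durable writes.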

Connection limits and back-pressure

Pilot-scale targets (subject to revision after first dogfood):

| Metric | Target |
| --- | --- |
| Concurrent connections per instance | 100 |
| Subscriptions per connection | 4 (one event + room for future per-device follow) |
| Position messages per second per connection | ≤ 500 (race start with 500 devices reporting at 1Hz) |
| End-to-end latency (Redis stream → client) | p95 < 500ms |
| Reconnect storm tolerance | 200 reconnects/sec for 5 seconds (race start surge) |

If a slow consumer can't drain its queue, the server drops oldest position messages for that connection (per-device; latest position is always preserved). Position data is always-fresh — backlog isn't valuable. Only subscribed/unsubscribed/error control messages are guaranteed delivery.
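One possible reading of this drop policy is a per-connection queue that coalesces positions by device while control messages queue losslessly. The sketch is illustrative only: `ConnectionQueue` and `Outbound` are hypothetical, and sending control messages ahead of pending positions on each drain is an assumed ordering.

```typescript
type Outbound =
  | { kind: "control"; payload: string } // subscribed / unsubscribed / error
  | { kind: "position"; deviceId: string; payload: string };

// Hypothetical per-connection outbound buffer.
class ConnectionQueue {
  private control: string[] = [];
  private readonly latestByDevice = new Map<string, string>();

  enqueue(msg: Outbound): void {
    if (msg.kind === "control") {
      // Control messages are never dropped.
      this.control.push(msg.payload);
      return;
    }
    // Overwrite any pending position for this device: the older one is
    // dropped and the latest is preserved. Deleting first moves the device
    // to the end of the Map's insertion order.
    this.latestByDevice.delete(msg.deviceId);
    this.latestByDevice.set(msg.deviceId, msg.payload);
  }

  // Called whenever the socket can accept more data; empties the queue.
  drain(): string[] {
    const out = [...this.control, ...Array.from(this.latestByDevice.values())];
    this.control = [];
    this.latestByDevice.clear();
    return out;
  }
}
```

With this shape the buffer is bounded by the number of devices in the subscribed events rather than by how far behind the consumer has fallen.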

Versioning

This is v1. Breaking changes (renaming fields, changing semantics) require:

  1. New endpoint path (/processor/ws/v2).
  2. Update this synthesis page to document both versions.
  3. Deprecation window: v1 stays online for ≥ one full event cycle after v2 lands.

Non-breaking additions (new optional fields, new message types, new error codes) ship in v1 without ceremony — both sides should ignore unknown fields and unknown type values.
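The "ignore what you don't recognise" rule can be enforced at a single dispatch point. A minimal sketch, assuming a hypothetical `dispatch` helper and handler map; handlers that read only the fields they know about get unknown-field tolerance for free.

```typescript
type Handler = (msg: Record<string, unknown>) => void;
type Handlers = { [type: string]: Handler };

// Returns true if a handler ran, false if the frame was ignored.
function dispatch(raw: string, handlers: Handlers): boolean {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return false; // not JSON: ignore rather than crash the socket handler
  }
  if (typeof parsed !== "object" || parsed === null) return false;
  const msg = parsed as Record<string, unknown>;
  const type = msg["type"];
  if (typeof type !== "string") return false;
  // Own-property check so a type such as "toString" cannot hit the prototype.
  const handler = Object.prototype.hasOwnProperty.call(handlers, type)
    ? handlers[type]
    : undefined;
  if (handler === undefined) return false; // unknown type: forward-compatible no-op
  handler(msg);
  return true;
}
```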

Open questions

  • Session expiry while connected. Directus session cookies have a finite lifetime. The WebSocket connection's already-validated identity is unaffected for as long as the connection stays open — the producer authorised once at upgrade and doesn't re-check. If the session expires server-side, the SPA's next REST call (or its periodic /users/me ping, if added) will fail with 401, the SPA will redirect to login, and on re-login the SPA reconnects the WebSocket — which re-validates. Pilot answer: producer never re-validates mid-connection. Phase 3 hardening can revisit if real-world session durations make this feel wrong.
  • Device-to-event resolution snapshot freshness. The snapshot includes "every device registered to the event"; that registration set may change while a client is subscribed. Initial answer: subscription holds the registration set captured at subscribe time; new entries added mid-event don't appear until the client reconnects. Acceptable for pilot.
  • Faulty-flag visibility. When an operator flips a position's faulty=true flag in directus, should the live channel emit a correction? Current answer: no — faulty flagging is post-hoc operator review, not a live concern. Live map shows whatever was streamed at the time. The recompute pipeline (processor faulty position handling) corrects derived data, not the live history.
  • Replay-mode endpoint. Out of v1 scope. A future event:<id>:replay topic could stream historical positions at a chosen speed. Defer.

Cross-references

  • live-channel-architecture — architectural rationale and dual-channel design.
  • processor — the entity nominally hosting this endpoint (subject to the Implementation status note above).
  • react-spa — the consumer.
  • maps-architecture — consumer-side throughput discipline (rAF coalescer) that this contract is consumed through.
  • traccar-maps-architecture — the working production reference whose WS contract shape this draws from (with refinements for our needs).
  • directus — auth source (cookie validator) and the data source for event/device/org metadata the SPA looks up alongside the live stream.