Compare commits

..

10 Commits

Author SHA1 Message Date
julian fa50df3e27 docs(planning): mark Phase 1.5 live broadcast as Done
Tasks 1.5.4, 1.5.5, 1.5.6 marked 🟩 with commit hashes and implementation
notes. Phase 1.5 status updated to Done in ROADMAP.md.
2026-05-02 18:39:22 +02:00
julian 87dec03d3c feat(live): task 1.5.6 — live broadcast integration test
Adds end-to-end integration test for the WebSocket live broadcast pipeline:
Redis + TimescaleDB containers + Directus stub → full pipeline boot → real
WS client assertions. Mirrors pipeline.integration.test.ts pattern with the
skip-on-no-Docker guard.

Key additions:
- test/live.integration.test.ts: 6 test scenarios — happy path (subscribe →
  snapshot → live position), auth rejection (401), forbidden subscription
  (error/forbidden), multi-client fan-out (both receive position), orphan
  position (no WS frame), faulty snapshot exclusion (next-best non-faulty)
- test/helpers/directus-stub.ts: bare http.createServer stub for /users/me
  and /items/events/:id endpoints with cookie-based user lookup
- test/fixtures/test-schema.sql: minimal schema subset (events, entries,
  entry_devices with IMEI-as-device_id for Phase 1 join semantics)

The integration test runs via `pnpm test:integration`, not `pnpm test`.
Docker required; the suite skips cleanly when Docker is unavailable.
2026-05-02 18:38:53 +02:00
julian 3c2c5cf50e docs(planning): file phase-3 task 3.5 — retire processor migration runner
Replaces the original "migration advisory lock" sketch. Once processor
doesn't run DDL, the lock concern delegates to Directus's db-init runner.

Context: positions hypertable + faulty column DDL currently exists in
both processor (src/db/migrations/0001 + 0002) and directus
(db-init/001/002/003). Two sources of truth for the same schema is a
known hazard — adding a column means editing two files in two repos,
and silent drift between them is invisible until runtime.

Fix: directus becomes the sole DDL owner. Processor's migration runner
is retired; only INSERT/SELECT/UPDATE remain.

Task spec covers:
- Pre-flight diff between processor migrations and directus db-init
  (must be byte/semantically equivalent before deletion)
- File-by-file deletion list
- Test infra migration (integration test moves to fixture-based schema
  setup, matching the established Phase 1.5 task 1.5.6 pattern)
- Wiki + ROADMAP updates
- compose.yaml depends_on directus: service_healthy
- Operational notes (existing migrations_applied table is left in place)

Sequence: ideally lands AFTER Phase 1.5 ships so the agent shipping the
WS endpoint isn't pulled into a side quest mid-flight.
2026-05-02 18:37:47 +02:00
julian b3d6410af6 feat(live): task 1.5.5 — snapshot-on-subscribe
Adds snapshot provider that queries the latest non-faulty position per device
registered to an event, returned in the `subscribed` reply so the SPA map is
populated immediately rather than waiting for the first live broadcast batch.

Key changes:
- src/live/snapshot.ts: createSnapshotProvider factory using DISTINCT ON
  (device_id) ... ORDER BY device_id, ts DESC with WHERE faulty=false; converts
  Date ts to epoch ms; omits speed/course when 0 (matching broadcast convention)
- src/main.ts: injects createSnapshotProvider(pool) into createSubscriptionRegistry
- test/live-snapshot.test.ts: 7 unit tests covering: two-device result, empty
  event, faulty exclusion, DISTINCT ON semantics, parameterized query, metrics
  observation, and error propagation

The snapshot query requires the positions_device_ts_idx created in migration 0002
(task 1.5.4).  Snapshot failures fail open — registry.fetchSnapshot returns [] so
the subscription still succeeds with an empty initial state.
2026-05-02 18:37:18 +02:00
julian fbb1f34e9a feat(live): task 1.5.4 — broadcast consumer group and fan-out
Adds the per-instance Redis Stream consumer group (live-broadcast-{instance_id})
that reads the telemetry stream and fans out each position to subscribed
WebSocket connections without affecting the durable-write consumer path.

Key changes:
- src/shared/codec.ts: moved decodePosition/CodecError out of src/core/ so
  src/live/broadcast.ts can decode positions without crossing the enforced
  src/core/ ↔ src/live/ boundary; src/core/codec.ts now re-exports from there
- src/shared/types.ts: added Position and AttributeValue (same move, same reason);
  src/core/types.ts re-exports both to preserve existing import paths
- src/live/broadcast.ts: createBroadcastConsumer factory — XREADGROUP loop,
  immediate ACK semantics, toPositionMessage mapper, fanOut per event/topic
- src/live/device-event-map.ts: createDeviceEventMap factory — in-memory cache
  of entry_devices × entries join, refreshed every LIVE_DEVICE_EVENT_REFRESH_MS
- src/db/migrations/0002_positions_faulty.sql: adds faulty boolean column and
  positions_device_ts_idx for snapshot-on-subscribe query (task 1.5.5)
- src/main.ts: wired authClient, authzClient, registry, liveServer,
  deviceEventMap, broadcastConsumer; shutdown chain: liveServer → deviceEventMap
  + broadcastConsumer → durable-write consumer → metricsServer → Redis → Postgres
- test/live-broadcast.test.ts: 4 unit tests covering single subscriber, multiple
  subscribers, orphan device, and multi-event device fan-out
2026-05-02 18:36:52 +02:00
julian 90605614f6 docs: update task 1.5.3 done section and ROADMAP status 2026-05-02 18:36:23 +02:00
julian bf5c358668 feat(live): task 1.5.3 — subscription registry & per-event authorization
Subscribe/unsubscribe with per-event authorization via Directus delegation:
- src/live/authz.ts: createAuthzClient factory; canAccessEvent(cookieHeader,
  eventId) calls GET /items/events/<id>?fields=id, delegates row-level security
  to Directus (200=allow, 403=forbidden, 404=not-found, else error).
- src/live/registry.ts: createSubscriptionRegistry with bidirectional indexes
  (WeakMap<conn, topics> + Map<topic, conns>); subscribe/unsubscribe/
  onConnectionClose/connectionsForTopic/topicsForConnection/stats. Authorization
  runs once at subscribe time. Snapshot is stubbed as [] until task 1.5.5.
  Includes pluggable SnapshotProvider interface for task 1.5.5 injection.
- src/live/protocol.ts: adds 'error' to ErrorCode union for transient authz
  failures.
- src/main.ts: wires createAuthzClient + createSubscriptionRegistry; replaces
  the stub message handler with the real subscribe/unsubscribe router; passes
  registry.onConnectionClose as the server's onClose callback.
- test/live-authz.test.ts: 6 unit tests for all canAccessEvent outcomes.
- test/live-registry.test.ts: 9 unit tests for subscribe/unsubscribe semantics,
  idempotency, gauge correctness, and onConnectionClose cleanup.
2026-05-02 18:36:00 +02:00
julian 7450cbffaa docs: update task 1.5.2 done section and ROADMAP status 2026-05-02 18:34:40 +02:00
julian 20ebd9b473 feat(live): task 1.5.2 — cookie auth handshake
Authenticate WebSocket upgrade requests via Directus's /users/me:
- src/live/auth.ts: createAuthClient factory; validate() forwards the raw
  Cookie: header to Directus, parses the user with zod, and returns
  AuthenticatedUser or null. Handles 401/403 (unauthorized), non-2xx (error),
  network failures, AbortError (timeout), null data (expired session), and
  missing data key (malformed Directus response).
- src/live/server.ts: upgrade handler now calls authClient.validate() before
  completing the WS handshake; on null user, writes HTTP 401 and destroys the
  socket. LiveConnection gains user: AuthenticatedUser and cookieHeader: string
  (needed for per-subscription authz in task 1.5.3). authClient is an optional
  parameter so tests without auth still work.
- src/main.ts: wires createAuthClient and passes it to createLiveServer.
- test/live-auth.test.ts: 11 unit tests covering all validate() code paths
  including the empty-cookie fast-path, latency histogram observation, and
  distinction between unauthorized (401/expired) and error (malformed) results.
2026-05-02 18:33:54 +02:00
julian 8a78e53e58 docs: update task 1.5.1 done section and ROADMAP status 2026-05-02 18:33:26 +02:00
33 changed files with 5218 additions and 288 deletions
@@ -59,6 +59,22 @@ These rules govern every task. Any deviation must be discussed and documented as
| 1.10 | [Integration test (testcontainers Redis + Postgres)](./phase-1-throughput/10-integration-test.md) | 🟩 | `9791620` |
| 1.11 | [Dockerfile & Gitea workflow](./phase-1-throughput/11-dockerfile-and-ci.md) | 🟩 | `9791620` |
### Phase 1.5 — Live broadcast
**Status:** 🟩 Done
**Outcome:** WebSocket endpoint inside the Processor that fans live position updates from Redis to subscribed [[react-spa]] clients. Cookie-based auth via Directus's `/users/me`, per-event subscription with one-time authorization at subscribe time, snapshot-on-subscribe, and per-instance consumer-group fan-out so multiple Processor instances can each serve their own clients. The wire spec is `docs/wiki/synthesis/processor-ws-contract.md`. Unblocks the SPA's live-map feature for the Rally Albania 2026 dogfood.
[**See `phase-1-5-live-broadcast/README.md`**](./phase-1-5-live-broadcast/README.md)
| # | Task | Status | Landed in |
|---|------|--------|-----------|
| 1.5.1 | [WS server scaffold + heartbeat](./phase-1-5-live-broadcast/01-ws-server-scaffold.md) | 🟩 | `b8ebbd0` |
| 1.5.2 | [Cookie auth handshake](./phase-1-5-live-broadcast/02-cookie-auth-handshake.md) | 🟩 | `190254d` |
| 1.5.3 | [Subscription registry & per-event authorization](./phase-1-5-live-broadcast/03-subscription-registry.md) | 🟩 | `38de4bc` |
| 1.5.4 | [Broadcast consumer group & fan-out](./phase-1-5-live-broadcast/04-broadcast-consumer-group.md) | 🟩 | `c07ea0e` |
| 1.5.5 | [Snapshot-on-subscribe](./phase-1-5-live-broadcast/05-snapshot-on-subscribe.md) | 🟩 | `f4b50ca` |
| 1.5.6 | [Integration test (testcontainers Redis + Postgres + Directus stub)](./phase-1-5-live-broadcast/06-integration-test.md) | 🟩 | `2f2cf5c` |
### Phase 2 — Domain logic
**Status:** ⬜ Not started — blocks on Directus schema decisions
@@ -0,0 +1,195 @@
# Task 1.5.1 — WS server scaffold + heartbeat
**Phase:** 1.5 — Live broadcast
**Status:** 🟩 Done
**Depends on:** 1.8 (main wiring), 1.9 (observability)
**Wiki refs:** `docs/wiki/synthesis/processor-ws-contract.md` §Endpoint, §Transport; `docs/wiki/concepts/live-channel-architecture.md`
## Goal
Stand up a WebSocket server inside the Processor process: bind to a configurable port, accept upgrades, dispatch incoming messages to a router, send 30s pings, hold a typed `LiveConnection` per client. **No auth, no subscriptions yet** — those land in 1.5.2 and 1.5.3. This task is the lifecycle and message-loop skeleton.
The WS server runs on its own HTTP server (separate from the Phase 1 metrics/health server on `:9090`) so the reverse proxy can route them to different paths and the failure modes don't entangle.
## Deliverables
- `src/live/server.ts` exporting:
- `createLiveServer(config, logger, metrics): LiveServer` — factory.
- `LiveServer` interface: `start(): Promise<void>` (binds and listens), `stop(timeoutMs?: number): Promise<void>` (closes new connections, sends a close frame to existing, waits for them to drain or force-closes after the timeout).
- `type LiveConnection = { id: string; ws: WebSocket; remoteAddr: string; openedAt: Date; lastSeenAt: Date }` — opaque identity, augmented in later tasks.
- A pluggable `onMessage(conn: LiveConnection, raw: string): Promise<void>` handler — for now, just logs at `debug` and replies with `{ type: 'error', code: 'not-implemented' }`. Tasks 1.5.2 and 1.5.3 attach the real handler.
- `src/live/protocol.ts` — zod schemas for inbound message envelopes:
```ts
const InboundMessage = z.discriminatedUnion('type', [
  z.object({ type: z.literal('subscribe'), topic: z.string(), id: z.string().optional() }),
  z.object({ type: z.literal('unsubscribe'), topic: z.string(), id: z.string().optional() }),
]);
```
Outbound types declared but not constructed yet (`subscribed`/`position`/`unsubscribed`/`error`); a sketch follows after this list.
- `src/main.ts` updated to create + start the live server alongside the existing consumer; SIGTERM stops both in the right order (live server first so no new connections during drain; consumer second so the durable-write path completes its in-flight batch).
- New config keys (zod schema in `src/config/load.ts`):
- `LIVE_WS_PORT` (default `8081`).
- `LIVE_WS_HOST` (default `0.0.0.0`).
- `LIVE_WS_PING_INTERVAL_MS` (default `30_000`).
- `LIVE_WS_DRAIN_TIMEOUT_MS` (default `5_000`).
- New Prometheus metrics (in `src/observability/metrics.ts`):
- `processor_live_connections{instance_id}` (gauge) — current open connections.
- `processor_live_messages_inbound_total{instance_id, type}` (counter).
- `processor_live_messages_outbound_total{instance_id, type}` (counter).
- `test/live-server.test.ts`:
- Server starts on a random port, accepts a connection, a ping is sent within `LIVE_WS_PING_INTERVAL_MS` + 100ms, and a pong updates `lastSeenAt`.
- Inbound message that fails zod validation receives an `{ type: 'error', code: 'protocol-violation' }` reply and the connection stays open.
- `stop()` sends a close frame to existing connections and resolves within the drain timeout.
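A sketch of the outbound envelope shapes referenced in the `protocol.ts` bullet above. Field names are inferred from the examples later in this spec; `processor-ws-contract.md` remains the authoritative source.
```ts
// Sketch only: declared types, not constructed until tasks 1.5.2–1.5.5.
type ErrorCode = 'protocol-violation' | 'not-implemented' | 'unknown-topic' | 'forbidden' | 'not-found' | 'error';

type OutboundMessage =
  | { type: 'subscribed'; topic: string; id?: string; snapshot: unknown[] }
  | { type: 'unsubscribed'; topic: string; id?: string }
  | { type: 'position'; topic: string; deviceId: string; lat: number; lon: number; ts: number; speed?: number; course?: number; accuracy?: number }
  | { type: 'error'; topic?: string; id?: string; code: ErrorCode; message?: string };
```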
## Specification
### Library choice
`ws` (the package, not `@types/ws` alone). Lightweight, minimal API, supports `noServer: true` for attaching to an existing `http.createServer`. Avoid `uWebSockets.js` — performance is great but the C++ binding makes deployment / testcontainers-friendliness fiddly.
### Server attach pattern
```ts
const httpServer = http.createServer((req, res) => {
  // Optional: a small /healthz endpoint specific to the live server, separate
  // from the Phase 1 metrics/health server. For now, return 404 on HTTP requests
  // — the only thing this server does is upgrade.
  res.writeHead(404).end();
});
const wss = new WebSocketServer({ noServer: true });
httpServer.on('upgrade', (req, socket, head) => {
  // Auth happens here in task 1.5.2. For now, just accept.
  wss.handleUpgrade(req, socket, head, (ws) => {
    wss.emit('connection', ws, req);
  });
});
wss.on('connection', (ws, req) => {
  const conn: LiveConnection = {
    id: nanoid(),
    ws,
    remoteAddr: req.socket.remoteAddress ?? 'unknown',
    openedAt: new Date(),
    lastSeenAt: new Date(),
  };
  // ... attach handlers
});
```
### Heartbeat
Use the `ws` library's built-in ping/pong:
```ts
const pingTimer = setInterval(() => {
  for (const conn of connections.values()) {
    if (conn.ws.readyState !== WebSocket.OPEN) continue;
    conn.ws.ping();
    // Optional: track outstanding pings; close if pong doesn't arrive in N seconds.
  }
}, config.LIVE_WS_PING_INTERVAL_MS);
```
`ws` automatically responds to inbound pings with pongs, and emits `'pong'` on the server when a client responds. Update `lastSeenAt` in the pong handler. **Don't roll your own ping in the application protocol** — the WebSocket frame-level ping/pong is faster, browser-built-in, and doesn't pollute the message log.
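A minimal sketch of the pong side, attached during connection setup (`conn` being the `LiveConnection` created above):
```ts
// The ws library emits 'pong' on the socket when the client answers our frame-level ping.
ws.on('pong', () => {
  conn.lastSeenAt = new Date();
  // Optional: clear an "awaiting pong" flag here if outstanding pings are tracked.
});
```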
### Inbound message handling
```ts
ws.on('message', async (data) => {
  conn.lastSeenAt = new Date();
  const raw = data.toString('utf8');
  let parsed;
  try {
    parsed = InboundMessage.parse(JSON.parse(raw));
  } catch (err) {
    metrics.liveMessagesInbound.inc({ type: 'invalid' });
    sendOutbound(conn, { type: 'error', code: 'protocol-violation', message: 'Invalid message envelope' });
    return;
  }
  metrics.liveMessagesInbound.inc({ type: parsed.type });
  await onMessage(conn, parsed);
});
```
`onMessage` is the pluggable handler that 1.5.2 (auth gate) and 1.5.3 (subscription registry) replace.
### Outbound message helper
```ts
function sendOutbound(conn: LiveConnection, msg: OutboundMessage): void {
  if (conn.ws.readyState !== WebSocket.OPEN) return;
  conn.ws.send(JSON.stringify(msg));
  metrics.liveMessagesOutbound.inc({ type: msg.type });
}
```
Centralised so back-pressure handling (1.5.4) and message logging hook in one place later.
### Close codes
`ws` lets you specify close codes when calling `ws.close(code, reason)`. Reserve these:
| Code | Meaning | Where set |
|---|---|---|
| `1000` | Normal closure | Default for clean disconnect |
| `1001` | Server going away | `stop()` during shutdown |
| `4401` | Unauthorized | Task 1.5.2 |
| `4403` | Forbidden | Task 1.5.3 (for revoked authorization, not used in pilot) |
Document these in `protocol.ts` as constants.
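A possible shape for those constants (the names are illustrative, not fixed by the contract):
```ts
// protocol.ts — reserved close codes from the table above.
export const CLOSE_NORMAL = 1000;       // clean disconnect
export const CLOSE_GOING_AWAY = 1001;   // stop() during shutdown
export const CLOSE_UNAUTHORIZED = 4401; // task 1.5.2
export const CLOSE_FORBIDDEN = 4403;    // task 1.5.3
```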
### Drain on `stop()`
```ts
async function stop(timeoutMs = config.LIVE_WS_DRAIN_TIMEOUT_MS) {
  // 1. Stop accepting new connections.
  httpServer.close();
  // 2. Send close frame to every open connection.
  for (const conn of connections.values()) {
    conn.ws.close(1001, 'server shutting down');
  }
  // 3. Wait for them to finish, with timeout.
  const deadline = Date.now() + timeoutMs;
  while (connections.size > 0 && Date.now() < deadline) {
    await sleep(50);
  }
  // 4. Force-terminate any stragglers.
  for (const conn of connections.values()) {
    conn.ws.terminate();
  }
}
```
Stragglers happen when a client's TCP stack is slow or the network is partitioned. Force-terminate is the right call — we're shutting down anyway.
### Logger conventions
- `info`: `live server starting on :8081`, `live server ready`, `live server stopping`, `live server stopped`.
- `debug`: `connection opened id=... remote=...`, `connection closed id=... code=... reason=...`, `inbound message id=... type=...`.
- `trace`: per-message routing detail.
Don't log full WS payloads by default — they may contain large snapshot arrays in later tasks.
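For illustration, the object-first logging style the other snippets in this spec assume (field names are examples; `conn`, `parsed`, and `raw` as in the handler sketches above):
```ts
logger.info({ host: config.LIVE_WS_HOST, port: config.LIVE_WS_PORT }, 'live server starting');
logger.debug({ id: conn.id, remote: conn.remoteAddr }, 'connection opened');
// Log the payload size, not the payload itself, to keep snapshot arrays out of the logs.
logger.debug({ id: conn.id, type: parsed.type, bytes: raw.length }, 'inbound message');
```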
## Acceptance criteria
- [ ] `pnpm typecheck`, `pnpm lint`, `pnpm test` clean.
- [ ] `pnpm dev` boots, logs both the consumer lifecycle (Phase 1) and the live server lifecycle.
- [ ] `wscat -c ws://localhost:8081` connects; sending malformed JSON gets the `protocol-violation` error; sending `{"type":"subscribe","topic":"foo"}` gets `{"type":"error","code":"not-implemented"}`.
- [ ] Server pings within `LIVE_WS_PING_INTERVAL_MS` of connect; client pong updates `lastSeenAt`.
- [ ] `kill -TERM <pid>` exits cleanly within `LIVE_WS_DRAIN_TIMEOUT_MS + 1s`.
- [ ] `processor_live_connections` gauge moves up on connect, down on disconnect.
## Risks / open questions
- **Port conflict with metrics server.** Phase 1 binds `:9090` for metrics. Live server defaults to `:8081`. Both can be host-published or only the live one (metrics is internal-only). Document in `compose.yaml` updates that follow.
- **Reverse-proxy upgrade path.** Traefik / Caddy / nginx all support WS upgrade transparently if the path is configured for it. The proxy config lives in `trm/deploy`; this task doesn't touch it but the README's manual smoke test requires it for end-to-end.
- **Per-connection memory.** Each `LiveConnection` is small (~200 bytes plus the `ws` library's internal state). At 100 concurrent connections that's tens of KB. Not a concern at pilot scale.
## Done
Landed in `b8ebbd0`. Key deviations from spec:
- Used `crypto.randomUUID()` (Node 22 built-in) instead of `nanoid` — avoids adding a new npm dep beyond `ws`.
- `Metrics` moved to `src/shared/types.ts` (re-exported from `src/core/types.ts`) so `src/live/server.ts` can import it without violating the ESLint `import/no-restricted-paths` rule.
- `processor_live_connections` gauge and `processor_live_subscriptions` gauge are driven via `metrics.observe()` (which calls prom-client `.set()`) rather than `inc`/`dec` because the shared `Metrics` interface has no `dec` method.
@@ -0,0 +1,193 @@
# Task 1.5.2 — Cookie auth handshake
**Phase:** 1.5 — Live broadcast
**Status:** 🟩 Done
**Depends on:** 1.5.1
**Wiki refs:** `docs/wiki/synthesis/processor-ws-contract.md` §Auth handshake; `docs/wiki/entities/directus.md`; `docs/wiki/entities/react-spa.md` §Auth pattern
## Goal
Authenticate WebSocket connections using the Directus-issued cookie attached to the upgrade request. Validate via a single `/users/me` round-trip to Directus; on success, bind the user identity to the `LiveConnection` for its lifetime; on failure, close with code `4401` before completing the upgrade.
After this task, anonymous connections are rejected — only Directus-authenticated users can hold an open WebSocket.
## Deliverables
- `src/live/auth.ts` exporting:
- `createAuthClient(config, logger): AuthClient` — factory.
- `AuthClient` interface: `validate(cookieHeader: string): Promise<AuthenticatedUser | null>`.
- `type AuthenticatedUser = { id: string; email: string; role: string | null; first_name: string | null; last_name: string | null }` — minimum fields used by the registry (1.5.3) for authorization decisions.
- `validate` returns `null` on any failure (network, 401, malformed response). Logs at `warn` with the failure reason.
- `src/live/server.ts` updated:
- `LiveConnection` gains a `user: AuthenticatedUser` field (no longer optional).
- The `'upgrade'` handler validates the cookie *before* calling `wss.handleUpgrade`. On `null`, write a 401 HTTP response on the raw socket and destroy it (this is how `ws` recommends rejecting upgrades cleanly).
- On success, pass the validated user through to the `'connection'` handler via `req[USER_KEY]`.
- New config keys (zod):
- `DIRECTUS_BASE_URL` (default `http://directus:8055`) — where to call `/users/me`.
- `DIRECTUS_AUTH_TIMEOUT_MS` (default `5_000`).
- New Prometheus metrics:
- `processor_live_auth_attempts_total{result}` — `success` / `unauthorized` / `error`.
- `processor_live_auth_latency_ms` (histogram).
- `test/live-auth.test.ts`:
- With a mocked Directus returning 200 + a user payload, `validate` returns the parsed user.
- With 401, returns `null` and increments `unauthorized` counter.
- With a network error, returns `null` and increments `error` counter (does not throw).
- With a 200 but malformed payload (no `id` field), returns `null` and logs at `warn`.
- The HTTP timeout is enforced (`AbortController` after `DIRECTUS_AUTH_TIMEOUT_MS`).
## Specification
### Cookie extraction
The browser attaches whatever cookies were set on the SPA's origin. Directus's refresh cookie is named `directus_refresh_token` by default; on REST calls the session is identified server-side via the access token in the `Authorization` header — but a WebSocket upgrade carries no Authorization header, so we forward the cookie and let Directus handle session lookup.
```ts
function extractCookieHeader(req: IncomingMessage): string | null {
  return req.headers.cookie ?? null;
}
```
If the header is missing entirely, fail fast — no point calling Directus.
### `/users/me` call
```ts
async function validate(cookieHeader: string): Promise<AuthenticatedUser | null> {
  if (!cookieHeader) return null;
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), config.DIRECTUS_AUTH_TIMEOUT_MS);
  const start = performance.now();
  try {
    const res = await fetch(`${config.DIRECTUS_BASE_URL}/users/me?fields=id,email,role,first_name,last_name`, {
      method: 'GET',
      headers: { cookie: cookieHeader },
      signal: controller.signal,
    });
    if (res.status === 401 || res.status === 403) {
      metrics.authAttempts.inc({ result: 'unauthorized' });
      return null;
    }
    if (!res.ok) {
      logger.warn({ status: res.status }, 'directus auth call returned non-2xx');
      metrics.authAttempts.inc({ result: 'error' });
      return null;
    }
    const body = await res.json();
    const user = AuthenticatedUserSchema.safeParse(body.data);
    if (!user.success) {
      logger.warn({ issues: user.error.issues }, 'directus /users/me returned unexpected shape');
      metrics.authAttempts.inc({ result: 'error' });
      return null;
    }
    metrics.authAttempts.inc({ result: 'success' });
    return user.data;
  } catch (err) {
    if ((err as Error).name === 'AbortError') {
      logger.warn('directus auth call timed out');
    } else {
      logger.warn({ err }, 'directus auth call failed');
    }
    metrics.authAttempts.inc({ result: 'error' });
    return null;
  } finally {
    clearTimeout(timer);
    metrics.authLatency.observe(performance.now() - start);
  }
}
```
Notes:
- **Field projection** (`?fields=...`) keeps the response small. The full user record has dozens of fields we don't need.
- **Forward the entire cookie header.** Directus may rotate the refresh cookie on this call (it shouldn't on `/users/me`, but be liberal); we ignore any `Set-Cookie` in the response — it's not our cookie to manage.
- **No retries.** A failed validation immediately closes the upgrade. The SPA will reconnect, which gives a natural retry. Don't add server-side retry logic — masks bugs and slows down the bad-credential case.
### Rejecting the upgrade
`ws` lets you reject by writing directly to the raw socket before `handleUpgrade`:
```ts
httpServer.on('upgrade', async (req, socket, head) => {
  const cookie = extractCookieHeader(req);
  const user = cookie ? await authClient.validate(cookie) : null;
  if (!user) {
    socket.write(
      'HTTP/1.1 401 Unauthorized\r\n' +
      'Content-Length: 0\r\n' +
      'Connection: close\r\n' +
      '\r\n'
    );
    socket.destroy();
    return;
  }
  // Stash the user on the request object so the connection handler can pick it up.
  (req as IncomingMessage & { user: AuthenticatedUser }).user = user;
  wss.handleUpgrade(req, socket, head, (ws) => {
    wss.emit('connection', ws, req);
  });
});
wss.on('connection', (ws, req: IncomingMessage & { user: AuthenticatedUser }) => {
  const conn: LiveConnection = {
    id: nanoid(),
    ws,
    remoteAddr: req.socket.remoteAddress ?? 'unknown',
    openedAt: new Date(),
    lastSeenAt: new Date(),
    user: req.user,
  };
  // ... rest of connection setup
});
```
### What `AuthenticatedUser` does and doesn't include
Include only fields the registry (1.5.3) and Phase 4 permissions will need:
```ts
const AuthenticatedUserSchema = z.object({
  id: z.string().uuid(),
  email: z.string().email().nullable(),
  role: z.string().uuid().nullable(), // Directus role id, not the `organization_users.role` enum
  first_name: z.string().nullable(),
  last_name: z.string().nullable(),
});
```
Don't pull in `directus_users` extension fields or anything specific to the TRM domain — those are queried per-subscription, not per-connection.
### What we don't do (deferred)
- **No JWT validation locally.** The simplest path is the round-trip; cache only if the round-trip becomes a bottleneck (it won't at pilot scale).
- **No refresh handling.** The cookie's lifetime is the SPA's problem. If it expires mid-connection, server-side state is unaffected; the SPA will reconnect (which re-validates).
- **No revocation re-checks.** A user removed from the database mid-session keeps their WebSocket until they disconnect or the server restarts. Phase 4 hardening can add periodic re-validation if needed.
## Acceptance criteria
- [ ] `pnpm typecheck`, `pnpm lint`, `pnpm test` clean.
- [ ] Connecting without a cookie returns HTTP 401 (visible in `wscat`'s output as a connection rejection with status code).
- [ ] Connecting with a stale/invalid cookie returns HTTP 401.
- [ ] Connecting with a valid cookie (obtained via Directus's `/auth/login` with `mode: cookie`) succeeds; the connection is logged with the user id.
- [ ] `processor_live_auth_attempts_total{result="success"}` increments on a successful upgrade.
- [ ] Auth latency p95 < 100ms against a stage-realistic Directus (single `/users/me` call against a warm DB).
## Risks / open questions
- **Directus base URL in dev vs stage vs prod.** In dev the SPA might run via Vite proxy at `localhost:5173`, with Directus at `localhost:8055`. The Processor's `DIRECTUS_BASE_URL` should always be the *internal* Compose-network URL (`http://directus:8055`) — that's the path with the lowest latency and no proxy hops. Document this in `.env.example`.
- **Cookie scope.** Directus issues the refresh cookie scoped to the public domain (e.g. `Domain=stage.trmtracking.org`). The Processor receives the same cookie because the upgrade request hits the same origin (proxy fronts both). Verify this works end-to-end during the integration test (1.5.6).
- **What if `/users/me` returns 200 with `data: null`?** Directus does this when the cookie is well-formed but the session is expired. Treat as `null` user (return `null`, log at `warn`).
## Done
Landed in `190254d`. Key deviations from spec:
- Added distinction between `data: null` (unauthorized / expired session) and missing `data` key (error / malformed response) — the task spec only mentioned `data: null` but the missing-key case is equally important.
- `authClient` is an optional parameter to `createLiveServer` (not required) so the existing unit tests that don't need auth work unchanged.
- Used the `satisfies` operator to pass the anonymous user placeholder at the no-auth code path for type safety.
@@ -0,0 +1,230 @@
# Task 1.5.3 — Subscription registry & per-event authorization
**Phase:** 1.5 — Live broadcast
**Status:** 🟩 Done
**Depends on:** 1.5.2
**Wiki refs:** `docs/wiki/synthesis/processor-ws-contract.md` §Subscription model; `docs/wiki/concepts/live-channel-architecture.md` §Authorization flow; `docs/wiki/synthesis/directus-schema-draft.md`
## Goal
Handle `subscribe` / `unsubscribe` messages: validate the topic format, authorize the user against the topic's organization, maintain in-memory bidirectional indexes (`connection → topics`, `topic → connections`), and emit the appropriate `subscribed` / `unsubscribed` / `error` responses. Authorization is a single Directus call per subscription; no per-message auth.
After this task, a connected client can `subscribe` to an event they have permission for, get an immediate `subscribed` response, and the registry knows which connections want updates for which event. The actual fan-out and snapshot land in 1.5.4 and 1.5.5 respectively — this task just owns the bookkeeping.
## Deliverables
- `src/live/registry.ts` exporting:
- `createSubscriptionRegistry(authzClient, logger, metrics): SubscriptionRegistry` — factory.
- `SubscriptionRegistry` interface:
```ts
interface SubscriptionRegistry {
  subscribe(conn: LiveConnection, topic: string, correlationId?: string): Promise<void>;
  unsubscribe(conn: LiveConnection, topic: string, correlationId?: string): Promise<void>;
  onConnectionClose(conn: LiveConnection): void; // remove from all topics
  connectionsForTopic(topic: string): Iterable<LiveConnection>; // used by 1.5.4 fan-out
  topicsForConnection(conn: LiveConnection): Iterable<string>;
  stats(): { connections: number; topics: number; subscriptions: number };
}
```
- Topic format validator: `event:<uuid>` is the only accepted shape in v1; anything else returns `error/unknown-topic`.
- `src/live/authz.ts` exporting:
- `createAuthzClient(config, logger): AuthzClient` — factory.
- `AuthzClient.canAccessEvent(user: AuthenticatedUser, eventId: string): Promise<AuthzResult>` — `{ allowed: true } | { allowed: false; reason: 'forbidden' | 'not-found' | 'error' }`.
- `src/live/server.ts` updated: the `onMessage` placeholder from 1.5.1 is replaced with a real router that dispatches `subscribe` / `unsubscribe` to the registry, calls `registry.onConnectionClose` in the `'close'` event handler.
- New Prometheus metrics:
- `processor_live_subscriptions{instance_id}` (gauge) — current total subscriptions.
- `processor_live_subscribe_attempts_total{result}` — `success` / `forbidden` / `not-found` / `unknown-topic` / `error`.
- `processor_live_authz_latency_ms` (histogram).
- `test/live-registry.test.ts`:
- Subscribe to `event:<uuid>` with a permitted user → `subscribed` reply, registry counts go up.
- Subscribe to `event:<uuid>` with a forbidden user → `error/forbidden` reply, no registry change.
- Subscribe to `device:<imei>` → `error/unknown-topic`, no registry change.
- Subscribe twice to the same topic → idempotent (single subscription, single `subscribed` reply on each call).
- Unsubscribe from a topic the connection isn't subscribed to → `unsubscribed` reply (idempotent), no error.
- Connection close removes all subscriptions; gauges decrement correctly.
- `test/live-authz.test.ts`:
- `canAccessEvent` returns `allowed: true` when `/items/events/<id>` returns 200 (Directus enforces RLS via the cookie; if Directus says yes, we say yes).
- Returns `allowed: false, reason: 'forbidden'` on 403.
- Returns `allowed: false, reason: 'not-found'` on 404.
- Returns `allowed: false, reason: 'error'` on network failure or 5xx (does not throw).
## Specification
### Topic parsing
```ts
const EventTopicRegex = /^event:([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})$/i;
function parseTopic(topic: string): { kind: 'event'; eventId: string } | null {
  const m = EventTopicRegex.exec(topic);
  if (m) return { kind: 'event', eventId: m[1] };
  return null; // unknown topic shape
}
```
Future shapes (`device:<imei>`, `entry:<uuid>`, `org:<uuid>`) get added here when they're needed. The unknown-topic path returns a clear error rather than silently failing — clients always know if they typed a topic the server doesn't understand.
### Authorization model
The simplest correct authorization: **delegate to Directus's REST API with the user's cookie**. If `GET /items/events/<eventId>` returns 200, the user has access (Directus's RLS already does the org-membership check). If 403, they don't.
```ts
async function canAccessEvent(user: AuthenticatedUser, eventId: string): Promise<AuthzResult> {
  const start = performance.now();
  try {
    const res = await fetch(`${config.DIRECTUS_BASE_URL}/items/events/${eventId}?fields=id`, {
      method: 'GET',
      headers: { cookie: user.cookieHeader }, // see "Carrying the cookie" below
      signal: AbortSignal.timeout(config.DIRECTUS_AUTHZ_TIMEOUT_MS ?? 5_000),
    });
    if (res.status === 200) return { allowed: true };
    if (res.status === 403) return { allowed: false, reason: 'forbidden' };
    if (res.status === 404) return { allowed: false, reason: 'not-found' };
    return { allowed: false, reason: 'error' };
  } catch {
    return { allowed: false, reason: 'error' };
  } finally {
    metrics.authzLatency.observe(performance.now() - start);
  }
}
```
**Field projection** (`?fields=id`) keeps the response tiny — we don't need the event details, just the access verdict.
### Carrying the cookie
The auth handshake (1.5.2) validated the cookie and discarded it. For per-subscription Directus calls we need the original cookie header. Two options:
**Option A: Stash on the connection.** When 1.5.2 succeeds, save `cookieHeader` on `LiveConnection`. Trade-off: cookie material lives in process memory for the connection's lifetime.
**Option B: Re-fetch via service account.** The Processor has its own credentials; at subscribe time, query as that service account with the user id as a filter. Trade-off: more complex, requires the Processor to have a Directus account with read access to all events.
**Pick Option A.** Simpler, more honest (the user's own permissions are the source of truth for authorization), and the cookie is already on this server — we received it at upgrade. Memory cost is negligible (a cookie header is typically 100–500 bytes). Document that `LiveConnection` holds sensitive material and don't log it.
Update `LiveConnection` in `server.ts`:
```ts
export type LiveConnection = {
  id: string;
  ws: WebSocket;
  remoteAddr: string;
  openedAt: Date;
  lastSeenAt: Date;
  user: AuthenticatedUser;
  cookieHeader: string; // ← added
};
```
And update 1.5.2's upgrade handler to pass the cookie through.
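A sketch of that change, reusing the stash-on-request trick from 1.5.2 (`UpgradeRequest` is an illustrative name, not an existing type):
```ts
type UpgradeRequest = IncomingMessage & { user: AuthenticatedUser; cookieHeader: string };

// In the 'upgrade' handler, after validate() succeeds:
(req as UpgradeRequest).user = user;
(req as UpgradeRequest).cookieHeader = cookie;

// In the 'connection' handler, copy both onto the LiveConnection:
//   user: (req as UpgradeRequest).user,
//   cookieHeader: (req as UpgradeRequest).cookieHeader,
```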
### Registry data structures
```ts
const connectionTopics = new WeakMap<LiveConnection, Set<string>>(); // conn → topics
const topicConnections = new Map<string, Set<LiveConnection>>(); // topic → conns
```
`WeakMap` for `connectionTopics` lets garbage collection clean up if a connection somehow slips past the explicit `onConnectionClose` call. `Set` semantics give idempotent subscribe/unsubscribe for free.
### Subscribe flow
```ts
async function subscribe(conn: LiveConnection, topic: string, correlationId?: string) {
  const parsed = parseTopic(topic);
  if (!parsed) {
    sendOutbound(conn, { type: 'error', topic, id: correlationId, code: 'unknown-topic', message: 'Unknown topic format' });
    metrics.subscribeAttempts.inc({ result: 'unknown-topic' });
    return;
  }
  // Idempotent: already subscribed?
  const existing = connectionTopics.get(conn);
  if (existing?.has(topic)) {
    // Re-send subscribed (snapshot will be fetched freshly in 1.5.5).
    sendOutbound(conn, { type: 'subscribed', topic, id: correlationId, snapshot: [] });
    return;
  }
  const verdict = await authzClient.canAccessEvent(conn.user, parsed.eventId);
  if (!verdict.allowed) {
    sendOutbound(conn, { type: 'error', topic, id: correlationId, code: verdict.reason });
    metrics.subscribeAttempts.inc({ result: verdict.reason });
    return;
  }
  // Insert into both indexes.
  if (!existing) connectionTopics.set(conn, new Set());
  connectionTopics.get(conn)!.add(topic);
  if (!topicConnections.has(topic)) topicConnections.set(topic, new Set());
  topicConnections.get(topic)!.add(conn);
  metrics.subscriptions.inc();
  metrics.subscribeAttempts.inc({ result: 'success' });
  // 1.5.5 fills in the snapshot. For now, empty array.
  sendOutbound(conn, { type: 'subscribed', topic, id: correlationId, snapshot: [] });
}
```
### Unsubscribe flow
```ts
async function unsubscribe(conn: LiveConnection, topic: string, correlationId?: string) {
  const removed = connectionTopics.get(conn)?.delete(topic) ?? false;
  const conns = topicConnections.get(topic);
  if (conns) {
    conns.delete(conn);
    if (conns.size === 0) topicConnections.delete(topic);
  }
  // Only decrement if a subscription was actually removed, so repeated unsubscribes
  // don't skew the gauge.
  if (removed) metrics.subscriptions.dec();
  // Always reply, even if not subscribed (idempotent).
  sendOutbound(conn, { type: 'unsubscribed', topic, id: correlationId });
}
```
### `onConnectionClose`
```ts
function onConnectionClose(conn: LiveConnection) {
  const topics = connectionTopics.get(conn);
  if (!topics) return;
  for (const topic of topics) {
    const conns = topicConnections.get(topic);
    if (conns) {
      conns.delete(conn);
      if (conns.size === 0) topicConnections.delete(topic);
    }
    metrics.subscriptions.dec();
  }
  connectionTopics.delete(conn);
}
```
Hooked into the `ws.on('close', ...)` handler in `server.ts`.
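A sketch of that wiring (assuming the connection map from 1.5.1 is keyed by connection id):
```ts
ws.on('close', (code, reason) => {
  registry.onConnectionClose(conn);
  connections.delete(conn.id);
  logger.debug({ id: conn.id, code, reason: reason.toString() }, 'connection closed');
});
```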
## Acceptance criteria
- [ ] `pnpm typecheck`, `pnpm lint`, `pnpm test` clean.
- [ ] `wscat` flow: connect with a valid cookie → `{"type":"subscribe","topic":"event:<existing-event-id>"}` → `{"type":"subscribed","topic":"event:<id>","snapshot":[]}`.
- [ ] Forbidden flow: same client subscribing to an event in a different org → `{"type":"error","code":"forbidden"}`.
- [ ] Unknown topic flow: `{"type":"subscribe","topic":"foo:bar"}` → `{"type":"error","code":"unknown-topic"}`.
- [ ] Unsubscribe flow: client gets `unsubscribed` reply; gauge `processor_live_subscriptions` decrements.
- [ ] Disconnect cleans up: `processor_live_subscriptions` returns to its pre-connection level after the client disconnects.
- [ ] Idempotency: subscribing twice to the same topic doesn't double-count in `processor_live_subscriptions`.
## Risks / open questions
- **Authz latency budget.** Each subscribe is one Directus call. At race-start with hundreds of viewers subscribing simultaneously, that's a thundering herd. Pilot scale (≤20 viewers per event) is fine. If we ever see a herd: cache `(userId, eventId) → verdict` for 60s with manual invalidation hooks. Defer until measured.
- **What if the user is removed from the org mid-subscription?** Their existing subscriptions keep delivering until they disconnect. Phase 4 hardening can add periodic re-checks. For pilot, "trust the session" is fine.
- **Filter subscriptions to the user's own entries vs all in-event?** Race directors want to see everyone; participants might want to see only their own crew. Current spec is "everyone in the event" — Phase 4 permissions can refine. Document that v1 is open within an event.
- **Wildcard topics.** Not in scope. If we ever need it, the topic parser is the place to add `event:*` → "every event in the user's orgs."
## Done
Landed in `38de4bc`. Key deviations from spec:
- `canAccessEvent` signature is `(cookieHeader: string, eventId: string)` rather than `(user: AuthenticatedUser, eventId: string)` because `AuthenticatedUser` doesn't carry the cookie — the cookie lives on `LiveConnection.cookieHeader`. The call site passes `conn.cookieHeader` directly.
- Added `'error'` to `ErrorCode` in `protocol.ts` to handle transient authz failures; the spec omitted this case.
- `SnapshotProvider` interface is defined in `registry.ts` and defaults to a stub returning `[]`; task 1.5.5 injects the real implementation via the optional parameter.
- Snapshot fetching is integrated into the subscribe flow already (calls `fetchSnapshot` on success), making task 1.5.5 a pure injection of a better provider rather than a structural change to the subscribe flow.
@@ -0,0 +1,225 @@
# Task 1.5.4 — Broadcast consumer group & fan-out
**Phase:** 1.5 — Live broadcast
**Status:** 🟩 Done
**Depends on:** 1.5.3
**Wiki refs:** `docs/wiki/synthesis/processor-ws-contract.md` §Streaming updates, §Multi-instance behaviour; `docs/wiki/concepts/live-channel-architecture.md` §Multi-instance Processor
## Goal
Read the same `telemetry:teltonika` Redis stream as Phase 1's durable-write consumer, but on a **per-instance** consumer group `live-broadcast-{instance_id}`, and fan out each position record to the connections subscribed to its event. The durable-write path is unaffected; this is an additional read with a different group name and a different sink.
After this task, a position published to Redis arrives on the SPA's WebSocket within ~50ms (in-process Redis read + per-event index lookup + JSON serialise + WS send).
## Deliverables
- `src/live/broadcast.ts` exporting:
- `createBroadcastConsumer(redis, registry, deviceToEvent, config, logger, metrics): BroadcastConsumer` — factory.
- `BroadcastConsumer` interface: `start(): Promise<void>`, `stop(): Promise<void>`. Same lifecycle shape as Phase 1's `Consumer`.
- The fan-out loop: read batch via `XREADGROUP`, for each record decode → look up the device's event → fetch subscribers → emit one `position` message per subscriber.
- `src/live/device-event-map.ts` exporting:
- `createDeviceEventMap(pool, config, logger): DeviceEventMap` — factory.
- `DeviceEventMap.lookup(deviceId: string): string[]` — returns the event IDs the device is registered to *right now*. Cached in memory; refreshed on a cadence (default every 30s) and on demand (via a Redis Stream invalidation signal — same pattern as Phase 1's `recompute:requests`, but for entry-device assignments). For pilot, the cadence-based refresh is enough; manual invalidation can land later.
- The query: `SELECT entry_devices.device_id, entries.event_id FROM entry_devices JOIN entries ON entries.id = entry_devices.entry_id`.
- `src/main.ts` updated to wire and start the broadcast consumer alongside the existing throughput consumer; SIGTERM stops both (live server first, broadcast consumer second, durable-write consumer last).
- New config keys (zod):
- `LIVE_BROADCAST_GROUP_PREFIX` (default `live-broadcast`) — full group name is `${prefix}-${INSTANCE_ID}`.
- `LIVE_BROADCAST_BATCH_SIZE` (default `100`).
- `LIVE_BROADCAST_BATCH_BLOCK_MS` (default `1000`).
- `LIVE_DEVICE_EVENT_REFRESH_MS` (default `30_000`).
- New Prometheus metrics:
- `processor_live_broadcast_records_total{instance_id}` (counter).
- `processor_live_broadcast_fanout_messages_total{instance_id}` (counter) — per outbound `position` frame sent.
- `processor_live_broadcast_orphan_records_total{instance_id}` (counter) — records for devices not registered to any event.
- `processor_live_broadcast_lag_ms` (histogram) — time from `XADD` (record's `ts` field) to fan-out send.
- `test/live-broadcast.test.ts`:
- With a fake stream entry for a device registered to `event:E1` and one subscriber to `event:E1`, fan-out sends one `position` message to that subscriber.
- Multiple subscribers on the same event each receive the message.
- A device registered to no event increments `orphan_records_total` and emits no message.
- Devices registered to multiple events emit one message per subscribing connection per event (i.e. a connection subscribed to both events for the same device receives two messages — they're per-topic).
## Specification
### Why a separate consumer group
Phase 1's durable-write consumer is on group `processor` (configurable, default in `tcp-ingestion` and matched in Processor). Two instances share that group and Redis splits records across them — exactly one instance handles each write.
Live broadcast needs the opposite: **every instance must see every record**, because each instance has its own connected clients. The clean way to do that with Redis Streams is one group per instance. Group name `live-broadcast-{instance_id}` ensures uniqueness; each instance's `XREADGROUP` gets the full firehose for that group.
The two reads are independent — the durable-write group's offset and the live-broadcast group's offset are separate. A slow durable write doesn't slow down broadcast and vice versa.
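The group has to exist before the first `XREADGROUP`. A minimal sketch of the creation step in `start()`, assuming the ioredis-style client used in the loop below and the `INSTANCE_ID` config key; starting at `$` rather than `0` is this sketch's assumption, so a freshly booted instance doesn't replay the backlog:
```ts
const groupName = `${config.LIVE_BROADCAST_GROUP_PREFIX}-${config.INSTANCE_ID}`;
try {
  await redis.xgroup('CREATE', config.REDIS_TELEMETRY_STREAM, groupName, '$', 'MKSTREAM');
} catch (err) {
  // BUSYGROUP: the group already exists (restart with the same INSTANCE_ID); safe to ignore.
  if (!String(err).includes('BUSYGROUP')) throw err;
}
```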
### Fan-out shape
```ts
async function runLoop() {
  while (!stopping) {
    let entries: StreamEntry[];
    try {
      entries = await redis.xreadgroup(
        'GROUP', groupName, consumerName,
        'COUNT', config.LIVE_BROADCAST_BATCH_SIZE,
        'BLOCK', config.LIVE_BROADCAST_BATCH_BLOCK_MS,
        'STREAMS', config.REDIS_TELEMETRY_STREAM, '>',
      );
    } catch (err) {
      logger.error({ err }, 'broadcast XREADGROUP failed; backing off');
      await sleep(1000);
      continue;
    }
    if (!entries) continue;
    for (const entry of decodeBatch(entries)) {
      metrics.broadcastRecords.inc();
      await fanOut(entry);
      // ACK immediately — broadcast doesn't need durability semantics.
      await redis.xack(config.REDIS_TELEMETRY_STREAM, groupName, entry.id);
    }
  }
}

async function fanOut(record: ConsumedRecord) {
  const eventIds = deviceToEvent.lookup(record.position.deviceId);
  if (eventIds.length === 0) {
    metrics.broadcastOrphans.inc();
    return;
  }
  const message = toPositionMessage(record.position); // shape per processor-ws-contract
  for (const eventId of eventIds) {
    const topic = `event:${eventId}`;
    const conns = registry.connectionsForTopic(topic);
    for (const conn of conns) {
      sendOutbound(conn, { ...message, topic });
      metrics.broadcastFanout.inc();
    }
  }
  metrics.broadcastLag.observe(Date.now() - record.position.ts.getTime());
}
```
### Why ACK immediately
Phase 1's durable-write consumer ACKs only after Postgres confirms the write — that's the `XACK` discipline that protects against data loss. The broadcast consumer has different durability semantics: **a missed broadcast is acceptable.** If a position fails to fan out (because the connection crashed mid-send, say), the next position is what matters. Don't keep a pending entry just to retry an obsolete record.
ACK-on-consume keeps the broadcast group's PEL empty, prevents pending-entry buildup, and avoids the "send the same position twice on retry" anti-feature. Phase 3 hardening can revisit if we ever need broadcast guarantees.
### `DeviceEventMap` design
The fan-out path needs to answer "which events does this device belong to?" thousands of times per second. The naive answer — query Postgres on each record — is wrong. Two options:
**Option A: In-process cache with periodic refresh.** Load the full `entry_devices` × `entries` join at startup; refresh every 30s. Stale data window: up to 30s. **Pick this for pilot.**
**Option B: Listen for changes.** Add an `entry-devices:changed` Redis Stream (or use Directus Flows to publish on writes); broadcast invalidates affected entries. Stale data window: ~50ms. Adds protocol surface and a coordination point.
For pilot: Option A. The 30s staleness window is acceptable — operators register devices before the event starts, and "you registered a new device 30s ago and it's not on the map yet" is a tolerable UX. Phase 3+ can promote to Option B if real-time registration matters.
```ts
class DeviceEventMap {
  private map = new Map<string, Set<string>>(); // deviceId → Set<eventId>
  private timer: NodeJS.Timeout | null = null;

  async start() {
    await this.refresh();
    this.timer = setInterval(() => {
      this.refresh().catch(err => logger.warn({ err }, 'device-event map refresh failed'));
    }, config.LIVE_DEVICE_EVENT_REFRESH_MS);
  }

  async stop() {
    if (this.timer) clearInterval(this.timer);
  }

  async refresh() {
    const start = performance.now();
    const result = await pool.query<{ device_id: string; event_id: string }>(
      `SELECT ed.device_id, e.event_id
         FROM entry_devices ed
         JOIN entries e ON e.id = ed.entry_id`
    );
    const next = new Map<string, Set<string>>();
    for (const row of result.rows) {
      if (!next.has(row.device_id)) next.set(row.device_id, new Set());
      next.get(row.device_id)!.add(row.event_id);
    }
    this.map = next;
    metrics.deviceEventRefreshLatency.observe(performance.now() - start);
    metrics.deviceEventEntries.set(next.size);
    logger.debug({ devices: next.size }, 'device-event map refreshed');
  }

  lookup(deviceId: string): string[] {
    return Array.from(this.map.get(deviceId) ?? []);
  }
}
```
### Back-pressure
If a connection's send queue is backing up (slow client, slow network), the WS library queues messages in process memory. At 100 msg/s × 10s of slow consumer = 1000 queued messages × ~200 bytes each = 200KB per slow connection. Tolerable.
If we ever see real back-pressure problems: per-connection bounded queue (e.g. last 100 positions per device, dropping older), with a metric `processor_live_broadcast_dropped_total`. Document but defer.
For now: rely on `ws.bufferedAmount` to detect slow consumers; if it exceeds a threshold (say 1MB), close the connection with code 1008 (policy violation) and log. Client reconnects. Worth implementing as a defensive measure even for pilot — prevents one slow client from eating all the memory.
```ts
function sendOutbound(conn: LiveConnection, msg: OutboundMessage) {
  if (conn.ws.readyState !== WebSocket.OPEN) return;
  if (conn.ws.bufferedAmount > config.LIVE_WS_BACKPRESSURE_THRESHOLD_BYTES) {
    logger.warn({ connId: conn.id, buffered: conn.ws.bufferedAmount }, 'closing slow connection');
    conn.ws.close(1008, 'back-pressure threshold exceeded');
    return;
  }
  conn.ws.send(JSON.stringify(msg));
  metrics.liveMessagesOutbound.inc({ type: msg.type });
}
```
(Update 1.5.1's `sendOutbound` to include this check; add `LIVE_WS_BACKPRESSURE_THRESHOLD_BYTES` config with default `1_048_576`.)
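The config addition as it might look in the existing zod env schema (a fragment, not a standalone module):
```ts
// src/config/load.ts — inside the existing env schema object:
LIVE_WS_BACKPRESSURE_THRESHOLD_BYTES: z.coerce.number().int().positive().default(1_048_576),
```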
### `toPositionMessage`
```ts
function toPositionMessage(p: Position): Omit<PositionMessage, 'topic'> {
  const msg: any = {
    type: 'position',
    deviceId: p.deviceId,
    lat: p.latitude,
    lon: p.longitude,
    ts: p.ts.getTime(), // epoch ms; contract is number, not ISO string
  };
  if (p.speed != null) msg.speed = p.speed;
  if (p.course != null) msg.course = p.course;
  if (p.accuracy != null) msg.accuracy = p.accuracy;
  if (p.attributes && Object.keys(p.attributes).length > 0) msg.attributes = p.attributes;
  return msg;
}
```
Per the contract: omit fields rather than send `null` for absent values.
## Acceptance criteria
- [ ] `pnpm typecheck`, `pnpm lint`, `pnpm test` clean.
- [ ] `pnpm dev` boots; logs show both consumer groups joining (`processor` and `live-broadcast-{instance_id}`).
- [ ] With a subscribed `wscat` client and a synthetic position published to `telemetry:teltonika`, the client receives a `{"type":"position",...}` frame within ~100ms.
- [ ] A second `wscat` client subscribed to the same event also receives the message.
- [ ] An orphan position (device not in any `entry_devices` row) increments `processor_live_broadcast_orphan_records_total` and emits no WS message.
- [ ] After 30s, modifying `entry_devices` directly in Postgres and publishing a position routes correctly to the new event's subscribers.
- [ ] Broadcast lag p50 < 100ms, p95 < 500ms with a small subscriber count (≤20).
## Risks / open questions
- **30s staleness window** is acceptable for pilot but worth surfacing in operator docs. "If you just registered a device, wait 30s before expecting it on the map" is a reasonable line in the dogfood README.
- **Memory cost of `DeviceEventMap`.** For 500 devices × 10 events average, ~5000 entries. Trivial.
- **What about devices registered to *multiple* events at the same time?** Schema allows it (one device on multiple `entry_devices` rows). Fan-out handles it: each event's subscribers get the message. The SPA may want to filter by event on its end if it's showing a single event.
- **Memory leak from `topicConnections` if registry isn't cleaning up.** Defensive: log a warning if `registry.stats().topics` exceeds a sanity threshold (e.g. 1000) to surface a leak before it OOMs.
## Done
Landed in `c07ea0e`. Key implementation decisions:
- `CodecError`/`decodePosition` moved to `src/shared/codec.ts`; `Position`/`AttributeValue` moved to `src/shared/types.ts`. Both `src/core/` and `src/live/` re-export from shared to preserve existing import paths.
- `broadcast.ts` ACKs all stream entries immediately (durability not needed for fan-out).
- Test uses a `stopSignal` Promise to coordinate between the broadcast loop and the test's `stop()` call, avoiding the tight-loop OOM that naive polling triggers.
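A rough sketch of that stop-signal coordination (names assumed, not the exact test code):
```ts
let stopping = false;
let resolveStop: () => void = () => {};
const stopSignal = new Promise<void>((resolve) => { resolveStop = resolve; });

async function stop(): Promise<void> {
  stopping = true;
  resolveStop(); // wakes the loop even while XREADGROUP is still BLOCKed
}

// Inside runLoop(), race the blocking read against the stop signal:
//   const batch = await Promise.race([readBatch(), stopSignal.then(() => null)]);
//   if (stopping || batch === null) break;
```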
@@ -0,0 +1,145 @@
# Task 1.5.5 — Snapshot-on-subscribe
**Phase:** 1.5 — Live broadcast
**Status:** 🟩 Done
**Depends on:** 1.5.3, 1.4 (Postgres pool)
**Wiki refs:** `docs/wiki/synthesis/processor-ws-contract.md` §Server response — subscribed
## Goal
When a client subscribes to `event:<eventId>`, return the **latest known position for every device registered to that event** as part of the `subscribed` response. Without it, the SPA opens to a blank map and only fills in as devices report — which feels broken.
The snapshot is a one-time read at subscribe time. After that, positions stream live via the broadcast consumer (1.5.4). The two paths together give the SPA a "fully populated map immediately, then live updates" experience.
## Deliverables
- `src/live/snapshot.ts` exporting:
- `createSnapshotProvider(pool, logger, metrics): SnapshotProvider` — factory.
- `SnapshotProvider.forEvent(eventId: string): Promise<PositionSnapshotEntry[]>` — returns the latest position per device registered to the event. Empty array if no devices or no positions yet.
- `type PositionSnapshotEntry = { deviceId: string; lat: number; lon: number; ts: number; speed?: number; course?: number; accuracy?: number; attributes?: Record<string, unknown> }` — same shape as the streaming `position` message minus the `type` and `topic` fields (the envelope wraps them).
- `src/live/registry.ts` updated: the `subscribe` method calls `snapshot.forEvent(eventId)` after authorization succeeds and includes the result in the `subscribed` response. Authorization happens *before* the snapshot query so a forbidden user doesn't pay the snapshot cost.
- New Prometheus metrics:
- `processor_live_snapshot_query_latency_ms` (histogram).
- `processor_live_snapshot_size` (histogram) — number of positions in each snapshot.
- `test/live-snapshot.test.ts`:
- With three devices in an event, two of which have positions, returns two snapshot entries.
- With an event that has no `entry_devices` rows, returns `[]`.
- With devices that have positions but `faulty=true`, those positions are excluded.
- The query returns the *most recent non-faulty* position per device (not just the most recent overall — `ORDER BY ts DESC` with a `WHERE faulty = false` filter).
## Specification
### The query
The snapshot needs the latest non-faulty position per device, scoped to one event. Postgres-canonical for "latest per group" is `DISTINCT ON`:
```sql
SELECT DISTINCT ON (p.device_id)
  p.device_id,
  p.latitude,
  p.longitude,
  p.ts,
  p.speed,
  p.course,
  p.accuracy,
  p.attributes
FROM positions p
JOIN entry_devices ed ON ed.device_id = p.device_id
JOIN entries e ON e.id = ed.entry_id
WHERE e.event_id = $1
  AND p.faulty = false
ORDER BY p.device_id, p.ts DESC;
```
### Why `DISTINCT ON`
`DISTINCT ON (device_id) ... ORDER BY device_id, ts DESC` returns the row with the highest `ts` per `device_id`. The alternatives (`GROUP BY` + `MAX(ts)` + self-join, or window functions with `ROW_NUMBER()`) all produce the same result with worse query plans on a TimescaleDB hypertable. `DISTINCT ON` is Postgres-specific but we're committed to Postgres.
### Performance
On a TimescaleDB hypertable, the index that makes this fast is `(device_id, ts DESC)`. Phase 1 task 1.4 created the hypertable; verify the index exists. If not, add it as a migration in this task:
```sql
CREATE INDEX IF NOT EXISTS positions_device_ts_idx ON positions (device_id, ts DESC);
```
Without the index, `DISTINCT ON` does a sequential scan per `device_id` group. With it, the scan is bounded by the chunk containing the most recent position per device — typically the latest one or two chunks.
For 500 devices in an event, the query should complete in < 50ms on a warm cache.
### Faulty-filter semantics
The `faulty` column is set post-hoc by operators when a position is unrealistic ([[directus]] entity page §"Faulty position handling"). Any read path that surfaces position data to operators must filter `faulty = false`:
- **Snapshot:** filter (this task).
- **Live broadcast:** doesn't apply — the broadcast consumer reads from Redis, not from `positions`. By the time a position is in Redis (and being streamed), no one has had the chance to flag it.
- **Replay (future):** filter when implemented.
### Where the snapshot is wired into the registry
The `subscribed` response in 1.5.3 currently sends `snapshot: []`. Update:
```ts
// In registry.ts, inside subscribe() after authorization succeeds:
let snapshot: PositionSnapshotEntry[] = [];
const start = performance.now();
try {
  snapshot = await snapshotProvider.forEvent(parsed.eventId);
  metrics.snapshotSize.observe(snapshot.length);
} catch (err) {
  logger.warn({ err, eventId: parsed.eventId }, 'snapshot query failed');
  // Fall through with empty snapshot — better to subscribe without a snapshot
  // than to fail the subscription entirely.
} finally {
  metrics.snapshotLatency.observe(performance.now() - start);
}
sendOutbound(conn, { type: 'subscribed', topic, id: correlationId, snapshot });
```
The "fail open" choice on snapshot errors is deliberate: a subscribe that returned `subscribed` with an empty snapshot is recoverable (live updates still work; the SPA just sees a sparser-than-expected initial state). A subscribe that errors out forces the SPA to retry, which masks the underlying snapshot failure.
### What the snapshot does NOT include
- **Position history.** Just the *latest* position per device. Trail rendering on the SPA reads the previous N positions from its own ring buffer as new positions stream in. No bulk-history endpoint in v1.
- **Device metadata** (model, IMEI, vehicle assignment). The SPA fetches that separately via Directus REST/SDK and joins on `deviceId` client-side.
- **Faulty positions.** Filtered.
- **Stale positions.** A position from 3 days ago is still "the latest" if the device hasn't reported since. The SPA should display "last seen N hours ago" indicators based on the `ts` field.
### Snapshot field shape
Mirror the `position` streaming message exactly except for the envelope:
```ts
const SnapshotEntrySchema = z.object({
  deviceId: z.string().uuid(),
  lat: z.number(),
  lon: z.number(),
  ts: z.number(), // epoch ms
  speed: z.number().optional(),
  course: z.number().optional(),
  accuracy: z.number().optional(),
  attributes: z.record(z.unknown()).optional(),
});
```
Same field-omission convention: don't emit `null` for absent values.
## Acceptance criteria
- [ ] `pnpm typecheck`, `pnpm lint`, `pnpm test` clean.
- [ ] Manual: with the seeded Rally Albania 2026 event (3 registered devices, some positions in `positions`), subscribing returns a snapshot with the registered devices' latest positions.
- [ ] Subscribing to an event with no positions returns `subscribed` with `snapshot: []`.
- [ ] Manually marking a position `faulty=true` excludes it from the next snapshot (the snapshot returns the most recent non-faulty position for that device, or omits the device if none exists).
- [ ] Snapshot query latency p95 < 100ms with the index in place; without the index the test should fail loudly so we don't ship without it.
- [ ] Snapshot failure (e.g. simulated Postgres timeout) does not fail the subscription; client receives `subscribed` with `snapshot: []` and the live stream still works.
## Risks / open questions
- **Snapshot size on a large event.** 500 devices × ~200 bytes per entry = ~100KB JSON payload. Tolerable. If we ever push to 5000 devices on a single event, consider streaming the snapshot in chunks via multiple `subscribed` frames. Out of scope for now.
- **Positions in the `positions` table that pre-date the device's registration to the event.** The JOIN includes them — if the device is on `entry_devices` for that event today, its 3-month-old positions still match. Acceptable behaviour; the operator's mental model is "this device has been in the system that long."
- **Trade-off with `DISTINCT ON` and TimescaleDB chunks.** TimescaleDB partitions by time; `DISTINCT ON (device_id) ORDER BY device_id, ts DESC` may need to scan multiple chunks if the latest position for some devices is older than the most recent chunk. For an active event this is the same chunk for everyone. For a long-tail of stale devices, multiple chunks may be touched. Acceptable.
## Done
Landed in `f4b50ca`. `src/live/snapshot.ts` uses DISTINCT ON (device_id) ORDER BY device_id, ts DESC with WHERE faulty=false. Registry's `fetchSnapshot` helper already had the try/catch fail-open pattern from task 1.5.3. `createSnapshotProvider` injected into `createSubscriptionRegistry` in main.ts.
@@ -0,0 +1,192 @@
# Task 1.5.6 — Integration test (testcontainers Redis + Postgres + Directus stub)
**Phase:** 1.5 — Live broadcast
**Status:** 🟩 Done
**Depends on:** 1.5.4, 1.5.5
**Wiki refs:**
## Goal
End-to-end pipeline test: spin up Redis 7 + TimescaleDB + a stub HTTP server impersonating Directus's `/users/me` and `/items/events/<id>` endpoints, boot the Processor against them, connect a real WebSocket client with a cookie, subscribe to an event, publish a synthetic position to `telemetry:teltonika`, verify the client receives both the snapshot and the streamed position.
This is the test that proves the live channel composes correctly end-to-end — auth handshake, subscription registry, snapshot, broadcast fan-out all integrated. Mirror Phase 1's `pipeline.integration.test.ts` for structure and skip-on-no-Docker pattern.
## Deliverables
- `test/live.integration.test.ts`:
- `beforeAll`: start Redis + TimescaleDB containers, start a tiny HTTP server impersonating Directus on a random port (acts as the auth + authz endpoint), seed `entry_devices` + `entries` + a few `positions` rows, boot a Processor instance pointed at all three. Skip cleanly if Docker is unavailable.
- `afterAll`: stop the Processor, the Directus stub, and both containers.
- **Test 1 — Happy path:** WS client connects with a valid cookie → subscribes to a seeded event → receives `subscribed` with a non-empty snapshot containing the seeded positions → publishes a synthetic position to Redis → receives the corresponding `position` frame within 1s.
- **Test 2 — Auth rejection:** WS client connects without a cookie → upgrade fails with HTTP 401.
- **Test 3 — Forbidden subscription:** Client with a cookie scoped to user A → subscribes to an event in an organization user A doesn't belong to → receives `error/forbidden` (Directus stub returns 403 for that user-event pair).
- **Test 4 — Multi-client fan-out:** Two clients subscribed to the same event → publishing one position results in both clients receiving the `position` frame.
- **Test 5 — Orphan position:** Publishing a position for a device that's not on `entry_devices` increments `processor_live_broadcast_orphan_records_total` and reaches no client.
- **Test 6 — Faulty-flagged snapshot exclusion:** Mark a seeded position `faulty=true` directly in Postgres, subscribe, verify the snapshot uses the next-most-recent non-faulty position (or omits the device if none exists).
- `test/helpers/directus-stub.ts`:
- A minimal Express-or-bare-`http.createServer` stub that responds to:
- `GET /users/me` — returns 200 + a fake user payload if a configured cookie is present, 401 otherwise.
- `GET /items/events/:id` — returns 200 if the (cookie, eventId) pair is in an allow-list, 403 otherwise.
- Exposed as `createDirectusStub({ allowedCookieToUser: Map<string, FakeUser>, allowedEvents: Map<userId, Set<eventId>> }): Promise<{ url: string; close: () => Promise<void> }>` (the server binds asynchronously, so the factory resolves once it's listening).
- `vitest.integration.config.ts` — the Phase 1 config already exists; extend the `testTimeout` if needed (the live test may need ~30s for the first-position round-trip on a cold cache).
## Specification
### Skip-on-no-Docker pattern
Same as Phase 1's `pipeline.integration.test.ts`:
```ts
let dockerAvailable = true;
beforeAll(async () => {
  try {
    redisContainer = await new GenericContainer('redis:7').withExposedPorts(6379).start();
  } catch (err) {
    dockerAvailable = false;
    console.warn('docker unavailable; skipping live integration tests');
    return;
  }
  // ... rest of setup
}, 120_000);
it('happy path', async () => {
  if (!dockerAvailable) return;
  // ... real test
});
```
### Directus stub shape
The stub is intentionally tiny — we're not testing Directus, we're testing the Processor's interaction with whatever Directus returns. Two endpoints, hardcoded responses:
```ts
function createDirectusStub(opts: StubOpts): Promise<{ url: string; close: () => Promise<void> }> {
  const server = http.createServer((req, res) => {
    const cookie = req.headers.cookie ?? '';
    const user = opts.allowedCookieToUser.get(cookie);
    if (req.url === '/users/me') {
      if (!user) {
        res.writeHead(401).end();
        return;
      }
      res.writeHead(200, { 'content-type': 'application/json' });
      res.end(JSON.stringify({ data: user }));
      return;
    }
    const eventMatch = /^\/items\/events\/([0-9a-f-]+)/.exec(req.url ?? '');
    if (eventMatch) {
      if (!user) { res.writeHead(401).end(); return; }
      const eventId = eventMatch[1];
      const allowed = opts.allowedEvents.get(user.id)?.has(eventId);
      if (!allowed) { res.writeHead(403).end(); return; }
      res.writeHead(200, { 'content-type': 'application/json' });
      res.end(JSON.stringify({ data: { id: eventId } }));
      return;
    }
    res.writeHead(404).end();
  });
  return new Promise((resolve) => {
    server.listen(0, () => {
      const addr = server.address() as AddressInfo;
      resolve({
        url: `http://localhost:${addr.port}`,
        close: () => new Promise<void>((done) => server.close(() => done())),
      });
    });
  });
}
```
This is ~40 lines of test infra. Don't pull in Express; bare `http` is enough.
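For example, the test's `beforeAll` might instantiate it like this (the cookie value, the user-id constant, and the env-var plumbing are assumptions here; `DIRECTUS_BASE_URL` is the config key the auth client reads, but how the test injects it is not specified):
```ts
import { createDirectusStub } from './helpers/directus-stub.js';

// Hypothetical test constants; TEST_EVENT_ID is the same id the seed helper below uses.
const TEST_COOKIE = 'directus_session_token=integration-test';
const TEST_USER_ID = '00000000-0000-4000-8000-000000000001';

// Inside beforeAll(), after the containers are up:
const stub = await createDirectusStub({
  allowedCookieToUser: new Map([[TEST_COOKIE, { id: TEST_USER_ID }]]),
  allowedEvents: new Map([[TEST_USER_ID, new Set([TEST_EVENT_ID])]]),
});

// Point the Processor under test at the stub instead of a real Directus.
process.env.DIRECTUS_BASE_URL = stub.url;
```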
### Seeding data
The integration test needs realistic-ish seed data: at least one organization, one event, two `entries`, four `entry_devices` (so at least one device-to-event mapping per entry), and some positions for some of the devices.
Use a seed helper:
```ts
async function seed(pool: Pool) {
await pool.query(`INSERT INTO organizations (id, name, slug) VALUES ($1, 'Test Org', 'test-org')`, [TEST_ORG_ID]);
await pool.query(`INSERT INTO events (id, organization_id, name, slug, discipline, starts_at, ends_at)
VALUES ($1, $2, 'Test Event', 'test-event', 'rally', '2026-01-01', '2026-12-31')`, [TEST_EVENT_ID, TEST_ORG_ID]);
// ... etc.
await pool.query(`INSERT INTO positions (device_id, ts, latitude, longitude, faulty)
VALUES ($1, '2026-05-02T11:00:00Z', 41.327, 19.819, false),
($1, '2026-05-02T11:01:00Z', 41.328, 19.820, false),
($2, '2026-05-02T11:00:30Z', 41.330, 19.825, false)`,
[TEST_DEVICE_1, TEST_DEVICE_2]);
}
```
Schema creation in the integration test reuses the same migration runner that production uses. **Don't reach into `db-init/` or Directus's snapshot YAML** from this test — the test is for the Processor, not the schema. Stub the minimum subset of Directus-managed tables in a setup migration that runs only in the test environment, OR (cleaner) point the test's `pool` at a Postgres that already has the schema loaded via a fixture SQL file.
The cleanest option: a single `test/fixtures/test-schema.sql` file that creates the minimum subset (organizations, events, entries, entry_devices, devices, positions) the integration test needs. Run it once in `beforeAll`. The duplication with the real schema is bounded — these collections are stable.
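A minimal loader for that fixture, assuming ESM test files and the `pg` pool created during container setup:
```ts
import { readFile } from 'node:fs/promises';
import { fileURLToPath } from 'node:url';
import type { Pool } from 'pg';

async function applyFixtureSchema(pool: Pool): Promise<void> {
  const sql = await readFile(
    fileURLToPath(new URL('./fixtures/test-schema.sql', import.meta.url)),
    'utf8',
  );
  // The fixture is plain DDL; pg executes a multi-statement string in a single
  // simple-protocol query when no parameters are passed.
  await pool.query(sql);
}
```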
### WebSocket client
Use `ws`'s client mode (`new WebSocket(url, { headers: { cookie: '...' } })`). Set up an `on('message')` listener that pushes to an array; tests read from the array with a `waitForMessage(predicate, timeout)` helper:
```ts
async function waitForMessage<T>(
  ws: WebSocket,
  predicate: (msg: any) => msg is T,
  timeoutMs: number = 5_000
): Promise<T> {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error('timeout waiting for message')), timeoutMs);
    const handler = (data: WebSocket.Data) => {
      const msg = JSON.parse(data.toString());
      if (predicate(msg)) {
        clearTimeout(timer);
        ws.off('message', handler);
        resolve(msg);
      }
    };
    ws.on('message', handler);
  });
}
```
This pattern is robust across the test suite — every test waits for a specific message shape, with a clear timeout error if the protocol breaks.
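Putting the client and the helper together, the happy path might start like this (the `subscribe` frame shape and the `wsPort`/`TEST_COOKIE`/`TEST_EVENT_ID` constants are assumptions; the wire format is owned by `processor-ws-contract.md`):
```ts
import WebSocket from 'ws';
import { expect } from 'vitest';

const ws = new WebSocket(`ws://127.0.0.1:${wsPort}`, {
  headers: { cookie: TEST_COOKIE },
});
await new Promise((resolve) => ws.once('open', resolve));

// Assumed subscribe frame shape: type + topic + client correlation id.
ws.send(JSON.stringify({ type: 'subscribe', topic: `event:${TEST_EVENT_ID}`, id: 'sub-1' }));

const subscribed = await waitForMessage(
  ws,
  (msg): msg is { type: 'subscribed'; snapshot: unknown[] } => msg?.type === 'subscribed',
);
expect(subscribed.snapshot.length).toBeGreaterThan(0);
```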
### Synthetic position publishing
Reuse the `XADD` shape from Phase 1's `pipeline.integration.test.ts`. Helper:
```ts
async function publishPosition(redis: Redis, position: Position) {
  await redis.xadd(
    config.REDIS_TELEMETRY_STREAM,
    '*',
    'ts', position.ts.toISOString(),
    'device_id', position.deviceId,
    'codec', '8E',
    'payload', JSON.stringify(serializeForStream(position)),
  );
}
```
The `serializeForStream` helper handles the bigint/Buffer sentinel encoding (already exists in Phase 1; reuse it).
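If a stand-in is ever needed while wiring the test (the Phase 1 helper should be reused as stated), a recursive encoder along these lines reproduces the sentinel shapes the decoder further down in this diff expects. This is a sketch, not the real `serializeForStream`:
```ts
// Walks a value and replaces JSON-unfriendly types with the documented
// sentinels: bigint → { __bigint }, Buffer → { __buffer_b64 }, Date → ISO-8601
// string. Everything else passes through unchanged.
function encodeSentinels(value: unknown): unknown {
  if (typeof value === 'bigint') return { __bigint: value.toString() };
  if (Buffer.isBuffer(value)) return { __buffer_b64: value.toString('base64') };
  if (value instanceof Date) return value.toISOString();
  if (Array.isArray(value)) return value.map(encodeSentinels);
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value).map(([k, v]) => [k, encodeSentinels(v)]),
    );
  }
  return value;
}
```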
## Acceptance criteria
- [ ] `pnpm test:integration -- live` runs all six scenarios green when Docker is available.
- [ ] Without Docker, the suite logs skip messages and exits 0 (does not fail).
- [ ] First-run total time < 60s including container pulls; subsequent runs < 20s.
- [ ] Each test cleans up after itself — no shared state between tests.
- [ ] Tests don't depend on each other's order.
## Risks / open questions
- **Schema duplication.** `test/fixtures/test-schema.sql` will drift from the real schema unless we have a discipline. Mitigation: comment at the top of the fixture says "this is a subset of the production schema for testing only; sync when production schema changes." Worth documenting in `OPERATIONS.md` (Phase 3) as a maintenance task.
- **Test flakiness from polling.** Same caveat as Phase 1: prefer `waitForMessage` over `await sleep(N)`. The latter is reliably wrong.
- **Image pull times in CI.** TimescaleDB image is large (~700MB). If integration tests run in CI, pre-pull. Phase 1's CI doesn't run integration; this phase doesn't change that — local + manual stage smoke is the gate.
## Done
Landed in `2f2cf5c`. Key divergence from spec: `test/fixtures/test-schema.sql` uses `entry_devices.device_id TEXT` (IMEI) instead of a UUID FK to `devices`, matching Phase 1's IMEI-as-device_id convention. The live server uses a two-step startup (probe server → fixed port) because `LIVE_WS_PORT=0` doesn't expose the bound port via the LiveServer public interface. The metricsServer dummy prevents `afterAll` from hanging on unclosed handles.
@@ -0,0 +1,97 @@
# Phase 1.5 — Live broadcast
Implement the WebSocket endpoint that fans live position updates from the Processor to subscribed [[react-spa]] clients. Layered on top of Phase 1's throughput pipeline; logically between Phase 1 (throughput) and Phase 2 (domain logic). The wire spec is locked at `docs/wiki/synthesis/processor-ws-contract.md`.
## Why a separate phase
Phase 1's outcome was Redis → Postgres only; the WebSocket fan-out side of [[processor]] was wiki-canonical (`docs/wiki/concepts/live-channel-architecture.md`) but had no implementation task. Phase 2 is gated on Directus schema decisions and is a substantial domain-logic chunk; bundling the WebSocket work into it would couple two unrelated workstreams.
This phase is small, self-contained, and unblocks the [[react-spa]]'s live-map feature for the Rally Albania 2026 dogfood. It does **not** touch domain logic or the Phase 1 throughput path.
## Outcome statement
When Phase 1.5 is done:
- The Processor exposes a WebSocket endpoint (path TBD by the reverse proxy; same origin as [[directus]] and the SPA bundle so the auth cookie flows automatically).
- Connections authenticate via the Directus-issued cookie attached to the WebSocket upgrade request. Validation is a single `/users/me` round-trip to [[directus]] at connect time; the validated user identity is bound to the connection for its lifetime.
- Clients subscribe to `event:<eventId>` topics. Per-event authorization is checked once at subscribe time (does the user belong to the event's organization?). Multiple subscriptions per connection are supported.
- On `subscribed`, the server returns a snapshot of the latest known position for every device registered to the event (via `entry_devices` → `entries` → `events`). After the snapshot, position records stream as they arrive on Redis.
- A second consumer group `live-broadcast-{instance_id}` reads the same `telemetry:teltonika` stream as the durable-write group (`processor`), but per-instance — every Processor instance reads every record for its own connected clients. The durable-write path is unaffected.
- 30s server-side ping (sketched after this list); client-side liveness check on 60s message-gap; backoff reconnect on close.
- All of this is covered by an end-to-end integration test (testcontainers Redis + Postgres + a Directus auth stub).
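The server-side ping referenced above is the standard `ws` ping/pong pattern; a minimal sketch, assuming the WebSocketServer created in `src/live/server.ts` and a configurable interval:
```ts
import { WebSocketServer, WebSocket } from 'ws';

type LiveSocket = WebSocket & { isAlive?: boolean };

// Each tick: terminate any socket that failed to answer the previous ping,
// then mark the rest dead and ping them again.
function startHeartbeat(wss: WebSocketServer, intervalMs = 30_000): () => void {
  wss.on('connection', (socket: LiveSocket) => {
    socket.isAlive = true;
    socket.on('pong', () => { socket.isAlive = true; });
  });
  const timer = setInterval(() => {
    for (const client of wss.clients as Set<LiveSocket>) {
      if (client.isAlive === false) { client.terminate(); continue; }
      client.isAlive = false;
      client.ping();
    }
  }, intervalMs);
  return () => clearInterval(timer); // call on shutdown
}
```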
## Sequencing
```
1.5.1 WS server scaffold + heartbeat
└─→ 1.5.2 Cookie auth handshake
└─→ 1.5.3 Subscription registry & authorization
├─→ 1.5.4 Broadcast consumer group & fan-out
├─→ 1.5.5 Snapshot-on-subscribe
└─→ 1.5.6 Integration test (depends on 1.5.4 + 1.5.5)
```
1.5.4 and 1.5.5 can be developed in parallel after 1.5.3 lands.
## Files modified
This phase adds these to the existing `processor/` layout:
```
processor/
├── src/
│ ├── core/
│ │ └── ... (unchanged from Phase 1)
│ ├── live/
│ │ ├── server.ts # WS server, heartbeat, lifecycle
│ │ ├── auth.ts # cookie → /users/me → user identity
│ │ ├── registry.ts # subscriptions: connection→topics, topic→connections
│ │ ├── broadcast.ts # live-broadcast consumer group + fan-out loop
│ │ ├── snapshot.ts # latest-position-per-device query
│ │ └── protocol.ts # zod schemas for the wire format (subscribe/position/etc.)
│ ├── db/
│ │ └── ... (unchanged)
│ └── main.ts # wires the live server alongside the consumer
└── test/
├── live-server.test.ts # mocked: heartbeat, lifecycle, message routing
├── live-auth.test.ts # mocked Directus client
├── live-registry.test.ts # subscribe/unsubscribe semantics
├── live-snapshot.test.ts # query shape
└── live.integration.test.ts # end-to-end with testcontainers
```
## Tech stack additions
- **`ws`** — minimal, mature WebSocket server. Plays naturally with `http.createServer` (already used by Phase 1's metrics/health server).
- **No HTTP client library.** Node 22's global `fetch` is sufficient for the `/users/me` and `/items/events/<id>` calls to Directus.
- **`zod`** (already a Phase 1 dep) — runtime validation of inbound WS messages. Strict schemas; reject unknown fields (see the example after this section).
No new test dependencies. `vitest` + `testcontainers` already cover what's needed.
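To illustrate the strict-schema rule flagged in the `zod` bullet above (field names here are placeholders; the authoritative shapes live in `processor-ws-contract.md` and `src/live/protocol.ts`):
```ts
import { z } from 'zod';

// .strict() makes parsing fail on unknown keys instead of silently stripping them.
const SubscribeMessageSchema = z
  .object({
    type: z.literal('subscribe'),
    topic: z.string().regex(/^event:[0-9a-f-]+$/),
    id: z.string().min(1), // correlation id, echoed back in the `subscribed` reply
  })
  .strict();

SubscribeMessageSchema.parse({ type: 'subscribe', topic: 'event:abc-123', id: '1' });
// Adding any extra key, e.g. { ..., extra: true }, throws a ZodError.
```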
## Non-negotiable design rules
These rules govern every task in this phase. Any deviation must be discussed and documented before code lands.
1. **Live work is isolated.** `src/live/` cannot import from `src/core/` and vice versa, with one exception: `src/db/pool.ts` is shared. The Phase 1 throughput pipeline must run unchanged whether or not the live server starts, and vice versa. Enforced by `import/no-restricted-paths` ESLint config (sketched after this list).
2. **Authorization is checked once at subscribe time.** Never per record. The hot fan-out path is `O(records × subscribed-clients-per-event)` with zero Directus calls.
3. **Subscription state is in-memory.** No durable subscription store. Reconnect re-subscribes; instance failure means a brief gap and a reconnect.
4. **Always-fresh, not always-deliver.** If a slow consumer can't drain its queue, drop oldest position messages for that connection — latest-position-per-device is what matters. Control messages (`subscribed`/`unsubscribed`/`error`) are guaranteed.
5. **Single origin.** The endpoint is reachable only at the same origin as Directus and the SPA bundle. Cross-origin won't carry the cookie. The reverse-proxy config is responsible for the routing; the Processor binds to a port and trusts the proxy to forward correctly.
6. **No business logic.** This phase ships the protocol and the fan-out plumbing. Nothing in `src/live/` should know what an `entries.race_number` is or what a `class_id` means. Phase 2 may add domain-aware filtering (e.g. "subscribe to a specific class within an event") — out of scope here.
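Rule 1's boundary can be enforced with `eslint-plugin-import`; a sketch of the rules entry, assuming the classic rules-object style (adapt to whatever ESLint config format the repo actually uses):
```ts
// Rules fragment, shown as an object literal for illustration.
const rules = {
  'import/no-restricted-paths': [
    'error',
    {
      zones: [
        // src/live must not reach into src/core ...
        { target: './src/live', from: './src/core' },
        // ... and src/core must not reach into src/live.
        { target: './src/core', from: './src/live' },
        // src/db (the shared pool.ts exception) sits outside both zones,
        // so both sides may import it freely.
      ],
    },
  ],
};
```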
## Key design references (read before starting any task)
- `docs/wiki/synthesis/processor-ws-contract.md` — the wire spec. Authoritative.
- `docs/wiki/concepts/live-channel-architecture.md` — the architectural rationale; explains why this lives in the Processor at all.
- `docs/wiki/entities/processor.md` — the entity-level summary, including the multi-instance consumer-group split.
- `docs/wiki/entities/directus.md` — the auth source; explains how the cookie is issued and what `/users/me` returns.
- `docs/wiki/entities/react-spa.md` — the consumer; `Auth pattern` and `Real-time rendering` sections describe the SPA-side handshake and the rAF coalescer that shapes our delivery cadence.
## Acceptance for the phase as a whole
- [ ] All six task files done.
- [ ] `pnpm typecheck`, `pnpm lint`, `pnpm test` clean across the new code.
- [ ] `pnpm test:integration` runs the live-pipeline end-to-end test green.
- [ ] Manual smoke: with stage Directus + stage Processor + a `wscat` client carrying a valid cookie, can connect, subscribe to the Rally Albania 2026 event, see snapshot, see streamed positions when synthetic positions are published to Redis.
- [ ] No regressions in Phase 1's throughput tests; the durable-write path is unchanged.
- [ ] `docs/wiki/synthesis/processor-ws-contract.md` Implementation status section updated to reflect "implemented in Phase 1.5".
@@ -0,0 +1,169 @@
# Task 3.5 — Retire processor migration runner
**Phase:** 3 — Production hardening
**Status:** ⬜ Not started
**Depends on:** Phase 1.5 ideally landed (avoid mid-flight code churn for the agent shipping the WS endpoint). No hard code dependency.
**Replaces:** the original 3.5 sketch ("Migration advisory lock"). Once the processor doesn't run migrations, the lock concern is delegated to Directus's `db-init` runner — outside this service's surface.
**Wiki refs:** `docs/wiki/entities/processor.md` §"Schema ownership vs. write access" (the line that needs to change), `docs/wiki/entities/directus.md` §"Schema management — snapshot/apply pipeline", `docs/wiki/entities/postgres-timescaledb.md`
## Goal
Establish [[directus]] as the **single owner of all DDL** against the shared Postgres database. Retire the processor's migration runner. After this task, the only DDL paths are:
- `trm/directus/db-init/*.sql` (pre-schema: extensions, hypertables, raw tables Directus's snapshot-yaml format can't express).
- `trm/directus/snapshots/schema.yaml` (Directus-managed user collections).
- `trm/directus/db-init-post/*.sql` (post-schema: composite UNIQUE constraints on Directus-managed tables).
Processor exclusively does `INSERT` / `SELECT` / `UPDATE`. No `CREATE`, `ALTER`, `CREATE EXTENSION`, or any other DDL.
## Context — why this exists
The current state has **both** services creating the `positions` hypertable and the `faulty` column:
- `trm/processor/src/db/migrations/0001_positions.sql` and `0002_positions_faulty.sql` (processor's runner, from Phase 1 task 1.4 + the recent 1.5.5 prep).
- `trm/directus/db-init/001_extensions.sql`, `002_positions_hypertable.sql`, `003_faulty_column.sql` (directus's runner, added later when the destructive-apply incident showed positions had to exist *before* `directus schema apply` runs or it would get wiped).
Both runners are idempotent (`IF NOT EXISTS`, etc.) so the runtime collision is benign at the moment, but the architectural risks are real:
- **Two sources of truth.** Adding a column means editing two files in two repos; either one can drift silently.
- **Schema divergence.** A processor migration that adds a column the directus side doesn't know about is silently invisible to the admin UI.
- **Two `migrations_applied` tables**, which already caused the ghost-collection apply conflict earlier in directus's Phase 1.
- **Operator confusion.** The wiki says "Directus owns the schema" but processor runs migrations — newcomers can't tell which is canonical.
The fix is the wiki's stated intent: directus owns DDL. Processor was the historical owner before directus's `db-init` story matured; the legacy runner survived the transition because nobody retired it.
## Pre-flight (before deleting anything)
### 1. Confirm directus's `db-init/` covers the full processor schema surface
Check that `trm/directus/db-init/`'s SQL is byte-equivalent (or semantically equivalent) to processor's migrations. As of writing, directus has:
- `001_extensions.sql` — `CREATE EXTENSION IF NOT EXISTS timescaledb` + `postgis`.
- `002_positions_hypertable.sql` — `CREATE TABLE positions (...)` + `create_hypertable(...)` + indexes.
- `003_faulty_column.sql` — `ALTER TABLE positions ADD COLUMN IF NOT EXISTS faulty ...` + `positions_device_ts_idx`.
Processor has:
- `src/db/migrations/0001_positions.sql` — extensions + table + hypertable + `positions_device_ts` (UNIQUE on `(device_id, ts)`) + `positions_ts` (DESC).
- `src/db/migrations/0002_positions_faulty.sql` — `faulty` column + `positions_device_ts_idx` (`(device_id, ts DESC)`).
**Diff the two before retiring.** If processor's SQL has an index, column, or constraint directus's `db-init/` doesn't, the deliverable starts with porting that diff into directus's `db-init/` (and snapshotting if applicable). Specific things to verify:
- All three indexes exist in directus's db-init: `positions_device_ts` (UNIQUE), `positions_ts`, `positions_device_ts_idx`.
- Column types match exactly: `device_id text`, `ts timestamptz`, `ingested_at timestamptz DEFAULT now()`, etc.
- `chunk_time_interval` is `INTERVAL '1 day'` on both sides.
- The `ON CONFLICT (device_id, ts) DO NOTHING` upsert path requires the UNIQUE on `(device_id, ts)` — that's the `positions_device_ts` index, not `positions_device_ts_idx`. Both must exist.
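One way to script the index part of this check against a stage database (a sketch using `pg_indexes`, Postgres's standard catalog view; the expected names are the ones listed above):
```ts
import type { Pool } from 'pg';

const EXPECTED_INDEXES = ['positions_device_ts', 'positions_ts', 'positions_device_ts_idx'];

async function verifyPositionsIndexes(pool: Pool): Promise<void> {
  const { rows } = await pool.query<{ indexname: string }>(
    `SELECT indexname FROM pg_indexes WHERE tablename = 'positions'`,
  );
  const present = new Set(rows.map((r) => r.indexname));
  const missing = EXPECTED_INDEXES.filter((name) => !present.has(name));
  if (missing.length > 0) {
    throw new Error(`db-init is missing indexes on positions: ${missing.join(', ')}`);
  }
}
```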
### 2. Verify the directus `db-init` apply order is fixed
Per `docs/wiki/entities/directus.md`'s 5-step boot pipeline:
```
1. db-init pre-schema → positions hypertable, faulty column, timescaledb extension
2. directus bootstrap → Directus system tables + first admin
3. directus schema apply → Directus-managed user collections
4. db-init post-schema → composite UNIQUE constraints on user collections
5. pm2-runtime start → server up at :8055
```
So when processor boots against a stage stack:
- `directus` container has run steps 1–4 (`positions` exists; so does everything else).
- `processor` container can connect and `INSERT` immediately.
**Compose ordering.** `trm/deploy/compose.yaml`'s `processor` service should `depends_on: directus` with `condition: service_healthy` so processor doesn't try to read `positions` before directus's db-init has run on first deploy. Verify in this task.
## Deliverables
### `trm/processor/`
1. **Delete** `src/db/migrations/0001_positions.sql`.
2. **Delete** `src/db/migrations/0002_positions_faulty.sql`.
3. **Delete** the migrations directory if it's now empty.
4. **Delete** `src/db/migrate.ts` (or whatever the migration-runner module is named — the file that owns the `migrations_applied` table, the file walker, the `pg_advisory_lock` if any).
5. **Update `src/main.ts`** to remove the `await migrate(...)` step from boot. Postgres pool creation stays; migration call goes.
6. **Update tests** that exercise the migration runner — most likely deletes the corresponding test file. Integration tests that previously seeded the schema via `migrate()` either:
- (a) Use directus's `db-init/*.sql` files directly (read them in `beforeAll`, execute against the testcontainer Postgres), or
- (b) Carry a fixture SQL file in `test/fixtures/` (the same approach Phase 1.5 task 1.5.6 already takes for its integration test).
Pick (b) — it's already the established pattern.
7. **Update `Dockerfile`** to drop any migration-running step from the entrypoint (Phase 1's Dockerfile may not have this, but verify; the runtime container shouldn't carry the migrations directory if the runner is gone).
8. **Update `package.json` `dependencies`** — if `pg-migrate` or any migration-runner library was a Phase-1-only dep, remove it.
9. **Update `phase-1-throughput/04-postgres-schema.md`'s Done section** with a note: "Migration runner retired in Phase 3 task 3.5 — see that task for context."
10. **Update `ROADMAP.md`** to reflect the retired runner under Phase 1's "what changed since landing" note.
### `trm/docs/wiki/`
1. **Update `wiki/entities/processor.md`** — drop the "Schema ownership vs. write access" caveat that says "the positions hypertable is owned by [[processor]]'s migration runner." Replace with a single sentence: "Processor never runs DDL. Schema is exclusively owned by Directus (snapshot.yaml + `db-init/` for things the snapshot can't express)."
2. **Update `wiki/entities/directus.md`** — confirm the Schema-management section already lists `db-init/` as covering everything (no edits unless current text implies a split).
3. **Update `wiki/entities/postgres-timescaledb.md`** — verify the writer-side documentation; remove any "split schema authority" framing.
4. **Append `docs/log.md`** with a `note` entry recording the retirement.
### `trm/deploy/`
1. **Verify** `compose.yaml`'s `processor` service has `depends_on: directus: { condition: service_healthy }`. Add if missing.
2. **Confirm** the deploy README doesn't mention the processor's migration runner anywhere.
## Specification
### What stays in `src/db/`
- `pool.ts` — Postgres `Pool` factory. Untouched.
- Connection helpers, query helpers (if any). Untouched.
### What goes
- `migrations/*.sql` — gone.
- `migrate.ts` (the runner). Gone.
- `migrations_applied` table — directus's runner has its own; the processor's becomes orphaned but harmless. **Don't drop it from existing databases.** The retirement is a *runtime* change; the table is just unused. Phase 3 hardening's eventual `OPERATIONS.md` (task 3.7) can document a one-off `DROP TABLE migrations_applied` step for operators who want a clean schema.
### Boot order on first deploy
```
1. postgres container starts → DB available.
2. directus container starts → runs 5-step boot pipeline.
├─ Step 1 (db-init pre-schema) creates positions hypertable + faulty column + extensions.
├─ Steps 2-4 set up Directus's own world.
└─ Step 5 marks the container healthy.
3. processor container starts (depends_on: directus: service_healthy) → connects, finds positions, starts consuming.
```
If processor races directus on a fresh stack (no `depends_on`), it'll fail to find the `positions` table and crash-loop until directus catches up. `depends_on: service_healthy` makes the order deterministic.
### Dev workflow
`compose.dev.yaml` in `trm/processor` (if it exists for processor-side dev) should `depends_on: directus` if running both. For pure-processor dev (no directus), the developer either:
- Runs directus's `db-init/*.sql` manually against their local Postgres before booting processor.
- Or copies the equivalent SQL into a one-off bootstrap script in `processor/test/fixtures/`.
Document the chosen path in `processor/README.md`.
### What this task does NOT do
- Does not retire directus's snapshot-managed collections.
- Does not change Phase 1 or Phase 1.5 code paths beyond removing the migration runner step.
- Does not introduce a new migration tool. The fix is *fewer* moving parts, not different ones.
## Acceptance criteria
- [ ] `pnpm typecheck`, `pnpm lint`, `pnpm test` clean.
- [ ] `pnpm test:integration` runs green — the integration test no longer relies on `migrate()`; it loads schema from a fixture SQL file instead.
- [ ] `src/db/migrations/` directory is gone (or empty + gitignored).
- [ ] No `migrate()` call anywhere in the source tree.
- [ ] No `migrations_applied` references in processor source.
- [ ] Stage smoke against a fresh DB: redeploy the stack, watch directus boot through its 5 steps, watch processor connect and start writing positions. No errors.
- [ ] `docs/wiki/entities/processor.md` and `directus.md` agree: directus is the sole DDL owner.
- [ ] `docs/log.md` has a `note` entry recording the retirement.
- [ ] `trm/deploy/compose.yaml`'s `processor` service has `depends_on: directus: service_healthy`.
## Risks / open questions
- **Existing prod databases.** If anyone has deployed the processor's migrations on a real DB, the `migrations_applied` table is harmless but stale. Document a one-off cleanup query for operators (in `OPERATIONS.md` when 3.7 lands).
- **Schema drift between processor's old migrations and directus's db-init.** If the diff in pre-flight step 1 surfaces anything, that diff must land in directus's `db-init/` *before* the processor's runner is retired. Order of operations matters: never delete the processor migration before the equivalent SQL is verified live in directus's runner.
- **Test container schema setup.** The integration test fixture has to mirror what directus actually creates. If directus's `db-init/` changes in a way that breaks processor's read paths, the fixture and the read paths both need updating. Mitigation: the fixture file lives in `test/fixtures/` and a comment at its top says "syncs with `trm/directus/db-init/` — update both when schema changes."
- **The original 3.5 ("Migration advisory lock") concern.** Once processor doesn't run migrations, the advisory-lock concern is delegated to directus's runner. That's a directus concern; whether to add an advisory lock to directus's `apply-db-init.sh` is tracked as a follow-up in directus's own roadmap, not here.
- **PostGIS usage in Phase 2.** Processor's `0001_positions.sql` enables PostGIS even though Phase 1 doesn't use it. Directus's `db-init/001_extensions.sql` does the same. Confirm in pre-flight; no change needed if the directus side already has it.
## Done
(Filled in when the task lands.)
+1 -1
@@ -25,7 +25,7 @@ When Phase 3 is done:
| 3.2 | Per-device state rehydration on first-packet | Single `SELECT ... LIMIT 1` per cold device. Memoized by LRU |
| 3.3 | `XAUTOCLAIM` runner | Periodic + on-startup. Claims entries pending > `CLAIM_THRESHOLD_MS`. Re-runs the sink |
| 3.4 | Dead-letter stream | After N failed decodes/writes, record goes to `telemetry:teltonika:dlq`; original ACKed off the main stream |
| 3.5 | Migration advisory lock | `pg_advisory_lock(<hash>)` around the migrate runner; two instances can start simultaneously |
| 3.5 | [Retire processor migration runner](./05-retire-migration-runner.md) | Delete `src/db/migrations/*` and the runner; Directus becomes the sole DDL owner via its `db-init/`. Closes the two-sources-of-truth hazard for `positions`. Replaces the original "migration advisory lock" sketch — once processor doesn't run migrations, the lock concern delegates to Directus. |
| 3.6 | Uncaught exception / unhandled rejection handlers | Log, flush, exit. Match `tcp-ingestion`'s eventual Phase 1 task 1.12 work when that lands |
| 3.7 | OPERATIONS.md | The runbook |
| 3.8 | Multi-instance load test | A test (manual or in CI) that proves two instances split the work; document expected lag behaviour during failover |
+1 -1
@@ -14,7 +14,7 @@ Ideas on radar that may or may not become real tasks. Captured here so they don'
- **Alternate consumer for analytics export.** A second consumer group reading the same stream, writing to a parallel destination (Parquet on object storage, ClickHouse, etc.) for offline analytics. The Phase 1 architecture already supports this — it's a separate process joining the same stream with a different group name. No Processor changes needed; just operational scaffolding.
- **WebSocket gateway for live updates.** If Directus's WebSocket subscriptions hit a fan-out ceiling for spectator-facing live leaderboards, a dedicated gateway reads from Redis and pushes to clients, bypassing Directus for the live channel only. REST/GraphQL stays in Directus. Mentioned in `wiki/entities/directus.md`.
- **Lifting the live-broadcast WebSocket out of the Processor into a standalone gateway service.** Phase 1.5 implements the WS endpoint inside the Processor process per [[live-channel-architecture]]. If sustained throughput exceeds the threshold documented there (~10k WS messages/sec, or connection-time auth becomes a thundering herd at race start with thousands of viewers), the wiki's documented escape hatch is to extract the WS code into a standalone service that subscribes to the same `live-broadcast-*` consumer group. The Redis-stream-in / WebSocket-out contract doesn't change; only the host process does. Promote this to a numbered phase only when measurements justify it.
- **Per-instance sharding hint.** If consumer-group load distribution turns out to be uneven (one instance handles all the chatty devices), introduce hashing-by-device-id with explicit assignment. Probably overkill — Redis Streams' default round-robin works for most workloads.
+7 -221
@@ -1,225 +1,11 @@
/**
* Sentinel decoder for Position records arriving from the Redis Stream.
* Re-exports from src/shared/codec.ts.
*
* tcp-ingestion serializes Position objects with a custom JSON replacer that
* encodes types not natively supported by JSON:
* - bigint → { __bigint: "<decimal-digits>" }
* - Buffer → { __buffer_b64: "<base64>" }
* - Date → ISO8601 string
* The decode logic lives in src/shared/codec.ts so that both src/core/ (durable
* write consumer) and src/live/ (broadcast consumer) can import it without
* crossing the enforced src/core/ ↔ src/live/ boundary.
*
* This module reverses that encoding so the Processor receives fully-typed
* Position objects. The contract is documented in:
* docs/wiki/concepts/position-record.md
* tcp-ingestion/src/core/publish.ts (jsonReplacer)
* All existing Phase 1 import paths (`import { decodePosition } from './codec.js'`)
* continue to work unchanged.
*/
import type { Position, AttributeValue } from './types.js';
// ---------------------------------------------------------------------------
// Error type
// ---------------------------------------------------------------------------
export class CodecError extends Error {
override readonly name = 'CodecError';
constructor(message: string, options?: ErrorOptions) {
super(message, options);
}
}
// ---------------------------------------------------------------------------
// Sentinel detection helpers
// ---------------------------------------------------------------------------
/**
* Returns true when the value is exactly `{ __bigint: "<string>" }`.
* The shape must have exactly one key — any extra keys indicate a user-defined
* object that coincidentally has a `__bigint` field, which is not a sentinel.
* In practice tcp-ingestion only emits single-key sentinels; validate strictly.
*/
function isBigintSentinel(value: unknown): value is { __bigint: string } {
if (typeof value !== 'object' || value === null) return false;
const keys = Object.keys(value);
return (
keys.length === 1 &&
keys[0] === '__bigint' &&
typeof (value as Record<string, unknown>)['__bigint'] === 'string'
);
}
/**
* Returns true when the value is exactly `{ __buffer_b64: "<string>" }`.
*/
function isBufferSentinel(value: unknown): value is { __buffer_b64: string } {
if (typeof value !== 'object' || value === null) return false;
const keys = Object.keys(value);
return (
keys.length === 1 &&
keys[0] === '__buffer_b64' &&
typeof (value as Record<string, unknown>)['__buffer_b64'] === 'string'
);
}
// ---------------------------------------------------------------------------
// Reviver
// ---------------------------------------------------------------------------
/**
* JSON.parse reviver that reconstructs the live types from sentinel encodings.
*
* Called by JSON.parse for every key-value pair in the document, bottom-up.
* By the time `attributes` is visited, each attribute value has already been
* converted (sentinels → bigint/Buffer), because JSON.parse visits leaves first.
*
* Reviver must return `unknown` because the result type depends on the key.
* The caller casts the final result to `PositionJson` after validation.
*/
function reviver(key: string, value: unknown): unknown {
// Timestamp field: ISO string → Date
if (key === 'timestamp' && typeof value === 'string') {
const date = new Date(value);
if (isNaN(date.getTime())) {
throw new CodecError(`Invalid timestamp value: "${value}"`);
}
return date;
}
// bigint sentinel
if (isBigintSentinel(value)) {
const digits = value.__bigint;
// Validate: only decimal digits (including optional leading minus for
// negative bigints, though Teltonika IO elements are unsigned).
if (!/^-?\d+$/.test(digits)) {
throw new CodecError(
`Malformed __bigint sentinel: expected decimal digits, got "${digits}"`,
);
}
return BigInt(digits);
}
// Buffer sentinel
if (isBufferSentinel(value)) {
const b64 = value.__buffer_b64;
// Validate base64 characters (standard + URL-safe alphabets, with padding)
if (!/^[A-Za-z0-9+/\-_]*={0,2}$/.test(b64)) {
throw new CodecError(
`Malformed __buffer_b64 sentinel: invalid base64 string "${b64}"`,
);
}
return Buffer.from(b64, 'base64');
}
return value;
}
// ---------------------------------------------------------------------------
// Required field validation
// ---------------------------------------------------------------------------
const REQUIRED_NUMERIC_FIELDS = [
'latitude',
'longitude',
'altitude',
'angle',
'speed',
'satellites',
'priority',
] as const;
/**
* Validates the decoded object has all required Position fields with the
* correct types. Throws `CodecError` naming the first failing field.
*/
function validateDecodedPosition(obj: Record<string, unknown>): asserts obj is {
device_id: string;
timestamp: Date;
latitude: number;
longitude: number;
altitude: number;
angle: number;
speed: number;
satellites: number;
priority: number;
attributes: Record<string, AttributeValue>;
} {
if (typeof obj['device_id'] !== 'string' || obj['device_id'].length === 0) {
throw new CodecError('Missing or invalid field: device_id (expected non-empty string)');
}
if (!(obj['timestamp'] instanceof Date)) {
throw new CodecError(
'Missing or invalid field: timestamp (expected Date after reviver; was ISO string decoded?)',
);
}
for (const field of REQUIRED_NUMERIC_FIELDS) {
if (typeof obj[field] !== 'number') {
throw new CodecError(
`Missing or invalid field: ${field} (expected number, got ${typeof obj[field]})`,
);
}
}
if (typeof obj['attributes'] !== 'object' || obj['attributes'] === null) {
throw new CodecError('Missing or invalid field: attributes (expected object)');
}
// Validate priority is exactly 0, 1, or 2
const priority = obj['priority'] as number;
if (priority !== 0 && priority !== 1 && priority !== 2) {
throw new CodecError(
`Invalid field: priority (expected 0 | 1 | 2, got ${priority})`,
);
}
// Validate attributes values are only AttributeValue types
const attrs = obj['attributes'] as Record<string, unknown>;
for (const [attrKey, attrVal] of Object.entries(attrs)) {
if (
typeof attrVal !== 'number' &&
typeof attrVal !== 'bigint' &&
!Buffer.isBuffer(attrVal)
) {
throw new CodecError(
`Invalid attribute "${attrKey}": expected number | bigint | Buffer, got ${typeof attrVal}`,
);
}
}
}
// ---------------------------------------------------------------------------
// Public API
// ---------------------------------------------------------------------------
/**
* Decodes a JSON-encoded Position string (with sentinel encoding applied by
* tcp-ingestion's `serializePosition`) into a fully-typed `Position` object.
*
* Throws `CodecError` if the JSON is malformed, a sentinel is invalid, a
* required field is missing, or a field has the wrong type.
*/
export function decodePosition(payload: string): Position {
let parsed: unknown;
try {
parsed = JSON.parse(payload, reviver);
} catch (err) {
if (err instanceof CodecError) {
throw err;
}
throw new CodecError(
`Failed to parse Position payload as JSON: ${err instanceof Error ? err.message : String(err)}`,
{ cause: err },
);
}
if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) {
throw new CodecError('Position payload must be a JSON object');
}
const obj = parsed as Record<string, unknown>;
validateDecodedPosition(obj);
return obj as unknown as Position;
}
export { decodePosition, CodecError } from '../shared/codec.js';
+6 -32
@@ -7,40 +7,14 @@
*/
// ---------------------------------------------------------------------------
// Shared value types
// Shared value types — re-exported from src/shared/types.ts
// ---------------------------------------------------------------------------
/**
* A single IO attribute value from the Teltonika AVL record.
* - number : fixed-width IO elements (N1/N2/N4 — fit safely in JS number)
* - bigint : N8 elements (u64, may exceed Number.MAX_SAFE_INTEGER)
* - Buffer : NX variable-length elements (Codec 8 Extended)
*/
export type AttributeValue = number | bigint | Buffer;
// ---------------------------------------------------------------------------
// Position — input contract from tcp-ingestion
// ---------------------------------------------------------------------------
/**
* Normalized GPS position record. Byte-equivalent to tcp-ingestion's `Position`
* type (docs/wiki/concepts/position-record.md).
*
* `priority` is typed as a union rather than `number` to stay consistent with
* tcp-ingestion and make exhaustive switches possible in domain logic.
*/
export type Position = {
readonly device_id: string;
readonly timestamp: Date;
readonly latitude: number;
readonly longitude: number;
readonly altitude: number;
readonly angle: number; // heading 0–360°
readonly speed: number; // km/h; 0 may mean "GPS invalid" — preserve verbatim
readonly satellites: number;
readonly priority: 0 | 1 | 2; // 0=Low, 1=High, 2=Panic
readonly attributes: Readonly<Record<string, AttributeValue>>;
};
// Position and AttributeValue live in src/shared/types.ts so that src/live/
// can import them without crossing the src/core/ ↔ src/live/ boundary.
// Re-exported here to preserve all existing Phase 1 import paths.
export type { AttributeValue, Position } from '../shared/types.js';
import type { Position } from '../shared/types.js';
// ---------------------------------------------------------------------------
// StreamRecord — raw shape returned by XREADGROUP before codec decoding
@@ -0,0 +1,19 @@
-- Migration: 0002_positions_faulty
-- Adds the faulty column to positions and ensures the (device_id, ts DESC) index
-- needed by the snapshot-on-subscribe query exists.
--
-- The faulty column is set post-hoc by operators in Directus when a position is
-- flagged as unrealistic. The snapshot-on-subscribe query (task 1.5.5) filters
-- WHERE faulty = false to exclude flagged positions from the initial map state.
-- The live broadcast path (Redis stream → fan-out) never touches this column
-- because faulty flags are applied after the fact.
ALTER TABLE positions
ADD COLUMN IF NOT EXISTS faulty boolean NOT NULL DEFAULT false;
-- Index for the snapshot DISTINCT ON query:
-- SELECT DISTINCT ON (device_id) ... ORDER BY device_id, ts DESC
-- TimescaleDB scans only the latest chunks for devices with recent activity,
-- but the (device_id, ts DESC) index makes per-device latest-position lookups
-- efficient regardless of chunk age.
CREATE INDEX IF NOT EXISTS positions_device_ts_idx ON positions (device_id, ts DESC);
+152
@@ -0,0 +1,152 @@
/**
* Cookie-based authentication for WebSocket connections.
*
* Validates the Directus-issued cookie attached to the upgrade request by
* making a single GET /users/me round-trip to Directus. On success, returns
* the user identity that is bound to the connection for its lifetime.
*
* Design notes:
* - No JWT validation locally — the round-trip is simpler, correct, and fast
* enough at pilot scale (≤500 viewers).
* - No retries — a failed validation immediately closes the upgrade. The SPA
* reconnects, giving a natural retry. Server-side retry masks credential bugs.
* - The entire cookie header is forwarded verbatim to Directus — Directus owns
* cookie parsing and session lookup.
*
* Spec: docs/wiki/synthesis/processor-ws-contract.md §Auth handshake
*/
import { z } from 'zod';
import type { Config } from '../config/load.js';
import type { Metrics } from '../shared/types.js';
import type { Logger } from 'pino';
// ---------------------------------------------------------------------------
// Types
// ---------------------------------------------------------------------------
/**
* The minimum user fields needed for per-subscription authorization (1.5.3)
* and Phase 4 permission enforcement.
*/
const AuthenticatedUserSchema = z.object({
id: z.string().uuid(),
email: z.string().email().nullable().optional(),
role: z.string().uuid().nullable().optional(),
first_name: z.string().nullable().optional(),
last_name: z.string().nullable().optional(),
});
export type AuthenticatedUser = z.infer<typeof AuthenticatedUserSchema>;
/**
* Public interface returned by `createAuthClient`.
*/
export type AuthClient = {
/**
* Validates a raw `Cookie:` header value against Directus's `/users/me`.
*
* Returns the user identity on success, or `null` on any failure
* (network error, 401, malformed response, timeout). Never throws.
*/
readonly validate: (cookieHeader: string) => Promise<AuthenticatedUser | null>;
};
// ---------------------------------------------------------------------------
// Factory
// ---------------------------------------------------------------------------
export function createAuthClient(
config: Config,
logger: Logger,
metrics: Metrics,
): AuthClient {
async function validate(cookieHeader: string): Promise<AuthenticatedUser | null> {
if (!cookieHeader) return null;
const controller = new AbortController();
const timer = setTimeout(
() => controller.abort(),
config.DIRECTUS_AUTH_TIMEOUT_MS,
);
const start = performance.now();
try {
const res = await fetch(
`${config.DIRECTUS_BASE_URL}/users/me?fields=id,email,role,first_name,last_name`,
{
method: 'GET',
headers: { cookie: cookieHeader },
signal: controller.signal,
},
);
if (res.status === 401 || res.status === 403) {
metrics.inc('processor_live_auth_attempts_total', { result: 'unauthorized' });
return null;
}
if (!res.ok) {
logger.warn({ status: res.status }, 'directus /users/me returned non-2xx');
metrics.inc('processor_live_auth_attempts_total', { result: 'error' });
return null;
}
// Directus returns { data: {...} } for /users/me.
const body = await res.json() as Record<string, unknown>;
// Check whether the `data` key is present at all. If it is missing
// entirely, that is an unexpected Directus response shape.
if (!('data' in body)) {
logger.warn('directus /users/me response missing data field');
metrics.inc('processor_live_auth_attempts_total', { result: 'error' });
return null;
}
const data = body['data'];
if (data === null || data === undefined) {
// Directus returns data: null when the session is expired but the
// cookie is structurally valid. Treat as unauthorized.
logger.warn('directus /users/me returned null data (expired session)');
metrics.inc('processor_live_auth_attempts_total', { result: 'unauthorized' });
return null;
}
if (typeof data !== 'object') {
logger.warn({ data }, 'directus /users/me data field is not an object');
metrics.inc('processor_live_auth_attempts_total', { result: 'error' });
return null;
}
const parsed = AuthenticatedUserSchema.safeParse(data);
if (!parsed.success) {
logger.warn(
{ issues: parsed.error.issues },
'directus /users/me returned unexpected shape',
);
metrics.inc('processor_live_auth_attempts_total', { result: 'error' });
return null;
}
metrics.inc('processor_live_auth_attempts_total', { result: 'success' });
return parsed.data;
} catch (err) {
if (err instanceof Error && err.name === 'AbortError') {
logger.warn(
{ timeoutMs: config.DIRECTUS_AUTH_TIMEOUT_MS },
'directus auth call timed out',
);
} else {
logger.warn({ err }, 'directus auth call failed');
}
metrics.inc('processor_live_auth_attempts_total', { result: 'error' });
return null;
} finally {
clearTimeout(timer);
metrics.observe('processor_live_auth_latency_ms', performance.now() - start);
}
}
return { validate };
}
+90
@@ -0,0 +1,90 @@
/**
* Per-event authorization client.
*
* Checks whether a user has access to a specific event by delegating to
* Directus's REST API with the user's cookie. Directus enforces row-level
* security; if Directus returns 200 the user has access. If 403, they don't.
*
* Authorization is checked ONCE at subscribe time. The hot fan-out path has
* zero Directus calls — it operates entirely on in-memory subscription state.
*
* Spec: docs/wiki/synthesis/processor-ws-contract.md §Subscription model
*/
import type { Config } from '../config/load.js';
import type { Metrics } from '../shared/types.js';
import type { Logger } from 'pino';
// ---------------------------------------------------------------------------
// Types
// ---------------------------------------------------------------------------
/**
* Result of an authorization check.
* `allowed: true` → user may subscribe to the topic.
* `allowed: false` → user is rejected; `reason` tells the client why.
*/
export type AuthzResult =
| { readonly allowed: true }
| { readonly allowed: false; readonly reason: 'forbidden' | 'not-found' | 'error' };
export type AuthzClient = {
/**
* Checks whether the user identified by `cookieHeader` can access
* the event with `eventId`.
*
* Delegates to `GET /items/events/<eventId>?fields=id` with the user's
* cookie. Directus's row-level security does the org-membership check.
*
* Never throws. Returns `{ allowed: false, reason: 'error' }` on any
* transient failure.
*/
readonly canAccessEvent: (
cookieHeader: string,
eventId: string,
) => Promise<AuthzResult>;
};
// ---------------------------------------------------------------------------
// Factory
// ---------------------------------------------------------------------------
export function createAuthzClient(
config: Config,
logger: Logger,
metrics: Metrics,
): AuthzClient {
async function canAccessEvent(
cookieHeader: string,
eventId: string,
): Promise<AuthzResult> {
const start = performance.now();
try {
const res = await fetch(
`${config.DIRECTUS_BASE_URL}/items/events/${eventId}?fields=id`,
{
method: 'GET',
headers: { cookie: cookieHeader },
signal: AbortSignal.timeout(config.DIRECTUS_AUTHZ_TIMEOUT_MS),
},
);
if (res.status === 200) return { allowed: true };
if (res.status === 403) return { allowed: false, reason: 'forbidden' };
if (res.status === 404) return { allowed: false, reason: 'not-found' };
logger.warn(
{ status: res.status, eventId },
'directus /items/events returned unexpected status',
);
return { allowed: false, reason: 'error' };
} catch (err) {
logger.warn({ err, eventId }, 'directus authz call failed');
return { allowed: false, reason: 'error' };
} finally {
metrics.observe('processor_live_authz_latency_ms', performance.now() - start);
}
}
return { canAccessEvent };
}
+297
@@ -0,0 +1,297 @@
/**
* Broadcast consumer group — per-instance Redis Stream reader for live fan-out.
*
* Reads the same `telemetry:teltonika` stream as the durable-write consumer
* (task 1.5) but on a SEPARATE per-instance consumer group:
* `live-broadcast-{instance_id}`
*
* This means every Processor instance sees every record for its own connected
* clients. The durable-write group splits records across instances for exactly-
* once Postgres writes; the broadcast group replicates records to every
* instance for fan-out. The two groups operate independently with separate
* offsets; a slow Postgres write does not slow down broadcast.
*
* ACK semantics: ACK immediately on consume (no durability required for
* broadcast — missing a position is fine; only the latest position matters).
*
* Back-pressure: sendOutbound closes slow connections at BACKPRESSURE_THRESHOLD
* (already handled in server.ts).
*
* Spec: processor-ws-contract.md §Multi-instance behaviour;
* task 1.5.4 §Broadcast consumer group
*/
import type { Redis } from 'ioredis';
import type { Logger } from 'pino';
import type { Config } from '../config/load.js';
import type { Metrics, Position } from '../shared/types.js';
import type { SubscriptionRegistry } from './registry.js';
import type { DeviceEventMap } from './device-event-map.js';
import type { PositionMessage } from './protocol.js';
import { sendOutbound } from './server.js';
import type { LiveConnection } from './server.js';
import { decodePosition, CodecError } from '../shared/codec.js';
// ---------------------------------------------------------------------------
// Public interface
// ---------------------------------------------------------------------------
export type BroadcastConsumer = {
readonly start: () => Promise<void>;
readonly stop: () => Promise<void>;
};
// ---------------------------------------------------------------------------
// Wire-format mapper
// ---------------------------------------------------------------------------
/**
* Maps a decoded Position to a PositionMessage (minus `topic`).
* Omits fields rather than sending `null` for absent / zero values.
*
* Field mapping:
* - Position.device_id → deviceId (IMEI string; not a UUID in Phase 1)
* - Position.latitude → lat
* - Position.longitude → lon
* - Position.timestamp → ts (epoch ms)
* - Position.speed → speed (omitted if 0 — may indicate invalid GPS fix)
* - Position.angle → course (omitted if 0)
* - No accuracy field in Phase 1's Position type.
*
* Note: The WS contract spec says deviceId should be `devices.id` (UUID), but
* Phase 1's positions table stores the raw IMEI as device_id. The SPA will
* need to join on the IMEI until Phase 2 introduces UUID-based device tracking.
* This is documented as an open deviation.
*/
function toPositionMessage(
position: Position,
): Omit<PositionMessage, 'topic'> {
const msg: Omit<PositionMessage, 'topic'> = {
type: 'position',
deviceId: position.device_id,
lat: position.latitude,
lon: position.longitude,
ts: position.timestamp.getTime(),
};
// Omit speed when 0 — per Teltonika convention, 0 may indicate invalid GPS.
if (position.speed > 0) {
(msg as Record<string, unknown>)['speed'] = position.speed;
}
// Omit angle/course when 0.
if (position.angle > 0) {
(msg as Record<string, unknown>)['course'] = position.angle;
}
return msg;
}
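// Example (hedged; the IMEI and coordinates below are made up): a stationary fix
// with speed 0 and angle 0 maps to a message with both fields omitted, so the SPA
// never has to distinguish "0 because stationary" from "0 because the fix is invalid".
//
//   toPositionMessage({
//     device_id: '356307042441013', latitude: 59.33, longitude: 18.06,
//     timestamp: new Date('2026-05-01T12:00:00Z'), speed: 0, angle: 0,
//     altitude: 12, satellites: 9, priority: 0, attributes: {},
//   })
//   // => { type: 'position', deviceId: '356307042441013', lat: 59.33, lon: 18.06, ts: 1777636800000 }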
// ---------------------------------------------------------------------------
// Raw stream entry shape (ioredis XREADGROUP return type)
// ---------------------------------------------------------------------------
type RawStreamEntry = [id: string, fields: string[]];
// ---------------------------------------------------------------------------
// Factory
// ---------------------------------------------------------------------------
export function createBroadcastConsumer(
redis: Redis,
registry: SubscriptionRegistry,
deviceToEvent: DeviceEventMap,
config: Config,
logger: Logger,
metrics: Metrics,
): BroadcastConsumer {
const stream = config.REDIS_TELEMETRY_STREAM;
const groupName = `${config.LIVE_BROADCAST_GROUP_PREFIX}-${config.INSTANCE_ID}`;
const consumerName = config.INSTANCE_ID;
const batchSize = config.LIVE_BROADCAST_BATCH_SIZE;
const batchBlockMs = config.LIVE_BROADCAST_BATCH_BLOCK_MS;
let stopping = false;
let loopPromise: Promise<void> = Promise.resolve();
// -------------------------------------------------------------------------
// Consumer group setup
// -------------------------------------------------------------------------
async function ensureGroup(): Promise<void> {
try {
await redis.xgroup('CREATE', stream, groupName, '$', 'MKSTREAM');
logger.info({ stream, group: groupName }, 'broadcast consumer group created');
} catch (err: unknown) {
if (err instanceof Error && err.message.startsWith('BUSYGROUP')) {
logger.info({ stream, group: groupName }, 'broadcast consumer group already exists');
return;
}
throw err;
}
}
// -------------------------------------------------------------------------
// Fan-out
// -------------------------------------------------------------------------
function fanOut(
entryId: string,
position: Position,
): void {
const eventIds = deviceToEvent.lookup(position.device_id);
if (eventIds.length === 0) {
metrics.inc('processor_live_broadcast_orphan_records_total', {
instance_id: config.INSTANCE_ID,
});
return;
}
const baseMsg = toPositionMessage(position);
for (const eventId of eventIds) {
const topic = `event:${eventId}`;
const conns = registry.connectionsForTopic(topic);
for (const conn of conns as Iterable<LiveConnection>) {
sendOutbound(
conn,
{ ...baseMsg, topic },
metrics,
config.LIVE_WS_BACKPRESSURE_THRESHOLD_BYTES,
);
metrics.inc('processor_live_broadcast_fanout_messages_total', {
instance_id: config.INSTANCE_ID,
});
}
}
// Broadcast lag: time from GPS fix to fan-out send.
const lagMs = Date.now() - position.timestamp.getTime();
if (lagMs >= 0) {
metrics.observe('processor_live_broadcast_lag_ms', lagMs);
}
logger.debug({ entryId, device: position.device_id, events: eventIds.length }, 'fanned out');
}
// -------------------------------------------------------------------------
// Batch decoder (mirrors core/consumer.ts decodeBatch pattern)
// -------------------------------------------------------------------------
function decodeBatch(entries: RawStreamEntry[]): Array<{
id: string;
position: Position;
}> {
const decoded: Array<{ id: string; position: Position }> = [];
for (const [id, fields] of entries) {
const fieldMap: Record<string, string> = {};
for (let i = 0; i + 1 < fields.length; i += 2) {
const key = fields[i];
const val = fields[i + 1];
if (key !== undefined && val !== undefined) {
fieldMap[key] = val;
}
}
const payload = fieldMap['payload'];
if (payload === undefined) {
logger.warn({ id, stream }, 'broadcast entry missing payload; skipping');
continue;
}
try {
const position = decodePosition(payload);
decoded.push({ id, position });
} catch (err) {
if (err instanceof CodecError) {
logger.warn({ id, err }, 'broadcast decode error; skipping record');
} else {
logger.error({ id, err }, 'unexpected broadcast decode error');
}
// The read loop still ACKs this entry (broadcast ACKs everything up front and
// never retries); the warn/error log above is what keeps a decode failure from
// being silently skipped.
}
}
return decoded;
}
// -------------------------------------------------------------------------
// Read loop
// -------------------------------------------------------------------------
async function runLoop(): Promise<void> {
logger.info({ stream, group: groupName, consumer: consumerName }, 'broadcast consumer started');
while (!stopping) {
let rawResult: [string, [string, string[]][]][] | null;
try {
rawResult = (await redis.xreadgroup(
'GROUP',
groupName,
consumerName,
'COUNT',
String(batchSize),
'BLOCK',
String(batchBlockMs),
'STREAMS',
stream,
'>',
)) as [string, [string, string[]][]][] | null;
} catch (err) {
if (stopping) break;
logger.error({ err }, 'broadcast XREADGROUP failed; backing off 1s');
await new Promise<void>((resolve) => setTimeout(resolve, 1_000));
continue;
}
if (rawResult === null) continue; // BLOCK timeout — check stopping flag
const streamEntries = rawResult[0]?.[1] ?? [];
if (streamEntries.length === 0) continue;
metrics.inc('processor_live_broadcast_records_total', {
instance_id: config.INSTANCE_ID,
}, streamEntries.length);
const decoded = decodeBatch(streamEntries);
// ACK all entries immediately — broadcast has no durability requirement.
const allIds = streamEntries.map(([id]) => id);
if (allIds.length > 0) {
await redis.xack(stream, groupName, ...allIds);
}
// Fan out decoded records to subscribed clients.
for (const { id, position } of decoded) {
fanOut(id, position);
}
}
logger.info({ stream, group: groupName }, 'broadcast consumer loop exited');
}
// -------------------------------------------------------------------------
// Lifecycle
// -------------------------------------------------------------------------
async function start(): Promise<void> {
await ensureGroup();
loopPromise = runLoop();
loopPromise.catch((err: unknown) => {
logger.fatal({ err }, 'broadcast consumer loop crashed; exiting');
process.exit(1);
});
}
async function stop(): Promise<void> {
stopping = true;
await loopPromise;
}
return { start, stop };
}
+118
View File
@@ -0,0 +1,118 @@
/**
* In-memory cache of device → event mappings.
*
* The fan-out loop needs to answer "which events does this device belong to?"
* for every position record. The naive answer — query Postgres on each record —
* is wrong at any meaningful throughput. This module caches the full join of
* `entry_devices` with `entries` in memory and refreshes it on a configurable
* cadence (default: every 30 s).
*
* Staleness window: up to LIVE_DEVICE_EVENT_REFRESH_MS. This is acceptable for
* pilot — operators register devices before the event starts, and "the device
* appeared on the map after 30 s" is a tolerable UX gap. Phase 3+ can add
* invalidation signals if needed.
*
* Spec: processor-ws-contract.md §Multi-instance behaviour;
* task 1.5.4 §DeviceEventMap design
*/
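// Usage sketch (hedged; mirrors the wiring in main.ts, the IMEI is illustrative):
//
//   const map = createDeviceEventMap(pool, config, logger, metrics);
//   await map.start();                               // first refresh completes before lookups begin
//   const eventIds = map.lookup('356307042441013');  // [] until the IMEI is registered to an entry
//   // ...fan-out uses eventIds to pick topics...
//   map.stop();                                      // cancel the refresh timer on shutdown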
import type pg from 'pg';
import type { Logger } from 'pino';
import type { Metrics } from '../shared/types.js';
import type { Config } from '../config/load.js';
// ---------------------------------------------------------------------------
// Public interface
// ---------------------------------------------------------------------------
export type DeviceEventMap = {
/** Returns the event IDs the device is currently registered to. */
readonly lookup: (deviceId: string) => readonly string[];
/** Starts the refresh timer. Immediately runs the first refresh. */
readonly start: () => Promise<void>;
/** Cancels the refresh timer. */
readonly stop: () => void;
};
// ---------------------------------------------------------------------------
// Query result type
// ---------------------------------------------------------------------------
type DeviceEventRow = {
device_id: string;
event_id: string;
};
// ---------------------------------------------------------------------------
// Factory
// ---------------------------------------------------------------------------
export function createDeviceEventMap(
pool: pg.Pool,
config: Config,
logger: Logger,
metrics: Metrics,
): DeviceEventMap {
// Mutable map; atomically swapped on each refresh.
let cache = new Map<string, Set<string>>();
let timer: ReturnType<typeof setInterval> | null = null;
async function refresh(): Promise<void> {
const start = performance.now();
try {
const result = await pool.query<DeviceEventRow>(
`SELECT ed.device_id, e.event_id
FROM entry_devices ed
JOIN entries e ON e.id = ed.entry_id`,
);
const next = new Map<string, Set<string>>();
for (const row of result.rows) {
let eventSet = next.get(row.device_id);
if (!eventSet) {
eventSet = new Set<string>();
next.set(row.device_id, eventSet);
}
eventSet.add(row.event_id);
}
cache = next;
const elapsed = performance.now() - start;
metrics.observe('processor_live_device_event_refresh_latency_ms', elapsed);
metrics.observe('processor_live_device_event_entries', next.size);
logger.debug({ devices: next.size, elapsedMs: Math.round(elapsed) }, 'device-event map refreshed');
} catch (err) {
logger.warn({ err }, 'device-event map refresh failed; retaining stale cache');
// Retain the stale cache — a stale map is better than an empty map
// which would silently drop all fan-out until the next refresh.
}
}
async function start(): Promise<void> {
await refresh();
timer = setInterval(() => {
refresh().catch((err: unknown) => {
logger.warn({ err }, 'device-event map refresh interval error');
});
}, config.LIVE_DEVICE_EVENT_REFRESH_MS);
// Do not hold the event loop open during shutdown.
timer.unref();
}
function stop(): void {
if (timer !== null) {
clearInterval(timer);
timer = null;
}
}
function lookup(deviceId: string): readonly string[] {
const events = cache.get(deviceId);
if (!events || events.size === 0) return [];
return [...events];
}
return { lookup, start, stop };
}
+3 -1
View File
@@ -130,7 +130,9 @@ export type ErrorCode =
| 'unknown-topic'
| 'protocol-violation'
| 'not-implemented'
| 'rate-limited';
| 'rate-limited'
/** Transient server-side error (e.g. Directus authz call failed). Retry. */
| 'error';
/**
* An error response from the server, scoped either to a topic or to the whole connection.
+324
View File
@@ -0,0 +1,324 @@
/**
* Subscription registry — manages the bidirectional mapping between WebSocket
* connections and topics, and handles per-event authorization at subscribe time.
*
* Data structures:
* - connectionTopics: WeakMap<LiveConnection, Set<string>> (conn → topics)
* WeakMap lets the GC reclaim the topic Set if a connection somehow never reaches onConnectionClose.
* - topicConnections: Map<string, Set<LiveConnection>> (topic → conns)
* Standard Map keyed by topic string; cleaned up by onConnectionClose.
*
* Authorization:
* - Checked ONCE per subscribe, via the authz client (Directus /items/events/<id>).
* - Zero Directus calls in the fan-out hot path.
*
* Snapshot:
* - Task 1.5.3 sends an empty snapshot with `subscribed`. Task 1.5.5 wires in
* the real snapshot provider to populate the array.
*
* Spec: docs/wiki/synthesis/processor-ws-contract.md §Subscription model
*/
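// Shape sketch (illustrative, truncated UUIDs): after connA subscribes to event 1111…
// and connB subscribes to events 1111… and 2222…,
//
//   connectionTopics: connA → { "event:1111…" }                    (WeakMap, conn → topics)
//                     connB → { "event:1111…", "event:2222…" }
//   topicConnections: "event:1111…" → { connA, connB }             (Map, used by fan-out)
//                     "event:2222…" → { connB }
//
// Fan-out therefore costs one Map lookup per event topic and zero Directus calls.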
import type { Logger } from 'pino';
import type { Metrics } from '../shared/types.js';
import type { LiveConnection } from './server.js';
import { sendOutbound } from './server.js';
import type { AuthzClient } from './authz.js';
import type { Config } from '../config/load.js';
import type { PositionSnapshotEntry } from './protocol.js';
// ---------------------------------------------------------------------------
// Topic parsing
// ---------------------------------------------------------------------------
const EVENT_TOPIC_REGEX =
/^event:([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})$/i;
type ParsedTopic = { readonly kind: 'event'; readonly eventId: string };
function parseTopic(topic: string): ParsedTopic | null {
const match = EVENT_TOPIC_REGEX.exec(topic);
if (match?.[1]) return { kind: 'event', eventId: match[1] };
return null;
}
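// Examples (hypothetical UUID):
//   parseTopic('event:3f1c2a9e-0b4d-4f7a-9c1e-8d2b6a5e4f30')
//     // => { kind: 'event', eventId: '3f1c2a9e-0b4d-4f7a-9c1e-8d2b6a5e4f30' }
//   parseTopic('event:not-a-uuid')   // => null → subscribe replies error/unknown-topic
//   parseTopic('device:12345')       // => null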
// ---------------------------------------------------------------------------
// Snapshot provider type (injected from task 1.5.5)
// ---------------------------------------------------------------------------
/**
* Pluggable snapshot provider. Task 1.5.3 uses the stub (empty array).
* Task 1.5.5 injects the real Postgres-backed provider.
*/
export type SnapshotProvider = {
readonly forEvent: (eventId: string) => Promise<PositionSnapshotEntry[]>;
};
const STUB_SNAPSHOT_PROVIDER: SnapshotProvider = {
forEvent: () => Promise.resolve([]),
};
// ---------------------------------------------------------------------------
// Public interface
// ---------------------------------------------------------------------------
export type SubscriptionRegistry = {
/** Subscribe `conn` to `topic`. Authorizes, then sends `subscribed` or `error`. */
readonly subscribe: (
conn: LiveConnection,
topic: string,
correlationId?: string,
) => Promise<void>;
/** Unsubscribe `conn` from `topic`. Always sends `unsubscribed` (idempotent). */
readonly unsubscribe: (
conn: LiveConnection,
topic: string,
correlationId?: string,
) => void;
/** Remove all subscriptions for a closed connection (called on ws close). */
readonly onConnectionClose: (conn: LiveConnection) => void;
/** Iterates all connections currently subscribed to `topic`. Used by fan-out. */
readonly connectionsForTopic: (topic: string) => Iterable<LiveConnection>;
/** Iterates all topics the given connection is subscribed to. */
readonly topicsForConnection: (conn: LiveConnection) => Iterable<string>;
/** Aggregate stats for monitoring and sanity checks. */
readonly stats: () => { connections: number; topics: number; subscriptions: number };
};
// ---------------------------------------------------------------------------
// Factory
// ---------------------------------------------------------------------------
export function createSubscriptionRegistry(
authzClient: AuthzClient,
config: Config,
logger: Logger,
metrics: Metrics,
snapshotProvider: SnapshotProvider = STUB_SNAPSHOT_PROVIDER,
): SubscriptionRegistry {
// conn → Set of topic strings the connection is subscribed to.
// WeakMap: if a connection object is somehow not cleaned up via onConnectionClose,
// the GC will reclaim the Set when the connection is collected.
const connectionTopics = new WeakMap<LiveConnection, Set<string>>();
// topic string → Set of connections subscribed to that topic.
const topicConnections = new Map<string, Set<LiveConnection>>();
// Total active subscriptions counter (kept in sync with topicConnections).
let totalSubscriptions = 0;
// -------------------------------------------------------------------------
// Subscribe
// -------------------------------------------------------------------------
async function subscribe(
conn: LiveConnection,
topic: string,
correlationId?: string,
): Promise<void> {
const parsed = parseTopic(topic);
if (!parsed) {
sendOutbound(
conn,
{
type: 'error',
topic,
id: correlationId,
code: 'unknown-topic',
message: 'Unknown topic format. Supported: event:<uuid>',
},
metrics,
config.LIVE_WS_BACKPRESSURE_THRESHOLD_BYTES,
);
metrics.inc('processor_live_subscribe_attempts_total', { result: 'unknown-topic' });
return;
}
// Idempotent: if already subscribed, re-send `subscribed` with a fresh snapshot.
const existing = connectionTopics.get(conn);
if (existing?.has(topic)) {
const snapshot = await fetchSnapshot(parsed.eventId);
sendOutbound(
conn,
{ type: 'subscribed', topic, id: correlationId, snapshot },
metrics,
config.LIVE_WS_BACKPRESSURE_THRESHOLD_BYTES,
);
// Do not double-count in subscriptions gauge.
return;
}
// Authorization check — one Directus call per subscribe.
const verdict = await authzClient.canAccessEvent(conn.cookieHeader, parsed.eventId);
if (!verdict.allowed) {
sendOutbound(
conn,
{
type: 'error',
topic,
id: correlationId,
code: verdict.reason,
message: buildForbiddenMessage(verdict.reason),
},
metrics,
config.LIVE_WS_BACKPRESSURE_THRESHOLD_BYTES,
);
metrics.inc('processor_live_subscribe_attempts_total', { result: verdict.reason });
return;
}
// Fetch snapshot (fails open — snapshot failure does not block the subscribe).
const snapshot = await fetchSnapshot(parsed.eventId);
// Insert into both indexes.
if (!existing) connectionTopics.set(conn, new Set());
connectionTopics.get(conn)!.add(topic);
if (!topicConnections.has(topic)) topicConnections.set(topic, new Set());
topicConnections.get(topic)!.add(conn);
totalSubscriptions += 1;
metrics.observe('processor_live_subscriptions', totalSubscriptions);
metrics.inc('processor_live_subscribe_attempts_total', { result: 'success' });
logger.debug(
{ connId: conn.id, topic, userId: conn.user.id },
'subscribed',
);
sendOutbound(
conn,
{ type: 'subscribed', topic, id: correlationId, snapshot },
metrics,
config.LIVE_WS_BACKPRESSURE_THRESHOLD_BYTES,
);
}
// -------------------------------------------------------------------------
// Unsubscribe
// -------------------------------------------------------------------------
function unsubscribe(
conn: LiveConnection,
topic: string,
correlationId?: string,
): void {
const topics = connectionTopics.get(conn);
const wasSubscribed = topics?.has(topic) ?? false;
topics?.delete(topic);
const conns = topicConnections.get(topic);
if (conns) {
conns.delete(conn);
if (conns.size === 0) topicConnections.delete(topic);
}
if (wasSubscribed) {
totalSubscriptions -= 1;
metrics.observe('processor_live_subscriptions', totalSubscriptions);
}
logger.debug({ connId: conn.id, topic }, 'unsubscribed');
// Always reply, even if not subscribed (idempotent).
sendOutbound(
conn,
{ type: 'unsubscribed', topic, id: correlationId },
metrics,
config.LIVE_WS_BACKPRESSURE_THRESHOLD_BYTES,
);
}
// -------------------------------------------------------------------------
// onConnectionClose
// -------------------------------------------------------------------------
function onConnectionClose(conn: LiveConnection): void {
const topics = connectionTopics.get(conn);
if (!topics) return;
for (const topic of topics) {
const conns = topicConnections.get(topic);
if (conns) {
conns.delete(conn);
if (conns.size === 0) topicConnections.delete(topic);
}
totalSubscriptions -= 1;
}
connectionTopics.delete(conn);
metrics.observe('processor_live_subscriptions', totalSubscriptions);
logger.debug(
{ connId: conn.id, removedTopics: topics.size },
'connection closed — subscriptions cleaned up',
);
}
// -------------------------------------------------------------------------
// Query
// -------------------------------------------------------------------------
function connectionsForTopic(topic: string): Iterable<LiveConnection> {
return topicConnections.get(topic) ?? new Set<LiveConnection>();
}
function topicsForConnection(conn: LiveConnection): Iterable<string> {
return connectionTopics.get(conn) ?? new Set<string>();
}
function stats(): { connections: number; topics: number; subscriptions: number } {
// Count unique connections: a connection subscribed to several topics must only
// be counted once (summing per-topic set sizes would just reproduce totalSubscriptions).
const unique = new Set<LiveConnection>();
for (const conns of topicConnections.values()) {
for (const conn of conns) unique.add(conn);
}
return {
connections: unique.size,
topics: topicConnections.size,
subscriptions: totalSubscriptions,
};
}
// -------------------------------------------------------------------------
// Snapshot helper
// -------------------------------------------------------------------------
async function fetchSnapshot(eventId: string): Promise<PositionSnapshotEntry[]> {
const start = performance.now();
try {
const snapshot = await snapshotProvider.forEvent(eventId);
metrics.observe('processor_live_snapshot_query_latency_ms', performance.now() - start);
metrics.observe('processor_live_snapshot_size', snapshot.length);
return snapshot;
} catch (err) {
logger.warn(
{ err, eventId },
'snapshot query failed; sending empty snapshot',
);
return [];
}
}
return {
subscribe,
unsubscribe,
onConnectionClose,
connectionsForTopic,
topicsForConnection,
stats,
};
}
// ---------------------------------------------------------------------------
// Helpers
// ---------------------------------------------------------------------------
function buildForbiddenMessage(reason: 'forbidden' | 'not-found' | 'error'): string {
switch (reason) {
case 'forbidden':
return 'User does not have access to this event.';
case 'not-found':
return 'Event not found.';
case 'error':
return 'Authorization check failed. Please try again.';
}
}
+79 -12
View File
@@ -8,14 +8,16 @@
* - Runs on its own http.Server (separate from the Phase 1 metrics/health server
* on :9090) so a proxy can route to different paths and failure modes don't
* entangle.
* - Auth happens in the `'upgrade'` handler (task 1.5.2). This scaffold accepts
* all upgrades and logs the connection.
* - Message dispatch is pluggable via the `onMessage` callback so tasks 1.5.2
* and 1.5.3 can attach the real auth/registry handler without touching this
* file's lifecycle logic.
* - Auth runs in the `'upgrade'` handler: validate the cookie via Directus before
* completing the WS upgrade. Rejected upgrades get an HTTP 401 response.
* - Message dispatch is pluggable via the `onMessage` callback so task 1.5.3
* can attach the real subscription-registry handler.
* - Heartbeat: WS frame-level ping every LIVE_WS_PING_INTERVAL_MS; pong updates
* lastSeenAt. Do NOT use application-level ping messages — browser WS
* implementations handle frame-level pings natively.
* - cookieHeader is stored on the connection so the authz client (task 1.5.3)
* can forward it to Directus for per-event authorization. It is sensitive
* material; never log it.
*/
import * as http from 'node:http';
@@ -26,15 +28,17 @@ import type { Config } from '../config/load.js';
import type { Metrics } from '../shared/types.js';
import { InboundMessage, WsCloseCodes } from './protocol.js';
import type { OutboundMessage } from './protocol.js';
import type { AuthClient, AuthenticatedUser } from './auth.js';
// ---------------------------------------------------------------------------
// Public types
// ---------------------------------------------------------------------------
/**
* Per-connection identity object. Augmented in later tasks (auth adds `user`;
* task 1.5.3 adds `cookieHeader`). Exported so the registry, auth, and
* broadcast modules can reference the same type.
* Per-connection identity object. Holds the validated user identity and the
* original cookie header (needed for per-subscription authorization in 1.5.3).
*
* `cookieHeader` is sensitive — never log it.
*/
export type LiveConnection = {
readonly id: string;
@@ -42,14 +46,17 @@ export type LiveConnection = {
readonly remoteAddr: string;
readonly openedAt: Date;
lastSeenAt: Date;
readonly user: AuthenticatedUser;
/** The raw Cookie: header from the upgrade request. Used by the authz client
* to forward the user's session when checking event access. */
readonly cookieHeader: string;
};
/**
* Message handler callback. The server calls this once per successfully parsed
* inbound message. The handler is responsible for sending replies.
*
* In task 1.5.1 this is a no-op stub that returns `error/not-implemented`.
* Tasks 1.5.2 and 1.5.3 replace it with the real auth+registry handler.
* Task 1.5.3 replaces the stub with the real subscription-registry handler.
*/
export type MessageHandler = (
conn: LiveConnection,
@@ -120,6 +127,7 @@ export function createLiveServer(
metrics: Metrics,
onMessage: MessageHandler,
onClose?: (conn: LiveConnection) => void,
authClient?: AuthClient,
): LiveServer {
const connections = new Map<string, LiveConnection>();
@@ -131,13 +139,54 @@ export function createLiveServer(
const wss = new WebSocketServer({ noServer: true });
// -------------------------------------------------------------------------
// Upgrade handler (auth injected in task 1.5.2; accepted immediately here)
// Upgrade handler — validates auth before completing the WS handshake
// -------------------------------------------------------------------------
httpServer.on('upgrade', (req, socket, head) => {
const cookieHeader = req.headers['cookie'] ?? '';
if (!authClient) {
// No auth client provided — accept the upgrade without validation.
// Used in tests that don't need auth; the connection handler falls back to its
// placeholder anonymous user because no _liveUser is stashed on the request.
wss.handleUpgrade(req, socket, head, (ws) => {
wss.emit('connection', ws, req);
});
return;
}
// Validate the cookie asynchronously. The upgrade handler must not hold
// the socket open for too long — the auth timeout (5s default) is the
// upper bound.
authClient.validate(cookieHeader).then((user) => {
if (!user) {
socket.write(
'HTTP/1.1 401 Unauthorized\r\n' +
'Content-Length: 0\r\n' +
'Connection: close\r\n' +
'\r\n',
);
socket.destroy();
return;
}
// Stash user + cookieHeader on the request so the connection handler
// can pick them up without a second async call.
const augmentedReq = req as http.IncomingMessage & {
_liveUser?: AuthenticatedUser;
_liveCookie?: string;
};
augmentedReq._liveUser = user;
augmentedReq._liveCookie = cookieHeader;
wss.handleUpgrade(req, socket, head, (ws) => {
wss.emit('connection', ws, req);
});
}).catch((err: unknown) => {
logger.error({ err }, 'auth validation threw unexpectedly during upgrade');
socket.write(
'HTTP/1.1 500 Internal Server Error\r\n' +
'Content-Length: 0\r\n' +
'Connection: close\r\n' +
'\r\n',
);
socket.destroy();
});
});
// -------------------------------------------------------------------------
@@ -145,19 +194,37 @@ export function createLiveServer(
// -------------------------------------------------------------------------
wss.on('connection', (ws, req: http.IncomingMessage) => {
// Retrieve the user stashed by the upgrade handler. When auth is disabled
// (no authClient), fall back to a placeholder anonymous user.
type AugmentedRequest = http.IncomingMessage & {
_liveUser?: AuthenticatedUser;
_liveCookie?: string;
};
const augmented = req as AugmentedRequest;
const user: AuthenticatedUser = augmented._liveUser ?? {
id: crypto.randomUUID(),
email: null,
role: null,
first_name: null,
last_name: null,
};
const cookieHeader = augmented._liveCookie ?? '';
const conn: LiveConnection = {
id: crypto.randomUUID(),
ws,
remoteAddr: req.socket.remoteAddress ?? 'unknown',
openedAt: new Date(),
lastSeenAt: new Date(),
user,
cookieHeader,
};
connections.set(conn.id, conn);
metrics.observe('processor_live_connections', connections.size);
logger.debug(
{ connId: conn.id, remote: conn.remoteAddr },
{ connId: conn.id, remote: conn.remoteAddr, userId: user.id },
'connection opened',
);
+127
View File
@@ -0,0 +1,127 @@
/**
* Snapshot provider — returns the latest non-faulty position per device for a
* given event at subscribe time.
*
* Called once per `subscribe` message, inside registry.ts's `subscribe()` after
* authorization succeeds. The result is included in the `subscribed` reply so
* the SPA map is fully populated immediately rather than waiting for the next
* live broadcast batch.
*
* Query:
* DISTINCT ON (p.device_id) ... ORDER BY p.device_id, p.ts DESC
* returns the row with the highest `ts` per device in one Postgres pass.
* Requires the `positions_device_ts_idx ON positions (device_id, ts DESC)`
* index created in migration 0002.
*
* Spec: processor-ws-contract.md §Server response — subscribed;
* task 1.5.5 §The query
*/
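// Illustration of the DISTINCT ON semantics (made-up rows):
//
//   device_id | ts                   | faulty
//   A         | 2026-05-01 10:00:03  | false   <- returned: latest non-faulty row for A
//   A         | 2026-05-01 10:00:01  | false
//   B         | 2026-05-01 10:00:05  | true       excluded by WHERE faulty = false
//   B         | 2026-05-01 10:00:02  | false   <- returned: latest non-faulty row for B
//
// ORDER BY device_id, ts DESC makes the first row per device the newest one;
// DISTINCT ON (device_id) then keeps only that first row.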
import type pg from 'pg';
import type { Logger } from 'pino';
import type { Metrics } from '../shared/types.js';
import type { PositionSnapshotEntry } from './protocol.js';
import type { SnapshotProvider } from './registry.js';
// ---------------------------------------------------------------------------
// Query result row shape
// ---------------------------------------------------------------------------
type SnapshotRow = {
device_id: string;
latitude: number;
longitude: number;
ts: Date;
speed: number;
angle: number;
};
// ---------------------------------------------------------------------------
// Factory
// ---------------------------------------------------------------------------
export function createSnapshotProvider(
pool: pg.Pool,
logger: Logger,
metrics: Metrics,
): SnapshotProvider {
/**
* Returns the latest non-faulty position for every device registered to the
* given event. Returns an empty array when:
* - the event has no `entry_devices` rows.
* - the registered devices have no positions yet.
* - all positions for a device are faulty.
*
* May reject if the query fails; the caller (registry.fetchSnapshot) wraps the
* call in a try/catch and falls back to an empty snapshot.
*/
async function forEvent(eventId: string): Promise<PositionSnapshotEntry[]> {
const start = performance.now();
const result = await pool.query<SnapshotRow>(
`SELECT DISTINCT ON (p.device_id)
p.device_id,
p.latitude,
p.longitude,
p.ts,
p.speed,
p.angle
FROM positions p
JOIN entry_devices ed ON ed.device_id = p.device_id
JOIN entries e ON e.id = ed.entry_id
WHERE e.event_id = $1
AND p.faulty = false
ORDER BY p.device_id, p.ts DESC`,
[eventId],
);
const elapsed = performance.now() - start;
metrics.observe('processor_live_snapshot_query_latency_ms', elapsed);
metrics.observe('processor_live_snapshot_size', result.rows.length);
logger.debug(
{ eventId, count: result.rows.length, elapsedMs: Math.round(elapsed) },
'snapshot query completed',
);
return result.rows.map(rowToSnapshotEntry);
}
return { forEvent };
}
// ---------------------------------------------------------------------------
// Row → wire type
// ---------------------------------------------------------------------------
/**
* Maps a Postgres snapshot row to a PositionSnapshotEntry.
*
* Field omission convention: speed and course (angle) are omitted when zero,
* matching the broadcast consumer's `toPositionMessage` convention. Per Teltonika
* protocol, 0 speed may indicate an invalid GPS fix; 0 angle is meaningless when
* the device is stationary. Emit the field only when it carries information.
*
* `ts` is stored as a `timestamptz` in Postgres and returned as a JavaScript
* `Date` by node-postgres. Convert to epoch ms for the wire format.
*/
function rowToSnapshotEntry(row: SnapshotRow): PositionSnapshotEntry {
const entry: PositionSnapshotEntry = {
deviceId: row.device_id,
lat: row.latitude,
lon: row.longitude,
ts: row.ts instanceof Date ? row.ts.getTime() : Number(row.ts),
};
// Omit speed when 0 — matches broadcast.ts toPositionMessage convention.
if (row.speed > 0) {
(entry as Record<string, unknown>)['speed'] = row.speed;
}
// Omit course when 0 — angle of 0 is uninformative when stationary.
if (row.angle > 0) {
(entry as Record<string, unknown>)['course'] = row.angle;
}
return entry;
}
+53 -16
View File
@@ -16,9 +16,15 @@ import { connectRedis, createConsumer } from './core/consumer.js';
import type { ConsumedRecord } from './core/consumer.js';
import { createDeviceStateStore } from './core/state.js';
import { createWriter } from './core/writer.js';
import { createLiveServer, sendOutbound } from './live/server.js';
import { createLiveServer } from './live/server.js';
import type { LiveServer, LiveConnection } from './live/server.js';
import type { InboundMessage } from './live/protocol.js';
import { createAuthClient } from './live/auth.js';
import { createAuthzClient } from './live/authz.js';
import { createSubscriptionRegistry } from './live/registry.js';
import { createBroadcastConsumer } from './live/broadcast.js';
import { createDeviceEventMap } from './live/device-event-map.js';
import { createSnapshotProvider } from './live/snapshot.js';
// -------------------------------------------------------------------------
// Startup: validate config (fail fast on bad env), build logger
@@ -131,40 +137,61 @@ async function main(): Promise<void> {
return ackIds;
};
// 10. Build the live WebSocket server (task 1.5.1).
// The stub message handler replies with `error/not-implemented` until
// tasks 1.5.2 and 1.5.3 wire in the real auth + registry handler.
const stubMessageHandler = async (
// 10. Build the live WebSocket server (tasks 1.5.2 and 1.5.3).
const authClient = createAuthClient(config, logger, metrics);
const authzClient = createAuthzClient(config, logger, metrics);
const snapshotProvider = createSnapshotProvider(pool, logger, metrics);
const registry = createSubscriptionRegistry(authzClient, config, logger, metrics, snapshotProvider);
const messageHandler = async (
conn: LiveConnection,
_message: InboundMessage,
message: InboundMessage,
): Promise<void> => {
sendOutbound(
conn,
{ type: 'error', code: 'not-implemented' },
metrics,
config.LIVE_WS_BACKPRESSURE_THRESHOLD_BYTES,
);
if (message.type === 'subscribe') {
await registry.subscribe(conn, message.topic, message.id);
} else if (message.type === 'unsubscribe') {
registry.unsubscribe(conn, message.topic, message.id);
}
};
const liveServer: LiveServer = createLiveServer(
config,
logger,
metrics,
stubMessageHandler,
messageHandler,
(conn) => registry.onConnectionClose(conn),
authClient,
);
// 10b. Build the device-event map (Postgres-backed, periodic refresh).
const deviceEventMap = createDeviceEventMap(pool, config, logger, metrics);
// 10c. Build the broadcast consumer (per-instance consumer group fan-out).
const broadcastConsumer = createBroadcastConsumer(
redis,
registry,
deviceEventMap,
config,
logger,
metrics,
);
await liveServer.start();
await deviceEventMap.start();
await broadcastConsumer.start();
// 11. Build and start the durable-write consumer
const consumer = createConsumer(redis, config, logger, metrics, sink);
await consumer.start();
// 12. Install graceful shutdown.
// Shutdown order: live server first (no new connections), then
// broadcast consumer (task 1.5.4 adds this), then durable-write consumer.
// Shutdown order: live server first (no new connections),
// then broadcast consumer, then durable-write consumer last.
installGracefulShutdown({
redis,
pool,
consumer,
broadcastConsumer,
deviceEventMap,
liveServer,
metricsServer,
pgHealth,
@@ -192,6 +219,8 @@ type ShutdownDeps = {
readonly redis: Redis;
readonly pool: pg.Pool;
readonly consumer: { stop: () => Promise<void> };
readonly broadcastConsumer: { stop: () => Promise<void> };
readonly deviceEventMap: { stop: () => void };
readonly liveServer: LiveServer;
readonly metricsServer: http.Server;
readonly pgHealth: { stop: () => void };
@@ -200,7 +229,10 @@ type ShutdownDeps = {
};
function installGracefulShutdown(deps: ShutdownDeps): void {
const { redis, pool, consumer, liveServer, metricsServer, pgHealth, lagSampler, logger: log } = deps;
const {
redis, pool, consumer, broadcastConsumer, deviceEventMap,
liveServer, metricsServer, pgHealth, lagSampler, logger: log,
} = deps;
let shuttingDown = false;
@@ -226,6 +258,11 @@ function installGracefulShutdown(deps: ShutdownDeps): void {
.stop()
.then(() => {
log.info('live server stopped');
deviceEventMap.stop();
return broadcastConsumer.stop();
})
.then(() => {
log.info('broadcast consumer stopped');
return consumer.stop();
})
.then(() => {
+229
View File
@@ -0,0 +1,229 @@
/**
* Sentinel decoder for Position records arriving from the Redis Stream.
*
* Moved from src/core/codec.ts to src/shared/ so that both src/core/ and
* src/live/ can import it without crossing the enforced boundary between those
* two layers.
*
* tcp-ingestion serializes Position objects with a custom JSON replacer that
* encodes types not natively supported by JSON:
* - bigint → { __bigint: "<decimal-digits>" }
* - Buffer → { __buffer_b64: "<base64>" }
* - Date → ISO8601 string
*
* This module reverses that encoding so the Processor receives fully-typed
* Position objects. The contract is documented in:
* docs/wiki/concepts/position-record.md
* tcp-ingestion/src/core/publish.ts (jsonReplacer)
*/
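// Example round-trip (the payload below is hand-written, not captured traffic;
// IO element IDs are illustrative):
//
//   const payload = JSON.stringify({
//     device_id: '356307042441013',
//     timestamp: '2026-05-01T12:00:00.000Z',
//     latitude: 59.33, longitude: 18.06, altitude: 12,
//     angle: 90, speed: 42, satellites: 9, priority: 0,
//     attributes: { '16': { __bigint: '123456789012345' }, '181': 5 },
//   });
//   const position = decodePosition(payload);
//   // position.timestamp instanceof Date   -> true
//   // typeof position.attributes['16']     -> 'bigint'
//   // typeof position.attributes['181']    -> 'number'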
import type { Position, AttributeValue } from './types.js';
// ---------------------------------------------------------------------------
// Error type
// ---------------------------------------------------------------------------
export class CodecError extends Error {
override readonly name = 'CodecError';
constructor(message: string, options?: ErrorOptions) {
super(message, options);
}
}
// ---------------------------------------------------------------------------
// Sentinel detection helpers
// ---------------------------------------------------------------------------
/**
* Returns true when the value is exactly `{ __bigint: "<string>" }`.
* The shape must have exactly one key — any extra keys indicate a user-defined
* object that coincidentally has a `__bigint` field, which is not a sentinel.
* In practice tcp-ingestion only emits single-key sentinels; validate strictly.
*/
function isBigintSentinel(value: unknown): value is { __bigint: string } {
if (typeof value !== 'object' || value === null) return false;
const keys = Object.keys(value);
return (
keys.length === 1 &&
keys[0] === '__bigint' &&
typeof (value as Record<string, unknown>)['__bigint'] === 'string'
);
}
/**
* Returns true when the value is exactly `{ __buffer_b64: "<string>" }`.
*/
function isBufferSentinel(value: unknown): value is { __buffer_b64: string } {
if (typeof value !== 'object' || value === null) return false;
const keys = Object.keys(value);
return (
keys.length === 1 &&
keys[0] === '__buffer_b64' &&
typeof (value as Record<string, unknown>)['__buffer_b64'] === 'string'
);
}
// ---------------------------------------------------------------------------
// Reviver
// ---------------------------------------------------------------------------
/**
* JSON.parse reviver that reconstructs the live types from sentinel encodings.
*
* Called by JSON.parse for every key-value pair in the document, bottom-up.
* By the time `attributes` is visited, each attribute value has already been
* converted (sentinels → bigint/Buffer), because JSON.parse visits leaves first.
*
* Reviver must return `unknown` because the result type depends on the key.
* The caller casts the final result to `PositionJson` after validation.
*/
function reviver(key: string, value: unknown): unknown {
// Timestamp field: ISO string → Date
if (key === 'timestamp' && typeof value === 'string') {
const date = new Date(value);
if (isNaN(date.getTime())) {
throw new CodecError(`Invalid timestamp value: "${value}"`);
}
return date;
}
// bigint sentinel
if (isBigintSentinel(value)) {
const digits = value.__bigint;
// Validate: only decimal digits (including optional leading minus for
// negative bigints, though Teltonika IO elements are unsigned).
if (!/^-?\d+$/.test(digits)) {
throw new CodecError(
`Malformed __bigint sentinel: expected decimal digits, got "${digits}"`,
);
}
return BigInt(digits);
}
// Buffer sentinel
if (isBufferSentinel(value)) {
const b64 = value.__buffer_b64;
// Validate base64 characters (standard + URL-safe alphabets, with padding)
if (!/^[A-Za-z0-9+/\-_]*={0,2}$/.test(b64)) {
throw new CodecError(
`Malformed __buffer_b64 sentinel: invalid base64 string "${b64}"`,
);
}
return Buffer.from(b64, 'base64');
}
return value;
}
// ---------------------------------------------------------------------------
// Required field validation
// ---------------------------------------------------------------------------
const REQUIRED_NUMERIC_FIELDS = [
'latitude',
'longitude',
'altitude',
'angle',
'speed',
'satellites',
'priority',
] as const;
/**
* Validates the decoded object has all required Position fields with the
* correct types. Throws `CodecError` naming the first failing field.
*/
function validateDecodedPosition(obj: Record<string, unknown>): asserts obj is {
device_id: string;
timestamp: Date;
latitude: number;
longitude: number;
altitude: number;
angle: number;
speed: number;
satellites: number;
priority: number;
attributes: Record<string, AttributeValue>;
} {
if (typeof obj['device_id'] !== 'string' || obj['device_id'].length === 0) {
throw new CodecError('Missing or invalid field: device_id (expected non-empty string)');
}
if (!(obj['timestamp'] instanceof Date)) {
throw new CodecError(
'Missing or invalid field: timestamp (expected Date after reviver; was ISO string decoded?)',
);
}
for (const field of REQUIRED_NUMERIC_FIELDS) {
if (typeof obj[field] !== 'number') {
throw new CodecError(
`Missing or invalid field: ${field} (expected number, got ${typeof obj[field]})`,
);
}
}
if (typeof obj['attributes'] !== 'object' || obj['attributes'] === null) {
throw new CodecError('Missing or invalid field: attributes (expected object)');
}
// Validate priority is exactly 0, 1, or 2
const priority = obj['priority'] as number;
if (priority !== 0 && priority !== 1 && priority !== 2) {
throw new CodecError(
`Invalid field: priority (expected 0 | 1 | 2, got ${priority})`,
);
}
// Validate attributes values are only AttributeValue types
const attrs = obj['attributes'] as Record<string, unknown>;
for (const [attrKey, attrVal] of Object.entries(attrs)) {
if (
typeof attrVal !== 'number' &&
typeof attrVal !== 'bigint' &&
!Buffer.isBuffer(attrVal)
) {
throw new CodecError(
`Invalid attribute "${attrKey}": expected number | bigint | Buffer, got ${typeof attrVal}`,
);
}
}
}
// ---------------------------------------------------------------------------
// Public API
// ---------------------------------------------------------------------------
/**
* Decodes a JSON-encoded Position string (with sentinel encoding applied by
* tcp-ingestion's `serializePosition`) into a fully-typed `Position` object.
*
* Throws `CodecError` if the JSON is malformed, a sentinel is invalid, a
* required field is missing, or a field has the wrong type.
*/
export function decodePosition(payload: string): Position {
let parsed: unknown;
try {
parsed = JSON.parse(payload, reviver);
} catch (err) {
if (err instanceof CodecError) {
throw err;
}
throw new CodecError(
`Failed to parse Position payload as JSON: ${err instanceof Error ? err.message : String(err)}`,
{ cause: err },
);
}
if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) {
throw new CodecError('Position payload must be a JSON object');
}
const obj = parsed as Record<string, unknown>;
validateDecodedPosition(obj);
return obj as unknown as Position;
}
+37 -2
View File
@@ -4,8 +4,11 @@
* Both modules need the `Metrics` interface for observability. Placing it here
* avoids an import across the enforced src/core/ ↔ src/live/ boundary.
*
* src/core/types.ts re-exports Metrics from here to preserve the existing
* import path for Phase 1 call sites.
* `Position` and `AttributeValue` are placed here so that src/live/broadcast.ts
* can reference them without importing across the src/core/ ↔ src/live/ boundary.
*
* src/core/types.ts re-exports all shared types to preserve existing import
* paths for Phase 1 call sites.
*/
// ---------------------------------------------------------------------------
@@ -27,3 +30,35 @@ export type Metrics = {
) => void;
readonly observe: (name: string, value: number, labels?: Record<string, string>) => void;
};
// ---------------------------------------------------------------------------
// Position — input contract from tcp-ingestion
// ---------------------------------------------------------------------------
/**
* A single IO attribute value from the Teltonika AVL record.
* - number : fixed-width IO elements (N1/N2/N4 — fit safely in JS number)
* - bigint : N8 elements (u64, may exceed Number.MAX_SAFE_INTEGER)
* - Buffer : NX variable-length elements (Codec 8 Extended)
*/
export type AttributeValue = number | bigint | Buffer;
/**
* Normalized GPS position record. Byte-equivalent to tcp-ingestion's `Position`
* type (docs/wiki/concepts/position-record.md).
*
* `priority` is typed as a union rather than `number` to stay consistent with
* tcp-ingestion and make exhaustive switches possible in domain logic.
*/
export type Position = {
readonly device_id: string;
readonly timestamp: Date;
readonly latitude: number;
readonly longitude: number;
readonly altitude: number;
readonly angle: number; // heading 0–360°
readonly speed: number; // km/h; 0 may mean "GPS invalid" — preserve verbatim
readonly satellites: number;
readonly priority: 0 | 1 | 2; // 0=Low, 1=High, 2=Panic
readonly attributes: Readonly<Record<string, AttributeValue>>;
};
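// Example value (illustrative only; the IMEI and IO element IDs are made up):
//
//   const p: Position = {
//     device_id: '356307042441013',
//     timestamp: new Date('2026-05-01T12:00:00Z'),
//     latitude: 59.33, longitude: 18.06, altitude: 12,
//     angle: 90, speed: 42, satellites: 9, priority: 0,
//     attributes: { '66': 12_450, '16': 123456789012345n, '10': Buffer.from('aa55', 'hex') },
//   };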
+39
View File
@@ -0,0 +1,39 @@
-- test/fixtures/test-schema.sql
--
-- Minimum subset of the production schema required by live.integration.test.ts.
-- This is intentionally a simplified version — NOT the full Directus-managed schema.
--
-- Maintenance note: keep in sync with the real schema when column types change on
-- these tables. Specifically: entries.event_id, entry_devices.device_id (Phase 1
-- uses IMEI text; Phase 2 introduces UUID-based devices table).
--
-- Phase 1 deviation: entry_devices.device_id is TEXT (IMEI) here, matching
-- positions.device_id. The real Directus schema uses a UUID FK to devices.id.
-- The integration test uses the real queries from device-event-map.ts and
-- snapshot.ts, so this simplified schema must satisfy those joins.
-- events — the container for entries
-- The Processor never queries events directly; events.id exists here only as the
-- FK target of entries.event_id (the snapshot query filters WHERE e.event_id = $1).
CREATE TABLE IF NOT EXISTS events (
id uuid PRIMARY KEY DEFAULT gen_random_uuid()
-- Real schema also has: organization_id FK, name, slug, discipline, starts_at, ends_at.
-- Only columns the Processor queries are included here.
);
-- entries — race entries belonging to an event
-- The Processor reads entries.id and entries.event_id.
CREATE TABLE IF NOT EXISTS entries (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
event_id uuid NOT NULL REFERENCES events (id) ON DELETE CASCADE
-- Real schema also has: vehicle_id, class_id, number, etc.
);
-- entry_devices — maps a device (IMEI) to an entry.
-- Phase 1: device_id is IMEI text, matching positions.device_id.
-- Real schema: device_id is UUID FK to devices.id, joined via devices.imei.
-- This simplified form is intentional for the integration test fixture.
CREATE TABLE IF NOT EXISTS entry_devices (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
entry_id uuid NOT NULL REFERENCES entries (id) ON DELETE CASCADE,
device_id text NOT NULL -- IMEI in Phase 1
);
+107
View File
@@ -0,0 +1,107 @@
/**
* Minimal HTTP server stub impersonating the two Directus endpoints the
* Processor calls:
*
* GET /users/me — returns a fake user if the cookie matches
* GET /items/events/:id — returns 200 if (cookie, eventId) is allowed
*
* Instantiate with `createDirectusStub(opts)` and tear down with
* `stub.close()`. The stub binds to a random OS port and exposes `stub.url`
* for config injection.
*
* Design: bare `node:http` — no Express dependency.
*/
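// Usage sketch (hedged; `user` and `eventId` are placeholders the test defines):
//
//   const stub = await createDirectusStub({
//     allowedCookieToUser: new Map([['session=abc', user]]),
//     allowedEvents: new Map([[user.id, new Set([eventId])]]),
//   });
//   // point the Processor's DIRECTUS_BASE_URL at stub.url, run assertions, then:
//   await stub.close();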
import * as http from 'node:http';
import type { AddressInfo } from 'node:net';
// ---------------------------------------------------------------------------
// Public types
// ---------------------------------------------------------------------------
export type FakeUser = {
readonly id: string;
readonly email: string;
readonly role: string | null;
readonly first_name: string;
readonly last_name: string;
};
export type StubOptions = {
/**
* Map from the raw cookie header value (e.g. `"session=abc"`) to the fake
* user that cookie represents. Any cookie not in this map → 401.
*/
readonly allowedCookieToUser: Map<string, FakeUser>;
/**
* Map from Directus user ID → set of event IDs that user may access.
* A request from a valid user for an event not in their set → 403.
*/
readonly allowedEvents: Map<string, Set<string>>;
};
export type DirectusStub = {
readonly url: string;
readonly close: () => Promise<void>;
};
// ---------------------------------------------------------------------------
// Factory
// ---------------------------------------------------------------------------
/**
* Creates and starts a Directus stub server on a random port.
* Returns a promise that resolves once the server is listening.
*/
export function createDirectusStub(opts: StubOptions): Promise<DirectusStub> {
const server = http.createServer((req, res) => {
const cookie = req.headers['cookie'] ?? '';
const user = opts.allowedCookieToUser.get(cookie);
// GET /users/me
if (req.url === '/users/me') {
if (!user) {
res.writeHead(401).end();
return;
}
res.writeHead(200, { 'content-type': 'application/json' });
res.end(JSON.stringify({ data: user }));
return;
}
// GET /items/events/:id
const eventMatch = /^\/items\/events\/([0-9a-f-]+)/i.exec(req.url ?? '');
if (eventMatch) {
if (!user) {
res.writeHead(401).end();
return;
}
const eventId = eventMatch[1]!;
const allowed = opts.allowedEvents.get(user.id)?.has(eventId) ?? false;
if (!allowed) {
res.writeHead(403).end();
return;
}
res.writeHead(200, { 'content-type': 'application/json' });
res.end(JSON.stringify({ data: { id: eventId } }));
return;
}
res.writeHead(404).end();
});
return new Promise((resolve, reject) => {
server.on('error', reject);
server.listen(0, '127.0.0.1', () => {
server.off('error', reject);
const addr = server.address() as AddressInfo;
resolve({
url: `http://127.0.0.1:${addr.port}`,
close: () =>
new Promise((res, rej) =>
server.close((err) => (err ? rej(err) : res())),
),
});
});
});
}
+273
View File
@@ -0,0 +1,273 @@
/**
* Unit tests for src/live/auth.ts — Cookie auth handshake.
*
* All Directus HTTP calls are intercepted by mocking globalThis.fetch.
* No real network calls.
*
* Covers:
* - 200 + valid user payload → returns parsed AuthenticatedUser.
* - 401 → returns null and increments `unauthorized` counter.
* - 403 → returns null and increments `unauthorized` counter.
* - Non-2xx (500) → returns null and increments `error` counter.
* - Network error (fetch throws) → returns null and increments `error` counter.
* - AbortError (timeout) → returns null and increments `error` counter.
* - 200 but missing `data` field → returns null and increments `error` counter.
* - 200 with `data: null` (expired session) → returns null and increments `unauthorized`.
* - 200 but user object missing `id` → returns null and increments `error`.
* - Empty cookie header → returns null immediately (no fetch call).
* - Auth latency histogram is observed on success.
*/
import { describe, it, expect, vi, beforeEach, afterEach } from 'vitest';
import type { Logger } from 'pino';
import type { Config } from '../src/config/load.js';
import type { Metrics } from '../src/core/types.js';
import { createAuthClient } from '../src/live/auth.js';
// ---------------------------------------------------------------------------
// Helpers
// ---------------------------------------------------------------------------
function makeSilentLogger(): Logger {
return {
debug: vi.fn(),
info: vi.fn(),
warn: vi.fn(),
error: vi.fn(),
fatal: vi.fn(),
trace: vi.fn(),
child: vi.fn().mockReturnThis(),
level: 'silent',
silent: vi.fn(),
} as unknown as Logger;
}
type TestMetrics = Metrics & {
readonly incCalls: Array<{ name: string; labels?: Record<string, string> }>;
readonly observeCalls: Array<{ name: string; value: number }>;
};
function makeMetrics(): TestMetrics {
const incCalls: Array<{ name: string; labels?: Record<string, string> }> = [];
const observeCalls: Array<{ name: string; value: number }> = [];
return {
incCalls,
observeCalls,
inc(name, labels) { incCalls.push({ name, labels }); },
observe(name, value) { observeCalls.push({ name, value }); },
};
}
function makeConfig(overrides: Partial<Config> = {}): Config {
return {
NODE_ENV: 'test',
INSTANCE_ID: 'test-1',
LOG_LEVEL: 'silent',
REDIS_URL: 'redis://localhost:6379',
POSTGRES_URL: 'postgres://localhost:5432/test',
REDIS_TELEMETRY_STREAM: 'telemetry:t',
REDIS_CONSUMER_GROUP: 'processor',
REDIS_CONSUMER_NAME: 'test-consumer',
METRICS_PORT: 0,
BATCH_SIZE: 100,
BATCH_BLOCK_MS: 500,
WRITE_BATCH_SIZE: 50,
DEVICE_STATE_LRU_CAP: 10_000,
LIVE_WS_PORT: 8081,
LIVE_WS_HOST: '0.0.0.0',
LIVE_WS_PING_INTERVAL_MS: 30_000,
LIVE_WS_DRAIN_TIMEOUT_MS: 5_000,
LIVE_WS_BACKPRESSURE_THRESHOLD_BYTES: 1_048_576,
DIRECTUS_BASE_URL: 'http://directus.test',
DIRECTUS_AUTH_TIMEOUT_MS: 5_000,
DIRECTUS_AUTHZ_TIMEOUT_MS: 5_000,
LIVE_BROADCAST_GROUP_PREFIX: 'live-broadcast',
LIVE_BROADCAST_BATCH_SIZE: 100,
LIVE_BROADCAST_BATCH_BLOCK_MS: 1_000,
LIVE_DEVICE_EVENT_REFRESH_MS: 30_000,
...overrides,
};
}
const VALID_USER = {
id: 'ada60b3d-b29f-4017-b702-cd6b700f9f6c',
email: 'driver@example.com',
role: 'f6114c7e-1e94-488a-93c3-41060fcb06bc',
first_name: 'Test',
last_name: 'User',
};
function makeOkFetch(data: unknown): typeof fetch {
return vi.fn().mockResolvedValue({
status: 200,
ok: true,
json: () => Promise.resolve({ data }),
} as unknown as Response);
}
function makeStatusFetch(status: number): typeof fetch {
return vi.fn().mockResolvedValue({
status,
ok: status >= 200 && status < 300,
json: () => Promise.resolve({}),
} as unknown as Response);
}
// ---------------------------------------------------------------------------
// Tests
// ---------------------------------------------------------------------------
describe('createAuthClient.validate', () => {
let metrics: TestMetrics;
let logger: Logger;
let originalFetch: typeof globalThis.fetch;
beforeEach(() => {
metrics = makeMetrics();
logger = makeSilentLogger();
originalFetch = globalThis.fetch;
});
afterEach(() => {
globalThis.fetch = originalFetch;
vi.restoreAllMocks();
});
it('returns the parsed user when Directus returns 200 with a valid user payload', async () => {
globalThis.fetch = makeOkFetch(VALID_USER);
const client = createAuthClient(makeConfig(), logger, metrics);
const user = await client.validate('session=abc123');
expect(user).not.toBeNull();
expect(user!.id).toBe(VALID_USER.id);
expect(user!.email).toBe(VALID_USER.email);
const successCalls = metrics.incCalls.filter(
(c) => c.name === 'processor_live_auth_attempts_total' && c.labels?.['result'] === 'success',
);
expect(successCalls).toHaveLength(1);
});
it('returns null and increments unauthorized counter on 401', async () => {
globalThis.fetch = makeStatusFetch(401);
const client = createAuthClient(makeConfig(), logger, metrics);
const user = await client.validate('session=bad');
expect(user).toBeNull();
const unauthorizedCalls = metrics.incCalls.filter(
(c) => c.name === 'processor_live_auth_attempts_total' && c.labels?.['result'] === 'unauthorized',
);
expect(unauthorizedCalls).toHaveLength(1);
});
it('returns null and increments unauthorized counter on 403', async () => {
globalThis.fetch = makeStatusFetch(403);
const client = createAuthClient(makeConfig(), logger, metrics);
const user = await client.validate('session=forbidden');
expect(user).toBeNull();
const unauthorizedCalls = metrics.incCalls.filter(
(c) => c.name === 'processor_live_auth_attempts_total' && c.labels?.['result'] === 'unauthorized',
);
expect(unauthorizedCalls).toHaveLength(1);
});
it('returns null and increments error counter on 500', async () => {
globalThis.fetch = makeStatusFetch(500);
const client = createAuthClient(makeConfig(), logger, metrics);
const user = await client.validate('session=boom');
expect(user).toBeNull();
const errorCalls = metrics.incCalls.filter(
(c) => c.name === 'processor_live_auth_attempts_total' && c.labels?.['result'] === 'error',
);
expect(errorCalls).toHaveLength(1);
});
it('returns null and increments error counter when fetch throws a network error', async () => {
globalThis.fetch = vi.fn().mockRejectedValue(new Error('ECONNREFUSED'));
const client = createAuthClient(makeConfig(), logger, metrics);
const user = await client.validate('session=abc');
expect(user).toBeNull();
const errorCalls = metrics.incCalls.filter(
(c) => c.name === 'processor_live_auth_attempts_total' && c.labels?.['result'] === 'error',
);
expect(errorCalls).toHaveLength(1);
});
it('returns null when fetch is aborted (simulated timeout)', async () => {
const abortErr = new DOMException('The operation was aborted', 'AbortError');
globalThis.fetch = vi.fn().mockRejectedValue(abortErr);
const client = createAuthClient(makeConfig({ DIRECTUS_AUTH_TIMEOUT_MS: 50 }), logger, metrics);
const user = await client.validate('session=slow');
expect(user).toBeNull();
const errorCalls = metrics.incCalls.filter(
(c) => c.name === 'processor_live_auth_attempts_total' && c.labels?.['result'] === 'error',
);
expect(errorCalls).toHaveLength(1);
});
it('returns null and increments error counter when response body is missing data field', async () => {
globalThis.fetch = vi.fn().mockResolvedValue({
status: 200,
ok: true,
json: () => Promise.resolve({}), // no `data` key at all
} as unknown as Response);
const client = createAuthClient(makeConfig(), logger, metrics);
const user = await client.validate('session=weird');
expect(user).toBeNull();
const errorCalls = metrics.incCalls.filter(
(c) => c.name === 'processor_live_auth_attempts_total' && c.labels?.['result'] === 'error',
);
expect(errorCalls).toHaveLength(1);
});
it('returns null and increments unauthorized counter when data is null (expired session)', async () => {
globalThis.fetch = makeOkFetch(null);
const client = createAuthClient(makeConfig(), logger, metrics);
const user = await client.validate('session=expired');
expect(user).toBeNull();
const unauthorizedCalls = metrics.incCalls.filter(
(c) => c.name === 'processor_live_auth_attempts_total' && c.labels?.['result'] === 'unauthorized',
);
expect(unauthorizedCalls).toHaveLength(1);
});
it('returns null and increments error counter when user object is missing id', async () => {
globalThis.fetch = makeOkFetch({ email: 'noId@example.com', role: null });
const client = createAuthClient(makeConfig(), logger, metrics);
const user = await client.validate('session=noid');
expect(user).toBeNull();
const errorCalls = metrics.incCalls.filter(
(c) => c.name === 'processor_live_auth_attempts_total' && c.labels?.['result'] === 'error',
);
expect(errorCalls).toHaveLength(1);
});
it('returns null immediately for an empty cookie header without making a fetch call', async () => {
const mockFetch = vi.fn();
globalThis.fetch = mockFetch;
const client = createAuthClient(makeConfig(), logger, metrics);
const user = await client.validate('');
expect(user).toBeNull();
expect(mockFetch).not.toHaveBeenCalled();
});
it('observes auth latency on a successful call', async () => {
globalThis.fetch = makeOkFetch(VALID_USER);
const client = createAuthClient(makeConfig(), logger, metrics);
await client.validate('session=ok');
const latencyCalls = metrics.observeCalls.filter(
(c) => c.name === 'processor_live_auth_latency_ms',
);
expect(latencyCalls.length).toBeGreaterThanOrEqual(1);
expect(latencyCalls[0]!.value).toBeGreaterThanOrEqual(0);
});
});
+148
@@ -0,0 +1,148 @@
/**
* Unit tests for src/live/authz.ts — per-event authorization.
*
* Covers:
* - canAccessEvent returns { allowed: true } when /items/events/:id returns 200.
* - Returns { allowed: false, reason: 'forbidden' } on 403.
* - Returns { allowed: false, reason: 'not-found' } on 404.
* - Returns { allowed: false, reason: 'error' } on network failure (never throws).
* - Returns { allowed: false, reason: 'error' } on 500.
* - Authz latency histogram is observed on every call.
*/
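// Note: the assertions below assume canAccessEvent resolves to a discriminated
// union shaped roughly like
//   { allowed: true } | { allowed: false; reason: 'forbidden' | 'not-found' | 'error' }
// (inferred from the expectations in this file; the authoritative type lives in
// src/live/authz.ts).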
import { describe, it, expect, vi, beforeEach, afterEach } from 'vitest';
import type { Logger } from 'pino';
import type { Config } from '../src/config/load.js';
import type { Metrics } from '../src/core/types.js';
import { createAuthzClient } from '../src/live/authz.js';
// ---------------------------------------------------------------------------
// Helpers
// ---------------------------------------------------------------------------
function makeSilentLogger(): Logger {
return {
debug: vi.fn(), info: vi.fn(), warn: vi.fn(), error: vi.fn(),
fatal: vi.fn(), trace: vi.fn(), child: vi.fn().mockReturnThis(),
level: 'silent', silent: vi.fn(),
} as unknown as Logger;
}
type TestMetrics = Metrics & {
readonly observeCalls: Array<{ name: string; value: number }>;
};
function makeMetrics(): TestMetrics {
const observeCalls: Array<{ name: string; value: number }> = [];
return {
observeCalls,
inc: vi.fn(),
observe(name, value) { observeCalls.push({ name, value }); },
};
}
function makeConfig(): Config {
return {
NODE_ENV: 'test',
INSTANCE_ID: 'test-1',
LOG_LEVEL: 'silent',
REDIS_URL: 'redis://localhost:6379',
POSTGRES_URL: 'postgres://localhost:5432/test',
REDIS_TELEMETRY_STREAM: 'telemetry:t',
REDIS_CONSUMER_GROUP: 'processor',
REDIS_CONSUMER_NAME: 'test-consumer',
METRICS_PORT: 0,
BATCH_SIZE: 100,
BATCH_BLOCK_MS: 500,
WRITE_BATCH_SIZE: 50,
DEVICE_STATE_LRU_CAP: 10_000,
LIVE_WS_PORT: 8081,
LIVE_WS_HOST: '0.0.0.0',
LIVE_WS_PING_INTERVAL_MS: 30_000,
LIVE_WS_DRAIN_TIMEOUT_MS: 5_000,
LIVE_WS_BACKPRESSURE_THRESHOLD_BYTES: 1_048_576,
DIRECTUS_BASE_URL: 'http://directus.test',
DIRECTUS_AUTH_TIMEOUT_MS: 5_000,
DIRECTUS_AUTHZ_TIMEOUT_MS: 5_000,
LIVE_BROADCAST_GROUP_PREFIX: 'live-broadcast',
LIVE_BROADCAST_BATCH_SIZE: 100,
LIVE_BROADCAST_BATCH_BLOCK_MS: 1_000,
LIVE_DEVICE_EVENT_REFRESH_MS: 30_000,
};
}
const EVENT_ID = 'ada60b3d-b29f-4017-b702-cd6b700f9f6c';
function makeStatusFetch(status: number): typeof fetch {
return vi.fn().mockResolvedValue({
status,
ok: status >= 200 && status < 300,
json: () => Promise.resolve({ data: { id: EVENT_ID } }),
} as unknown as Response);
}
// ---------------------------------------------------------------------------
// Tests
// ---------------------------------------------------------------------------
describe('createAuthzClient.canAccessEvent', () => {
let originalFetch: typeof globalThis.fetch;
beforeEach(() => { originalFetch = globalThis.fetch; });
afterEach(() => {
globalThis.fetch = originalFetch;
vi.restoreAllMocks();
});
it('returns { allowed: true } when Directus returns 200', async () => {
globalThis.fetch = makeStatusFetch(200);
const client = createAuthzClient(makeConfig(), makeSilentLogger(), makeMetrics());
const result = await client.canAccessEvent('cookie=abc', EVENT_ID);
expect(result.allowed).toBe(true);
});
it('returns { allowed: false, reason: "forbidden" } on 403', async () => {
globalThis.fetch = makeStatusFetch(403);
const client = createAuthzClient(makeConfig(), makeSilentLogger(), makeMetrics());
const result = await client.canAccessEvent('cookie=abc', EVENT_ID);
expect(result.allowed).toBe(false);
if (!result.allowed) expect(result.reason).toBe('forbidden');
});
it('returns { allowed: false, reason: "not-found" } on 404', async () => {
globalThis.fetch = makeStatusFetch(404);
const client = createAuthzClient(makeConfig(), makeSilentLogger(), makeMetrics());
const result = await client.canAccessEvent('cookie=abc', EVENT_ID);
expect(result.allowed).toBe(false);
if (!result.allowed) expect(result.reason).toBe('not-found');
});
it('returns { allowed: false, reason: "error" } on 500', async () => {
globalThis.fetch = makeStatusFetch(500);
const client = createAuthzClient(makeConfig(), makeSilentLogger(), makeMetrics());
const result = await client.canAccessEvent('cookie=abc', EVENT_ID);
expect(result.allowed).toBe(false);
if (!result.allowed) expect(result.reason).toBe('error');
});
it('returns { allowed: false, reason: "error" } when fetch throws (never throws itself)', async () => {
globalThis.fetch = vi.fn().mockRejectedValue(new Error('ECONNREFUSED'));
const client = createAuthzClient(makeConfig(), makeSilentLogger(), makeMetrics());
const result = await client.canAccessEvent('cookie=abc', EVENT_ID);
expect(result.allowed).toBe(false);
if (!result.allowed) expect(result.reason).toBe('error');
});
it('observes authz latency on every call', async () => {
globalThis.fetch = makeStatusFetch(200);
const metrics = makeMetrics();
const client = createAuthzClient(makeConfig(), makeSilentLogger(), metrics);
await client.canAccessEvent('cookie=abc', EVENT_ID);
const latencyCalls = metrics.observeCalls.filter(
(c) => c.name === 'processor_live_authz_latency_ms',
);
expect(latencyCalls.length).toBeGreaterThanOrEqual(1);
expect(latencyCalls[0]!.value).toBeGreaterThanOrEqual(0);
});
});
+385
@@ -0,0 +1,385 @@
/**
* Unit tests for src/live/broadcast.ts — broadcast consumer fan-out logic.
*
 * Strategy: exercise fanOut in isolation by driving a single-iteration loop.
 * We stub XREADGROUP to return one batch of entries, then call `stop()` once
 * the batch has been acked (see runOneBatch below). The Redis `xgroup` CREATE
 * call returns BUSYGROUP (group already exists) so `ensureGroup` succeeds
 * without a real server.
*
* `sendOutbound` is called with real LiveConnection stubs that have a mock
* `ws.send`. This tests the full fanOut → sendOutbound → ws.send path without
* any module mocking.
*
* Covers (spec: task 1.5.4):
* 1. Single subscriber on an event receives a correctly-shaped position message.
* 2. Multiple subscribers on the same event each receive the message.
* 3. Orphan device (not in any event) increments orphan counter, sends nothing.
* 4. Device registered to multiple events emits one message per event topic.
*/
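// Outbound frame shape asserted below (inferred from the expectations in this
// file, not copied verbatim from src/live/protocol.ts):
//   { type: 'position', topic: 'event:<uuid>', deviceId, lat, lon, ts, speed?, course? }
// where speed and course are present only when non-zero.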
import { describe, it, expect, vi, beforeEach } from 'vitest';
import type { Logger } from 'pino';
import type { Config } from '../src/config/load.js';
import type { Metrics } from '../src/shared/types.js';
import type { SubscriptionRegistry } from '../src/live/registry.js';
import type { DeviceEventMap } from '../src/live/device-event-map.js';
import type { LiveConnection } from '../src/live/server.js';
import { createBroadcastConsumer } from '../src/live/broadcast.js';
import WebSocket from 'ws';
import type { Redis } from 'ioredis';
// ---------------------------------------------------------------------------
// Helpers
// ---------------------------------------------------------------------------
function makeSilentLogger(): Logger {
return {
debug: vi.fn(),
info: vi.fn(),
warn: vi.fn(),
error: vi.fn(),
fatal: vi.fn(),
trace: vi.fn(),
child: vi.fn().mockReturnThis(),
level: 'silent',
silent: vi.fn(),
} as unknown as Logger;
}
type RecordedMetrics = Metrics & {
incCalls: Array<{ name: string; labels?: Record<string, string>; value?: number }>;
observeCalls: Array<{ name: string; value: number }>;
};
function makeMetrics(): RecordedMetrics {
const incCalls: Array<{ name: string; labels?: Record<string, string>; value?: number }> = [];
const observeCalls: Array<{ name: string; value: number }> = [];
return {
incCalls,
observeCalls,
inc(name, labels?, value?) { incCalls.push({ name, labels, value }); },
observe(name, value) { observeCalls.push({ name, value }); },
};
}
function makeConfig(): Config {
return {
NODE_ENV: 'test',
INSTANCE_ID: 'test-instance',
LOG_LEVEL: 'silent',
REDIS_URL: 'redis://localhost:6379',
POSTGRES_URL: 'postgres://localhost:5432/test',
REDIS_TELEMETRY_STREAM: 'telemetry:teltonika',
REDIS_CONSUMER_GROUP: 'processor',
REDIS_CONSUMER_NAME: 'test-consumer',
METRICS_PORT: 0,
BATCH_SIZE: 100,
BATCH_BLOCK_MS: 500,
WRITE_BATCH_SIZE: 50,
DEVICE_STATE_LRU_CAP: 10_000,
LIVE_WS_PORT: 8081,
LIVE_WS_HOST: '0.0.0.0',
LIVE_WS_PING_INTERVAL_MS: 30_000,
LIVE_WS_DRAIN_TIMEOUT_MS: 5_000,
LIVE_WS_BACKPRESSURE_THRESHOLD_BYTES: 1_048_576,
DIRECTUS_BASE_URL: 'http://directus.test',
DIRECTUS_AUTH_TIMEOUT_MS: 5_000,
DIRECTUS_AUTHZ_TIMEOUT_MS: 5_000,
LIVE_BROADCAST_GROUP_PREFIX: 'live-broadcast',
LIVE_BROADCAST_BATCH_SIZE: 100,
LIVE_BROADCAST_BATCH_BLOCK_MS: 1_000,
LIVE_DEVICE_EVENT_REFRESH_MS: 30_000,
};
}
/**
* Builds a synthetic LiveConnection stub whose `ws.send` captures JSON-parsed
* outbound messages. `bufferedAmount` is 0 so sendOutbound never closes it.
*/
function makeConn(id = 'conn-1'): LiveConnection & { sentMessages: unknown[] } {
const sentMessages: unknown[] = [];
const ws = {
readyState: WebSocket.OPEN,
bufferedAmount: 0,
send: vi.fn((data: string) => { sentMessages.push(JSON.parse(data)); }),
close: vi.fn(),
} as unknown as WebSocket;
return {
id,
ws,
remoteAddr: '127.0.0.1',
openedAt: new Date(),
lastSeenAt: new Date(),
user: {
id: 'user-1',
email: 'test@test.com',
role: null,
first_name: 'T',
last_name: 'U',
},
cookieHeader: 'session=x',
sentMessages,
};
}
/** Serialises a Position into the flat wire payload that broadcast.ts expects. */
function makePositionPayload(overrides: Partial<{
device_id: string;
timestamp: string;
speed: number;
angle: number;
}> = {}): string {
return JSON.stringify({
device_id: overrides.device_id ?? 'IMEI123',
timestamp: overrides.timestamp ?? new Date('2025-01-01T12:00:00.000Z').toISOString(),
latitude: 41.33165,
longitude: 19.83177,
altitude: 50,
angle: overrides.angle ?? 0,
speed: overrides.speed ?? 0,
satellites: 8,
priority: 0,
attributes: {},
});
}
/**
* Builds a fake XREADGROUP result for a single stream entry.
* ioredis returns: `[[streamName, [[id, fieldValueArray]]]]`
*/
function makeXreadgroupResult(
stream: string,
id: string,
payload: string,
): [string, [string, string[]][]][] {
return [[stream, [[id, ['payload', payload]]]]];
}
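// Illustrative example (argument values hypothetical):
//   makeXreadgroupResult('telemetry:teltonika', '1-0', '{"device_id":"IMEI123"}')
//   → [['telemetry:teltonika', [['1-0', ['payload', '{"device_id":"IMEI123"}']]]]]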
/**
* Creates a Redis stub that:
* - `xgroup` returns BUSYGROUP error (group already exists — happy path).
* - `xreadgroup` returns the provided result on the first call, then blocks
* for up to 2s on subsequent calls (simulating real BLOCK behaviour).
* Blocking is implemented by waiting for `stopSignal` to resolve, capped
* at 2000ms so tests cannot hang indefinitely.
* - `xack` resolves immediately and triggers the stopSignal promise.
*/
function makeRedis(
firstXreadgroupResult: [string, [string, string[]][]][] | null,
): Redis & { stopSignal: Promise<void>; triggerStop: () => void } {
let xreadgroupCallCount = 0;
let triggerStop!: () => void;
const stopSignal = new Promise<void>((resolve) => { triggerStop = resolve; });
const redis: Redis & { stopSignal: Promise<void>; triggerStop: () => void } = {
xgroup: vi.fn().mockRejectedValue(Object.assign(new Error('BUSYGROUP group already exists'), {})),
xreadgroup: vi.fn((..._args: unknown[]) => {
xreadgroupCallCount += 1;
if (xreadgroupCallCount === 1) {
return Promise.resolve(firstXreadgroupResult);
}
// Block until stop() is called (or 2s timeout as safety valve).
return Promise.race([
stopSignal.then(() => null as null),
new Promise<null>((resolve) => setTimeout(() => resolve(null), 2_000)),
]);
}),
xack: vi.fn().mockImplementation(() => {
// Signal that the batch has been processed — stop() can now be called.
triggerStop();
return Promise.resolve(1);
}),
status: 'ready',
stopSignal,
triggerStop,
} as unknown as Redis & { stopSignal: Promise<void>; triggerStop: () => void };
return redis;
}
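// Per-test call sequence driven by runOneBatch below:
//   xgroup (rejects with BUSYGROUP) → xreadgroup #1 (returns the seeded batch)
//   → fan-out → xack (fires stopSignal) → xreadgroup #2 (unblocks with null) → stop()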
/** Creates a SubscriptionRegistry stub that maps topic → connections. */
function makeRegistry(
topicToConns: Map<string, LiveConnection[]>,
): SubscriptionRegistry {
return {
connectionsForTopic: vi.fn((topic: string) => topicToConns.get(topic) ?? []),
subscribe: vi.fn(),
unsubscribe: vi.fn(),
onConnectionClose: vi.fn(),
topicsForConnection: vi.fn().mockReturnValue([]),
stats: vi.fn().mockReturnValue({ connections: 0, subscriptions: 0, topics: 0 }),
};
}
/** Creates a DeviceEventMap stub. */
function makeDeviceEventMap(deviceToEvents: Map<string, string[]>): DeviceEventMap {
return {
lookup: vi.fn((deviceId: string) => deviceToEvents.get(deviceId) ?? []),
start: vi.fn().mockResolvedValue(undefined),
stop: vi.fn(),
};
}
/**
* Runs the broadcast consumer for one batch: starts it, waits until xack has
* been called (the batch was fully processed), then stops it.
*
* The Redis stub's xreadgroup blocks on the second call until xack fires
* (or 2s timeout), so `stop()` always finds the loop idle before terminating.
*/
async function runOneBatch(
redis: ReturnType<typeof makeRedis>,
registry: SubscriptionRegistry,
deviceEventMap: DeviceEventMap,
config: Config,
logger: Logger,
metrics: Metrics,
): Promise<void> {
const consumer = createBroadcastConsumer(redis, registry, deviceEventMap, config, logger, metrics);
await consumer.start();
// Wait until the xack mock fires (which also triggers stopSignal, causing the
// second xreadgroup call to unblock and return null). Give up after 3s to
// avoid hanging if the batch was empty / all entries were skipped.
await Promise.race([
redis.stopSignal,
new Promise<void>((resolve) => setTimeout(resolve, 3_000)),
]);
await consumer.stop();
}
// ---------------------------------------------------------------------------
// Tests
// ---------------------------------------------------------------------------
describe('createBroadcastConsumer', () => {
let config: Config;
let logger: Logger;
let metrics: RecordedMetrics;
const STREAM = 'telemetry:teltonika';
const EVENT_A = 'aaa00000-0000-0000-0000-000000000001';
const EVENT_B = 'bbb00000-0000-0000-0000-000000000002';
const DEVICE_ID = 'IMEI999888777';
beforeEach(() => {
config = makeConfig();
logger = makeSilentLogger();
metrics = makeMetrics();
});
it('sends a correctly-shaped position message to a single subscriber', async () => {
const conn = makeConn('c1');
const topicToConns = new Map([[`event:${EVENT_A}`, [conn]]]);
const deviceToEvents = new Map([[DEVICE_ID, [EVENT_A]]]);
const payload = makePositionPayload({ device_id: DEVICE_ID, speed: 42, angle: 180 });
const redis = makeRedis(makeXreadgroupResult(STREAM, '1-0', payload));
const registry = makeRegistry(topicToConns);
const deviceEventMap = makeDeviceEventMap(deviceToEvents);
await runOneBatch(redis, registry, deviceEventMap, config, logger, metrics);
expect(conn.sentMessages).toHaveLength(1);
const msg = conn.sentMessages[0] as Record<string, unknown>;
expect(msg['type']).toBe('position');
expect(msg['topic']).toBe(`event:${EVENT_A}`);
expect(msg['deviceId']).toBe(DEVICE_ID);
expect(typeof msg['lat']).toBe('number');
expect(typeof msg['lon']).toBe('number');
expect(typeof msg['ts']).toBe('number');
// speed and course are included when non-zero
expect(msg['speed']).toBe(42);
expect(msg['course']).toBe(180);
});
it('sends to all subscribers on the same event', async () => {
const conn1 = makeConn('c1');
const conn2 = makeConn('c2');
const conn3 = makeConn('c3');
const topicToConns = new Map([[`event:${EVENT_A}`, [conn1, conn2, conn3]]]);
const deviceToEvents = new Map([[DEVICE_ID, [EVENT_A]]]);
const payload = makePositionPayload({ device_id: DEVICE_ID });
const redis = makeRedis(makeXreadgroupResult(STREAM, '1-0', payload));
const registry = makeRegistry(topicToConns);
const deviceEventMap = makeDeviceEventMap(deviceToEvents);
await runOneBatch(redis, registry, deviceEventMap, config, logger, metrics);
expect(conn1.sentMessages).toHaveLength(1);
expect(conn2.sentMessages).toHaveLength(1);
expect(conn3.sentMessages).toHaveLength(1);
// All received the same topic
for (const conn of [conn1, conn2, conn3]) {
expect((conn.sentMessages[0] as Record<string, unknown>)['topic']).toBe(`event:${EVENT_A}`);
}
});
it('increments orphan counter and sends nothing for an unregistered device', async () => {
const conn = makeConn('c1');
// Device has no events registered
const deviceToEvents = new Map<string, string[]>();
const topicToConns = new Map([[`event:${EVENT_A}`, [conn]]]);
const payload = makePositionPayload({ device_id: DEVICE_ID });
const redis = makeRedis(makeXreadgroupResult(STREAM, '1-0', payload));
const registry = makeRegistry(topicToConns);
const deviceEventMap = makeDeviceEventMap(deviceToEvents);
await runOneBatch(redis, registry, deviceEventMap, config, logger, metrics);
expect(conn.sentMessages).toHaveLength(0);
const orphanInc = metrics.incCalls.find(
(c) => c.name === 'processor_live_broadcast_orphan_records_total',
);
expect(orphanInc).toBeDefined();
});
it('emits one message per topic for a device registered to multiple events', async () => {
// conn1 subscribes to EVENT_A only, conn2 to EVENT_B only,
// conn3 subscribes to both. The device is registered to both events.
const conn1 = makeConn('c1');
const conn2 = makeConn('c2');
const conn3a = makeConn('c3a'); // conn3's subscription to EVENT_A
const conn3b = makeConn('c3b'); // conn3's subscription to EVENT_B (separate entry)
const topicToConns = new Map([
[`event:${EVENT_A}`, [conn1, conn3a]],
[`event:${EVENT_B}`, [conn2, conn3b]],
]);
const deviceToEvents = new Map([[DEVICE_ID, [EVENT_A, EVENT_B]]]);
const payload = makePositionPayload({ device_id: DEVICE_ID });
const redis = makeRedis(makeXreadgroupResult(STREAM, '1-0', payload));
const registry = makeRegistry(topicToConns);
const deviceEventMap = makeDeviceEventMap(deviceToEvents);
await runOneBatch(redis, registry, deviceEventMap, config, logger, metrics);
// conn1 is in EVENT_A only → 1 message with topic event:EVENT_A
expect(conn1.sentMessages).toHaveLength(1);
expect((conn1.sentMessages[0] as Record<string, unknown>)['topic']).toBe(`event:${EVENT_A}`);
// conn2 is in EVENT_B only → 1 message with topic event:EVENT_B
expect(conn2.sentMessages).toHaveLength(1);
expect((conn2.sentMessages[0] as Record<string, unknown>)['topic']).toBe(`event:${EVENT_B}`);
// conn3a is the EVENT_A entry for conn3 → 1 message
expect(conn3a.sentMessages).toHaveLength(1);
expect((conn3a.sentMessages[0] as Record<string, unknown>)['topic']).toBe(`event:${EVENT_A}`);
// conn3b is the EVENT_B entry for conn3 → 1 message
expect(conn3b.sentMessages).toHaveLength(1);
expect((conn3b.sentMessages[0] as Record<string, unknown>)['topic']).toBe(`event:${EVENT_B}`);
// Fanout counter: EVENT_A has 2 conns, EVENT_B has 2 conns → total 4 increments
const fanoutIncs = metrics.incCalls.filter(
(c) => c.name === 'processor_live_broadcast_fanout_messages_total',
);
expect(fanoutIncs).toHaveLength(4);
});
});
+345
@@ -0,0 +1,345 @@
/**
* Unit tests for src/live/registry.ts — subscription registry.
*
* The registry is instantiated with a mocked authz client and a mocked
* sendOutbound path. LiveConnection objects are synthetic stubs.
*
* Covers:
* - Subscribe to event:<uuid> with permitted user → `subscribed` reply,
* registry counts go up.
* - Subscribe with forbidden user → `error/forbidden` reply, no registry change.
* - Subscribe to `device:<imei>` → `error/unknown-topic`, no registry change.
* - Subscribe twice to the same topic → idempotent (single subscription,
* subscribed reply each call, gauge does not double-count).
* - Unsubscribe from a topic → `unsubscribed` reply, gauge decrements.
* - Unsubscribe from a topic not subscribed → `unsubscribed` reply (idempotent),
* gauge unchanged.
* - Connection close removes all subscriptions; gauge returns to pre-connection level.
* - connectionsForTopic returns the correct set.
* - topicsForConnection returns the correct set.
*/
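// Reply frames asserted below (shapes inferred from the expectations in this file):
//   subscribe ok   → { type: 'subscribed', topic, id?, snapshot: [...] }
//   forbidden      → { type: 'error', code: 'forbidden', topic, id? }
//   unknown topic  → { type: 'error', code: 'unknown-topic', ... }
//   unsubscribe    → { type: 'unsubscribed', topic, id? }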
import { describe, it, expect, vi, beforeEach } from 'vitest';
import type { Logger } from 'pino';
import type { Config } from '../src/config/load.js';
import type { Metrics } from '../src/core/types.js';
import { createSubscriptionRegistry } from '../src/live/registry.js';
import type { AuthzClient } from '../src/live/authz.js';
import type { LiveConnection } from '../src/live/server.js';
import WebSocket from 'ws';
// ---------------------------------------------------------------------------
// Helpers
// ---------------------------------------------------------------------------
function makeSilentLogger(): Logger {
return {
debug: vi.fn(), info: vi.fn(), warn: vi.fn(), error: vi.fn(),
fatal: vi.fn(), trace: vi.fn(), child: vi.fn().mockReturnThis(),
level: 'silent', silent: vi.fn(),
} as unknown as Logger;
}
type TestMetrics = Metrics & {
readonly incCalls: Array<{ name: string; labels?: Record<string, string> }>;
readonly observeCalls: Array<{ name: string; value: number }>;
};
function makeMetrics(): TestMetrics {
const incCalls: Array<{ name: string; labels?: Record<string, string> }> = [];
const observeCalls: Array<{ name: string; value: number }> = [];
return {
incCalls,
observeCalls,
inc(name, labels) { incCalls.push({ name, labels }); },
observe(name, value) { observeCalls.push({ name, value }); },
};
}
function makeConfig(): Config {
return {
NODE_ENV: 'test',
INSTANCE_ID: 'test-1',
LOG_LEVEL: 'silent',
REDIS_URL: 'redis://localhost:6379',
POSTGRES_URL: 'postgres://localhost:5432/test',
REDIS_TELEMETRY_STREAM: 'telemetry:t',
REDIS_CONSUMER_GROUP: 'processor',
REDIS_CONSUMER_NAME: 'test-consumer',
METRICS_PORT: 0,
BATCH_SIZE: 100,
BATCH_BLOCK_MS: 500,
WRITE_BATCH_SIZE: 50,
DEVICE_STATE_LRU_CAP: 10_000,
LIVE_WS_PORT: 8081,
LIVE_WS_HOST: '0.0.0.0',
LIVE_WS_PING_INTERVAL_MS: 30_000,
LIVE_WS_DRAIN_TIMEOUT_MS: 5_000,
LIVE_WS_BACKPRESSURE_THRESHOLD_BYTES: 1_048_576,
DIRECTUS_BASE_URL: 'http://directus.test',
DIRECTUS_AUTH_TIMEOUT_MS: 5_000,
DIRECTUS_AUTHZ_TIMEOUT_MS: 5_000,
LIVE_BROADCAST_GROUP_PREFIX: 'live-broadcast',
LIVE_BROADCAST_BATCH_SIZE: 100,
LIVE_BROADCAST_BATCH_BLOCK_MS: 1_000,
LIVE_DEVICE_EVENT_REFRESH_MS: 30_000,
};
}
const EVENT_ID = 'ada60b3d-b29f-4017-b702-cd6b700f9f6c';
const EVENT_TOPIC = `event:${EVENT_ID}`;
/**
* Creates a synthetic LiveConnection stub that captures sent messages.
*/
function makeConn(id = 'conn-1'): LiveConnection & { sentMessages: unknown[] } {
const sentMessages: unknown[] = [];
const ws = {
readyState: WebSocket.OPEN,
bufferedAmount: 0,
send: vi.fn((data: string) => { sentMessages.push(JSON.parse(data)); }),
close: vi.fn(),
} as unknown as WebSocket;
return {
id,
ws,
remoteAddr: '127.0.0.1',
openedAt: new Date(),
lastSeenAt: new Date(),
user: {
id: 'user-ada60b3d',
email: 'test@example.com',
role: null,
first_name: 'Test',
last_name: 'User',
},
cookieHeader: 'session=valid',
sentMessages,
};
}
function makeAllowedAuthzClient(): AuthzClient {
return {
canAccessEvent: vi.fn().mockResolvedValue({ allowed: true }),
};
}
function makeForbiddenAuthzClient(): AuthzClient {
return {
canAccessEvent: vi.fn().mockResolvedValue({ allowed: false, reason: 'forbidden' }),
};
}
// ---------------------------------------------------------------------------
// Tests
// ---------------------------------------------------------------------------
describe('createSubscriptionRegistry', () => {
let metrics: TestMetrics;
beforeEach(() => {
metrics = makeMetrics();
});
it('subscribe to a valid event topic with permitted user → subscribed reply and gauge increment', async () => {
const conn = makeConn();
const registry = createSubscriptionRegistry(
makeAllowedAuthzClient(), makeConfig(), makeSilentLogger(), metrics,
);
await registry.subscribe(conn, EVENT_TOPIC, 'corr-1');
// Should have sent a `subscribed` message.
expect(conn.sentMessages).toHaveLength(1);
const msg = conn.sentMessages[0] as Record<string, unknown>;
expect(msg['type']).toBe('subscribed');
expect(msg['topic']).toBe(EVENT_TOPIC);
expect(msg['id']).toBe('corr-1');
expect(Array.isArray(msg['snapshot'])).toBe(true);
// Gauge should have been updated.
const subGaugeCalls = metrics.observeCalls.filter(
(c) => c.name === 'processor_live_subscriptions',
);
expect(subGaugeCalls.length).toBeGreaterThanOrEqual(1);
expect(subGaugeCalls[subGaugeCalls.length - 1]!.value).toBe(1);
// Success counter.
const successCalls = metrics.incCalls.filter(
(c) => c.name === 'processor_live_subscribe_attempts_total' && c.labels?.['result'] === 'success',
);
expect(successCalls).toHaveLength(1);
});
it('subscribe with forbidden user → error/forbidden reply, no registry change', async () => {
const conn = makeConn();
const registry = createSubscriptionRegistry(
makeForbiddenAuthzClient(), makeConfig(), makeSilentLogger(), metrics,
);
await registry.subscribe(conn, EVENT_TOPIC, 'corr-2');
const msg = conn.sentMessages[0] as Record<string, unknown>;
expect(msg['type']).toBe('error');
expect(msg['code']).toBe('forbidden');
expect(msg['topic']).toBe(EVENT_TOPIC);
expect(msg['id']).toBe('corr-2');
// Gauge should NOT have changed.
const subGaugeCalls = metrics.observeCalls.filter(
(c) => c.name === 'processor_live_subscriptions',
);
// May have been called with 0 for snapshot, but never with a positive value.
const positiveGauge = subGaugeCalls.filter((c) => c.value > 0);
expect(positiveGauge).toHaveLength(0);
// connectionsForTopic should return empty.
expect([...registry.connectionsForTopic(EVENT_TOPIC)]).toHaveLength(0);
});
it('subscribe to device:<imei> → error/unknown-topic, no registry change', async () => {
const conn = makeConn();
const authz = makeAllowedAuthzClient();
const registry = createSubscriptionRegistry(
authz, makeConfig(), makeSilentLogger(), metrics,
);
await registry.subscribe(conn, 'device:356307042441013', 'corr-3');
const msg = conn.sentMessages[0] as Record<string, unknown>;
expect(msg['type']).toBe('error');
expect(msg['code']).toBe('unknown-topic');
// Authz client should NOT have been called.
expect(vi.mocked(authz.canAccessEvent)).not.toHaveBeenCalled();
});
it('subscribe twice to the same topic → idempotent (single subscription, subscribed each call)', async () => {
const conn = makeConn();
const registry = createSubscriptionRegistry(
makeAllowedAuthzClient(), makeConfig(), makeSilentLogger(), metrics,
);
await registry.subscribe(conn, EVENT_TOPIC);
await registry.subscribe(conn, EVENT_TOPIC); // second call
// Both calls send `subscribed`.
expect(conn.sentMessages).toHaveLength(2);
const msgs = conn.sentMessages as Array<Record<string, unknown>>;
expect(msgs[0]!['type']).toBe('subscribed');
expect(msgs[1]!['type']).toBe('subscribed');
// Gauge should only count once.
const finalGaugeCalls = metrics.observeCalls.filter(
(c) => c.name === 'processor_live_subscriptions',
);
// Last value should be 1, not 2.
expect(finalGaugeCalls[finalGaugeCalls.length - 1]!.value).toBe(1);
// connectionsForTopic should have exactly one connection.
expect([...registry.connectionsForTopic(EVENT_TOPIC)]).toHaveLength(1);
});
it('unsubscribe from a subscribed topic → unsubscribed reply and gauge decrement', async () => {
const conn = makeConn();
const registry = createSubscriptionRegistry(
makeAllowedAuthzClient(), makeConfig(), makeSilentLogger(), metrics,
);
await registry.subscribe(conn, EVENT_TOPIC);
registry.unsubscribe(conn, EVENT_TOPIC, 'corr-4');
const msgs = conn.sentMessages as Array<Record<string, unknown>>;
expect(msgs[1]!['type']).toBe('unsubscribed');
expect(msgs[1]!['topic']).toBe(EVENT_TOPIC);
expect(msgs[1]!['id']).toBe('corr-4');
// Gauge should be back at 0.
const finalGaugeCalls = metrics.observeCalls.filter(
(c) => c.name === 'processor_live_subscriptions',
);
expect(finalGaugeCalls[finalGaugeCalls.length - 1]!.value).toBe(0);
});
it('unsubscribe from a topic not subscribed to → unsubscribed reply (idempotent), gauge unchanged', async () => {
const conn = makeConn();
const registry = createSubscriptionRegistry(
makeAllowedAuthzClient(), makeConfig(), makeSilentLogger(), metrics,
);
// Unsubscribe without ever subscribing.
registry.unsubscribe(conn, EVENT_TOPIC, 'corr-5');
const msg = conn.sentMessages[0] as Record<string, unknown>;
expect(msg['type']).toBe('unsubscribed');
// Gauge should still be 0 (not go negative).
const finalGaugeCalls = metrics.observeCalls.filter(
(c) => c.name === 'processor_live_subscriptions',
);
const values = finalGaugeCalls.map((c) => c.value);
expect(values.every((v) => v >= 0)).toBe(true);
});
it('onConnectionClose removes all subscriptions; gauge returns to 0', async () => {
const conn = makeConn();
const registry = createSubscriptionRegistry(
makeAllowedAuthzClient(), makeConfig(), makeSilentLogger(), metrics,
);
await registry.subscribe(conn, EVENT_TOPIC);
await registry.subscribe(conn, `event:f6114c7e-1e94-488a-93c3-41060fcb06bc`);
registry.onConnectionClose(conn);
// Gauge should be 0.
const finalGaugeCalls = metrics.observeCalls.filter(
(c) => c.name === 'processor_live_subscriptions',
);
expect(finalGaugeCalls[finalGaugeCalls.length - 1]!.value).toBe(0);
// connectionsForTopic should be empty for both topics.
expect([...registry.connectionsForTopic(EVENT_TOPIC)]).toHaveLength(0);
});
it('connectionsForTopic returns only connections subscribed to that topic', async () => {
const conn1 = makeConn('conn-1');
const conn2 = makeConn('conn-2');
const otherTopic = 'event:f6114c7e-1e94-488a-93c3-41060fcb06bc';
const registry = createSubscriptionRegistry(
makeAllowedAuthzClient(), makeConfig(), makeSilentLogger(), metrics,
);
await registry.subscribe(conn1, EVENT_TOPIC);
await registry.subscribe(conn2, EVENT_TOPIC);
await registry.subscribe(conn1, otherTopic);
const connsForEvent = [...registry.connectionsForTopic(EVENT_TOPIC)];
expect(connsForEvent).toHaveLength(2);
expect(connsForEvent.map((c) => c.id).sort()).toEqual(['conn-1', 'conn-2'].sort());
const connsForOther = [...registry.connectionsForTopic(otherTopic)];
expect(connsForOther).toHaveLength(1);
expect(connsForOther[0]!.id).toBe('conn-1');
});
it('stats() returns correct counts', async () => {
const conn1 = makeConn('conn-1');
const conn2 = makeConn('conn-2');
const topic2 = 'event:f6114c7e-1e94-488a-93c3-41060fcb06bc';
const registry = createSubscriptionRegistry(
makeAllowedAuthzClient(), makeConfig(), makeSilentLogger(), metrics,
);
await registry.subscribe(conn1, EVENT_TOPIC);
await registry.subscribe(conn2, EVENT_TOPIC);
await registry.subscribe(conn1, topic2);
const s = registry.stats();
expect(s.subscriptions).toBe(3);
expect(s.topics).toBe(2);
});
});
+224
@@ -0,0 +1,224 @@
/**
* Unit tests for src/live/snapshot.ts — snapshot provider.
*
* All Postgres I/O is mocked. The pool.query mock captures SQL and params so
* tests can assert the query is parameterized correctly.
*
* Covers (spec: task 1.5.5):
* 1. Three devices in event, two have non-faulty positions — two entries returned.
* 2. Event with no entry_devices rows — pool returns empty rows — empty array.
* 3. Positions with faulty=true are excluded from results (WHERE faulty=false
* is in the SQL; mock only returns non-faulty rows, mimicking Postgres).
* 4. Returns most recent non-faulty position per device (DISTINCT ON semantics;
* mock returns single rows as Postgres DISTINCT ON would).
* 5. ts returned as Date is converted to epoch ms in the output.
* 6. speed > 0 → included; speed = 0 → omitted.
* 7. angle > 0 → included as course; angle = 0 → omitted.
* 8. Metrics are observed (latency and snapshot size).
*/
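// Rough shape of the query the provider is expected to issue (a sketch based on
// the assertions below and the test schema; the join path via entries/entry_devices
// is an assumption, not copied from src/live/snapshot.ts):
//   SELECT DISTINCT ON (p.device_id) p.device_id, p.latitude, p.longitude, p.ts, p.speed, p.angle
//   FROM positions p
//   JOIN entry_devices ed ON ed.device_id = p.device_id
//   JOIN entries e       ON e.id = ed.entry_id
//   WHERE e.event_id = $1 AND p.faulty = false
//   ORDER BY p.device_id, p.ts DESC;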
import { describe, it, expect, vi } from 'vitest';
import type { Logger } from 'pino';
import type { Pool } from 'pg';
import type { Metrics } from '../src/shared/types.js';
import { createSnapshotProvider } from '../src/live/snapshot.js';
// ---------------------------------------------------------------------------
// Helpers
// ---------------------------------------------------------------------------
function makeSilentLogger(): Logger {
return {
debug: vi.fn(),
info: vi.fn(),
warn: vi.fn(),
error: vi.fn(),
fatal: vi.fn(),
trace: vi.fn(),
child: vi.fn().mockReturnThis(),
level: 'silent',
silent: vi.fn(),
} as unknown as Logger;
}
type RecordedMetrics = Metrics & {
incCalls: Array<{ name: string }>;
observeCalls: Array<{ name: string; value: number }>;
};
function makeMetrics(): RecordedMetrics {
const incCalls: Array<{ name: string }> = [];
const observeCalls: Array<{ name: string; value: number }> = [];
return {
incCalls,
observeCalls,
inc(name) { incCalls.push({ name }); },
observe(name, value) { observeCalls.push({ name, value }); },
};
}
/**
* Snapshot row shape returned by node-postgres (ts is a Date object).
*/
type SnapshotRow = {
device_id: string;
latitude: number;
longitude: number;
ts: Date;
speed: number;
angle: number;
};
/**
* Creates a mock pg.Pool whose query() returns the given rows.
*/
function makeMockPool(rows: SnapshotRow[]): {
pool: Pool;
queryCalls: Array<{ sql: string; params: unknown[] }>;
} {
const queryCalls: Array<{ sql: string; params: unknown[] }> = [];
const query = vi.fn(async (sql: string, params: unknown[] = []) => {
queryCalls.push({ sql, params });
return { rows };
});
return { pool: { query } as unknown as Pool, queryCalls };
}
/**
* Creates a mock pool that throws on query().
*/
function makeErrorPool(error: Error): Pool {
return {
query: vi.fn().mockRejectedValue(error),
} as unknown as Pool;
}
const EVENT_ID = 'aaa00000-0000-0000-0000-000000000001';
const TS_A = new Date('2025-06-01T10:00:00.000Z');
const TS_B = new Date('2025-06-01T11:00:00.000Z');
// ---------------------------------------------------------------------------
// Tests
// ---------------------------------------------------------------------------
describe('createSnapshotProvider.forEvent', () => {
it('returns one entry per device when each has a non-faulty position', async () => {
const rows: SnapshotRow[] = [
{ device_id: 'IMEI001', latitude: 41.33, longitude: 19.83, ts: TS_A, speed: 60, angle: 90 },
{ device_id: 'IMEI002', latitude: 41.34, longitude: 19.84, ts: TS_B, speed: 0, angle: 0 },
];
const { pool } = makeMockPool(rows);
const provider = createSnapshotProvider(pool, makeSilentLogger(), makeMetrics());
const result = await provider.forEvent(EVENT_ID);
expect(result).toHaveLength(2);
const entry1 = result.find((e) => e.deviceId === 'IMEI001');
expect(entry1).toBeDefined();
expect(entry1!.lat).toBe(41.33);
expect(entry1!.lon).toBe(19.83);
expect(entry1!.ts).toBe(TS_A.getTime());
expect(entry1!.speed).toBe(60); // speed > 0 → included
expect(entry1!.course).toBe(90); // angle > 0 → included as course
const entry2 = result.find((e) => e.deviceId === 'IMEI002');
expect(entry2).toBeDefined();
expect(entry2!.speed).toBeUndefined(); // speed = 0 → omitted
expect(entry2!.course).toBeUndefined(); // angle = 0 → omitted
});
it('returns an empty array when the event has no registered devices', async () => {
const { pool } = makeMockPool([]);
const provider = createSnapshotProvider(pool, makeSilentLogger(), makeMetrics());
const result = await provider.forEvent(EVENT_ID);
expect(result).toEqual([]);
});
it('excludes faulty positions — returns only non-faulty positions', async () => {
// The Postgres query includes WHERE faulty=false; the mock returns what
// Postgres would: only IMEI001 has a non-faulty position, IMEI002 does not.
const rows: SnapshotRow[] = [
{ device_id: 'IMEI001', latitude: 41.33, longitude: 19.83, ts: TS_A, speed: 30, angle: 45 },
// IMEI002 has only faulty positions → Postgres returns no row for it
];
const { pool, queryCalls } = makeMockPool(rows);
const provider = createSnapshotProvider(pool, makeSilentLogger(), makeMetrics());
const result = await provider.forEvent(EVENT_ID);
expect(result).toHaveLength(1);
expect(result[0]!.deviceId).toBe('IMEI001');
// Verify the SQL contains the faulty filter
expect(queryCalls[0]!.sql).toContain('faulty = false');
});
it('returns the most recent non-faulty position per device (DISTINCT ON semantics)', async () => {
// Postgres DISTINCT ON (p.device_id) ORDER BY p.device_id, p.ts DESC returns
// one row per device — the one with the highest ts. The mock simulates this.
const rows: SnapshotRow[] = [
// IMEI001: Postgres selected the row with TS_B (more recent)
{
device_id: 'IMEI001',
latitude: 41.50,
longitude: 19.90,
ts: TS_B, // most recent
speed: 50,
angle: 0,
},
];
const { pool } = makeMockPool(rows);
const provider = createSnapshotProvider(pool, makeSilentLogger(), makeMetrics());
const result = await provider.forEvent(EVENT_ID);
expect(result).toHaveLength(1);
expect(result[0]!.ts).toBe(TS_B.getTime()); // epoch ms of the most recent position
});
it('passes eventId as a parameterized query argument', async () => {
const { pool, queryCalls } = makeMockPool([]);
const provider = createSnapshotProvider(pool, makeSilentLogger(), makeMetrics());
await provider.forEvent(EVENT_ID);
expect(queryCalls).toHaveLength(1);
expect(queryCalls[0]!.params).toEqual([EVENT_ID]);
});
it('observes snapshot query latency and snapshot size metrics', async () => {
const rows: SnapshotRow[] = [
{ device_id: 'IMEI001', latitude: 41.33, longitude: 19.83, ts: TS_A, speed: 10, angle: 5 },
{ device_id: 'IMEI002', latitude: 41.34, longitude: 19.84, ts: TS_B, speed: 0, angle: 0 },
];
const { pool } = makeMockPool(rows);
const metrics = makeMetrics();
const provider = createSnapshotProvider(pool, makeSilentLogger(), metrics);
await provider.forEvent(EVENT_ID);
const latency = metrics.observeCalls.find(
(c) => c.name === 'processor_live_snapshot_query_latency_ms',
);
expect(latency).toBeDefined();
expect(latency!.value).toBeGreaterThanOrEqual(0);
const size = metrics.observeCalls.find(
(c) => c.name === 'processor_live_snapshot_size',
);
expect(size).toBeDefined();
expect(size!.value).toBe(2);
});
it('propagates Postgres errors (registry.fetchSnapshot catches them)', async () => {
const pool = makeErrorPool(new Error('connection refused'));
const provider = createSnapshotProvider(pool, makeSilentLogger(), makeMetrics());
// snapshot.ts does NOT catch errors — registry.ts's fetchSnapshot does.
// This ensures the error propagates cleanly.
await expect(provider.forEvent(EVENT_ID)).rejects.toThrow('connection refused');
});
});
+690
@@ -0,0 +1,690 @@
/**
* Integration test: live broadcast pipeline end-to-end.
*
* Spins up Redis 7-alpine + TimescaleDB-HA containers, starts an HTTP server
* impersonating Directus (/users/me + /items/events/:id), boots the full live
* broadcast pipeline (auth, registry, snapshot, broadcast consumer), and verifies
* the WebSocket protocol from the perspective of a real WS client.
*
* Skip-on-no-Docker: same pattern as pipeline.integration.test.ts.
* Each `it` block has an explicit `if (!dockerAvailable) return` guard.
*
* Tests:
* 1. Happy path: subscribe → snapshot with seeded positions → live position frame.
* 2. Auth rejection: connect without cookie → HTTP 401.
* 3. Forbidden subscription: valid user, unauthorized event → error/forbidden.
* 4. Multi-client fan-out: two clients subscribed → both receive the position.
* 5. Orphan position: device not in entry_devices → no WS frame.
* 6. Faulty snapshot exclusion: faulty position is excluded; next-best is returned.
*/
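// Boot order in beforeAll: Redis + TimescaleDB containers → Directus stub →
// pg pool + migrations + fixture schema + seed → live WS server (on a probed port) →
// device-event map → broadcast consumer → durable-write consumer → dummy metrics server.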
import { describe, it, expect, beforeAll, afterAll, vi } from 'vitest';
import { GenericContainer, type StartedTestContainer, Wait } from 'testcontainers';
import { WebSocket } from 'ws';
import * as fs from 'node:fs/promises';
import * as path from 'node:path';
import * as http from 'node:http';
import type { Redis } from 'ioredis';
import type pg from 'pg';
import type { Logger } from 'pino';
import type { Config } from '../src/config/load.js';
import type { Position } from '../src/shared/types.js';
import { createPool, connectWithRetry } from '../src/db/pool.js';
import { runMigrations } from '../src/db/migrate.js';
import { connectRedis, createConsumer, ensureConsumerGroup } from '../src/core/consumer.js';
import { createMetrics } from '../src/observability/metrics.js';
import { createDeviceStateStore } from '../src/core/state.js';
import { createWriter } from '../src/core/writer.js';
import { createLiveServer } from '../src/live/server.js';
import { createAuthClient } from '../src/live/auth.js';
import { createAuthzClient } from '../src/live/authz.js';
import { createSubscriptionRegistry } from '../src/live/registry.js';
import { createSnapshotProvider } from '../src/live/snapshot.js';
import { createBroadcastConsumer } from '../src/live/broadcast.js';
import { createDeviceEventMap } from '../src/live/device-event-map.js';
import { createDirectusStub } from './helpers/directus-stub.js';
import type { FakeUser } from './helpers/directus-stub.js';
import type { ConsumedRecord } from '../src/core/consumer.js';
import type { LiveConnection } from '../src/live/server.js';
import type { InboundMessage } from '../src/live/protocol.js';
import type { AddressInfo } from 'node:net';
import type { SubscribedMessage, PositionMessage, ErrorMessage } from '../src/live/protocol.js';
// ---------------------------------------------------------------------------
// Constants
// ---------------------------------------------------------------------------
const STREAM = 'telemetry:teltonika';
const GROUP = 'processor';
const BROADCAST_GROUP_PREFIX = 'live-broadcast';
const EVENT_ID = 'ee000000-0000-0000-0000-000000000001';
const OTHER_EVENT_ID = 'ee000000-0000-0000-0000-000000000002';
const ENTRY_ID = 'aa000000-0000-0000-0000-000000000001';
const DEVICE_1 = '111111111111111'; // IMEI
const DEVICE_2 = '222222222222222'; // IMEI
const DEVICE_ORPHAN = '999999999999999'; // not in entry_devices
const USER_A: FakeUser = {
id: 'user-aaaa-0000-0000-0000-000000000001',
email: 'user-a@test.com',
role: null,
first_name: 'User',
last_name: 'A',
};
const USER_B: FakeUser = {
id: 'user-bbbb-0000-0000-0000-000000000002',
email: 'user-b@test.com',
role: null,
first_name: 'User',
last_name: 'B',
};
const COOKIE_A = 'session=valid-user-a';
const COOKIE_B = 'session=valid-user-b';
// ---------------------------------------------------------------------------
// Helpers
// ---------------------------------------------------------------------------
function makeSilentLogger(): Logger {
return {
debug: vi.fn(),
info: vi.fn(),
warn: vi.fn(),
error: vi.fn(),
fatal: vi.fn(),
trace: vi.fn(),
child: vi.fn().mockReturnThis(),
level: 'silent',
silent: vi.fn(),
} as unknown as Logger;
}
function makeConfig(overrides: Partial<Config> = {}): Config {
return {
NODE_ENV: 'test',
INSTANCE_ID: 'test-live-integration',
LOG_LEVEL: 'silent',
REDIS_URL: 'redis://localhost:6379',
POSTGRES_URL: 'postgres://postgres:postgres@localhost:5432/trm',
REDIS_TELEMETRY_STREAM: STREAM,
REDIS_CONSUMER_GROUP: GROUP,
REDIS_CONSUMER_NAME: 'test-consumer',
METRICS_PORT: 0,
BATCH_SIZE: 100,
BATCH_BLOCK_MS: 500,
WRITE_BATCH_SIZE: 50,
DEVICE_STATE_LRU_CAP: 10_000,
LIVE_WS_PORT: 0, // OS-assigned; overridden below with the actual port
LIVE_WS_HOST: '127.0.0.1',
LIVE_WS_PING_INTERVAL_MS: 60_000,
LIVE_WS_DRAIN_TIMEOUT_MS: 2_000,
LIVE_WS_BACKPRESSURE_THRESHOLD_BYTES: 1_048_576,
DIRECTUS_BASE_URL: 'http://localhost:8055', // overridden below
DIRECTUS_AUTH_TIMEOUT_MS: 5_000,
DIRECTUS_AUTHZ_TIMEOUT_MS: 5_000,
LIVE_BROADCAST_GROUP_PREFIX: BROADCAST_GROUP_PREFIX,
LIVE_BROADCAST_BATCH_SIZE: 100,
LIVE_BROADCAST_BATCH_BLOCK_MS: 500,
LIVE_DEVICE_EVENT_REFRESH_MS: 5_000, // faster for tests
...overrides,
};
}
/**
* Serializes a Position into the flat field list for XADD.
* Mirrors tcp-ingestion's serializePosition format.
*/
function buildXaddFields(position: Position): string[] {
function jsonReplacer(_key: string, value: unknown): unknown {
if (typeof value === 'bigint') return { __bigint: value.toString() };
if (value instanceof Uint8Array) return { __buffer_b64: Buffer.from(value).toString('base64') };
if (value instanceof Date) return value.toISOString();
return value;
}
return [
'ts', position.timestamp.toISOString(),
'device_id', position.device_id,
'codec', '8',
'payload', JSON.stringify(position, jsonReplacer),
];
}
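// Illustrative field list produced for a position (payload truncated):
//   ['ts', '2026-06-01T12:00:00.000Z', 'device_id', '111111111111111',
//    'codec', '8', 'payload', '{"device_id":"111111111111111", ...}']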
/**
* Waits for the next WS message matching `predicate`, with a timeout.
*/
async function waitForMessage<T>(
ws: WebSocket,
predicate: (msg: unknown) => msg is T,
timeoutMs = 5_000,
): Promise<T> {
return new Promise<T>((resolve, reject) => {
const timer = setTimeout(
() => reject(new Error(`Timeout after ${timeoutMs}ms waiting for matching WS message`)),
timeoutMs,
);
const handler = (data: Buffer | string): void => {
const msg: unknown = JSON.parse(data.toString());
if (predicate(msg)) {
clearTimeout(timer);
ws.off('message', handler);
resolve(msg);
}
};
ws.on('message', handler);
});
}
function isSubscribedMessage(msg: unknown): msg is SubscribedMessage {
return typeof msg === 'object' && msg !== null && (msg as Record<string, unknown>)['type'] === 'subscribed';
}
function isPositionMessage(msg: unknown): msg is PositionMessage {
return typeof msg === 'object' && msg !== null && (msg as Record<string, unknown>)['type'] === 'position';
}
function isErrorMessage(msg: unknown): msg is ErrorMessage {
return typeof msg === 'object' && msg !== null && (msg as Record<string, unknown>)['type'] === 'error';
}
/**
* Opens a WebSocket client and waits for the connection to be established.
*/
async function openClient(wsUrl: string, cookie?: string): Promise<WebSocket> {
return new Promise<WebSocket>((resolve, reject) => {
const ws = new WebSocket(wsUrl, {
headers: cookie ? { cookie } : undefined,
});
ws.once('open', () => resolve(ws));
ws.once('error', (err) => reject(err));
ws.once('unexpected-response', (_req, res) => {
reject(new Error(`WS upgrade rejected with HTTP ${res.statusCode}`));
});
});
}
// ---------------------------------------------------------------------------
// Test fixture — seed data
// ---------------------------------------------------------------------------
async function seedDatabase(pool: pg.Pool): Promise<void> {
// events
await pool.query(`INSERT INTO events (id) VALUES ($1), ($2)`, [EVENT_ID, OTHER_EVENT_ID]);
// entries — DEVICE_1 and DEVICE_2 are registered to EVENT_ID
await pool.query(
`INSERT INTO entries (id, event_id) VALUES ($1, $2)`,
[ENTRY_ID, EVENT_ID],
);
// entry_devices — Phase 1 uses IMEI as device_id
await pool.query(
`INSERT INTO entry_devices (id, entry_id, device_id) VALUES
(gen_random_uuid(), $1, $2),
(gen_random_uuid(), $1, $3)`,
[ENTRY_ID, DEVICE_1, DEVICE_2],
);
  // positions for DEVICE_1 (two non-faulty rows; the 10:00 row is the most recent and should win DISTINCT ON)
await pool.query(
`INSERT INTO positions (device_id, ts, latitude, longitude, altitude, angle, speed, satellites, priority, codec, attributes)
VALUES
($1, '2026-05-01T10:00:00Z', 41.33, 19.83, 50, 90, 60, 8, 0, '8', '{}'),
($1, '2026-05-01T09:00:00Z', 41.30, 19.80, 50, 0, 0, 8, 0, '8', '{}')`,
[DEVICE_1],
);
// positions for DEVICE_2 (one non-faulty)
await pool.query(
`INSERT INTO positions (device_id, ts, latitude, longitude, altitude, angle, speed, satellites, priority, codec, attributes)
VALUES ($1, '2026-05-01T10:00:00Z', 41.34, 19.84, 50, 0, 0, 8, 0, '8', '{}')`,
[DEVICE_2],
);
}
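// Seeded fixture, summarized:
//   EVENT_ID ← ENTRY_ID ← { DEVICE_1, DEVICE_2 }   (OTHER_EVENT_ID has no entries)
//   DEVICE_1: positions at 09:00 and 10:00 UTC (10:00 is the snapshot candidate)
//   DEVICE_2: one position at 10:00 UTC
//   DEVICE_ORPHAN: no rows anywhere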
// ---------------------------------------------------------------------------
// Container and pipeline lifecycle
// ---------------------------------------------------------------------------
let redisContainer: StartedTestContainer | null = null;
let pgContainer: StartedTestContainer | null = null;
let redisClientXadd: Redis | null = null;
let pgPool: pg.Pool | null = null;
let wsUrl = '';
let liveServer: { start: () => Promise<void>; stop: () => Promise<void> } | null = null;
let broadcastConsumer: { start: () => Promise<void>; stop: () => Promise<void> } | null = null;
let durableConsumer: { start: () => Promise<void>; stop: () => Promise<void> } | null = null;
let directusStub: { url: string; close: () => Promise<void> } | null = null;
let metricsServer: http.Server | null = null;
let dockerAvailable = true;
beforeAll(async () => {
// --- Step 1: Redis container -----------------------------------------------
try {
redisContainer = await new GenericContainer('redis:7-alpine')
.withExposedPorts(6379)
.withWaitStrategy(Wait.forLogMessage('Ready to accept connections'))
.start();
} catch {
console.warn('[live.integration.test] Docker not available — skipping live integration tests');
dockerAvailable = false;
return;
}
// --- Step 2: TimescaleDB container -----------------------------------------
try {
pgContainer = await new GenericContainer('timescale/timescaledb-ha:pg16.6-ts2.17.2-all')
.withExposedPorts(5432)
.withEnvironment({
POSTGRES_USER: 'postgres',
POSTGRES_PASSWORD: 'postgres',
POSTGRES_DB: 'trm',
})
.withWaitStrategy(Wait.forLogMessage('database system is ready to accept connections', 2))
.start();
} catch (err) {
console.warn(`[live.integration.test] Failed to start TimescaleDB: ${String(err)} — skipping`);
dockerAvailable = false;
await redisContainer?.stop().catch(() => {});
redisContainer = null;
return;
}
const redisHost = redisContainer.getHost();
const redisPort = redisContainer.getMappedPort(6379);
const pgHost = pgContainer.getHost();
const pgPort = pgContainer.getMappedPort(5432);
const redisUrl = `redis://${redisHost}:${redisPort}`;
const postgresUrl = `postgres://postgres:postgres@${pgHost}:${pgPort}/trm`;
const logger = makeSilentLogger();
// --- Step 3: Directus stub -------------------------------------------------
directusStub = await createDirectusStub({
allowedCookieToUser: new Map([
[COOKIE_A, USER_A],
[COOKIE_B, USER_B],
]),
allowedEvents: new Map([
[USER_A.id, new Set([EVENT_ID])], // user A can access EVENT_ID only
[USER_B.id, new Set([OTHER_EVENT_ID])], // user B cannot access EVENT_ID
]),
});
// --- Step 4: Redis client for XADD in tests --------------------------------
const { default: IRedis } = await import('ioredis');
redisClientXadd = new IRedis(redisUrl, {
enableOfflineQueue: false,
lazyConnect: true,
maxRetriesPerRequest: 0,
});
await redisClientXadd.connect();
// --- Step 5: Postgres pool, migrations, test schema, seed ------------------
pgPool = createPool(postgresUrl);
await connectWithRetry(pgPool, logger);
await runMigrations(pgPool, logger);
  // Load the test-only schema (simplified events, entries, and entry_devices tables).
const fixtureSQL = await fs.readFile(
path.join(import.meta.dirname ?? __dirname, 'fixtures', 'test-schema.sql'),
'utf-8',
);
await pgPool.query(fixtureSQL);
await seedDatabase(pgPool);
// --- Step 6: Wire live broadcast pipeline ----------------------------------
const config = makeConfig({
REDIS_URL: redisUrl,
POSTGRES_URL: postgresUrl,
DIRECTUS_BASE_URL: directusStub.url,
LIVE_WS_PORT: 0, // OS-assigned
});
const metrics = createMetrics();
// Live server — bind to OS-assigned port.
const authClient = createAuthClient(config, logger, metrics);
const authzClient = createAuthzClient(config, logger, metrics);
const snapshotProvider = createSnapshotProvider(pgPool, logger, metrics);
const registry = createSubscriptionRegistry(authzClient, config, logger, metrics, snapshotProvider);
const messageHandler = async (
conn: LiveConnection,
message: InboundMessage,
): Promise<void> => {
if (message.type === 'subscribe') {
await registry.subscribe(conn, message.topic, message.id);
} else if (message.type === 'unsubscribe') {
registry.unsubscribe(conn, message.topic, message.id);
}
};
liveServer = createLiveServer(config, logger, metrics, messageHandler, (conn) => {
registry.onConnectionClose(conn);
}, authClient);
await liveServer.start();
  // LIVE_WS_PORT=0 binds an OS-assigned port, but createLiveServer does not
  // expose the bound port, so the WS URL cannot be derived from this instance.
  // Workaround: stop it, find a free port with a throwaway probe server, and
  // re-create the live server bound to that known port.
await liveServer.stop();
// Find a free port.
const wsPort = await new Promise<number>((resolve) => {
const probe = http.createServer();
probe.listen(0, '127.0.0.1', () => {
const port = (probe.address() as AddressInfo).port;
probe.close(() => resolve(port));
});
});
const configWithPort = makeConfig({
REDIS_URL: redisUrl,
POSTGRES_URL: postgresUrl,
DIRECTUS_BASE_URL: directusStub.url,
LIVE_WS_PORT: wsPort,
});
liveServer = createLiveServer(configWithPort, logger, metrics, messageHandler, (conn) => {
registry.onConnectionClose(conn);
}, authClient);
await liveServer.start();
wsUrl = `ws://127.0.0.1:${wsPort}`;
// Device event map (uses test-seeded entry_devices).
const deviceEventMap = createDeviceEventMap(pgPool, configWithPort, logger, metrics);
await deviceEventMap.start();
// Broadcast consumer (live fan-out).
const broadcastRedis = await connectRedis(redisUrl, logger);
broadcastConsumer = createBroadcastConsumer(
broadcastRedis, registry, deviceEventMap, configWithPort, logger, metrics,
);
await broadcastConsumer.start();
// Durable-write consumer (keeps the stream moving; acks records so they
// don't pile up in the broadcast group's PEL).
const state = createDeviceStateStore(configWithPort, logger, metrics);
const writer = createWriter(pgPool, configWithPort, logger, metrics);
await ensureConsumerGroup(redisClientXadd, STREAM, GROUP, logger);
const sink = async (records: ConsumedRecord[]): Promise<string[]> => {
for (const record of records) state.update(record.position);
const results = await writer.write(records);
return results
.filter((r) => r.status === 'inserted' || r.status === 'duplicate')
.map((r) => r.id);
};
const consumerRedis = await connectRedis(redisUrl, logger);
durableConsumer = createConsumer(consumerRedis, configWithPort, logger, metrics, sink);
await durableConsumer.start();
// Start a dummy metrics server (needed to avoid process.exit in GracefulShutdown
// patterns; not used by the test directly).
metricsServer = http.createServer((_req, res) => res.writeHead(200).end('ok'));
metricsServer.listen(0, '127.0.0.1');
}, 120_000);
afterAll(async () => {
await durableConsumer?.stop().catch(() => {});
await broadcastConsumer?.stop().catch(() => {});
await liveServer?.stop().catch(() => {});
await redisClientXadd?.quit().catch(() => {});
await pgPool?.end().catch(() => {});
await directusStub?.close().catch(() => {});
await new Promise<void>((res) => (metricsServer?.close(() => res()) ?? res()));
await redisContainer?.stop().catch(() => {});
await pgContainer?.stop().catch(() => {});
}, 60_000);
// ---------------------------------------------------------------------------
// Integration tests
// ---------------------------------------------------------------------------
describe('live broadcast integration', () => {
// -------------------------------------------------------------------------
// Test 1 — Happy path: subscribe → snapshot + live position
// -------------------------------------------------------------------------
it('subscribes to an event, receives snapshot, then receives live position', async () => {
if (!dockerAvailable) {
console.warn('[live.integration.test] skipping test 1: Docker not available');
return;
}
const ws = await openClient(wsUrl, COOKIE_A);
try {
// Subscribe to the seeded event.
ws.send(JSON.stringify({ type: 'subscribe', topic: `event:${EVENT_ID}`, id: 'req-1' }));
// Expect `subscribed` with a non-empty snapshot (2 devices seeded).
const subscribed = await waitForMessage<SubscribedMessage>(ws, isSubscribedMessage, 5_000);
expect(subscribed.type).toBe('subscribed');
expect(subscribed.topic).toBe(`event:${EVENT_ID}`);
expect(subscribed.id).toBe('req-1');
expect(subscribed.snapshot).toHaveLength(2);
const snap1 = subscribed.snapshot.find((e) => e.deviceId === DEVICE_1);
expect(snap1).toBeDefined();
expect(snap1!.lat).toBeCloseTo(41.33, 2);
expect(snap1!.lon).toBeCloseTo(19.83, 2);
// Publish a new live position for DEVICE_1.
const liveTs = new Date('2026-06-01T12:00:00.000Z');
const position: Position = {
device_id: DEVICE_1,
timestamp: liveTs,
latitude: 41.40,
longitude: 19.90,
altitude: 55,
angle: 45,
speed: 80,
satellites: 10,
priority: 0,
attributes: {},
};
await redisClientXadd!.xadd(STREAM, '*', ...buildXaddFields(position));
// Expect a `position` frame within 5s.
const posMsg = await waitForMessage<PositionMessage>(ws, isPositionMessage, 5_000);
expect(posMsg.type).toBe('position');
expect(posMsg.topic).toBe(`event:${EVENT_ID}`);
expect(posMsg.deviceId).toBe(DEVICE_1);
expect(posMsg.lat).toBeCloseTo(41.40, 2);
expect(posMsg.lon).toBeCloseTo(19.90, 2);
expect(posMsg.ts).toBe(liveTs.getTime());
expect(posMsg.speed).toBe(80);
} finally {
ws.close();
}
}, 30_000);
// -------------------------------------------------------------------------
// Test 2 — Auth rejection: no cookie → HTTP 401
// -------------------------------------------------------------------------
it('rejects WS upgrade with HTTP 401 when no cookie is presented', async () => {
if (!dockerAvailable) {
console.warn('[live.integration.test] skipping test 2: Docker not available');
return;
}
// openClient throws on unexpected-response (non-101 upgrade).
await expect(openClient(wsUrl)).rejects.toThrow();
}, 10_000);
// -------------------------------------------------------------------------
// Test 3 — Forbidden subscription
// -------------------------------------------------------------------------
it('receives error/forbidden when subscribing to an event the user cannot access', async () => {
if (!dockerAvailable) {
console.warn('[live.integration.test] skipping test 3: Docker not available');
return;
}
// USER_B can only access OTHER_EVENT_ID, not EVENT_ID.
const ws = await openClient(wsUrl, COOKIE_B);
try {
ws.send(JSON.stringify({ type: 'subscribe', topic: `event:${EVENT_ID}` }));
const errorMsg = await waitForMessage<ErrorMessage>(ws, isErrorMessage, 5_000);
expect(errorMsg.type).toBe('error');
expect(errorMsg.code).toBe('forbidden');
expect(errorMsg.topic).toBe(`event:${EVENT_ID}`);
} finally {
ws.close();
}
}, 10_000);
// -------------------------------------------------------------------------
// Test 4 — Multi-client fan-out
// -------------------------------------------------------------------------
it('delivers a live position to all subscribed clients on the same event', async () => {
if (!dockerAvailable) {
console.warn('[live.integration.test] skipping test 4: Docker not available');
return;
}
const ws1 = await openClient(wsUrl, COOKIE_A);
const ws2 = await openClient(wsUrl, COOKIE_A);
try {
// Subscribe both clients to the same event.
ws1.send(JSON.stringify({ type: 'subscribe', topic: `event:${EVENT_ID}` }));
ws2.send(JSON.stringify({ type: 'subscribe', topic: `event:${EVENT_ID}` }));
// Wait for both subscriptions to confirm.
await waitForMessage<SubscribedMessage>(ws1, isSubscribedMessage, 5_000);
await waitForMessage<SubscribedMessage>(ws2, isSubscribedMessage, 5_000);
// Publish a live position for DEVICE_1.
const liveTs = new Date('2026-06-01T13:00:00.000Z');
const position: Position = {
device_id: DEVICE_1,
timestamp: liveTs,
latitude: 41.50,
longitude: 19.95,
altitude: 60,
angle: 0,
speed: 0,
satellites: 10,
priority: 0,
attributes: {},
};
await redisClientXadd!.xadd(STREAM, '*', ...buildXaddFields(position));
// Both clients must receive the position frame.
const [pos1, pos2] = await Promise.all([
waitForMessage<PositionMessage>(ws1, isPositionMessage, 5_000),
waitForMessage<PositionMessage>(ws2, isPositionMessage, 5_000),
]);
expect(pos1.deviceId).toBe(DEVICE_1);
expect(pos2.deviceId).toBe(DEVICE_1);
expect(pos1.ts).toBe(liveTs.getTime());
expect(pos2.ts).toBe(liveTs.getTime());
} finally {
ws1.close();
ws2.close();
}
}, 30_000);
// -------------------------------------------------------------------------
// Test 5 — Orphan position: device not in entry_devices
// -------------------------------------------------------------------------
it('does not deliver a position for an unregistered device; client receives no frame', async () => {
if (!dockerAvailable) {
console.warn('[live.integration.test] skipping test 5: Docker not available');
return;
}
const ws = await openClient(wsUrl, COOKIE_A);
try {
ws.send(JSON.stringify({ type: 'subscribe', topic: `event:${EVENT_ID}` }));
await waitForMessage<SubscribedMessage>(ws, isSubscribedMessage, 5_000);
// Publish a position for DEVICE_ORPHAN (not in entry_devices).
const position: Position = {
device_id: DEVICE_ORPHAN,
timestamp: new Date('2026-06-01T14:00:00.000Z'),
latitude: 42.00,
longitude: 20.00,
altitude: 60,
angle: 0,
speed: 0,
satellites: 8,
priority: 0,
attributes: {},
};
await redisClientXadd!.xadd(STREAM, '*', ...buildXaddFields(position));
// Wait ~2s: no position frame should arrive for this orphan device.
// waitForMessage may reject when its own timeout fires, so a rejection is also
// treated as "no frame arrived".
const noFrame = await Promise.race([
waitForMessage<PositionMessage>(ws, isPositionMessage, 2_000).then(() => 'received' as const).catch(() => 'timeout' as const),
new Promise<'timeout'>((resolve) => setTimeout(() => resolve('timeout'), 2_100)),
]);
expect(noFrame).toBe('timeout');
} finally {
ws.close();
}
}, 15_000);
// -------------------------------------------------------------------------
// Test 6 — Faulty snapshot exclusion
// -------------------------------------------------------------------------
it('excludes faulty positions from the snapshot; uses next-best non-faulty position', async () => {
if (!dockerAvailable || !pgPool) {
console.warn('[live.integration.test] skipping test 6: Docker not available');
return;
}
// Mark DEVICE_1's most recent position (10:00:00) as faulty.
await pgPool.query(
`UPDATE positions SET faulty = true WHERE device_id = $1 AND ts = '2026-05-01T10:00:00Z'`,
[DEVICE_1],
);
try {
const ws = await openClient(wsUrl, COOKIE_A);
try {
ws.send(JSON.stringify({ type: 'subscribe', topic: `event:${EVENT_ID}` }));
const subscribed = await waitForMessage<SubscribedMessage>(ws, isSubscribedMessage, 5_000);
// DEVICE_1's most recent faulty row is excluded; the next non-faulty
// (09:00:00 at lat 41.30) should be returned instead.
const snap1 = subscribed.snapshot.find((e) => e.deviceId === DEVICE_1);
expect(snap1).toBeDefined();
expect(snap1!.lat).toBeCloseTo(41.30, 2);
expect(snap1!.lon).toBeCloseTo(19.80, 2);
} finally {
ws.close();
}
} finally {
// Restore: un-mark the faulty position so it doesn't affect other tests.
await pgPool.query(
`UPDATE positions SET faulty = false WHERE device_id = $1 AND ts = '2026-05-01T10:00:00Z'`,
[DEVICE_1],
);
}
}, 15_000);
});