# Task 1.5.3: Subscription registry & per-event authorization

**Phase:** 1.5 (Live broadcast)
**Status:** ⬜ Not started
**Depends on:** 1.5.2
**Wiki refs:** `docs/wiki/synthesis/processor-ws-contract.md` §Subscription model; `docs/wiki/concepts/live-channel-architecture.md` §Authorization flow; `docs/wiki/synthesis/directus-schema-draft.md`

## Goal

Handle `subscribe` / `unsubscribe` messages: validate the topic format, authorize the user against the topic's organization, maintain in-memory bidirectional indexes (`connection → topics`, `topic → connections`), and emit the appropriate `subscribed` / `unsubscribed` / `error` responses. Authorization is a single Directus call per subscription; there is no per-message auth.

After this task, a connected client can `subscribe` to an event they have permission for, get an immediate `subscribed` response, and the registry knows which connections want updates for which event. The actual fan-out and snapshot land in 1.5.4 and 1.5.5 respectively; this task owns only the bookkeeping.

## Deliverables

- `src/live/registry.ts` exporting:
  - `createSubscriptionRegistry(authzClient, logger, metrics): SubscriptionRegistry` (factory).
  - `SubscriptionRegistry` interface:

    ```ts
    interface SubscriptionRegistry {
      subscribe(conn: LiveConnection, topic: string, correlationId?: string): Promise<void>;
      unsubscribe(conn: LiveConnection, topic: string, correlationId?: string): Promise<void>;
      onConnectionClose(conn: LiveConnection): void; // remove from all topics
      connectionsForTopic(topic: string): Iterable<LiveConnection>; // used by 1.5.4 fan-out
      topicsForConnection(conn: LiveConnection): Iterable<string>;
      stats(): { connections: number; topics: number; subscriptions: number };
    }
    ```

  - Topic format validator: `event:<eventId>` (where `eventId` is a UUID) is the only accepted shape in v1; anything else returns `error/unknown-topic`.
- `src/live/authz.ts` exporting:
  - `createAuthzClient(config, logger): AuthzClient` (factory).
  - `AuthzClient.canAccessEvent(user: AuthenticatedUser, eventId: string)`: returns a `Promise` resolving to `{ allowed: true } | { allowed: false; reason: 'forbidden' | 'not-found' | 'error' }`.
- `src/live/server.ts` updated: the `onMessage` placeholder from 1.5.1 is replaced with a real router that dispatches `subscribe` / `unsubscribe` to the registry, and the `'close'` event handler calls `registry.onConnectionClose`.
- New Prometheus metrics:
  - `processor_live_subscriptions{instance_id}` (gauge): current total subscriptions.
  - `processor_live_subscribe_attempts_total{result}` (counter): `result` is one of `success` / `forbidden` / `not-found` / `unknown-topic` / `error`.
  - `processor_live_authz_latency_ms` (histogram).
- `test/live-registry.test.ts`:
  - Subscribe to `event:<eventId>` with a permitted user → `subscribed` reply, registry counts go up.
  - Subscribe to `event:<eventId>` with a forbidden user → `error/forbidden` reply, no registry change.
  - Subscribe to a `device:` topic → `error/unknown-topic`, no registry change.
  - Subscribe twice to the same topic → idempotent (a single subscription is recorded; each call still gets its own `subscribed` reply).
  - Unsubscribe from a topic the connection isn't subscribed to → `unsubscribed` reply (idempotent), no error, gauge unchanged.
  - Connection close removes all subscriptions; gauges decrement correctly.
- `test/live-authz.test.ts`:
  - `canAccessEvent` returns `allowed: true` when `GET /items/events/<eventId>` returns 200 (Directus enforces RLS via the cookie; if Directus says yes, we say yes).
  - Returns `allowed: false, reason: 'forbidden'` on 403.
  - Returns `allowed: false, reason: 'not-found'` on 404.
  - Returns `allowed: false, reason: 'error'` on network failure or 5xx (does not throw).
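The `onMessage` router named in the `src/live/server.ts` deliverable is not spelled out elsewhere in this task, so here is a minimal sketch of one way it could look. The `InboundMessage` shape, the `RegistryLike` stand-in, the `parseInbound` / `routeMessage` helpers, and the `bad-message` error code are all assumptions made for illustration; the real message contract lives in `processor-ws-contract.md` and takes precedence.

```typescript
// Hypothetical inbound shape; the authoritative one is in processor-ws-contract.md.
type InboundMessage = {
  type: 'subscribe' | 'unsubscribe';
  topic: string;
  id?: string; // correlation id, echoed back in the reply
};

// Minimal stand-in for the part of SubscriptionRegistry the router needs.
interface RegistryLike<C> {
  subscribe(conn: C, topic: string, correlationId?: string): Promise<void>;
  unsubscribe(conn: C, topic: string, correlationId?: string): Promise<void>;
}

// Parse and validate a raw text frame. Returns null for anything malformed.
function parseInbound(raw: string): InboundMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON
  }
  if (typeof data !== 'object' || data === null) return null;

  const obj = data as { type?: unknown; topic?: unknown; id?: unknown };
  if (obj.type !== 'subscribe' && obj.type !== 'unsubscribe') return null;
  if (typeof obj.topic !== 'string') return null;
  if (obj.id !== undefined && typeof obj.id !== 'string') return null;

  return { type: obj.type, topic: obj.topic, id: obj.id };
}

// Dispatch one frame to the registry. `sendError` is a hypothetical callback
// that wraps whatever outbound helper the server already has.
async function routeMessage<C>(
  registry: RegistryLike<C>,
  conn: C,
  raw: string,
  sendError: (conn: C, code: string) => void,
): Promise<void> {
  const msg = parseInbound(raw);
  if (!msg) {
    sendError(conn, 'bad-message'); // assumed code, not part of the contract
    return;
  }
  if (msg.type === 'subscribe') {
    await registry.subscribe(conn, msg.topic, msg.id);
  } else {
    await registry.unsubscribe(conn, msg.topic, msg.id);
  }
}
```

The router deliberately does no topic validation of its own: malformed frames are rejected here, while unknown topic shapes are left to the registry so that `error/unknown-topic` has a single source.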
## Specification

### Topic parsing

```ts
// Matches `event:` followed by a canonical UUID (case-insensitive).
const EventTopicRegex = /^event:([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})$/i;

function parseTopic(topic: string): { kind: 'event'; eventId: string } | null {
  const m = EventTopicRegex.exec(topic);
  if (m) return { kind: 'event', eventId: m[1] };
  return null; // unknown topic shape
}
```

Future shapes (`device:`, `entry:`, `org:`) get added here when they're needed. The unknown-topic path returns a clear error rather than failing silently, so clients always know when they sent a topic the server doesn't understand.

### Authorization model

The simplest correct authorization: **delegate to Directus's REST API with the user's cookie**. If `GET /items/events/<eventId>` returns 200, the user has access (Directus's RLS already does the org-membership check). If it returns 403, they don't.

```ts
async function canAccessEvent(
  user: AuthenticatedUser,
  eventId: string,
): Promise<{ allowed: true } | { allowed: false; reason: 'forbidden' | 'not-found' | 'error' }> {
  const start = performance.now();
  try {
    const res = await fetch(`${config.DIRECTUS_BASE_URL}/items/events/${eventId}?fields=id`, {
      method: 'GET',
      headers: { cookie: user.cookieHeader }, // see "Carrying the cookie" below
      signal: AbortSignal.timeout(config.DIRECTUS_AUTHZ_TIMEOUT_MS ?? 5_000),
    });
    if (res.status === 200) return { allowed: true };
    if (res.status === 403) return { allowed: false, reason: 'forbidden' };
    if (res.status === 404) return { allowed: false, reason: 'not-found' };
    return { allowed: false, reason: 'error' }; // 5xx and anything unexpected
  } catch {
    // Network failure or timeout: report an error verdict instead of throwing.
    return { allowed: false, reason: 'error' };
  } finally {
    metrics.authzLatency.observe(performance.now() - start);
  }
}
```

**Field projection** (`?fields=id`) keeps the response tiny: we don't need the event details, just the access verdict.

### Carrying the cookie

The auth handshake (1.5.2) validated the cookie and discarded it. For per-subscription Directus calls we need the original cookie header. Two options:

**Option A: Stash on the connection.** When 1.5.2 succeeds, save `cookieHeader` on `LiveConnection`.
Trade-off: cookie material lives in process memory for the connection's lifetime.

**Option B: Re-fetch via service account.** The Processor has its own credentials; at subscribe time, query as that service account with the user id as a filter. Trade-off: more complex, and it requires the Processor to have a Directus account with read access to all events.

**Pick Option A.** It is simpler and more honest (the user's own permissions are the source of truth for authorization), and the cookie is already on this server because we received it at upgrade. Memory cost is negligible (a cookie header is typically 100–500 bytes). Document that `LiveConnection` holds sensitive material, and don't log it.

Update `LiveConnection` in `server.ts`:

```ts
export type LiveConnection = {
  id: string;
  ws: WebSocket;
  remoteAddr: string;
  openedAt: Date;
  lastSeenAt: Date;
  user: AuthenticatedUser;
  cookieHeader: string; // ← added
};
```

And update 1.5.2's upgrade handler to pass the cookie through. Note that the `canAccessEvent` sketch under "Authorization model" reads `user.cookieHeader`, while this shape stores the cookie on the connection; either pass `conn.cookieHeader` into the authz call or store the cookie on `AuthenticatedUser`, so that the two agree.

### Registry data structures

```ts
const connectionTopics = new WeakMap<LiveConnection, Set<string>>(); // conn → topics
const topicConnections = new Map<string, Set<LiveConnection>>(); // topic → conns
```

`WeakMap` for `connectionTopics` lets garbage collection clean up the per-connection entry if a connection somehow misses the explicit `onConnectionClose` call. This is only a backstop: `topicConnections` still holds strong references to every subscribed connection, so `onConnectionClose` remains the real cleanup path. A `WeakMap` also has no `size`, so `stats().connections` needs a separately maintained counter. `Set` semantics give idempotent subscribe/unsubscribe for free.

### Subscribe flow

```ts
async function subscribe(conn: LiveConnection, topic: string, correlationId?: string): Promise<void> {
  const parsed = parseTopic(topic);
  if (!parsed) {
    sendOutbound(conn, { type: 'error', topic, id: correlationId, code: 'unknown-topic', message: 'Unknown topic format' });
    metrics.subscribeAttempts.inc({ result: 'unknown-topic' });
    return;
  }

  // Idempotent: already subscribed?
  if (connectionTopics.get(conn)?.has(topic)) {
    // Re-send subscribed (snapshot will be fetched freshly in 1.5.5).
    sendOutbound(conn, { type: 'subscribed', topic, id: correlationId, snapshot: [] });
    return;
  }

  const verdict = await authzClient.canAccessEvent(conn.user, parsed.eventId);
  if (!verdict.allowed) {
    sendOutbound(conn, { type: 'error', topic, id: correlationId, code: verdict.reason });
    metrics.subscribeAttempts.inc({ result: verdict.reason });
    return;
  }

  // The socket may have closed while authz was in flight; registering now
  // would leak an entry that onConnectionClose has already swept.
  if (conn.ws.readyState !== conn.ws.OPEN) return;

  // Insert into both indexes. Re-read after the await: a concurrent subscribe
  // on the same connection may have created or updated the set in the meantime.
  let topics = connectionTopics.get(conn);
  if (!topics) {
    topics = new Set();
    connectionTopics.set(conn, topics);
  }
  if (!topics.has(topic)) {
    topics.add(topic);
    let conns = topicConnections.get(topic);
    if (!conns) {
      conns = new Set();
      topicConnections.set(topic, conns);
    }
    conns.add(conn);
    metrics.subscriptions.inc(); // count each (conn, topic) pair exactly once
  }
  metrics.subscribeAttempts.inc({ result: 'success' });

  // 1.5.5 fills in the snapshot. For now, empty array.
  sendOutbound(conn, { type: 'subscribed', topic, id: correlationId, snapshot: [] });
}
```

### Unsubscribe flow

```ts
async function unsubscribe(conn: LiveConnection, topic: string, correlationId?: string): Promise<void> {
  // `delete` reports whether the topic was actually present.
  const removed = connectionTopics.get(conn)?.delete(topic) ?? false;
  const conns = topicConnections.get(topic);
  if (conns) {
    conns.delete(conn);
    if (conns.size === 0) topicConnections.delete(topic);
  }
  // Only decrement for a real removal, so repeated unsubscribes don't skew the gauge.
  if (removed) metrics.subscriptions.dec();

  // Always reply, even if not subscribed (idempotent).
  sendOutbound(conn, { type: 'unsubscribed', topic, id: correlationId });
}
```

### `onConnectionClose`

```ts
function onConnectionClose(conn: LiveConnection): void {
  const topics = connectionTopics.get(conn);
  if (!topics) return;
  for (const topic of topics) {
    const conns = topicConnections.get(topic);
    if (conns) {
      conns.delete(conn);
      if (conns.size === 0) topicConnections.delete(topic); // drop empty topics
    }
    metrics.subscriptions.dec();
  }
  connectionTopics.delete(conn);
}
```

Hooked into the `ws.on('close', ...)` handler in `server.ts`.

## Acceptance criteria

- [ ] `pnpm typecheck`, `pnpm lint`, `pnpm test` clean.
- [ ] `wscat` flow: connect with a valid cookie → `{"type":"subscribe","topic":"event:<eventId>"}` → `{"type":"subscribed","topic":"event:<eventId>","snapshot":[]}`.
- [ ] Forbidden flow: same client subscribing to an event in a different org → `{"type":"error","code":"forbidden"}`.
- [ ] Unknown topic flow: `{"type":"subscribe","topic":"foo:bar"}` → `{"type":"error","code":"unknown-topic"}`.
- [ ] Unsubscribe flow: client gets `unsubscribed` reply; gauge `processor_live_subscriptions` decrements.
- [ ] Disconnect cleans up: `processor_live_subscriptions` returns to its pre-connection level after the client disconnects.
- [ ] Idempotency: subscribing twice to the same topic doesn't double-count in `processor_live_subscriptions`.

## Risks / open questions

- **Authz latency budget.** Each subscribe is one Directus call. At race-start with hundreds of viewers subscribing simultaneously, that's a thundering herd. Pilot scale (≤20 viewers per event) is fine. If we ever see a herd: cache `(userId, eventId) → verdict` for 60s with manual invalidation hooks. Defer until measured.
- **What if the user is removed from the org mid-subscription?** Their existing subscriptions keep delivering until they disconnect. Phase 4 hardening can add periodic re-checks. For pilot, "trust the session" is fine.
- **Filter subscriptions to the user's own entries vs all in-event?** Race directors want to see everyone; participants might want to see only their own crew. Current spec is "everyone in the event"; Phase 4 permissions can refine. Document that v1 is open within an event.
- **Wildcard topics.** Not in scope. If we ever need them, the topic parser is the place to add `event:*` → "every event in the user's orgs."

## Done

(Filled in when the task lands.)