# Task 1.5.2 — Cookie auth handshake

**Phase:** 1.5 — Live broadcast
**Status:** ⬜ Not started
**Depends on:** 1.5.1
**Wiki refs:** `docs/wiki/synthesis/processor-ws-contract.md` §Auth handshake; `docs/wiki/entities/directus.md`; `docs/wiki/entities/react-spa.md` §Auth pattern

## Goal

Authenticate WebSocket connections using the Directus-issued cookie attached to the upgrade request. Validate via a single `/users/me` round-trip to Directus; on success, bind the user identity to the `LiveConnection` for its lifetime; on failure, reject the upgrade with an HTTP 401 response before the handshake completes (a WebSocket close code such as `4401` can only be sent once the upgrade has finished). After this task, anonymous connections are rejected: only Directus-authenticated users can hold an open WebSocket.

## Deliverables

- `src/live/auth.ts` exporting:
  - `createAuthClient(config, logger): AuthClient` (factory).
  - `AuthClient` interface: `validate(cookieHeader: string): Promise<AuthenticatedUser | null>`.
  - `type AuthenticatedUser = { id: string; email: string | null; role: string | null; first_name: string | null; last_name: string | null }`, the minimum fields used by the registry (1.5.3) for authorization decisions.
  - `validate` returns `null` on any failure (network, 401, malformed response). Logs at `warn` with the failure reason.
- `src/live/server.ts` updated:
  - `LiveConnection` gains a `user: AuthenticatedUser` field (no longer optional).
  - The `'upgrade'` handler validates the cookie *before* calling `wss.handleUpgrade`. On `null`, write a 401 HTTP response on the raw socket and destroy it (this is how `ws` recommends rejecting upgrades cleanly).
  - On success, pass the validated user through to the `'connection'` handler via `req[USER_KEY]`.
- New config keys (zod):
  - `DIRECTUS_BASE_URL` (default `http://directus:8055`): where to call `/users/me`.
  - `DIRECTUS_AUTH_TIMEOUT_MS` (default `5_000`).
- New Prometheus metrics:
  - `processor_live_auth_attempts_total{result}`, where `result` is `success` / `unauthorized` / `error`.
  - `processor_live_auth_latency_ms` (histogram).
- `test/live-auth.test.ts`:
  - With a mocked Directus returning 200 + a user payload, `validate` returns the parsed user.
  - With 401, returns `null` and increments the `unauthorized` counter.
  - With a network error, returns `null` and increments the `error` counter (does not throw).
  - With a 200 but malformed payload (no `id` field), returns `null` and logs at `warn`.
  - The HTTP timeout is enforced (`AbortController` after `DIRECTUS_AUTH_TIMEOUT_MS`).

## Specification

### Cookie extraction

The browser attaches whatever cookies were set on the SPA's origin. By default Directus names its refresh cookie `directus_refresh_token`. On REST calls the session is normally identified through the access token in the `Authorization` header, but a browser WebSocket upgrade carries no `Authorization` header, so we forward the cookie and let Directus handle session lookup.

```ts
import type { IncomingMessage } from 'node:http';

// Returns the raw Cookie header, or null when the request carries none.
function extractCookieHeader(req: IncomingMessage): string | null {
  return req.headers.cookie ?? null;
}
```

If the header is missing entirely, fail fast: there is no point calling Directus.
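The fail-fast rule above can be sketched as a small guard that also treats an empty or whitespace-only header as missing, mirroring the `if (!cookieHeader) return null` check in `validate`. The helper name `usableCookieHeader` and the `HeaderBag` type are hypothetical and only illustrate the idea; they are not part of the spec.

```ts
// Hypothetical helper, not part of the spec: decides whether a Cookie header
// is worth forwarding to Directus at all.
type HeaderBag = { cookie?: string };

function usableCookieHeader(headers: HeaderBag): string | null {
  const raw = headers.cookie;
  if (raw === undefined) return null; // header absent: reject without a network call
  const trimmed = raw.trim();
  // An empty or whitespace-only header carries no session to validate.
  return trimmed.length > 0 ? trimmed : null;
}
```

The upgrade handler could call a guard like this first and only invoke `authClient.validate` when it returns a string.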
### `/users/me` call

```ts
async function validate(cookieHeader: string): Promise<AuthenticatedUser | null> {
  if (!cookieHeader) return null;

  // Abort the request once DIRECTUS_AUTH_TIMEOUT_MS has elapsed.
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), config.DIRECTUS_AUTH_TIMEOUT_MS);
  const start = performance.now();

  try {
    const res = await fetch(
      `${config.DIRECTUS_BASE_URL}/users/me?fields=id,email,role,first_name,last_name`,
      {
        method: 'GET',
        headers: { cookie: cookieHeader },
        signal: controller.signal,
      },
    );

    // Bad or expired credentials: expected failure, no warn log.
    if (res.status === 401 || res.status === 403) {
      metrics.authAttempts.inc({ result: 'unauthorized' });
      return null;
    }

    if (!res.ok) {
      logger.warn({ status: res.status }, 'directus auth call returned non-2xx');
      metrics.authAttempts.inc({ result: 'error' });
      return null;
    }

    const body = await res.json();
    const user = AuthenticatedUserSchema.safeParse(body.data);
    if (!user.success) {
      logger.warn({ issues: user.error.issues }, 'directus /users/me returned unexpected shape');
      metrics.authAttempts.inc({ result: 'error' });
      return null;
    }

    metrics.authAttempts.inc({ result: 'success' });
    return user.data;
  } catch (err) {
    // Network failures and timeouts land here; validate never throws.
    if ((err as Error).name === 'AbortError') {
      logger.warn('directus auth call timed out');
    } else {
      logger.warn({ err }, 'directus auth call failed');
    }
    metrics.authAttempts.inc({ result: 'error' });
    return null;
  } finally {
    clearTimeout(timer);
    metrics.authLatency.observe(performance.now() - start);
  }
}
```

Notes:

- **Field projection** (`?fields=...`) keeps the response small. The full user record has dozens of fields we don't need.
- **Forward the entire cookie header.** Directus may rotate the refresh cookie on this call (it shouldn't on `/users/me`, but be liberal); we ignore any `Set-Cookie` in the response because it's not our cookie to manage.
- **No retries.** A failed validation immediately closes the upgrade. The SPA will reconnect, which gives a natural retry. Don't add server-side retry logic: it masks bugs and slows down the bad-credential case.
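The branches in `validate` map every outcome onto one of the three `result` labels of `processor_live_auth_attempts_total`. A minimal sketch of that mapping as a pure function follows; `resultLabel` and the `Outcome` type are hypothetical names introduced here for illustration, since the spec inlines this logic.

```ts
type AuthResult = 'success' | 'unauthorized' | 'error';

// Hypothetical summary of what happened during one validation attempt.
type Outcome =
  | { kind: 'response'; status: number; bodyValid: boolean }
  | { kind: 'thrown' }; // network failure or timeout abort

function resultLabel(outcome: Outcome): AuthResult {
  if (outcome.kind === 'thrown') return 'error';
  const { status, bodyValid } = outcome;
  // 401 and 403 are the only statuses counted as bad credentials.
  if (status === 401 || status === 403) return 'unauthorized';
  const ok = status >= 200 && status < 300; // same range as Response.ok
  // A 2xx response only counts as success if the body passed schema validation.
  return ok && bodyValid ? 'success' : 'error';
}
```

Keeping the mapping in one place like this would make it easy to table-test, though inlining it as in the spec works just as well.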
### Rejecting the upgrade

`ws` lets you reject by writing directly to the raw socket before `handleUpgrade`:

```ts
import type { IncomingMessage } from 'node:http';
import { nanoid } from 'nanoid';

httpServer.on('upgrade', async (req, socket, head) => {
  const cookie = extractCookieHeader(req);
  const user = cookie ? await authClient.validate(cookie) : null;

  if (!user) {
    // Reject with a plain HTTP response; the WebSocket handshake never completes.
    socket.write(
      'HTTP/1.1 401 Unauthorized\r\n' +
        'Content-Length: 0\r\n' +
        'Connection: close\r\n' +
        '\r\n'
    );
    socket.destroy();
    return;
  }

  // Stash the user on the request object so the connection handler can pick it up.
  (req as IncomingMessage & { user: AuthenticatedUser }).user = user;

  wss.handleUpgrade(req, socket, head, (ws) => {
    wss.emit('connection', ws, req);
  });
});

wss.on('connection', (ws, req: IncomingMessage & { user: AuthenticatedUser }) => {
  const conn: LiveConnection = {
    id: nanoid(),
    ws,
    remoteAddr: req.socket.remoteAddress ?? 'unknown',
    openedAt: new Date(),
    lastSeenAt: new Date(),
    user: req.user,
  };
  // ... rest of connection setup
});
```

### What `AuthenticatedUser` does and doesn't include

Include only fields the registry (1.5.3) and Phase 4 permissions will need:

```ts
import { z } from 'zod';

const AuthenticatedUserSchema = z.object({
  id: z.string().uuid(),
  email: z.string().email().nullable(),
  role: z.string().uuid().nullable(), // Directus role id, not the `organization_users.role` enum
  first_name: z.string().nullable(),
  last_name: z.string().nullable(),
});
```

Don't pull in `directus_users` extension fields or anything specific to the TRM domain; those are queried per-subscription, not per-connection.

### What we don't do (deferred)

- **No JWT validation locally.** The simplest path is the round-trip; cache only if the round-trip becomes a bottleneck (it won't at pilot scale).
- **No refresh handling.** The cookie's lifetime is the SPA's problem. If it expires mid-connection, server-side state is unaffected; the SPA will reconnect (which re-validates).
- **No revocation re-checks.** A user removed from the database mid-session keeps their WebSocket until they disconnect or the server restarts. Phase 4 hardening can add periodic re-validation if needed.

## Acceptance criteria

- [ ] `pnpm typecheck`, `pnpm lint`, `pnpm test` clean.
- [ ] Connecting without a cookie returns HTTP 401 (visible in `wscat`'s output as a connection rejection with status code).
- [ ] Connecting with a stale/invalid cookie returns HTTP 401.
- [ ] Connecting with a valid cookie (obtained via Directus's `/auth/login` with `mode: cookie`) succeeds; the connection is logged with the user id.
- [ ] `processor_live_auth_attempts_total{result="success"}` increments on a successful upgrade.
- [ ] Auth latency p95 < 100 ms against a stage-realistic Directus (single `/users/me` call against a warm DB).

## Risks / open questions

- **Directus base URL in dev vs stage vs prod.** In dev the SPA might run via Vite proxy at `localhost:5173`, with Directus at `localhost:8055`. The Processor's `DIRECTUS_BASE_URL` should always be the *internal* Compose-network URL (`http://directus:8055`), which is the path with the lowest latency and no proxy hops. Document this in `.env.example`.
- **Cookie scope.** Directus issues the refresh cookie scoped to the public domain (e.g. `Domain=stage.trmtracking.org`). The Processor receives the same cookie because the upgrade request hits the same origin (proxy fronts both). Verify this works end-to-end during the integration test (1.5.6).
- **What if `/users/me` returns 200 with `data: null`?** Directus does this when the cookie is well-formed but the session is expired. Treat as `null` user (return `null`, log at `warn`).

## Done

Landed in `190254d`. Key deviations from spec:

- Added a distinction between `data: null` (unauthorized / expired session) and a missing `data` key (error / malformed response). The task spec only mentioned `data: null`, but the missing-key case is equally important.
- `authClient` is an optional parameter to `createLiveServer` (not required), so the existing unit tests that don't need auth work unchanged.
- Used the `satisfies` operator to pass the anonymous user placeholder at the no-auth code path for type safety.
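The first deviation, telling `data: null` apart from a missing `data` key, could look roughly like the sketch below. The landed code is not reproduced in this document, so `inspectEnvelope` and its return labels are hypothetical names chosen for illustration only.

```ts
type EnvelopeCheck = 'unauthorized' | 'malformed' | 'has-data';

// Hypothetical helper: classifies the parsed JSON body of a 200 response
// before the `data` payload is handed to schema validation.
function inspectEnvelope(body: unknown): EnvelopeCheck {
  // Anything that is not a JSON object cannot be a Directus response envelope.
  if (typeof body !== 'object' || body === null) return 'malformed';
  // Missing `data` key: count as an error, not as bad credentials.
  if (!('data' in body)) return 'malformed';
  const data = (body as { data: unknown }).data;
  // `data: null` is the expired-session case described under Risks.
  return data === null ? 'unauthorized' : 'has-data';
}
```

Under this sketch, `'unauthorized'` would increment the `unauthorized` counter, `'malformed'` the `error` counter, and `'has-data'` would continue on to `AuthenticatedUserSchema.safeParse`.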