# Task 1.5.2 — Cookie auth handshake
**Phase:** 1.5 — Live broadcast
**Status:** ⬜ Not started
**Depends on:** 1.5.1
**Wiki refs:** `docs/wiki/synthesis/processor-ws-contract.md` §Auth handshake; `docs/wiki/entities/directus.md`; `docs/wiki/entities/react-spa.md` §Auth pattern
## Goal
Authenticate WebSocket connections using the Directus-issued cookie attached to the upgrade request. Validate via a single `/users/me` round-trip to Directus; on success, bind the user identity to the `LiveConnection` for its lifetime; on failure, reject the upgrade with an HTTP `401` before the WebSocket handshake completes (a WebSocket close code such as `4401` can only be sent once the upgrade has finished, so it does not apply at this stage).
After this task, anonymous connections are rejected — only Directus-authenticated users can hold an open WebSocket.
## Deliverables
- `src/live/auth.ts` exporting:
- `createAuthClient(config, logger): AuthClient` — factory.
- `AuthClient` interface: `validate(cookieHeader: string): Promise<AuthenticatedUser | null>`.
- `type AuthenticatedUser = { id: string; email: string | null; role: string | null; first_name: string | null; last_name: string | null }`: the minimum fields used by the registry (1.5.3) for authorization decisions. `email` is nullable to match `AuthenticatedUserSchema` in the specification.
- `validate` returns `null` on any failure (network, 401, malformed response). Logs at `warn` with the failure reason.
- `src/live/server.ts` updated:
- `LiveConnection` gains a `user: AuthenticatedUser` field (no longer optional).
- The `'upgrade'` handler validates the cookie *before* calling `wss.handleUpgrade`. On `null`, write a 401 HTTP response on the raw socket and destroy it (this is how `ws` recommends rejecting upgrades cleanly).
- On success, pass the validated user through to the `'connection'` handler via `req[USER_KEY]`.
- New config keys (zod):
- `DIRECTUS_BASE_URL` (default `http://directus:8055`) — where to call `/users/me`.
- `DIRECTUS_AUTH_TIMEOUT_MS` (default `5_000`).
- New Prometheus metrics:
- `processor_live_auth_attempts_total{result}`, where `result` is one of `success` / `unauthorized` / `error`.
- `processor_live_auth_latency_ms` (histogram).
- `test/live-auth.test.ts`:
- With a mocked Directus returning 200 + a user payload, `validate` returns the parsed user.
- With 401, returns `null` and increments `unauthorized` counter.
- With a network error, returns `null` and increments `error` counter (does not throw).
- With a 200 but malformed payload (no `id` field), returns `null` and logs at `warn`.
- The HTTP timeout is enforced (`AbortController` after `DIRECTUS_AUTH_TIMEOUT_MS`).
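The `req[USER_KEY]` hand-off listed in the deliverables can be sketched with a module-private symbol, which avoids colliding with any property Node or `ws` might set on the request. This is a minimal illustration, not the landed code: `attachUser` and `takeUser` are hypothetical names, and the `AuthenticatedUser` type is restated here (mirroring the zod schema in the specification) only to keep the block self-contained.

```typescript
// Restated for self-containment; mirrors AuthenticatedUserSchema.
type AuthenticatedUser = {
  id: string;
  email: string | null;
  role: string | null;
  first_name: string | null;
  last_name: string | null;
};

// Module-private key: only code that imports USER_KEY can read the user.
const USER_KEY = Symbol('live.user');

type UserCarrier = { [USER_KEY]?: AuthenticatedUser };

// Hypothetical helper: called in the 'upgrade' handler after validation.
function attachUser(req: object, user: AuthenticatedUser): void {
  (req as UserCarrier)[USER_KEY] = user;
}

// Hypothetical helper: called in the 'connection' handler.
// Throws if the upgrade path was bypassed, so `user` is never undefined.
function takeUser(req: object): AuthenticatedUser {
  const user = (req as UserCarrier)[USER_KEY];
  if (!user) {
    throw new Error('connection reached handler without an authenticated user');
  }
  return user;
}
```

Throwing in `takeUser` keeps `LiveConnection.user` non-optional without a non-null assertion at the call site.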
## Specification
### Cookie extraction
The browser attaches whatever cookies were set on the SPA's origin. Directus's refresh cookie is named `directus_refresh_token` by default. On REST calls the session is identified server-side via the access token in the `Authorization` header, but a browser WebSocket upgrade carries no `Authorization` header, so we forward the cookie and let Directus handle session lookup.
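```ts
import type { IncomingMessage } from 'node:http';

// Returns the raw Cookie header, or null when the request carries none.
function extractCookieHeader(req: IncomingMessage): string | null {
  return req.headers.cookie ?? null;
}
```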
If the header is missing entirely, fail fast — no point calling Directus.
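When a rejection needs debugging, it can help to log which cookies were present without logging their values. A small sketch, assuming a hypothetical helper `cookieNames` that is not part of the deliverables:

```typescript
// Hypothetical helper (not in the spec): extract cookie names from a raw
// Cookie header so they can be logged without leaking token values.
function cookieNames(cookieHeader: string | null | undefined): string[] {
  if (!cookieHeader) return [];
  return cookieHeader
    .split(';')
    .map((pair) => {
      const eq = pair.indexOf('=');
      // A pair without '=' is treated as a bare name.
      return (eq === -1 ? pair : pair.slice(0, eq)).trim();
    })
    .filter((name) => name.length > 0);
}
```

For example, `cookieNames(req.headers.cookie)` could be added to the `warn` log context on a failed upgrade.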
### `/users/me` call
```ts
async function validate(cookieHeader: string): Promise<AuthenticatedUser | null> {
  if (!cookieHeader) return null;

  // Abort the request if Directus does not answer within the configured timeout.
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), config.DIRECTUS_AUTH_TIMEOUT_MS);
  const start = performance.now();
  try {
    const res = await fetch(
      `${config.DIRECTUS_BASE_URL}/users/me?fields=id,email,role,first_name,last_name`,
      {
        method: 'GET',
        headers: { cookie: cookieHeader },
        signal: controller.signal,
      },
    );
    if (res.status === 401 || res.status === 403) {
      metrics.authAttempts.inc({ result: 'unauthorized' });
      return null;
    }
    if (!res.ok) {
      logger.warn({ status: res.status }, 'directus auth call returned non-2xx');
      metrics.authAttempts.inc({ result: 'error' });
      return null;
    }
    // Directus wraps the payload in `{ data: ... }`.
    const body = await res.json();
    const user = AuthenticatedUserSchema.safeParse(body.data);
    if (!user.success) {
      logger.warn({ issues: user.error.issues }, 'directus /users/me returned unexpected shape');
      metrics.authAttempts.inc({ result: 'error' });
      return null;
    }
    metrics.authAttempts.inc({ result: 'success' });
    return user.data;
  } catch (err) {
    // Network failures, aborts, and JSON parse errors all land here.
    if ((err as Error).name === 'AbortError') {
      logger.warn('directus auth call timed out');
    } else {
      logger.warn({ err }, 'directus auth call failed');
    }
    metrics.authAttempts.inc({ result: 'error' });
    return null;
  } finally {
    clearTimeout(timer);
    metrics.authLatency.observe(performance.now() - start);
  }
}
```
Notes:
- **Field projection** (`?fields=...`) keeps the response small. The full user record has dozens of fields we don't need.
- **Forward the entire cookie header.** Directus may rotate the refresh cookie on this call (it shouldn't on `/users/me`, but be liberal); we ignore any `Set-Cookie` in the response — it's not our cookie to manage.
- **No retries.** A failed validation rejects the upgrade immediately. The SPA will reconnect, which gives a natural retry. Don't add server-side retry logic: it masks bugs and slows down the bad-credential case.
### Rejecting the upgrade
`ws` lets you reject by writing directly to the raw socket before `handleUpgrade`:
```ts
httpServer.on('upgrade', async (req, socket, head) => {
  // Listen for socket errors while the async auth call is in flight; an
  // unhandled 'error' event would crash the process (pattern from the `ws` README).
  const onSocketError = (err: Error) => logger.warn({ err }, 'socket error during upgrade auth');
  socket.on('error', onSocketError);

  const cookie = extractCookieHeader(req);
  const user = cookie ? await authClient.validate(cookie) : null;
  if (!user) {
    socket.write(
      'HTTP/1.1 401 Unauthorized\r\n' +
        'Content-Length: 0\r\n' +
        'Connection: close\r\n' +
        '\r\n'
    );
    socket.destroy();
    return;
  }
  socket.removeListener('error', onSocketError);

  // Stash the user on the request object so the connection handler can pick it up.
  (req as IncomingMessage & { user: AuthenticatedUser }).user = user;
  wss.handleUpgrade(req, socket, head, (ws) => {
    wss.emit('connection', ws, req);
  });
});

wss.on('connection', (ws, req: IncomingMessage & { user: AuthenticatedUser }) => {
  const conn: LiveConnection = {
    id: nanoid(),
    ws,
    remoteAddr: req.socket.remoteAddress ?? 'unknown',
    openedAt: new Date(),
    lastSeenAt: new Date(),
    user: req.user,
  };
  // ... rest of connection setup
});
```
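The string written to the raw socket has to be a well-formed HTTP/1.1 message: a status line, headers, and a terminating blank line, each ending in CRLF. A sketch that keeps that formatting in one place, assuming a hypothetical `rejectionResponse` helper (it is not part of the `ws` API or of the deliverables):

```typescript
// Reason phrases for the statuses an upgrade rejection might plausibly use.
const REASONS: Record<number, string> = {
  401: 'Unauthorized',
  403: 'Forbidden',
  503: 'Service Unavailable',
};

// Hypothetical helper: builds the raw HTTP response for a rejected upgrade.
function rejectionResponse(status: number): string {
  const reason = REASONS[status] ?? 'Error';
  return [
    `HTTP/1.1 ${status} ${reason}`,
    'Content-Length: 0',
    'Connection: close',
    '', // blank line that terminates the header block
    '', // ensures the message ends with CRLF CRLF
  ].join('\r\n');
}
```

With this in place the handler body reduces to `socket.write(rejectionResponse(401)); socket.destroy();`.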
### What `AuthenticatedUser` does and doesn't include
Include only fields the registry (1.5.3) and Phase 4 permissions will need:
```ts
const AuthenticatedUserSchema = z.object({
  id: z.string().uuid(),
  email: z.string().email().nullable(),
  role: z.string().uuid().nullable(), // Directus role id, not the `organization_users.role` enum
  first_name: z.string().nullable(),
  last_name: z.string().nullable(),
});
```
Don't pull in `directus_users` extension fields or anything specific to the TRM domain — those are queried per-subscription, not per-connection.
### What we don't do (deferred)
- **No JWT validation locally.** The simplest path is the round-trip; cache only if the round-trip becomes a bottleneck (it won't at pilot scale).
- **No refresh handling.** The cookie's lifetime is the SPA's problem. If it expires mid-connection, server-side state is unaffected; the SPA will reconnect (which re-validates).
- **No revocation re-checks.** A user removed from the database mid-session keeps their WebSocket until they disconnect or the server restarts. Phase 4 hardening can add periodic re-validation if needed.
## Acceptance criteria
- [ ] `pnpm typecheck`, `pnpm lint`, `pnpm test` clean.
- [ ] Connecting without a cookie returns HTTP 401 (visible in `wscat`'s output as a rejected connection with that status code).
- [ ] Connecting with a stale/invalid cookie returns HTTP 401.
- [ ] Connecting with a valid cookie (obtained via Directus's `/auth/login` with `mode: cookie`) succeeds; the connection is logged with the user id.
- [ ] `processor_live_auth_attempts_total{result="success"}` increments on a successful upgrade.
- [ ] Auth latency p95 < 100ms against a stage-realistic Directus (single `/users/me` call against a warm DB).
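To check the p95 criterion from a batch of measured latencies (for example, timings collected by a script that connects repeatedly), a nearest-rank percentile is sufficient. The sketch below uses a hypothetical `percentile` helper, and the sample values are made up purely for illustration:

```typescript
// Hypothetical helper: nearest-rank percentile over latency samples (ms).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  if (p <= 0 || p > 1) throw new Error('p must be in (0, 1]');
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil(p * sorted.length); // 1-based nearest rank
  const value = sorted[rank - 1];
  if (value === undefined) throw new Error('rank out of range');
  return value;
}

// Illustrative numbers only, not measurements from any real environment.
const latenciesMs = [12, 18, 25, 31, 40, 44, 52, 61, 75, 140];
const meetsTarget = percentile(latenciesMs, 0.95) < 100;
```

With only ten samples the p95 is simply the slowest one, so a single outlier fails the check; a realistic run would collect far more samples.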
## Risks / open questions
- **Directus base URL in dev vs stage vs prod.** In dev the SPA might run via Vite proxy at `localhost:5173`, with Directus at `localhost:8055`. The Processor's `DIRECTUS_BASE_URL` should always be the *internal* Compose-network URL (`http://directus:8055`) — that's the path with the lowest latency and no proxy hops. Document this in `.env.example`.
- **Cookie scope.** Directus issues the refresh cookie scoped to the public domain (e.g. `Domain=stage.trmtracking.org`). The Processor receives the same cookie because the upgrade request hits the same origin (proxy fronts both). Verify this works end-to-end during the integration test (1.5.6).
- **What if `/users/me` returns 200 with `data: null`?** Directus does this when the cookie is well-formed but the session is expired. Treat as `null` user (return `null`, log at `warn`).
## Done
Landed in `190254d`. Key deviations from spec:
- Added distinction between `data: null` (unauthorized / expired session) and missing `data` key (error / malformed response) — the task spec only mentioned `data: null` but the missing-key case is equally important.
- `authClient` is an optional parameter to `createLiveServer` (not required) so the existing unit tests that don't need auth work unchanged.
- Used the `satisfies` operator to type-check the anonymous user placeholder passed on the no-auth code path.
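The first deviation can be pictured as a small classification step over the parsed JSON body. This is a sketch under the assumption that the landed code behaves as described above; the name `classifyBody` is hypothetical, the labels reuse the metric's `result` values, and the actual implementation in `190254d` may differ.

```typescript
type AuthResult = 'success' | 'unauthorized' | 'error';

// Hypothetical helper: decides which `result` label a 200 response earns.
function classifyBody(body: unknown): AuthResult {
  if (typeof body !== 'object' || body === null || !('data' in body)) {
    return 'error'; // missing `data` key: malformed response
  }
  const data = (body as { data: unknown }).data;
  if (data === null) {
    return 'unauthorized'; // well-formed cookie, expired session
  }
  // A real implementation would run the full schema check here; this sketch
  // only confirms that `data` is an object carrying a string `id`.
  if (typeof data === 'object' && typeof (data as { id?: unknown }).id === 'string') {
    return 'success';
  }
  return 'error';
}
```

Keeping the two cases apart means an expired session shows up under `unauthorized` in `processor_live_auth_attempts_total` rather than inflating the `error` count.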