Task 1.5.2 — Cookie auth handshake
Phase: 1.5 — Live broadcast
Status: ⬜ Not started
Depends on: 1.5.1
Wiki refs: docs/wiki/synthesis/processor-ws-contract.md §Auth handshake; docs/wiki/entities/directus.md; docs/wiki/entities/react-spa.md §Auth pattern
Goal
Authenticate WebSocket connections using the Directus-issued cookie attached to the upgrade request. Validate via a single /users/me round-trip to Directus; on success, bind the user identity to the LiveConnection for its lifetime; on failure, reject the upgrade with an HTTP 401 response before the handshake completes (a WebSocket close code such as 4401 cannot be sent until the upgrade has finished).
After this task, anonymous connections are rejected — only Directus-authenticated users can hold an open WebSocket.
Deliverables
- src/live/auth.ts exporting:
  - createAuthClient(config, logger): AuthClient (factory).
  - AuthClient interface: validate(cookieHeader: string): Promise<AuthenticatedUser | null>.
  - type AuthenticatedUser = { id: string; email: string; role: string | null; first_name: string | null; last_name: string | null }: minimum fields used by the registry (1.5.3) for authorization decisions.
  - validate returns null on any failure (network, 401, malformed response). Logs at warn with the failure reason.
- src/live/server.ts updated:
  - LiveConnection gains a user: AuthenticatedUser field (no longer optional).
  - The 'upgrade' handler validates the cookie before calling wss.handleUpgrade. On null, write a 401 HTTP response on the raw socket and destroy it (this is how ws recommends rejecting upgrades cleanly).
  - On success, pass the validated user through to the 'connection' handler via req[USER_KEY].
- New config keys (zod):
  - DIRECTUS_BASE_URL (default http://directus:8055): where to call /users/me.
  - DIRECTUS_AUTH_TIMEOUT_MS (default 5_000).
- New Prometheus metrics:
  - processor_live_auth_attempts_total{result}: success / unauthorized / error.
  - processor_live_auth_latency_ms (histogram).
- test/live-auth.test.ts:
  - With a mocked Directus returning 200 + a user payload, validate returns the parsed user.
  - With 401, returns null and increments unauthorized counter.
  - With a network error, returns null and increments error counter (does not throw).
  - With a 200 but malformed payload (no id field), returns null and logs at warn.
  - The HTTP timeout is enforced (AbortController after DIRECTUS_AUTH_TIMEOUT_MS).
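The two new config keys could be declared roughly as follows. This is only a sketch: the schema name LiveAuthConfigSchema and the choice to parse process.env directly are assumptions, and the project's existing zod config module may be organised differently.

```typescript
import { z } from 'zod';

// Hypothetical schema name; only the two keys and their defaults come from the spec.
const LiveAuthConfigSchema = z.object({
  // Internal URL the Processor uses to reach Directus for /users/me.
  DIRECTUS_BASE_URL: z.string().url().default('http://directus:8055'),
  // Environment variables arrive as strings, so coerce before validating.
  DIRECTUS_AUTH_TIMEOUT_MS: z.coerce.number().int().positive().default(5_000),
});

type LiveAuthConfig = z.infer<typeof LiveAuthConfigSchema>;

// Unknown keys in process.env are stripped by z.object's default behaviour.
const config: LiveAuthConfig = LiveAuthConfigSchema.parse(process.env);
```

Coercing the timeout means a value such as DIRECTUS_AUTH_TIMEOUT_MS=2500 in .env is accepted, while a non-numeric value fails at startup rather than at the first auth attempt.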
Specification
Cookie extraction
The browser attaches whatever cookies were set on the SPA's origin. By default, Directus names its refresh cookie directus_refresh_token. On REST calls the session is normally identified by the access token in the Authorization header, but a WebSocket upgrade carries no Authorization header, so we forward the cookie and let Directus handle session lookup.
function extractCookieHeader(req: IncomingMessage): string | null {
  return req.headers.cookie ?? null;
}
If the header is missing entirely, fail fast — no point calling Directus.
/users/me call
async function validate(cookieHeader: string): Promise<AuthenticatedUser | null> {
  if (!cookieHeader) return null;

  // Abort the request if Directus does not answer within the configured timeout.
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), config.DIRECTUS_AUTH_TIMEOUT_MS);
  const start = performance.now();

  try {
    const res = await fetch(`${config.DIRECTUS_BASE_URL}/users/me?fields=id,email,role,first_name,last_name`, {
      method: 'GET',
      headers: { cookie: cookieHeader },
      signal: controller.signal,
    });

    // Bad or expired credentials: expected, so no warn log.
    if (res.status === 401 || res.status === 403) {
      metrics.authAttempts.inc({ result: 'unauthorized' });
      return null;
    }

    // Any other non-2xx is a Directus-side problem.
    if (!res.ok) {
      logger.warn({ status: res.status }, 'directus auth call returned non-2xx');
      metrics.authAttempts.inc({ result: 'error' });
      return null;
    }

    const body = await res.json();
    const user = AuthenticatedUserSchema.safeParse(body.data);
    if (!user.success) {
      logger.warn({ issues: user.error.issues }, 'directus /users/me returned unexpected shape');
      metrics.authAttempts.inc({ result: 'error' });
      return null;
    }

    metrics.authAttempts.inc({ result: 'success' });
    return user.data;
  } catch (err) {
    // Network failures and timeouts land here; validate never throws.
    if ((err as Error).name === 'AbortError') {
      logger.warn('directus auth call timed out');
    } else {
      logger.warn({ err }, 'directus auth call failed');
    }
    metrics.authAttempts.inc({ result: 'error' });
    return null;
  } finally {
    clearTimeout(timer);
    metrics.authLatency.observe(performance.now() - start);
  }
}
Notes:
- Field projection (?fields=...) keeps the response small. The full user record has dozens of fields we don't need.
- Forward the entire cookie header. Directus may rotate the refresh cookie on this call (it shouldn't on /users/me, but be liberal); we ignore any Set-Cookie in the response, since it's not our cookie to manage.
- No retries. A failed validation immediately closes the upgrade. The SPA will reconnect, which gives a natural retry. Don't add server-side retry logic: it masks bugs and slows down the bad-credential case.
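The cases that test/live-auth.test.ts has to cover follow from the branch order in validate. The sketch below shows one dependency-free way to fake the transport and check the three counter labels. FetchLike, ResponseLike, classifyAuth, respondWith and networkDown are hypothetical names used for illustration; the real tests would exercise createAuthClient with whatever mocking the test runner provides.

```typescript
// Dependency-free stand-ins; every name here is hypothetical.
type AuthResult = 'success' | 'unauthorized' | 'error';

interface ResponseLike {
  status: number;
  ok: boolean;
  json(): Promise<unknown>;
}

type FetchLike = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<ResponseLike>;

// Mirrors the branch order of validate(): 401/403 first, then any other
// non-2xx, then the payload shape. Thrown errors are folded into 'error'.
async function classifyAuth(fetchImpl: FetchLike, cookieHeader: string): Promise<AuthResult> {
  try {
    const res = await fetchImpl('http://directus:8055/users/me', {
      headers: { cookie: cookieHeader },
    });
    if (res.status === 401 || res.status === 403) return 'unauthorized';
    if (!res.ok) return 'error';
    const body = (await res.json()) as { data?: { id?: unknown } | null } | null;
    return typeof body?.data?.id === 'string' ? 'success' : 'error';
  } catch {
    return 'error';
  }
}

// Fake transport that always answers with the given status and JSON body.
const respondWith = (status: number, body: unknown): FetchLike =>
  async () => ({
    status,
    ok: status >= 200 && status < 300,
    json: async () => body,
  });

// Fake transport that simulates a connection failure.
const networkDown: FetchLike = async () => {
  throw new Error('connect ECONNREFUSED');
};
```

For example, classifyAuth(respondWith(401, {}), 'a=b') resolves to 'unauthorized', and classifyAuth(networkDown, 'a=b') resolves to 'error' instead of rejecting, which is the "does not throw" requirement from the deliverables.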
Rejecting the upgrade
ws lets you reject by writing directly to the raw socket before handleUpgrade:
httpServer.on('upgrade', async (req, socket, head) => {
  const cookie = extractCookieHeader(req);
  const user = cookie ? await authClient.validate(cookie) : null;
  if (!user) {
    socket.write(
      'HTTP/1.1 401 Unauthorized\r\n' +
      'Content-Length: 0\r\n' +
      'Connection: close\r\n' +
      '\r\n'
    );
    socket.destroy();
    return;
  }
  // Stash the user on the request object so the connection handler can pick it up.
  (req as IncomingMessage & { user: AuthenticatedUser }).user = user;
  wss.handleUpgrade(req, socket, head, (ws) => {
    wss.emit('connection', ws, req);
  });
});

wss.on('connection', (ws, req: IncomingMessage & { user: AuthenticatedUser }) => {
  const conn: LiveConnection = {
    id: nanoid(),
    ws,
    remoteAddr: req.socket.remoteAddress ?? 'unknown',
    openedAt: new Date(),
    lastSeenAt: new Date(),
    user: req.user,
  };
  // ... rest of connection setup
});
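The deliverables refer to req[USER_KEY], whereas the handler above stashes the user on a plain user property. One way to read USER_KEY is as a module-private symbol, sketched here under that assumption. attachUser, readUser and the symbol description are hypothetical names, and the request is typed as a plain object so the example does not depend on Node's IncomingMessage.

```typescript
type AuthenticatedUser = {
  id: string;
  email: string;
  role: string | null;
  first_name: string | null;
  last_name: string | null;
};

// Hypothetical symbol key: it cannot collide with string properties that
// Node or other code may already set on the request object.
const USER_KEY: unique symbol = Symbol('live.authenticatedUser');

type WithUser = { [USER_KEY]?: AuthenticatedUser };

// Called in the 'upgrade' handler after validate() succeeds.
function attachUser(req: object, user: AuthenticatedUser): void {
  (req as WithUser)[USER_KEY] = user;
}

// Called in the 'connection' handler; null means the upgrade path never
// attached a user.
function readUser(req: object): AuthenticatedUser | null {
  return (req as WithUser)[USER_KEY] ?? null;
}
```

Symbol-keyed properties are skipped by Object.keys and JSON.stringify, so the user record does not leak into anything that serialises or enumerates the request.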
What AuthenticatedUser does and doesn't include
Include only fields the registry (1.5.3) and Phase 4 permissions will need:
const AuthenticatedUserSchema = z.object({
  id: z.string().uuid(),
  email: z.string().email().nullable(),
  role: z.string().uuid().nullable(), // Directus role id, not the `organization_users.role` enum
  first_name: z.string().nullable(),
  last_name: z.string().nullable(),
});
Don't pull in directus_users extension fields or anything specific to the TRM domain — those are queried per-subscription, not per-connection.
What we don't do (deferred)
- No JWT validation locally. The simplest path is the round-trip; cache only if the round-trip becomes a bottleneck (it won't at pilot scale).
- No refresh handling. The cookie's lifetime is the SPA's problem. If it expires mid-connection, server-side state is unaffected; the SPA will reconnect (which re-validates).
- No revocation re-checks. A user removed from the database mid-session keeps their WebSocket until they disconnect or the server restarts. Phase 4 hardening can add periodic re-validation if needed.
Acceptance criteria
- pnpm typecheck, pnpm lint, pnpm test clean.
- Connecting without a cookie returns HTTP 401 (visible in wscat's output as a connection rejection with status code).
- Connecting with a stale/invalid cookie returns HTTP 401.
- Connecting with a valid cookie (obtained via Directus's /auth/login with mode: cookie) succeeds; the connection is logged with the user id.
- processor_live_auth_attempts_total{result="success"} increments on a successful upgrade.
- Auth latency p95 < 100ms against a stage-realistic Directus (single /users/me call against a warm DB).
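For an ad-hoc check of the latency criterion, a p95 can be computed from raw samples with the nearest-rank method, sketched below. The function names and the idea of collecting samples by hand are assumptions; in a deployed environment the figure would more naturally come from the processor_live_auth_latency_ms histogram.

```typescript
// Nearest-rank percentile over raw latency samples in milliseconds.
// `percent` is an integer from 1 to 100 so the rank arithmetic stays exact.
function percentile(samplesMs: number[], percent: number): number {
  if (samplesMs.length === 0) throw new Error('no samples');
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((percent * sorted.length) / 100); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

// True when the 95th percentile is strictly under the budget.
function meetsP95Budget(samplesMs: number[], budgetMs: number): boolean {
  return percentile(samplesMs, 95) < budgetMs;
}
```

With 20 samples the rank is ceil(95 × 20 / 100) = 19, so the p95 is the 19th smallest value and a single slow outlier does not fail the check.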
Risks / open questions
- Directus base URL in dev vs stage vs prod. In dev the SPA might run via Vite proxy at localhost:5173, with Directus at localhost:8055. The Processor's DIRECTUS_BASE_URL should always be the internal Compose-network URL (http://directus:8055), which is the path with the lowest latency and no proxy hops. Document this in .env.example.
- Cookie scope. Directus issues the refresh cookie scoped to the public domain (e.g. Domain=stage.trmtracking.org). The Processor receives the same cookie because the upgrade request hits the same origin (proxy fronts both). Verify this works end-to-end during the integration test (1.5.6).
- What if /users/me returns 200 with data: null? Directus does this when the cookie is well-formed but the session is expired. Treat as null user (return null, log at warn).
Done
Landed in 190254d. Key deviations from spec:
- Added distinction between data: null (unauthorized / expired session) and missing data key (error / malformed response). The task spec only mentioned data: null but the missing-key case is equally important.
- authClient is an optional parameter to createLiveServer (not required) so the existing unit tests that don't need auth work unchanged.
- Used the satisfies operator to pass the anonymous user placeholder at the no-auth code path for type safety.
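The first deviation can be illustrated with a small classifier over the parsed response body. This is a sketch of the distinction only, not the code that landed in the commit: classifyPayload and PayloadOutcome are hypothetical names, and the id check stands in for the full AuthenticatedUserSchema validation.

```typescript
type PayloadOutcome =
  | { result: 'success'; userId: string }
  | { result: 'unauthorized' } // data: null, i.e. expired session
  | { result: 'error' }; // missing data key or unexpected shape

function classifyPayload(body: unknown): PayloadOutcome {
  // A body without a data key at all is a malformed response.
  if (typeof body !== 'object' || body === null || !('data' in body)) {
    return { result: 'error' };
  }
  const data = (body as { data: unknown }).data;
  // An explicit null means Directus recognised the cookie but found no session.
  if (data === null) return { result: 'unauthorized' };
  // Stand-in for the schema check: require at least a string id.
  if (typeof data === 'object' && typeof (data as { id?: unknown }).id === 'string') {
    return { result: 'success', userId: (data as { id: string }).id };
  }
  return { result: 'error' };
}
```

Keeping the two failure cases apart means an expired session is counted under the unauthorized label, while a malformed response still shows up under error where it can be alerted on.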