Task 1.5.2 — Cookie auth handshake

Phase: 1.5 — Live broadcast
Status: Not started
Depends on: 1.5.1
Wiki refs: docs/wiki/synthesis/processor-ws-contract.md §Auth handshake; docs/wiki/entities/directus.md; docs/wiki/entities/react-spa.md §Auth pattern

Goal

Authenticate WebSocket connections using the Directus-issued cookie attached to the upgrade request. Validate via a single /users/me round-trip to Directus; on success, bind the user identity to the LiveConnection for its lifetime; on failure, reject the upgrade with HTTP 401 before the handshake completes (a WebSocket close code such as 4401 can only be sent once a connection is open, so it does not apply at this stage).

After this task, anonymous connections are rejected — only Directus-authenticated users can hold an open WebSocket.

Deliverables

  • src/live/auth.ts exporting:
    • createAuthClient(config, logger): AuthClient — factory.
    • AuthClient interface: validate(cookieHeader: string): Promise<AuthenticatedUser | null>.
    • type AuthenticatedUser = { id: string; email: string | null; role: string | null; first_name: string | null; last_name: string | null }: the minimum fields used by the registry (1.5.3) for authorization decisions. email is nullable, matching AuthenticatedUserSchema.
    • validate returns null on any failure (network, 401, malformed response). Logs at warn with the failure reason.
  • src/live/server.ts updated:
    • LiveConnection gains a user: AuthenticatedUser field (no longer optional).
    • The 'upgrade' handler validates the cookie before calling wss.handleUpgrade. On null, write a 401 HTTP response on the raw socket and destroy it (this is how ws recommends rejecting upgrades cleanly).
    • On success, pass the validated user through to the 'connection' handler on the request object via req[USER_KEY] (the Specification sketch uses a plain req.user property for brevity).
  • New config keys (zod):
    • DIRECTUS_BASE_URL (default http://directus:8055) — where to call /users/me.
    • DIRECTUS_AUTH_TIMEOUT_MS (default 5_000).
  • New Prometheus metrics:
    • processor_live_auth_attempts_total{result} (counter), where result is one of success / unauthorized / error.
    • processor_live_auth_latency_ms (histogram).
  • test/live-auth.test.ts:
    • With a mocked Directus returning 200 + a user payload, validate returns the parsed user.
    • With 401, returns null and increments unauthorized counter.
    • With a network error, returns null and increments error counter (does not throw).
    • With a 200 but malformed payload (no id field), returns null and logs at warn.
    • The HTTP timeout is enforced (AbortController after DIRECTUS_AUTH_TIMEOUT_MS).
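The two config keys and their defaults can be sketched without any dependencies. This is a plain-TypeScript stand-in for the zod schema the deliverables call for, not the schema itself; loadAuthConfig and AuthConfig are hypothetical names, and the trailing-slash normalization is an assumption added for illustration.

```typescript
// Plain-TypeScript stand-in for the zod config schema (hypothetical helper).
interface AuthConfig {
  DIRECTUS_BASE_URL: string;
  DIRECTUS_AUTH_TIMEOUT_MS: number;
}

function loadAuthConfig(env: Record<string, string | undefined>): AuthConfig {
  const baseUrl = env.DIRECTUS_BASE_URL ?? 'http://directus:8055';
  // new URL() throws a TypeError on a malformed URL, so misconfiguration
  // surfaces at startup rather than on the first upgrade request.
  new URL(baseUrl);

  // Env values are strings; the 5_000 default from the spec is written as plain digits here.
  const rawTimeout = env.DIRECTUS_AUTH_TIMEOUT_MS ?? '5000';
  const timeoutMs = Number(rawTimeout);
  if (!Number.isInteger(timeoutMs) || timeoutMs <= 0) {
    throw new Error(`DIRECTUS_AUTH_TIMEOUT_MS must be a positive integer, got "${rawTimeout}"`);
  }

  return {
    // Assumption: strip trailing slashes so `${base}/users/me` never contains "//".
    DIRECTUS_BASE_URL: baseUrl.replace(/\/+$/, ''),
    DIRECTUS_AUTH_TIMEOUT_MS: timeoutMs,
  };
}
```

Calling loadAuthConfig(process.env) once at startup would give the auth client a validated config object; the real implementation would get the same guarantees from zod's parse.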

Specification

The browser attaches whatever cookies were set on the SPA's origin. By default, Directus names its refresh cookie directus_refresh_token. On REST calls the session is identified server-side via the access token in the Authorization header, but a browser cannot set an Authorization header on a WebSocket upgrade, so we forward the cookie and let Directus handle session lookup.

function extractCookieHeader(req: IncomingMessage): string | null {
  return req.headers.cookie ?? null;
}

If the header is missing entirely, fail fast — no point calling Directus.
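The fail-fast check could optionally be made stricter by also skipping the Directus call when the expected cookie name is absent. The sketch below is an assumption, not part of the spec: cookieNames and shouldCallDirectus are hypothetical helpers, and the default name directus_refresh_token only holds if the deployment has not renamed the cookie. The full header is still what gets forwarded.

```typescript
// Hypothetical helper: collect the names of all cookies in a Cookie request header.
function cookieNames(cookieHeader: string): Set<string> {
  const names = new Set<string>();
  for (const pair of cookieHeader.split(';')) {
    const eq = pair.indexOf('=');
    // Segments without a name before '=' are not usable name=value pairs; skip them.
    if (eq <= 0) continue;
    const name = pair.slice(0, eq).trim();
    if (name) names.add(name);
  }
  return names;
}

// Hypothetical pre-check: true only when the header exists and carries the expected cookie.
// Assumption: the cookie name matches the Directus default.
function shouldCallDirectus(
  cookieHeader: string | null | undefined,
  expectedName = 'directus_refresh_token',
): boolean {
  if (!cookieHeader) return false;
  return cookieNames(cookieHeader).has(expectedName);
}
```

The trade-off is that a stricter check couples the Processor to the cookie name; checking only for the header's presence, as the spec does, avoids that coupling at the cost of a few wasted round-trips.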

/users/me call

async function validate(cookieHeader: string): Promise<AuthenticatedUser | null> {
  if (!cookieHeader) return null;

  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), config.DIRECTUS_AUTH_TIMEOUT_MS);

  const start = performance.now();
  try {
    const res = await fetch(`${config.DIRECTUS_BASE_URL}/users/me?fields=id,email,role,first_name,last_name`, {
      method: 'GET',
      headers: { cookie: cookieHeader },
      signal: controller.signal,
    });

    if (res.status === 401 || res.status === 403) {
      metrics.authAttempts.inc({ result: 'unauthorized' });
      return null;
    }
    if (!res.ok) {
      logger.warn({ status: res.status }, 'directus auth call returned non-2xx');
      metrics.authAttempts.inc({ result: 'error' });
      return null;
    }

    const body = await res.json();
    const user = AuthenticatedUserSchema.safeParse(body.data);
    if (!user.success) {
      logger.warn({ issues: user.error.issues }, 'directus /users/me returned unexpected shape');
      metrics.authAttempts.inc({ result: 'error' });
      return null;
    }

    metrics.authAttempts.inc({ result: 'success' });
    return user.data;
  } catch (err) {
    if ((err as Error).name === 'AbortError') {
      logger.warn('directus auth call timed out');
    } else {
      logger.warn({ err }, 'directus auth call failed');
    }
    metrics.authAttempts.inc({ result: 'error' });
    return null;
  } finally {
    clearTimeout(timer);
    metrics.authLatency.observe(performance.now() - start);
  }
}

Notes:

  • Field projection (?fields=...) keeps the response small. The full user record has dozens of fields we don't need.
  • Forward the entire cookie header. Directus may rotate the refresh cookie on this call (it shouldn't on /users/me, but be liberal); we ignore any Set-Cookie in the response — it's not our cookie to manage.
  • No retries. A failed validation immediately rejects the upgrade. The SPA will reconnect, which gives a natural retry. Don't add server-side retry logic: it masks bugs and slows down the bad-credential case.
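The field projection from the first note can be built with the standard URL API instead of string interpolation. This is a minimal sketch; buildUsersMeUrl and USER_FIELDS are hypothetical names, and it assumes Directus is served at the root path of the base URL (a path prefix on the base would be dropped by the absolute '/users/me').

```typescript
// The same five fields requested in the validate() sketch.
const USER_FIELDS = ['id', 'email', 'role', 'first_name', 'last_name'] as const;

// Hypothetical helper: build the projected /users/me URL.
function buildUsersMeUrl(baseUrl: string, fields: readonly string[] = USER_FIELDS): string {
  // Assumption: Directus lives at the root of baseUrl.
  const url = new URL('/users/me', baseUrl);
  // searchParams handles escaping; the comma separators may be percent-encoded
  // in the serialized URL, which the server decodes back to commas.
  url.searchParams.set('fields', fields.join(','));
  return url.toString();
}
```

Keeping the field list in one constant also makes it easy to check that the projection and AuthenticatedUserSchema stay in sync.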

Rejecting the upgrade

ws lets you reject by writing directly to the raw socket before handleUpgrade:

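httpServer.on('upgrade', async (req, socket, head) => {
  // Guard the raw socket while the async auth call is in flight: an 'error'
  // event with no listener would crash the process.
  const onSocketError = (err: Error) => logger.warn({ err }, 'socket error during upgrade auth');
  socket.on('error', onSocketError);

  const cookie = extractCookieHeader(req);
  const user = cookie ? await authClient.validate(cookie) : null;

  if (!user) {
    socket.write(
      'HTTP/1.1 401 Unauthorized\r\n' +
      'Content-Length: 0\r\n' +
      'Connection: close\r\n' +
      '\r\n'
    );
    socket.destroy();
    return;
  }

  // Auth is done; ws installs its own socket handlers from here on.
  socket.removeListener('error', onSocketError);

  // Stash the user on the request object so the connection handler can pick it up.
  (req as IncomingMessage & { user: AuthenticatedUser }).user = user;

  wss.handleUpgrade(req, socket, head, (ws) => {
    wss.emit('connection', ws, req);
  });
});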

wss.on('connection', (ws, req: IncomingMessage & { user: AuthenticatedUser }) => {
  const conn: LiveConnection = {
    id: nanoid(),
    ws,
    remoteAddr: req.socket.remoteAddress ?? 'unknown',
    openedAt: new Date(),
    lastSeenAt: new Date(),
    user: req.user,
  };
  // ... rest of connection setup
});
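The deliverables mention passing the user via req[USER_KEY] rather than a string property. One way that could look is a symbol-keyed stash, sketched below under stated assumptions: attachUser, requireUser, and RequestLike are hypothetical names, and RequestLike stands in for IncomingMessage so the snippet has no imports.

```typescript
interface AuthenticatedUser {
  id: string;
  email: string | null;
  role: string | null;
  first_name: string | null;
  last_name: string | null;
}

// Stand-in for http.IncomingMessage, to keep the sketch dependency-free.
type RequestLike = object;

// A module-private symbol cannot collide with any existing or future request property.
const USER_KEY = Symbol('live.authenticatedUser');
type AuthedRequest = RequestLike & { [USER_KEY]?: AuthenticatedUser };

// Called in the 'upgrade' handler after validate() succeeds.
function attachUser(req: RequestLike, user: AuthenticatedUser): void {
  (req as AuthedRequest)[USER_KEY] = user;
}

// Called in the 'connection' handler; throws if the upgrade path was bypassed.
function requireUser(req: RequestLike): AuthenticatedUser {
  const user = (req as AuthedRequest)[USER_KEY];
  if (!user) throw new Error('connection reached the handler without an authenticated user');
  return user;
}
```

Compared with the req.user cast, this keeps the casts in two small functions and turns a missing user into a loud error instead of an undefined field on LiveConnection.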

What AuthenticatedUser does and doesn't include

Include only fields the registry (1.5.3) and Phase 4 permissions will need:

const AuthenticatedUserSchema = z.object({
  id: z.string().uuid(),
  email: z.string().email().nullable(),
  role: z.string().uuid().nullable(),  // Directus role id, not the `organization_users.role` enum
  first_name: z.string().nullable(),
  last_name: z.string().nullable(),
});

Don't pull in directus_users extension fields or anything specific to the TRM domain — those are queried per-subscription, not per-connection.

What we don't do (deferred)

  • No JWT validation locally. The simplest path is the round-trip; cache only if the round-trip becomes a bottleneck (it won't at pilot scale).
  • No refresh handling. The cookie's lifetime is the SPA's problem. If it expires mid-connection, server-side state is unaffected; the SPA will reconnect (which re-validates).
  • No revocation re-checks. A user removed from the database mid-session keeps their WebSocket until they disconnect or the server restarts. Phase 4 hardening can add periodic re-validation if needed.

Acceptance criteria

  • pnpm typecheck, pnpm lint, pnpm test clean.
  • Connecting without a cookie returns HTTP 401 (visible in wscat's output as a connection rejection with status code).
  • Connecting with a stale/invalid cookie returns HTTP 401.
  • Connecting with a valid cookie (obtained via Directus's /auth/login with mode: cookie) succeeds; the connection is logged with the user id.
  • processor_live_auth_attempts_total{result="success"} increments on a successful upgrade.
  • Auth latency p95 < 100ms against a stage-realistic Directus (single /users/me call against a warm DB).
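For an ad-hoc check of the p95 criterion outside Prometheus (for example from latencies collected in a manual test run), a nearest-rank percentile is enough. This is an illustrative sketch; percentile is a hypothetical helper, and production dashboards would presumably derive p95 from the processor_live_auth_latency_ms histogram instead.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samplesMs: readonly number[], p: number): number {
  if (samplesMs.length === 0) throw new Error('percentile of an empty sample set is undefined');
  if (p <= 0 || p > 100) throw new RangeError('p must be in (0, 100]');
  const sorted = [...samplesMs].sort((a, b) => a - b);
  // Multiply before dividing so integer inputs avoid floating-point drift.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1] as number;
}

// Convenience wrapper matching the acceptance criterion (threshold in milliseconds).
function meetsP95(samplesMs: readonly number[], thresholdMs = 100): boolean {
  return percentile(samplesMs, 95) < thresholdMs;
}
```

With few samples the nearest-rank p95 is dominated by the slowest one or two calls, so a meaningful check needs a reasonably large run against a warm database.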

Risks / open questions

  • Directus base URL in dev vs stage vs prod. In dev the SPA might run via Vite proxy at localhost:5173, with Directus at localhost:8055. The Processor's DIRECTUS_BASE_URL should always be the internal Compose-network URL (http://directus:8055) — that's the path with the lowest latency and no proxy hops. Document this in .env.example.
  • Cookie scope. Directus issues the refresh cookie scoped to the public domain (e.g. Domain=stage.trmtracking.org). The Processor receives the same cookie because the upgrade request hits the same origin (proxy fronts both). Verify this works end-to-end during the integration test (1.5.6).
  • What if /users/me returns 200 with data: null? Directus does this when the cookie is well-formed but the session is expired. Treat as null user (return null, log at warn).

Done

Landed in 190254d. Key deviations from spec:

  • Added distinction between data: null (unauthorized / expired session) and missing data key (error / malformed response) — the task spec only mentioned data: null but the missing-key case is equally important.
  • authClient is an optional parameter to createLiveServer (not required) so the existing unit tests that don't need auth work unchanged.
  • Used the satisfies operator to pass the anonymous user placeholder at the no-auth code path for type safety.
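The first deviation, distinguishing data: null from a missing data key, could be expressed as a small classification step ahead of schema parsing. The sketch below is an illustration of that distinction, not the code landed in the commit; classifyUsersMeBody and BodyVerdict are hypothetical names.

```typescript
type BodyVerdict =
  | { kind: 'unauthorized' }               // data: null, treated as an expired or absent session
  | { kind: 'error'; reason: string }      // malformed response
  | { kind: 'candidate'; data: unknown };  // hand off to AuthenticatedUserSchema.safeParse

// Hypothetical helper: classify a parsed 200 response body from /users/me.
function classifyUsersMeBody(body: unknown): BodyVerdict {
  if (typeof body !== 'object' || body === null) {
    return { kind: 'error', reason: 'body is not a JSON object' };
  }
  if (!('data' in body)) {
    return { kind: 'error', reason: 'missing data key' };
  }
  const data = (body as { data: unknown }).data;
  if (data === null) {
    return { kind: 'unauthorized' };
  }
  return { kind: 'candidate', data };
}
```

Each verdict maps onto one label of processor_live_auth_attempts_total: unauthorized and error directly, and candidate to success or error depending on whether the schema parse succeeds.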