Authenticate WebSocket upgrade requests via Directus's /users/me:
- src/live/auth.ts: createAuthClient factory; validate() forwards the raw
Cookie: header to Directus, parses the user with zod, and returns
AuthenticatedUser or null. Handles 401/403 (unauthorized), non-2xx (error),
network failures, AbortError (timeout), null data (expired session), and
missing data key (malformed Directus response).
- src/live/server.ts: upgrade handler now calls authClient.validate() before
completing the WS handshake; on null user, writes HTTP 401 and destroys the
socket. LiveConnection gains user: AuthenticatedUser and cookieHeader: string
(needed for per-subscription authz in task 1.5.3). authClient is an optional
parameter so tests without auth still work.
- src/main.ts: wires createAuthClient and passes it to createLiveServer.
- test/live-auth.test.ts: 11 unit tests covering all validate() code paths
including the empty-cookie fast-path, latency histogram observation, and
distinction between unauthorized (401/expired) and error (malformed) results.
Stage discovered the wrong default at runtime: tcp-ingestion's compiled
default REDIS_TELEMETRY_STREAM is 'telemetry:teltonika' but processor's
was 'telemetry:t', so the two services were talking past each other —
tcp-ingestion publishing to one stream, processor reading another empty
one. The deploy stack now pins both to the same value via a shared env
var, but the processor's compiled default should also match so local
development and the integration test stay aligned with reality.
Changes:
- src/config/load.ts — default changed to 'telemetry:teltonika'
- .env.example — same
- test/config.test.ts — default-value assertion updated
- planning docs (ROADMAP, phase-1 README, tasks 03/08/10, phase-3 README) —
occurrences of 'telemetry:t' replaced with 'telemetry:teltonika'
The deploy stack remains the single source of truth via the shared
REDIS_TELEMETRY_STREAM env var. Compiled defaults are belt-and-braces.
Several Phase 1 metrics were registered in observability/metrics.ts but
either unwired at the call sites or wired with wrong counts. Production
output showed 11 records ingested per logs but only 4 in metrics. The
fixes below align metric values with actual hot-path activity.
Wiring gaps closed (consumer.ts):
- processor_consumer_reads_total{result=ok|empty|error} — was registered
but never inc'd. Now fires on each XREADGROUP outcome.
- processor_consumer_records_total — was registered but never inc'd.
Now fires once per XREADGROUP, with the entry count.
Counts corrected (writer.ts):
- processor_position_writes_total{status} — was inc'd unconditionally
by 1 per chunk for each of inserted/duplicate. Now inc'd by the
actual per-status count, and only when count > 0.
- processor_position_writes_total{status='failed'} — was inc'd by 1
per failed chunk. Now inc'd by chunk.length so every failed record
is counted.
Counts corrected (main.ts):
- processor_acks_total — was inc'd by 1 per non-empty batch. Now
inc'd by ackIds.length so every ACK'd ID is counted.
Wiring gap closed (state.ts):
- processor_device_state_evictions_total — internal `evicted` counter
existed but was never published to metrics. createDeviceStateStore
now accepts a Metrics injection and inc's on each eviction.
Metrics interface extended (types.ts, metrics.ts):
- Metrics.inc gained an optional third `value` parameter (defaults to 1)
for batched increments. dispatchInc passes it through to prom-client's
Counter.inc(labels, value).
Tests updated to reflect the new third arg and the state.ts factory's
new metrics parameter. Total 134 unit tests passing (no count change —
existing tests adjusted, no new tests added; the real verification is
on stage where the metrics are now meaningful again).
Migration 0001_positions.sql now runs CREATE EXTENSION IF NOT EXISTS
postgis alongside timescaledb. PostGIS isn't used in Phase 1 but enabling
it now means Phase 2's geofence engine doesn't need a separate migration
step. The deploy stack uses the timescale/timescaledb-ha:*-all image
which ships both extensions.
Integration test (pipeline.integration.test.ts) updated to use the same
timescale/timescaledb-ha:pg16.6-ts2.17.2-all image as the deploy stack.
Stock POSTGRES_USER/PASSWORD/DB env vars retained — if recent ha-image
revisions don't accept them, the test container will fail clearly on
first run with Docker, and we'll switch to the right env-var scheme.
src/observability/metrics.ts — full prom-client implementation. All 10
Phase 1 metrics registered (processor_consumer_reads_total,
_records_total, _lag, _decode_errors_total, processor_position_writes_total
{status}, _write_duration_seconds, processor_acks_total,
processor_device_state_{size,evictions_total}) plus nodejs_* defaults.
node:http server with /metrics, /healthz, /readyz. /readyz checks
redis.status === 'ready' AND a 5s-cached SELECT 1 Postgres probe.
processor_consumer_lag sampled every 10s via XINFO GROUPS, falling back
to a no-op when the consumer group hasn't been created yet.
src/main.ts — replaces the trace-logging shim with createMetrics() and
startMetricsServer(); shutdown closes the metrics server before
redis.quit() and pool.end().
test/metrics.test.ts — 22 unit tests: exposition format, every metric
type behaviour, all four HTTP endpoint paths including /readyz 503 cases.
test/pipeline.integration.test.ts — testcontainers Redis 7 +
TimescaleDB latest-pg16. Four scenarios: happy path with bigint+Buffer
attribute round-trip, idempotency on (device_id, ts), malformed payload
stays in PEL (decode_errors_total increments), writer failure → retry
(weaker variant per spec: stop Postgres before publish, restart, verify
row appears). Skip-on-no-Docker pattern verified — exits 0 without
Docker.
Dockerfile — multi-stage matching tcp-ingestion. EXPOSE 9090 only,
HEALTHCHECK on /readyz, image-source label points at processor repo.
.gitea/workflows/build.yml — single-job workflow mirroring
tcp-ingestion. Path filters cover src/, test/, build config, Dockerfile.
Portainer webhook step uncommented for :main auto-deploy.
compose.dev.yaml — local-build variant with Redis + TimescaleDB +
processor-dev for verifying Dockerfile changes without the registry
round-trip.
README.md — fleshed out from stub: quick-start, Docker build, deployment
note, env vars, tests (unit vs. integration), CI behavior. Flags the
deploy-side change needed: deploy/compose.yaml needs a TimescaleDB
service and a processor service entry added.
Verification: typecheck, lint clean; 134 unit tests passing across 8
files (+22 from this batch). pnpm test:integration runs cleanly under
the no-Docker skip pattern.
Phase 1 is now complete. Service is pilot-ready.
src/core/consumer.ts — XREADGROUP loop with consumer-group resumption,
ensureConsumerGroup (BUSYGROUP-tolerant), decodeBatch (CodecError → log
+ skip + leave pending; never speculative ACK), partial-ACK semantics,
connectRedis (mirroring tcp-ingestion's retry pattern), clean stop.
src/core/state.ts — LRU Map<device_id, DeviceState> using delete+set
bump trick (no third-party LRU dep); last_seen = max(prev, ts) so
out-of-order replays don't regress the high-water mark; evictedTotal()
counter.
src/core/writer.ts — multi-row INSERT ON CONFLICT (device_id, ts) DO
NOTHING with RETURNING. Duplicate detection by set-difference between
input and RETURNING rows (xmax=0 doesn't work for skipped-conflict
rows, only returned ones — confirmed in the task spec's own Note).
Sequential chunking to WRITE_BATCH_SIZE; bigint→string and Buffer→base64
attribute serialization that handles Buffer.toJSON shape.
src/main.ts — full pipeline: pool → migrate → redis → state → writer →
sink → consumer → graceful-shutdown stub. Sink ordering is
state.update BEFORE writer.write per spec rationale (state stays
consistent with what's been seen even if not yet persisted; redelivery
is idempotent on state). Metrics is still the trace-logging shim from
tcp-ingestion's pre-1.10 pattern; real prom-client lands in task 1.9.
Verification: typecheck, lint clean; 112 unit tests passing across 7
test files (+39 from this batch).
Scaffold mirrors tcp-ingestion conventions: ESM, strict TS, pnpm, vitest
with unit/integration split, ESLint flat config with no-floating-promises
+ no-misused-promises + import/no-restricted-paths (the new src/core/ →
src/domain/ boundary that protects Phase 1 from Phase 2 churn).
Core types in src/core/types.ts (Position, StreamRecord, DeviceState,
Metrics, AttributeValue) — Position is byte-equivalent to tcp-ingestion's
output. Codec in src/core/codec.ts implements sentinel reversal:
{__bigint:"..."} → bigint, {__buffer_b64:"..."} → Buffer, ISO timestamp
string → Date. CodecError surfaces malformed payload reasons with the
failing field named.
Config in src/config/load.ts (zod schema, all 13 env vars with defaults
and bounded numerics). Logger in src/observability/logger.ts matches
tcp-ingestion exactly: ISO timestamps, string level labels, pino-pretty
in development.
Postgres in src/db/: createPool with sane defaults and application_name,
connectWithRetry mirroring the ioredis retry pattern, a 30-line
migration runner using a schema_migrations table, and 0001_positions.sql
with the hypertable + (device_id, ts) unique index + ts DESC index.
Migration runner unit-tested against a mocked pg.Pool; the real
TimescaleDB round-trip is deferred to task 1.10 per spec.
Verification: typecheck, lint, build all clean; 73 unit tests passing
across 4 files. import/no-restricted-paths verified live by temporarily
adding a forbidden src/domain/ import.