`src/observability/metrics.ts` — full prom-client implementation. All 10 Phase 1 metrics registered (`processor_consumer_reads_total`, `_records_total`, `_lag`, `_decode_errors_total`, `processor_position_writes_total{status}`, `_write_duration_seconds`, `processor_acks_total`, `processor_device_state_{size,evictions_total}`) plus the `nodejs_*` defaults. `node:http` server with `/metrics`, `/healthz`, `/readyz`. `/readyz` checks `redis.status === 'ready'` AND a 5s-cached `SELECT 1` Postgres probe. `processor_consumer_lag` is sampled every 10s via `XINFO GROUPS`, falling back to a no-op when the consumer group hasn't been created yet.
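For orientation, a lag sampler along these lines might look like the sketch below (assuming ioredis and prom-client; the helper name and the reply parsing are illustrative, not lifted from the actual module):

```ts
import type Redis from "ioredis";
import { Gauge } from "prom-client";

// Illustrative gauge; the real metric is registered in metrics.ts.
const consumerLag = new Gauge({
  name: "processor_consumer_lag",
  help: "Entries in the stream not yet read by the consumer group",
});

// Poll XINFO GROUPS every 10s; Redis 7 reports a `lag` field per group.
// If the group (or stream) does not exist yet, swallow the error: no-op.
export function startLagSampler(redis: Redis, stream: string, group: string): () => void {
  const timer = setInterval(async () => {
    try {
      const groups = (await redis.xinfo("GROUPS", stream)) as unknown[][];
      for (const entry of groups) {
        // Each entry is a flat [field, value, field, value, ...] array.
        const fields = new Map<string, unknown>();
        for (let i = 0; i < entry.length; i += 2) fields.set(String(entry[i]), entry[i + 1]);
        if (fields.get("name") === group) consumerLag.set(Number(fields.get("lag") ?? 0));
      }
    } catch {
      // Consumer group not created yet: leave the gauge at its last value.
    }
  }, 10_000);
  timer.unref();
  return () => clearInterval(timer);
}
```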
`src/main.ts` — replaces the trace-logging shim with `createMetrics()` and `startMetricsServer()`; shutdown closes the metrics server before `redis.quit()` and `pool.end()`.
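A minimal sketch of that shutdown ordering (the parameter shapes are assumptions; the real wiring lives in `src/main.ts`):

```ts
import type { Server } from "node:http";

// Sketch only: the parameter shapes approximate ioredis and pg.Pool.
export async function shutdown(
  metricsServer: Server,
  redis: { quit(): Promise<unknown> },
  pool: { end(): Promise<void> },
): Promise<void> {
  // Stop serving /metrics and /readyz first so orchestrators stop probing us.
  await new Promise<void>((resolve, reject) =>
    metricsServer.close((err) => (err ? reject(err) : resolve())),
  );
  await redis.quit(); // flushes pending Redis commands before disconnecting
  await pool.end();   // drains the Postgres connection pool
}
```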
`test/metrics.test.ts` — 22 unit tests: exposition format, the behaviour of every metric type, all four HTTP endpoint paths including the `/readyz` 503 cases.
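As an illustration of the exposition-format checks, a test along these lines could be written against an isolated prom-client `Registry` (the runner import assumes a Vitest/Jest-style API; the actual suite may differ):

```ts
import { Counter, Registry } from "prom-client";
import { describe, expect, it } from "vitest"; // assumed runner; Jest's API is equivalent

describe("exposition format", () => {
  it("renders a counter in Prometheus text format", async () => {
    const registry = new Registry();
    const reads = new Counter({
      name: "processor_consumer_reads_total",
      help: "XREADGROUP calls issued",
      registers: [registry],
    });
    reads.inc(3);

    const body = await registry.metrics(); // prom-client text exposition
    expect(body).toContain("# TYPE processor_consumer_reads_total counter");
    expect(body).toContain("processor_consumer_reads_total 3");
  });
});
```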
`test/pipeline.integration.test.ts` — testcontainers Redis 7 + TimescaleDB latest-pg16. Four scenarios: happy path with bigint + Buffer attribute round-trip, idempotency on `(device_id, ts)`, malformed payload stays in the PEL (`decode_errors_total` increments), writer failure → retry (weaker variant per spec: stop Postgres before publish, restart, verify the row appears). Skip-on-no-Docker pattern verified — exits 0 without Docker.
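The skip-on-no-Docker pattern can be as simple as probing the Docker daemon up front and swapping in `describe.skip` (the runner import is an assumption; the actual helper in the suite may differ):

```ts
import { execSync } from "node:child_process";
import { describe } from "vitest"; // assumed runner; any describe.skip-capable runner works

// Detect a usable Docker daemon; without one, the suite still exits 0.
function dockerAvailable(): boolean {
  try {
    execSync("docker info", { stdio: "ignore" });
    return true;
  } catch {
    return false;
  }
}

const describeIfDocker = dockerAvailable() ? describe : describe.skip;

describeIfDocker("pipeline integration", () => {
  // testcontainers-based scenarios go here
});
```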
`Dockerfile` — multi-stage, matching tcp-ingestion. `EXPOSE 9090` only, `HEALTHCHECK` on `/readyz`, image-source label points at the processor repo.

`.gitea/workflows/build.yml` — single-job workflow mirroring tcp-ingestion. Path filters cover `src/`, `test/`, build config, Dockerfile. Portainer webhook step uncommented for `:main` auto-deploy.

`compose.dev.yaml` — local-build variant with Redis + TimescaleDB + processor-dev for verifying Dockerfile changes without the registry round-trip.

`README.md` — fleshed out from stub: quick-start, Docker build, deployment note, env vars, tests (unit vs. integration), CI behavior. Flags the deploy-side change needed: `deploy/compose.yaml` needs a TimescaleDB service and a processor service entry added.

Verification: typecheck, lint clean; 134 unit tests passing across 8 files (+22 from this batch). `pnpm test:integration` runs cleanly under the no-Docker skip pattern.

Phase 1 is now complete. Service is pilot-ready.
# processor — Roadmap
A Node.js worker service that consumes normalized Position records from a Redis Stream, maintains per-device runtime state, applies racing-domain rules, and writes durable state to Postgres / TimescaleDB.
This file is the single navigation hub for all implementation planning. Each phase has its own folder with a README and granular task files. Update statuses here as work lands.
## Status legend
| Symbol | Meaning |
|---|---|
| ⬜ | Not started |
| 🟦 | Planned (designed, not coded) |
| 🟨 | In progress |
| 🟩 | Done |
| ⏸ | Paused / blocked |
| ❄️ | Frozen / future / optional |
## Architectural anchors
The service is specified by the wiki at `../docs/wiki/`. Implementing agents should read these pages before starting any task:
- Architecture — `docs/wiki/sources/gps-tracking-architecture.md`, `docs/wiki/concepts/plane-separation.md`, `docs/wiki/concepts/failure-domains.md`
- This service — `docs/wiki/entities/processor.md`
- Upstream contract (input) — `docs/wiki/concepts/position-record.md`, `docs/wiki/concepts/io-element-bag.md`, `docs/wiki/entities/redis-streams.md`
- Downstream contract (output) — `docs/wiki/entities/postgres-timescaledb.md`, `docs/wiki/entities/directus.md`
## Non-negotiable design rules
These rules govern every task. Any deviation must be discussed and documented as a decision before code lands.
- Domain logic is isolated. `src/core/` (stream consumer, Postgres writer, in-memory state plumbing) never imports from `src/domain/` (geofence engine, timing logic, IO interpretation). Phase 2 must be a pure addition layered on top of the Phase 1 throughput pipeline.
- Schema authority lives in Directus, with one exception: the `positions` hypertable is bulk-written by this service and its migration is owned here. All other tables Processor writes to (`timing_records`, `stage_results`, etc.) are defined in Directus and Processor inserts respecting that schema.
- Per-device state is in-memory, not durable. The DB is the source of truth for replay/analysis; in-memory state is the source of truth for the current decision. On restart, hot state is rehydrated from the DB — Phase 1 does not implement rehydration; restart loses state, which is acceptable until Phase 2 introduces stateful accumulators.
- Consumer-group offsets drive work distribution. No application-level coordination between Processor instances. `XACK` on success; failed batches stay pending and are claimed by surviving instances via `XAUTOCLAIM`.
- Idempotent writes. Records arriving twice (after a claim, replay, or retry) must not produce duplicate rows. The `positions` hypertable uses `(device_id, ts)` as a unique key with `ON CONFLICT DO NOTHING` (see the sketch after this list). Derived tables follow the same pattern, scoped by their natural keys.
- Bounded memory. Per-device state is capped (LRU eviction by last-seen timestamp); replay-from-DB rehydrates an evicted device on next packet. No unbounded `Map<imei, ...>` growth.
- Fail loudly. Schema-incompatible records (e.g. malformed payload, unknown sentinel) are logged at `error` and dead-letter-streamed (Phase 3); they do not silently skip.
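A hedged sketch of the idempotent batched insert with `pg`, as referenced in the idempotency rule above (column names beyond `device_id` and `ts` are assumptions, not the real schema):

```ts
import { Pool } from "pg";

interface PositionRow {
  deviceId: string;
  ts: Date;
  lat: number;        // illustrative columns; the actual hypertable schema may differ
  lon: number;
  attributes: Record<string, unknown>;
}

// Batched insert; re-delivered records (claimed, replayed, retried) hit the
// (device_id, ts) unique key and are dropped by ON CONFLICT DO NOTHING.
export async function writePositions(pool: Pool, rows: PositionRow[]): Promise<void> {
  if (rows.length === 0) return;
  const values: unknown[] = [];
  const tuples = rows.map((r, i) => {
    values.push(r.deviceId, r.ts, r.lat, r.lon, JSON.stringify(r.attributes));
    const o = i * 5;
    return `($${o + 1}, $${o + 2}, $${o + 3}, $${o + 4}, $${o + 5})`;
  });
  await pool.query(
    `INSERT INTO positions (device_id, ts, lat, lon, attributes)
     VALUES ${tuples.join(", ")}
     ON CONFLICT (device_id, ts) DO NOTHING`,
    values,
  );
}
```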
## Phases
### Phase 1 — Throughput pipeline
Status: 🟩 Done
Outcome: A Node.js Processor that joins a Redis Streams consumer group on `telemetry:t`, decodes each Position (including `__bigint`/`__buffer_b64` sentinel reversal), upserts it into a TimescaleDB `positions` hypertable, updates per-device in-memory state (last position, last seen), `XACK`s on successful write, and exposes Prometheus metrics + health/readiness HTTP endpoints. End-to-end pilot-quality service; no domain logic yet.
See phase-1-throughput/README.md
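For orientation, the sentinel reversal could look roughly like this; the exact wire shape of the `__bigint`/`__buffer_b64` markers is assumed here, and `docs/wiki/concepts/position-record.md` remains the authoritative contract:

```ts
// Assumed wire shape: the publisher encodes non-JSON-native values as
// { "__bigint": "123" } and { "__buffer_b64": "AQID" }. revive() undoes that.
function isBigintSentinel(v: unknown): v is { __bigint: string } {
  return typeof v === "object" && v !== null && "__bigint" in v;
}

function isBufferSentinel(v: unknown): v is { __buffer_b64: string } {
  return typeof v === "object" && v !== null && "__buffer_b64" in v;
}

export function revive(value: unknown): unknown {
  if (isBigintSentinel(value)) return BigInt(value.__bigint);
  if (isBufferSentinel(value)) return Buffer.from(value.__buffer_b64, "base64");
  if (Array.isArray(value)) return value.map(revive);
  if (typeof value === "object" && value !== null) {
    return Object.fromEntries(
      Object.entries(value as Record<string, unknown>).map(([k, v]) => [k, revive(v)]),
    );
  }
  return value;
}
```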
| # | Task | Status | Landed in |
|---|---|---|---|
| 1.1 | Project scaffold | 🟩 | 290a08e |
| 1.2 | Core types & contracts | 🟩 | 290a08e |
| 1.3 | Configuration & logging | 🟩 | 290a08e |
| 1.4 | Postgres connection & positions hypertable | 🟩 | 290a08e |
| 1.5 | Redis Stream consumer (XREADGROUP) | 🟩 | 68d3da3 |
| 1.6 | Per-device in-memory state | 🟩 | 68d3da3 |
| 1.7 | Position writer (batched upsert) | 🟩 | 68d3da3 |
| 1.8 | Main wiring & ACK semantics | 🟩 | 68d3da3 |
| 1.9 | Observability (Prometheus metrics + /healthz + /readyz) | 🟩 | (pending commit SHA) |
| 1.10 | Integration test (testcontainers Redis + Postgres) | 🟩 | (pending commit SHA) |
| 1.11 | Dockerfile & Gitea workflow | 🟩 | (pending commit SHA) |
### Phase 2 — Domain logic
Status: ⬜ Not started — blocks on Directus schema decisions
Outcome: Geofence engine that detects entry/checkpoint/finish crossings; per-model Teltonika IO mapping driving derived attributes (odometer_km, ignition, etc.); timing record writer producing entries in the Directus-owned timing_records table; per-stage result aggregator. Layered on top of Phase 1 — no changes to the throughput pipeline.
Detailed task breakdown deferred until the Directus schema is finalized (open questions about geofence ownership, IO mapping storage, stage vocabulary). Phase 1 can ship and run on stage without any Phase 2 work.
### Phase 3 — Production hardening
Status: ⬜ Not started
Outcome: Graceful shutdown with consumer-group commit on SIGTERM; per-device state rehydration from Postgres on startup (only loaded on first packet for a given device); XAUTOCLAIM for stuck pending entries from a dead instance; dead-letter stream for poison records; multi-instance load-split verification; OPERATIONS.md runbook.
See phase-3-hardening/README.md
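As a sketch of the `XAUTOCLAIM` technique only (not planned code; the ioredis call shape and the Redis 7 reply layout are assumptions):

```ts
import type Redis from "ioredis";

// Claim entries pending longer than minIdleMs (e.g. left by a dead consumer),
// reprocess them via `handle`, then acknowledge.
export async function claimStuckEntries(
  redis: Redis,
  stream: string,
  group: string,
  consumer: string,
  minIdleMs: number,
  handle: (id: string, fields: string[]) => Promise<void>,
): Promise<void> {
  let cursor = "0-0";
  do {
    // Assumed Redis 7 reply: [nextCursor, [[id, [field, value, ...]], ...], [deletedIds...]]
    const reply = (await redis.call(
      "XAUTOCLAIM", stream, group, consumer, String(minIdleMs), cursor, "COUNT", "100",
    )) as [string, [string, string[]][], string[]];
    cursor = reply[0];
    for (const [id, fields] of reply[1]) {
      await handle(id, fields);
      await redis.xack(stream, group, id); // acknowledge once reprocessed
    }
  } while (cursor !== "0-0");
}
```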
### Phase 4 — Future / optional
Status: ❄️ Not committed
See phase-4-future/README.md
Ideas on radar: Directus Flow trigger emission, replay tooling, derived-metric backfill, alternate consumer for analytics export.
## Operating model
- Implementation agent contract. Each task file is self-sufficient: goal, deliverables, specification, acceptance criteria. An agent should be able to complete one task without reading the whole wiki — but should skim the wiki references at the top of the task before starting.
- Sequence within a phase. Task numbering reflects intended order. Soft dependencies are explicit in each task's "Depends on" field. Tasks with no dependencies on each other can be done in parallel.
- Status updates. When a task is started, change its row in this ROADMAP to 🟨 and the task file's status badge accordingly. When done, 🟩 + a one-line note in the task file's "Done" section pointing at the merging commit/PR.
- Drift control. If implementation diverges from a task's spec, update the task file before the diverging code lands, with a note explaining why. Do not let plans rot — either fix the plan or fix the code.