`src/observability/metrics.ts` — full prom-client implementation. All 10 Phase 1 metrics registered (`processor_consumer_reads_total`, `_records_total`, `_lag`, `_decode_errors_total`, `processor_position_writes_total{status}`, `_write_duration_seconds`, `processor_acks_total`, `processor_device_state_{size,evictions_total}`) plus the `nodejs_*` defaults. `node:http` server with `/metrics`, `/healthz`, `/readyz`. `/readyz` checks `redis.status === 'ready'` AND a 5s-cached `SELECT 1` Postgres probe. `processor_consumer_lag` is sampled every 10s via `XINFO GROUPS`, falling back to a no-op when the consumer group hasn't been created yet.
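For orientation, a lag sampler along these lines might look like the sketch below (assuming ioredis and prom-client; the helper name and the reply parsing are illustrative, not lifted from the actual module):

```ts
import type Redis from "ioredis";
import { Gauge } from "prom-client";

// Illustrative gauge; the real metric is registered in metrics.ts.
const consumerLag = new Gauge({
  name: "processor_consumer_lag",
  help: "Entries in the stream not yet read by the consumer group",
});

// Poll XINFO GROUPS every 10s; Redis 7 reports a `lag` field per group.
// If the group (or stream) does not exist yet, swallow the error: no-op.
export function startLagSampler(redis: Redis, stream: string, group: string): () => void {
  const timer = setInterval(async () => {
    try {
      const groups = (await redis.xinfo("GROUPS", stream)) as unknown[][];
      for (const entry of groups) {
        // Each entry is a flat [field, value, field, value, ...] array.
        const fields = new Map<string, unknown>();
        for (let i = 0; i < entry.length; i += 2) fields.set(String(entry[i]), entry[i + 1]);
        if (fields.get("name") === group) consumerLag.set(Number(fields.get("lag") ?? 0));
      }
    } catch {
      // Consumer group not created yet: leave the gauge at its last value.
    }
  }, 10_000);
  timer.unref();
  return () => clearInterval(timer);
}
```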
`src/main.ts` — replaces the trace-logging shim with `createMetrics()` and `startMetricsServer()`; shutdown closes the metrics server before `redis.quit()` and `pool.end()`.
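A minimal sketch of that shutdown ordering (the parameter shapes are assumptions; the real wiring lives in `src/main.ts`):

```ts
import type { Server } from "node:http";

// Sketch only: the parameter shapes approximate ioredis and pg.Pool.
export async function shutdown(
  metricsServer: Server,
  redis: { quit(): Promise<unknown> },
  pool: { end(): Promise<void> },
): Promise<void> {
  // Stop serving /metrics and /readyz first so orchestrators stop probing us.
  await new Promise<void>((resolve, reject) =>
    metricsServer.close((err) => (err ? reject(err) : resolve())),
  );
  await redis.quit(); // flushes pending Redis commands before disconnecting
  await pool.end();   // drains the Postgres connection pool
}
```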
`test/metrics.test.ts` — 22 unit tests: exposition format, the behaviour of every metric type, all four HTTP endpoint paths including the `/readyz` 503 cases.
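As an illustration of the exposition-format checks, a test along these lines could be written against an isolated prom-client `Registry` (the runner import assumes a Vitest/Jest-style API; the actual suite may differ):

```ts
import { Counter, Registry } from "prom-client";
import { describe, expect, it } from "vitest"; // assumed runner; Jest's API is equivalent

describe("exposition format", () => {
  it("renders a counter in Prometheus text format", async () => {
    const registry = new Registry();
    const reads = new Counter({
      name: "processor_consumer_reads_total",
      help: "XREADGROUP calls issued",
      registers: [registry],
    });
    reads.inc(3);

    const body = await registry.metrics(); // prom-client text exposition
    expect(body).toContain("# TYPE processor_consumer_reads_total counter");
    expect(body).toContain("processor_consumer_reads_total 3");
  });
});
```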
`test/pipeline.integration.test.ts` — testcontainers Redis 7 + TimescaleDB latest-pg16. Four scenarios: happy path with bigint + Buffer attribute round-trip, idempotency on `(device_id, ts)`, malformed payload stays in the PEL (`decode_errors_total` increments), writer failure → retry (weaker variant per spec: stop Postgres before publish, restart, verify the row appears). Skip-on-no-Docker pattern verified — exits 0 without Docker.
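The skip-on-no-Docker pattern can be as simple as probing the Docker daemon up front and swapping in `describe.skip` (the runner import is an assumption; the actual helper in the suite may differ):

```ts
import { execSync } from "node:child_process";
import { describe } from "vitest"; // assumed runner; any describe.skip-capable runner works

// Detect a usable Docker daemon; without one, the suite still exits 0.
function dockerAvailable(): boolean {
  try {
    execSync("docker info", { stdio: "ignore" });
    return true;
  } catch {
    return false;
  }
}

const describeIfDocker = dockerAvailable() ? describe : describe.skip;

describeIfDocker("pipeline integration", () => {
  // testcontainers-based scenarios go here
});
```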
`Dockerfile` — multi-stage, matching tcp-ingestion. `EXPOSE 9090` only, `HEALTHCHECK` on `/readyz`, image-source label points at the processor repo.

`.gitea/workflows/build.yml` — single-job workflow mirroring tcp-ingestion. Path filters cover `src/`, `test/`, build config, Dockerfile. Portainer webhook step uncommented for `:main` auto-deploy.

`compose.dev.yaml` — local-build variant with Redis + TimescaleDB + processor-dev for verifying Dockerfile changes without the registry round-trip.

`README.md` — fleshed out from stub: quick-start, Docker build, deployment note, env vars, tests (unit vs. integration), CI behavior. Flags the deploy-side change needed: `deploy/compose.yaml` needs a TimescaleDB service and a processor service entry added.

Verification: typecheck, lint clean; 134 unit tests passing across 8 files (+22 from this batch). `pnpm test:integration` runs cleanly under the no-Docker skip pattern.

Phase 1 is now complete. Service is pilot-ready.
# processor — Roadmap
A Node.js worker service that consumes normalized Position records from a Redis Stream, maintains per-device runtime state, applies racing-domain rules, and writes durable state to Postgres / TimescaleDB.
This file is the single navigation hub for all implementation planning. Each phase has its own folder with a README and granular task files. Update statuses here as work lands.
## Status legend
| Symbol | Meaning |
|---|---|
| ⬜ | Not started |
| 🟦 | Planned (designed, not coded) |
| 🟨 | In progress |
| 🟩 | Done |
| ⏸ | Paused / blocked |
| ❄️ | Frozen / future / optional |
## Architectural anchors
The service is specified by the wiki at `../docs/wiki/`. Implementing agents should read these pages before starting any task:
- Architecture — `docs/wiki/sources/gps-tracking-architecture.md`, `docs/wiki/concepts/plane-separation.md`, `docs/wiki/concepts/failure-domains.md`
- This service — `docs/wiki/entities/processor.md`
- Upstream contract (input) — `docs/wiki/concepts/position-record.md`, `docs/wiki/concepts/io-element-bag.md`, `docs/wiki/entities/redis-streams.md`
- Downstream contract (output) — `docs/wiki/entities/postgres-timescaledb.md`, `docs/wiki/entities/directus.md`
## Non-negotiable design rules
These rules govern every task. Any deviation must be discussed and documented as a decision before code lands.
- Domain logic is isolated. `src/core/` (stream consumer, Postgres writer, in-memory state plumbing) never imports from `src/domain/` (geofence engine, timing logic, IO interpretation). Phase 2 must be a pure addition layered on top of the Phase 1 throughput pipeline.
- Schema authority lives in Directus, with one exception: the `positions` hypertable is bulk-written by this service and its migration is owned here. All other tables Processor writes to (`timing_records`, `stage_results`, etc.) are defined in Directus and Processor inserts respecting that schema.
- Per-device state is in-memory, not durable. The DB is the source of truth for replay/analysis; in-memory state is the source of truth for the current decision. On restart, hot state is rehydrated from the DB — Phase 1 does not implement rehydration; restart loses state, which is acceptable until Phase 2 introduces stateful accumulators.
- Consumer-group offsets drive work distribution. No application-level coordination between Processor instances. `XACK` on success; failed batches stay pending and are claimed by surviving instances via `XAUTOCLAIM`.
- Idempotent writes. Records arriving twice (after a claim, replay, or retry) must not produce duplicate rows. The `positions` hypertable uses `(device_id, ts)` as a unique key with `ON CONFLICT DO NOTHING` (see the sketch after this list). Derived tables follow the same pattern, scoped by their natural keys.
- Bounded memory. Per-device state is capped (LRU eviction by last-seen timestamp); replay-from-DB rehydrates an evicted device on next packet. No unbounded `Map<imei, ...>` growth.
- Fail loudly. Schema-incompatible records (e.g. malformed payload, unknown sentinel) are logged at `error` and dead-letter-streamed (Phase 3); they do not silently skip.
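A hedged sketch of the idempotent batched insert with `pg`, as referenced in the idempotency rule above (column names beyond `device_id` and `ts` are assumptions, not the real schema):

```ts
import { Pool } from "pg";

interface PositionRow {
  deviceId: string;
  ts: Date;
  lat: number;        // illustrative columns; the actual hypertable schema may differ
  lon: number;
  attributes: Record<string, unknown>;
}

// Batched insert; re-delivered records (claimed, replayed, retried) hit the
// (device_id, ts) unique key and are dropped by ON CONFLICT DO NOTHING.
export async function writePositions(pool: Pool, rows: PositionRow[]): Promise<void> {
  if (rows.length === 0) return;
  const values: unknown[] = [];
  const tuples = rows.map((r, i) => {
    values.push(r.deviceId, r.ts, r.lat, r.lon, JSON.stringify(r.attributes));
    const o = i * 5;
    return `($${o + 1}, $${o + 2}, $${o + 3}, $${o + 4}, $${o + 5})`;
  });
  await pool.query(
    `INSERT INTO positions (device_id, ts, lat, lon, attributes)
     VALUES ${tuples.join(", ")}
     ON CONFLICT (device_id, ts) DO NOTHING`,
    values,
  );
}
```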
## Phases
### Phase 1 — Throughput pipeline
Status: 🟩 Done
Outcome: A Node.js Processor that joins a Redis Streams consumer group on `telemetry:t`, decodes each Position (including `__bigint`/`__buffer_b64` sentinel reversal), upserts it into a TimescaleDB `positions` hypertable, updates per-device in-memory state (last position, last seen), `XACK`s on successful write, and exposes Prometheus metrics + health/readiness HTTP endpoints. End-to-end pilot-quality service; no domain logic yet.
See phase-1-throughput/README.md
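For orientation, the sentinel reversal could look roughly like this; the exact wire shape of the `__bigint`/`__buffer_b64` markers is assumed here, and `docs/wiki/concepts/position-record.md` remains the authoritative contract:

```ts
// Assumed wire shape: the publisher encodes non-JSON-native values as
// { "__bigint": "123" } and { "__buffer_b64": "AQID" }. revive() undoes that.
function isBigintSentinel(v: unknown): v is { __bigint: string } {
  return typeof v === "object" && v !== null && "__bigint" in v;
}

function isBufferSentinel(v: unknown): v is { __buffer_b64: string } {
  return typeof v === "object" && v !== null && "__buffer_b64" in v;
}

export function revive(value: unknown): unknown {
  if (isBigintSentinel(value)) return BigInt(value.__bigint);
  if (isBufferSentinel(value)) return Buffer.from(value.__buffer_b64, "base64");
  if (Array.isArray(value)) return value.map(revive);
  if (typeof value === "object" && value !== null) {
    return Object.fromEntries(
      Object.entries(value as Record<string, unknown>).map(([k, v]) => [k, revive(v)]),
    );
  }
  return value;
}
```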
| # | Task | Status | Landed in |
|---|---|---|---|
| 1.1 | Project scaffold | 🟩 | 290a08e |
| 1.2 | Core types & contracts | 🟩 | 290a08e |
| 1.3 | Configuration & logging | 🟩 | 290a08e |
| 1.4 | Postgres connection & positions hypertable | 🟩 | 290a08e |
| 1.5 | Redis Stream consumer (XREADGROUP) | 🟩 | 68d3da3 |
| 1.6 | Per-device in-memory state | 🟩 | 68d3da3 |
| 1.7 | Position writer (batched upsert) | 🟩 | 68d3da3 |
| 1.8 | Main wiring & ACK semantics | 🟩 | 68d3da3 |
| 1.9 | Observability (Prometheus metrics + /healthz + /readyz) | 🟩 | (pending commit SHA) |
| 1.10 | Integration test (testcontainers Redis + Postgres) | 🟩 | (pending commit SHA) |
| 1.11 | Dockerfile & Gitea workflow | 🟩 | (pending commit SHA) |
### Phase 2 — Domain logic
Status: ⬜ Not started — blocks on Directus schema decisions
Outcome: Geofence engine that detects entry/checkpoint/finish crossings; per-model Teltonika IO mapping driving derived attributes (odometer_km, ignition, etc.); timing record writer producing entries in the Directus-owned timing_records table; per-stage result aggregator. Layered on top of Phase 1 — no changes to the throughput pipeline.
Detailed task breakdown deferred until the Directus schema is finalized (open questions about geofence ownership, IO mapping storage, stage vocabulary). Phase 1 can ship and run on stage without any Phase 2 work.
### Phase 3 — Production hardening
Status: ⬜ Not started
Outcome: Graceful shutdown with consumer-group commit on SIGTERM; per-device state rehydration from Postgres on startup (only loaded on first packet for a given device); XAUTOCLAIM for stuck pending entries from a dead instance; dead-letter stream for poison records; multi-instance load-split verification; OPERATIONS.md runbook.
See phase-3-hardening/README.md
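As a sketch of the `XAUTOCLAIM` technique only (not planned code; the ioredis call shape and the Redis 7 reply layout are assumptions):

```ts
import type Redis from "ioredis";

// Claim entries pending longer than minIdleMs (e.g. left by a dead consumer),
// reprocess them via `handle`, then acknowledge.
export async function claimStuckEntries(
  redis: Redis,
  stream: string,
  group: string,
  consumer: string,
  minIdleMs: number,
  handle: (id: string, fields: string[]) => Promise<void>,
): Promise<void> {
  let cursor = "0-0";
  do {
    // Assumed Redis 7 reply: [nextCursor, [[id, [field, value, ...]], ...], [deletedIds...]]
    const reply = (await redis.call(
      "XAUTOCLAIM", stream, group, consumer, String(minIdleMs), cursor, "COUNT", "100",
    )) as [string, [string, string[]][], string[]];
    cursor = reply[0];
    for (const [id, fields] of reply[1]) {
      await handle(id, fields);
      await redis.xack(stream, group, id); // acknowledge once reprocessed
    }
  } while (cursor !== "0-0");
}
```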
### Phase 4 — Future / optional
Status: ❄️ Not committed
See phase-4-future/README.md
Ideas on radar: Directus Flow trigger emission, replay tooling, derived-metric backfill, alternate consumer for analytics export.
## Operating model
- Implementation agent contract. Each task file is self-sufficient: goal, deliverables, specification, acceptance criteria. An agent should be able to complete one task without reading the whole wiki — but should skim the wiki references at the top of the task before starting.
- Sequence within a phase. Task numbering reflects intended order. Soft dependencies are explicit in each task's "Depends on" field. Tasks with no dependencies on each other can be done in parallel.
- Status updates. When a task is started, change its row in this ROADMAP to 🟨 and the task file's status badge accordingly. When done, 🟩 + a one-line note in the task file's "Done" section pointing at the merging commit/PR.
- Drift control. If implementation diverges from a task's spec, update the task file before the diverging code lands, with a note explaining why. Do not let plans rot — either fix the plan or fix the code.