Add planning documents for Phase 1 (throughput pipeline) and stub Phases 2-4

ROADMAP.md establishes status legend, architectural anchors pointing at the
wiki, and seven non-negotiable design rules — most importantly the
core/domain boundary that protects Phase 1 from Phase 2 churn, the
schema-authority split (positions hypertable owned here; everything else
owned by Directus), and idempotent writes via (device_id, ts) ON CONFLICT.

Phase 1 (throughput pipeline) is fully detailed across 11 task files:
scaffold, core types + sentinel decoder, config + logging, Postgres
hypertable, Redis Stream consumer, per-device LRU state, batched writer,
main wiring, observability, integration test, Dockerfile + Gitea CI.
Observability is in Phase 1 (not deferred) — lesson learned from
tcp-ingestion task 1.10.

Phases 2-4 are stub READMEs. Phase 2 (domain logic) blocks on Directus
schema decisions and lists those open questions explicitly. Phase 3
(production hardening) and Phase 4 (future) sketch the task shape.
# processor — Roadmap
A Node.js worker service that consumes normalized `Position` records from a Redis Stream, maintains per-device runtime state, applies racing-domain rules, and writes durable state to Postgres / TimescaleDB.
This file is the single navigation hub for all implementation planning. Each phase has its own folder with a README and granular task files. Update statuses here as work lands.
## Status legend
| Symbol | Meaning |
|--------|---------|
| ⬜ | Not started |
| 🟦 | Planned (designed, not coded) |
| 🟨 | In progress |
| 🟩 | Done |
| ⏸ | Paused / blocked |
| ❄️ | Frozen / future / optional |
## Architectural anchors
The service is specified by the wiki at `../docs/wiki/`. Implementing agents should read these pages before starting any task:
- **Architecture** — `docs/wiki/sources/gps-tracking-architecture.md`, `docs/wiki/concepts/plane-separation.md`, `docs/wiki/concepts/failure-domains.md`
- **This service** — `docs/wiki/entities/processor.md`
- **Upstream contract (input)** — `docs/wiki/concepts/position-record.md`, `docs/wiki/concepts/io-element-bag.md`, `docs/wiki/entities/redis-streams.md`
- **Downstream contract (output)** — `docs/wiki/entities/postgres-timescaledb.md`, `docs/wiki/entities/directus.md`
## Non-negotiable design rules
These rules govern every task. Any deviation must be discussed and documented as a decision before code lands.
1. **Domain logic is isolated.** `src/core/` (Stream consumer, Postgres writer, in-memory state plumbing) never imports from `src/domain/` (geofence engine, timing logic, IO interpretation). Phase 2 must be a pure addition layered on top of the Phase 1 throughput pipeline.
2. **Schema authority lives in Directus**, with one exception: the `positions` hypertable is bulk-written by this service and its migration is owned here. All other tables Processor writes to (timing_records, stage_results, etc.) are defined in Directus, and Processor inserts rows that conform to that schema.
3. **Per-device state is in-memory, not durable.** The DB is the source of truth for replay/analysis; in-memory state is the source of truth for the current decision. On restart, hot state is rehydrated from the DB (a Phase 3 task); Phase 1 does not implement rehydration, so a restart loses state, which is acceptable until Phase 2 introduces stateful accumulators.
4. **Consumer-group offsets drive work distribution.** No application-level coordination between Processor instances. `XACK` on success; failed batches stay pending and are claimed by surviving instances via `XAUTOCLAIM`.
5. **Idempotent writes.** Records arriving twice (after a claim, replay, or retry) must not produce duplicate rows. The `positions` hypertable uses `(device_id, ts)` as a unique key with `ON CONFLICT DO NOTHING`. Derived tables follow the same pattern, scoped by their natural keys; see the writer sketch after this list.
6. **Bounded memory.** Per-device state is capped (LRU eviction by last-seen timestamp); replay-from-DB rehydrates an evicted device on next packet. No unbounded `Map<imei, ...>` growth; see the LRU sketch after this list.
7. **Fail loudly.** Schema-incompatible records (e.g. malformed payload, unknown sentinel) are logged at `error` and dead-letter-streamed (Phase 3); they are **not** silently skipped.
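
To make rules 2 and 5 concrete, here is a minimal sketch using `pg`. The four-column schema is illustrative (the real column list is task 1.4's deliverable); only the `(device_id, ts)` unique key and the `ON CONFLICT` clause are mandated by the rules above.

```typescript
import { Pool } from "pg";

const pool = new Pool(); // connection settings come from the standard PG* env vars

// Migration owned by this service (rule 2). The column list is illustrative;
// the (device_id, ts) unique key is what backs the ON CONFLICT clause below.
export async function migrate(): Promise<void> {
  await pool.query(`
    CREATE TABLE IF NOT EXISTS positions (
      device_id TEXT             NOT NULL,
      ts        TIMESTAMPTZ      NOT NULL,
      lat       DOUBLE PRECISION NOT NULL,
      lon       DOUBLE PRECISION NOT NULL,
      UNIQUE (device_id, ts)
    );
    SELECT create_hypertable('positions', 'ts', if_not_exists => TRUE);
  `);
}

// Batched idempotent write (rule 5): a record that arrives twice after a
// claim, replay, or retry hits the unique key and becomes a no-op.
export async function writeBatch(
  rows: { deviceId: string; ts: Date; lat: number; lon: number }[],
): Promise<void> {
  if (rows.length === 0) return;
  const params: unknown[] = [];
  const tuples = rows.map((r, i) => {
    params.push(r.deviceId, r.ts, r.lat, r.lon);
    const o = i * 4;
    return `($${o + 1}, $${o + 2}, $${o + 3}, $${o + 4})`;
  });
  await pool.query(
    `INSERT INTO positions (device_id, ts, lat, lon)
     VALUES ${tuples.join(", ")}
     ON CONFLICT (device_id, ts) DO NOTHING`,
    params,
  );
}
```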
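And a minimal sketch of rule 6's bounded cache, exploiting the insertion-order guarantee of `Map` so that re-inserting on every packet keeps the least-recently-seen device at the head of the iterator. Field names are illustrative; task 1.6 defines the real state shape.

```typescript
interface DeviceState {
  lastSeen: number;       // epoch ms of the most recent packet
  lastPosition?: unknown; // decoded Position; shape defined in task 1.2
}

class DeviceStateCache {
  // Map iterates in insertion order, so re-inserting on every touch
  // leaves the least-recently-seen device at the front.
  private states = new Map<string, DeviceState>();

  constructor(private readonly maxDevices: number) {}

  touch(deviceId: string, state: DeviceState): void {
    this.states.delete(deviceId);
    this.states.set(deviceId, state);
    if (this.states.size > this.maxDevices) {
      const oldest = this.states.keys().next().value!;
      this.states.delete(oldest); // safe to drop: the DB is durable truth (rule 3)
    }
  }

  get(deviceId: string): DeviceState | undefined {
    return this.states.get(deviceId);
  }
}
```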
## Phases
### Phase 1 — Throughput pipeline
**Status:** ⬜ Not started
**Outcome:** A Node.js Processor that joins a Redis Streams consumer group on `telemetry:t`, decodes each `Position` (including `__bigint`/`__buffer_b64` sentinel reversal), upserts it into a TimescaleDB `positions` hypertable, updates per-device in-memory state (last position, last seen), `XACK`s on successful write, and exposes Prometheus metrics + health/readiness HTTP endpoints. End-to-end pilot-quality service; no domain logic yet. A sentinel-reversal sketch follows below, and a consume/ack loop sketch follows the task table.
[**See `phase-1-throughput/README.md`**](./phase-1-throughput/README.md)
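
A sketch of the sentinel reversal, assuming the upstream serializer wraps non-JSON-safe values as single-key objects (`{"__bigint": "..."}` / `{"__buffer_b64": "..."}`); the authoritative wire shape is in `docs/wiki/concepts/position-record.md` and `docs/wiki/concepts/io-element-bag.md`.

```typescript
// Recursively walk a parsed JSON value and reverse the serialization
// sentinels: {"__bigint": "..."} -> BigInt, {"__buffer_b64": "..."} -> Buffer.
function reviveSentinels(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(reviveSentinels);
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    if (typeof obj.__bigint === "string") return BigInt(obj.__bigint);
    if (typeof obj.__buffer_b64 === "string") {
      return Buffer.from(obj.__buffer_b64, "base64");
    }
    return Object.fromEntries(
      Object.entries(obj).map(([k, v]) => [k, reviveSentinels(v)]),
    );
  }
  return value; // primitives pass through unchanged
}
```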
| # | Task | Status | Landed in |
|---|------|--------|-----------|
| 1.1 | [Project scaffold](./phase-1-throughput/01-project-scaffold.md) | ⬜ | — |
| 1.2 | [Core types & contracts](./phase-1-throughput/02-core-types.md) | ⬜ | — |
| 1.3 | [Configuration & logging](./phase-1-throughput/03-config-and-logging.md) | ⬜ | — |
| 1.4 | [Postgres connection & `positions` hypertable](./phase-1-throughput/04-postgres-schema.md) | ⬜ | — |
| 1.5 | [Redis Stream consumer (XREADGROUP)](./phase-1-throughput/05-stream-consumer.md) | ⬜ | — |
| 1.6 | [Per-device in-memory state](./phase-1-throughput/06-device-state.md) | ⬜ | — |
| 1.7 | [Position writer (batched upsert)](./phase-1-throughput/07-position-writer.md) | ⬜ | — |
| 1.8 | [Main wiring & ACK semantics](./phase-1-throughput/08-main-wiring.md) | ⬜ | — |
| 1.9 | [Observability (Prometheus metrics + /healthz + /readyz)](./phase-1-throughput/09-observability.md) | ⬜ | — |
| 1.10 | [Integration test (testcontainers Redis + Postgres)](./phase-1-throughput/10-integration-test.md) | ⬜ | — |
| 1.11 | [Dockerfile & Gitea workflow](./phase-1-throughput/11-dockerfile-and-ci.md) | ⬜ | — |
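
The consume/ack loop referenced in the outcome above, sketched with `ioredis`; group/consumer names and batch sizes are placeholders, and tasks 1.5 and 1.8 fix the real values and error handling.

```typescript
import Redis from "ioredis";

const redis = new Redis(); // host/port come from config (task 1.3)
const STREAM = "telemetry:t";
const GROUP = "processor";                   // placeholder group name
const CONSUMER = `processor-${process.pid}`; // placeholder consumer name

async function consumeForever(): Promise<void> {
  while (true) {
    // '>' requests only entries never delivered to this group before;
    // BLOCK makes the call long-poll instead of busy-looping.
    const reply = (await redis.xreadgroup(
      "GROUP", GROUP, CONSUMER,
      "COUNT", 100, "BLOCK", 5000,
      "STREAMS", STREAM, ">",
    )) as [string, [string, string[]][]][] | null;
    if (!reply) continue; // BLOCK timed out with no new entries

    for (const [, entries] of reply) {
      // decode -> batched upsert -> only then acknowledge (rule 4);
      // a failed batch stays pending for XAUTOCLAIM (Phase 3).
      const ids = entries.map(([id]) => id);
      await redis.xack(STREAM, GROUP, ...ids);
    }
  }
}
```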
### Phase 2 — Domain logic
**Status:** ⬜ Not started — blocks on Directus schema decisions
**Outcome:** Geofence engine that detects entry/checkpoint/finish crossings; per-model Teltonika IO mapping driving derived attributes (`odometer_km`, `ignition`, etc.); timing record writer producing entries in the Directus-owned `timing_records` table; per-stage result aggregator. Layered on top of Phase 1 — no changes to the throughput pipeline.
[**See `phase-2-domain/README.md`**](./phase-2-domain/README.md)
Detailed task breakdown deferred until the Directus schema is finalized (open questions about geofence ownership, IO mapping storage, stage vocabulary). Phase 1 can ship and run on staging without any Phase 2 work.
### Phase 3 — Production hardening
**Status:** ⬜ Not started
**Outcome:** Graceful shutdown with consumer-group commit on SIGTERM; per-device state rehydration from Postgres (lazy: state is loaded on the first packet from a given device, not eagerly at startup); `XAUTOCLAIM` for stuck pending entries from a dead instance; dead-letter stream for poison records; multi-instance load-split verification; `OPERATIONS.md` runbook.
[**See `phase-3-hardening/README.md`**](./phase-3-hardening/README.md)
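
A hedged sketch of the stuck-entry claim loop, assuming `ioredis` ≥ 5 (which exposes `XAUTOCLAIM` directly); the 60-second idle threshold and page size are illustrative.

```typescript
import Redis from "ioredis";

async function claimStale(
  redis: Redis,
  stream: string,
  group: string,
  consumer: string,
): Promise<void> {
  let cursor = "0-0";
  do {
    // Reassigns entries pending longer than 60s (e.g. from a dead instance)
    // to this consumer; Redis returns a cursor for the next page.
    const [next, entries] = (await redis.xautoclaim(
      stream, group, consumer, 60_000, cursor, "COUNT", 50,
    )) as [string, [string, string[]][]];
    // feed `entries` through the normal decode -> write -> XACK path
    cursor = next;
  } while (cursor !== "0-0");
}
```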
### Phase 4 — Future / optional
**Status:** ❄️ Not committed
[**See `phase-4-future/README.md`**](./phase-4-future/README.md)
Ideas on the radar: Directus Flow trigger emission, replay tooling, derived-metric backfill, alternate consumer for analytics export.
## Operating model
- **Implementation agent contract.** Each task file is self-sufficient: goal, deliverables, specification, acceptance criteria. An agent should be able to complete one task without reading the whole wiki — but should skim the wiki references at the top of the task before starting.
- **Sequence within a phase.** Task numbering reflects intended order. Soft dependencies are explicit in each task's "Depends on" field. Tasks with no dependencies on each other can be done in parallel.
- **Status updates.** When a task is started, change its row in this ROADMAP to 🟨 and the task file's status badge accordingly. When done, 🟩 + a one-line note in the task file's "Done" section pointing at the merging commit/PR.
- **Drift control.** If implementation diverges from a task's spec, update the task file *before* the diverging code lands, with a note explaining why. Do not let plans rot — either fix the plan or fix the code.