fa50df3e27
Tasks 1.5.4, 1.5.5, 1.5.6 marked 🟩 with commit hashes and implementation
notes. Phase 1.5 status updated to Done in ROADMAP.md.
107 lines
8.7 KiB
Markdown
107 lines
8.7 KiB
Markdown
# processor — Roadmap
|
|
|
|
A Node.js worker service that consumes normalized `Position` records from a Redis Stream, maintains per-device runtime state, applies racing-domain rules, and writes durable state to Postgres / TimescaleDB.
|
|
|
|
This file is the single navigation hub for all implementation planning. Each phase has its own folder with a README and granular task files. Update statuses here as work lands.
|
|
|
|
## Status legend
|
|
|
|
| Symbol | Meaning |
|
|
|--------|---------|
|
|
| ⬜ | Not started |
|
|
| 🟦 | Planned (designed, not coded) |
|
|
| 🟨 | In progress |
|
|
| 🟩 | Done |
|
|
| ⏸ | Paused / blocked |
|
|
| ❄️ | Frozen / future / optional |
|
|
|
|
## Architectural anchors
|
|
|
|
The service is specified by the wiki at `../docs/wiki/`. Implementing agents should read these pages before starting any task:
|
|
|
|
- **Architecture** — `docs/wiki/sources/gps-tracking-architecture.md`, `docs/wiki/concepts/plane-separation.md`, `docs/wiki/concepts/failure-domains.md`
|
|
- **This service** — `docs/wiki/entities/processor.md`
|
|
- **Upstream contract (input)** — `docs/wiki/concepts/position-record.md`, `docs/wiki/concepts/io-element-bag.md`, `docs/wiki/entities/redis-streams.md`
|
|
- **Downstream contract (output)** — `docs/wiki/entities/postgres-timescaledb.md`, `docs/wiki/entities/directus.md`
|
|
|
|
## Non-negotiable design rules
|
|
|
|
These rules govern every task. Any deviation must be discussed and documented as a decision before code lands.
|
|
|
|
1. **Domain logic is isolated.** `src/core/` (Stream consumer, Postgres writer, in-memory state plumbing) never imports from `src/domain/` (geofence engine, timing logic, IO interpretation). Phase 2 must be a pure addition layered on top of the Phase 1 throughput pipeline.
|
|
2. **Schema authority lives in Directus**, with one exception: the `positions` hypertable is bulk-written by this service and its migration is owned here. All other tables Processor writes to (timing_records, stage_results, etc.) are defined in Directus and Processor inserts respecting that schema.
|
|
3. **Per-device state is in-memory, not durable.** The DB is the source of truth for replay/analysis; in-memory state is the source of truth for the current decision. On restart, hot state is rehydrated from the DB — Phase 1 does not implement rehydration; restart loses state, which is acceptable until Phase 2 introduces stateful accumulators.
|
|
4. **Consumer-group offsets drive work distribution.** No application-level coordination between Processor instances. `XACK` on success; failed batches stay pending and are claimed by surviving instances via `XAUTOCLAIM`.
|
|
5. **Idempotent writes.** Records arriving twice (after a claim, replay, or retry) must not produce duplicate rows. The `positions` hypertable uses `(device_id, ts)` as a unique key with `ON CONFLICT DO NOTHING`. Derived tables follow the same pattern, scoped by their natural keys.
|
|
6. **Bounded memory.** Per-device state is capped (LRU eviction by last-seen timestamp); replay-from-DB rehydrates an evicted device on next packet. No unbounded `Map<imei, ...>` growth.
|
|
7. **Fail loudly.** Schema-incompatible records (e.g. malformed payload, unknown sentinel) are logged at `error` and dead-letter-streamed (Phase 3); they do **not** silently skip.
|
|
|
|
## Phases
|
|
|
|
### Phase 1 — Throughput pipeline
|
|
|
|
**Status:** 🟩 Done
|
|
**Outcome:** A Node.js Processor that joins a Redis Streams consumer group on `telemetry:teltonika`, decodes each `Position` (including `__bigint`/`__buffer_b64` sentinel reversal), upserts it into a TimescaleDB `positions` hypertable, updates per-device in-memory state (last position, last seen), `XACK`s on successful write, and exposes Prometheus metrics + health/readiness HTTP endpoints. End-to-end pilot-quality service; no domain logic yet.
|
|
|
|
[**See `phase-1-throughput/README.md`**](./phase-1-throughput/README.md)
|
|
|
|
| # | Task | Status | Landed in |
|
|
|---|------|--------|-----------|
|
|
| 1.1 | [Project scaffold](./phase-1-throughput/01-project-scaffold.md) | 🟩 | `290a08e` |
|
|
| 1.2 | [Core types & contracts](./phase-1-throughput/02-core-types.md) | 🟩 | `290a08e` |
|
|
| 1.3 | [Configuration & logging](./phase-1-throughput/03-config-and-logging.md) | 🟩 | `290a08e` |
|
|
| 1.4 | [Postgres connection & `positions` hypertable](./phase-1-throughput/04-postgres-schema.md) | 🟩 | `290a08e` |
|
|
| 1.5 | [Redis Stream consumer (XREADGROUP)](./phase-1-throughput/05-stream-consumer.md) | 🟩 | `68d3da3` |
|
|
| 1.6 | [Per-device in-memory state](./phase-1-throughput/06-device-state.md) | 🟩 | `68d3da3` |
|
|
| 1.7 | [Position writer (batched upsert)](./phase-1-throughput/07-position-writer.md) | 🟩 | `68d3da3` |
|
|
| 1.8 | [Main wiring & ACK semantics](./phase-1-throughput/08-main-wiring.md) | 🟩 | `68d3da3` |
|
|
| 1.9 | [Observability (Prometheus metrics + /healthz + /readyz)](./phase-1-throughput/09-observability.md) | 🟩 | `9791620` |
|
|
| 1.10 | [Integration test (testcontainers Redis + Postgres)](./phase-1-throughput/10-integration-test.md) | 🟩 | `9791620` |
|
|
| 1.11 | [Dockerfile & Gitea workflow](./phase-1-throughput/11-dockerfile-and-ci.md) | 🟩 | `9791620` |
|
|
|
|
### Phase 1.5 — Live broadcast
|
|
|
|
**Status:** 🟩 Done
|
|
**Outcome:** WebSocket endpoint inside the Processor that fans live position updates from Redis to subscribed [[react-spa]] clients. Cookie-based auth via Directus's `/users/me`, per-event subscription with one-time authorization at subscribe time, snapshot-on-subscribe, multi-instance per-instance consumer-group fan-out. The wire spec is `docs/wiki/synthesis/processor-ws-contract.md`. Unblocks the SPA's live-map feature for the Rally Albania 2026 dogfood.
|
|
|
|
[**See `phase-1-5-live-broadcast/README.md`**](./phase-1-5-live-broadcast/README.md)
|
|
|
|
| # | Task | Status | Landed in |
|
|
|---|------|--------|-----------|
|
|
| 1.5.1 | [WS server scaffold + heartbeat](./phase-1-5-live-broadcast/01-ws-server-scaffold.md) | 🟩 | `b8ebbd0` |
|
|
| 1.5.2 | [Cookie auth handshake](./phase-1-5-live-broadcast/02-cookie-auth-handshake.md) | 🟩 | `190254d` |
|
|
| 1.5.3 | [Subscription registry & per-event authorization](./phase-1-5-live-broadcast/03-subscription-registry.md) | 🟩 | `38de4bc` |
|
|
| 1.5.4 | [Broadcast consumer group & fan-out](./phase-1-5-live-broadcast/04-broadcast-consumer-group.md) | 🟩 | `c07ea0e` |
|
|
| 1.5.5 | [Snapshot-on-subscribe](./phase-1-5-live-broadcast/05-snapshot-on-subscribe.md) | 🟩 | `f4b50ca` |
|
|
| 1.5.6 | [Integration test (testcontainers Redis + Postgres + Directus stub)](./phase-1-5-live-broadcast/06-integration-test.md) | 🟩 | `2f2cf5c` |
|
|
|
|
### Phase 2 — Domain logic
|
|
|
|
**Status:** ⬜ Not started — blocks on Directus schema decisions
|
|
**Outcome:** Geofence engine that detects entry/checkpoint/finish crossings; per-model Teltonika IO mapping driving derived attributes (`odometer_km`, `ignition`, etc.); timing record writer producing entries in the Directus-owned `timing_records` table; per-stage result aggregator. Layered on top of Phase 1 — no changes to the throughput pipeline.
|
|
|
|
[**See `phase-2-domain/README.md`**](./phase-2-domain/README.md)
|
|
|
|
Detailed task breakdown deferred until the Directus schema is finalized (open questions about geofence ownership, IO mapping storage, stage vocabulary). Phase 1 can ship and run on stage without any Phase 2 work.
|
|
|
|
### Phase 3 — Production hardening
|
|
|
|
**Status:** ⬜ Not started
|
|
**Outcome:** Graceful shutdown with consumer-group commit on SIGTERM; per-device state rehydration from Postgres on startup (only loaded on first packet for a given device); `XAUTOCLAIM` for stuck pending entries from a dead instance; dead-letter stream for poison records; multi-instance load-split verification; `OPERATIONS.md` runbook.
|
|
|
|
[**See `phase-3-hardening/README.md`**](./phase-3-hardening/README.md)
|
|
|
|
### Phase 4 — Future / optional
|
|
|
|
**Status:** ❄️ Not committed
|
|
[**See `phase-4-future/README.md`**](./phase-4-future/README.md)
|
|
|
|
Ideas on radar: Directus Flow trigger emission, replay tooling, derived-metric backfill, alternate consumer for analytics export.
|
|
|
|
## Operating model
|
|
|
|
- **Implementation agent contract.** Each task file is self-sufficient: goal, deliverables, specification, acceptance criteria. An agent should be able to complete one task without reading the whole wiki — but should skim the wiki references at the top of the task before starting.
|
|
- **Sequence within a phase.** Task numbering reflects intended order. Soft dependencies are explicit in each task's "Depends on" field. Tasks with no dependencies on each other can be done in parallel.
|
|
- **Status updates.** When a task is started, change its row in this ROADMAP to 🟨 and the task file's status badge accordingly. When done, 🟩 + a one-line note in the task file's "Done" section pointing at the merging commit/PR.
|
|
- **Drift control.** If implementation diverges from a task's spec, update the task file *before* the diverging code lands, with a note explaining why. Do not let plans rot — either fix the plan or fix the code.
|