# Task 1.6 — Per-device in-memory state

- Phase: 1 — Throughput pipeline
- Status: ⬜ Not started
- Depends on: 1.2
- Wiki refs: `docs/wiki/entities/processor.md` (§ State management)
## Goal

Maintain a bounded `Map<device_id, DeviceState>` updated on every accepted `Position`. Phase 1 only stores trivial state — `last_position`, `last_seen`, `position_count_session` — but the structure is built so Phase 2 (geofence accumulators, time-since-last-checkpoint, etc.) can extend it cleanly.
## Deliverables

- `src/core/state.ts` exporting:
  - `createDeviceStateStore(config, logger): DeviceStateStore` — factory.
  - `DeviceStateStore` interface:
    - `update(position: Position): DeviceState` — applies the position, returns the new state. Touches LRU order.
    - `get(device_id: string): DeviceState | undefined` — read without touching LRU order. (Used for diagnostics; the hot path uses `update`.)
    - `size(): number` — for metrics.
    - `evictedTotal(): number` — for metrics.
- `test/state.test.ts` covering:
  - First update for a new device creates the entry; subsequent updates increment `position_count_session`.
  - LRU eviction: with cap=3, after 4 distinct devices, the least-recently-updated is evicted.
  - Eviction increments `evictedTotal()`.
  - `last_seen` reflects the position's `timestamp` (the device-reported time), not the wall clock at update time.
  - Out-of-order positions (a position with `timestamp` older than `last_seen`) are still applied (we don't drop them) but `last_seen` only advances forward — i.e. `last_seen = max(prev_last_seen, position.timestamp)`. Document the rationale.
## Specification

### LRU implementation

Use a plain `Map<string, DeviceState>`. JavaScript `Map` preserves insertion order, and we exploit it: on every update, delete then set the entry — that bumps it to the most recent position in iteration order. When `size() > cap`, take `keys().next().value` (the oldest) and delete it.

This is O(1) per update and avoids a third-party LRU dependency. Do not introduce `lru-cache` — the standard `Map` trick is sufficient for Phase 1's needs.
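The Map trick above can be sketched as a small self-contained class. This is an illustration of the technique, not the task's `DeviceStateStore` interface; the class and method names here (`LruMap`, `set`) are placeholders.

```typescript
// Sketch of the delete-then-set LRU trick on a plain Map.
// Map iteration order is insertion order, so re-inserting a key
// moves it to the "most recently used" end.
class LruMap<V> {
  private map = new Map<string, V>();
  private evicted = 0;

  constructor(private cap: number) {}

  // O(1) per call: delete + set bumps the key to the end of iteration order.
  set(key: string, value: V): void {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.cap) {
      // First key in iteration order = least recently set.
      const oldest = this.map.keys().next().value;
      if (oldest !== undefined) {
        this.map.delete(oldest);
        this.evicted++;
      }
    }
  }

  // Reads deliberately do not touch LRU order (mirrors the spec's get()).
  get(key: string): V | undefined {
    return this.map.get(key);
  }

  size(): number {
    return this.map.size;
  }

  evictedTotal(): number {
    return this.evicted;
  }
}
```

With cap=3, updating an existing key protects it from eviction, and the fourth distinct key evicts the least-recently-set one — exactly the behavior the eviction tests assert.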
### Why `last_seen = max(...)`, not `last_seen = position.timestamp`

Devices buffer records when offline and replay them in bursts (we observed a 55-record buffer flush on stage). Within a single batch, timestamps may decrease between consecutive records if the device sorted them oddly. We want `last_seen` to mean "highest device timestamp seen so far for this device" — that's what downstream consumers want.
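The forward-only rule is a one-line max over timestamps; a minimal sketch (the helper name `advanceLastSeen` is illustrative, not from the spec):

```typescript
// last_seen only moves forward: the older record is still applied to
// last_position / counters by the caller, but last_seen never regresses.
function advanceLastSeen(prev: Date, reported: Date): Date {
  return reported.getTime() > prev.getTime() ? reported : prev;
}

// Out-of-order replay within a buffer flush: a record reported at 10:00
// arrives after one reported at 10:05.
const seen = advanceLastSeen(
  new Date("2024-01-01T10:05:00Z"), // previous last_seen
  new Date("2024-01-01T10:00:00Z"), // older, replayed record
);
// seen stays at 10:05 — last_seen did not regress
```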
### What about restart?

On Processor restart, the in-memory state is empty. The first record from any device creates a fresh `DeviceState`. Phase 1 accepts this — it's a recovery path, not a hot path, and Phase 1 has no domain logic that would be wrong without rehydrated state.

Phase 3 (production hardening) adds rehydration: on first packet for an unknown device, query `positions WHERE device_id = $1 ORDER BY ts DESC LIMIT 1` to seed `last_position`. That's a Phase 3 task, not Phase 1.
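For orientation only (this is the Phase 3 shape, not a Phase 1 deliverable), the rehydration query could look like the sketch below. The `QueryFn` shape — a parameterized query resolving to `{ rows }` — and the column names other than `device_id`/`ts` are assumptions; only the `WHERE ... ORDER BY ts DESC LIMIT 1` query comes from this spec.

```typescript
// Hypothetical row and client shapes for illustration.
type PositionRow = { device_id: string; ts: Date; lat: number; lon: number };
type QueryFn = (text: string, values: unknown[]) => Promise<{ rows: PositionRow[] }>;

// Seed last_position for a device the in-memory store has not seen
// since restart. Returns undefined when the device has no stored rows.
async function rehydrateLastPosition(
  query: QueryFn,
  deviceId: string,
): Promise<PositionRow | undefined> {
  const { rows } = await query(
    "SELECT device_id, ts, lat, lon FROM positions WHERE device_id = $1 ORDER BY ts DESC LIMIT 1",
    [deviceId],
  );
  return rows[0];
}
```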
### What state lives here, what doesn't

In Phase 1 the state is intentionally minimal:

```ts
type DeviceState = {
  device_id: string;
  last_position: Position;
  last_seen: Date;                 // = max(prev, position.timestamp)
  position_count_session: number;  // resets on restart
};
```
Not in Phase 1:
- Geofence membership (Phase 2)
- Distance accumulators (Phase 2)
- Time-in-stage (Phase 2)
- Anything that would be wrong if dropped on restart (Phase 3 + rehydration)
The interface is built to extend: Phase 2 may add fields, but the existing fields and method signatures should not change.
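One way to keep that promise is purely additive typing: Phase 2 intersects new fields onto the Phase 1 shape, so every extended state value still satisfies the original type. The Phase 2 field names below (`geofence_ids`, `distance_m`) are hypothetical placeholders, not decided Phase 2 schema.

```typescript
// Phase 1 shapes, restated here so the sketch is self-contained.
type Position = { device_id: string; timestamp: Date; lat: number; lon: number };

type DeviceStateV1 = {
  device_id: string;
  last_position: Position;
  last_seen: Date;
  position_count_session: number;
};

// Additive extension: every DeviceStateV2 is still a valid DeviceStateV1,
// so Phase 1 code reading the existing fields compiles unchanged.
type DeviceStateV2 = DeviceStateV1 & {
  geofence_ids: string[]; // hypothetical Phase 2 field
  distance_m: number;     // hypothetical Phase 2 field
};

// Phase 1 consumer: only knows about V1 fields.
function describeState(s: DeviceStateV1): string {
  return `${s.device_id}:${s.position_count_session}`;
}

const pos: Position = { device_id: "d1", timestamp: new Date(0), lat: 0, lon: 0 };
const v2: DeviceStateV2 = {
  device_id: "d1",
  last_position: pos,
  last_seen: new Date(0),
  position_count_session: 3,
  geofence_ids: [],
  distance_m: 0,
};

// A V2 value flows through Phase 1 code untouched.
const label = describeState(v2);
```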
Acceptance criteria
pnpm typecheck,pnpm lint,pnpm testclean.- LRU cap from
DEVICE_STATE_LRU_CAPconfig is respected. evictedTotal()increments correctly under eviction.last_seendoes not regress on out-of-order timestamps.
## Risks / open questions

- Cap sizing. Default `DEVICE_STATE_LRU_CAP=10000`. At 1 KB per state entry, that's 10 MB of resident memory — fine. Operators with unusually large fleets can raise it; the bound exists to prevent runaway growth from misbehaving devices flooding novel `device_id` values.
- No mutex. State is updated only from the consumer loop, which is single-threaded. If Phase 2 introduces parallel sinks, revisit with proper synchronization.
## Done

(Fill in once complete: commit SHA, brief notes.)