processor/.planning/phase-1-throughput/06-device-state.md
julian 2a50aaf175 Implement Phase 1 tasks 1.5-1.8 (consumer + state + writer + main wiring)
src/core/consumer.ts — XREADGROUP loop with consumer-group resumption,
ensureConsumerGroup (BUSYGROUP-tolerant), decodeBatch (CodecError → log
+ skip + leave pending; never speculative ACK), partial-ACK semantics,
connectRedis (mirroring tcp-ingestion's retry pattern), clean stop.

src/core/state.ts — LRU Map<device_id, DeviceState> using delete+set
bump trick (no third-party LRU dep); last_seen = max(prev, ts) so
out-of-order replays don't regress the high-water mark; evictedTotal()
counter.

src/core/writer.ts — multi-row INSERT ON CONFLICT (device_id, ts) DO
NOTHING with RETURNING. Duplicate detection by set-difference between
input and RETURNING rows (xmax=0 doesn't work for skipped-conflict
rows, only returned ones — confirmed in the task spec's own Note).
Sequential chunking to WRITE_BATCH_SIZE; bigint→string and Buffer→base64
attribute serialization that handles Buffer.toJSON shape.

src/main.ts — full pipeline: pool → migrate → redis → state → writer →
sink → consumer → graceful-shutdown stub. Sink ordering is
state.update BEFORE writer.write per spec rationale (state stays
consistent with what's been seen even if not yet persisted; redelivery
is idempotent on state). Metrics is still the trace-logging shim from
tcp-ingestion's pre-1.10 pattern; real prom-client lands in task 1.9.

Verification: typecheck, lint clean; 112 unit tests passing across 7
test files (+39 from this batch).
2026-04-30 21:49:29 +02:00


Task 1.6 — Per-device in-memory state

Phase: 1 — Throughput pipeline
Status: 🟩 Done
Depends on: 1.2
Wiki refs: docs/wiki/entities/processor.md (§ State management)

Goal

Maintain a bounded Map<device_id, DeviceState> updated on every accepted Position. Phase 1 only stores trivial state — last_position, last_seen, position_count_session — but the structure is built so Phase 2 (geofence accumulators, time-since-last-checkpoint, etc.) can extend it cleanly.

Deliverables

  • src/core/state.ts exporting:
    • createDeviceStateStore(config, logger): DeviceStateStore — factory.
    • DeviceStateStore interface:
      • update(position: Position): DeviceState — applies the position, returns the new state. Touches LRU order.
      • get(device_id: string): DeviceState | undefined — read without touching LRU order. (Used for diagnostics; the hot path uses update.)
      • size(): number — for metrics.
      • evictedTotal(): number — for metrics.
  • test/state.test.ts covering:
    • First update for a new device creates the entry; subsequent updates increment position_count_session.
    • LRU eviction: with cap=3, after 4 distinct devices, the least-recently-updated is evicted.
    • Eviction increments evictedTotal().
    • last_seen reflects the position's timestamp (the device-reported time), not the wall clock at update time.
    • Out-of-order positions (a position with timestamp older than last_seen) are still applied (we don't drop them) but last_seen only advances forward — i.e. last_seen = max(prev_last_seen, position.timestamp). Document the rationale.
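The exported surface above can be sketched in TypeScript. This is a sketch only: `Position` is reduced to the two fields this task touches, and the config field name and logger shape are illustrative stand-ins, not the project's real definitions.

```typescript
// Sketch of src/core/state.ts's surface. Position, the config field
// name, and the logger shape are illustrative stand-ins.
type Position = { device_id: string; timestamp: Date };

type DeviceState = {
  device_id: string;
  last_position: Position;
  last_seen: Date;                // = max(prev, position.timestamp)
  position_count_session: number; // resets on restart
};

interface DeviceStateStore {
  update(position: Position): DeviceState; // applies + bumps LRU order
  get(device_id: string): DeviceState | undefined; // no LRU touch
  size(): number;
  evictedTotal(): number;
}

// Factory signature; the config field name is hypothetical.
declare function createDeviceStateStore(
  config: { deviceStateLruCap: number },
  logger: { warn(msg: string): void },
): DeviceStateStore;
```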

Specification

LRU implementation

Use a plain Map<string, DeviceState>. JavaScript Map preserves insertion order, and we exploit it: on every update, delete then set the entry — that bumps it to the most recent position in iteration order. When size() > cap, take keys().next().value (the oldest) and delete it.

This is O(1) per update and avoids a third-party LRU dependency. Do not introduce lru-cache — the standard Map trick is sufficient for Phase 1's needs.
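A minimal sketch of the delete+set trick, with the cap passed as a bare number rather than the real config object, and `Position` reduced to the fields this task needs:

```typescript
// Sketch of the Map-based LRU store. Position is a stand-in type;
// the real factory takes config and logger.
type Position = { device_id: string; timestamp: Date };

type DeviceState = {
  device_id: string;
  last_position: Position;
  last_seen: Date;
  position_count_session: number;
};

function createDeviceStateStore(cap: number) {
  const map = new Map<string, DeviceState>();
  let evicted = 0;

  return {
    update(position: Position): DeviceState {
      const prev = map.get(position.device_id);
      const state: DeviceState = {
        device_id: position.device_id,
        last_position: position,
        // last_seen only advances forward (max semantics).
        last_seen:
          prev && prev.last_seen > position.timestamp
            ? prev.last_seen
            : position.timestamp,
        position_count_session: (prev?.position_count_session ?? 0) + 1,
      };
      // delete+set bumps the entry to the back of Map iteration order.
      map.delete(position.device_id);
      map.set(position.device_id, state);
      if (map.size > cap) {
        // First key in iteration order is the least recently updated.
        const oldest = map.keys().next().value as string;
        map.delete(oldest);
        evicted += 1;
      }
      return state;
    },
    get: (id: string) => map.get(id), // read without touching LRU order
    size: () => map.size,
    evictedTotal: () => evicted,
  };
}
```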

Why last_seen = max(...), not last_seen = position.timestamp

Devices buffer records when offline and replay them in bursts (we observed a 55-record buffer flush on stage). Within a single batch, timestamps may decrease between consecutive records if the device sorted them oddly. We want last_seen to mean "highest device timestamp seen so far for this device" — that's what downstream consumers want.
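In isolation the rule is a forward-only max over device timestamps. `advanceLastSeen` below is an illustrative name, not part of the module's surface:

```typescript
// Illustrative helper: positions are always applied, but the
// last_seen high-water mark only moves forward.
function advanceLastSeen(prev: Date | undefined, deviceTs: Date): Date {
  return prev !== undefined && prev > deviceTs ? prev : deviceTs;
}
```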

What about restart?

On Processor restart, the in-memory state is empty. The first record from any device creates a fresh DeviceState. Phase 1 accepts this — it's a recovery path, not a hot path, and Phase 1 has no domain logic that would be wrong without rehydrated state.

Phase 3 (production hardening) adds rehydration: on first packet for an unknown device, query positions WHERE device_id = $1 ORDER BY ts DESC LIMIT 1 to seed last_position. That's a Phase 3 task, not Phase 1.

What state lives here, what doesn't

In Phase 1 the state is intentionally minimal:

type DeviceState = {
  device_id: string;
  last_position: Position;
  last_seen: Date;                // = max(prev, position.timestamp)
  position_count_session: number; // resets on restart
};

Not in Phase 1:

  • Geofence membership (Phase 2)
  • Distance accumulators (Phase 2)
  • Time-in-stage (Phase 2)
  • Anything that would be wrong if dropped on restart (Phase 3 + rehydration)

The interface is built to extend: Phase 2 may add fields, but the existing fields and method signatures should not change.

Acceptance criteria

  • pnpm typecheck, pnpm lint, pnpm test clean.
  • LRU cap from DEVICE_STATE_LRU_CAP config is respected.
  • evictedTotal() increments correctly under eviction.
  • last_seen does not regress on out-of-order timestamps.

Risks / open questions

  • Cap sizing. Default DEVICE_STATE_LRU_CAP=10000. At 1KB per state entry, that's 10MB of resident memory — fine. Operators with unusually large fleets can raise it; the bound exists to prevent runaway growth from misbehaving devices flooding novel device_id values.
  • No mutex. State is updated only from the consumer loop, which is single-threaded. If Phase 2 introduces parallel sinks, revisit with proper synchronization.

Done

src/core/state.ts — LRU Map using delete+set bump trick, last_seen = max(prev, position.timestamp) semantics, evictedTotal() counter. test/state.test.ts — 14 tests covering new-device creation, session counter increment, LRU eviction at cap, LRU re-touch, evictedTotal, out-of-order timestamp handling (positions applied without regressing last_seen), get/size. (pending commit SHA)