# Task 1.6 — Per-device in-memory state

**Phase:** 1 (Throughput pipeline)
**Status:** 🟩 Done
**Depends on:** 1.2
**Wiki refs:** `docs/wiki/entities/processor.md` (§ State management)

## Goal

Maintain a bounded `Map<device_id, DeviceState>` that is updated on every accepted Position. Phase 1 stores only trivial state (`last_position`, `last_seen`, `position_count_session`), but the structure is built so that Phase 2 (geofence accumulators, time-since-last-checkpoint, etc.) can extend it cleanly.

## Deliverables

- `src/core/state.ts` exporting:
  - `createDeviceStateStore(config, logger): DeviceStateStore`: the factory.
  - `DeviceStateStore` interface:
    - `update(position: Position): DeviceState`: applies the position and returns the new state. Touches LRU order.
    - `get(device_id: string): DeviceState | undefined`: reads without touching LRU order. (Used for diagnostics; the hot path uses `update`.)
    - `size(): number`: for metrics.
    - `evictedTotal(): number`: for metrics.
- `test/state.test.ts` covering:
  - First update for a new device creates the entry; subsequent updates increment `position_count_session`.
  - LRU eviction: with cap=3, after 4 distinct devices, the least-recently-updated is evicted.
  - Eviction increments `evictedTotal()`.
  - `last_seen` reflects the position's `timestamp` (the device-reported time), not the wall clock at update time.
  - Out-of-order positions (a position with `timestamp` older than `last_seen`) are still applied (we don't drop them), but `last_seen` only advances forward, i.e. `last_seen = max(prev_last_seen, position.timestamp)`. Document the rationale.

## Specification

### LRU implementation

Use a plain `Map<string, DeviceState>`. A JavaScript `Map` iterates in insertion order, and we exploit that: on every `update`, `delete` then `set` the entry, which moves it to the most recent position in iteration order. When `size() > cap`, take `keys().next().value` (the oldest key) and `delete` it.

This is O(1) per update and avoids a third-party LRU dependency. **Do not** introduce `lru-cache`; the standard `Map` trick is sufficient for Phase 1's needs.

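The delete+set trick described above can be sketched as follows. This is an illustration under stated assumptions, not the shipped `src/core/state.ts`: the `Position` shape is reduced to two fields, and `createLruStateStore(cap)` is a hypothetical stand-in for the real factory, which takes `(config, logger)`. Overwriting `last_position` with every applied record is also an assumption of this sketch.

```typescript
// Assumed minimal shapes; the real Position type is defined elsewhere in the repo.
type Position = { device_id: string; timestamp: Date };
type DeviceState = {
  device_id: string;
  last_position: Position;
  last_seen: Date;
  position_count_session: number;
};

// Hypothetical factory that takes only the cap.
function createLruStateStore(cap: number) {
  const entries = new Map<string, DeviceState>();
  let evicted = 0;

  return {
    update(position: Position): DeviceState {
      const prev = entries.get(position.device_id);
      const next: DeviceState = {
        device_id: position.device_id,
        last_position: position,
        // Keep the later of the stored and the reported timestamp.
        last_seen:
          prev && prev.last_seen.getTime() > position.timestamp.getTime()
            ? prev.last_seen
            : position.timestamp,
        position_count_session: (prev?.position_count_session ?? 0) + 1,
      };
      // delete + set moves the key to the end of the Map's iteration order.
      entries.delete(position.device_id);
      entries.set(position.device_id, next);
      // The first key in iteration order is the least recently updated one.
      if (entries.size > cap) {
        const oldest = entries.keys().next().value;
        if (oldest !== undefined) {
          entries.delete(oldest);
          evicted += 1;
        }
      }
      return next;
    },
    // Map.get does not change insertion order, so reads leave LRU order alone.
    get: (device_id: string): DeviceState | undefined => entries.get(device_id),
    size: (): number => entries.size,
    evictedTotal: (): number => evicted,
  };
}

// Usage: cap of 3, four distinct devices, with "a" re-touched before the fourth arrives.
const store = createLruStateStore(3);
for (const id of ["a", "b", "c"]) {
  store.update({ device_id: id, timestamp: new Date(0) });
}
store.update({ device_id: "a", timestamp: new Date(1000) }); // order is now b, c, a
store.update({ device_id: "d", timestamp: new Date(2000) }); // evicts "b", the oldest key
```

Because only `update` re-inserts the key, a diagnostic `get` never rescues an entry from eviction, which matches the contract in the Deliverables list.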
### Why `last_seen = max(...)`, not `last_seen = position.timestamp`

Devices buffer records when offline and replay them in bursts (we observed a 55-record buffer flush on stage). Within a single batch, timestamps may *decrease* between consecutive records if the device sorted them oddly, so assigning `position.timestamp` directly would let a late-arriving older record pull `last_seen` backwards. We want `last_seen` to mean "highest device timestamp seen so far for this device", which is what downstream consumers want.

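A small worked example of the high-water-mark rule, assuming a hypothetical helper named `advanceLastSeen` and made-up timestamps (neither comes from the codebase):

```typescript
// Returns the later of the two instants, so the stored mark never moves backwards.
function advanceLastSeen(prev: Date | undefined, reported: Date): Date {
  if (prev === undefined) return reported;
  return reported.getTime() > prev.getTime() ? reported : prev;
}

// A replay burst whose device timestamps are not monotonic (illustrative values).
const burst = [
  "2024-01-01T10:00:05Z",
  "2024-01-01T10:00:01Z",
  "2024-01-01T10:00:09Z",
  "2024-01-01T10:00:03Z",
];

let lastSeen: Date | undefined;
for (const iso of burst) {
  lastSeen = advanceLastSeen(lastSeen, new Date(iso));
}
console.log(lastSeen?.toISOString()); // 2024-01-01T10:00:09.000Z
```

With plain assignment the loop would end on `10:00:03`, the last record replayed, rather than on the newest timestamp in the burst.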
### What about restart?

On Processor restart, the in-memory state is empty. The first record from any device creates a fresh `DeviceState`. **Phase 1 accepts this**: it is a recovery path, not a hot path, and Phase 1 has no domain logic that would be wrong without rehydrated state.

Phase 3 (production hardening) adds rehydration: on the first packet for an unknown device, query `positions WHERE device_id = $1 ORDER BY ts DESC LIMIT 1` to seed `last_position`. That is a Phase 3 task, not Phase 1.

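For orientation only, the Phase 3 rehydration lookup might look something like the sketch below. Everything beyond the `WHERE ... ORDER BY ts DESC LIMIT 1` clause is an assumption: the `Queryable` interface is a structural stand-in shaped like a `query(text, values)` call that resolves to `{ rows }`, `loadLatestPositionRow` is a hypothetical name, and `SELECT *` is a placeholder for whichever columns the real task chooses.

```typescript
// Structural stand-in for a database client; only query(text, values) is used.
interface Queryable {
  query(
    text: string,
    values: unknown[],
  ): Promise<{ rows: Array<Record<string, unknown>> }>;
}

// Hypothetical helper: fetch the newest persisted row for a device, if any.
async function loadLatestPositionRow(
  db: Queryable,
  deviceId: string,
): Promise<Record<string, unknown> | undefined> {
  const result = await db.query(
    "SELECT * FROM positions WHERE device_id = $1 ORDER BY ts DESC LIMIT 1",
    [deviceId],
  );
  // An empty result means the device has never been persisted.
  return result.rows[0];
}
```

A caller would run this only on a state-store miss, so the query stays off the hot path for devices that are already in memory.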
### What state lives here, what doesn't

In Phase 1 the state is intentionally minimal:

```ts
type DeviceState = {
  device_id: string;
  last_position: Position;
  last_seen: Date; // = max(prev, position.timestamp)
  position_count_session: number; // resets on restart
};
```

**Not in Phase 1:**
- Geofence membership (Phase 2)
- Distance accumulators (Phase 2)
- Time-in-stage (Phase 2)
- Anything that would be wrong if dropped on restart (Phase 3 + rehydration)

The interface is built to extend: Phase 2 may add fields, but the existing fields and method signatures should not change.

## Acceptance criteria

- [ ] `pnpm typecheck`, `pnpm lint`, `pnpm test` clean.
- [ ] LRU cap from `DEVICE_STATE_LRU_CAP` config is respected.
- [ ] `evictedTotal()` increments correctly under eviction.
- [ ] `last_seen` does not regress on out-of-order timestamps.

## Risks / open questions

- **Cap sizing.** Default `DEVICE_STATE_LRU_CAP=10000`. Assuming about 1 KB per state entry, that is roughly 10 MB of resident memory, which is fine. Operators with unusually large fleets can raise it; the bound exists to prevent runaway growth from misbehaving devices flooding novel `device_id` values.
- **No mutex.** State is updated only from the consumer loop, which is single-threaded. If Phase 2 introduces parallel sinks, revisit with proper synchronization.

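As a rough illustration of the cap-sizing point, the setting could be read and sanity-checked along these lines. The `readLruCap` helper and its validation rules are assumptions for this sketch; only the variable name `DEVICE_STATE_LRU_CAP`, the default of 10000, and the 1 KB-per-entry estimate come from the text above.

```typescript
const DEFAULT_LRU_CAP = 10_000;

// Hypothetical parser: reads DEVICE_STATE_LRU_CAP and falls back to the default when unset.
function readLruCap(env: Record<string, string | undefined>): number {
  const raw = env["DEVICE_STATE_LRU_CAP"];
  if (raw === undefined || raw.trim() === "") return DEFAULT_LRU_CAP;
  const cap = Number(raw);
  // Reject non-numeric, fractional, zero and negative values early.
  if (!Number.isInteger(cap) || cap < 1) {
    throw new Error(`DEVICE_STATE_LRU_CAP must be a positive integer, got "${raw}"`);
  }
  return cap;
}

// Memory estimate under the 1 KB-per-entry assumption:
// 10_000 entries * 1024 bytes = 10_240_000 bytes, about 10 MB.
const estimatedBytes = readLruCap({}) * 1024;
```

Passing the environment in as an argument, rather than reading a global, keeps the parser easy to unit test.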
## Done

`src/core/state.ts`: LRU Map using the delete+set bump trick, `last_seen = max(prev, position.timestamp)` semantics, `evictedTotal()` counter. `test/state.test.ts`: 14 tests covering new-device creation, session counter increment, LRU eviction at cap, LRU re-touch, evictedTotal, out-of-order timestamps (applied without regressing `last_seen`), get/size. *(pending commit SHA)*