# Task 1.8 — Main wiring & ACK semantics
**Phase:** 1 — Throughput pipeline

**Status:** 🟩 Done

**Depends on:** 1.5, 1.6, 1.7

**Wiki refs:** `docs/wiki/entities/processor.md`
## Goal
Assemble the throughput pipeline in `src/main.ts`: connect Redis + Postgres → run migrations → build the device-state store → build the writer → build the consumer with a sink that calls `state.update()` then `writer.write()` → start. Establish the rule for what to ACK and when.
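
For orientation, a condensed sketch of what the assembled `src/main.ts` could look like. Only the factory calls spelled out under Deliverables below are taken from this spec; module paths, the exact `createLogger` / `connectRedis` / `migrate` parameters, and the metrics shim are illustrative assumptions.

```ts
// Sketch only — module paths and most signatures are assumptions; the factory calls
// (createPool, createDeviceStateStore, createWriter, createConsumer) follow the Deliverables list.
import { loadConfig } from './core/config';
import { createLogger } from './core/logger';
import { createPool, connectWithRetry, migrate } from './core/db';
import { connectRedis } from './core/redis';
import { createDeviceStateStore } from './core/state';
import { createWriter } from './core/writer';
import { createConsumer, type ConsumedRecord } from './core/consumer';
import { createMetricsShim } from './core/metrics';

async function main(): Promise<void> {
  const config = loadConfig();
  const logger = createLogger(config);
  const metrics = createMetricsShim(logger); // trace-logging shim until task 1.9

  const pool = createPool(config.POSTGRES_URL);
  await connectWithRetry(pool, logger);
  await migrate(pool, logger); // migrations run before any consumer activity

  const redis = await connectRedis(config, logger);

  const state = createDeviceStateStore(config, logger);
  const writer = createWriter(pool, config, logger, metrics);

  // The sink is spelled out under Deliverables: update state, write, return ACK-able IDs.
  const sink = async (records: ConsumedRecord[]): Promise<string[]> => {
    for (const r of records) state.update(r.position);
    const results = await writer.write(records);
    return results
      .filter((r) => r.status === 'inserted' || r.status === 'duplicate')
      .map((r) => r.id);
  };

  const consumer = createConsumer(redis, config, logger, metrics, sink);
  await consumer.start();
  logger.info('processor ready');

  // Graceful-shutdown stub: see the Specification section below.
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```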
## Deliverables
- `src/main.ts` updated to:

1. `loadConfig()` (from task 1.3).
2. `createLogger()` (from task 1.3).
3. `createPool(config.POSTGRES_URL)` and `connectWithRetry()` (from task 1.4).
4. Run migrations via `migrate()` (from task 1.4) before any consumer activity.
5. Connect Redis with `connectRedis(...)` (re-implement the `tcp-ingestion` retry pattern; small enough to copy; a retry sketch appears at the end of this section).
6. Build `state = createDeviceStateStore(config, logger)`.
7. Build `writer = createWriter(pool, config, logger, metrics)`.
8. Build `consumer = createConsumer(redis, config, logger, metrics, sink)` where `sink` is the function defined below.
9. `await consumer.start()`.
10. Install the graceful-shutdown stub (full Phase 3 hardening comes later): on SIGTERM/SIGINT, call `consumer.stop()`, await pending writes, close Redis + Pool, exit.

- `src/main.ts` defines the **sink function** (the central decision point):
```ts
async function sink(records: ConsumedRecord[]): Promise<string[]> {
  // 1. Update in-memory state for every record (cheap, synchronous, can't fail meaningfully)
  for (const r of records) state.update(r.position);

  // 2. Write to Postgres
  const results = await writer.write(records);

  // 3. ACK only the IDs that succeeded or were duplicates
  return results
    .filter(r => r.status === 'inserted' || r.status === 'duplicate')
    .map(r => r.id);
}
```
- A placeholder `metrics` shim — the same trace-logging stub as `tcp-ingestion` originally had (task 1.9 replaces it with prom-client). Use `Metrics` from `src/core/types.ts`.
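
A minimal sketch of that trace-logging shim, in the spirit of `tcp-ingestion`'s pre-1.10 stub. The real `Metrics` interface lives in `src/core/types.ts`; the method names and the pino-style `trace(fields, msg)` logger signature below are assumptions for illustration only.

```ts
// Hypothetical Metrics shape — substitute the actual interface from src/core/types.ts.
interface Metrics {
  increment(name: string, value?: number, labels?: Record<string, string>): void;
  observe(name: string, value: number, labels?: Record<string, string>): void;
}

// Every metric call becomes a trace log line until task 1.9 wires prom-client.
function createMetricsShim(logger: { trace: (fields: object, msg: string) => void }): Metrics {
  return {
    increment(name, value = 1, labels) {
      logger.trace({ name, value, labels }, 'metric increment');
    },
    observe(name, value, labels) {
      logger.trace({ name, value, labels }, 'metric observe');
    },
  };
}
```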
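
And a sketch of the retry pattern behind `connectRedis(...)` from step 5 above. It is written client-agnostically so it does not assume which Redis client the project uses; the attempt count and delay are placeholders, not `tcp-ingestion`'s actual constants.

```ts
// Generic connect-with-retry sketch; `openClient` wraps whatever Redis client the project uses
// (e.g. a function that constructs a client and awaits its connect()). Names and defaults are illustrative.
async function connectRedis<T>(
  openClient: () => Promise<T>,
  logger: { info: (msg: string) => void; warn: (msg: string) => void },
  maxAttempts = 10,
  delayMs = 1_000,
): Promise<T> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const client = await openClient();
      logger.info('Redis connected');
      return client;
    } catch (err) {
      logger.warn(`Redis connect attempt ${attempt}/${maxAttempts} failed: ${String(err)}`);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error(`could not connect to Redis after ${maxAttempts} attempts`);
}
```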
## Specification
### State update happens before write — by design
The sink updates `state` first, *then* writes. If the write fails:

- The state update has already happened.
- The record is not ACKed, so it stays pending.
- On re-delivery (same instance retries, or another instance claims), the record will be processed again.
- `state.update` is idempotent for a given position (same record applied twice produces the same `last_position`, only `position_count_session` is double-counted — and that's a session counter that resets on restart anyway, so it's a non-issue).

If we wrote *first* and updated state second, a successful write followed by a state-update crash would leave Postgres ahead of state — and state is the hot path, so letting it lag behind is the worse failure mode. The chosen order keeps state consistent with what's been seen, even if not yet persisted.
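
To make the idempotence claim concrete, a small sketch under assumptions: the real `DeviceState` store comes from task 1.6, and the field names below (`last_position`, `position_count_session`) merely follow the spec text. Re-applying the same record leaves `last_position` unchanged; only the session counter moves.

```ts
// Illustrative only — not the task 1.6 implementation.
interface Position { device_id: string; ts: number; lat: number; lon: number }

interface DeviceState {
  last_position: Position;
  position_count_session: number;
}

const states = new Map<string, DeviceState>();

function update(p: Position): void {
  const prev = states.get(p.device_id);
  if (prev === undefined || p.ts >= prev.last_position.ts) {
    // New high-water mark (or an exact redelivery, which overwrites with identical data).
    states.set(p.device_id, {
      last_position: p,
      position_count_session: (prev?.position_count_session ?? 0) + 1,
    });
  } else {
    // Out-of-order replay: count it, but never regress the high-water mark.
    prev.position_count_session += 1;
  }
}
```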
### What the sink does NOT do
- **No business logic.** No "is this a finish-line crossing" detection. That's Phase 2's domain.
- **No multi-stream fanout.** No publishing to derived streams (e.g. for the SPA). The Phase 1 model is: positions go into Postgres, Directus reads them and pushes via WebSocket. If that fanout proves insufficient at the SPA layer, Phase 4 considers a dedicated WebSocket gateway reading from Redis directly.
### Graceful shutdown — Phase 1 stub vs. Phase 3 final
The Phase 1 stub is enough to not lose data in the common case:

1. Catch SIGTERM/SIGINT.
2. `consumer.stop()` — exits the read loop after the current batch.
3. Await any in-flight `writer.write()`.
4. `redis.quit()` and `pool.end()`.
5. `process.exit(0)`.
6. Force-exit timer at 15s as a backstop.
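
A minimal sketch of the stub, assuming only the interfaces named in this document (`consumer.stop()`, `redis.quit()`, `pool.end()`); the `awaitPendingWrites` hook and the logger shape are illustrative.

```ts
function installShutdownStub(deps: {
  consumer: { stop(): Promise<void> };
  awaitPendingWrites: () => Promise<void>; // resolves once in-flight writer.write() calls settle (assumed helper)
  redis: { quit(): Promise<unknown> };
  pool: { end(): Promise<void> };
  logger: { info(msg: string): void; error(msg: string): void };
}): void {
  let shuttingDown = false;

  const shutdown = async (signal: string): Promise<void> => {
    if (shuttingDown) return; // ignore repeated signals
    shuttingDown = true;
    deps.logger.info(`${signal} received, shutting down`);

    // Backstop: force-exit after 15s if the graceful path hangs.
    const timer = setTimeout(() => {
      deps.logger.error('graceful shutdown timed out, forcing exit');
      process.exit(1);
    }, 15_000);
    timer.unref();

    try {
      await deps.consumer.stop();      // read loop exits after the current batch
      await deps.awaitPendingWrites(); // wait for in-flight writes
      await deps.redis.quit();
      await deps.pool.end();
      process.exit(0);
    } catch (err) {
      deps.logger.error(`error during shutdown: ${String(err)}`);
      process.exit(1);
    }
  };

  process.on('SIGTERM', () => void shutdown('SIGTERM'));
  process.on('SIGINT', () => void shutdown('SIGINT'));
}
```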
What Phase 1 does NOT do (deferred to Phase 3):
- Explicit consumer-group offset commit on SIGTERM (the current model relies on `XACK` after each successful write, which is already the right thing — but Phase 3 documents and tests this rigorously).
- Uncaught exception / unhandled rejection handlers that flush state to logs before crashing.
- Multi-instance coordination on shutdown (drain mode).
### Logger shape
Match `tcp-ingestion`'s convention:
- `info` for lifecycle: `processor starting`, `Postgres connected`, `Redis connected`, `migrations applied`, `consumer started on stream X group Y consumer Z`, `processor ready`.
- `debug` for per-batch: `batch consumed n=42`, `batch written inserted=40 duplicates=2 failed=0`.
- `warn` / `error` for the obvious.

After this task lands you should be able to run `pnpm dev` against a local Redis + Postgres, publish a synthetic `Position` to `telemetry:t`, and watch a row appear in `positions` while seeing the lifecycle logs above.
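
For that manual check, a hedged example of publishing a synthetic `Position`, assuming node-redis and a single JSON-encoded `payload` field. The actual entry format on `telemetry:t` is whatever `tcp-ingestion` writes, so adjust the field names to match.

```ts
import { createClient } from 'redis';

// Field name ('payload') and the Position shape are assumptions for this sketch.
const redis = createClient({ url: 'redis://localhost:6379' });
await redis.connect();

await redis.xAdd('telemetry:t', '*', {
  payload: JSON.stringify({
    device_id: 'dev-1',
    ts: new Date().toISOString(),
    lat: 52.3702,
    lon: 4.8952,
  }),
});

await redis.quit();
```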
## Acceptance criteria
- [ ] `pnpm typecheck`, `pnpm lint`, `pnpm test` clean.
- [ ] `pnpm dev` (with local Redis + Postgres reachable) shows the lifecycle log sequence and `processor ready`.
- [ ] Manually publishing a `Position` to `telemetry:t` results in a row in `positions` within seconds.
- [ ] SIGTERM during idle exits cleanly (no error, no force-exit warning).
- [ ] SIGTERM with in-flight writes waits for them to complete before exiting.
## Risks / open questions
- **`metrics` placeholder is intentional.** Don't try to wire prom-client here; that's task 1.9. Use the trace-logging shim from `tcp-ingestion`'s pre-1.10 `main.ts` as the model.
- **Migration during deploy.** Phase 1 runs migrations on every startup. With multiple instances, two starting at once both try to migrate — Postgres advisory locks would solve this (see the sketch below). **Defer to Phase 3** (it's a production-hardening concern); for the pilot with one instance, this is fine. Document the limitation.
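
A sketch of what the Phase 3 fix could look like: serializing `migrate()` behind a session-level advisory lock, assuming the `pg` pool from task 1.4. The lock key is arbitrary but must be the same in every instance.

```ts
import type { Pool } from 'pg';

// Arbitrary application-wide lock key; every processor instance must use the same value.
const MIGRATION_LOCK_KEY = 723_001;

// Hypothetical wrapper around the task 1.4 migrate(); shown only to document the deferred fix.
async function migrateWithLock(pool: Pool, migrate: (pool: Pool) => Promise<void>): Promise<void> {
  const client = await pool.connect();
  try {
    // Blocks until no other instance holds the lock, so concurrent startups migrate one at a time.
    await client.query('SELECT pg_advisory_lock($1)', [MIGRATION_LOCK_KEY]);
    await migrate(pool);
  } finally {
    await client.query('SELECT pg_advisory_unlock($1)', [MIGRATION_LOCK_KEY]);
    client.release();
  }
}
```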
## Done
`src/main.ts` — full pipeline wiring: Postgres pool → migrations → Redis → state store → writer → sink → consumer → graceful-shutdown stub. Metrics shim uses `logger.trace`. Sink ordering: `state.update` before `writer.write`, per spec. Typecheck, lint, and unit tests clean. *(commit `2a50aaf175`)*