processor/.planning/phase-1-throughput/05-stream-consumer.md
julian 2a50aaf175 Implement Phase 1 tasks 1.5-1.8 (consumer + state + writer + main wiring)
src/core/consumer.ts — XREADGROUP loop with consumer-group resumption,
ensureConsumerGroup (BUSYGROUP-tolerant), decodeBatch (CodecError → log
+ skip + leave pending; never speculative ACK), partial-ACK semantics,
connectRedis (mirroring tcp-ingestion's retry pattern), clean stop.

src/core/state.ts — LRU Map<device_id, DeviceState> using delete+set
bump trick (no third-party LRU dep); last_seen = max(prev, ts) so
out-of-order replays don't regress the high-water mark; evictedTotal()
counter.

src/core/writer.ts — multi-row INSERT ON CONFLICT (device_id, ts) DO
NOTHING with RETURNING. Duplicate detection by set-difference between
input and RETURNING rows (xmax=0 doesn't work for skipped-conflict
rows, only returned ones — confirmed in the task spec's own Note).
Sequential chunking to WRITE_BATCH_SIZE; bigint→string and Buffer→base64
attribute serialization that handles Buffer.toJSON shape.

src/main.ts — full pipeline: pool → migrate → redis → state → writer →
sink → consumer → graceful-shutdown stub. Sink ordering is
state.update BEFORE writer.write per spec rationale (state stays
consistent with what's been seen even if not yet persisted; redelivery
is idempotent on state). Metrics is still the trace-logging shim from
tcp-ingestion's pre-1.10 pattern; real prom-client lands in task 1.9.

Verification: typecheck, lint clean; 112 unit tests passing across 7
test files (+39 from this batch).
2026-04-30 21:49:29 +02:00


Task 1.5 — Redis Stream consumer (XREADGROUP)

Phase: 1 — Throughput pipeline
Status: 🟩 Done
Depends on: 1.2, 1.3
Wiki refs: docs/wiki/entities/redis-streams.md, docs/wiki/entities/processor.md

Goal

Build the Redis Stream consumer: join the consumer group, fetch batches via XREADGROUP, decode each entry to a Position, hand off to a sink callback, and return successfully-handled IDs to the caller for XACK.

This task does not wire in the Postgres writer or the in-memory state — those are tasks 1.7 and 1.6, joined to the consumer in 1.8. The consumer accepts a sink: (records: ConsumedRecord[]) => Promise<string[]> callback that returns the IDs it wants ACKed. Only those IDs are ACKed; failures stay pending and get claimed on the next loop.
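
As a reference for that handoff contract, here is a minimal sketch of the exported types. The type and field names follow the deliverables below; the import path and comments are illustrative.

import type { Position } from './codec'; // decodePosition/Position come from task 1.2; path assumed

// One decoded stream entry, as handed to the sink.
export type ConsumedRecord = {
  id: string;          // Redis stream entry ID, e.g. '1714500000000-0'
  position: Position;  // decoded payload
  codec: string;       // codec field from the stream entry
  ts: string;          // ts field from the stream entry
};

// The sink returns the entry IDs it handled; only those get XACKed.
export type Sink = (records: ConsumedRecord[]) => Promise<string[]>;

export interface Consumer {
  start(): Promise<void>; // resolves once the read loop is running
  stop(): Promise<void>;  // signals the loop to exit and waits for the in-flight batch
}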

Deliverables

  • src/core/consumer.ts exporting:
    • createConsumer(redis, config, logger, metrics, sink): Consumer — factory.
    • Consumer interface: start(): Promise<void> (returns when the consumer loop starts), stop(): Promise<void> (signals the loop to exit, waits for the in-flight batch).
    • ensureConsumerGroup(redis, stream, group) — XGROUP CREATE ... MKSTREAM, ignoring BUSYGROUP errors. Called once at start.
    • type ConsumedRecord = { id: string; position: Position; codec: string; ts: string } — what's passed to the sink.
  • test/consumer.test.ts (mocked ioredis; one of these cases is sketched after this list):
    • Decodes a synthetic stream entry into a ConsumedRecord with the right shape.
    • Calls sink with the decoded batch and ACKs only the IDs the sink returned.
    • On BUSYGROUP error from XGROUP CREATE, swallows the error and continues.
    • On a malformed payload, increments processor_decode_errors_total, logs at error, and does not ACK the bad entry — leaves it pending for inspection.
    • On stop(), the loop exits cleanly without losing in-flight work.
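
For the BUSYGROUP case, a sketch with a hand-rolled ioredis fake. Vitest is assumed as the runner, and the config field names and logger/metrics stubs are illustrative, not the task 1.3/1.4 shapes.

import { expect, it, vi } from 'vitest';
import { createConsumer } from '../src/core/consumer';

const config = { stream: 'positions', group: 'processor', consumerName: 'proc-1', batchSize: 100, batchBlockMs: 50 }; // field names assumed
const logger = { info: vi.fn(), error: vi.fn() } as never;
const metrics = {} as never; // trace-logging shim stands in until task 1.9

it('swallows BUSYGROUP when the consumer group already exists', async () => {
  const redis = {
    xgroup: vi.fn().mockRejectedValue(new Error('BUSYGROUP Consumer Group name already exists')),
    xreadgroup: vi.fn().mockResolvedValue(null), // every read is a BLOCK timeout
    xack: vi.fn(),
  } as never;

  const consumer = createConsumer(redis, config, logger, metrics, async () => []);
  await expect(consumer.start()).resolves.toBeUndefined(); // BUSYGROUP is caught; start still succeeds
  await consumer.stop();
});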

Specification

Consumer loop shape

async function runLoop() {
  while (!stopping) {
    // XREADGROUP replies with one [stream, entries] pair per requested stream, or null on BLOCK timeout.
    let reply: [stream: string, entries: StreamEntry[]][] | null;
    try {
      reply = (await redis.xreadgroup(
        'GROUP', group, consumerName,
        'COUNT', batchSize,
        'BLOCK', batchBlockMs,
        'STREAMS', stream, '>',
      )) as [string, StreamEntry[]][] | null;
    } catch (err) {
      logger.error({ err }, 'XREADGROUP failed; backing off');
      await sleep(1000);
      continue;
    }
    if (!reply || reply.length === 0) continue;  // BLOCK timeout, nothing to read

    const [, entries] = reply[0];                // only one stream is requested
    const records = decodeBatch(entries);        // <— may emit decode errors
    const ackIds = await sink(records);          // <— writer + state
    if (ackIds.length > 0) {
      await redis.xack(stream, group, ...ackIds);
    }
  }
}

Decode error handling

decodeBatch calls decodePosition (from task 1.2) on each entry's payload field (see the sketch after this list). If a single entry fails to decode:

  • Increment processor_decode_errors_total{stream=...}.
  • Log at error with the entry ID and a truncated raw payload (first 256 chars).
  • Skip the entry — do not pass to sink, do not ACK. It stays in the consumer's PEL (Pending Entries List) and will be re-attempted on next claim. Phase 3 will route truly-poison entries to a dead-letter stream; for Phase 1, leaving them pending and visible in XPENDING is enough.
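
A sketch of decodeBatch under these rules. The fieldsToObject helper, the metrics call, and decodePosition's exact signature are assumptions; only the skip/log/count behaviour is from the spec.

// Turn XREADGROUP entries into ConsumedRecords, skipping (never ACKing) anything that fails to decode.
function decodeBatch(entries: StreamEntry[]): ConsumedRecord[] {
  const records: ConsumedRecord[] = [];
  for (const [id, fields] of entries) {
    // fields is the flat [key, value, key, value, ...] array from the stream entry.
    const { payload, codec = 'unknown', ts = '' } = fieldsToObject(fields); // hypothetical helper: flat array -> Record<string, string>
    try {
      if (payload === undefined) throw new Error('missing payload field');
      const position = decodePosition(payload, codec); // from task 1.2; exact signature assumed
      records.push({ id, position, codec, ts });
    } catch (err) {
      metrics.incCounter('processor_decode_errors_total', { stream }); // metrics API is illustrative
      logger.error({ id, err, raw: String(payload).slice(0, 256) }, 'decode failed; leaving entry pending');
      // Not pushed: the entry is never passed to the sink and never ACKed, so it stays in the PEL.
    }
  }
  return records;
}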

XACK semantics

ACK only what the sink returned. If the sink returns ['id1', 'id3'] from a batch of [id1, id2, id3], then id2 stays pending. Why a sink might return a partial list: it failed to write some records. The consumer must trust the sink's signal — never ACK speculatively.
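
For illustration, a sink in the shape task 1.8 wires up might look like the following. The state/writer method names and writer.write's return shape are assumptions, not the task 1.6/1.7 APIs; only the "ACK what was actually handled" rule is from this spec.

// Illustrative sink: update state first, then write, and ACK exactly the
// records the writer reports as handled (inserted or deduplicated).
const sink: Sink = async (records) => {
  for (const r of records) state.update(r.position); // idempotent on redelivery
  const written = await writer.write(records.map((r) => r.position)); // assumed to return the positions it persisted
  const writtenKeys = new Set(written.map((p) => `${p.device_id}:${p.ts}`));
  return records
    .filter((r) => writtenKeys.has(`${r.position.device_id}:${r.position.ts}`))
    .map((r) => r.id);
};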

Consumer group setup

On start():

  1. XGROUP CREATE <stream> <group> $ MKSTREAM — creates the stream if missing, group at "now" so we don't replay history. If the group already exists, the call returns BUSYGROUP Consumer Group name already exists — catch and ignore (sketched after this list).
  2. Log at info whether the group was created or already existed.
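
A sketch of ensureConsumerGroup along these lines. Matching on the BUSYGROUP error message is the usual ioredis approach; the module-level logger and its fields are illustrative.

import type Redis from 'ioredis';

async function ensureConsumerGroup(redis: Redis, stream: string, group: string): Promise<void> {
  try {
    // '$' starts the group at the current tail; MKSTREAM creates the stream if it is missing.
    await redis.xgroup('CREATE', stream, group, '$', 'MKSTREAM');
    logger.info({ stream, group }, 'consumer group created');
  } catch (err) {
    if (err instanceof Error && err.message.startsWith('BUSYGROUP')) {
      logger.info({ stream, group }, 'consumer group already exists');
      return;
    }
    throw err; // anything other than BUSYGROUP is a real failure
  }
}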

Why > not 0 for the read ID

> means "deliver only new entries, not pending ones for this consumer." That's what we want for the steady-state loop. Phase 3 will add an explicit XAUTOCLAIM step at startup (and periodically) to pull stuck pending entries from dead consumers; Phase 1 relies on the natural redelivery via consumer-group resumption (when a dead instance restarts with the same name, it sees its old PEL).

Acceptance criteria

  • pnpm typecheck, pnpm lint, pnpm test clean.
  • Unit tests cover: happy path, BUSYGROUP swallow, decode error skip, partial-ACK, clean stop.
  • Stop signal causes the loop to exit within one BATCH_BLOCK_MS tick.

Risks / open questions

  • Consumer name uniqueness. Two instances with the same REDIS_CONSUMER_NAME will both read from the same PEL, which is undefined behaviour. Task 1.3 already documents that INSTANCE_ID (which REDIS_CONSUMER_NAME defaults to) must be unique per instance — surface this again in the operator-facing README later.
  • Long sink calls block the loop. If the Postgres writer takes 30s, no new records are read. That's fine for Phase 1 (Postgres should be fast); Phase 3 may add a configurable max-in-flight if writer pressure becomes an issue.

Done

src/core/consumer.ts — XREADGROUP loop with ensureConsumerGroup, decodeBatch, partial-ACK semantics, connectRedis (co-located, not in src/db/), and clean stop. test/consumer.test.ts — 11 tests covering happy path, partial ACK, BUSYGROUP swallow, decode error skip, missing payload skip, XREADGROUP backoff, clean stop. (pending commit SHA)