processor/.planning/phase-1-throughput/05-stream-consumer.md
# Task 1.5 — Redis Stream consumer (XREADGROUP)
**Phase:** 1 — Throughput pipeline
**Status:** 🟩 Done
**Depends on:** 1.2, 1.3
**Wiki refs:** `docs/wiki/entities/redis-streams.md`, `docs/wiki/entities/processor.md`
## Goal
Build the Redis Stream consumer: join the consumer group, fetch batches via `XREADGROUP`, decode each entry to a `Position`, hand off to a sink callback, and return successfully-handled IDs to the caller for `XACK`.
This task does **not** wire in the Postgres writer or the in-memory state — those are tasks 1.7 and 1.6, joined to the consumer in 1.8. The consumer accepts a `sink: (records: ConsumedRecord[]) => Promise<string[]>` callback that returns the IDs it wants ACKed. Only those IDs are ACKed; failures stay pending and get claimed on the next loop.
## Deliverables
- `src/core/consumer.ts` exporting:
  - `createConsumer(redis, config, logger, metrics, sink): Consumer` — factory.
  - `Consumer` interface: `start(): Promise<void>` (returns when the consumer loop starts), `stop(): Promise<void>` (signals the loop to exit, waits for the in-flight batch).
  - `ensureConsumerGroup(redis, stream, group)`, which runs `XGROUP CREATE ... MKSTREAM` ignoring `BUSYGROUP` errors. Called once at start.
  - `type ConsumedRecord = { id: string; position: Position; codec: string; ts: string }` — what's passed to the sink.
- `test/consumer.test.ts` (mocked `ioredis`):
  - Decodes a synthetic stream entry into a `ConsumedRecord` with the right shape.
  - Calls `sink` with the decoded batch and ACKs only the IDs the sink returned.
  - On `BUSYGROUP` error from `XGROUP CREATE`, swallows the error and continues.
  - On a malformed payload, increments `processor_decode_errors_total`, logs at `error`, and **does not** ACK the bad entry — leaves it pending for inspection.
  - On `stop()`, the loop exits cleanly without losing in-flight work.
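The exported surface above might look like the following sketch. The `Position` shape comes from task 1.2 and is stubbed here as a placeholder; only the names listed in the deliverables are real.

```typescript
// Placeholder for task 1.2's decoded position type.
type Position = { lat: number; lon: number };

// One decoded stream entry, as handed to the sink.
type ConsumedRecord = { id: string; position: Position; codec: string; ts: string };

// The sink returns the IDs it successfully handled; only those get XACKed.
type Sink = (records: ConsumedRecord[]) => Promise<string[]>;

interface Consumer {
  start(): Promise<void>; // resolves once the consumer loop is running
  stop(): Promise<void>;  // signals the loop to exit, waits for the in-flight batch
}
```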
## Specification
### Consumer loop shape
```ts
async function runLoop() {
  while (!stopping) {
    let reply: [string, StreamEntry[]][] | null;
    try {
      reply = (await redis.xreadgroup(
        'GROUP', group, consumerName,
        'COUNT', batchSize,
        'BLOCK', batchBlockMs,
        'STREAMS', stream, '>',
      )) as [string, StreamEntry[]][] | null;
    } catch (err) {
      logger.error({ err }, 'XREADGROUP failed; backing off');
      await sleep(1000);
      continue;
    }
    if (!reply) continue; // BLOCK timeout returns null
    // ioredis reply shape is [[streamName, entries]]; we read a single stream.
    const [, entries] = reply[0];
    const records = decodeBatch(entries); // may emit decode errors
    const ackIds = await sink(records);   // writer + state (tasks 1.6/1.7)
    if (ackIds.length > 0) {
      await redis.xack(stream, group, ...ackIds);
    }
  }
}
```
### Decode error handling
`decodeBatch` calls `decodePosition` (from task 1.2) on each entry's `payload` field. If a single entry fails to decode:
- Increment `processor_decode_errors_total{stream=...}`.
- Log at `error` with the entry ID and a truncated raw payload (first 256 chars).
- **Skip** the entry — do not pass to sink, do not ACK. It stays in the consumer's PEL (Pending Entries List) and will be re-attempted on next claim. Phase 3 will route truly-poison entries to a dead-letter stream; for Phase 1, leaving them pending and visible in `XPENDING` is enough.
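A minimal `decodeBatch` following these rules could look like the sketch below. The `decodePosition` stand-in, the plain-object metrics counter, and `console.error` as the logger are all assumptions for illustration; task 1.2 defines the real decoder, and the real code uses the injected `logger` and `metrics`.

```typescript
type Position = { lat: number; lon: number };
type StreamEntry = [id: string, fields: string[]]; // ioredis-style flat field array
type ConsumedRecord = { id: string; position: Position; codec: string; ts: string };

// Hypothetical stand-in for task 1.2's decoder: throws on malformed payloads.
function decodePosition(payload: string): Position {
  const p = JSON.parse(payload);
  if (typeof p.lat !== 'number' || typeof p.lon !== 'number') throw new Error('bad position');
  return { lat: p.lat, lon: p.lon };
}

const metrics = { decodeErrors: 0 }; // stands in for processor_decode_errors_total

function decodeBatch(entries: StreamEntry[]): ConsumedRecord[] {
  const out: ConsumedRecord[] = [];
  for (const [id, fields] of entries) {
    // Flatten the [key, value, key, value, ...] array into a map, as ioredis returns it.
    const f = new Map<string, string>();
    for (let i = 0; i < fields.length; i += 2) f.set(fields[i]!, fields[i + 1]!);
    const payload = f.get('payload');
    try {
      if (payload === undefined) throw new Error('missing payload');
      out.push({
        id,
        position: decodePosition(payload),
        codec: f.get('codec') ?? 'json',
        ts: f.get('ts') ?? '',
      });
    } catch (err) {
      metrics.decodeErrors++;
      // Truncate the raw payload to 256 chars for the log; the entry is skipped,
      // not ACKed, so it stays in the PEL for inspection.
      console.error({ id, err, raw: payload?.slice(0, 256) }, 'decode failed; skipping entry');
    }
  }
  return out;
}
```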
### `XACK` semantics
ACK only what the sink returned. If the sink returns `['id1', 'id3']` from a batch of `[id1, id2, id3]`, then `id2` stays pending. Why a sink might return a partial list: it failed to write some records. The consumer must trust the sink's signal — never ACK speculatively.
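The partial-ACK rule can be exercised with a recording fake. This is a test sketch, not the real `ioredis` client, and the simplified `Sink` here takes bare IDs rather than full `ConsumedRecord`s:

```typescript
type Sink = (ids: string[]) => Promise<string[]>;

// Fake redis that records which IDs get XACKed.
const acked: string[] = [];
const redis = {
  async xack(_stream: string, _group: string, ...ids: string[]): Promise<number> {
    acked.push(...ids);
    return ids.length;
  },
};

// The consumer's ACK step: trust exactly what the sink returned, never ACK speculatively.
async function ackBatch(batchIds: string[], sink: Sink): Promise<void> {
  const ackIds = await sink(batchIds);
  if (ackIds.length > 0) {
    await redis.xack('positions', 'processor', ...ackIds);
  }
}
```

Running this with a sink that drops `id2` ACKs only `id1` and `id3`; `id2` remains in the PEL.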
### Consumer group setup
On `start()`:
1. `XGROUP CREATE <stream> <group> $ MKSTREAM` — creates the stream if missing, group at "now" so we don't replay history. If the group already exists, the call returns `BUSYGROUP Consumer Group name already exists` — catch and ignore.
2. Log at `info` whether the group was created or already existed.
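The setup step above could be sketched as follows. The minimal `RedisLike` type is an assumption standing in for the `ioredis` client (whose `xgroup` takes the same subcommand-first argument order); the real version also does the `info` logging from step 2.

```typescript
type RedisLike = {
  xgroup(...args: (string | number)[]): Promise<unknown>;
};

// Returns true if the group was created, false if it already existed.
async function ensureConsumerGroup(redis: RedisLike, stream: string, group: string): Promise<boolean> {
  try {
    // '$' starts the group at "now" so history is not replayed;
    // MKSTREAM creates the stream if it does not exist yet.
    await redis.xgroup('CREATE', stream, group, '$', 'MKSTREAM');
    return true;
  } catch (err) {
    if (err instanceof Error && err.message.startsWith('BUSYGROUP')) {
      return false; // group already exists: expected, swallow it
    }
    throw err; // anything else is a real failure
  }
}
```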
### Why `>` not `0` for the read ID
`>` means "deliver only new entries, not pending ones for this consumer." That's what we want for the steady-state loop. Phase 3 will add an explicit `XAUTOCLAIM` step at startup (and periodically) to pull stuck pending entries from dead consumers; Phase 1 relies on the natural redelivery via consumer-group resumption (when a dead instance restarts with the same name, it sees its old PEL).
## Acceptance criteria
- [ ] `pnpm typecheck`, `pnpm lint`, `pnpm test` clean.
- [ ] Unit tests cover: happy path, `BUSYGROUP` swallow, decode error skip, partial-ACK, clean stop.
- [ ] Stop signal causes the loop to exit within one `BATCH_BLOCK_MS` tick.
## Risks / open questions
- **Consumer name uniqueness.** Two instances with the same `REDIS_CONSUMER_NAME` will both read from the same PEL, which is undefined behaviour. Task 1.3 already documents that `INSTANCE_ID` (which defaults to `REDIS_CONSUMER_NAME`) must be unique per instance — surface this again in the operator-facing README later.
- **Long sink calls block the loop.** If the Postgres writer takes 30s, no new records are read. That's fine for Phase 1 (Postgres should be fast); Phase 3 may add a configurable max-in-flight if writer pressure becomes an issue.
## Done
`src/core/consumer.ts` — XREADGROUP loop with `ensureConsumerGroup`, `decodeBatch`, partial-ACK semantics, `connectRedis` (co-located, not in `src/db/`), and clean stop. `test/consumer.test.ts` — 11 tests covering happy path, partial ACK, BUSYGROUP swallow, decode error skip, missing payload skip, XREADGROUP backoff, clean stop. Landed in `68d3da3`.