processor/.planning/phase-1-throughput/05-stream-consumer.md
# Task 1.5 — Redis Stream consumer (XREADGROUP)
**Phase:** 1 — Throughput pipeline
**Status:** 🟩 Done
**Depends on:** 1.2, 1.3
**Wiki refs:** `docs/wiki/entities/redis-streams.md`, `docs/wiki/entities/processor.md`
## Goal
Build the Redis Stream consumer: join the consumer group, fetch batches via `XREADGROUP`, decode each entry to a `Position`, hand off to a sink callback, and return successfully-handled IDs to the caller for `XACK`.
This task does **not** wire in the Postgres writer or the in-memory state — those are tasks 1.7 and 1.6, joined to the consumer in 1.8. The consumer accepts a `sink: (records: ConsumedRecord[]) => Promise<string[]>` callback that returns the IDs it wants ACKed. Only those IDs are ACKed; failures stay pending and get claimed on the next loop.
## Deliverables
- `src/core/consumer.ts` exporting:
  - `createConsumer(redis, config, logger, metrics, sink): Consumer` — factory.
  - `Consumer` interface: `start(): Promise<void>` (returns when the consumer loop starts), `stop(): Promise<void>` (signals the loop to exit, waits for the in-flight batch).
  - `ensureConsumerGroup(redis, stream, group)`, which runs `XGROUP CREATE ... MKSTREAM` ignoring `BUSYGROUP` errors. Called once at start.
  - `type ConsumedRecord = { id: string; position: Position; codec: string; ts: string }` — what's passed to the sink.
- `test/consumer.test.ts` (mocked `ioredis`):
  - Decodes a synthetic stream entry into a `ConsumedRecord` with the right shape.
  - Calls `sink` with the decoded batch and ACKs only the IDs the sink returned.
  - On `BUSYGROUP` error from `XGROUP CREATE`, swallows the error and continues.
  - On a malformed payload, increments `processor_decode_errors_total`, logs at `error`, and **does not** ACK the bad entry — leaves it pending for inspection.
  - On `stop()`, the loop exits cleanly without losing in-flight work.
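The exported surface above might look like the following sketch. The `Position` shape comes from task 1.2 and is stubbed here as a placeholder; only the names listed in the deliverables are real.

```typescript
// Placeholder for task 1.2's decoded position type.
type Position = { lat: number; lon: number };

// One decoded stream entry, as handed to the sink.
type ConsumedRecord = { id: string; position: Position; codec: string; ts: string };

// The sink returns the IDs it successfully handled; only those get XACKed.
type Sink = (records: ConsumedRecord[]) => Promise<string[]>;

interface Consumer {
  start(): Promise<void>; // resolves once the consumer loop is running
  stop(): Promise<void>;  // signals the loop to exit, waits for the in-flight batch
}
```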
## Specification
### Consumer loop shape
```ts
async function runLoop() {
  while (!stopping) {
    let reply: [string, StreamEntry[]][] | null;
    try {
      reply = (await redis.xreadgroup(
        'GROUP', group, consumerName,
        'COUNT', batchSize,
        'BLOCK', batchBlockMs,
        'STREAMS', stream, '>',
      )) as [string, StreamEntry[]][] | null;
    } catch (err) {
      logger.error({ err }, 'XREADGROUP failed; backing off');
      await sleep(1000);
      continue;
    }
    if (!reply) continue; // BLOCK timeout returns null
    // ioredis reply shape is [[streamName, entries]]; we read a single stream.
    const [, entries] = reply[0];
    const records = decodeBatch(entries); // may emit decode errors
    const ackIds = await sink(records);   // writer + state (tasks 1.6/1.7)
    if (ackIds.length > 0) {
      await redis.xack(stream, group, ...ackIds);
    }
  }
}
```
### Decode error handling
`decodeBatch` calls `decodePosition` (from task 1.2) on each entry's `payload` field. If a single entry fails to decode:
- Increment `processor_decode_errors_total{stream=...}`.
- Log at `error` with the entry ID and a truncated raw payload (first 256 chars).
- **Skip** the entry — do not pass to sink, do not ACK. It stays in the consumer's PEL (Pending Entries List) and will be re-attempted on next claim. Phase 3 will route truly-poison entries to a dead-letter stream; for Phase 1, leaving them pending and visible in `XPENDING` is enough.
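A minimal `decodeBatch` following these rules could look like the sketch below. The `decodePosition` stand-in, the plain-object metrics counter, and `console.error` as the logger are all assumptions for illustration; task 1.2 defines the real decoder, and the real code uses the injected `logger` and `metrics`.

```typescript
type Position = { lat: number; lon: number };
type StreamEntry = [id: string, fields: string[]]; // ioredis-style flat field array
type ConsumedRecord = { id: string; position: Position; codec: string; ts: string };

// Hypothetical stand-in for task 1.2's decoder: throws on malformed payloads.
function decodePosition(payload: string): Position {
  const p = JSON.parse(payload);
  if (typeof p.lat !== 'number' || typeof p.lon !== 'number') throw new Error('bad position');
  return { lat: p.lat, lon: p.lon };
}

const metrics = { decodeErrors: 0 }; // stands in for processor_decode_errors_total

function decodeBatch(entries: StreamEntry[]): ConsumedRecord[] {
  const out: ConsumedRecord[] = [];
  for (const [id, fields] of entries) {
    // Flatten the [key, value, key, value, ...] array into a map, as ioredis returns it.
    const f = new Map<string, string>();
    for (let i = 0; i < fields.length; i += 2) f.set(fields[i]!, fields[i + 1]!);
    const payload = f.get('payload');
    try {
      if (payload === undefined) throw new Error('missing payload');
      out.push({
        id,
        position: decodePosition(payload),
        codec: f.get('codec') ?? 'json',
        ts: f.get('ts') ?? '',
      });
    } catch (err) {
      metrics.decodeErrors++;
      // Truncate the raw payload to 256 chars for the log; the entry is skipped,
      // not ACKed, so it stays in the PEL for inspection.
      console.error({ id, err, raw: payload?.slice(0, 256) }, 'decode failed; skipping entry');
    }
  }
  return out;
}
```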
### `XACK` semantics
ACK only what the sink returned. If the sink returns `['id1', 'id3']` from a batch of `[id1, id2, id3]`, then `id2` stays pending. Why a sink might return a partial list: it failed to write some records. The consumer must trust the sink's signal — never ACK speculatively.
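The partial-ACK rule can be exercised with a recording fake. This is a test sketch, not the real `ioredis` client, and the simplified `Sink` here takes bare IDs rather than full `ConsumedRecord`s:

```typescript
type Sink = (ids: string[]) => Promise<string[]>;

// Fake redis that records which IDs get XACKed.
const acked: string[] = [];
const redis = {
  async xack(_stream: string, _group: string, ...ids: string[]): Promise<number> {
    acked.push(...ids);
    return ids.length;
  },
};

// The consumer's ACK step: trust exactly what the sink returned, never ACK speculatively.
async function ackBatch(batchIds: string[], sink: Sink): Promise<void> {
  const ackIds = await sink(batchIds);
  if (ackIds.length > 0) {
    await redis.xack('positions', 'processor', ...ackIds);
  }
}
```

Running this with a sink that drops `id2` ACKs only `id1` and `id3`; `id2` remains in the PEL.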
### Consumer group setup
On `start()`:
1. `XGROUP CREATE <stream> <group> $ MKSTREAM` — creates the stream if missing, group at "now" so we don't replay history. If the group already exists, the call returns `BUSYGROUP Consumer Group name already exists` — catch and ignore.
2. Log at `info` whether the group was created or already existed.
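The setup step above could be sketched as follows. The minimal `RedisLike` type is an assumption standing in for the `ioredis` client (whose `xgroup` takes the same subcommand-first argument order); the real version also does the `info` logging from step 2.

```typescript
type RedisLike = {
  xgroup(...args: (string | number)[]): Promise<unknown>;
};

// Returns true if the group was created, false if it already existed.
async function ensureConsumerGroup(redis: RedisLike, stream: string, group: string): Promise<boolean> {
  try {
    // '$' starts the group at "now" so history is not replayed;
    // MKSTREAM creates the stream if it does not exist yet.
    await redis.xgroup('CREATE', stream, group, '$', 'MKSTREAM');
    return true;
  } catch (err) {
    if (err instanceof Error && err.message.startsWith('BUSYGROUP')) {
      return false; // group already exists: expected, swallow it
    }
    throw err; // anything else is a real failure
  }
}
```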
### Why `>` not `0` for the read ID
`>` means "deliver only new entries, not pending ones for this consumer." That's what we want for the steady-state loop. Phase 3 will add an explicit `XAUTOCLAIM` step at startup (and periodically) to pull stuck pending entries from dead consumers; Phase 1 relies on the natural redelivery via consumer-group resumption (when a dead instance restarts with the same name, it sees its old PEL).
## Acceptance criteria
- [ ] `pnpm typecheck`, `pnpm lint`, `pnpm test` clean.
- [ ] Unit tests cover: happy path, `BUSYGROUP` swallow, decode error skip, partial-ACK, clean stop.
- [ ] Stop signal causes the loop to exit within one `BATCH_BLOCK_MS` tick.
## Risks / open questions
- **Consumer name uniqueness.** Two instances with the same `REDIS_CONSUMER_NAME` will both read from the same PEL, which is undefined behaviour. Task 1.3 already documents that `INSTANCE_ID` (which defaults to `REDIS_CONSUMER_NAME`) must be unique per instance — surface this again in the operator-facing README later.
- **Long sink calls block the loop.** If the Postgres writer takes 30s, no new records are read. That's fine for Phase 1 (Postgres should be fast); Phase 3 may add a configurable max-in-flight if writer pressure becomes an issue.
## Done
`src/core/consumer.ts` — XREADGROUP loop with `ensureConsumerGroup`, `decodeBatch`, partial-ACK semantics, `connectRedis` (co-located, not in `src/db/`), and clean stop. `test/consumer.test.ts` — 11 tests covering happy path, partial ACK, BUSYGROUP swallow, decode error skip, missing payload skip, XREADGROUP backoff, clean stop. Landed in `68d3da3`.