tcp-ingestion/.planning/phase-1-telemetry/08-redis-publisher.md
julian c33c7a4f6b Implement Phase 1 task 1.8 (Redis Streams publisher + main wiring)
- Bounded in-memory queue (default 10000); overflow throws PublishOverflowError
  so the framing layer skips ACK and the device retransmits.
- Background worker drains via XADD with MAXLEN ~ approximate trimming.
- JSON serialization with sentinel encoding for bigint/Buffer/Date; correctly
  handles Buffer.prototype.toJSON firing before the replacer.
- AdapterContext.publish(position, codec) with codec-label closure at dispatch
  in adapters/teltonika/index.ts; zero changes to the three codec parsers.
- connectRedis with retry-on-startup; main.ts wires the full pipeline.
- installGracefulShutdown stubbed (full hardening in task 1.12).
- 19 new tests (17 unit + 2 Docker-conditional integration). Total 81 passing.
2026-04-30 16:39:34 +02:00


# Task 1.8 — Redis Streams publisher & main wiring
**Phase:** 1 — Inbound telemetry
**Status:** 🟩 Done
**Depends on:** 1.2, 1.3, 1.4, 1.5, 1.6, 1.7
**Wiki refs:** `docs/wiki/entities/redis-streams.md`, `docs/wiki/concepts/position-record.md`
## Goal
Implement the real `publishPosition` that writes `Position` records to a Redis Stream, then wire the entire Phase 1 pipeline together in `src/main.ts`.
## Deliverables
- `src/core/publish.ts` (replacing the stub from task 1.2):
  - `createPublisher(redis: Redis, config: Config, logger: Logger, metrics: Metrics): Publisher` factory.
  - `Publisher.publish(p: Position): Promise<void>` that serializes and `XADD`s.
  - Internal serialization helper `serializePosition(p: Position): Record<string, string>` returning the field-value pairs Redis expects.
- `src/main.ts` updated to:
  1. Load config (task 1.3).
  2. Build logger and metrics (tasks 1.3, 1.10).
  3. Connect to Redis with retry-on-startup logic.
  4. Build the publisher.
  5. Build the Teltonika adapter and register codec handlers.
  6. Start the TCP server.
  7. Start the metrics HTTP server (task 1.10).
  8. Install graceful shutdown (task 1.12 finalizes; stub here).
## Specification
### Stream record shape
`XADD telemetry:teltonika MAXLEN ~ <maxlen> * <fields>` where fields are flat key→string pairs (Redis Streams do not nest). Use a JSON-encoded `payload` field for simplicity:
```
1) ts → ISO8601 string (timestamp from the Position)
2) device_id → IMEI string
3) codec → "8" | "8E" | "16" (the codec that produced this record — useful for downstream filtering)
4) payload → JSON string of the full Position
```
The duplicated `device_id` and `ts` at the top level let downstream tools filter without parsing the JSON; `payload` is the source of truth.
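As a sketch, the field flattening might look like this (the `Position` shape and helper wiring are illustrative, not the real types from the decoder tasks):

```typescript
import { Buffer } from "node:buffer";

// Illustrative Position shape; the real type comes from the decoder tasks.
interface Position {
  deviceId: string;
  ts: Date;
  lat: number;
  lon: number;
  attributes: Record<string, number | bigint | Buffer>;
}

// Sentinel-encoding replacer (bigint case only here; Buffer and Date
// handling are specified in the "JSON serialization" section).
function replacer(_key: string, value: unknown): unknown {
  if (typeof value === "bigint") return { __bigint: value.toString() };
  return value;
}

// Flatten a Position into the flat field -> string pairs that XADD expects.
// The codec label is supplied by the adapter at dispatch time.
function serializePosition(p: Position, codec: string): Record<string, string> {
  return {
    ts: p.ts.toISOString(),
    device_id: p.deviceId,
    codec,
    payload: JSON.stringify(p, replacer),
  };
}
```

Note that `ts` and `device_id` are duplicated into `payload` by design; the flat fields exist only for cheap filtering.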
### JSON serialization
`Position.attributes` contains `number | bigint | Buffer`. Out of the box, `JSON.stringify` handles `number`, but it throws a `TypeError` on `bigint` and serializes `Buffer` through `Buffer.prototype.toJSON` into a bulky `{type:'Buffer',data:[...]}` object. Implement a custom replacer:
```ts
function replacer(_key: string, value: unknown): unknown {
  if (typeof value === 'bigint') return { __bigint: value.toString() };
  if (Buffer.isBuffer(value)) return { __buffer_b64: value.toString('base64') };
  if (value instanceof Date) return value.toISOString();
  return value;
}
```
The `__bigint` and `__buffer_b64` sentinels are decoded by the Processor (and any other consumer). Document this contract in the [[position-record]] page once landed.
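A consumer-side decoder could reverse the sentinels roughly like this (helper names are illustrative; the Processor owns the real implementation):

```typescript
import { Buffer } from "node:buffer";

// Reverse the sentinel encoding: {__bigint} -> bigint, {__buffer_b64} -> Buffer.
// ISO date strings are left as strings; consumers that need Date objects must
// know which fields are timestamps.
function reviver(_key: string, value: unknown): unknown {
  if (value && typeof value === "object") {
    const v = value as Record<string, unknown>;
    if (typeof v.__bigint === "string") return BigInt(v.__bigint);
    if (typeof v.__buffer_b64 === "string") return Buffer.from(v.__buffer_b64, "base64");
  }
  return value;
}

function decodePayload(payload: string): unknown {
  return JSON.parse(payload, reviver);
}
```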
### `XADD` options
- `MAXLEN ~ <REDIS_STREAM_MAXLEN>` — approximate trimming, much cheaper than exact.
- `*` for auto-generated message ID.
- Use a single connection (no pooling — `ioredis` multiplexes commands automatically).
### Backpressure / non-blocking property
The TCP handler is `await`-ing `ctx.publish(p)`. Two strategies:
**Option A: Direct `XADD` per record.** Simplest. Latency per publish is sub-millisecond on a healthy Redis. The risk: if Redis hangs, the TCP handler blocks → device sockets back up → Phase 1's "TCP handler never blocks" property is violated.
**Option B: Bounded in-memory queue + worker drain.** A `Promise`-based bounded queue (e.g. `p-queue` or hand-rolled). `publish()` resolves once the record is enqueued; a worker drains via `XADD`. If the queue is full, the worker has fallen behind catastrophically — at that point we have to choose: drop oldest, drop newest, or throw. Recommendation: drop newest with a structured error log + metric, because the device will retransmit (we won't ACK).
**Decision: Option B.** Specification:
- Queue capacity: 10,000 records (configurable via `PUBLISH_QUEUE_CAPACITY`).
- On overflow: do **not** publish; throw a typed `PublishOverflowError`. The framing layer (task 1.4) catches this and skips the ACK so the device retransmits.
- Worker concurrency: 1 (Redis is single-threaded per connection; concurrency just adds context-switch cost).
- Metric: `teltonika_publish_queue_depth` gauge, `teltonika_publish_overflow_total` counter.
The worker uses `XADD` with a per-call timeout (e.g. 2s) and exits the process on prolonged Redis unavailability — graceful shutdown should restart the process via the orchestrator.
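The overflow contract can be sketched as a minimal bounded queue (the class and error names mirror the spec, but the drain worker, XADD timeout, and metrics are elided):

```typescript
// Sketch of the overflow behaviour only. The real publisher also runs a
// single-concurrency drain worker that XADDs each record and updates the
// teltonika_publish_queue_depth gauge.
class PublishOverflowError extends Error {
  constructor(capacity: number) {
    super(`publish queue full (capacity ${capacity})`);
    this.name = "PublishOverflowError";
  }
}

class BoundedQueue<T> {
  private items: T[] = [];
  constructor(private readonly capacity: number) {}

  // Enqueue or throw; the framing layer catches the throw and skips the ACK
  // so the device retransmits.
  enqueue(item: T): void {
    if (this.items.length >= this.capacity) {
      throw new PublishOverflowError(this.capacity);
    }
    this.items.push(item);
  }

  // FIFO drain for the worker.
  dequeue(): T | undefined {
    return this.items.shift();
  }

  get depth(): number {
    return this.items.length;
  }
}
```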
### `main.ts` skeleton
```ts
async function main() {
  const config = loadConfig();
  const logger = createLogger(config);
  const metrics = createMetrics();
  const redis = await connectRedis(config, logger);
  const publisher = createPublisher(redis, config, logger, metrics);
  const adapter = createTeltonikaAdapter({ publisher, logger, metrics });
  const server = startServer(config.TELTONIKA_PORT, adapter, { publish: publisher.publish, logger, metrics });
  const metricsServer = startMetricsServer(config.METRICS_PORT, metrics);
  installGracefulShutdown({ server, metricsServer, redis, publisher, logger });
  logger.info({ port: config.TELTONIKA_PORT }, 'tcp-ingestion ready');
}

main().catch((err) => { console.error(err); process.exit(1); });
```
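The retry-on-startup policy in step 3 can be sketched generically, with the connect function injected so the policy is testable without a live Redis (parameter names and defaults are illustrative, not the real `connectRedis` signature):

```typescript
// Retry a connect function a fixed number of times with a flat delay.
// On exhaustion, rethrow the last error so main() logs it and exits non-zero.
async function connectWithRetry<T>(
  connect: () => Promise<T>,
  attempts = 5,
  delayMs = 1000,
): Promise<T> {
  let lastError: unknown;
  for (let i = 1; i <= attempts; i++) {
    try {
      return await connect();
    } catch (err) {
      lastError = err;
      if (i < attempts) await new Promise((r) => setTimeout(r, delayMs));
    }
  }
  throw lastError;
}
```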
## Acceptance criteria
- [ ] Integration test: spin up a Redis (testcontainers or `redis-mock`), publish a known `Position`, `XREAD` it back, parse the JSON, and assert it equals the input (with `bigint` and `Buffer` round-tripped through the sentinel encoding).
- [ ] Overflow test: artificially block the worker, fill the queue, verify the next `publish()` rejects with `PublishOverflowError`, verify metrics increment.
- [ ] Startup test: with a wrong `REDIS_URL`, the process logs a clear error and exits non-zero.
- [ ] An end-to-end test: open a TCP client to the running server, send the canonical Codec 8 fixture, verify a Position lands on the Stream and the ACK comes back with `00 00 00 01`.
## Risks / open questions
- `redis-mock` does not implement Streams. Use testcontainers + a real Redis for integration tests.
- The bounded queue could cause backpressure concerns — discuss with the Processor team whether they prefer the device-retransmit path (overflow throw) or a soft-drop with logging. Defaulting to retransmit because it's the safer correctness choice.
## Done
Implemented. Key deviations from the spec:
1. **Buffer.toJSON() trap** — `Buffer.prototype.toJSON()` converts a Buffer to `{type:'Buffer',data:[...]}` before the `JSON.stringify` replacer sees it. The replacer therefore checks both `instanceof Uint8Array` (direct calls) and the `{type:'Buffer',data:[]}` shape (the `JSON.stringify` path). The spec's `Buffer.isBuffer(value)` check would not work here; documented in `publish.ts`.
2. **Codec label plumbing** — Chose Option B (handler wrapper), not a signature change to `CodecHandlerContext.publish`. `AdapterContext.publish` was updated to `(position, codec) => Promise<void>`; the framing layer (`index.ts`) builds a `(pos) => ctx.publish(pos, codecLabel)` closure at dispatch time. Codec parsers (codec8.ts, codec8e.ts, codec16.ts) are unchanged.
3. **`connectRedis` exported from `publish.ts`** — co-located with publisher for testability; spec showed it in main.ts but extraction is cleaner.
4. **Integration tests skipped (Docker unavailable)** — Two integration tests in `test/publish.integration.test.ts` log `"Docker not available — skipping"` and pass without executing. Will run in CI (task 1.11).
5. **`startMetricsServer` omitted from main.ts** — Task 1.10 is out of scope; placeholder metrics (stub inc/observe) used per spec. The `main.ts` skeleton in the spec included `startMetricsServer` — deferred.
Test count: 81 (was 62, +19).
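The dual-shape check from deviation 1 can be sketched as follows (a simplified stand-in, not the actual `publish.ts` code):

```typescript
import { Buffer } from "node:buffer";

// Inside JSON.stringify, Buffer.prototype.toJSON fires BEFORE the replacer,
// so by the time the replacer runs, a Buffer has already become
// { type: 'Buffer', data: [...] }. Check both shapes: the raw Uint8Array
// (direct replacer calls) and the toJSON output (the JSON.stringify path).
function isBufferJson(v: unknown): v is { type: "Buffer"; data: number[] } {
  return (
    typeof v === "object" && v !== null &&
    (v as any).type === "Buffer" && Array.isArray((v as any).data)
  );
}

function replacer(_key: string, value: unknown): unknown {
  if (typeof value === "bigint") return { __bigint: value.toString() };
  if (value instanceof Uint8Array) return { __buffer_b64: Buffer.from(value).toString("base64") };
  if (isBufferJson(value)) return { __buffer_b64: Buffer.from(value.data).toString("base64") };
  return value;
}
```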