tcp-ingestion/.planning/phase-1-telemetry/08-redis-publisher.md
julian c8a5f4cd68 Add Phase 1 and Phase 2 planning documents
ROADMAP plus granular task files per phase. Phase 1 (12 tasks + 1.13
device authority) covers Codec 8/8E/16 telemetry ingestion; Phase 2
(6 tasks) covers Codec 12/14 outbound commands; Phase 3 enumerates
deferred items.
2026-04-30 15:50:49 +02:00


Task 1.8 — Redis Streams publisher & main wiring

Phase: 1 — Inbound telemetry
Status: Not started
Depends on: 1.2, 1.3, 1.4, 1.5, 1.6, 1.7
Wiki refs: docs/wiki/entities/redis-streams.md, docs/wiki/concepts/position-record.md

Goal

Implement the real publishPosition that writes Position records to a Redis Stream, then wire the entire Phase 1 pipeline together in src/main.ts.

Deliverables

  • src/core/publish.ts (replacing the stub from task 1.2):
    • createPublisher(redis: Redis, config: Config, logger: Logger, metrics: Metrics): Publisher factory.
    • Publisher.publish(p: Position): Promise<void> that serializes and XADDs.
    • Internal serialization helper serializePosition(p: Position): Record<string, string> returning the field-value pairs Redis expects.
  • src/main.ts updated to:
    1. Load config (task 1.3).
    2. Build logger and metrics (tasks 1.3, 1.10).
    3. Connect to Redis with retry-on-startup logic.
    4. Build the publisher.
    5. Build the Teltonika adapter and register codec handlers.
    6. Start the TCP server.
    7. Start the metrics HTTP server (task 1.10).
    8. Install graceful shutdown (task 1.12 finalizes; stub here).
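
Step 3's retry-on-startup logic could be a simple bounded backoff loop. This is a sketch only: the helper name, attempt count, and delays are illustrative assumptions, not settled API.

```typescript
// Generic bounded-retry helper for the startup connection (sketch).
// The attempt count and backoff values are illustrative defaults.
async function retryOnStartup<T>(
  connect: () => Promise<T>,
  attempts = 5,
  baseDelayMs = 500,
): Promise<T> {
  let lastErr: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await connect();
    } catch (err) {
      lastErr = err;
      // Exponential backoff: 500 ms, 1 s, 2 s, ...
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** i));
    }
  }
  // Re-throwing keeps the "clear error + non-zero exit" startup behavior.
  throw lastErr;
}
```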

Specification

Stream record shape

XADD telemetry:teltonika MAXLEN ~ <maxlen> * <fields> where fields are flat key→string pairs (Redis Streams do not nest). Use a JSON-encoded payload field for simplicity:

1) ts        → ISO8601 string (timestamp from the Position)
2) device_id → IMEI string
3) codec     → "8" | "8E" | "16" (the codec that produced this record — useful for downstream filtering)
4) payload   → JSON string of the full Position

The duplicated device_id and ts at the top level let downstream tools filter without parsing the JSON; payload is the source of truth.
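
A minimal serializer matching this shape might look like the following. The Position field names used here (imei, timestamp, codec, attributes) are assumptions about the type defined in task 1.5, and the inline replacer is the bigint/Buffer handling described below:

```typescript
// Sketch of serializePosition per the record shape above. The exact
// Position field names (imei, timestamp, codec, attributes) are
// assumptions about the type from task 1.5.
interface Position {
  imei: string;
  timestamp: Date;
  codec: '8' | '8E' | '16';
  attributes: Record<string, number | bigint | Buffer>;
}

// Replacer for bigint/Buffer. It reads the raw value off the holder
// (`this[key]`) because JSON.stringify runs toJSON() before the
// replacer, which would otherwise hide Buffers.
function replacer(this: Record<string, unknown>, key: string, value: unknown): unknown {
  const raw = this[key];
  if (typeof raw === 'bigint') return { __bigint: raw.toString() };
  if (Buffer.isBuffer(raw)) return { __buffer_b64: raw.toString('base64') };
  return value;
}

function serializePosition(p: Position): Record<string, string> {
  return {
    ts: p.timestamp.toISOString(),
    device_id: p.imei,
    codec: p.codec,
    payload: JSON.stringify(p, replacer), // source of truth
  };
}
```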

JSON serialization

Position.attributes contains number | bigint | Buffer. JSON.stringify handles number out of the box, but it throws on bigint, and Buffer and Date define toJSON() methods that JSON.stringify invokes before the replacer runs — a naive typeof/instanceof check on the replacer's value argument would never see them. Implement a custom replacer that reads the raw value off the holder (this[key]):

function replacer(this: Record<string, unknown>, key: string, value: unknown): unknown {
  // JSON.stringify calls toJSON() *before* the replacer, so Buffer and
  // Date are already converted by the time `value` arrives here. Read
  // the raw value off the holder instead.
  const raw = this[key];
  if (typeof raw === 'bigint') return { __bigint: raw.toString() };
  if (Buffer.isBuffer(raw)) return { __buffer_b64: raw.toString('base64') };
  if (raw instanceof Date) return raw.toISOString();
  return value;
}

The __bigint and __buffer_b64 sentinels are decoded by the Processor (and any other consumer). Document this contract in the position-record page once landed.
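
On the consuming side, a matching decoder could be a JSON.parse reviver (a sketch; the sentinel names mirror the replacer above, and this would live in the Processor, not this service):

```typescript
// Reviver that undoes the __bigint / __buffer_b64 sentinels.
// JSON.parse calls the reviver bottom-up, so each sentinel object
// arrives fully built and can be swapped for the native value.
function reviver(_key: string, value: unknown): unknown {
  if (value && typeof value === 'object') {
    const v = value as Record<string, unknown>;
    if (typeof v.__bigint === 'string') return BigInt(v.__bigint);
    if (typeof v.__buffer_b64 === 'string') {
      return Buffer.from(v.__buffer_b64, 'base64');
    }
  }
  return value;
}

// Usage (sketch): const position = JSON.parse(fields.payload, reviver);
```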

XADD options

  • MAXLEN ~ <REDIS_STREAM_MAXLEN> — approximate trimming, much cheaper than exact.
  • * for auto-generated message ID.
  • Use a single connection (no pooling — ioredis multiplexes commands automatically).
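
Putting the options together, the publish path might issue the XADD like this. A sketch: the StreamClient interface is a stand-in for the small subset of the ioredis surface used here, so the call can be exercised against a stub; config naming follows the task's REDIS_STREAM_MAXLEN.

```typescript
// Minimal subset of the Redis client surface this code touches
// (stand-in for ioredis), so the call can be tested against a stub.
interface StreamClient {
  xadd(key: string, ...args: (string | number)[]): Promise<string | null>;
}

async function xaddPosition(
  redis: StreamClient,
  maxlen: number,                     // REDIS_STREAM_MAXLEN
  fields: Record<string, string>,     // output of serializePosition
): Promise<string | null> {
  // MAXLEN ~ <maxlen> trims approximately (cheap); '*' auto-generates
  // the message ID. Field-value pairs are flattened into the arg list.
  const flat = Object.entries(fields).flat();
  return redis.xadd('telemetry:teltonika', 'MAXLEN', '~', maxlen, '*', ...flat);
}
```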

Backpressure / non-blocking property

The TCP handler awaits ctx.publish(p). Two strategies:

Option A: Direct XADD per record. Simplest. Latency per publish is sub-millisecond on a healthy Redis. The risk: if Redis hangs, the TCP handler blocks → device sockets back up → Phase 1's "TCP handler never blocks" property is violated.

Option B: Bounded in-memory queue + worker drain. A Promise-based bounded queue (e.g. p-queue or hand-rolled). publish() resolves once the record is enqueued; a worker drains via XADD. If the queue is full, the worker has fallen behind catastrophically — at that point we have to choose: drop oldest, drop newest, or throw. Recommendation: drop newest with a structured error log + metric, because the device will retransmit (we won't ACK).

Decision: Option B. Specification:

  • Queue capacity: 10,000 records (configurable via PUBLISH_QUEUE_CAPACITY).
  • On overflow: do not publish; throw a typed PublishOverflowError. The framing layer (task 1.4) catches this and skips the ACK so the device retransmits.
  • Worker concurrency: 1 (Redis is single-threaded per connection; concurrency just adds context-switch cost).
  • Metric: teltonika_publish_queue_depth gauge, teltonika_publish_overflow_total counter.

The worker uses XADD with a per-call timeout (e.g. 2 s) and exits the process on prolonged Redis unavailability; the orchestrator is then responsible for restarting it.
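
A hand-rolled version of the bounded queue could look like the following sketch. It follows the spec above (overflow throws, single drain worker); drainOne stands in for the XADD step and onDepth for the queue-depth gauge.

```typescript
class PublishOverflowError extends Error {
  constructor() {
    super('publish queue full');
    this.name = 'PublishOverflowError';
  }
}

// Bounded FIFO with a single drain worker, per the spec: enqueue
// returns immediately; overflow throws so the framing layer can skip
// the ACK and let the device retransmit.
class BoundedPublishQueue<T> {
  private items: T[] = [];
  private draining = false;

  constructor(
    private readonly capacity: number,                     // PUBLISH_QUEUE_CAPACITY
    private readonly drainOne: (item: T) => Promise<void>, // e.g. the XADD call
    private readonly onDepth: (n: number) => void = () => {}, // gauge hook
  ) {}

  enqueue(item: T): void {
    if (this.items.length >= this.capacity) throw new PublishOverflowError();
    this.items.push(item);
    this.onDepth(this.items.length);
    if (!this.draining) void this.drain();
  }

  private async drain(): Promise<void> {
    this.draining = true;
    try {
      while (this.items.length > 0) {
        // Concurrency 1, as specified: one in-flight XADD at a time.
        await this.drainOne(this.items[0]);
        this.items.shift();
        this.onDepth(this.items.length);
      }
    } finally {
      this.draining = false;
    }
  }
}
```

A failed drainOne leaves the item at the head and stops the worker; the next enqueue restarts it, which is one simple way to survive transient Redis errors before the process-exit threshold kicks in.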

main.ts skeleton

async function main() {
  const config = loadConfig();
  const logger = createLogger(config);
  const metrics = createMetrics();
  const redis = await connectRedis(config, logger);
  const publisher = createPublisher(redis, config, logger, metrics);
  const adapter = createTeltonikaAdapter({ publisher, logger, metrics });
  const server = startServer(config.TELTONIKA_PORT, adapter, { publish: (p) => publisher.publish(p), logger, metrics }); // arrow keeps `this` bound
  const metricsServer = startMetricsServer(config.METRICS_PORT, metrics);
  installGracefulShutdown({ server, metricsServer, redis, publisher, logger });
  logger.info({ port: config.TELTONIKA_PORT }, 'tcp-ingestion ready');
}

main().catch((err) => { console.error(err); process.exit(1); });
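
The shutdown stub from step 8 might register signal handlers and close dependencies in reverse order (stop accepting connections, flush the publisher, disconnect Redis). A sketch only — task 1.12 finalizes this, and the Closeable list shape here differs from the skeleton's named deps; the handler is returned so tests can drive it without raising real signals.

```typescript
interface Closeable { close(): Promise<void> | void }

// Stub graceful shutdown (task 1.12 finalizes this). Closes deps in
// the given order and exits; `exit` is injectable for tests.
function installGracefulShutdown(
  deps: { order: Closeable[]; exit?: (code: number) => void },
  signals: Array<'SIGTERM' | 'SIGINT'> = ['SIGTERM', 'SIGINT'],
): () => Promise<void> {
  const exit = deps.exit ?? ((code: number) => process.exit(code));
  const shutdown = async (): Promise<void> => {
    for (const d of deps.order) await d.close();
    exit(0);
  };
  for (const s of signals) process.once(s, () => void shutdown());
  return shutdown;
}
```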

Acceptance criteria

  • Integration test: spin up a real Redis (e.g. via testcontainers), publish a known Position, XREAD it back, parse the JSON, and assert it equals the input (with bigint and Buffer round-tripped through the sentinel encoding).
  • Overflow test: artificially block the worker, fill the queue, verify the next publish() rejects with PublishOverflowError, verify metrics increment.
  • Startup test: with a wrong REDIS_URL, the process logs a clear error and exits non-zero.
  • An end-to-end test: open a TCP client to the running server, send the canonical Codec 8 fixture, verify a Position lands on the Stream and the ACK comes back with 00 00 00 01.

Risks / open questions

  • redis-mock does not implement Streams. Use testcontainers + a real Redis for integration tests.
  • The bounded queue could cause backpressure concerns — discuss with the Processor team whether they prefer the device-retransmit path (overflow throw) or a soft-drop with logging. Defaulting to retransmit because it's the safer correctness choice.

Done

(Fill in once complete.)