Add Phase 1 and Phase 2 planning documents

ROADMAP plus granular task files per phase. Phase 1 (12 tasks + 1.13 device authority) covers Codec 8/8E/16 telemetry ingestion; Phase 2 (6 tasks) covers Codec 12/14 outbound commands; Phase 3 enumerates deferred items.
2026-04-30 15:47:06 +02:00
parent 95e60a2c75
commit c8a5f4cd68
23 changed files with 2508 additions and 0 deletions
@@ -0,0 +1,114 @@
+# Task 1.8 — Redis Streams publisher & main wiring
+
+**Phase:** 1 — Inbound telemetry
+**Status:** ⬜ Not started
+**Depends on:** 1.2, 1.3, 1.4, 1.5, 1.6, 1.7
+**Wiki refs:** `docs/wiki/entities/redis-streams.md`, `docs/wiki/concepts/position-record.md`
+
+## Goal
+
+Implement the real `publishPosition` that writes `Position` records to a Redis Stream, then wire the entire Phase 1 pipeline together in `src/main.ts`.
+
+## Deliverables
+
+- `src/core/publish.ts` (replacing the stub from task 1.2):
+  - `createPublisher(redis: Redis, config: Config, logger: Logger, metrics: Metrics): Publisher` factory.
+  - `Publisher.publish(p: Position): Promise<void>` that serializes and `XADD`s.
+  - Internal serialization helper `serializePosition(p: Position): Record<string, string>` returning the field-value pairs Redis expects.
+- `src/main.ts` updated to:
+  1. Load config (task 1.3).
+  2. Build logger and metrics (tasks 1.3, 1.10).
+  3. Connect to Redis with retry-on-startup logic.
+  4. Build the publisher.
+  5. Build the Teltonika adapter and register codec handlers.
+  6. Start the TCP server.
+  7. Start the metrics HTTP server (task 1.10).
+  8. Install graceful shutdown (task 1.12 finalizes; stub here).
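+
+Step 3 asks for retry-on-startup logic. A minimal sketch of one way to do it, using exponential backoff with a cap. `retryWithBackoff` and its option names are hypothetical and not part of the planned API; the `attempt` callback stands in for whatever `connectRedis` does to establish and verify the connection.
+
+```ts
+interface RetryOptions {
+  retries: number;     // additional attempts after the first one
+  baseDelayMs: number; // delay before the first retry
+  maxDelayMs: number;  // upper bound on any single delay
+}
+
+const defaultSleep = (ms: number): Promise<void> =>
+  new Promise((resolve) => setTimeout(resolve, ms));
+
+// Run `attempt` until it succeeds or the retry budget is spent.
+// The delay doubles after each failure and is capped at maxDelayMs.
+async function retryWithBackoff<T>(
+  attempt: () => Promise<T>,
+  opts: RetryOptions,
+  sleep: (ms: number) => Promise<void> = defaultSleep,
+): Promise<T> {
+  let lastError: unknown;
+  for (let i = 0; i <= opts.retries; i++) {
+    try {
+      return await attempt();
+    } catch (err) {
+      lastError = err;
+      if (i === opts.retries) break;
+      await sleep(Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** i));
+    }
+  }
+  throw lastError;
+}
+```
+
+Rethrowing the last error once the budget is spent lets `main().catch(...)` log it and exit non-zero, which is what the startup acceptance test expects. `ioredis` also has a `retryStrategy` option for reconnect delays; whether to rely on it instead is left open here.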
+
+## Specification
+
+### Stream record shape
+
+`XADD telemetry:teltonika MAXLEN ~ <maxlen> * <fields>` where fields are flat key→string pairs (Redis Streams do not nest). Use a JSON-encoded `payload` field for simplicity:
+
+```
+1) ts        → ISO8601 string (timestamp from the Position)
+2) device_id → IMEI string
+3) codec     → "8" | "8E" | "16" (the codec that produced this record — useful for downstream filtering)
+4) payload   → JSON string of the full Position
+```
+
+The duplicated `device_id` and `ts` at the top level let downstream tools filter without parsing the JSON; `payload` is the source of truth.
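+
+A minimal sketch of `serializePosition` producing the four fields above. The `Position` shape used here (`deviceId`, `timestamp`, `codec`, `attributes`) is an assumption for illustration, and `payload` uses plain `JSON.stringify`; the real helper would pass the custom replacer described in the next section.
+
+```ts
+// Hypothetical Position shape; the real type is defined elsewhere in the codebase.
+interface Position {
+  deviceId: string; // IMEI
+  timestamp: Date;
+  codec: '8' | '8E' | '16';
+  attributes: Record<string, number>;
+}
+
+// Flatten a Position into the string-only fields a stream entry can hold.
+function serializePosition(p: Position): Record<string, string> {
+  return {
+    ts: p.timestamp.toISOString(),
+    device_id: p.deviceId,
+    codec: p.codec,
+    payload: JSON.stringify(p), // source of truth for consumers
+  };
+}
+```
+
+Every value is already a string, so the record can be passed to `XADD` as alternating field and value arguments without further conversion.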
+
+### JSON serialization
+
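+`Position.attributes` contains `number | bigint | Buffer`. `JSON.stringify` handles `number` out of the box, but it throws a `TypeError` on `bigint` and serializes a `Buffer` through its `toJSON()` as a verbose `{ type: 'Buffer', data: [...] }` object. Implement a custom replacer:
+
+```ts
+function replacer(this: Record<string, unknown>, key: string, value: unknown): unknown {
+  // JSON.stringify calls toJSON() before the replacer, so Buffer and Date
+  // arrive already converted. Read the original value from the holder (`this`).
+  const raw = this[key];
+  if (typeof raw === 'bigint') return { __bigint: raw.toString() };
+  if (Buffer.isBuffer(raw))    return { __buffer_b64: raw.toString('base64') };
+  if (raw instanceof Date)     return raw.toISOString();
+  return value;
+}
+```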
+
+The `__bigint` and `__buffer_b64` sentinels are decoded by the Processor (and any other consumer). Document this contract in the [[position-record]] page once landed.
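+
+For the consumer side, a sketch of a `JSON.parse` reviver that reverses the two sentinels. `reviver` and `decodePayload` are hypothetical names, and the rule that a sentinel object carries exactly one key is an assumption made to avoid misreading ordinary objects.
+
+```ts
+import { Buffer } from 'node:buffer';
+
+// Turn { __bigint: "..." } back into a bigint and { __buffer_b64: "..." } back into a Buffer.
+function reviver(_key: string, value: unknown): unknown {
+  if (value !== null && typeof value === 'object') {
+    const v = value as Record<string, unknown>;
+    if (Object.keys(v).length === 1) {
+      if (typeof v.__bigint === 'string') return BigInt(v.__bigint);
+      if (typeof v.__buffer_b64 === 'string') return Buffer.from(v.__buffer_b64, 'base64');
+    }
+  }
+  return value;
+}
+
+function decodePayload(json: string): unknown {
+  return JSON.parse(json, reviver);
+}
+```
+
+Dates are not marked by a sentinel, so they come back as ISO 8601 strings and the consumer has to convert the fields it knows to be timestamps.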
+
+### `XADD` options
+
+- `MAXLEN ~ <REDIS_STREAM_MAXLEN>` — approximate trimming, much cheaper than exact.
+- `*` for auto-generated message ID.
+- Use a single connection (no pooling — `ioredis` multiplexes commands automatically).
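+
+One way the options above might translate into a call. `StreamClient` and `xaddPosition` are hypothetical names; the interface models only the variadic `xadd` that this sketch relies on, and an `ioredis` client is assumed to satisfy it.
+
+```ts
+// Minimal structural type for the single command used here.
+interface StreamClient {
+  xadd(key: string, ...args: (string | number)[]): Promise<string | null>;
+}
+
+// Append one entry with approximate trimming and a server-generated ID.
+async function xaddPosition(
+  client: StreamClient,
+  stream: string,
+  maxlen: number,
+  fields: Record<string, string>,
+): Promise<string | null> {
+  const flat: string[] = [];
+  for (const [field, value] of Object.entries(fields)) flat.push(field, value);
+  return client.xadd(stream, 'MAXLEN', '~', maxlen, '*', ...flat);
+}
+```
+
+The argument order mirrors the raw command: stream key, trimming options, entry ID, then the field and value pairs.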
+
+### Backpressure / non-blocking property
+
+The TCP handler `await`s `ctx.publish(p)`, so a slow publish stalls the device socket. Two strategies:
+
+**Option A: Direct `XADD` per record.** Simplest. Latency per publish is sub-millisecond on a healthy Redis. The risk: if Redis hangs, the TCP handler blocks → device sockets back up → Phase 1's "TCP handler never blocks" property is violated.
+
+**Option B: Bounded in-memory queue + worker drain.** A `Promise`-based bounded queue (e.g. `p-queue` or hand-rolled). `publish()` resolves once the record is enqueued; a worker drains via `XADD`. If the queue is full, the worker has fallen behind catastrophically — at that point we have to choose: drop oldest, drop newest, or throw. Recommendation: drop newest with a structured error log + metric, because the device will retransmit (we won't ACK).
+
+**Decision: Option B.** Specification:
+
+- Queue capacity: 10,000 records (configurable via `PUBLISH_QUEUE_CAPACITY`).
+- On overflow: do **not** publish; throw a typed `PublishOverflowError`. The framing layer (task 1.4) catches this and skips the ACK so the device retransmits.
+- Worker concurrency: 1 (Redis is single-threaded per connection; concurrency just adds context-switch cost).
+- Metric: `teltonika_publish_queue_depth` gauge, `teltonika_publish_overflow_total` counter.
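+
+The specification above could be sketched as a hand-rolled queue like the one below. `BoundedPublishQueue` is a hypothetical name, `send` stands in for the timed `XADD` call, and metrics and logging are left out to keep the example short.
+
+```ts
+class PublishOverflowError extends Error {
+  constructor(capacity: number) {
+    super(`publish queue full (capacity ${capacity})`);
+    this.name = 'PublishOverflowError';
+    Object.setPrototypeOf(this, PublishOverflowError.prototype);
+  }
+}
+
+class BoundedPublishQueue<T> {
+  private readonly items: T[] = [];
+  private readonly capacity: number;
+  private readonly send: (item: T) => Promise<void>;
+  private draining = false;
+
+  constructor(capacity: number, send: (item: T) => Promise<void>) {
+    this.capacity = capacity;
+    this.send = send;
+  }
+
+  // Queued records, including the one currently in flight.
+  get depth(): number {
+    return this.items.length;
+  }
+
+  // Resolves once the record is enqueued; rejects when full so the caller can skip the ACK.
+  async publish(item: T): Promise<void> {
+    if (this.items.length >= this.capacity) {
+      throw new PublishOverflowError(this.capacity);
+    }
+    this.items.push(item);
+    void this.drain();
+  }
+
+  // Single worker: one send in flight at a time, in FIFO order.
+  private async drain(): Promise<void> {
+    if (this.draining) return;
+    this.draining = true;
+    try {
+      while (this.items.length > 0) {
+        await this.send(this.items[0] as T);
+        this.items.shift(); // remove only after a successful send
+      }
+    } catch {
+      // The failed record stays at the head; retry and exit policy are out of scope here.
+    } finally {
+      this.draining = false;
+    }
+  }
+}
+```
+
+A record is removed only after `send` succeeds, so a failed `XADD` leaves it at the head for the next drain. A production version would likely swap the array for a ring buffer to avoid the cost of `shift()` on large queues.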
+
+The worker issues each `XADD` with a per-call timeout (e.g. 2s). On prolonged Redis unavailability it exits the process with a non-zero code so that the orchestrator restarts it.
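+
+A sketch of the two pieces this implies: a timeout wrapper around a single call, and a counter that decides when unavailability counts as prolonged. `withTimeout` and `createFailureTracker` are hypothetical names, and counting consecutive failures is an assumed policy; the task does not define the threshold.
+
+```ts
+// Reject if `work` has not settled within `ms` milliseconds.
+function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
+  let timer: ReturnType<typeof setTimeout> | undefined;
+  const timeout = new Promise<never>((_resolve, reject) => {
+    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
+  });
+  // Clear the timer either way so it cannot keep the process alive.
+  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
+}
+
+// Call `onFatal` once `limit` failures occur in a row; any success resets the count.
+function createFailureTracker(limit: number, onFatal: () => void) {
+  let consecutive = 0;
+  return {
+    success(): void {
+      consecutive = 0;
+    },
+    failure(): void {
+      consecutive += 1;
+      if (consecutive >= limit) onFatal();
+    },
+  };
+}
+```
+
+In the worker, `onFatal` would log and call `process.exit(1)`. The timeout does not cancel the underlying command, so a late reply from Redis can still arrive after the rejection.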
+
+### `main.ts` skeleton
+
+```ts
+async function main() {
+  const config = loadConfig();
+  const logger = createLogger(config);
+  const metrics = createMetrics();
+  const redis = await connectRedis(config, logger);
+  const publisher = createPublisher(redis, config, logger, metrics);
+  const adapter = createTeltonikaAdapter({ publisher, logger, metrics });
+  const server = startServer(config.TELTONIKA_PORT, adapter, { publish: (p) => publisher.publish(p), logger, metrics });
+  const metricsServer = startMetricsServer(config.METRICS_PORT, metrics);
+  installGracefulShutdown({ server, metricsServer, redis, publisher, logger });
+  logger.info({ port: config.TELTONIKA_PORT }, 'tcp-ingestion ready');
+}
+
+main().catch((err) => { console.error(err); process.exit(1); });
+```
+
+## Acceptance criteria
+
+- [ ] Integration test: spin up a real Redis with testcontainers (`redis-mock` does not implement Streams), publish a known `Position`, `XREAD` it back, parse the JSON, and assert it equals the input (with `bigint` and `Buffer` round-tripped through the sentinel encoding).
+- [ ] Overflow test: artificially block the worker, fill the queue, verify the next `publish()` rejects with `PublishOverflowError`, verify metrics increment.
+- [ ] Startup test: with a wrong `REDIS_URL`, the process logs a clear error and exits non-zero.
+- [ ] End-to-end test: open a TCP client to the running server, send the canonical Codec 8 fixture, verify a Position lands on the Stream and the ACK comes back with `00 00 00 01`.
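+
+For the end-to-end assertion, the expected `00 00 00 01` is what an ACK looks like when it carries the number of accepted records as a 4-byte big-endian integer and the fixture holds one record. Both points are assumptions about the fixture and the framing layer from task 1.4, and `buildAck` is a hypothetical helper name.
+
+```ts
+import { Buffer } from 'node:buffer';
+
+// Encode the accepted-record count as a 4-byte big-endian integer.
+function buildAck(acceptedRecords: number): Buffer {
+  const ack = Buffer.alloc(4);
+  ack.writeUInt32BE(acceptedRecords, 0);
+  return ack;
+}
+```
+
+A test can compare the bytes read from the socket against `buildAck(1)` instead of hard-coding the literal.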
+
+## Risks / open questions
+
+- `redis-mock` does not implement Streams. Use testcontainers + a real Redis for integration tests.
+- When the bounded queue overflows, a slow Redis turns into skipped ACKs and device retransmits. Discuss with the Processor team whether they prefer this device-retransmit path (overflow throw) or a soft-drop with logging. The default is retransmit because it does not lose records.
+
+## Done
+
+(Fill in once complete.)