Add Phase 1 and Phase 2 planning documents

ROADMAP plus granular task files per phase. Phase 1 (12 tasks + 1.13
device authority) covers Codec 8/8E/16 telemetry ingestion; Phase 2
(6 tasks) covers Codec 12/14 outbound commands; Phase 3 enumerates
deferred items.
2026-04-30 15:47:06 +02:00
parent 95e60a2c75
commit c8a5f4cd68
23 changed files with 2508 additions and 0 deletions
@@ -0,0 +1,114 @@
# Task 1.8 — Redis Streams publisher & main wiring
**Phase:** 1 — Inbound telemetry
**Status:** ⬜ Not started
**Depends on:** 1.2, 1.3, 1.4, 1.5, 1.6, 1.7
**Wiki refs:** `docs/wiki/entities/redis-streams.md`, `docs/wiki/concepts/position-record.md`
## Goal
Implement the real `publishPosition` that writes `Position` records to a Redis Stream, then wire the entire Phase 1 pipeline together in `src/main.ts`.
## Deliverables
- `src/core/publish.ts` (replacing the stub from task 1.2):
  - `createPublisher(redis: Redis, config: Config, logger: Logger, metrics: Metrics): Publisher` factory.
  - `Publisher.publish(p: Position): Promise<void>`, which serializes the record and enqueues it for the `XADD` worker (see "Backpressure / non-blocking property" below).
  - Internal serialization helper `serializePosition(p: Position): Record<string, string>` returning the field-value pairs Redis expects.
- `src/main.ts` updated to:
  1. Load config (task 1.3).
  2. Build logger and metrics (tasks 1.3, 1.10).
  3. Connect to Redis with retry-on-startup logic.
  4. Build the publisher.
  5. Build the Teltonika adapter and register codec handlers.
  6. Start the TCP server.
  7. Start the metrics HTTP server (task 1.10).
  8. Install graceful shutdown (task 1.12 finalizes; stub here).
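Step 8 only needs a stub in this task. The following is a minimal sketch that assumes hypothetical `Closer` objects and a hypothetical `installGracefulShutdownStub` name; task 1.12 defines the real API. It runs the closers once, in the order given, on `SIGINT` or `SIGTERM`. It also returns the shutdown function so tests or the publisher worker can invoke it directly. One reasonable order is to close the TCP server first, then drain the publisher, then quit Redis.

```typescript
// Hypothetical shape: anything with a name (for logs) and an async close().
export interface Closer {
  name: string;
  close: () => Promise<void>;
}

// Stub until task 1.12: runs closers once, in order, on SIGINT/SIGTERM.
export function installGracefulShutdownStub(
  closers: Closer[],
  log: (msg: string) => void = console.log,
): () => Promise<void> {
  let inFlight: Promise<void> | null = null;

  const shutdown = (): Promise<void> => {
    // Repeated signals (or direct calls) share the same shutdown run.
    if (inFlight === null) {
      inFlight = (async () => {
        for (const c of closers) {
          try {
            await c.close();
            log(`closed ${c.name}`);
          } catch (err) {
            // Keep going so one failing closer does not block the rest.
            log(`failed to close ${c.name}: ${String(err)}`);
          }
        }
      })();
    }
    return inFlight;
  };

  for (const signal of ['SIGINT', 'SIGTERM'] as const) {
    process.once(signal, () => {
      void shutdown().then(() => process.exit(0));
    });
  }
  return shutdown;
}
```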
## Specification
### Stream record shape
`XADD telemetry:teltonika MAXLEN ~ <maxlen> * <fields>` where fields are flat key→string pairs (Redis Streams do not nest). Use a JSON-encoded `payload` field for simplicity:
```
1) ts → ISO8601 string (timestamp from the Position)
2) device_id → IMEI string
3) codec → "8" | "8E" | "16" (the codec that produced this record — useful for downstream filtering)
4) payload → JSON string of the full Position
```
The duplicated `device_id` and `ts` at the top level let downstream tools filter without parsing the JSON; `payload` is the source of truth.
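`serializePosition` can produce the field map above. The following is a minimal sketch that assumes a hypothetical `Position` shape (`deviceId`, `timestamp`, `codec`, `attributes`); the real type comes from task 1.2. The replacer reads raw values from `this[key]`. It has to, because `JSON.stringify` runs `toJSON()` (which both `Buffer` and `Date` define) before the replacer ever sees a value.

```typescript
// Hypothetical Position shape for illustration; the real type is defined in task 1.2.
interface Position {
  deviceId: string; // IMEI
  timestamp: Date;
  codec: '8' | '8E' | '16';
  attributes: Record<string, number | bigint | Buffer>;
}

// JSON.stringify applies toJSON() (Buffer, Date) before calling the replacer,
// so the raw value is read from the holder object via `this[key]`.
function replacer(this: Record<string, unknown>, key: string, value: unknown): unknown {
  const raw = this[key];
  if (typeof raw === 'bigint') return { __bigint: raw.toString() };
  if (Buffer.isBuffer(raw)) return { __buffer_b64: raw.toString('base64') };
  return value; // Dates arrive here already converted to ISO strings by Date#toJSON
}

// Flat field map for XADD: top-level copies for cheap filtering, payload as source of truth.
export function serializePosition(p: Position): Record<string, string> {
  return {
    ts: p.timestamp.toISOString(),
    device_id: p.deviceId,
    codec: p.codec,
    payload: JSON.stringify(p, replacer),
  };
}
```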
### JSON serialization
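`Position.attributes` contains `number | bigint | Buffer` values. Out of the box, `JSON.stringify` handles `number`, but it throws a `TypeError` on `bigint` and serializes a `Buffer` as `{"type":"Buffer","data":[...]}` via `Buffer#toJSON`. Implement a custom replacer. Because `JSON.stringify` calls `toJSON()` before invoking the replacer, the replacer must inspect the raw value on the holder object (`this[key]`) rather than the `value` argument:
```ts
function replacer(this: Record<string, unknown>, key: string, value: unknown): unknown {
  // `value` has already been through toJSON() (Buffer, Date), so read the
  // original value from the holder object instead.
  const raw = this[key];
  if (typeof raw === 'bigint') return { __bigint: raw.toString() };
  if (Buffer.isBuffer(raw)) return { __buffer_b64: raw.toString('base64') };
  if (raw instanceof Date) return raw.toISOString();
  return value;
}
```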
The `__bigint` and `__buffer_b64` sentinels are decoded by the Processor (and any other consumer). Document this contract in the [[position-record]] page once landed.
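On the consumer side, a `JSON.parse` reviver can reverse the sentinels. This is a sketch of what a TypeScript consumer such as the Processor might do; `reviver` is a hypothetical name. It only treats objects whose single key is `__bigint` or `__buffer_b64` as sentinels, so ordinary attribute objects pass through untouched.

```typescript
// Reverses the sentinel encoding: { __bigint } -> bigint, { __buffer_b64 } -> Buffer.
// JSON.parse calls the reviver bottom-up, so nested sentinels are decoded in place.
export function reviver(_key: string, value: unknown): unknown {
  if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
    const obj = value as Record<string, unknown>;
    const singleKey = Object.keys(obj).length === 1;
    if (singleKey && typeof obj.__bigint === 'string') {
      return BigInt(obj.__bigint);
    }
    if (singleKey && typeof obj.__buffer_b64 === 'string') {
      return Buffer.from(obj.__buffer_b64, 'base64');
    }
  }
  return value;
}
```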
### `XADD` options
- `MAXLEN ~ <REDIS_STREAM_MAXLEN>` — approximate trimming, much cheaper than exact.
- `*` for auto-generated message ID.
- Use a single connection (no pooling — `ioredis` multiplexes commands automatically).
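The options above flatten into a single argument list whose elements map one-to-one onto the arguments of the `XADD` command. The following is a sketch with a hypothetical `buildXaddArgs` helper. How the list is handed to the Redis client is left to the implementation.

```typescript
// Hypothetical helper: builds
//   <key> MAXLEN ~ <maxlen> * <field1> <value1> <field2> <value2> ...
export function buildXaddArgs(
  streamKey: string,
  maxlen: number,
  fields: Record<string, string>,
): string[] {
  if (!Number.isInteger(maxlen) || maxlen <= 0) {
    throw new RangeError(`maxlen must be a positive integer, got ${maxlen}`);
  }
  // "~" requests approximate trimming; "*" lets Redis generate the entry ID.
  const args = [streamKey, 'MAXLEN', '~', String(maxlen), '*'];
  for (const [field, value] of Object.entries(fields)) {
    args.push(field, value);
  }
  return args;
}
```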
### Backpressure / non-blocking property
The TCP handler `await`s `ctx.publish(p)`, so whatever `publish` does sits on the socket's hot path. Two strategies:
**Option A: Direct `XADD` per record.** Simplest. Latency per publish is sub-millisecond on a healthy Redis. The risk: if Redis hangs, the TCP handler blocks → device sockets back up → Phase 1's "TCP handler never blocks" property is violated.
**Option B: Bounded in-memory queue + worker drain.** A `Promise`-based bounded queue, either hand-rolled or built on `p-queue` (which has no size cap of its own, so capacity must be enforced by checking its `size` and `pending` counts before adding). `publish()` resolves once the record is enqueued; a worker drains the queue via `XADD`. A full queue means the worker has fallen behind catastrophically, and we have to choose: drop oldest, drop newest, or throw. Recommendation: drop newest with a structured error log + metric, because the device will retransmit (we won't ACK).
**Decision: Option B.** Specification:
- Queue capacity: 10,000 records (configurable via `PUBLISH_QUEUE_CAPACITY`).
- On overflow: do **not** publish; throw a typed `PublishOverflowError`. The framing layer (task 1.4) catches this and skips the ACK so the device retransmits.
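- Worker concurrency: 1. Redis executes commands one at a time and a single connection serializes them anyway, so extra concurrency mostly adds scheduling overhead; a single worker also keeps records in arrival order.
- Metrics: `teltonika_publish_queue_depth` gauge, `teltonika_publish_overflow_total` counter.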
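A hand-rolled version of the queue specified above might look like the following sketch. `BoundedPublishQueue` and the `QueueHooks` fields are hypothetical names. `sink` stands in for the function that performs the `XADD`, and the hooks are where the two metrics would be updated. Capacity counts the record currently in flight. A failed `sink` call is reported through `onError` and then dropped; the retry and exit policy belongs to the worker logic around it.

```typescript
// Typed error the framing layer can catch to skip the ACK.
export class PublishOverflowError extends Error {
  constructor(readonly capacity: number) {
    super(`publish queue full (capacity ${capacity})`);
    this.name = 'PublishOverflowError';
    Object.setPrototypeOf(this, new.target.prototype); // keeps instanceof working on ES5 targets
  }
}

// Hypothetical hooks; real code would call the Metrics object from task 1.10.
export interface QueueHooks<T> {
  onDepth?: (depth: number) => void;         // e.g. set teltonika_publish_queue_depth
  onOverflow?: () => void;                   // e.g. increment teltonika_publish_overflow_total
  onError?: (err: unknown, item: T) => void; // one sink call failed
}

// Bounded FIFO drained by a single worker loop (concurrency 1).
export class BoundedPublishQueue<T> {
  private readonly items: T[] = [];
  private draining = false;

  constructor(
    private readonly capacity: number,
    private readonly sink: (item: T) => Promise<void>, // e.g. the XADD call
    private readonly hooks: QueueHooks<T> = {},
  ) {}

  get depth(): number {
    return this.items.length; // includes the record currently being sent
  }

  // Resolves once the item is enqueued; rejects with PublishOverflowError when full.
  async publish(item: T): Promise<void> {
    if (this.items.length >= this.capacity) {
      this.hooks.onOverflow?.();
      throw new PublishOverflowError(this.capacity);
    }
    this.items.push(item);
    this.hooks.onDepth?.(this.items.length);
    void this.drain();
  }

  private async drain(): Promise<void> {
    if (this.draining) return; // only one worker loop at a time
    this.draining = true;
    try {
      while (this.items.length > 0) {
        const item = this.items[0];
        try {
          await this.sink(item);
        } catch (err) {
          this.hooks.onError?.(err, item); // reported, then dropped
        }
        this.items.shift();
        this.hooks.onDepth?.(this.items.length);
      }
    } finally {
      this.draining = false;
    }
  }
}
```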
The worker wraps each `XADD` in a per-call timeout (e.g. 2 s). On prolonged Redis unavailability it triggers graceful shutdown and exits the process, relying on the orchestrator to restart it.
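The per-call timeout and the "prolonged unavailability" check can live outside the queue. The following is a minimal sketch. `withTimeout` races a promise against a timer. A hypothetical `UnavailabilityTracker` reports once failures have persisted for a configurable window with no success in between. The window length, and therefore what counts as "prolonged", is an assumption to settle during implementation. When `recordFailure()` returns `true`, the worker would trigger graceful shutdown. Note that the timeout only stops waiting; it does not cancel a command already sent to Redis.

```typescript
// Rejects if `promise` does not settle within `ms` milliseconds.
export function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    promise.then(
      (v) => { clearTimeout(timer); resolve(v); },
      (e) => { clearTimeout(timer); reject(e); },
    );
  });
}

// Hypothetical tracker: true once failures have continued for >= windowMs
// without an intervening success.
export class UnavailabilityTracker {
  private firstFailureAt: number | null = null;

  constructor(
    private readonly windowMs: number,
    private readonly now: () => number = Date.now, // injectable clock for tests
  ) {}

  recordSuccess(): void {
    this.firstFailureAt = null;
  }

  recordFailure(): boolean {
    const t = this.now();
    if (this.firstFailureAt === null) this.firstFailureAt = t;
    return t - this.firstFailureAt >= this.windowMs;
  }
}
```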
### `main.ts` skeleton
```ts
// Imports omitted; each factory comes from the module delivered by its task.
async function main(): Promise<void> {
  const config = loadConfig();                                      // 1. config
  const logger = createLogger(config);                              // 2. logger
  const metrics = createMetrics();                                  //    metrics
  const redis = await connectRedis(config, logger);                 // 3. retried on startup
  const publisher = createPublisher(redis, config, logger, metrics); // 4.
  const adapter = createTeltonikaAdapter({ publisher, logger, metrics }); // 5. codec handlers
  const server = startServer(config.TELTONIKA_PORT, adapter, {      // 6.
    // Arrow wrapper keeps `this` bound if Publisher is class-based.
    publish: (p) => publisher.publish(p),
    logger,
    metrics,
  });
  const metricsServer = startMetricsServer(config.METRICS_PORT, metrics); // 7.
  installGracefulShutdown({ server, metricsServer, redis, publisher, logger }); // 8. stub
  logger.info({ port: config.TELTONIKA_PORT }, 'tcp-ingestion ready');
}

main().catch((err) => {
  // Startup failures (e.g. unreachable Redis) exit non-zero.
  console.error(err);
  process.exit(1);
});
```
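Step 3 assumes `connectRedis` retries at startup and eventually gives up. The wrong-`REDIS_URL` acceptance test depends on that. The following is a sketch of the retry loop; `retryWithBackoff`, its parameters, and the doubling schedule are all assumptions. Inside `connectRedis`, the `connect` callback would create the Redis client and wait until it is ready, and `onRetry` is where each failed attempt would be logged.

```typescript
// Hypothetical helper: retries `connect` with exponential backoff and rethrows
// the last error after `attempts` tries so main() can log it and exit non-zero.
export async function retryWithBackoff<T>(
  connect: () => Promise<T>,
  attempts: number,
  baseDelayMs: number,
  onRetry: (attempt: number, err: unknown) => void = () => {},
): Promise<T> {
  let lastErr: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await connect();
    } catch (err) {
      lastErr = err;
      if (attempt === attempts) break; // out of attempts: fall through to throw
      onRetry(attempt, err);
      const delay = baseDelayMs * 2 ** (attempt - 1); // base, 2x base, 4x base, ...
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastErr;
}
```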
## Acceptance criteria
- [ ] Integration test: spin up a real Redis via testcontainers, publish a known `Position`, `XREAD` it back, parse the JSON, and assert it equals the input (with `bigint` and `Buffer` round-tripped through the sentinel encoding).
- [ ] Overflow test: artificially block the worker, fill the queue, verify the next `publish()` rejects with `PublishOverflowError`, and verify `teltonika_publish_overflow_total` increments.
- [ ] Startup test: with a wrong `REDIS_URL`, the process logs a clear error and exits non-zero.
- [ ] End-to-end test: open a TCP client to the running server, send the canonical Codec 8 fixture, verify a Position lands on the Stream, and verify the ACK is `00 00 00 01`.
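The ACK the end-to-end test expects is the accepted record count echoed back as a 4-byte big-endian integer. The following is a sketch that assumes the usual Teltonika frame prefix shared by Codec 8, 8E and 16: a 4-byte zero preamble, a 4-byte data field length, a 1-byte codec ID, then the 1-byte record count. `recordCount` and `buildAck` are hypothetical helper names; the real ACK logic belongs to the framing layer (task 1.4).

```typescript
// Offset of "Number of Data 1": preamble (4) + data field length (4) + codec ID (1).
const RECORD_COUNT_OFFSET = 9;

// Reads the record count from the start of an AVL data frame.
export function recordCount(frame: Buffer): number {
  return frame.readUInt8(RECORD_COUNT_OFFSET); // throws RangeError on a short frame
}

// ACK payload: the record count as a 4-byte big-endian unsigned integer.
export function buildAck(count: number): Buffer {
  const ack = Buffer.alloc(4);
  ack.writeUInt32BE(count, 0);
  return ack;
}
```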
## Risks / open questions
- `redis-mock` does not implement Streams. Use testcontainers + a real Redis for integration tests.
- The overflow policy is a cross-team decision: discuss with the Processor team whether they prefer the device-retransmit path (throw on overflow, no ACK) or a soft drop with logging. Defaulting to retransmit because it is the safer correctness choice.
## Done
(Fill in once complete.)