Add Phase 1 and Phase 2 planning documents

ROADMAP plus granular task files per phase. Phase 1 (12 tasks + 1.13
device authority) covers Codec 8/8E/16 telemetry ingestion; Phase 2
(6 tasks) covers Codec 12/14 outbound commands; Phase 3 enumerates
deferred items.
2026-04-30 15:47:06 +02:00
parent 95e60a2c75
commit c8a5f4cd68
23 changed files with 2508 additions and 0 deletions
@@ -0,0 +1,118 @@
# Task 2.1 — Connection registry & heartbeat
**Phase:** 2 — Outbound commands
**Status:** ⬜ Not started
**Depends on:** Phase 1 complete
**Wiki refs:** `docs/wiki/concepts/phase-2-commands.md` § 9.3
## Goal
Maintain a Redis-backed registry mapping device IMEI → Ingestion instance ID, so Directus can route outbound commands to the instance currently holding the device's TCP socket.
## Deliverables
- `src/core/connection-registry.ts`:
  - `ConnectionRegistry` class with methods `register(imei)`, `unregister(imei)`, `unregisterAll()`, `heartbeat()`.
  - Internal state: `Set<string>` of held IMEIs for graceful-shutdown bulk cleanup.
- Hook into the Teltonika session lifecycle (in `src/adapters/teltonika/index.ts`):
  - After IMEI handshake succeeds: `registry.register(imei)`.
  - On socket close (any cause): `registry.unregister(imei)`.
- Heartbeat ticker started in `src/main.ts`, runs every 30 seconds.
- Graceful shutdown calls `registry.unregisterAll()` (task 1.12 hook updated to include this).
## Specification
### Redis layout
- **Hash** `connections:registry`: field = `imei`, value = `instance_id`. A single hash shared by all instances. Redis hashes have no per-field TTL (before the `HEXPIRE` family added in Redis 7.4), which is why liveness is tracked with a separate heartbeat key.
- **Key** `instance:heartbeat:{instance_id}`: written every 30s with `EX 90`. Existence proves the instance is alive.
### Operations
```ts
class ConnectionRegistry {
  private held = new Set<string>();

  constructor(private redis: Redis, private instanceId: string) {}

  async register(imei: string): Promise<void> {
    await this.redis.hset('connections:registry', imei, this.instanceId);
    this.held.add(imei);
  }

  async unregister(imei: string): Promise<void> {
    // Only delete if the entry still points at us.
    // (Race: a device reconnected to a different instance between
    // our session ending and this delete.)
    const current = await this.redis.hget('connections:registry', imei);
    if (current === this.instanceId) {
      await this.redis.hdel('connections:registry', imei);
    }
    this.held.delete(imei);
  }

  async unregisterAll(): Promise<void> {
    if (this.held.size === 0) return;
    const pipeline = this.redis.pipeline();
    for (const imei of this.held) {
      pipeline.hdel('connections:registry', imei);
    }
    await pipeline.exec();
    this.held.clear();
  }

  async heartbeat(): Promise<void> {
    await this.redis.set(
      `instance:heartbeat:${this.instanceId}`,
      Date.now().toString(),
      'EX',
      90,
    );
  }
}
```
### Heartbeat ticker
In `main.ts`:
```ts
const heartbeatInterval = setInterval(() => {
  // Logged at warn, per the "Risks / open questions" note on heartbeat write failures.
  registry.heartbeat().catch((err) => logger.warn({ err }, 'heartbeat failed'));
}, 30_000);
// ensure cleared on shutdown
```
Run an initial `heartbeat()` immediately at startup so the instance is "alive" before the first 30s tick.
### Race conditions to handle
- **Same IMEI on two instances at once.** Possible when a device reconnects faster than we can detect close. The new instance's `register` overwrites the old's entry. The old instance's `unregister` checks `if (current === this.instanceId)` and skips the delete if not. Good.
- **Heartbeat key expires while instance is alive.** With a 30s tick and a 90s TTL this takes roughly three consecutive failed writes, for example during a prolonged network glitch. The janitor (task 2.2) will clear the registry entries; devices reconnect and the new entries get written. Acceptable: temporary loss of routability for affected devices, recoverable in seconds.
- **Hash entry without heartbeat.** The instance died without graceful cleanup. Janitor handles this.
### Phase 1 impact
Phase 1 code in `src/adapters/teltonika/index.ts` needs three hook points:
1. After successful handshake.
2. On `socket.on('close')`.
3. On graceful shutdown (already wired in task 1.12).
These are additive — no Phase 1 logic changes, only new calls to the registry.
## Acceptance criteria
- [ ] After a device handshake completes, `HGET connections:registry <imei>` returns the local instance ID.
- [ ] After the socket closes, `HGET connections:registry <imei>` returns nil.
- [ ] If two simulated instances "race" on the same IMEI, the registry ends up pointing at whichever instance most recently registered, and the loser's `unregister` does not delete the winner's entry.
- [ ] Heartbeat key has `EX 90` and is refreshed every 30s.
- [ ] On SIGTERM, all held IMEIs are unregistered before exit.
- [ ] Registry operations are non-blocking on the TCP read path — register/unregister use `await` but inside session lifecycle callbacks, not the per-frame hot path.
## Risks / open questions
- What if Redis is unavailable at registration time? Options: (A) fail the handshake, (B) accept the device but log + alert. **Recommendation: B.** Phase 1's "telemetry continues even if business plane is degraded" property must be preserved; commands routing is a Phase 2 nice-to-have. Track via `teltonika_registry_failures_total`.
- Heartbeat write failures: log at warn, retry on next tick. Don't crash.
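
Option B above can be sketched as a thin wrapper around `register`. This is a minimal sketch under stated assumptions, not the task's implementation: `registerBestEffort`, `RegistryLike`, and `FailureCounter` are hypothetical names, and the counter merely stands in for `teltonika_registry_failures_total`.

```typescript
// Hypothetical shapes for illustration only; not part of the task spec.
interface RegistryLike {
  register(imei: string): Promise<void>;
}

interface FailureCounter {
  inc(): void; // stands in for teltonika_registry_failures_total
}

/**
 * Option B: attempt the registry write but never fail the handshake.
 * Returns true if the write succeeded, false if it was swallowed.
 */
async function registerBestEffort(
  registry: RegistryLike,
  imei: string,
  failures: FailureCounter,
  warn: (msg: string, err: unknown) => void = () => {},
): Promise<boolean> {
  try {
    await registry.register(imei);
    return true;
  } catch (err) {
    // Telemetry ingestion continues; only command routing is degraded.
    failures.inc();
    warn(`registry write failed for ${imei}; device not routable for commands`, err);
    return false;
  }
}
```

The boolean return lets the session decide whether to retry registration later; whether to retry at all is left open here, as it is in the task.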
## Done
(Fill in once complete.)
@@ -0,0 +1,83 @@
# Task 2.2 — Registry janitor
**Phase:** 2 — Outbound commands
**Status:** ⬜ Not started
**Depends on:** 2.1
**Wiki refs:** `docs/wiki/concepts/phase-2-commands.md` § 9.3
## Goal
Periodically clear stale entries from `connections:registry` whose owning instance has died (heartbeat expired) without graceful cleanup.
## Deliverables
- `src/core/janitor.ts`: `Janitor` class with a `run()` method that performs one cleanup pass.
- A choice: run the janitor in-process (every Ingestion instance runs it, with leader election or with idempotent cleanup) or as a separate small process. **Recommendation: in-process, idempotent.** Simpler ops, no leader election; the cost is N instances each doing the work, but a registry pass is O(N_devices) and fast.
- Wired into `src/main.ts` as a 60-second ticker.
- Metric: `teltonika_registry_janitor_evicted_total{instance_id=...}` counter.
## Specification
### Algorithm (per pass)
```
1. entries = HGETALL connections:registry
2. unique_instance_ids = unique values from entries
3. For each instance_id in unique_instance_ids:
     alive = EXISTS instance:heartbeat:{instance_id}
     If !alive:
       For each (imei, owner) in entries where owner == instance_id:
         HDEL connections:registry imei
         metrics.evicted.inc({ instance_id })
```
Use `HSCAN` instead of `HGETALL` if the registry is large (>10k entries) to avoid blocking Redis. For Phase 2's expected scale, `HGETALL` is fine.
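The selection step of the pass can be separated from the Redis calls and tested on its own. The sketch below is illustrative only: `planEvictions` is a hypothetical helper name, the `entries` argument is assumed to be the `HGETALL` result as a plain object, and `aliveInstances` is assumed to hold the instance IDs for which `EXISTS instance:heartbeat:{instance_id}` returned 1.

```typescript
/**
 * Given a snapshot of connections:registry (imei -> instance_id) and the set
 * of instance IDs whose heartbeat key still exists, return the eviction
 * candidates grouped by dead instance.
 */
function planEvictions(
  entries: Record<string, string>,
  aliveInstances: ReadonlySet<string>,
): Map<string, string[]> {
  const plan = new Map<string, string[]>();
  for (const imei of Object.keys(entries)) {
    const owner = entries[imei];
    if (aliveInstances.has(owner)) continue; // owner is alive: keep the entry
    const list = plan.get(owner) ?? [];
    list.push(imei);
    plan.set(owner, list);
  }
  return plan;
}

// Usage: ingest-a has no heartbeat, ingest-b does.
const evictionPlan = planEvictions(
  {
    '352093081452251': 'ingest-a',
    '352093081452252': 'ingest-b',
    '352093081452253': 'ingest-a',
  },
  new Set(['ingest-b']),
);
console.log(evictionPlan.get('ingest-a')); // the two IMEIs owned by the dead instance
```

Keeping this step pure means the actual deletes can still re-check ownership at delete time, as the race discussion in this task requires.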
### Idempotence
Multiple instances running the janitor in parallel may both attempt to delete the same stale entry. `HDEL` is idempotent: the second call returns 0 and is harmless. To avoid double-counting, log and increment the eviction metric only when `HDEL` returns a value greater than 0.
### Race with re-registration
Sequence to consider:
1. Instance A dies; heartbeat expires.
2. Janitor on Instance B starts a pass. Sees A's entries, A's heartbeat is gone.
3. Device that was on A reconnects to Instance C.
4. Instance C calls `HSET connections:registry <imei> C`.
5. Janitor on B is mid-pass and calls `HDEL connections:registry <imei>`.
Result: device entry deleted moments after C registered it. Device routing is broken until the next reconnect or registration.
**Mitigation:** the janitor must check the entry value at delete time, not just at scan time:
```ts
for (const imei of evictTargets) {
  // Re-read the value; only delete if still pointing at the dead instance.
  const current = await redis.hget('connections:registry', imei);
  if (current === deadInstanceId) {
    await redis.hdel('connections:registry', imei);
  }
}
```
This is "check-and-delete" — not atomic but the window is small. For full atomicity, use a Lua script. **Recommendation: ship the non-atomic version first; upgrade to Lua if the race causes operational issues.**
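If the upgrade to Lua becomes necessary, the atomic variant could look roughly like the following. This is a sketch, not a decided design: `evictIfOwnedBy` and `EvalCapable` are hypothetical names, and the client is assumed to expose `EVAL` as `eval(script, numKeys, ...keysAndArgs)`, which is how ioredis does it.

```typescript
// Lua runs atomically inside Redis, so the compare and the delete cannot be
// separated by another client's HSET.
// KEYS[1] = hash key, ARGV[1] = imei, ARGV[2] = expected (dead) instance ID.
const CHECK_AND_DELETE_LUA = `
if redis.call('HGET', KEYS[1], ARGV[1]) == ARGV[2] then
  return redis.call('HDEL', KEYS[1], ARGV[1])
end
return 0
`;

// Minimal structural type for the single client call used here (assumption).
interface EvalCapable {
  eval(script: string, numKeys: number, ...keysAndArgs: string[]): Promise<unknown>;
}

/** Returns true only if the entry still pointed at the dead instance and was removed. */
async function evictIfOwnedBy(
  redis: EvalCapable,
  imei: string,
  deadInstanceId: string,
): Promise<boolean> {
  const deleted = await redis.eval(
    CHECK_AND_DELETE_LUA,
    1,
    'connections:registry',
    imei,
    deadInstanceId,
  );
  return deleted === 1;
}
```

Because the return value distinguishes a real delete from a no-op, the same call can also drive the eviction counter without double-counting.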
### Pace
Run every 60 seconds (configurable via `JANITOR_INTERVAL_MS`). One pass costs at most one `HGETALL` + N `EXISTS` + (rare) M `HDEL`. Negligible Redis load.
## Acceptance criteria
- [ ] Killing an Ingestion instance without graceful shutdown: within ~2.5 minutes (heartbeat TTL of 90s + up to one 60s janitor interval), all of that instance's registry entries are gone.
- [ ] If the dying instance restarts and re-registers a device before the janitor evicts it, the new (live) entry is preserved (verified by the check-and-delete logic).
- [ ] Two janitors running in parallel: total deletes are correct, no double-counting in metrics.
- [ ] `teltonika_registry_janitor_evicted_total` increments by the right amount per pass.
## Risks / open questions
- The check-and-delete race window: small but real. If operationally observed, upgrade to Lua. Document the trade-off in `OPERATIONS.md`.
- Should the janitor be a separate process? Pros: cleaner separation; can be sized differently. Cons: another deployable, another monitoring target. **Defer to operational feedback.**
## Done
(Fill in once complete.)
@@ -0,0 +1,112 @@
# Task 2.3 — Per-socket write queue & outstanding-command tracker
**Phase:** 2 — Outbound commands
**Status:** ⬜ Not started
**Depends on:** Phase 1 complete (specifically the session loop in 1.4)
**Wiki refs:** `docs/wiki/concepts/phase-2-commands.md` § 9.6, § 9.8
## Goal
Provide a per-socket serialization layer so:
1. Outbound command frames do not interleave with codec ACK writes (which would corrupt the byte stream).
2. Only **one command is outstanding per socket at a time** (Teltonika's command codecs assume serial dispatch — there's no correlation ID in the protocol).
## Deliverables
- `src/core/write-queue.ts`:
  - `SocketWriteQueue` class wrapping a `net.Socket` with an internal queue.
  - Methods: `writeAck(buf: Buffer): Promise<void>`, `writeCommand(commandId: string, buf: Buffer, timeoutMs?: number): Promise<Buffer>`.
  - Per-socket state: `outstandingCommand: PendingCommand | null` with `commandId`, `timeout`, `resolve`, `reject` functions.
  - `awaitResponse(commandId, timeoutMs): Promise<Buffer>`: registers the in-flight command and waits for a response delivered via a separate `notifyResponse(buf)` method.
- Update `src/adapters/teltonika/index.ts` session struct to hold a `SocketWriteQueue` per session.
- Update Phase 1's framing layer (task 1.4 deliverable) to write ACKs through `queue.writeAck` instead of directly to the socket.
## Specification
### Why ACKs go through the queue too
Phase 1 wrote ACKs directly to the socket. Phase 2 must serialize ACKs with command writes, otherwise:
```
Time T+0: codec parser writes ACK = [00 00 00 01]
Time T+0: command consumer writes Codec 12 frame
```
Without serialization, the bytes interleave at the socket level, producing garbage on the wire. The fix is mandatory — every socket write goes through one queue.
### Queue semantics
```ts
import * as net from 'node:net';

class SocketWriteQueue {
  private chain: Promise<void> = Promise.resolve();
  private outstanding: PendingCommand | null = null;

  constructor(private socket: net.Socket) {}

  async writeAck(buf: Buffer): Promise<void> {
    this.chain = this.chain.then(() => this.writeRaw(buf));
    return this.chain;
  }

  async writeCommand(commandId: string, buf: Buffer, timeoutMs = 30_000): Promise<Buffer> {
    // Wait until no command is outstanding. Re-check in a loop: several callers
    // may be waiting on the same command, and only one of them may claim the slot.
    while (this.outstanding) {
      const prev = this.outstanding;
      try { await prev.promise; } catch { /* prior command failed; we still proceed */ }
      // A timeout rejects the promise without going through notifyResponse,
      // so the slot has to be released here.
      if (this.outstanding === prev) this.outstanding = null;
    }
    const pending: PendingCommand = makePending(commandId, timeoutMs);
    this.outstanding = pending;
    this.chain = this.chain.then(() => this.writeRaw(buf));
    await this.chain;       // bytes are on the wire
    return pending.promise; // resolves when notifyResponse called or rejects on timeout
  }

  notifyResponse(buf: Buffer): void {
    if (!this.outstanding) {
      // Unsolicited response. Log warn and ignore.
      return;
    }
    this.outstanding.resolveWith(buf);
    this.outstanding = null;
  }

  private writeRaw(buf: Buffer): Promise<void> {
    return new Promise((resolve, reject) => {
      this.socket.write(buf, (err) => err ? reject(err) : resolve());
    });
  }
}
```
`PendingCommand` exposes a promise that resolves when `resolveWith` is called and rejects when its `setTimeout` fires.
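One possible shape for `PendingCommand` and `makePending` is sketched below. It is an assumption-laden illustration rather than the specified implementation: the generic parameter replaces `Buffer` so the snippet stays self-contained, `rejectWith` is a hypothetical addition (useful for the `SocketClosedError` case), and only `resolveWith`, the timeout rejection, and the `CommandTimeoutError` name come from this task.

```typescript
class CommandTimeoutError extends Error {
  constructor(public readonly commandId: string, timeoutMs: number) {
    super(`command ${commandId} timed out after ${timeoutMs} ms`);
    this.name = 'CommandTimeoutError';
  }
}

interface PendingCommand<T> {
  commandId: string;
  promise: Promise<T>;
  resolveWith(value: T): void;
  rejectWith(err: Error): void; // hypothetical: lets the socket-close path fail the command
}

function makePending<T>(commandId: string, timeoutMs: number): PendingCommand<T> {
  let resolveFn!: (value: T) => void;
  let rejectFn!: (err: Error) => void;
  const promise = new Promise<T>((resolve, reject) => {
    resolveFn = resolve;
    rejectFn = reject;
  });
  // The timer rejects the promise if no response arrives in time.
  const timer = setTimeout(
    () => rejectFn(new CommandTimeoutError(commandId, timeoutMs)),
    timeoutMs,
  );
  return {
    commandId,
    promise,
    // Settling early clears the timer so it cannot fire later.
    resolveWith(value) { clearTimeout(timer); resolveFn(value); },
    rejectWith(err) { clearTimeout(timer); rejectFn(err); },
  };
}
```

Settling a promise twice is a no-op in JavaScript, so a response that arrives after the timeout has fired is silently dropped by `resolveWith` rather than throwing.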
### Backpressure on queued commands
A device with many queued commands could grow the queue unboundedly. Cap per-socket queue depth:
- Soft: log a warning at 5 queued commands.
- Hard: reject `writeCommand` with `WriteQueueFullError` at 20 queued commands. The command consumer publishes a failure to `commands:responses`.
### Timeout default
30 seconds per command, overridable via `commandTimeoutMs` in the `commands` row. This is distinct from `expires_at` in the Phase 2 design: `expires_at` is a wall-clock deadline enforced at the Directus level, while the per-write timeout is the protocol-level "device didn't respond" deadline.
When the timeout fires, the queue rejects the outstanding promise with `CommandTimeoutError`. The next queued command becomes the outstanding one and proceeds.
## Acceptance criteria
- [ ] Two concurrent calls to `writeAck(buf1)` and `writeCommand(id, buf2)` produce bytes on the wire in submission order, no interleaving (verified with a TCP-level recording test).
- [ ] `writeCommand` blocks subsequent `writeCommand` calls until the first resolves or times out.
- [ ] `notifyResponse` correctly resolves the outstanding command's promise.
- [ ] Timeout firing rejects the outstanding promise; the next queued command starts.
- [ ] Queue depth is exposed as `teltonika_command_queue_depth_total` (a gauge summed across sockets). Per-IMEI labels such as `teltonika_command_queue_depth{imei=...}` are forbidden by task 1.10's cardinality rule, so per-IMEI depth appears only in warn logs.
- [ ] On socket close, all pending command promises reject with `SocketClosedError`.
## Risks / open questions
- The "outstanding command" model assumes the device responds to commands in order, which holds as long as only one command is outstanding per socket. If we discover devices that respond out of order, correlation IDs would be the fix, but the protocol doesn't carry them. The fallback would be to reduce the per-socket queue depth limit to zero: never queue behind an outstanding command, fail fast instead.
- ACK write order vs response delivery: when a device sends an AVL frame and we're mid-command, the AVL frame's ACK queues behind the command bytes. Worst case: device receives ACK for AVL frame slightly later. Acceptable.
## Done
(Fill in once complete.)
@@ -0,0 +1,141 @@
# Task 2.4 — Command consumer (Redis stream reader)
**Phase:** 2 — Outbound commands
**Status:** ⬜ Not started
**Depends on:** 2.1, 2.3
**Wiki refs:** `docs/wiki/concepts/phase-2-commands.md` § 9.6
## Goal
Each Ingestion instance runs a worker that consumes commands from `commands:outbound:{instance_id}`, looks up the local socket for the target IMEI, and dispatches the command to the appropriate codec encoder + write queue.
## Deliverables
- `src/adapters/teltonika/command-consumer.ts`:
  - `CommandConsumer` class with `start()` and `stop()` methods.
  - Internal: a registry of `imei → SocketWriteQueue` for sessions held by this instance.
  - Methods exposed to the session lifecycle: `attach(imei, queue)`, `detach(imei)`.
  - Reads commands via `XREADGROUP GROUP ingest {instance_id} COUNT 16 BLOCK 1000 STREAMS commands:outbound:{instance_id} >`.
  - Calls codec-specific encoder/handler based on the command's `codec` field.
  - On terminal outcome (delivered, responded, failed), publishes to `commands:responses`.
- `src/adapters/teltonika/responses.ts`:
  - `publishResponse({ commandId, status, response?, failureReason? })` writes to `commands:responses` via `XADD`.
## Specification
### Stream consumption
```ts
async start(): Promise<void> {
  // Ensure the consumer group exists. MKSTREAM creates the stream if absent.
  try {
    await this.redis.xgroup('CREATE', this.streamKey, 'ingest', '$', 'MKSTREAM');
  } catch (err: any) {
    if (!err.message?.includes('BUSYGROUP')) throw err;
  }
  while (!this.stopping) {
    const messages = await this.redis.xreadgroup(
      'GROUP', 'ingest', this.instanceId,
      'COUNT', 16, 'BLOCK', 1000,
      'STREAMS', this.streamKey, '>',
    );
    if (!messages) continue;
    for (const [, entries] of messages) {
      for (const [id, fields] of entries) {
        await this.handleCommand(id, fieldsToObject(fields));
      }
    }
  }
}
```
`fieldsToObject` converts Redis's flat `[key, value, key, value, ...]` array to a plain object.
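A minimal sketch of that conversion follows. The function name comes from the snippet above; the body and the odd-length check are assumptions about how it might be written.

```typescript
/**
 * Convert a flat [key, value, key, value, ...] array, as returned for a
 * stream entry, into a plain object. Later duplicates overwrite earlier ones.
 */
function fieldsToObject(fields: readonly string[]): Record<string, string> {
  if (fields.length % 2 !== 0) {
    throw new Error(`expected an even number of elements, got ${fields.length}`);
  }
  const out: Record<string, string> = {};
  for (let i = 0; i < fields.length; i += 2) {
    out[fields[i]] = fields[i + 1];
  }
  return out;
}

const example = fieldsToObject(['command_id', 'abc', 'codec', '12']);
console.log(example.codec); // "12" (still a string; numeric fields need explicit conversion)
```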
### Command field shape
Per the Phase 2 design, Directus's Flow publishes:
```
XADD commands:outbound:{instance_id} *
command_id <uuid>
target_imei <string>
codec 12 | 14
payload <ASCII command text>
expires_at <unix-seconds>
```
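Every stream field arrives as a string, so the consumer needs a conversion step before it can compare `expires_at` or switch on `codec`. The sketch below shows one way to do it; `parseCommandMessage` is a hypothetical helper, and the 15-digit IMEI check is an assumption based on the IMEI examples used elsewhere in these tasks.

```typescript
interface CommandMessage {
  command_id: string;
  target_imei: string;
  codec: 12 | 14;
  payload: string;
  expires_at: number; // unix seconds
}

/** Validate and type the raw string fields of one commands:outbound entry. */
function parseCommandMessage(raw: Record<string, string | undefined>): CommandMessage {
  const commandId = raw.command_id;
  if (!commandId) throw new Error('missing command_id');

  const imei = raw.target_imei;
  // Assumption: IMEIs are exactly 15 decimal digits.
  if (!imei || !/^\d{15}$/.test(imei)) throw new Error(`bad target_imei: ${imei}`);

  const codec = raw.codec === '12' ? 12 : raw.codec === '14' ? 14 : null;
  if (codec === null) throw new Error(`unsupported codec: ${raw.codec}`);

  const expiresAt = Number(raw.expires_at);
  if (!raw.expires_at || !Number.isFinite(expiresAt)) {
    throw new Error(`bad expires_at: ${raw.expires_at}`);
  }

  return {
    command_id: commandId,
    target_imei: imei,
    codec,
    payload: raw.payload ?? '',
    expires_at: expiresAt,
  };
}
```

A message that fails validation would presumably be published as `failed` and acknowledged rather than retried, since redelivery cannot repair a malformed entry; that policy is not specified here.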
### Dispatch
```ts
async handleCommand(streamId: string, cmd: CommandMessage): Promise<void> {
  const queue = this.sessions.get(cmd.target_imei);
  if (!queue) {
    await this.publishResponse({ commandId: cmd.command_id, status: 'failed', failureReason: 'socket_closed' });
    await this.redis.xack(this.streamKey, 'ingest', streamId);
    return;
  }
  // Stream fields arrive as strings; convert before comparing.
  if (Date.now() / 1000 > Number(cmd.expires_at)) {
    await this.publishResponse({ commandId: cmd.command_id, status: 'failed', failureReason: 'expired_before_delivery' });
    await this.redis.xack(this.streamKey, 'ingest', streamId);
    return;
  }
  try {
    // Codec 14 embeds the target IMEI in the frame; Codec 12 ignores it.
    const frame = encodeCommand(cmd.codec, cmd.target_imei, cmd.payload);
    const responseBuf = await queue.writeCommand(cmd.command_id, frame, /* timeoutMs */ 30_000);
    const parsed = parseCommandResponse(cmd.codec, responseBuf);
    await this.publishResponse({
      commandId: cmd.command_id,
      status: parsed.kind === 'ack' ? 'responded' : 'failed',
      response: parsed.kind === 'ack' ? parsed.text : undefined,
      // nack = IMEI mismatch (Codec 14 only); unexpected = malformed or wrong-direction frame.
      failureReason:
        parsed.kind === 'nack' ? 'imei_mismatch'
        : parsed.kind === 'unexpected' ? 'protocol_error'
        : undefined,
    });
  } catch (err) {
    await this.publishResponse({ commandId: cmd.command_id, status: 'failed', failureReason: errToReason(err) });
  } finally {
    await this.redis.xack(this.streamKey, 'ingest', streamId);
  }
}
```
`encodeCommand` and `parseCommandResponse` come from tasks 2.5 (Codec 12) and 2.6 (Codec 14).
### `commands:responses` shape
```
XADD commands:responses *
command_id <uuid>
status delivered | responded | failed
response <ASCII response text, optional>
failure_reason socket_closed | expired_before_delivery | imei_mismatch | timeout | write_queue_full | ...
responded_at <ms>
```
### Lifecycle hooks
In the Teltonika session:
- After successful handshake: `commandConsumer.attach(imei, writeQueue)`.
- On socket close: `commandConsumer.detach(imei)`.
- The consumer must reject any in-flight command for a detached IMEI with `socket_closed`.
### Concurrency
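The consumer reads up to 16 messages per `XREADGROUP` call and, as written above, handles them one at a time: the loop awaits each `handleCommand`, which does not return until the device responds or the command times out. Within a single IMEI this matches the `SocketWriteQueue` serialization. It also means a slow device delays commands for other IMEIs in the same batch. If commands for different IMEIs must complete in parallel, `handleCommand` has to be started without being awaited in the read loop, leaving per-IMEI ordering to each `SocketWriteQueue`.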
## Acceptance criteria
- [ ] Publishing a command via `XADD commands:outbound:{instance_id}` causes the consumer to call `writeCommand` on the right session.
- [ ] If the IMEI is not held by this instance, the consumer publishes `failed` with `socket_closed` to `commands:responses` and ACKs the stream entry.
- [ ] If `expires_at` has passed, the consumer publishes `failed` with `expired_before_delivery` and ACKs.
- [ ] On `stop()`, the consumer drains the in-flight message and exits the read loop cleanly.
- [ ] `XACK` happens only after the response is published (or terminal failure recorded), so a crash mid-handler causes the command to be redelivered.
## Risks / open questions
- Crash mid-handler: the command was sent on the wire but we crashed before `XACK`. After restart, the consumer will redeliver; the new instance won't have the device, so it publishes `socket_closed`. The result: command was delivered to the device but Directus thinks it failed. Operator re-issues. Acceptable v1; flagged in [[phase-2-commands]] as a sweeper concern. Idempotent device commands mitigate.
- Duplicate delivery via `XPENDING`: not handling Pending Entries List explicitly in v1. If a consumer crashes, its claims time out and another consumer in the group can claim — but we're using `instance_id` as the consumer name, so cross-instance claiming would deliver commands to the wrong device. **Decision:** each instance is the only consumer in its own consumer group (group name = `ingest`, consumer name = `instance_id`, but stream is per-instance so no cross-claiming risk). Verify this matches the Directus-side publishing logic.
## Done
(Fill in once complete.)
@@ -0,0 +1,117 @@
# Task 2.5 — Codec 12 encoder + handler
**Phase:** 2 — Outbound commands
**Status:** ⬜ Not started
**Depends on:** 2.3, 2.4
**Wiki refs:** `docs/wiki/sources/teltonika-data-sending-protocols.md` § Codec 12, `docs/wiki/concepts/phase-2-commands.md`
## Goal
Encode Codec 12 (`0x0C`) command frames for outbound delivery; parse Codec 12 response frames coming back from devices.
## Deliverables
- `src/adapters/teltonika/codec/command/codec12.ts`:
  - `encodeCodec12Command(payload: string): Buffer` produces the on-the-wire byte sequence.
  - `parseCodec12Response(buf: Buffer): { kind: 'ack'; text: string } | { kind: 'unexpected'; reason: string }` parses an inbound response frame.
  - A `codec12CommandHandler: CodecDataHandler` that the **inbound** framing layer (task 1.4) registers for codec ID `0x0C`. This handler does not produce `Position` records; it routes the response payload to the per-socket write queue's `notifyResponse`.
- Test file `test/codec12.test.ts` with at least:
  - The two canonical doc examples (`getinfo` request + response, `getio` request + response).
  - One synthetic command with non-ASCII bytes in the payload to verify HEX encoding.
## Specification
### Frame structure (server → device)
```
[Preamble 4B = 0x00000000]
[DataSize 4B BE] ← from CodecID through CmdQty2 inclusive
[CodecID 1B = 0x0C]
[CmdQty1 1B = 0x01]
[Type 1B = 0x05] ← 0x05 = command from server
[CmdSize 4B BE] ← length of command payload bytes
[Command X B] ← ASCII command, encoded as raw bytes (NOT hex-encoded)
[CmdQty2 1B = 0x01]
[CRC 4B BE] ← CRC-16/IBM, lower 2 bytes; computed over CodecID..CmdQty2
```
Encoder pseudocode:
```ts
export function encodeCodec12Command(payload: string): Buffer {
  const cmd = Buffer.from(payload, 'ascii');
  const cmdSize = cmd.length;
  const dataSize = 1 + 1 + 1 + 4 + cmdSize + 1; // CodecID + CmdQty1 + Type + CmdSize + Command + CmdQty2
  const out = Buffer.alloc(4 + 4 + dataSize + 4); // Preamble + DataSize + body + CRC
  let off = 0;
  out.writeUInt32BE(0, off); off += 4;
  out.writeUInt32BE(dataSize, off); off += 4;
  out.writeUInt8(0x0C, off); off += 1;
  out.writeUInt8(0x01, off); off += 1;
  out.writeUInt8(0x05, off); off += 1;
  out.writeUInt32BE(cmdSize, off); off += 4;
  cmd.copy(out, off); off += cmdSize;
  out.writeUInt8(0x01, off); off += 1;
  const body = out.subarray(8, 8 + dataSize); // CodecID through CmdQty2
  const crc = crc16Ibm(body);
  out.writeUInt32BE(crc, off);
  return out;
}
```
Verify against the canonical doc's `getinfo` example: input `getinfo` → output `000000000000000F0C010500000007676574696E666F0100004312`.
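The encoder relies on `crc16Ibm`, which is not spelled out in this task. A minimal sketch follows, assuming the Teltonika CRC-16/IBM is the common reflected variant (polynomial `0xA001`, initial value `0x0000`, no final XOR, also known as CRC-16/ARC). The snippet takes a `Uint8Array` so it stays self-contained; a Node `Buffer` is a `Uint8Array` and can be passed directly. `hexToBytes` is a hypothetical helper used only for the demonstration.

```typescript
/** CRC-16/IBM (reflected polynomial 0xA001, init 0x0000), computed bit by bit. */
function crc16Ibm(data: Uint8Array): number {
  let crc = 0x0000;
  for (let i = 0; i < data.length; i++) {
    crc ^= data[i];
    for (let bit = 0; bit < 8; bit++) {
      // Shift right; apply the polynomial whenever a 1 bit falls off the end.
      crc = (crc & 1) !== 0 ? (crc >>> 1) ^ 0xa001 : crc >>> 1;
    }
  }
  return crc & 0xffff;
}

/** Hypothetical demo helper: decode an even-length hex string into bytes. */
function hexToBytes(hex: string): Uint8Array {
  const out = new Uint8Array(hex.length / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return out;
}

// CodecID through CmdQty2 of the getinfo frame quoted above.
const getinfoBody = hexToBytes('0C010500000007676574696E666F01');
console.log(crc16Ibm(getinfoBody).toString(16)); // "4312"
```

The result matches the trailing `00004312` of the canonical frame, which supports the assumption about the CRC variant. A table-driven version would be faster, but command frames are small and rare enough that the bitwise form is likely adequate.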
### Response structure (device → server)
Identical frame shape, but `Type = 0x06`:
```
[Preamble 4B][DataSize 4B][CodecID 0x0C][RspQty1 1B][Type 0x06][RspSize 4B][Response X B][RspQty2 1B][CRC 4B]
```
The response field is ASCII text, e.g. `INI:2019/7/22 7:22 RTC:...`.
Parser:
```ts
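export function parseCodec12Response(body: Buffer): { kind: 'ack'; text: string } | { kind: 'unexpected'; reason: string } {
  // body is post-framing-layer: starts at CodecID
  // Header is CodecID(1) + RspQty1(1) + Type(1) + RspSize(4) = 7 bytes.
  if (body.length < 7) return { kind: 'unexpected', reason: `short frame: ${body.length} bytes` };
  const codecId = body[0];
  if (codecId !== 0x0C) return { kind: 'unexpected', reason: `wrong codec 0x${codecId.toString(16)}` };
  const type = body[2]; // body[1] is RspQty1
  if (type !== 0x06) return { kind: 'unexpected', reason: `expected response type 0x06, got 0x${type.toString(16)}` };
  const rspSize = body.readUInt32BE(3);
  // Guard against a declared size that runs past the end of the buffer.
  if (7 + rspSize > body.length) return { kind: 'unexpected', reason: `RspSize ${rspSize} exceeds frame` };
  const text = body.subarray(7, 7 + rspSize).toString('ascii');
  // body[7 + rspSize] is RspQty2; CRC was already validated upstream.
  return { kind: 'ack', text };
}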
```
### Routing inbound responses to the right command
The inbound framing layer (task 1.4) sees a frame with codec `0x0C` and dispatches to `codec12CommandHandler`. That handler retrieves the session's `SocketWriteQueue` (from the session context) and calls `queue.notifyResponse(rawBody)`. The write queue's `awaitResponse` promise resolves with the body; the command consumer (task 2.4) then calls `parseCodec12Response` to extract the text.
This is the seam where Phase 2 plugs into Phase 1's framing layer. Phase 1 already supports it because:
1. The codec dispatch is a registry — Phase 2 just registers a new handler.
2. Phase 1's handler interface returns `{ recordCount: number }` for ACK count. For Codec 12, **the device does not expect a record-count ACK** — responses are inherently their own ACK. The handler returns `{ recordCount: 0 }` and the framing layer's ACK send path skips the write when `recordCount` is 0. **Update task 1.4 to honor this** if not already.
> **Open question:** is `recordCount: 0` the right signal to skip ACK? Or should the handler interface return `{ ack: Buffer | null }` instead? The latter is cleaner. **Recommendation:** add an explicit `ack` return slot to `CodecDataHandler` in this task and update the data codec handlers to return `{ ack: makeRecordCountAck(n) }`. Phase 2's command handlers return `{ ack: null }`.
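
The recommended `ack` slot could look roughly like this. It is a sketch only: the real `CodecDataHandler` signature from Phase 1 is not shown in this task, so the single `body` parameter is an assumption, and `Uint8Array` is used in place of `Buffer` to keep the snippet self-contained. The 4-byte big-endian record count matches the `[00 00 00 01]` ACK shown in task 2.3.

```typescript
interface CodecHandlerResult {
  // Bytes to write back to the device, or null when nothing should be sent.
  ack: Uint8Array | null;
}

// Assumed shape; the actual Phase 1 handler may take more context.
type CodecDataHandler = (body: Uint8Array) => CodecHandlerResult;

/** Data codecs acknowledge with the accepted record count as a 4-byte big-endian integer. */
function makeRecordCountAck(recordCount: number): Uint8Array {
  const ack = new Uint8Array(4);
  new DataView(ack.buffer).setUint32(0, recordCount); // DataView writes big-endian by default
  return ack;
}

// A data codec handler acknowledges the records it accepted.
const exampleDataHandler: CodecDataHandler = (_body) => ({ ack: makeRecordCountAck(1) });

// A command codec handler sends nothing back: the response is its own ACK.
const exampleCommandHandler: CodecDataHandler = (_body) => ({ ack: null });
```

With this shape the framing layer only needs one rule (write `ack` if it is not null) instead of interpreting `recordCount: 0` as a special case.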
## Acceptance criteria
- [ ] `encodeCodec12Command('getinfo')` produces the canonical doc bytes exactly (compare hex strings).
- [ ] `parseCodec12Response` correctly decodes the doc's `getinfo` response into the `INI:2019/7/22...` ASCII string.
- [ ] An end-to-end test: simulate a device that responds to a Codec 12 command, verify the round-trip command_id → encoded frame → device response → parsed text → published to `commands:responses`.
- [ ] CRC of every encoded frame validates against `crc16Ibm`.
- [ ] An incoming Codec 12 frame with `Type != 0x06` is logged at warn (unexpected protocol direction) and not surfaced to the command consumer.
## Risks / open questions
- The interface change (returning `{ ack }` instead of `{ recordCount }`) is a Phase 1 retrofit. Cost: minor — three Phase 1 codec handlers update their return shape. Benefit: cleaner Phase 2 plug-in.
- The `getinfo` canonical CRC in the doc is `0x00004312`. Verify the encoder matches before declaring done.
## Done
(Fill in once complete.)
@@ -0,0 +1,118 @@
# Task 2.6 — Codec 14 encoder + ACK/nACK handler
**Phase:** 2 — Outbound commands
**Status:** ⬜ Not started
**Depends on:** 2.5 (shares utility code), 2.3, 2.4
**Wiki refs:** `docs/wiki/sources/teltonika-data-sending-protocols.md` § Codec 14, `docs/wiki/concepts/phase-2-commands.md`
## Goal
Encode Codec 14 (`0x0E`) command frames with embedded IMEI; parse responses with both ACK (`0x06`) and **nACK (`0x11`)** types.
## Deliverables
- `src/adapters/teltonika/codec/command/codec14.ts`:
  - `encodeCodec14Command(imei: string, payload: string): Buffer`.
  - `parseCodec14Response(buf: Buffer): { kind: 'ack'; imei: string; text: string } | { kind: 'nack'; imei: string } | { kind: 'unexpected'; reason: string }`.
  - `codec14CommandHandler: CodecDataHandler` registered for codec ID `0x0E`.
- Test file `test/codec14.test.ts` covering: doc canonical example (`getver` round trip with both ACK and nACK responses).
## Specification
### Frame structure (server → device)
```
[Preamble 4B]
[DataSize 4B] ← from CodecID through CmdQty2
[CodecID 1B = 0x0E]
[CmdQty1 1B = 0x01]
[Type 1B = 0x05]
[CmdSize 4B] ← command bytes + 8 (IMEI size)
[IMEI 8B HEX] ← e.g. IMEI 123456789123456 → 0x0123456789123456
[Command X B] ← ASCII command bytes
[CmdQty2 1B = 0x01]
[CRC 4B]
```
**IMEI encoding rule:** the device IMEI is encoded as 8 bytes in HEX. For a 15-digit IMEI like `352093081452251`, prepend a leading zero (`0352093081452251`) and parse as a 16-hex-char value → 8 bytes: `0x03 52 09 30 81 45 22 51`. **Not** ASCII like the handshake.
```ts
function imeiToHex(imei: string): Buffer {
  // 15 digits → prepend "0" → 16 hex chars → 8 bytes
  const padded = imei.padStart(16, '0');
  if (!/^\d{16}$/.test(padded)) throw new Error(`bad IMEI: ${imei}`);
  return Buffer.from(padded, 'hex');
}
```
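Putting the frame layout and the IMEI rule together, a Codec 14 encoder might be sketched as follows. This is not the task's implementation: it uses `Uint8Array` and `DataView` instead of `Buffer` so it runs without Node typings, `imeiToBytes` and `toHex` are hypothetical helpers, and the CRC is assumed to be the reflected `0xA001` variant with initial value 0.

```typescript
/** CRC-16/IBM (reflected polynomial 0xA001, init 0x0000); assumed variant. */
function crc16Ibm(data: Uint8Array): number {
  let crc = 0;
  for (let i = 0; i < data.length; i++) {
    crc ^= data[i];
    for (let bit = 0; bit < 8; bit++) {
      crc = (crc & 1) !== 0 ? (crc >>> 1) ^ 0xa001 : crc >>> 1;
    }
  }
  return crc & 0xffff;
}

/** 15 digits, left-padded to 16, read as 8 bytes of hex. */
function imeiToBytes(imei: string): Uint8Array {
  const padded = imei.padStart(16, '0');
  if (!/^\d{16}$/.test(padded)) throw new Error(`bad IMEI: ${imei}`);
  const out = new Uint8Array(8);
  for (let i = 0; i < 8; i++) out[i] = parseInt(padded.slice(i * 2, i * 2 + 2), 16);
  return out;
}

function encodeCodec14Command(imei: string, payload: string): Uint8Array {
  const imeiBytes = imeiToBytes(imei);
  const cmdSize = imeiBytes.length + payload.length; // the IMEI counts toward CmdSize
  const dataSize = 1 + 1 + 1 + 4 + cmdSize + 1;      // CodecID .. CmdQty2
  const out = new Uint8Array(4 + 4 + dataSize + 4);  // preamble bytes stay zero
  const view = new DataView(out.buffer);
  let off = 4;
  view.setUint32(off, dataSize); off += 4;           // big-endian by default
  out[off++] = 0x0e;                                 // CodecID
  out[off++] = 0x01;                                 // CmdQty1
  out[off++] = 0x05;                                 // Type: command from server
  view.setUint32(off, cmdSize); off += 4;
  out.set(imeiBytes, off); off += imeiBytes.length;
  for (let i = 0; i < payload.length; i++) out[off++] = payload.charCodeAt(i) & 0x7f; // ASCII only
  out[off++] = 0x01;                                 // CmdQty2
  view.setUint32(off, crc16Ibm(out.subarray(8, 8 + dataSize)));
  return out;
}

const toHex = (bytes: Uint8Array): string =>
  Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('').toUpperCase();

console.log(toHex(encodeCodec14Command('352093081452251', 'getver')));
// 00000000000000160E01050000000E0352093081452251676574766572010000D2C1
```

The printed frame is the canonical `getver` example from the acceptance criteria, so the same string can serve as the first test vector for the real encoder.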
### Response structure (device → server)
Two cases:
**ACK** (`Type = 0x06`): IMEI matched; command executed.
```
[Preamble][DataSize][CodecID 0x0E][RspQty1][Type 0x06][RspSize][IMEI 8B][Response X B][RspQty2][CRC]
```
**nACK** (`Type = 0x11`): IMEI did not match; command not executed.
```
[Preamble][DataSize][CodecID 0x0E][RspQty1][Type 0x11][RspSize=0x08][IMEI 8B][RspQty2][CRC]
```
Note: nACK has `RspSize = 8` (the IMEI itself counts), no Response bytes.
### Parser
```ts
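export function parseCodec14Response(body: Buffer):
  | { kind: 'ack'; imei: string; text: string }
  | { kind: 'nack'; imei: string }
  | { kind: 'unexpected'; reason: string }
{
  // Header (7 bytes) + IMEI (8 bytes) must be present before any field is read.
  if (body.length < 15) return { kind: 'unexpected', reason: `short frame: ${body.length} bytes` };
  const codecId = body[0];
  if (codecId !== 0x0E) return { kind: 'unexpected', reason: `wrong codec 0x${codecId.toString(16)}` };
  const type = body[2];
  const rspSize = body.readUInt32BE(3);
  // RspSize includes the 8-byte IMEI and must fit inside the buffer.
  if (rspSize < 8 || 7 + rspSize > body.length) return { kind: 'unexpected', reason: `bad RspSize ${rspSize}` };
  const imeiHex = body.subarray(7, 15).toString('hex');
  const imei = imeiHex.slice(1); // drop only the single padding digit; an IMEI may itself start with 0
  if (type === 0x06) {
    const text = body.subarray(15, 7 + rspSize).toString('ascii');
    return { kind: 'ack', imei, text };
  }
  if (type === 0x11) {
    // A nACK carries the IMEI only; any other size is malformed.
    if (rspSize !== 8) return { kind: 'unexpected', reason: `nACK with RspSize ${rspSize}` };
    return { kind: 'nack', imei };
  }
  return { kind: 'unexpected', reason: `unknown response type 0x${type.toString(16)}` };
}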
```
### Mapping to `commands:responses`
The command consumer (task 2.4) handles all three outcomes:
- `ack` → `status = 'responded'`, `response = text`.
- `nack` → `status = 'failed'`, `failure_reason = 'imei_mismatch'`. The command was *delivered* but rejected, an important nuance for operator dashboards.
- `unexpected` → `status = 'failed'`, `failure_reason = 'protocol_error'`.
### Firmware version requirement
Codec 14 requires FMB.Ver.03.25.04.Rev.00 or newer. Older firmware will not understand the codec ID and may close the connection. The Phase 2 design relies on Directus knowing which devices support which codecs (potentially a `firmware_version` column on a `devices` collection). The Ingestion service does not enforce this; it just sends what it's told.
> **Open question:** should we expose a metric `teltonika_codec14_unexpected_total` to detect cases where Codec 14 was sent but the device closed the connection (suggesting outdated firmware)? Probably yes; add to task 2.6 deliverables and the metrics inventory.
## Acceptance criteria
- [ ] `encodeCodec14Command('352093081452251', 'getver')` produces the canonical doc bytes exactly: `00000000000000160E01050000000E0352093081452251676574766572010000D2C1`.
- [ ] `parseCodec14Response` correctly decodes the doc's ACK response (IMEI + version string).
- [ ] `parseCodec14Response` correctly decodes the doc's nACK response (IMEI mismatch case).
- [ ] An end-to-end test simulates a device that ACKs Codec 14 and a device that nACKs; verify both terminal statuses land in `commands:responses` correctly.
- [ ] IMEI HEX encoding round-trips through `imeiToHex` and the response parser.
## Risks / open questions
- nACK with `RspSize` not equal to 8 is malformed but we should fail safe (treat as `unexpected`) rather than read past buffer bounds.
- Should the Ingestion service also log the IMEI from the nACK response (which is the *server's claim*) and compare to the *actual* IMEI of the connection (from handshake)? If they differ, something seriously wrong is happening. **Yes — log at error if they differ.** Add to acceptance criteria.
## Done
(Fill in once complete.)
@@ -0,0 +1,65 @@
# Phase 2 — Outbound commands
Add server-to-device command delivery using Teltonika codecs 12 (`0x0C`) and 14 (`0x0E`). Codec 13 is one-way device→server (not in scope for outbound); codec 15 is FMX6-only (out of scope entirely).
## Prerequisite
Phase 1 must be complete and stable in production. Phase 2 adds code *alongside* Phase 1, never in the inbound parsing path.
## Outcome statement
When Phase 2 is done:
- Each Ingestion instance maintains its IMEI→instance mapping in `connections:registry` (Redis hash) and a heartbeat key.
- A Directus Flow on `commands` table inserts can publish a command to `commands:outbound:{instance_id}` after looking up the routing.
- Each Ingestion instance runs a command consumer in parallel with the TCP listener; consumed commands are dispatched to the right per-socket write queue, encoded as Codec 12 or 14, and written to the device.
- Device responses (Codec 12 Type `0x06` or Codec 14 Type `0x06`/`0x11`) are correlated to the in-flight command and published to `commands:responses` for Directus to update the row.
- The TCP read path is never blocked by outbound work.
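- Phase 1 parsing logic is unchanged. Phase 2 touches Phase 1 code only through additive hooks: registry calls in the session lifecycle, routing ACK writes through the write queue (task 2.3), and the handler return shape discussed in task 2.5.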
## Architectural anchors
`docs/wiki/concepts/phase-2-commands.md` is the design source of truth. Read it before starting any Phase 2 task.
Key invariants:
1. **Ingestion exposes no user-facing HTTP** — never. All command authorization happens in Directus.
2. **Commands are data before transport.** Every command has a row in Directus's `commands` table before it ever reaches Redis.
3. **One outstanding command per device socket.** Teltonika command codecs have no correlation ID; the protocol assumes serialization. Subsequent commands queue on the per-socket write queue.
4. **Per-instance routing.** Only the Ingestion instance currently holding a device's socket can deliver commands to it. The connection registry exists so Directus knows which instance to publish to.
## Sequencing
```
2.1 Connection registry & heartbeat ─┐
2.2 Registry janitor ├─→ 2.4 Command consumer ─┐
2.3 Per-socket write queue ──────────┘ ├─→ 2.5 Codec 12 handler
└─→ 2.6 Codec 14 handler
```
Tasks 2.1 and 2.3 are independent infrastructure pieces and can be done in parallel; 2.2 depends on 2.1. 2.5 and 2.6 can be parallelized once 2.4 lands.
## Files added
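Phase 2 introduces these new files. Existing Phase 1 files change only additively: `src/main.ts` wires in the command consumer, registry, and janitor, and `src/adapters/teltonika/index.ts` gains the session hooks described in tasks 2.1, 2.3, and 2.4.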
```
src/
├── adapters/teltonika/
│   ├── codec/command/
│   │   ├── codec12.ts          ← NEW (encoder + response parser)
│   │   └── codec14.ts          ← NEW (encoder + ACK/nACK parser)
│   ├── command-consumer.ts     ← NEW (stream reader, dispatch)
│   └── responses.ts            ← NEW (publishes to commands:responses, per task 2.4)
├── core/
│   ├── connection-registry.ts  ← NEW
│   ├── write-queue.ts          ← NEW
│   └── janitor.ts              ← NEW (separate small process or in-process worker)
└── main.ts                     ← updated to start consumer + registry
```
`src/adapters/teltonika/codec/command/` already exists from Phase 1 (empty placeholder); Phase 2 fills it.
## Out of scope for this phase
- The Directus side of the system (`commands` table, Flows, sweeper) is owned by the Directus repo, not this one. Phase 2 in this repo only handles the Ingestion-side consumer and writer behavior.
- The pending-command sweeper runs in Directus, not Ingestion. Ingestion publishes terminal status (`delivered`, `responded`, or `failed` reasons) and Directus updates the row.