docs/wiki/concepts/phase-2-commands.md
julian 22b1b069df Bootstrap LLM-maintained wiki with TRM architecture knowledge
Initialize CLAUDE.md schema, index, and log; ingest three architecture
sources (system overview, Teltonika ingestion design, official Teltonika
data-sending protocols) into 7 entity pages, 8 concept pages, and 3
source pages with wikilink cross-references.
2026-04-30 13:20:17 +02:00

title: Phase 2 — Outbound Commands
type: concept
created: 2026-04-30
updated: 2026-04-30
sources: teltonika-ingestion-architecture, teltonika-data-sending-protocols
tags: teltonika, phase-2, commands, future

Phase 2 — Outbound Commands

The deferred design for server-to-device commands using teltonika codecs 12 (0x0C) and 14 (0x0E). Codec 13 (0x0D) is one-way device→server and is not part of the outbound design; codec 15 (0x0F) is FMX6-only and out of scope. Specified now so Phase 1 code respects the seams; no Phase 2 code ships until the platform actually needs to issue commands.

Codec selection: 12 vs 14

  • Codec 12: generic command/response. Server sends with Type 0x05; device responds with Type 0x06. No IMEI in the frame; the device is assumed to be the one on the other end of the socket.
  • Codec 14: IMEI-addressed command/response. Server sends with Type 0x05 and an 8-byte IMEI in HEX; device returns Type 0x06 (ACK) if its physical IMEI matches, 0x11 (nACK) if not. Available from FMB.Ver.03.25.04.Rev.00. Useful as a defense-in-depth check that the connection registry is routing to the device we think it is.

A reasonable default is Codec 12 for routine ops (the connection registry already guarantees we're talking to the right device's socket), with Codec 14 reserved for situations where IMEI reconfirmation matters (e.g. infrequent high-impact commands).
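
To make the difference between the two codecs concrete, here is a minimal Python sketch of building a Type 0x05 frame. The byte layout (4-byte zero preamble, 4-byte data length, codec ID, quantity, type, size, payload, quantity, CRC-16/IBM in a 4-byte field) is an assumption based on Teltonika's published protocol description, not something this page specifies, so check it against the official documentation before relying on it. The function names are hypothetical.

```python
import struct
from typing import Optional


def crc16_ibm(data: bytes) -> int:
    """CRC-16/IBM (CRC-16/ARC): reflected polynomial 0xA001, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc


def encode_command(codec: int, command: str, imei: Optional[str] = None) -> bytes:
    """Build a server-to-device command frame (Type 0x05) for Codec 12 or 14."""
    body = command.encode("ascii")
    if codec == 0x0E:
        if imei is None or not imei.isdigit() or len(imei) > 16:
            raise ValueError("Codec 14 needs a numeric IMEI")
        # Assumed layout: IMEI digits as hex nibbles, left-padded to 8 bytes,
        # placed before the command text and counted in the size field.
        body = bytes.fromhex(imei.zfill(16)) + body
    elif codec != 0x0C:
        raise ValueError("only Codec 12 (0x0C) and 14 (0x0E) are server-issued")
    # codec ID, quantity 1, type 0x05, size, payload, quantity 2
    data = struct.pack(">BBBI", codec, 1, 0x05, len(body)) + body + b"\x01"
    # preamble, data length, data, CRC computed over the data field only
    return struct.pack(">II", 0, len(data)) + data + struct.pack(">I", crc16_ibm(data))


frame = encode_command(0x0C, "getinfo")
print(frame.hex().upper())
```

The only structural difference for Codec 14 in this sketch is the 8-byte IMEI prefix, which is what lets the device reject a misrouted command with Type 0x11.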

Why deferred

Command codecs are a distinct feature, not an incremental codec, and require:

  • A way to enqueue commands targeted at specific devices.
  • Routing to whichever Ingestion instance currently holds the device's connection.
  • Permissioned APIs upstream so commands cannot be issued by unauthorized callers.
  • Audit trails for every command issued and every response received.

None of this is needed to read telemetry. Building it speculatively would either ship dead code or, worse, ship half-built infrastructure mistaken for usable.

End-to-end flow

SPA  ──HTTPS+JWT──▶  Directus  ──XADD──▶  Redis Streams  ──XREAD──▶  Ingestion ──▶ device
                       │                                                 │
SPA  ◀──WSS subscription──  Directus  ◀──hook on insert──  commands:responses

Five properties:

  1. Single auth surface: directus enforces "can this user command this device?" Same machinery as every other write.
  2. Commands are data before transport — every command is a row in the commands collection before it hits Redis.
  3. Symmetric to inbound telemetry — same plane boundary, same seam, same operational tools.
  4. Per-instance routing via a connection registry mapping imei → instance_id.
  5. Real-time status updates for free — Directus WebSocket subscriptions on commands push delivery status to the SPA.

Architectural posture

tcp-ingestion does not expose user-facing HTTP endpoints, in Phase 1 or Phase 2. All user-facing API surface is in directus (see plane-separation). Ingestion learns about commands by consuming its own Redis stream; it never accepts inbound user-facing traffic.

The commands collection (Directus)

Key fields: id (uuid, correlation ID), target_imei, batch_id (nullable, for fleet ops), codec (12 or 14; 13/15 are device-originated, not server-issued), payload (ASCII text), status (pending | routed | delivered | responded | failed | nack | expired), requested_by, timestamp fields, response, failure_reason, expires_at (default requested_at + 5 min). The nack status captures Codec 14's IMEI-mismatch case (Type 0x11).
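
As a rough illustration of the row shape, the fields above could be modelled like this in Python. This is a sketch, not the Directus schema: the class names and the validation in `__post_init__` are assumptions, and only `requested_at` is shown from the unspecified "timestamp fields".

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional
from uuid import UUID, uuid4


class Status(str, Enum):
    PENDING = "pending"
    ROUTED = "routed"
    DELIVERED = "delivered"
    RESPONDED = "responded"
    FAILED = "failed"
    NACK = "nack"        # Codec 14 IMEI mismatch (Type 0x11)
    EXPIRED = "expired"


@dataclass
class Command:
    target_imei: str
    codec: int
    payload: str
    requested_by: str
    id: UUID = field(default_factory=uuid4)   # doubles as the correlation ID
    batch_id: Optional[UUID] = None           # set only for fleet operations
    status: Status = Status.PENDING
    requested_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    expires_at: Optional[datetime] = None
    response: Optional[str] = None
    failure_reason: Optional[str] = None

    def __post_init__(self) -> None:
        if self.codec not in (12, 14):
            raise ValueError("only codecs 12 and 14 are server-issued")
        self.payload.encode("ascii")  # raises a ValueError subclass on non-ASCII text
        if self.expires_at is None:
            # Default lifetime from the field list: requested_at + 5 min.
            self.expires_at = self.requested_at + timedelta(minutes=5)
```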

Permissions: writable by operator/admin roles; readable by requester + admin. The SPA inserts via SDK; Ingestion updates delivery status via a service token.

Connection registry

Redis hash connections:registry, keyed by IMEI, valued by Ingestion instance ID:

  • On handshake: HSET connections:registry {imei} {instance_id} + record IMEI in local Set<string>.
  • Every 30s: SET instance:heartbeat:{instance_id} {now} EX 90.
  • On socket close: HDEL connections:registry {imei}.
  • Graceful shutdown: HDEL all held IMEIs.
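
The four lifecycle hooks above might look like the following Python sketch. `InMemoryRedis` is a hypothetical stand-in that mimics the `hset`, `hdel`, `hget`, and `set(..., ex=...)` calls of a redis-py client so the example runs without a server; `ConnectionTracker` and its method names are also assumptions.

```python
import time

REGISTRY = "connections:registry"


class InMemoryRedis:
    """Hypothetical stand-in for the few redis-py style calls used below."""

    def __init__(self):
        self.hashes = {}
        self.strings = {}

    def hset(self, key, field, value):
        self.hashes.setdefault(key, {})[field] = value

    def hdel(self, key, *fields):
        for f in fields:
            self.hashes.get(key, {}).pop(f, None)

    def hget(self, key, field):
        return self.hashes.get(key, {}).get(field)

    def set(self, key, value, ex=None):
        self.strings[key] = (value, ex)  # expiry is recorded, not enforced


class ConnectionTracker:
    def __init__(self, redis, instance_id):
        self.redis = redis
        self.instance_id = instance_id
        self.held = set()  # IMEIs whose sockets this instance currently owns

    def on_handshake(self, imei):
        self.redis.hset(REGISTRY, imei, self.instance_id)
        self.held.add(imei)

    def on_socket_close(self, imei):
        self.redis.hdel(REGISTRY, imei)
        self.held.discard(imei)

    def heartbeat(self):
        # Called every 30s; the 90s expiry tolerates two missed beats.
        self.redis.set(f"instance:heartbeat:{self.instance_id}", int(time.time()), ex=90)

    def shutdown(self):
        if self.held:
            self.redis.hdel(REGISTRY, *self.held)
            self.held.clear()
```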

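Crash recovery via janitor. Entries left behind by a crashed instance cannot simply expire: Redis hashes had no per-field TTL before Redis 7.4 (which added HEXPIRE), and this design does not rely on it. Instead, a registry janitor (Directus Flow or small process) runs every minute: for each instance_id in the registry, EXISTS instance:heartbeat:{instance_id}; if the key is missing, scan the registry for entries pointing to that instance and HDEL them.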
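
The janitor's decision can be sketched as a pure Python function, kept separate from Redis I/O so it is easy to test. The function name and parameters are hypothetical; in practice `registry` would presumably come from reading the whole hash and `heartbeat_exists` would wrap an EXISTS call.

```python
from typing import Callable, Dict, List


def find_orphaned_imeis(
    registry: Dict[str, str],
    heartbeat_exists: Callable[[str], bool],
) -> List[str]:
    """Return IMEIs whose registry entry points at an instance with no heartbeat key."""
    # Check each distinct instance once, not once per device.
    dead = {iid for iid in set(registry.values()) if not heartbeat_exists(iid)}
    return sorted(imei for imei, iid in registry.items() if iid in dead)


registry = {"111": "ing-1", "222": "ing-2", "333": "ing-1"}
alive = {"ing-2"}
print(find_orphaned_imeis(registry, lambda iid: iid in alive))
```

The returned IMEIs are the fields the janitor would then HDEL from connections:registry.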

Issuing commands

Single device — SPA inserts a commands row; Directus Flow on items.create:

  1. Lookup instance_id = HGET connections:registry {target_imei}.
  2. If found: XADD commands:outbound:{instance_id} ...; status → routed.
  3. If not found: status stays pending; sweeper retries.
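
A minimal Python sketch of these three steps, with the Redis calls injected as plain callables so the logic stands alone. The function name and the stream entry fields are assumptions; the page does not define the entry schema.

```python
from typing import Callable, Optional


def route_command(
    command: dict,
    lookup_instance: Callable[[str], Optional[str]],  # wraps HGET connections:registry
    publish: Callable[[str, dict], None],             # wraps XADD
) -> str:
    """Return the status the commands row should have after a routing attempt."""
    instance_id = lookup_instance(command["target_imei"])
    if instance_id is None:
        return "pending"  # device offline; the sweeper retries later
    publish(
        f"commands:outbound:{instance_id}",
        {   # hypothetical entry fields
            "command_id": command["id"],
            "target_imei": command["target_imei"],
            "codec": command["codec"],
            "payload": command["payload"],
            "expires_at": command["expires_at"],
        },
    )
    return "routed"
```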

Fleet — SPA calls custom endpoint POST /commands/batch:

  1. Validate (size, authorization).
  2. Generate batch_id.
  3. Transactional insert of N rows sharing batch_id.
  4. Per row: registry lookup + stream publish.
  5. Return { batch_id, command_ids }.

The custom endpoint exists for fleet operations because transactional insert + routing fan-out is cleaner in code than in a Flow.

Pending-command sweeper

Flow runs every 30s:

  • pending rows where expires_at > now() → retry registry lookup; if device now online, publish + transition to routed.
  • pending or routed rows where expires_at <= now() → expired with failure_reason.
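
The two sweeper rules can be sketched in Python as one pass over the candidate rows. The function name, the dict-based row shape, and the `expired_while_...` reason strings are placeholders; the page does not name the actual failure_reason values.

```python
from datetime import datetime, timedelta
from typing import Callable, List


def sweep(rows: List[dict], now: datetime, try_route: Callable[[dict], bool]) -> List[dict]:
    """Apply the expiry and retry rules in place and return the rows."""
    for row in rows:
        status = row["status"]
        if status not in ("pending", "routed"):
            continue  # terminal or already-delivered rows are left alone
        if row["expires_at"] <= now:
            row["status"] = "expired"
            row["failure_reason"] = f"expired_while_{status}"  # placeholder wording
        elif status == "pending" and try_route(row):
            # try_route performs the registry lookup and stream publish.
            row["status"] = "routed"
    return rows


now = datetime(2026, 4, 30, 12, 0, 0)
rows = [
    {"target_imei": "online", "status": "pending", "expires_at": now + timedelta(minutes=4)},
    {"target_imei": "offline", "status": "pending", "expires_at": now + timedelta(minutes=4)},
    {"target_imei": "online", "status": "routed", "expires_at": now - timedelta(seconds=1)},
]
sweep(rows, now, lambda row: row["target_imei"] == "online")
print([r["status"] for r in rows])  # ['routed', 'pending', 'expired']
```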

Also handles the case where Ingestion crashes after publish but before delivery — those rows sit in routed past expires_at and get expired. Operator re-issues. (A subtler retry option exists — re-route stale routed rows when the original instance has died — but is an enhancement, not v1.)

Ingestion-side command consumer

Each Ingestion instance runs a parallel consumer reading commands:outbound:{instance_id} (XREADGROUP, COUNT 16, BLOCK 1000):

  1. Lookup local imei → socket map. If gone: publish failure (socket_closed).
  2. Check expires_at. If past: publish failure (expired_before_delivery).
  3. Encode codec 12/14 frame from payload.
  4. Write bytes to the socket (via per-socket write queue to avoid interleaving with codec ACKs).
  5. Register pending-response entry keyed by command_id with timeout (default 30s).

The consumer never blocks the TCP read path.
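
The per-entry handling could be sketched in Python as below. Everything except the two failure reasons is an assumption: the function name, the entry fields, the use of epoch seconds, and a plain list standing in for the per-socket write queue. The encoder is injected so the sketch does not depend on any particular frame implementation.

```python
from typing import Callable, Dict, List


def handle_outbound(
    entry: dict,
    write_queues: Dict[str, List[bytes]],   # local imei -> per-socket write queue
    now: float,
    encode: Callable[[int, str, str], bytes],
    publish_failure: Callable[[str, str], None],
    pending: Dict[str, float],              # command_id -> response deadline
    response_timeout: float = 30.0,
) -> bool:
    """Process one stream entry; return True if bytes were queued for the device."""
    imei = entry["target_imei"]
    queue = write_queues.get(imei)
    if queue is None:
        publish_failure(entry["command_id"], "socket_closed")
        return False
    if entry["expires_at"] <= now:
        publish_failure(entry["command_id"], "expired_before_delivery")
        return False
    frame = encode(entry["codec"], entry["payload"], imei)
    # Queueing (not writing directly) keeps command bytes from interleaving with ACKs.
    queue.append(frame)
    pending[entry["command_id"]] = now + response_timeout
    return True
```

Because the function only appends to a queue and records a deadline, it does no socket I/O itself, which is one way to keep the consumer off the TCP read path.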

Response correlation

Teltonika's command codecs carry no correlation ID — the protocol assumes one outstanding command per connection. The Ingestion service enforces this; subsequent commands queue on the per-socket write queue.

When the device responds (Type 0x06 on Codec 12 or 14), the codec dispatch routes to a response handler that publishes to commands:responses; a Directus hook (or small consumer) updates the row to status = responded. A Codec 14 nACK (Type 0x11) is recorded as status = nack instead. Timeout fires → status = failed with reason = 'no_device_response'; write queue is freed.
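
Since the protocol has no correlation ID, correlation reduces to "the response belongs to whatever is in flight on this socket". A minimal Python sketch of that rule follows; the class and method names are hypothetical, and the timer that would call `on_timeout` is left out.

```python
from collections import deque
from typing import Callable, Optional, Tuple

Outcome = Tuple[str, str, Optional[str]]  # (command_id, status, failure_reason)


class SocketCommandQueue:
    """Allows at most one outstanding command per device connection."""

    def __init__(self, write: Callable[[bytes], None]):
        self._write = write
        self._waiting = deque()
        self.in_flight: Optional[str] = None  # command_id awaiting a response

    def submit(self, command_id: str, frame: bytes) -> None:
        self._waiting.append((command_id, frame))
        self._pump()

    def _pump(self) -> None:
        # Send the next frame only when nothing is awaiting a response.
        if self.in_flight is None and self._waiting:
            self.in_flight, frame = self._waiting.popleft()
            self._write(frame)

    def _finish(self, status: str, reason: Optional[str]) -> Optional[Outcome]:
        if self.in_flight is None:
            return None  # unsolicited response or late timer; nothing to correlate
        outcome = (self.in_flight, status, reason)
        self.in_flight = None
        self._pump()  # free the queue for the next command
        return outcome

    def on_response(self, response_type: int) -> Optional[Outcome]:
        statuses = {0x06: "responded", 0x11: "nack"}
        if response_type not in statuses:
            raise ValueError(f"unexpected response type 0x{response_type:02X}")
        return self._finish(statuses[response_type], None)

    def on_timeout(self) -> Optional[Outcome]:
        return self._finish("failed", "no_device_response")
```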

What this requires of Phase 1

Phase 1 must respect these shapes so Phase 2 is purely additive:

  • codec-dispatch is a registry keyed on codec ID byte — Phase 2 registers 0x0C, 0x0D, 0x0E.
  • Session loop owns the socket; handlers borrow it via a respond(bytes) callback (Phase 1 handlers don't use it).
  • Per-device runtime state is local to the socket and the holding instance — no shared registry today.
  • The position-record shape and the inbound stream are unchanged. Outbound uses entirely separate streams (commands:outbound:{instance_id}, commands:responses) and a separate Directus collection.
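
The first two seams might be sketched in Python like this. The names are hypothetical, and registering 0x08 as a Phase 1 telemetry codec is an assumption for illustration only; the page does not list which codec IDs Phase 1 handles.

```python
from typing import Callable, Dict

Respond = Callable[[bytes], None]           # borrowed from the session loop
Handler = Callable[[bytes, Respond], None]


class CodecDispatch:
    """Registry of frame handlers keyed on the codec ID byte."""

    def __init__(self) -> None:
        self._handlers: Dict[int, Handler] = {}

    def register(self, codec_id: int, handler: Handler) -> None:
        if codec_id in self._handlers:
            raise ValueError(f"codec 0x{codec_id:02X} already registered")
        self._handlers[codec_id] = handler

    def dispatch(self, codec_id: int, payload: bytes, respond: Respond) -> None:
        handler = self._handlers.get(codec_id)
        if handler is None:
            raise LookupError(f"no handler for codec 0x{codec_id:02X}")
        handler(payload, respond)


dispatch = CodecDispatch()
seen = []

# Phase 1: a telemetry handler that ignores the respond callback (assumed ID 0x08).
dispatch.register(0x08, lambda payload, respond: seen.append(("telemetry", payload)))

# Phase 2: purely additive registrations; no existing handler changes.
for codec_id in (0x0C, 0x0D, 0x0E):
    dispatch.register(codec_id, lambda payload, respond: seen.append(("command", payload)))

dispatch.dispatch(0x0C, b"\x06", lambda data: None)
print(seen)  # [('command', b'\x06')]
```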

When Phase 2 ships, no Phase 1 code is rewritten — the command consumer runs alongside.