The wiki was silent on the actual stream name used by tcp-ingestion and
processor, so anyone reading it to understand the architecture could not
tell which stream the services use. This gap contributed to a stage-side
bug in which the two services' compiled defaults drifted
(tcp-ingestion: telemetry:teltonika, processor: telemetry:t), causing
~7 hours of silent zero-throughput before symptoms surfaced.
Changes:
- entities/redis-streams.md — added "Stream and key naming" table
covering the inbound telemetry stream, Phase 2 command streams, and
registry/heartbeat keys. Documented the telemetry:{vendor} convention
so a future Queclink/Concox adapter fits predictably.
- entities/processor.md — opening paragraph names the stream and
  consumer group it consumes.
- entities/tcp-ingestion.md — opening paragraph names the stream it
  produces; defers the full naming convention to redis-streams.
- log.md — note entry recording the canonicalization and the stage
incident that triggered it.
| title | type | created | updated | sources | tags |
|---|---|---|---|---|---|
| Redis Streams | entity | 2026-04-30 | 2026-05-01 | | |
# Redis Streams
The durable in-flight queue between tcp-ingestion and processor. Also the transport for Phase 2 outbound commands.
## What it provides
- Buffering — temporary slowness in processor does not push back on tcp-ingestion sockets.
- Replayability — Streams retain messages, so a processor crash does not lose telemetry; consumer-group offsets resume from the last acknowledged position (see the sketch after this list).
- Horizontal scaling — multiple processor instances join a consumer group and split load across device IDs.
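A minimal consumer-side sketch with redis-py. The stream and group names match the naming table below; the host, the consumer name `processor-1`, and the `handle` stub are illustrative, not the actual processor implementation.

```python
import redis

r = redis.Redis(host="redis", decode_responses=True)  # hypothetical host

STREAM = "telemetry:teltonika"  # see the naming table below
GROUP = "processor"

# Idempotent group creation; offsets live in the group, so a restarted
# instance resumes from the last acknowledged position.
try:
    r.xgroup_create(STREAM, GROUP, id="0", mkstream=True)
except redis.ResponseError as e:
    if "BUSYGROUP" not in str(e):
        raise

def handle(fields: dict) -> None:
    """Application-specific processing (stub for illustration)."""

while True:
    # ">" asks for entries never delivered to this group; several instances
    # with distinct consumer names split the load between them.
    for _stream, messages in r.xreadgroup(GROUP, "processor-1", {STREAM: ">"},
                                          count=100, block=5000):
        for msg_id, fields in messages:
            handle(fields)
            r.xack(STREAM, GROUP, msg_id)  # ack removes it from the pending list
```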
## Why Redis (and not Kafka/NATS)
Redis is sufficient at current scale and adds minimal operational burden. NATS or Kafka are reasonable upgrades once multi-region durability or very high throughput become real concerns; until then, Redis is the right choice.
## Stream and key naming
Canonical names used across the platform. Both tcp-ingestion and processor reference these via the `REDIS_TELEMETRY_STREAM` environment variable, pinned in the deploy stack so the two services cannot drift from each other (a fail-fast sketch follows the table).
| Name | Purpose | Producer | Consumer |
|---|---|---|---|
| `telemetry:teltonika` | Inbound Position records from Teltonika devices | tcp-ingestion (XADD) | processor (XREADGROUP, group `processor`) |
| `commands:outbound:{instance_id}` | Outbound device commands routed to a specific tcp-ingestion instance | directus Flow | tcp-ingestion |
| `commands:responses` | Command ACK/NACK and replies | tcp-ingestion | directus Flow |
| `connections:registry` (hash) | IMEI → instance routing table | tcp-ingestion | directus Flow |
| `instance:heartbeat:{instance_id}` (key, EX 90) | Liveness signal per tcp-ingestion instance | tcp-ingestion | janitor / directus Flow |
**Naming convention.** Telemetry streams are namespaced by vendor (`telemetry:{vendor}`), so adding a second adapter (Queclink, Concox, etc.) creates `telemetry:queclink` rather than mixing differently shaped payloads on the same stream. processor consumes the union by joining a consumer group on each.
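The stage incident in the header note is the motivation for the env-var pin. A minimal fail-fast sketch, assuming only the `REDIS_TELEMETRY_STREAM` variable from this page (the function name is illustrative): resolve the name at startup with no compiled-in default, so a missing or mistyped value halts the service instead of silently producing and consuming different streams.

```python
import os

def telemetry_stream() -> str:
    # No compiled-in fallback: the stage incident came from two services
    # shipping different defaults, so refuse to start rather than guess.
    name = os.environ.get("REDIS_TELEMETRY_STREAM")
    if not name:
        raise RuntimeError("REDIS_TELEMETRY_STREAM is not set; refusing to start")
    return name
```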
## Phase 2 usage
Outbound commands ride on per-instance streams: `commands:outbound:{instance_id}`. Responses ride on `commands:responses`. Redis is the transport; the source of truth for commands is the directus commands collection. See phase-2-commands.
The connection registry (`connections:registry` hash) and per-instance heartbeats (`instance:heartbeat:{instance_id}` keys with EX 90) also live in Redis.
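For illustration, the registry and heartbeat writes on the tcp-ingestion side might look like the sketch below. The instance id, host, and function names are assumptions; only the key shapes and the 90-second TTL come from this page.

```python
import redis

r = redis.Redis(host="redis", decode_responses=True)  # hypothetical host
INSTANCE_ID = "ingest-1"  # hypothetical instance identifier

def register_connection(imei: str) -> None:
    # Commands for this IMEI get routed to the instance holding its socket.
    r.hset("connections:registry", imei, INSTANCE_ID)

def heartbeat() -> None:
    # Refresh the liveness key well inside its 90 s TTL; if the instance
    # dies, the key expires and the janitor / directus Flow can react.
    r.set(f"instance:heartbeat:{INSTANCE_ID}", "alive", ex=90)
```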
## Failure mode
Streams are persisted; a restart resumes from disk. Complete Redis loss is recoverable from device retransmits and processor checkpointing. See failure-domains.
## Operational note
Consumer lag is the canary metric for the entire telemetry pipeline. Observability dashboards should make it prominent.
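One way to surface it, sketched with redis-py. Note that the `lag` field of XINFO GROUPS requires Redis 7+, and `pending` counts delivered-but-unacknowledged entries; the function name and defaults are illustrative.

```python
import redis

r = redis.Redis(host="redis", decode_responses=True)

def consumer_lag(stream: str = "telemetry:teltonika",
                 group: str = "processor") -> int:
    # Lag = entries not yet delivered to the group (Redis 7+) plus entries
    # delivered but not yet acknowledged.
    for g in r.xinfo_groups(stream):
        if g["name"] == group:
            return (g.get("lag") or 0) + g["pending"]
    raise LookupError(f"consumer group {group!r} not found on {stream!r}")
```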
## Deployment
Internal-only container with persistence enabled; never exposed externally.