docs/wiki/entities/redis-streams.md
julian 90d036dbf0 Document canonical Redis stream names in wiki
The wiki was silent on the actual stream name used by tcp-ingestion and
processor — anyone reading it to understand the architecture had no way
to find out what stream the services use. This gap contributed to a
stage-side bug where the two services' compiled defaults drifted
(tcp-ingestion: telemetry:teltonika, processor: telemetry:t), causing
~7 hours of silent zero-throughput before symptoms surfaced.

Changes:
- entities/redis-streams.md — added "Stream and key naming" table
  covering the inbound telemetry stream, Phase 2 command streams, and
  registry/heartbeat keys. Documented the telemetry:{vendor} convention
  so a future Queclink/Concox adapter fits predictably.
- entities/processor.md — opening paragraph names the stream and
  consumer group consumed.
- entities/tcp-ingestion.md — opening paragraph names the stream
  produced; defers full naming convention to redis-streams.
- log.md — note entry recording the canonicalization and the stage
  incident that triggered it.
2026-05-01 11:43:59 +02:00


---
title: Redis Streams
type: entity
created: 2026-04-30
updated: 2026-05-01
sources: [gps-tracking-architecture, teltonika-ingestion-architecture]
tags: [infrastructure, telemetry-plane, queue]
---
# Redis Streams
The durable in-flight queue between [[tcp-ingestion]] and [[processor]]. Also the transport for Phase 2 outbound commands.
## What it provides
- **Buffering** — temporary slowness in [[processor]] does not push back on Ingestion sockets.
- **Replayability** — Streams retain messages, so a Processor crash does not lose telemetry; consumer-group offsets resume from the last position.
- **Horizontal scaling** — multiple Processor instances join a consumer group and split load across device IDs.
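The consumer-group mechanics above can be sketched as follows. This is a sketch only: the page does not say what language or client the services use, so a redis-py-shaped client object `r` is assumed, and `processor-1` and `handle` are hypothetical names.

```python
# Sketch: one processor instance consuming the stream via a consumer group.
# `r` is assumed to be a redis-py-style client; `processor-1` is a
# hypothetical consumer name and `handle` a placeholder for real work.
STREAM = "telemetry:teltonika"
GROUP = "processor"

def extract_ids(reply):
    """Entry IDs from an XREADGROUP reply shaped
    [(stream, [(entry_id, fields), ...]), ...]."""
    return [entry_id for _, entries in reply for entry_id, _ in entries]

def consume_batch(r, consumer="processor-1", count=100):
    """Read a batch, process, then XACK each entry. A crash before the
    ack leaves the entries pending, so the group redelivers them."""
    reply = r.xreadgroup(GROUP, consumer, {STREAM: ">"}, count=count, block=5000)
    for stream, entries in reply:
        for entry_id, fields in entries:
            handle(fields)
            r.xack(stream, GROUP, entry_id)

def handle(fields):
    pass  # placeholder: decode and persist the Position record
```

Because acknowledgement happens only after processing, the "Replayability" property above falls out of the protocol rather than application bookkeeping.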
## Why Redis (and not Kafka/NATS)
Redis is sufficient at current scale and adds minimal operational burden. NATS or Kafka are reasonable upgrades when **multi-region durability** or **very high throughput** become real concerns. Until then, Redis is the right choice.
## Stream and key naming
Canonical names used across the platform. Both [[tcp-ingestion]] and [[processor]] reference these via the `REDIS_TELEMETRY_STREAM` environment variable, pinned in the deploy stack so the two services cannot drift from each other.
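One way the deploy stack can pin the variable for both services is a shared YAML anchor; this fragment is a hypothetical compose-style sketch, since the actual deploy configuration is not shown here.

```yaml
# Hypothetical deploy-stack fragment: both services read the same
# pinned value, so their stream names cannot drift apart.
x-telemetry-env: &telemetry-env
  REDIS_TELEMETRY_STREAM: "telemetry:teltonika"

services:
  tcp-ingestion:
    environment: *telemetry-env
  processor:
    environment: *telemetry-env
```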
| Name | Purpose | Producer | Consumer |
|---|---|---|---|
| `telemetry:teltonika` | Inbound Position records from Teltonika devices | [[tcp-ingestion]] (XADD) | [[processor]] (XREADGROUP, group=`processor`) |
| `commands:outbound:{instance_id}` | Outbound device commands routed to a specific [[tcp-ingestion]] instance | [[directus]] Flow | [[tcp-ingestion]] |
| `commands:responses` | Command ACK/NACK and replies | [[tcp-ingestion]] | [[directus]] Flow |
| `connections:registry` (hash) | IMEI → instance routing table | [[tcp-ingestion]] | [[directus]] Flow |
| `instance:heartbeat:{instance_id}` (key, `EX 90`) | Liveness signal per [[tcp-ingestion]] instance | [[tcp-ingestion]] | janitor / [[directus]] Flow |
**Naming convention.** Telemetry streams are namespaced by vendor (`telemetry:{vendor}`), so adding a second adapter (Queclink, Concox, etc.) creates `telemetry:queclink` rather than mixing payload shapes on a shared stream. [[processor]] consumes the union by joining a consumer group on each stream.
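The convention and the union read can be sketched like this. A redis-py-style client is assumed; the helper names are illustrative, not from the services' actual code.

```python
def vendor_stream(vendor: str) -> str:
    """Canonical telemetry stream name for a vendor adapter."""
    return f"telemetry:{vendor}"

def union_read_request(vendors):
    """Streams mapping for one XREADGROUP call spanning every vendor
    stream; '>' asks for entries never delivered to this group."""
    return {vendor_stream(v): ">" for v in vendors}

# With a redis-py client `r` (assumed), and the `processor` group
# created on each stream, the processor could read the union in one call:
#   r.xreadgroup("processor", "processor-1",
#                union_read_request(["teltonika", "queclink"]),
#                count=100, block=5000)
```

A single `XREADGROUP` can span multiple streams as long as the group exists on each one, which is what "joining a consumer group on each" buys.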
## Phase 2 usage
Outbound commands ride on per-instance streams: `commands:outbound:{instance_id}`. Responses ride on `commands:responses`. Redis is the transport; the source of truth for commands is the [[directus]] `commands` collection. See [[phase-2-commands]].
The connection registry (`connections:registry` hash) and per-instance heartbeats (`instance:heartbeat:{instance_id}` keys with `EX 90`) also live in Redis.
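The registry and heartbeat writes are simple enough to sketch directly. Again a redis-py-style client `r` is assumed, and the function names are hypothetical.

```python
REGISTRY_KEY = "connections:registry"
HEARTBEAT_TTL = 90  # seconds, matching the EX 90 above

def heartbeat_key(instance_id: str) -> str:
    """Per-instance liveness key."""
    return f"instance:heartbeat:{instance_id}"

def register_connection(r, imei: str, instance_id: str):
    """Record IMEI -> instance in the routing hash (HSET)."""
    r.hset(REGISTRY_KEY, imei, instance_id)

def beat(r, instance_id: str):
    """Refresh liveness (SET ... EX 90); if the instance dies, the key
    simply expires and the janitor sees the instance as gone."""
    r.set(heartbeat_key(instance_id), "1", ex=HEARTBEAT_TTL)
```

The expiry-based heartbeat means no explicit "deregister" path is needed for crashed instances.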
## Failure mode
Streams are persisted; restart resumes from disk. Complete Redis loss is recoverable from device retransmits and Processor checkpointing. See [[failure-domains]].
## Operational note
**Consumer lag is the canary metric** for the entire telemetry pipeline. Observability dashboards should make it prominent.
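Lag can be scraped from `XINFO GROUPS` on the telemetry stream. A sketch of extracting the numbers from a redis-py-shaped reply (the exact reply keys vary by client and Redis version; `lag` requires Redis 7+):

```python
def group_lag(info_groups, group="processor"):
    """Given an XINFO GROUPS reply (list of dicts, redis-py style),
    return (pending_entries, stream_lag) for the named group, or None
    if the group is absent. `lag` is None on Redis < 7."""
    for g in info_groups:
        if g.get("name") in (group, group.encode()):
            return g.get("pending", 0), g.get("lag")
    return None

# Hypothetical reply shape for illustration:
sample = [{"name": "processor", "pending": 42, "lag": 17}]
```

Alerting on a sustained nonzero `lag` (or growing `pending`) would have surfaced the stage incident in minutes rather than hours.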
## Deployment
Internal-only container. Persistence enabled. Never exposed externally.