---
title: TCP Ingestion
type: entity
created: 2026-04-30
updated: 2026-05-01
sources: [gps-tracking-architecture, teltonika-ingestion-architecture]
tags: [service, telemetry-plane]
---
# TCP Ingestion
The service that maintains persistent TCP connections with GPS devices, parses vendor binary protocols, ACKs frames per protocol, and hands off normalized records to the [[redis-streams]] queue (default stream `telemetry:teltonika` for the Teltonika adapter; see that page for the full naming convention).
## Responsibility
Single concern: **protocol I/O**. Explicitly **not** in scope:
- Applying business rules
- Writing to PostgreSQL
- Performing geospatial computation
- Serving any user-facing API
The narrow scope is what keeps the process fast, predictable, and safely restartable.
## Connection model
- Built around `net.createServer()` (Node.js) — each socket is an independent session.
- Per-connection state is small: identifier (e.g. IMEI), parser instance, partial-frame buffer.
- Devices reconnect automatically on network failure → connection loss is routine → service is trivially restartable.
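A minimal sketch of this shape in TypeScript. It is illustrative only: the `Session` interface and the listen port are assumptions, not the actual project code.
```ts
import * as net from "net";

// Per-connection state (names are illustrative, not the actual project API).
interface Session {
  imei?: string;  // learned from the device's first identification frame
  buffer: Buffer; // partial-frame accumulator
}

const server = net.createServer((socket) => {
  const session: Session = { buffer: Buffer.alloc(0) };

  socket.on("data", (chunk) => {
    // Frames can arrive split or coalesced across TCP segments; accumulate
    // and let the vendor adapter carve complete frames out of the buffer.
    session.buffer = Buffer.concat([session.buffer, chunk]);
  });

  // Connection loss is routine; the device reconnects on its own.
  socket.on("error", () => socket.destroy());
});

server.listen(5027); // listen port is an assumption, not from the source
```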
## Vendor abstraction
Each device vendor (Teltonika, Queclink, Concox, etc.) ships its own binary protocol. Vendor-specific code is isolated behind a [[protocol-adapter]] interface:
- **Input**: byte stream from a TCP socket
- **Output**: normalized [[position-record]] (`device_id`, `timestamp`, `lat`, `lon`, `speed`, `heading`, plus a free-form `attributes` bag)
Adding a new vendor = writing a new adapter. Nothing downstream changes.
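A sketch of what that boundary could look like in TypeScript. The `PositionRecord` fields come from the list above; the interface shape itself is an assumption, not the project's actual code.
```ts
// Normalized output shared by all adapters (fields per [[position-record]]).
interface PositionRecord {
  device_id: string;
  timestamp: Date;
  lat: number;
  lon: number;
  speed: number;
  heading: number;
  attributes: Record<string, unknown>; // free-form vendor-specific bag
}

// One adapter per vendor; shape is illustrative, not the actual interface.
interface ProtocolAdapter {
  // Consume raw socket bytes; return whatever complete records they yield.
  feed(bytes: Buffer): PositionRecord[];
  // Protocol-required acknowledgement for the frame(s) just parsed.
  ack(): Buffer;
}
```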
## Handoff discipline
For every parsed frame:
1. Send protocol-required ACK to the device.
2. Push normalized record to a Redis Stream.
3. Return to reading the socket.
**The TCP handler never blocks on downstream work.** Backpressure is absorbed by the Stream; Ingestion keeps accepting and acknowledging. This is the discipline that keeps the system alive under load.
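A sketch of that per-frame sequence, assuming ioredis as the client. The library choice, the `onFrame` handler name, and the `record` field name are all assumptions; the point is ACK-first and no `await` on the stream write.
```ts
import * as net from "net";
import Redis from "ioredis";

const redis = new Redis();

// Hypothetical per-frame handler reusing the adapter types sketched above.
function onFrame(socket: net.Socket, ack: Buffer, record: object): void {
  // 1. ACK first, per protocol, so the device keeps streaming.
  socket.write(ack);

  // 2. Enqueue without awaiting: the Stream absorbs backpressure, and the
  //    read path must never stall on downstream errors.
  redis
    .xadd("telemetry:teltonika", "*", "record", JSON.stringify(record))
    .catch((err) => console.error("xadd failed", err));

  // 3. Fall through: the socket's 'data' handler keeps reading.
}
```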
## Project layout
Lives at `tcp-ingestion/` — single Node.js/TypeScript project. Layout:
```
tcp-ingestion/
├── src/core/            # vendor-agnostic shell (no adapter imports)
├── src/adapters/        # per-vendor adapters
│   └── teltonika/       # see [[teltonika]]
├── src/config/
├── src/observability/
└── test/fixtures/       # real packet captures per codec
```
Three layout rules: `core/` never imports `adapters/`; adapters never import each other; each adapter folder is self-contained so it can be lifted into its own service later via `git mv`.
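The first two rules lend themselves to machine checking with a dependency linter. A sketch using dependency-cruiser (whether the project enforces this, and with what tool, is an assumption):
```js
// .dependency-cruiser.cjs (illustrative; the repo's actual tooling may differ)
module.exports = {
  forbidden: [
    {
      name: "core-never-imports-adapters",
      severity: "error",
      from: { path: "^src/core" },
      to: { path: "^src/adapters" },
    },
    {
      name: "adapters-never-import-each-other",
      severity: "error",
      from: { path: "^src/adapters/([^/]+)/" },
      // $1 backreferences the adapter folder captured in `from`
      to: { path: "^src/adapters/", pathNot: "^src/adapters/$1/" },
    },
  ],
};
```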
## Scaling shape
- Single Node.js process handles thousands of concurrent connections at typical telemetry rates.
- Horizontal scaling: multiple instances behind a TCP-aware load balancer (HAProxy, NGINX stream module).
- A TCP connection is inherently sticky: a device stays pinned to one instance for the lifetime of the socket.
- No shared state between instances required — per-device state lives entirely on the open socket.
The pattern ports cleanly to higher-throughput runtimes (Go, Elixir) if a future rewrite is warranted.
## Failure mode
Crash → devices reconnect → in-flight frames are retransmitted by the device per protocol → no data is lost beyond what was unacknowledged. See [[failure-domains]].
## Phase 2 addition
Each Ingestion instance will run a parallel **command consumer** reading from `commands:outbound:{instance_id}` and writing command frames to device sockets. The TCP read path is not blocked. See [[phase-2-commands]].
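A sketch of that consumer loop, again assuming ioredis. The stream name is from this page; the blocking-read mechanics and the `device_id`/`frame` payload fields are assumptions.
```ts
import * as net from "net";
import Redis from "ioredis";

// Illustrative sketch only; consumer mechanics and payload shape are assumed.
async function runCommandConsumer(
  instanceId: string,
  socketFor: (deviceId: string) => net.Socket | undefined
): Promise<void> {
  const redis = new Redis(); // dedicated connection: BLOCK would stall others
  let lastId = "$"; // start with entries added after we connect

  for (;;) {
    // Block up to 5s for new commands; this loop never touches the read path.
    const reply = await redis.xread(
      "BLOCK", 5000,
      "STREAMS", `commands:outbound:${instanceId}`, lastId
    );
    if (!reply) continue; // timed out, poll again

    for (const [, entries] of reply) {
      for (const [id, fields] of entries) {
        lastId = id;
        // Stream entries arrive as flat field/value pairs.
        const cmd: Record<string, string> = {};
        for (let i = 0; i < fields.length; i += 2) cmd[fields[i]] = fields[i + 1];
        socketFor(cmd.device_id)?.write(Buffer.from(cmd.frame, "hex"));
      }
    }
  }
}
```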