GPS Tracking Platform — Architecture Overview
Document type: Architecture reference
Scope: System-level design for a real-time GPS telemetry platform
Audience: Engineering, infrastructure, future contributors
1. Purpose and scope
This document describes the high-level architecture of a real-time GPS tracking platform built around four cooperating components:
- A TCP Ingestion service that accepts persistent connections from GPS hardware and parses vendor-specific binary protocols.
- A Processor service that consumes parsed telemetry, applies domain logic, and writes durable state.
- A Directus instance that owns the relational schema, exposes data through REST/GraphQL APIs, and provides an admin UI for back-office users.
- A React single-page application that delivers the end-user experience for operators and external participants.
The architecture is deliberately split along the lines of data velocity and failure domain. Hot-path telemetry is isolated from business APIs; live state is decoupled from durable state; user interfaces are separated from data ownership. The result is a system where each component can scale, fail, and be deployed independently, and where adding new device vendors or new front-end surfaces does not require touching the core.
This document does not cover business logic, domain modeling, or specific operational scenarios — it is intentionally generic and focused on the structural shape of the system.
2. System overview
┌────────────────────────────┐
│ GPS Devices (TCP) │
└──────────────┬─────────────┘
│ binary frames
▼
┌────────────────────────────┐
│ TCP Ingestion │
│ - Vendor protocol parsers │
│ - Frame ACKs to devices │
│ - Normalized Position │
└──────────────┬─────────────┘
│ enqueue
▼
┌────────────────────────────┐
│ Redis Streams │
│ (durable in-flight queue) │
└──────────────┬─────────────┘
│ consume
▼
┌────────────────────────────┐
│ Processor │
│ - Per-device state │
│ - Domain rules │
│ - Derived events │
└──────────────┬─────────────┘
│ writes
▼
┌──────────────────────────────────────┐
│ PostgreSQL + TimescaleDB │
│ (positions hypertable, business │
│ schema owned by Directus) │
└──────┬───────────────────────┬───────┘
│ │
▼ ▼
┌───────────────────┐ ┌───────────────────┐
│ Directus │ │ Direct DB writes │
│ REST / GraphQL │ │ from Processor │
│ WebSockets │ │ (positions, │
│ Admin UI │ │ events) │
└─────────┬─────────┘ └───────────────────┘
│ HTTPS / WSS
▼
┌───────────────────┐
│ React SPA │
│ (operators & │
│ participants) │
└───────────────────┘
The system is best understood as three concentric concerns:
- Telemetry plane — TCP Ingestion + Redis Streams + Processor. Optimized for throughput, low latency, and resilience to bursty input. Stateless or nearly so.
- Business plane — Directus over PostgreSQL/TimescaleDB. Owns the schema, the API surface, the permissions model, and back-office workflows.
- Presentation plane — React SPA. Consumes the business plane's APIs and real-time subscriptions; never talks to the telemetry plane directly.
The boundary between planes is enforced by the queue (Redis Streams) on one side and the database/API on the other. No component reaches across two boundaries.
3. TCP Ingestion
3.1 Responsibility
The Ingestion service exists to do one thing well: maintain persistent TCP connections with GPS devices, parse incoming binary frames, acknowledge them according to the vendor protocol, and hand off normalized records to the rest of the system.
It deliberately does not:
- Apply business rules
- Write to the primary database
- Perform geospatial computation
- Serve any user-facing API
This narrow scope is what allows the Ingestion process to remain fast, predictable, and safely restartable.
3.2 Connection model
GPS hardware typically opens a long-lived TCP connection and streams telemetry frames over it for hours or days at a time. The Ingestion service is built around net.createServer() (or equivalent in another runtime) and treats each socket as an independent session. Per-connection state is small: an identifier (e.g. IMEI), a parser instance, and a buffer for partial frames.
Devices reconnect automatically on network failure, so the Ingestion service treats connection loss as routine — there is no need to preserve sessions across restarts. This makes the service trivially restartable, which in turn makes deployments and crashes cheap.
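To make the connection model concrete, here is a minimal sketch in TypeScript, assuming Node.js's net module; the port number, session fields, and parsing hook are illustrative, not the actual implementation:

```ts
import net from "node:net";

// Per-connection state is deliberately small: an identity, a buffer, a parser slot.
interface Session {
  imei?: string;    // learned from the device's identification frame
  buffer: Buffer;   // accumulates partial frames between reads
}

const server = net.createServer((socket) => {
  const session: Session = { buffer: Buffer.alloc(0) };

  socket.on("data", (chunk) => {
    // Append to the per-connection buffer; complete frames are handed to the
    // vendor adapter, partial frames wait for the next chunk.
    session.buffer = Buffer.concat([session.buffer, chunk]);
    // ...vendor adapter extracts frames and produces normalized records here...
  });

  // Connection loss is routine; the device reconnects on its own.
  socket.on("error", () => socket.destroy());
  socket.on("close", () => { /* no session state survives, by design */ });
});

server.listen(5027); // illustrative port
```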
3.3 Vendor abstraction
Each device vendor (Teltonika, Queclink, Concox, etc.) ships its own binary protocol. To prevent vendor-specific code from leaking into the rest of the system, the Ingestion layer defines a protocol adapter interface:
- Input: a stream of bytes from a TCP socket
- Output: normalized Position records with a stable shape (device_id, timestamp, latitude, longitude, speed, heading, plus a free-form attribute bag for vendor-specific telemetry)
Adding a new device family means writing a new adapter. Nothing downstream of the Ingestion service changes. This is the single most important property of this layer for long-term maintenance.
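A sketch of the adapter contract and the normalized record in TypeScript; the interface and field names are illustrative rather than the project's actual types:

```ts
// The normalized record every downstream component consumes.
interface Position {
  device_id: string;
  timestamp: Date;
  latitude: number;
  longitude: number;
  speed: number;
  heading: number;
  attributes: Record<string, unknown>; // free-form, vendor-specific telemetry
}

// Contract each vendor adapter implements; downstream code never sees raw bytes.
interface ProtocolAdapter {
  // Feed raw socket bytes; returns any complete, normalized records and keeps
  // incomplete frames buffered internally.
  feed(chunk: Buffer): Position[];
  // Bytes the protocol requires as an acknowledgement of the last frame, if any.
  ack(): Buffer | null;
}
```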
3.4 Handoff
Once a frame is parsed and normalized, the Ingestion service:
- Sends the protocol-required ACK back to the device.
- Pushes the normalized record onto a Redis Stream.
- Returns to reading the socket.
The TCP handler never blocks on downstream work. If the Processor falls behind or the database is slow, the Stream absorbs the pressure; the Ingestion path keeps accepting and acknowledging frames. This single discipline is the difference between a system that survives a bad day and one that doesn't.
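A minimal sketch of the handoff step, assuming ioredis; the stream name positions:in and the single-JSON-field layout are illustrative:

```ts
import Redis from "ioredis";
import type { Position } from "./types"; // illustrative path for the shape in section 3.3

const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");

// XADD appends the normalized record to the stream; the socket handler waits
// on nothing further downstream, so a slow Processor never blocks the
// device-facing path.
export async function publish(position: Position): Promise<void> {
  await redis.xadd(
    "positions:in",
    "*",                                  // let Redis assign the entry ID
    "payload", JSON.stringify(position),  // single JSON field keeps the layout simple
  );
}
```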
3.5 Scaling shape
A single Node.js process comfortably handles thousands of concurrent connections at typical telemetry rates. When a single process becomes insufficient — whether due to CPU, file descriptor limits, or operational preference — the service scales horizontally:
- Multiple Ingestion instances run behind a TCP-aware load balancer (HAProxy, NGINX stream module).
- Each device's connection is sticky for the duration of the session (TCP guarantees this naturally).
- No shared state between Ingestion instances is required, because per-device state lives entirely on the open socket.
This is the same scaling pattern used in higher-throughput runtimes (Go, Elixir) and ports cleanly if a future rewrite is ever warranted.
4. Processor
4.1 Responsibility
The Processor is where domain logic lives. It consumes normalized telemetry from Redis Streams and is responsible for:
- Maintaining per-device runtime state (last position, derived metrics, current zone, etc.)
- Applying domain rules that turn raw telemetry into meaningful events
- Writing durable state to the database — both the raw position history and any derived events
- Emitting events that downstream consumers (Directus Flows, notification services, dashboards) can react to
Where Ingestion is about throughput and protocol correctness, the Processor is about correctness of meaning. It is the component most likely to evolve as requirements grow, which is why it is isolated from the sockets on one side and the API surface on the other.
4.2 State management
Because devices report at high frequency, the Processor keeps hot state in memory. Reaching for the database on every incoming record would be wasteful and slow. The pattern is:
- Static reference data (e.g. spatial assets, configurations) is loaded at startup and refreshed on a known cadence or via explicit invalidation.
- Per-device state (last seen, current segment, accumulators) is held in memory keyed by device identifier.
- Durable state (position history, derived events, audit trail) is written asynchronously to the database.
The database is the source of truth for replay and analysis; the in-memory state is the source of truth for the current decision being made. If the Processor restarts, it can rehydrate from the database — but this is a recovery path, not a hot path.
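A sketch of the in-memory hot state, assuming TypeScript and the Position shape from section 3.3; the fields are illustrative:

```ts
import type { Position } from "./types"; // illustrative path for the shape in section 3.3

// Hot, per-device state kept in memory; the database is only consulted for
// rehydration after a restart.
interface DeviceState {
  lastPosition?: Position;
  lastSeen?: Date;
  // accumulators, current segment/zone, etc. grow here with the domain
}

const devices = new Map<string, DeviceState>();

export function updateState(p: Position): DeviceState {
  const state = devices.get(p.device_id) ?? {};
  state.lastPosition = p;
  state.lastSeen = p.timestamp;
  devices.set(p.device_id, state);
  return state; // domain rules read this directly, no database round-trip
}
```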
4.3 Decoupling via the queue
Redis Streams between Ingestion and Processor provides three things:
- Buffering — temporary slowness in the Processor does not push back on the Ingestion sockets.
- Replayability — Streams retain messages, so a Processor crash does not lose telemetry; it picks up from its last consumer-group position.
- Horizontal scaling — multiple Processor instances can join a consumer group and split the load across device IDs.
Redis is sufficient at this scale and adds minimal operational burden. NATS or Kafka are reasonable upgrades when multi-region durability or very high throughput become real concerns; until then, Redis is the right choice.
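A sketch of the consumer-group read loop, assuming ioredis; the group, consumer, and stream names are illustrative:

```ts
import Redis from "ioredis";

const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");

export async function consumeLoop(): Promise<void> {
  // Create the consumer group once; ignore the error if it already exists.
  await redis
    .xgroup("CREATE", "positions:in", "processor", "$", "MKSTREAM")
    .catch(() => {});

  for (;;) {
    const batch = await redis.xreadgroup(
      "GROUP", "processor", "processor-1",
      "COUNT", 100, "BLOCK", 5000,
      "STREAMS", "positions:in", ">",
    );
    if (!batch) continue; // BLOCK timed out; poll again

    for (const [, entries] of batch as [string, [string, string[]][]][]) {
      for (const [id, fields] of entries) {
        const record = JSON.parse(fields[1]); // fields = ["payload", "<json>"]
        // ...update in-memory state, apply domain rules, write durable state...
        await redis.xack("positions:in", "processor", id); // advance the group offset
      }
    }
  }
}
```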
4.4 Writing to the database
The Processor writes directly to PostgreSQL/TimescaleDB. It is the only writer for high-volume telemetry tables (e.g. the positions hypertable). Directus does not insert positions; it reads them.
For derived business entities (events, violations, alerts), the Processor writes to tables that Directus also knows about. The schema is owned by Directus — defined and migrated through it — but the Processor inserts rows directly using a database connection. This keeps the hot write path off the Directus HTTP stack, while still letting Directus expose the data through its API and admin UI.
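A sketch of the direct write path, assuming node-postgres (pg); the positions table and its column names mirror the normalized record and are illustrative:

```ts
import { Pool } from "pg";
import type { Position } from "./types"; // illustrative path for the shape in section 3.3

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// The Processor is the sole writer for the positions hypertable; Directus only reads it.
export async function writePosition(p: Position): Promise<void> {
  await pool.query(
    `INSERT INTO positions
       (device_id, recorded_at, latitude, longitude, speed, heading, attributes)
     VALUES ($1, $2, $3, $4, $5, $6, $7)`,
    [p.device_id, p.timestamp, p.latitude, p.longitude, p.speed, p.heading, p.attributes],
  );
}
```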
5. Directus
5.1 Role in the system
Directus is the business plane. It owns the relational schema, exposes it through auto-generated REST and GraphQL APIs, enforces role-based permissions, and provides the admin UI for back-office users.
This includes:
- Schema management — collections, fields, relations, migrations
- API generation — REST and GraphQL endpoints, no boilerplate
- Authentication and authorization — users, roles, permissions, JWT issuance
- Real-time — WebSocket subscriptions on collections for live UIs
- Workflow automation — Flows for orchestrating side effects (notifications, integrations)
- Admin UI — a complete back-office interface for operators
Directus is not in the telemetry hot path. It does not accept device connections, run the geofence engine, or hold per-device runtime state. Mixing those responsibilities into the same process would couple deployment lifecycles and contaminate failure domains.
5.2 Why Directus owns the schema
Even though the Processor writes directly to the database, Directus is treated as the owner of the schema. New tables, columns, and relations are defined through Directus. This matters because:
- The admin UI and APIs are auto-generated from the schema Directus knows about. Tables created outside Directus are invisible to it.
- Permissions are configured per-collection in Directus. Tables it does not manage cannot be permission-controlled through the standard mechanism.
- Audit and metadata columns (date_created, date_updated, user_created, etc.) follow Directus conventions; bypassing them inconsistently leads to subtle UI bugs.
The Processor inserts rows into Directus-owned tables, but it respects the schema as defined. This is a normal Directus deployment pattern — Directus does not require sole write access to its database, only schema authority.
5.3 Extension surface
Directus extensions are used for things that genuinely belong in the business layer:
- Hooks that react to data changes (e.g. when an event is written, trigger a notification Flow).
- Custom endpoints for operations that need permissions, audit, and orchestration but are not throughput-critical.
- Custom admin UI panels for back-office workflows that benefit from being inside the Directus admin (e.g. data review, manual overrides, bulk operations).
- Flows for declarative orchestration — when X happens, do Y, Z, and notify W.
Extensions are not used for long-running listeners, persistent network sockets, or anything in the telemetry hot path. Those belong in dedicated services.
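A sketch of a hook extension reacting to new rows, assuming the defineHook API from @directus/extensions-sdk; the events collection name and the side effect are illustrative:

```ts
import { defineHook } from "@directus/extensions-sdk";

export default defineHook(({ action }) => {
  // Runs after a row is created in the "events" collection through the Directus API.
  action("events.items.create", async ({ payload, key }) => {
    // ...kick off a notification Flow, call an external service, etc...
    console.log("new event", key, payload);
  });
});
```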
5.4 Real-time delivery
Directus's WebSocket subscriptions are used to push live data to the React SPA. When the Processor writes a new row (a position, an event), Directus broadcasts the change to subscribed clients. This is sufficient for moderate fan-out scenarios — tens to low hundreds of concurrent subscribers.
If real-time fan-out becomes a bottleneck (many clients each subscribing to many streams), a dedicated WebSocket gateway can be introduced that reads directly from Redis Streams and pushes to clients, bypassing Directus for the live channel only. The REST/GraphQL surface remains in Directus. This is a future evolution, not a day-one concern.
6. React SPA
6.1 Why a separate SPA
The Directus admin UI is designed for data managers — generic CRUD over collections and fields. It is the right tool for back-office editing and for operators who think in records. It is the wrong tool for end users who think in domain concepts.
A separate React SPA delivers:
- Domain-shaped UX — screens organized around the user's mental model, not the database schema.
- Independent deployment — the front-end can ship on its own cadence without touching Directus.
- Targeted access control — public-facing or partner-facing routes can be served without exposing the admin surface.
- Mobile and offline considerations — designs and bundles can be tuned for the actual user environment, separate from the desktop-oriented admin UI.
6.2 Single app, role-based views
Unless there is a strong reason to split, a single React application serves multiple user types via role-based routing and conditional UI. All users authenticate through Directus; the SPA receives a JWT, reads the user's role, and renders the appropriate navigation and screens.
Splitting into multiple apps is only justified when the user populations are genuinely disjoint (e.g. a public marketing-style site versus an authenticated operator console) or when bundle size for one audience would meaningfully harm another.
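A minimal sketch of role-based view selection, assuming React with TypeScript; the role names and screens are placeholders:

```tsx
type Role = "operator" | "participant";

function OperatorConsole() {
  return <p>Operator dashboard</p>;
}

function ParticipantView() {
  return <p>Participant view</p>;
}

// One application, one auth flow: the role carried by the Directus-issued JWT
// decides which navigation and screens are rendered.
export function AppShell({ role }: { role: Role }) {
  return role === "operator" ? <OperatorConsole /> : <ParticipantView />;
}
```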
6.3 Data access pattern
The SPA talks exclusively to Directus. It uses:
- REST or GraphQL for queries and mutations, via the official Directus SDK.
- WebSocket subscriptions for live data, via the same SDK.
- JWT authentication managed by the SDK; refresh handled transparently.
The SPA never talks to the Processor, Ingestion, Redis, or the database directly. This boundary is what allows the back-end to evolve internally without breaking the front-end, and what keeps the security model coherent — every request goes through Directus's permission system.
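A sketch of this access pattern, assuming the composable @directus/sdk client; the base URL, collection name, and query shape are illustrative:

```ts
import {
  createDirectus, authentication, rest, realtime, readItems,
} from "@directus/sdk";

const client = createDirectus("https://api.example.com") // illustrative URL
  .with(authentication())  // stores the JWT and refreshes it transparently
  .with(rest())            // REST queries and mutations
  .with(realtime());       // WebSocket subscriptions

// One-off query over REST.
export async function loadLatest() {
  return client.request(readItems("positions", { limit: 100, sort: ["-timestamp"] }));
}

// Live updates over WebSocket; new rows stream in as they are written.
export async function followLive(onUpdate: (message: unknown) => void) {
  const { subscription } = await client.subscribe("positions", {
    query: { fields: ["device_id", "latitude", "longitude", "timestamp"] },
  });
  for await (const message of subscription) onUpdate(message);
}
```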
6.4 Suggested stack
A pragmatic, well-supported stack for this kind of application:
- Vite + React + TypeScript — fast builds, strong typing, broad ecosystem.
- TanStack Router for routing — better TypeScript support than React Router, file-based routing optional.
- TanStack Query for server state — caching, invalidation, background refresh, optimistic updates.
- @directus/sdk for typed Directus access and real-time subscriptions.
- MapLibre GL with react-map-gl for live map views — open source, WebGL-based, no token requirements.
- shadcn/ui + Tailwind for UI primitives — fast to assemble, consistent, professional.
- Zustand for client-only state that does not fit the server-state model (filters, UI prefs).
- react-hook-form + Zod for forms and validation.
This stack covers the spectrum from form-heavy admin screens to real-time map dashboards without architectural changes between them.
6.5 Real-time rendering considerations
For live map views with many moving markers, the React reconciler is not the bottleneck — the actual rendering happens in WebGL via MapLibre, which manages its own layer of features outside React's tree. The React layer is responsible for managing subscription state and feeding the map with updates; the map handles drawing.
For high-frequency tabular updates (live leaderboards, event feeds), the standard React patterns apply: split components so that high-update areas re-render in isolation, use TanStack Query to manage live data, and consider memoization at component boundaries that receive frequent updates.
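A sketch of the live map layer, assuming react-map-gl's MapLibre binding; the source and layer ids, style URL, and circle styling are illustrative:

```tsx
import { useState } from "react";
import Map, { Source, Layer } from "react-map-gl/maplibre";
import type { FeatureCollection, Point } from "geojson";

export function LiveMap() {
  // Subscription updates land here (see section 6.3); React only swaps the
  // GeoJSON reference, MapLibre does the WebGL drawing outside React's tree.
  const [devices, setDevices] = useState<FeatureCollection<Point>>({
    type: "FeatureCollection",
    features: [],
  }); // setDevices is called from the live-subscription handler

  return (
    <Map
      initialViewState={{ longitude: 0, latitude: 0, zoom: 2 }}
      mapStyle="https://demotiles.maplibre.org/style.json" // illustrative style
    >
      <Source id="devices" type="geojson" data={devices}>
        <Layer
          id="device-markers"
          type="circle"
          paint={{ "circle-radius": 5, "circle-color": "#2563eb" }}
        />
      </Source>
    </Map>
  );
}
```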
7. Cross-cutting concerns
7.1 Data flow summary
The end-to-end path of a single telemetry record:
- Device sends a binary frame over TCP.
- Ingestion parses the frame, ACKs the device, pushes a normalized record to a Redis Stream.
- Processor consumes the record, updates in-memory device state, applies domain rules, writes the position to the database, and emits any derived events.
- Directus exposes the new data through its API and broadcasts changes to subscribed clients.
- React SPA receives the update via a WebSocket subscription and renders it.
End-to-end latency under healthy conditions is dominated by the Processor's logic and the database write — typically tens of milliseconds. Under load, Redis Streams absorbs bursts without back-pressuring the device-facing sockets.
7.2 Failure domains
Each component fails independently:
- Ingestion crash — devices reconnect, in-flight frames are retransmitted by the device per protocol, no data is lost beyond what was unacknowledged.
- Redis loss — Streams are persisted; restart resumes from disk. A complete Redis loss is recoverable from device retransmits and from Processor checkpointing.
- Processor crash — Stream consumer-group offsets ensure the next instance picks up where the last left off. In-memory state is rehydrated from the database.
- Directus crash — telemetry continues to flow into the database. The admin UI and SPA are unavailable, but no telemetry is lost.
- Database loss — the system stops accepting writes. This is the only single point of failure and is addressed through standard PostgreSQL operational practices (replication, backups, point-in-time recovery).
The architecture deliberately makes the database the only part of the system that requires careful operational attention. Everything else is restartable, replaceable, or naturally redundant.
7.3 Deployment topology
A typical Docker-based deployment:
- gps-ingest — exposes a TCP port on the host (TCP-aware reverse proxies are used only when TLS termination or sharding is needed).
- gps-processor — internal container, no exposed ports, reads from Redis.
- redis — internal container, persistence enabled.
- postgres (with TimescaleDB extension) — persistence volume, regular backups.
- directus — behind an HTTP reverse proxy (e.g. Nginx Proxy Manager) with TLS.
- react-spa — static build served behind the same HTTP reverse proxy.
Components scale independently. Ingestion and Processor scale horizontally for throughput; Directus scales horizontally for HTTP load; the database scales vertically and through read replicas for analytics workloads.
7.4 Observability
Each component emits its own telemetry:
- Ingestion — connection counts, frame rates, parse errors, ACK latencies.
- Processor — consumer lag, processing latency per record, rule outcomes, in-memory state size.
- Directus — request latencies, error rates, active subscriptions.
- SPA — basic client-side error reporting.
A standard observability stack (Prometheus + Grafana, or a managed equivalent) suffices. The Redis Stream consumer lag is the single most important metric for spotting trouble — it reflects the health of the entire telemetry pipeline in one number.
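A sketch of a lag check for the telemetry stream, assuming ioredis and Redis 7+ (which reports a per-group lag field via XINFO GROUPS); the stream and group names are illustrative:

```ts
import Redis from "ioredis";

const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");

// Returns how many entries the group has not yet consumed; exported to
// Prometheus as a gauge, e.g. telemetry_stream_lag{group="processor"}.
export async function consumerLag(stream: string, group: string): Promise<number | null> {
  // XINFO GROUPS returns one flat [field, value, ...] array per consumer group.
  const groups = (await redis.xinfo("GROUPS", stream)) as (string | number)[][];
  for (const info of groups) {
    const fields = new Map<string | number, string | number>();
    for (let i = 0; i < info.length; i += 2) fields.set(info[i], info[i + 1]);
    if (fields.get("name") === group) return Number(fields.get("lag") ?? 0);
  }
  return null; // group not found on this stream
}
```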
7.5 Security boundaries
- Devices speak plain TCP by default; if confidentiality is required, a TLS-terminating proxy fronts the Ingestion service, or the protocol's own encryption is used where supported.
- Redis is internal-only and never exposed.
- The database is internal-only and accessed only by Directus and the Processor.
- Directus enforces all user-facing authentication and authorization. The SPA holds JWTs; refresh is handled by the SDK.
- The admin UI is access-controlled by Directus role and can be restricted at the proxy level (IP allow-listing, separate hostname) for additional defense in depth.
No component other than Directus exposes a user-facing API. This keeps the security model coherent and auditable.
8. Evolution and future considerations
The architecture is intentionally conservative — it favors well-understood components and clear boundaries over novelty. It supports growth along several axes without structural changes:
- More device vendors — add protocol adapters in the Ingestion layer.
- More throughput — scale Ingestion and Processor horizontally; partition Redis Streams; add read replicas to the database.
- More users — scale the SPA (CDN-served static bundle) and Directus instances; introduce a dedicated WebSocket gateway if subscription fan-out becomes a bottleneck.
- More domain complexity — extend the Processor; add Directus Flows for orchestration; introduce additional consumer services on Redis Streams for parallel concerns (analytics, archival, third-party integrations).
- Multi-region — replace Redis with NATS or Kafka; introduce regional Ingestion and Processor clusters writing to a central or regional database tier.
Structural changes (e.g. replacing Node.js with Go for the Ingestion layer, splitting the SPA into multiple apps, introducing a separate analytics pipeline) are possible but not required by the design. They become considerations when specific bottlenecks or operational concerns make them worthwhile, not before.
9. Summary
The platform is built around four components with clean boundaries:
| Component | Concern | Scales by | Failure impact |
|---|---|---|---|
| TCP Ingestion | Protocol I/O | Horizontal sharding | Devices reconnect; brief retransmit |
| Processor | Domain logic | Consumer groups | Queue absorbs; resume from offset |
| Directus | Business API & admin | Horizontal HTTP | Admin/UI unavailable; telemetry unaffected |
| React SPA | Presentation | CDN/static | UI unavailable; back-end unaffected |
Telemetry flows in one direction: devices → Ingestion → Streams → Processor → database → Directus → SPA. Each hop is decoupled from the next, each component owns a single concern, and each can be restarted, scaled, or replaced without touching the others.
This shape is the foundation. Domain logic, business workflows, and user-facing features are built on top of it without disturbing it.