| title | type | created | updated | sources | tags |
|---|---|---|---|---|---|
| Failure Domains | concept | 2026-04-30 | 2026-04-30 | | |
Failure Domains
Each component of the platform fails independently. The architecture deliberately concentrates operational risk in one place — the database — and keeps everything else restartable, replaceable, or naturally redundant.
Per-component failure behavior
| Component | Crash behavior | Data loss |
|---|---|---|
| tcp-ingestion | Devices reconnect; in-flight frames are retransmitted by the device per protocol | None; unacknowledged frames are resent by the device |
| redis-streams | Streams are persisted; restart resumes from disk | Recoverable from device retransmits + Processor checkpointing |
| processor | Consumer-group state (last-delivered ID and pending entries list) lets the next instance pick up where the crashed one stopped; in-memory state is rehydrated from the DB | None |
| directus | Telemetry continues to flow into DB; admin UI/SPA unavailable | None |
| postgres-timescaledb | System stops accepting writes | Bounded by replication, backups, and point-in-time recovery; this is the single point of failure |
| react-spa | UI unavailable | N/A — no state owned |
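The processor row's "None" rests on Redis Streams consumer-group semantics: an entry stays in the group's pending entries list until it is acknowledged, so a crashed consumer's unfinished work can be claimed by its replacement. The toy model that follows is a minimal, in-memory sketch of that behavior, not the redis-py API. `ToyStream`, `autoclaim`, and `run_demo` are hypothetical names, integer IDs stand in for real `<ms>-<seq>` stream IDs, and the min-idle-time check of the real `XAUTOCLAIM` is omitted.

```python
"""Toy, in-memory model of Redis Streams consumer-group semantics.

Illustrative only, NOT the redis-py API. It mimics XADD, XREADGROUP with
the ">" ID, XACK, and XAUTOCLAIM-style reclaiming of the pending entries
list (PEL) to show why a Processor crash loses no data.
"""
from dataclasses import dataclass, field


@dataclass
class ToyStream:
    entries: list = field(default_factory=list)   # [(entry_id, payload)]
    last_delivered: int = -1                      # group's last-delivered ID
    pending: dict = field(default_factory=dict)   # entry_id -> consumer name

    def xadd(self, payload):
        entry_id = len(self.entries)  # integer IDs stand in for "<ms>-<seq>"
        self.entries.append((entry_id, payload))
        return entry_id

    def xreadgroup(self, consumer, count):
        """Deliver never-delivered entries (the '>' ID) and mark them pending."""
        start = self.last_delivered + 1
        batch = self.entries[start:start + count]
        for entry_id, _ in batch:
            self.pending[entry_id] = consumer
        if batch:
            self.last_delivered = batch[-1][0]
        return batch

    def xack(self, entry_id):
        self.pending.pop(entry_id, None)

    def autoclaim(self, new_consumer):
        """Reassign every pending entry to new_consumer (no idle-time check)."""
        claimed = sorted(self.pending)
        for entry_id in claimed:
            self.pending[entry_id] = new_consumer
        return [self.entries[i] for i in claimed]


def run_demo():
    stream = ToyStream()
    for n in range(5):
        stream.xadd({"record": n})

    written = []  # stands in for rows committed to the database

    # Processor A reads 3 entries, persists only the first, then "crashes".
    batch = stream.xreadgroup("processor-a", count=3)
    first_id, first_payload = batch[0]
    written.append(first_payload["record"])
    stream.xack(first_id)  # ack only after the DB write succeeded

    # Processor B starts: reclaim A's unacked entries, then read new ones.
    recovered = stream.autoclaim("processor-b")
    fresh = stream.xreadgroup("processor-b", count=10)
    for entry_id, payload in recovered + fresh:
        written.append(payload["record"])
        stream.xack(entry_id)

    return written, stream.pending
```

Because the ack follows the write, a crash between the DB write and `XACK` would deliver that entry a second time: consumer groups give at-least-once delivery, which suggests keeping the Processor's DB writes idempotent (for example, keyed on device and record timestamp).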
The discipline behind this
- No component reaches across two plane boundaries — see plane-separation. A failure in one plane cannot cascade through another.
- The TCP handler never blocks on downstream work. Slow Processor or DB pressure is absorbed by redis-streams, not by device sockets.
- Per-device session state lives only on the open socket, so Ingestion is trivially restartable: there is nothing to restore, and devices simply reconnect.
- The Processor's hot state can always be rehydrated from the DB.
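Rehydration can be illustrated with a minimal sketch that assumes the Processor's hot state is the last known position per device. The `positions` table, its columns, and `rehydrate_hot_state` are hypothetical, and an in-memory SQLite database stands in for PostgreSQL/TimescaleDB so the example runs on its own.

```python
import sqlite3

# Hypothetical schema: one row per decoded telemetry record.
SCHEMA = """
CREATE TABLE positions (
    device_id TEXT    NOT NULL,
    ts        INTEGER NOT NULL,  -- epoch milliseconds
    lat       REAL    NOT NULL,
    lon       REAL    NOT NULL
)
"""

# Latest row per device, written in portable SQL. On PostgreSQL/TimescaleDB,
# SELECT DISTINCT ON (device_id) ... ORDER BY device_id, ts DESC does the same.
LATEST_PER_DEVICE = """
SELECT p.device_id, p.ts, p.lat, p.lon
FROM positions AS p
JOIN (
    SELECT device_id, MAX(ts) AS ts FROM positions GROUP BY device_id
) AS latest
  ON p.device_id = latest.device_id AND p.ts = latest.ts
"""


def rehydrate_hot_state(conn):
    """Rebuild the in-memory map device_id -> (ts, lat, lon) from the DB."""
    return {
        device_id: (ts, lat, lon)
        for device_id, ts, lat, lon in conn.execute(LATEST_PER_DEVICE)
    }


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the real database
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO positions VALUES (?, ?, ?, ?)",
        [
            ("imei-1", 1000, 54.68, 25.27),
            ("imei-1", 2000, 54.69, 25.28),
            ("imei-2", 1500, 59.43, 24.75),
        ],
    )
    print(rehydrate_hot_state(conn))
```

Since the map is derived entirely from committed rows, a fresh Processor instance can run this once at startup and then continue consuming the stream.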
Operational consequence
The database gets careful operational attention: replication, backups, and point-in-time recovery (a PostgreSQL capability built on WAL archiving, which TimescaleDB inherits as a PostgreSQL extension). Everything else can be restarted, redeployed, or scaled without ceremony.
Canary metric
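Redis Streams consumer lag: how far the Processor's consumer group trails the newest entry in the stream. Because the TCP handler never blocks and downstream pressure is absorbed by redis-streams, a slow Processor or a struggling database shows up as rising lag, so this one number reflects the health of the entire telemetry pipeline.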
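Redis exposes the raw inputs for this metric: `XINFO STREAM` reports the stream's `last-generated-id`, and `XINFO GROUPS` reports each group's `last-delivered-id` (and, from Redis 7.0, an entry-count `lag`). Stream IDs have the form `<milliseconds>-<sequence>`, so subtracting the two timestamp parts gives lag as time. This is a hedged sketch: the helper names and the "strictly rising over the last three samples" growth rule are hypothetical choices, and fetching the IDs from a live server is left out.

```python
def id_ms(stream_id: str) -> int:
    """Millisecond timestamp part of a Redis stream ID ("<ms>-<seq>")."""
    ms, _, _seq = stream_id.partition("-")
    return int(ms)


def time_lag_ms(last_generated_id: str, last_delivered_id: str) -> int:
    """How far (in ms) the group's delivery cursor trails the newest entry."""
    return max(0, id_ms(last_generated_id) - id_ms(last_delivered_id))


def lag_is_growing(samples: list[int], min_samples: int = 3) -> bool:
    """Hypothetical alert rule: the last `min_samples` readings strictly rise."""
    recent = samples[-min_samples:]
    return len(recent) == min_samples and all(
        a < b for a, b in zip(recent, recent[1:])
    )
```

Note that `last-delivered-id` tracks delivery, not acknowledgment; entries delivered but not yet acked are counted separately in the group's `pending` field, which is worth watching alongside lag.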