tcp-ingestion/.planning/phase-1-telemetry/03-config-and-logging.md
julian 90d6a73a60 Sync ROADMAP statuses with landed work; mark 1.10/1.12/1.13 as paused
Tasks 1.1-1.9 marked done with their landing commit SHAs. Tasks 1.10
(observability), 1.12 (production hardening), and 1.13 (device
authority) marked paused with explicit resume triggers — pilot
deployment on real Teltonika hardware takes priority. Task 1.11
remains as next, in slimmed form for the pilot (no /readyz healthcheck
since the metrics endpoint is part of paused 1.10).
2026-04-30 16:49:07 +02:00


Task 1.3 — Configuration & logging

Phase: 1 (Inbound telemetry)
Status: 🟩 Done, landed in commit 1e9219d
Depends on: 1.1
Wiki refs: docs/wiki/sources/gps-tracking-architecture.md § Deployment topology, § Observability

Goal

Provide a single source of truth for runtime configuration (env-var-driven, validated at startup, fail-fast on misconfiguration) and a structured JSON logger.

Deliverables

  • src/config/load.ts:
    • Exports loadConfig(): Config that parses process.env through a zod schema, returning a typed Config object. Throws with a clear error message on missing/malformed values.
    • Env vars that have sensible defaults are optional in dev and required in production-like deployments. Use NODE_ENV to gate. REDIS_URL has no default and is required in every environment.
  • src/observability/logger.ts:
    • Exports a configured pino logger. JSON output by default; pretty-printed via pino-pretty only when NODE_ENV === 'development' (lazy-loaded so it's not in the prod bundle).
    • Log level controlled by LOG_LEVEL env var (default info in production, debug in development).
    • Adds service: 'tcp-ingestion' and instance_id (from the INSTANCE_ID env var, or a local-<8 hex chars> identifier generated at startup) to every log line.
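
The NODE_ENV-dependent LOG_LEVEL default described above could be resolved roughly as follows. This is a minimal sketch, not the landed code: `resolveLogLevel` and `isLogLevel` are hypothetical helpers, and the sketch assumes the decision is made on the raw environment so that an unset LOG_LEVEL can still be told apart from an explicit one.

```typescript
// Hypothetical helper (not from the landed code): picks the effective log level.
type LogLevel = 'fatal' | 'error' | 'warn' | 'info' | 'debug' | 'trace';

const LOG_LEVELS: readonly LogLevel[] = ['fatal', 'error', 'warn', 'info', 'debug', 'trace'];

function isLogLevel(value: string): value is LogLevel {
  return (LOG_LEVELS as readonly string[]).indexOf(value) !== -1;
}

function resolveLogLevel(env: Record<string, string | undefined>): LogLevel {
  const explicit = env.LOG_LEVEL;
  if (explicit !== undefined && explicit !== '') {
    // An explicit value always wins, but it must be a known level (fail fast otherwise).
    if (!isLogLevel(explicit)) {
      throw new Error(`LOG_LEVEL: expected one of ${LOG_LEVELS.join(', ')}, got "${explicit}"`);
    }
    return explicit;
  }
  // No explicit value: debug in development, info everywhere else.
  return env.NODE_ENV === 'development' ? 'debug' : 'info';
}
```

Keeping the rule in one small pure function makes the development/production difference easy to unit-test without touching process.env.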

Specification

Config schema (zod)

import { randomUUID } from 'node:crypto';
import { z } from 'zod';

const ConfigSchema = z.object({
  NODE_ENV: z.enum(['development', 'test', 'production']).default('development'),
  INSTANCE_ID: z.string().min(1).default(() => `local-${randomUUID().slice(0, 8)}`),
  // Static default; the development default of debug (see Deliverables) is not encoded here.
  LOG_LEVEL: z.enum(['fatal', 'error', 'warn', 'info', 'debug', 'trace']).default('info'),

  // Vendor port bindings — extend as adapters are added.
  TELTONIKA_PORT: z.coerce.number().int().min(1).max(65535).default(5027),

  // Redis
  REDIS_URL: z.string().url(),
  REDIS_TELEMETRY_STREAM: z.string().min(1).default('telemetry:teltonika'),
  REDIS_STREAM_MAXLEN: z.coerce.number().int().min(0).default(1_000_000), // approximate cap

  // Observability
  METRICS_PORT: z.coerce.number().int().min(0).max(65535).default(9090),

  // Phase 2 (planned, not used in Phase 1)
  // COMMANDS_OUTBOUND_STREAM_PREFIX: z.string().default('commands:outbound'),
});

export type Config = z.infer<typeof ConfigSchema>;

The Phase 2 fields are commented out so they do not become runtime requirements before Phase 2 ships. Add them when Phase 2 is in flight.

Logger conventions

  • Always emit JSON in production (pino default).
  • Always include: time, level, service, instance_id, msg.
  • Adapter log lines include imei when known; framing log lines include codec_id when applicable; CRC failures include expected_crc, computed_crc, frame_length.
  • Use logger.child({ imei }) to scope a logger per session, so subsequent log lines auto-include the IMEI.
  • Never log raw frame payloads at info or above — they're huge and may contain sensitive telemetry. At debug, truncate to first/last 16 bytes.
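
The truncation rule for debug-level frame logging can be sketched as below. `summarizeFrame`, `toHex`, and the `length`/`head`/`tail` field names are hypothetical, chosen only for illustration; the sketch assumes frames arrive as a Uint8Array (a Node Buffer qualifies).

```typescript
// Hypothetical helper: renders a frame for debug logs without dumping the full payload.
const EDGE_BYTES = 16;

function toHex(bytes: Uint8Array): string {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('');
}

function summarizeFrame(frame: Uint8Array): { length: number; head: string; tail?: string } {
  if (frame.length <= EDGE_BYTES * 2) {
    // Short frames fit entirely within the head/tail budget, so nothing is cut.
    return { length: frame.length, head: toHex(frame) };
  }
  return {
    length: frame.length,
    head: toHex(frame.subarray(0, EDGE_BYTES)),
    tail: toHex(frame.subarray(frame.length - EDGE_BYTES)),
  };
}
```

A session logger could then emit something like `logger.debug({ frame: summarizeFrame(buf) }, 'frame received')`, which keeps each debug line bounded in size regardless of payload length.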

Failure mode

loadConfig() is called once in main.ts. If it throws, the process exits with a non-zero code and a single human-readable line listing the missing/invalid keys. Do not fall back to silent defaults for required keys — the operational habit we want is "missing config = process refuses to start," not "process starts and behaves weirdly later."
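
One way to produce the single human-readable line is to flatten every validation issue into `KEY: message` pairs. The sketch below is an assumption about shape, not the landed implementation: `ConfigIssue` is a structural stand-in for the `path` and `message` fields of a validation issue, and `formatConfigError` is a hypothetical helper name.

```typescript
// Structural stand-in for the path and message fields of a schema validation issue.
interface ConfigIssue {
  path: ReadonlyArray<string | number>;
  message: string;
}

// Hypothetical helper: collapses all validation issues into one log-friendly line.
function formatConfigError(issues: ReadonlyArray<ConfigIssue>): string {
  const parts = issues.map((issue) => {
    // Issues without a path apply to the config object as a whole.
    const key = issue.path.length > 0 ? issue.path.join('.') : '(root)';
    return `${key}: ${issue.message}`;
  });
  return `Invalid configuration: ${parts.join('; ')}`;
}
```

In main.ts, the caller would catch the error thrown by loadConfig(), write this line to stderr, and exit with a non-zero code.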

Acceptance criteria

  • Calling loadConfig() with REDIS_URL unset throws and the error names REDIS_URL specifically.
  • Calling loadConfig() with NODE_ENV=development and only REDIS_URL set returns a fully valid Config, with defaults filled in for every other key.
  • The logger emits JSON when NODE_ENV=production and pretty-printed text when NODE_ENV=development.
  • logger.child({ imei: '...' }) produces lines with imei included.

Risks / open questions

  • The INSTANCE_ID default is a random local-<8 hex chars> value generated at each process start. That is fine for dev, but production K8s/compose deployments should set it explicitly to a stable identifier (pod name, hostname, etc.). The Phase 2 connection registry depends on INSTANCE_ID being stable across the lifetime of the process; document this in the deployment notes (task 1.11).
  • Log volume could be high under load. Pino is fast (~100k+ lines/sec on modern hardware), but consider setting a higher level on the relevant child loggers or sampling the busiest events (e.g. per-frame debug logs).
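
Sampling the busiest events could look like the sketch below, which logs one in every n occurrences. `everyNth` is a hypothetical helper, and count-based sampling is only one option (time-based or probabilistic sampling would work too).

```typescript
// Hypothetical helper: returns a predicate that is true on the 1st, (n+1)th, (2n+1)th, ... call.
function everyNth(n: number): () => boolean {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError('n must be a positive integer');
  }
  let count = 0; // position within the current window of n calls
  return () => {
    const hit = count === 0;
    count = (count + 1) % n;
    return hit;
  };
}

// Usage sketch (logger and codec_id are assumed to exist in the caller):
// const shouldLogFrame = everyNth(100);
// if (shouldLogFrame()) logger.debug({ codec_id }, 'frame decoded');
```

Guarding the call site, rather than filtering inside the logger, also skips the cost of building the log object for the dropped events.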

Done

(Fill in once complete.)