Add planning documents for Phase 1 (throughput pipeline) and stub Phases 2-4

ROADMAP.md establishes the status legend, architectural anchors pointing at the wiki, and seven non-negotiable design rules. The most important are the core/domain boundary that protects Phase 1 from Phase 2 churn, the schema-authority split (positions hypertable owned here; everything else owned by Directus), and idempotent writes via (device_id, ts) ON CONFLICT.

Phase 1 (throughput pipeline) is fully detailed across 11 task files: scaffold, core types + sentinel decoder, config + logging, Postgres hypertable, Redis Stream consumer, per-device LRU state, batched writer, main wiring, observability, integration test, Dockerfile + Gitea CI. Observability is in Phase 1 (not deferred), a lesson learned from tcp-ingestion task 1.10.

Phases 2-4 are stub READMEs. Phase 2 (domain logic) blocks on Directus schema decisions and lists those open questions explicitly. Phase 3 (production hardening) and Phase 4 (future) sketch the task shape.
2026-04-30 21:16:26 +02:00
parent 1a4202f4d1
commit c314ba0902
17 changed files with 1191 additions and 0 deletions
@@ -0,0 +1,76 @@
+# Task 1.3 — Configuration & logging
+
+**Phase:** 1 — Throughput pipeline
+**Status:** ⬜ Not started
+**Depends on:** 1.1
+**Wiki refs:** `docs/wiki/entities/processor.md`
+
+## Goal
+
+Validate environment variables on startup with `zod`, build the pino root logger with the same conventions as `tcp-ingestion` (ISO timestamps, string level labels, instance_id base field), and fail fast with a readable error message if config is invalid.
+
+## Deliverables
+
+- `src/config/load.ts` exporting:
+  - `loadConfig(): Config` — reads `process.env`, runs zod parse, returns a typed `Config`. Throws on invalid input with a multi-line message that names every invalid field.
+  - `Config` type derived from the zod schema.
+- `src/observability/logger.ts` exporting:
+  - `createLogger({ level, nodeEnv, instanceId }): Logger` — pino root logger with base fields `service: 'processor'`, `instance_id`. ISO timestamps via `pino.stdTimeFunctions.isoTime`. Level formatter that emits `"level":"info"` not `"level":30`. In `nodeEnv === 'development'`, use the pino-pretty transport.
+  - `type Logger` re-exported from `pino`.
+- Wire both into `src/main.ts`: `loadConfig()` → `createLogger()` → `logger.info('processor starting')` → exit 0 (still a stub; consumer wiring lands in 1.8).
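A possible way to assemble the multi-line error that `loadConfig()` is expected to throw is sketched here. It assumes each validation problem arrives as a `{ path, message }` pair, modelled on the fields of a zod issue; the `ConfigIssue` type, the `formatConfigError` name, and the header wording are hypothetical and not prescribed by this task.

```typescript
// Hypothetical issue shape, modelled on the `path` and `message` fields of a zod issue.
type ConfigIssue = { path: Array<string | number>; message: string };

// Build one readable message that names every invalid field, one field per line.
function formatConfigError(issues: ConfigIssue[]): string {
  const lines = issues.map((issue) => {
    // An empty path means the problem is with the env object as a whole.
    const field = issue.path.length > 0 ? issue.path.join('.') : '(root)';
    return `  - ${field}: ${issue.message}`;
  });
  return ['Invalid processor configuration:', ...lines].join('\n');
}
```

Reporting all issues at once, rather than throwing on the first, lets an operator fix a broken `.env` in a single pass.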
+
+## Specification
+
+### Environment variables
+
+| Var | Required | Default | Notes |
+|---|---|---|---|
+| `NODE_ENV` | no | `production` | `development` enables pino-pretty |
+| `INSTANCE_ID` | no | `processor-1` | Used in metrics + log base field |
+| `LOG_LEVEL` | no | `info` | `trace` / `debug` / `info` / `warn` / `error` |
+| `REDIS_URL` | yes | — | e.g. `redis://redis:6379` |
+| `POSTGRES_URL` | yes | — | e.g. `postgres://user:pass@db:5432/trm` |
+| `REDIS_TELEMETRY_STREAM` | no | `telemetry:t` | Must match `tcp-ingestion`'s `REDIS_TELEMETRY_STREAM` |
+| `REDIS_CONSUMER_GROUP` | no | `processor` | All Processor instances join this group |
+| `REDIS_CONSUMER_NAME` | no | `${INSTANCE_ID}` | Unique per instance — defaults to instance id |
+| `METRICS_PORT` | no | `9090` | HTTP server port for `/metrics`, `/healthz`, `/readyz` |
+| `BATCH_SIZE` | no | `100` | Max records per `XREADGROUP` call |
+| `BATCH_BLOCK_MS` | no | `5000` | `BLOCK` timeout on `XREADGROUP` when stream is empty |
+| `WRITE_BATCH_SIZE` | no | `50` | Max rows per Postgres `INSERT` |
+| `DEVICE_STATE_LRU_CAP` | no | `10000` | Max devices kept in memory; LRU eviction beyond this |
+
+### Validation rules
+
+- All defaults must be expressed in the zod schema with `.default(...)` so the parsed `Config` is fully typed and never has `undefined` for an optional field.
+- Numeric env vars must be coerced from string and bounded: `BATCH_SIZE` 1–10000, `BATCH_BLOCK_MS` 0–60000 (note that Redis treats `BLOCK 0` as "block indefinitely", not as "do not wait"), `WRITE_BATCH_SIZE` 1–1000, `DEVICE_STATE_LRU_CAP` 100–1_000_000.
+- `REDIS_URL` and `POSTGRES_URL` must parse as URLs with the expected protocol (`redis:` or `rediss:`; `postgres:` or `postgresql:`).
+- `LOG_LEVEL` must be one of pino's accepted levels: `trace`, `debug`, `info`, `warn`, `error`, `fatal`, or `silent` (which disables logging).
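The validation rules above could be expressed roughly as follows. This is a minimal sketch, not the final `src/config/load.ts`: it covers only a subset of the variables, the helper names `boundedInt` and `urlWithProtocol` are hypothetical, and the optional `env` parameter on `loadConfig` is an addition made here so the function can be exercised without touching `process.env`.

```typescript
import { z } from 'zod';

// Integer env var: coerced from string, bounded, with a default when unset.
const boundedInt = (min: number, max: number, fallback: number) =>
  z.coerce.number().int().min(min).max(max).default(fallback);

// URL env var restricted to a set of protocols.
// WHATWG `URL.protocol` keeps the trailing colon, e.g. 'redis:'.
const urlWithProtocol = (protocols: string[]) =>
  z.string().refine(
    (value) => {
      try {
        return protocols.includes(new URL(value).protocol);
      } catch {
        return false; // not parseable as a URL at all
      }
    },
    { message: `must be a URL using one of: ${protocols.join(', ')}` },
  );

// Only a subset of the variables from the table is shown.
const configSchema = z.object({
  NODE_ENV: z.string().default('production'),
  INSTANCE_ID: z.string().min(1).default('processor-1'),
  REDIS_URL: urlWithProtocol(['redis:', 'rediss:']),
  POSTGRES_URL: urlWithProtocol(['postgres:', 'postgresql:']),
  BATCH_SIZE: boundedInt(1, 10_000, 100),
  BATCH_BLOCK_MS: boundedInt(0, 60_000, 5000),
  WRITE_BATCH_SIZE: boundedInt(1, 1000, 50),
  DEVICE_STATE_LRU_CAP: boundedInt(100, 1_000_000, 10_000),
});

type Config = z.infer<typeof configSchema>;

// Assumption: `env` is injectable for tests; the task's signature is `loadConfig(): Config`.
function loadConfig(env: Record<string, string | undefined> = process.env): Config {
  const result = configSchema.safeParse(env);
  if (!result.success) {
    // Name every invalid field, one per line.
    const lines = result.error.issues.map((i) => `  - ${i.path.join('.')}: ${i.message}`);
    throw new Error(`Invalid configuration:\n${lines.join('\n')}`);
  }
  return result.data;
}
```

One edge case worth a test: `Number('')` is `0`, so an env var set to an empty string coerces to `0`, which is rejected for `BATCH_SIZE` but accepted for `BATCH_BLOCK_MS`.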
+
+### Logger conventions
+
+Match `tcp-ingestion/src/observability/logger.ts` line for line where applicable. Future-you grepping across services should see the same shape:
+
+```ts
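+import pino, { type Logger } from 'pino';
+
+export function createLogger(opts: { level: string; nodeEnv: string; instanceId: string }): Logger {
+  const { level, nodeEnv, instanceId } = opts;
+  const base = { service: 'processor', instance_id: instanceId };
+  // Emit the level as its string label ("info") rather than pino's numeric value (30).
+  const formatters = { level: (label: string) => ({ level: label }) };
+
+  if (nodeEnv === 'development') {
+    return pino({ level, base, timestamp: pino.stdTimeFunctions.isoTime, formatters,
+      transport: { target: 'pino-pretty', options: { colorize: true, translateTime: 'SYS:standard', ignore: 'pid,hostname' } } });
+  }
+  return pino({ level, base, timestamp: pino.stdTimeFunctions.isoTime, formatters });
+}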
+```
+
+## Acceptance criteria
+
+- [ ] `pnpm test` covers config validation: missing required vars throw with the right message; invalid URLs throw; bounded numerics throw on out-of-range values.
+- [ ] Running with valid env (and `INSTANCE_ID` unset) emits a single `processor starting` info log with base fields `service=processor` and `instance_id=processor-1`.
+- [ ] Running with `NODE_ENV=development` produces colorized output via pino-pretty.
+- [ ] Running with `NODE_ENV=production` produces JSON output with ISO `time` and string `level`.
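The production-output criterion can be checked mechanically. The sketch below is one way to validate a single captured log line; the function name `isExpectedLogLine` is hypothetical, and it assumes the `time` field has the exact format produced by `Date.prototype.toISOString()`.

```typescript
// Check one line of production output against the acceptance criteria:
// valid JSON, ISO-8601 `time`, string `level`, and the two base fields.
function isExpectedLogLine(line: string, instanceId: string): boolean {
  let record: Record<string, unknown>;
  try {
    record = JSON.parse(line);
  } catch {
    return false; // not JSON, e.g. pino-pretty output
  }
  if (record === null || typeof record !== 'object') return false;

  const time = record.time;
  // Round-trip through Date to confirm the string is a canonical ISO timestamp.
  const isoTime =
    typeof time === 'string' &&
    !Number.isNaN(Date.parse(time)) &&
    new Date(time).toISOString() === time;

  return (
    isoTime &&
    typeof record.level === 'string' &&
    record.service === 'processor' &&
    record.instance_id === instanceId
  );
}
```

A test could capture stdout from the stub `main.ts` and pass each line through this check, which would catch a regression to numeric levels or epoch timestamps.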
+
+## Risks / open questions
+
+- `REDIS_CONSUMER_NAME` defaulting to `INSTANCE_ID` means `INSTANCE_ID` must be unique per instance for safe consumer-group operation. Document this in `.env.example` so operators don't accidentally run two instances with the same `INSTANCE_ID`.
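To make the collision risk concrete, here is a small sketch of the fallback chain. The helper `resolveConsumerName` and the `EnvVars` type are hypothetical, and treating an empty string as unset is an assumption the task does not state.

```typescript
type EnvVars = Record<string, string | undefined>;

// Hypothetical helper: an explicit REDIS_CONSUMER_NAME wins, otherwise fall back to
// INSTANCE_ID, which itself falls back to the documented default 'processor-1'.
// Assumption: an empty string is treated the same as an unset variable.
function resolveConsumerName(env: EnvVars): string {
  const instanceId = env.INSTANCE_ID || 'processor-1';
  return env.REDIS_CONSUMER_NAME || instanceId;
}

// Two instances launched with no overrides resolve to the same consumer name,
// which is the collision the note above warns about.
const first = resolveConsumerName({});
const second = resolveConsumerName({});
console.log(first === second); // true: both are 'processor-1'
```

Setting a distinct `INSTANCE_ID` per instance (or an explicit `REDIS_CONSUMER_NAME`) is enough to avoid the clash.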
+
+## Done
+
+(Fill in once complete: commit SHA, brief notes.)