Implement Phase 1 task 1.10 (Prometheus metrics + /healthz + /readyz)
Replaces the placeholder Metrics shim with a prom-client implementation in src/observability/metrics.ts: all 10 Phase 1 metrics from the wiki spec, plus nodejs_* defaults. Exposes /metrics, /healthz, /readyz over node:http on METRICS_PORT (9090); /readyz returns 503 when Redis status is not 'ready' or the TCP listener isn't bound. The Metrics interface in src/core/types.ts is unchanged — adapter call sites continue to use the same inc/observe shape. Only main.ts sees the extended type that adds serializeMetrics(). Side effects: - Dockerfile re-enables HEALTHCHECK pointing at /readyz, and EXPOSE 9090. - frame-ingested log downgraded back to debug now that teltonika_records_published_total is scrapeable. - 19 new unit tests covering exposition format, all metric types, and every HTTP endpoint path. Total now 98 passing. Note: deploy/compose.yaml still does not expose 9090 — separate decision about how Prometheus reaches the service (host port vs. internal scraper on the same Docker network).
This commit is contained in:
@@ -41,7 +41,7 @@ These rules govern every task. Any deviation must be discussed and documented as
|
||||
|
||||
### Phase 1 — Inbound telemetry (Codec 8, 8E, 16)
|
||||
|
||||
**Status:** 🟨 In progress (core implementation done; observability + hardening + device authority paused for pilot test)
|
||||
**Status:** 🟨 In progress (observability landed; hardening + device authority paused for pilot test)
|
||||
**Outcome:** A production-ready Node.js TCP server ingesting Teltonika telemetry from any FMB/FMC/FMM/FMU device, publishing normalized `Position` records to Redis Streams, with full observability and CI/CD via Gitea.
|
||||
|
||||
[**See `phase-1-telemetry/README.md`**](./phase-1-telemetry/README.md)
|
||||
@@ -57,18 +57,17 @@ These rules govern every task. Any deviation must be discussed and documented as
|
||||
| 1.7 | [Codec 16 parser (incl. Generation Type)](./phase-1-telemetry/07-codec-16.md) | 🟩 | `381287b` |
|
||||
| 1.8 | [Redis Streams publisher & main wiring](./phase-1-telemetry/08-redis-publisher.md) | 🟩 | `af06973` |
|
||||
| 1.9 | [Fixture suite & testing strategy](./phase-1-telemetry/09-fixture-suite.md) | 🟩 | `381287b` |
|
||||
| 1.10 | [Observability (Prometheus metrics)](./phase-1-telemetry/10-observability.md) | ⏸ | *deferred — see below* |
|
||||
| 1.10 | [Observability (Prometheus metrics)](./phase-1-telemetry/10-observability.md) | 🟩 | *(pending commit SHA)* |
|
||||
| 1.11 | [Dockerfile & Gitea workflow](./phase-1-telemetry/11-dockerfile-and-ci.md) | 🟩 | `88b742d` (slim pilot variant) |
|
||||
| 1.12 | [Production hardening](./phase-1-telemetry/12-production-hardening.md) | ⏸ | *deferred — see below* |
|
||||
| 1.13 | [Device authority (Redis allow-list refresher)](./phase-1-telemetry/13-device-authority.md) | ⏸ | *deferred — see below* |
|
||||
|
||||
#### Deferred (resume after the real-device pilot test)
|
||||
|
||||
These three tasks are paused so we can get the service onto real hardware as fast as possible. They are paused, not cancelled — each must be completed before the service is considered production-ready.
|
||||
These two tasks are paused so we can get the service onto real hardware as fast as possible. They are paused, not cancelled — each must be completed before the service is considered production-ready.
|
||||
|
||||
- **1.10 Observability (Prometheus metrics).** Tracking via the placeholder `Metrics` interface for now. **Resume trigger:** as soon as the pilot is generating real traffic and we want to measure it, *or* before any second instance is deployed (without metrics, "consumer lag" and "unknown codec" alerts cannot fire).
|
||||
- **1.12 Production hardening.** Graceful shutdown is a stub today; uncaught-exception handlers are minimal. **Resume trigger:** before the pilot graduates to "always-on" or before any deployment that does rolling restarts. Acceptable for a manual pilot where we can stop/start the process by hand.
|
||||
- **1.13 Device authority (Redis allow-list refresher).** Default `AllowAllAuthority` accepts every IMEI; observability of `known | unknown` is moot until 1.10 lands. **Resume trigger:** when Directus has a `devices` collection publishing the allow-list to Redis, or when the operational picture demands rejecting unknown IMEIs (`STRICT_DEVICE_AUTH=true`).
|
||||
- **1.13 Device authority (Redis allow-list refresher).** Default `AllowAllAuthority` accepts every IMEI. **Resume trigger:** when Directus has a `devices` collection publishing the allow-list to Redis, or when the operational picture demands rejecting unknown IMEIs (`STRICT_DEVICE_AUTH=true`).
|
||||
|
||||
When resuming any of these, change the status from ⏸ back to ⬜ or 🟨 here and in the task file's status badge, and clear the deferral note in the task file.
|
||||
|
||||
|
||||
Reference in New Issue
Block a user