Replaces the placeholder Metrics shim with a prom-client implementation
in src/observability/metrics.ts: all 10 Phase 1 metrics from the wiki
spec, plus nodejs_* defaults. Exposes /metrics, /healthz, /readyz over
node:http on METRICS_PORT (9090); /readyz returns 503 when Redis status
is not 'ready' or the TCP listener isn't bound.
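
The /readyz rule can be sketched as a small pure function. This is a minimal illustration, not the code in src/observability/metrics.ts: `readinessStatus` and its parameter names are hypothetical, and `redisStatus` is assumed to be the status string reported by the Redis client.

```typescript
// Hypothetical helper: maps the two readiness inputs to an HTTP status code.
// `redisStatus` is assumed to be the Redis client's status string, and
// `listenerBound` whether the device-facing TCP listener is bound.
function readinessStatus(redisStatus: string, listenerBound: boolean): 200 | 503 {
  return redisStatus === "ready" && listenerBound ? 200 : 503;
}

console.log(readinessStatus("ready", true));        // 200
console.log(readinessStatus("reconnecting", true)); // 503
console.log(readinessStatus("ready", false));       // 503
```

Keeping the decision in a pure function like this would let the HTTP handler stay a thin wrapper and make every readiness branch testable without a socket.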
The Metrics interface in src/core/types.ts is unchanged — adapter call
sites continue to use the same inc/observe shape. Only main.ts sees the
extended type that adds serializeMetrics().
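
A rough sketch of the narrow/extended type split described here. The method signatures are assumptions (the real ones live in src/core/types.ts and may differ), and `createMetrics` is a toy in-memory stand-in for the prom-client implementation, handling counters only.

```typescript
// Hypothetical shapes: the actual signatures in src/core/types.ts may differ.
type Labels = Record<string, string>;

interface Metrics {
  inc(name: string, labels?: Labels, value?: number): void;
  observe(name: string, value: number, labels?: Labels): void;
}

// Only the composition root (main.ts) would see this wider type.
interface SerializableMetrics extends Metrics {
  serializeMetrics(): Promise<string>;
}

// Toy in-memory stand-in for the prom-client implementation (counters only,
// labels ignored).
function createMetrics(): SerializableMetrics {
  const counters = new Map<string, number>();
  return {
    inc(name, _labels, value = 1) {
      counters.set(name, (counters.get(name) ?? 0) + value);
    },
    observe() {
      // Histograms are omitted in this sketch.
    },
    async serializeMetrics() {
      return Array.from(counters)
        .map(([name, total]) => `${name} ${total}`)
        .join("\n") + "\n";
    },
  };
}

// Adapter code depends on the narrow interface only.
function onRecordPublished(metrics: Metrics): void {
  metrics.inc("teltonika_records_published_total");
}

const metrics = createMetrics();
onRecordPublished(metrics);
onRecordPublished(metrics);
```

Because adapters accept `Metrics` rather than `SerializableMetrics`, swapping the shim for a real registry needs no change at call sites.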
Side effects:
- Dockerfile re-enables HEALTHCHECK (pointing at /readyz) and EXPOSE 9090.
- frame-ingested log downgraded back to debug now that
  teltonika_records_published_total is scrapeable.
- 19 new unit tests covering exposition format, all metric types, and
  every HTTP endpoint path. Total now 98 passing.
Note: deploy/compose.yaml still does not expose 9090. How Prometheus
reaches the service (host port vs. internal scraper on the same Docker
network) is a separate decision.
- Emit ISO-8601 timestamps and string level labels (info/warn/...) so
  Portainer's log viewer renders seconds and human-readable levels.
- Log ETIMEDOUT/ECONNRESET/EPIPE/ENOTCONN as info one-liners rather
  than warnings with stack traces; these are routine on cellular links.
- Add an info "frame ingested" line per accepted AVL frame so device
  activity is visible at info level until task 1.10 wires up prom-client.
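
The error classification above might look roughly like this. It is a sketch under assumptions: `socketErrorLine`, `LogLine`, and the message wording are hypothetical, and the real logger's field names may differ. Only the four error codes and the info/warn split come from this change.

```typescript
// Error codes treated as routine disconnects on cellular links.
const ROUTINE_SOCKET_ERRORS = new Set(["ETIMEDOUT", "ECONNRESET", "EPIPE", "ENOTCONN"]);

type Level = "info" | "warn";

// Hypothetical log line shape.
interface LogLine {
  time: string;               // ISO-8601 timestamp
  level: Level;               // string label rather than a numeric level
  msg: string;
  stack?: string | undefined; // only present for unexpected errors
}

// Hypothetical helper: builds the log line for a socket error.
function socketErrorLine(err: Error & { code?: string }, now: Date = new Date()): LogLine {
  const time = now.toISOString();
  if (err.code !== undefined && ROUTINE_SOCKET_ERRORS.has(err.code)) {
    // Routine disconnect: one line, no stack trace.
    return { time, level: "info", msg: `socket closed: ${err.code}` };
  }
  // Anything else keeps the warning level and the stack trace.
  return { time, level: "warn", msg: err.message, stack: err.stack };
}
```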
The Redis-publisher integration test uses testcontainers to spin up a real
Redis. On the Gitea CI runner, `container.start()` hangs (likely image-pull
delay or restricted Docker access), and the 60s beforeAll timeout fails the
suite even though both tests would ultimately skip. The skip-on-error path
only fires when start() throws, not when it hangs until the hook times out.
Fix: separate unit tests (default) from integration tests (opt-in). The
default `pnpm test` now runs only `test/**/*.test.ts`, excluding
`*.integration.test.ts`. A new `pnpm test:integration` script runs the
integration tests via `vitest.integration.config.ts`, with generous
hook and test timeouts for container startup.
CI runs `pnpm test` and is unaffected by Docker availability. Integration
tests can be run locally or in a future CI job that explicitly provisions
Docker.
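
For orientation, the default config could look something like the following. This is a sketch of the described split, not a copy of the repo's file; only the two glob patterns come from this change.

```typescript
// vitest.config.ts (used by the default `pnpm test`): unit tests only.
import { configDefaults, defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    include: ["test/**/*.test.ts"],
    // Keep Vitest's default excludes and additionally skip integration tests.
    exclude: [...configDefaults.exclude, "**/*.integration.test.ts"],
  },
});
```

`vitest.integration.config.ts` would presumably do the inverse: include only `test/**/*.integration.test.ts` and raise `hookTimeout` and `testTimeout`, with the script selecting it via `vitest run --config vitest.integration.config.ts`.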
- Multi-stage Dockerfile (Node 22 alpine, BuildKit cache, non-root user).
  HEALTHCHECK and metrics port (9090) deferred until task 1.10 ships;
  comments document how to resume.
- .gitea/workflows/build.yml: single build job following the pattern
  of other TRM repos (no services/container, ubuntu-latest direct).
  Tests + typecheck + lint inline; image tagged :main.
- compose.dev.yaml: local-build variant for verifying Dockerfile
  changes pre-push. Production deploy lives in the sibling deploy/ repo.
- .env.example documenting all runtime env vars.
- README updated to point at deploy/ for production and explain CI.
- Task 1.11 marked done (slim variant) in ROADMAP and task file.
Tasks 1.1-1.9 marked done with their landing commit SHAs. Tasks 1.10
(observability), 1.12 (production hardening), and 1.13 (device
authority) marked paused with explicit resume triggers, because pilot
deployment on real Teltonika hardware takes priority. Task 1.11
remains next, in slimmed form for the pilot (no /readyz healthcheck,
since the metrics endpoint is part of paused 1.10).
- Bounded in-memory queue (default capacity 10000); overflow throws
  PublishOverflowError so the framing layer skips ACK and the device
  retransmits.
- Background worker drains the queue via XADD with approximate MAXLEN (~)
  trimming.
- JSON serialization with sentinel encoding for bigint/Buffer/Date; correctly
  handles Buffer.prototype.toJSON firing before the replacer.
- AdapterContext.publish(position, codec) with codec-label closure at dispatch
  in adapters/teltonika/index.ts; zero changes to the three codec parsers.
- connectRedis with retry-on-startup; main.ts wires the full pipeline.
- installGracefulShutdown stubbed (full hardening in task 1.12).
- 19 new tests (17 unit + 2 Docker-conditional integration). Total 81 passing.
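
Two of the points above, overflow backpressure and the toJSON ordering, can be illustrated together. This is a sketch under assumptions: `BoundedQueue`, `encode`, and the `$bigint`/`$date`/`$buffer` sentinel keys are hypothetical; only PublishOverflowError and the default capacity are named in this change.

```typescript
class PublishOverflowError extends Error {
  constructor(capacity: number) {
    super(`publish queue full (capacity ${capacity})`);
    this.name = "PublishOverflowError";
  }
}

// Hypothetical queue: throws on overflow instead of dropping, so the caller
// can withhold the ACK and let the device retransmit.
class BoundedQueue<T> {
  private readonly items: T[] = [];
  private readonly capacity: number;

  constructor(capacity: number = 10000) {
    this.capacity = capacity;
  }

  push(item: T): void {
    if (this.items.length >= this.capacity) {
      throw new PublishOverflowError(this.capacity);
    }
    this.items.push(item);
  }

  // Called by the draining worker; returns undefined when the queue is empty.
  shift(): T | undefined {
    return this.items.shift();
  }
}

// Hypothetical sentinel encoder. JSON.stringify calls toJSON() before the
// replacer, so for Buffer and Date the second replacer argument has already
// lost its original type. A regular function (not an arrow) receives the
// holder object as `this`, and this[key] recovers the raw value.
function encode(value: unknown): string {
  return JSON.stringify(
    value,
    function (this: Record<string, unknown>, key: string, converted: unknown) {
      const raw = this[key];
      if (typeof raw === "bigint") return { $bigint: raw.toString() };
      if (raw instanceof Date) return { $date: raw.toISOString() };
      if (Buffer.isBuffer(raw)) return { $buffer: raw.toString("base64") };
      return converted;
    },
  );
}
```

An alternative would be to detect the `{ type: "Buffer", data: [...] }` shape that Buffer.prototype.toJSON produces, but reading the raw value from the holder handles Date the same way and avoids false positives on look-alike objects.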