be48da9baa
src/observability/metrics.ts — full prom-client implementation. All 10
Phase 1 metrics registered (processor_consumer_reads_total,
_records_total, _lag, _decode_errors_total, processor_position_writes_total
{status}, _write_duration_seconds, processor_acks_total,
processor_device_state_{size,evictions_total}) plus nodejs_* defaults.
node:http server with /metrics, /healthz, /readyz. /readyz checks
redis.status === 'ready' AND a 5s-cached SELECT 1 Postgres probe.
processor_consumer_lag sampled every 10s via XINFO GROUPS, falling back
to a no-op when the consumer group hasn't been created yet.
src/main.ts — replaces the trace-logging shim with createMetrics() and
startMetricsServer(); shutdown closes the metrics server before
redis.quit() and pool.end().
test/metrics.test.ts — 22 unit tests: exposition format, every metric
type behaviour, all four HTTP endpoint paths including /readyz 503 cases.
test/pipeline.integration.test.ts — testcontainers Redis 7 +
TimescaleDB latest-pg16. Four scenarios: happy path with bigint+Buffer
attribute round-trip, idempotency on (device_id, ts), malformed payload
stays in PEL (decode_errors_total increments), writer failure → retry
(weaker variant per spec: stop Postgres before publish, restart, verify
row appears). Skip-on-no-Docker pattern verified — exits 0 without
Docker.
Dockerfile — multi-stage matching tcp-ingestion. EXPOSE 9090 only,
HEALTHCHECK on /readyz, image-source label points at processor repo.
.gitea/workflows/build.yml — single-job workflow mirroring
tcp-ingestion. Path filters cover src/, test/, build config, Dockerfile.
Portainer webhook step uncommented for :main auto-deploy.
compose.dev.yaml — local-build variant with Redis + TimescaleDB +
processor-dev for verifying Dockerfile changes without the registry
round-trip.
README.md — fleshed out from stub: quick-start, Docker build, deployment
note, env vars, tests (unit vs. integration), CI behavior. Flags the
deploy-side change needed: deploy/compose.yaml needs a TimescaleDB
service and a processor service entry added.
Verification: typecheck, lint clean; 134 unit tests passing across 8
files (+22 from this batch). pnpm test:integration runs cleanly under
the no-Docker skip pattern.
Phase 1 is now complete. Service is pilot-ready.
59 lines
4.5 KiB
Markdown
59 lines
4.5 KiB
Markdown
# Task 1.10 — Integration test (testcontainers Redis + Postgres)
|
|
|
|
**Phase:** 1 — Throughput pipeline
|
|
**Status:** 🟩 Done
|
|
**Depends on:** 1.5, 1.7, 1.8, 1.9
|
|
**Wiki refs:** —
|
|
|
|
## Goal
|
|
|
|
End-to-end pipeline test: spin up Redis 7 and TimescaleDB via testcontainers, boot the Processor against them, publish a synthetic `Position` to `telemetry:t`, verify the row appears in `positions` with byte-equivalent attribute decoding (bigint, Buffer included).
|
|
|
|
This is the integration test that proves the upstream contract from `tcp-ingestion` flows through end-to-end. Mirror `tcp-ingestion/test/publish.integration.test.ts`'s structure and skip-on-no-Docker pattern.
|
|
|
|
## Deliverables
|
|
|
|
- `test/pipeline.integration.test.ts`:
|
|
- `beforeAll`: start Redis container, start TimescaleDB container, run migrations, build a Processor instance pointed at both. If Docker is unavailable, log a clear skip message and set a flag so all `it` blocks early-return without failing.
|
|
- `afterAll`: stop the Processor, stop containers.
|
|
- Test 1: publish a Position with `bigint` and `Buffer` attributes via `XADD`; wait for the row in `positions` (poll, timeout 10s); assert `device_id`, `ts`, GPS fields, and a JSON round-trip of `attributes` matches the original (bigint as string, Buffer as base64).
|
|
- Test 2: publish two records with the same `(device_id, ts)`; verify only one row in `positions` (idempotency check).
|
|
- Test 3: publish a malformed payload (broken JSON) on the stream; verify `processor_decode_errors_total` increments and the bad entry stays in PEL (not ACKed).
|
|
- Test 4: simulate the writer failing once (e.g. by temporarily shutting Postgres mid-test, then bringing it back); verify the record gets retried and eventually lands.
|
|
|
|
- Use the **TimescaleDB image**, not stock `postgres:7-alpine`. Suggested: `timescale/timescaledb:latest-pg16`. Confirm the migration's `CREATE EXTENSION IF NOT EXISTS timescaledb` no-ops (extension already loaded).
|
|
- Use the same Vitest config split as `tcp-ingestion`: `vitest.integration.config.ts` with `hookTimeout: 120_000`, `testTimeout: 60_000`. Default `pnpm test` excludes `*.integration.test.ts`; opt-in via `pnpm test:integration`.
|
|
|
|
## Specification
|
|
|
|
### Skip-on-no-Docker pattern
|
|
|
|
Copy `tcp-ingestion/test/publish.integration.test.ts`'s pattern verbatim:
|
|
- Try to start the first container in `beforeAll`. On error, set `dockerAvailable = false`, log a warning, and return.
|
|
- Each `it` block early-returns with a `console.warn` if `!dockerAvailable`.
|
|
- This pattern was the fix for the CI test failure on the runner without Docker — keep it.
|
|
|
|
### Synthetic Position publishing
|
|
|
|
Reuse `serializePosition` from `tcp-ingestion`'s `publish.ts` if it can be imported (likely not — separate repos). Otherwise inline the encoding: a Position object → JSON.stringify with the bigint/Buffer replacer → `XADD telemetry:t * ts <iso> device_id <imei> codec 8E payload <json>`.
|
|
|
|
### Why test 4 (writer failure → retry)
|
|
|
|
This validates the core ACK semantics: if a write fails, the record stays pending, and re-delivery brings it back. Without this test, we have unit tests showing each piece behaves correctly, but no proof the pieces compose right. Skip-conditions: if simulating Postgres failure mid-test is too flaky in testcontainers, weaken to: stop Postgres before publishing, publish, start Postgres, verify row appears.
|
|
|
|
## Acceptance criteria
|
|
|
|
- [ ] `pnpm test:integration` runs all four scenarios green when Docker is available.
|
|
- [ ] Without Docker, the suite logs skip messages and exits 0 (does not fail).
|
|
- [ ] CI (`pnpm test`, unit only) does not run these — they are opt-in.
|
|
- [ ] First-run container pull is reasonable; subsequent runs are fast (testcontainers caches the image).
|
|
|
|
## Risks / open questions
|
|
|
|
- **Image pull on first CI run.** The TimescaleDB image is large (~700MB). If we ever wire integration tests into CI (separate job with Docker), pre-pulling may be required. Document but defer.
|
|
- **Test flakiness from polling.** Polling for "row appears in `positions`" uses a 10s timeout. If CI is slow, raise it. Don't replace polling with `await sleep(2000)` — that's reliably wrong.
|
|
|
|
## Done
|
|
|
|
`test/pipeline.integration.test.ts`: four scenarios (happy path with bigint+Buffer, idempotency, malformed payload stays pending, writer failure → retry after Postgres restart). Uses `timescale/timescaledb:latest-pg16`; skip-on-no-Docker pattern verified (exits 0 without Docker). `pnpm test:integration` runs 4 tests green with Docker, 4 skips without. *(pending commit SHA)*
|