note: Phase 1.5 IMEI/UUID fix + Phase 2 cold-load fix + positions decision
Three log entries plus matching wiki updates from yesterday and today's work:

[2026-05-03] Phase 1.5 live broadcast fix (yesterday). snapshot.ts and device-event-map.ts joined positions.device_id (text/IMEI) directly against entry_devices.device_id (uuid). Subscribe returned an empty snapshot (Postgres 42883), and device-event-map cache misses meant the streaming path classified every record as orphan. Fixed in processor f2c64a2: both queries hop through the devices table. Embedded lesson: integration test fixtures must mirror production join shapes — the existing fixture had entry_devices.device_id text, masking the prod type mismatch.

[2026-05-04 morning] Positions-as-collection abandoned. Pushed db-init/004 (PRIMARY KEY device_id, ts) and the always-re-apply runner; boot logs confirmed re-apply 001/002/003, apply 004. But Directus still emits "Collection 'positions' doesn't have a primary key column": Directus introspection requires a SINGLE-column PK, and TimescaleDB requires the partition column in every unique index — these constraints are mutually exclusive on the positions hypertable. Operator faulty-flag UI re-scoped to a custom Directus endpoint extension, deferred until after Rally Albania 2026-06-06. Migration 004 left in place (harmless, redundant with positions_device_ts).

[2026-05-04 afternoon] Phase 2 cold-load truly works. Three independent lifecycle bugs in spa/src/map/core (committed separately in trm/spa, 8859c95): Map.loaded() deadlock on styles without `sprite`; styledata cascade triggered by installSprites' addImage calls (87k events in 28s); BasemapSwitcher remount cycle re-firing setBasemap on every gate cycle. Verified end-to-end via Playwright on a local backends-only stack (trm/deploy/compose.local.yaml, e9592cd). Resolves the "SPA Phase 2 is the biggest fish" debt — Phase 3 dogfood readiness is next.
Wiki updates:

- position-record.md: §"device_id is the IMEI — not the business devices.id" capturing the storage shape and join chain through devices
- processor-ws-contract.md: corrected deviceId field semantics (Phase 1 ships IMEI text on the wire, not devices.id uuid as originally spec'd); flagged as documented divergence; added Server-side data resolution section with the snapshot + device-event-map SQL
- directus-schema-draft.md §"Position flagging": full architectural decision record for the custom-endpoint approach, replacing the earlier "exposed as a Directus collection" plan
@@ -196,3 +196,72 @@ Wiki realignments landed in this session:
1. **Runner gap — `apply-db-init.sh` doesn't verify schema state on subsequent boots.** The runner records success in `migrations_applied` and trusts that exclusively; in-file assertion blocks (e.g. the `DO $$ ... RAISE EXCEPTION` block at the bottom of `002_positions_hypertable.sql`) only run during apply, not on skip. Out-of-band drops produce silent drift — exactly today's failure mode. Two cheap mitigations: (a) re-run idempotent files unconditionally (cheap given `IF NOT EXISTS` everywhere), or (b) per-migration `_check.sql` companion files the runner executes even when skipping. Worth a hardening task in directus's planning.
2. **Positions hypertable as a Directus collection — primary-key blocker.** Discussed the design tension: positions DDL lives in `directus/db-init/` (TimescaleDB-specific, must exist before Directus boots), but Directus refuses to register the table as a collection because `002_positions_hypertable.sql` deliberately omits a PRIMARY KEY (per its divergence note 6, calling unique-index "more idiomatic" for hypertables). Directus introspection requires a PK to expose the table — log evidence: `WARN: Collection "positions" doesn't have a primary key column and will be ignored`. To enable the operator `faulty` workflow described in [[directus-schema-draft]], a future migration `004_positions_primary_key.sql` would `ALTER TABLE positions ADD PRIMARY KEY (device_id, ts)` and `DROP INDEX positions_device_ts` (now redundant). PKs that include the partition column are legal on hypertables; the divergence note's preference for unique-index is a soft style choice, not a correctness constraint. Not done in this session — pending user go-ahead.
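The runner gap in item 1 can be made concrete with a small sketch of mitigation (b). This is hypothetical shape, not the real `apply-db-init.sh` (which is a shell script): migrations recorded in `migrations_applied` still get their companion `_check.sql` executed at boot, so out-of-band drops surface instead of drifting silently.

```typescript
// Hypothetical planner for mitigation (b): applied migrations with a
// companion check file are checked on every boot instead of skipped outright.
type Migration = { name: string; hasCheck: boolean };

function planBootActions(
  migrations: Migration[],
  applied: Set<string>,
): string[] {
  const actions: string[] = [];
  for (const m of migrations) {
    if (!applied.has(m.name)) {
      actions.push(`apply ${m.name}`); // first boot: run the migration file
    } else if (m.hasCheck) {
      actions.push(`check ${m.name}`); // skip the apply, still run <name>_check.sql
    } else {
      actions.push(`skip ${m.name}`);
    }
  }
  return actions;
}
```

Either mitigation closes the same hole; (a) trades per-boot cost for zero extra files, (b) keeps apply idempotency concerns out of the check path.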
## [2026-05-03] note | Live broadcast SQL fix — IMEI/UUID translation
Phase 1.5 was effectively dead in production: `snapshot.ts` and `device-event-map.ts` joined `positions.device_id` (text/IMEI) directly against `entry_devices.device_id` (uuid FK to `devices.id`). Two coordinated symptoms:
1. **Snapshot crash.** Subscribe → `subscribed { snapshot: [] }` because Postgres rejected the join with `operator does not exist: uuid = text` (42883); registry caught the error and returned an empty snapshot, masking the failure in surface UX.
2. **Streaming silence.** `device-event-map` cache keyed on `entry_devices.device_id` (uuid); `broadcast.ts:141` looked up by `position.device_id` (imei). Cache missed every record → all positions classified as orphans → no fan-out, no errors.
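Symptom 2 reduces to a key-type mismatch in the cache. A minimal sketch (hypothetical values, not the real `device-event-map.ts` code) of why every lookup missed before the fix, and why the `d.imei AS device_id` alias fixes it:

```typescript
// Broken: cache keyed by entry_devices.device_id (uuid). The streaming path
// looks up by position.device_id (IMEI text), so every lookup misses.
const brokenCache = new Map<string, string[]>([
  ["00000000-0000-0000-0000-000000000001", ["event-a"]], // uuid key (hypothetical)
]);
// Fixed: cache keyed by IMEI via `d.imei AS device_id`.
const fixedCache = new Map<string, string[]>([
  ["350424064163619", ["event-a"]],
]);

function classify(
  cache: Map<string, string[]>,
  positionDeviceId: string,
): "fan-out" | "orphan" {
  return cache.has(positionDeviceId) ? "fan-out" : "orphan";
}
```

With the broken keying, `classify` returns `"orphan"` for every real position — no errors, just silence, which is exactly why the bug hid.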
`broadcast.ts:53–64` had documented the IMEI/UUID divergence ("Phase 1's positions table stores the raw IMEI as device_id") but the join code in `snapshot.ts` and `device-event-map.ts` didn't follow through. Fix in `processor` `f2c64a2`: both queries hop through the `devices` table (`devices.imei = positions.device_id`, `devices.id = entry_devices.device_id`); `device-event-map` aliases `d.imei AS device_id` so cache keys remain IMEI strings and `broadcast.ts` is unchanged.
End-to-end verified via Playwright on `app.dev.trmtracking.org/ws-live` after Komodo redeploy: snapshot returns the live device's last position; 12 streamed `position` frames in a 30s window from IMEI `350424064163619` on event `9ddeba93-...` (Rally Albania 2026 dev seed scaffolded today via `directus-dev` MCP — org `msc-albania`, vehicle, devices, entry+entry_devices).
**Durable lesson — integration-test fixture schemas must mirror production join shapes.** `processor/test/fixtures/test-schema.sql` shipped with `entry_devices.device_id text` to match the broken production join, so 178/178 unit tests + 6 integration scenarios passed against fiction. Production exposed it on first real subscribe. Updated the fixture schema to add a `devices` table and change `entry_devices.device_id` to `uuid REFERENCES devices(id)`; integration test seed now inserts `devices` first.
Wiki updates landed in this session:
- [[position-record]] — added §"`device_id` is the IMEI — not the business `devices.id`" capturing the storage shape, the join chain through `devices`, and why this is true ([[plane-separation]] means [[tcp-ingestion]] writes positions without a business-plane round-trip, so the only identifier it has is the IMEI).
- [[processor-ws-contract]] — corrected the `deviceId` field semantics (Phase 1 implementation ships **IMEI text** on the wire, not `devices.id` uuid as originally specified — flagged as a documented divergence rather than a silent contract change). Added a "Server-side data resolution" section with the snapshot SQL and device-event-map SQL so any replacement gateway service reproduces the join shape rather than rediscovering it.
Three open architectural debts still outstanding (carried over from the earlier 2026-05-03 entry):
1. `apply-db-init.sh` runner gap — schema verification on skip.
2. `positions` hypertable PK — `004_positions_primary_key.sql` would unblock Directus collection registration for the operator `faulty` workflow.
3. Wire-format IMEI/UUID closure — Phase 2 question, not blocking dogfood.
## [2026-05-04] note | Positions-as-collection abandoned; faulty-flag UI deferred to custom endpoint
Pushed `directus/db-init/004_positions_primary_key.sql` (PK `(device_id, ts)`) plus the runner hardening (`apply-db-init.sh` always re-applies, treats `migrations_applied` as a log). Boot logs confirmed the runner behaved as designed: `re-apply 001/002/003`, `apply 004`, summary `4 total (1 first-apply, 3 re-apply)`. **But Directus still emits `WARN: Collection "positions" doesn't have a primary key column and will be ignored`.**
Root cause: Directus introspection requires a **single-column** primary key per collection (confirmed against Directus docs); composite PKs trigger the exact warning above. TimescaleDB requires the partitioning column (`ts`) to be in **every** unique index on a hypertable. The two constraints are mutually exclusive on `positions` — no PK shape satisfies both. Migration 004's premise was wrong.
**Decision (architectural):** the operator faulty-flag UI ships as a **custom Directus endpoint extension**, not as a Directus-introspected collection. Endpoint exposes flag candidates and accepts PATCH-by-`(device_id, ts)`, emits the recompute webhook, enforces the same dynamic-filter authorization. Operator interface lives in the SPA (or a custom Directus module) consuming the endpoint. Migration 004 is left in place — the PK is harmless (redundant with `positions_device_ts` unique index) and may have other uses.
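Since the endpoint extension is deferred and unbuilt, only its request shape is decided; a hedged sketch of the body validation such an endpoint would do before touching `positions.faulty` (all names and shapes assumed, not implemented anywhere yet):

```typescript
// Hypothetical validation for PATCH-by-(device_id, ts). The real endpoint
// extension is deferred post-Albania; this is shape only.
type FlagPatch = { device_id: string; ts: number; faulty: boolean };

function parseFlagPatch(body: unknown): FlagPatch | null {
  if (typeof body !== "object" || body === null) return null;
  const b = body as Record<string, unknown>;
  if (typeof b.device_id !== "string" || b.device_id.length === 0) return null; // IMEI text
  if (typeof b.ts !== "number" || !Number.isFinite(b.ts)) return null; // epoch ms
  if (typeof b.faulty !== "boolean") return null;
  return { device_id: b.device_id, ts: b.ts, faulty: b.faulty };
}
```

A valid patch then feeds the UPDATE keyed on `(device_id, ts)` plus the recompute webhook; anything else is rejected before touching the hypertable.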
**Deferred until after Rally Albania 2026-06-06.** Per the dogfood event scope ("manual workflows for penalties/results/faulty-flag"), the flag stays manual SQL for the test event.
Wiki update: [[directus-schema-draft]] §"Position flagging" now carries the full decision record (the previous "exposed as a Directus collection" claim was the wrong premise) and the corresponding bullet in "Decisions made" was rewritten. [[position-record]] delegates to the schema draft and required no edit. Resolves debt #2 from the previous entry — not by implementing it as planned, but by abandoning the approach.
Open debts now:
1. ~~`apply-db-init.sh` runner gap~~ — resolved (always-re-apply runner).
2. ~~`positions` hypertable PK / Directus collection~~ — abandoned (custom endpoint deferred post-Albania).
3. Wire-format IMEI/UUID closure — Phase 2 question, not blocking dogfood.
4. **SPA Phase 2 live map UX** — markers, trails, rAF coalescer, event picker. Not built. Biggest remaining blocker for the dogfood event (without the map there's nothing user-visible at Rally Albania).
## [2026-05-04] note | SPA Phase 2 cold-load truly works — three lifecycle bugs found and fixed
Started the day investigating "Phase 2 done in commits, but `/monitor` shows only the map canvas — no controls." [[react-spa]]'s `phase-2-live-map/README.md` claimed all 9 tasks shipped, ROADMAP.md still said `⬜ Not started` (drift). Code archaeology confirmed the README — every task had a `feat:` commit + a `docs: backfill` companion. So Phase 2 was *committed* but never *verified* against the production environment, and yesterday's processor IMEI/UUID fix was the first time the WS path produced anything for `/monitor` to consume.
Stood up a backends-only local stack (`trm/deploy/compose.local.yaml`, new file — Postgres + Redis + Directus + processor + tcp-ingestion, no Traefik, no SPA service; SPA runs via `pnpm dev` and Vite's dev proxy) so the cold-load could be debugged with source maps + devtools. Seeded the same Rally Albania 2026 shape via the `directus-local` MCP. Three independent lifecycle bugs in `spa/src/map/core/`:
1. **`Map.loaded()` deadlock.** `<MapView>`'s `onStyleData` waited for `_map.loaded()` before opening the gate. `Map.loaded()` checks `style.imageManager.isLoaded()` internally, and that flag only flips true after MapLibre fetches a `sprite` URL declared in the style JSON. Our styles in `map/core/styles.ts` deliberately omit `sprite` — we manage images ourselves via `installSprites()`. So the predicate stayed false forever, gate never opened. Confirmed via React fiber dump: `MapView.useMapReady() === false` while every other precondition was satisfied.
2. **`installSprites` → `styledata` cascade.** Even after replacing `loaded()` with `isStyleLoaded() && areTilesLoaded()`, instrumentation showed 87,494 styledata events in 28 seconds. Cause: `installSprites()` calls `map.addImage()` ~32 times; each `addImage` triggers a `styledata` event; each event re-entered `onStyleData`, scheduling another `installSprites`, ad infinitum. The original code masked this by virtue of `loaded()` always returning false — the loop never reached `installSprites` at all. Fix: stop listening to `styledata` (too noisy — fires for every internal style mutation including image registration, source data updates), listen to `idle` instead, scoped per-`setStyle` via `_map.once('idle', …)`. `idle` only fires when style + tiles + transitions are all settled, which is exactly the lifecycle signal we need. Also extracted a `setBasemap(style)` API so style swaps go through one supported entry point that closes the gate, calls `setStyle`, and schedules the handshake.
3. **`BasemapSwitcher` bootstrap remount cycle.** With (1) and (2) fixed, the map briefly opened then slammed shut every 350ms. Cause: `BasemapSwitcher` is a child of `<MapView>`, so it unmounts when the gate closes and remounts when it reopens. Its bootstrap `useEffect` (which applies the user's saved basemap preference at first mount) re-fired on every remount, calling `setBasemap` again, which closed the gate again. Fix: a module-level `_basemapBootstrapped` flag so the bootstrap fires once per page load, not once per mount. This bug had been present in the original code too — it was the deeper cause of the gate never opening; (1) and (2) were proximate causes that masked it.
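The interaction of fixes (2) and (3) can be sketched with a stub emitter standing in for the MapLibre map (that `addImage` fires a `styledata` event is real MapLibre behaviour; the classes and names here are stand-ins, not the real `map-view.tsx` code). Listening once per style swap on `idle` means the addImage burst from sprite installation cannot re-enter the handshake:

```typescript
type Handler = () => void;

// Stub map: `once`-registered handlers run a single time per event name,
// and addImage fires styledata, mirroring MapLibre's style mutation events.
class StubMap {
  private once_ = new Map<string, Handler[]>();
  styledataEvents = 0;

  once(ev: string, h: Handler): void {
    this.once_.set(ev, [...(this.once_.get(ev) ?? []), h]);
  }
  emit(ev: string): void {
    if (ev === "styledata") this.styledataEvents++;
    const pending = this.once_.get(ev) ?? [];
    this.once_.set(ev, []); // once-semantics: consume handlers before running
    for (const h of pending) h();
  }
  addImage(): void {
    this.emit("styledata"); // image registration mutates the style
  }
}

let readyFires = 0;
function scheduleStyleLoadHandshake(map: StubMap): void {
  map.once("idle", () => {
    for (let i = 0; i < 32; i++) map.addImage(); // installSprites()
    readyFires++; // gate opens exactly once per setStyle
  });
}

const map = new StubMap();
scheduleStyleLoadHandshake(map);
map.emit("idle"); // style + tiles + transitions settled
map.emit("idle"); // later idles are ignored — handshake already consumed
```

Had the handshake listened on `styledata` instead, each of the 32 `addImage` calls would have re-entered it and scheduled sprite installation again — the cascade described in (2).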
End-to-end Playwright verification on local: cold reload `/monitor` → all controls render → EventPicker dropdown opens on click → basemap swap (Esri → Topo) completes with one clean handshake cycle (`scheduled → idle-fired @ +316ms → sprites-installed → ready-true`). Body text reads "Tiles © OpenTopoMap … Rally Albania 2026 Satellite Topo Street TRAILS None Selected All Follow Live" — every Phase 2 component mounted. The "Live" connection chip means the WS to processor connected via Vite's `/ws-live` proxy (since both Directus and processor are bound to localhost on the host, same-origin holds via the proxy).
Side-issue surfaced and resolved during the same session: with tcp-ingestion running locally, packets from the user's WAN port-forward (5027 → the PC) weren't reaching the container. Root cause was not Docker — `wslrelay.exe` (Microsoft's WSL2 port forwarder) listens only on `127.0.0.1` regardless of the `0.0.0.0:5027:5027` binding inside the WSL VM. Bridged with one elevated `netsh interface portproxy add v4tov4 listenaddress=0.0.0.0 listenport=5027 connectaddress=127.0.0.1 connectport=5027`. Long-term cleanup: WSL2 mirrored networking mode in `~/.wslconfig`. Not Phase 2 work, but logged here for future reference if anyone else stumbles into it.
Wiki/planning updates landed in the same push:
- `trm/spa/.planning/ROADMAP.md` — Phase 2 row flipped from `⬜ Not started` to `🟩 Done` with the last commit reference.
- `trm/spa/src/map/core/map-view.tsx` — replaces `styledata`-driven gate with `idle`-event handshake, exposes `setBasemap()` as the supported style-swap entry, prepends (not appends) the map container so floating cards paint above. Header docstring on `scheduleStyleLoadHandshake` explains the `idle`-vs-`styledata` rationale so the next person doesn't revert it.
- `trm/spa/src/map/core/basemap-switcher.tsx` — uses `setBasemap()`, gates the bootstrap useEffect with a module-level flag.
- `trm/deploy/compose.local.yaml` (new file) — Postgres + Redis + Directus + processor + tcp-ingestion bound to host ports for local debugging via Vite dev proxy. No Traefik, no SPA service. Header comment documents the `WEBSOCKETS_REST_PATH` divergence from `compose.dev.yaml` (default `/websocket` here vs `/ws-business` on dev, because Vite's proxy rewrites `/ws-business` → `/websocket`).
Resolves debt #4 from the previous entry — Phase 2 cold-load is verified end-to-end. Open debts:
1. ~~`apply-db-init.sh` runner gap~~ — resolved.
2. ~~`positions` hypertable PK / Directus collection~~ — abandoned.
3. Wire-format IMEI/UUID closure — Phase 2 backend question, not blocking dogfood.
4. ~~SPA Phase 2 cold-load lifecycle~~ — resolved today; pending push to `dev`.
5. **Phase 3 dogfood readiness** — error boundaries, mobile responsive baseline, per-device detail panel, operator-friendly empty/loading states. The next biggest fish for Rally Albania 2026-06-06.
@@ -2,7 +2,7 @@
title: Position Record
type: concept
created: 2026-04-30
updated: 2026-05-01
updated: 2026-05-03
sources: [gps-tracking-architecture, teltonika-ingestion-architecture, teltonika-data-sending-protocols]
tags: [data-model, boundary-contract]
---
@@ -68,3 +68,30 @@ The Position type above is the **wire shape** — what [[tcp-ingestion]] produce
- **`faulty: boolean`** (default `false`) — set after the fact by track operators through [[directus]] when a position is unrealistic (jumpy GPS, impossible coordinate/speed). [[processor]] evaluators filter `WHERE faulty = false` on every read; flagged positions are excluded from peak-speed calculations, crossing detection, and recompute. See the operator workflow in the [[directus-schema-draft]].
This field exists only at rest. Ingestion and the live channel never see it; it has no meaning until a human reviews the data.
## `device_id` is the IMEI — not the business `devices.id`
`Position.device_id` (and the `positions.device_id` column it lands in) is the **IMEI text string** announced by the device during the Teltonika handshake — the vendor-issued hardware identifier. It is **not** the [[directus]] business key `devices.id` (a uuid).
Every join from positions into the business plane must translate between the two via the `devices` table:
```
positions.device_id (text/IMEI)
   ⇅
devices.imei (text) ⇆ devices.id (uuid)
   ⇅
entry_devices.device_id (uuid FK)
   ⇅
entries.id → entries.event_id
```
Any query that joins `positions` directly to `entry_devices` (or any uuid-keyed table) hits Postgres `42883` — `operator does not exist: uuid = text`. This bit Phase 1.5's live broadcast on 2026-05-03; both `snapshot.ts` and `device-event-map.ts` shipped with the direct (broken) join, and the integration-test fixture had `entry_devices.device_id text` to match — so the production type mismatch was masked by a deliberately simplified fixture. Fixed in `processor` `f2c64a2`. Test fixtures must mirror production join shapes from now on.
The wire format from the Processor's WebSocket also carries the IMEI as `deviceId` (not the business uuid) — see [[processor-ws-contract]] §"Field semantics — `deviceId`" for that side of the contract.
Why store the IMEI rather than the uuid:
- The IMEI is what the device knows about itself — it doesn't need to look up its own row in [[directus]] before reporting.
- [[tcp-ingestion]] writes positions without any business-plane round-trip ([[plane-separation]]), so the only identifier it has at write time is the IMEI.
- Devices can move between `entry_devices` rows across events; positions are an immutable per-IMEI record.
If/when a Phase 2 redesign moves to uuid-keyed positions, this is the section to revise — and `processor` `src/live/snapshot.ts` + `src/live/device-event-map.ts` both lose the translation hop.
@@ -429,9 +429,17 @@ Devices emit faulty data — jumpy GPS, impossible coordinates, unrealistic spee
### Mechanism
The positions hypertable in [[postgres-timescaledb]] carries a `faulty boolean DEFAULT false` column. The hypertable is **exposed as a Directus collection** (read + update) so operators can flip the flag through the admin UI like any other row. Position records are owned by the [[processor]] (write side, telemetry plane), but the `faulty` flag is exclusively a business-plane operator concern.
The positions hypertable in [[postgres-timescaledb]] carries a `faulty boolean DEFAULT false` column. Position records are owned by the [[processor]] (write side, telemetry plane), but the `faulty` flag is exclusively a business-plane operator concern.
Permission scope: operators with role `race-director` (or a more specific `track-operator` role added to the `organization_users.role` enum if granularity is needed) get update access to `positions.faulty`, scoped via the same dynamic-filter Policy pattern — only positions whose `device_id` belongs to a device entered in an event of the operator's org.
> **Architectural decision (2026-05-04): operator UI ships as a custom Directus endpoint extension, not as a Directus-introspected collection.**
>
> The original plan exposed `positions` as a regular collection so operators could flip the flag from the admin UI. That doesn't work: Directus's schema introspection requires a **single-column** primary key per collection, and TimescaleDB requires the partitioning column (`ts`) to be in **every** unique index on a hypertable. These two constraints are incompatible — any PK that satisfies Directus violates the hypertable contract, and vice versa. Confirmed empirically (db-init migration 004 added `PRIMARY KEY (device_id, ts)`; Directus still emits `WARN: Collection "positions" doesn't have a primary key column and will be ignored`) and against Directus docs (collections require a single PK field).
>
> The forward path is a custom **endpoint extension** in [[directus]] that surfaces flag candidates, accepts PATCH-by-`(device_id, ts)`, and emits the same recompute webhook described below. The operator interface lives in the SPA (or a custom Directus module) consuming that endpoint — `positions` itself is never registered as a collection. Migration 004 is left in place; the PK is harmless (redundant with the existing unique index) and we may want it for other reasons later.
>
> **Deferred until after Rally Albania 2026-06-06.** For the dogfood event the faulty-flag workflow is manual SQL, consistent with the "manual workflows for penalties/results/faulty-flag" scope cut on that event.
Permission scope (when the endpoint extension is built): operators with role `race-director` (or a more specific `track-operator` role added to the `organization_users.role` enum if granularity is needed) authorize against the endpoint, scoped via the same dynamic-filter logic — only positions whose `device_id` belongs to a device entered in an event of the operator's org.
### Effect on the processor
@@ -470,7 +478,7 @@ This is a third recompute kind alongside formula recompute (cheap, snapshot arit
- **Speed limit penalties are progressive (slice-by-slice).** Each bracket contributes only to the portion of the input within its range — same as income tax. `peak_overspeed × per-bracket-rate` summed across brackets the peak crossed.
- **`retroactive` defaults differ by what's edited.** Formulas default `true` (math fixes apply across the field). Geometry defaults `false` (physical crossings stand on their own). Per-edit override at save time.
- **`entry_penalties` snapshot inputs and rule rows.** Recompute is pure arithmetic from snapshots, not re-derivation from raw GPS — the rare exception being geometry retroactive changes.
- **Positions carry a `faulty` flag.** Operator-controlled, default `false`, set after the fact through Directus when a GPS reading is unrealistic. [[processor]] filters `WHERE faulty = false` on every read; flagging a position triggers a windowed recompute of affected penalties. The hypertable is exposed as a Directus collection for this workflow.
- **Positions carry a `faulty` flag.** Operator-controlled, default `false`, set after the fact when a GPS reading is unrealistic. [[processor]] filters `WHERE faulty = false` on every read; flagging a position triggers a windowed recompute of affected penalties. **The flag is *not* exposed via a Directus collection** — Directus's single-column-PK requirement collides with TimescaleDB's partition-column-in-unique-index requirement on `positions`. Operator UI ships as a custom Directus endpoint extension (deferred until after Rally Albania 2026-06-06; manual SQL for the dogfood event). See "Position flagging" section above for the full decision record.
- **Start order is per-stage, declarative, strategy-driven.** Stages carry `start_order_strategy` + params, materialized into `entry_segment_starts` at stage-open time, scoped per category. Penalties never feed the next stage's grid; only clean SS time does. Operator overrides via `manual_override` for late arrivals.
- **Stages have a `role`.** `prologue` / `regular` / `epilogue`. Drives default strategy choice and lets the standings logic exclude the prologue from overall-time totals when a federation requires that. Most rallies = one prologue + N regulars; some end with an epilogue seeded by inverse-of-overall.
- **CP missing vs CP-late-past-closing are distinct event types** with the same penalty formula. Rally Albania §12.4 (missing) and §12.6 (arrived after closing hour) both pay "worst valid + 120 min" but trigger on different processor signals — one from "no crossing detected within stage window," one from "crossing detected but after `time_control_closed_at`." Both surface as `entry_penalties` rows with distinct `type`s sharing a formula row.
@@ -161,7 +161,7 @@ Field semantics:
|---|---|---|---|
| `type` | `"position"` | yes | Discriminator. |
| `topic` | string | yes | Echoes the subscription. Allows multiplexing on one connection. |
| `deviceId` | uuid | yes | The `devices.id` (not the IMEI). SPA looks up device → entry → vehicle/crew via TanStack Query against [[directus]]. |
| `deviceId` | string | yes | **Phase 1: the IMEI** (vendor identifier, e.g. `"350424064163619"`) — same value as `Position.device_id` per [[position-record]]. Originally specified as `devices.id` (uuid) here; the implementation diverged because [[tcp-ingestion]] only knows the IMEI at write time and the live channel ships the same identifier through end-to-end. SPA joins `deviceId` → `devices.imei` to look up entry/vehicle/crew via TanStack Query against [[directus]]. Closing this divergence (uuid on the wire) is a Phase 2 question; not blocking dogfood. |
| `lat` / `lon` | number (degrees, WGS84) | yes | GPS coordinates. **Coordinate order in JSON is `lat`/`lon`** (not `[lon,lat]` GeoJSON ordering — that conversion happens in the SPA). |
| `ts` | number (epoch milliseconds, UTC) | yes | Authoritative timestamp from the device's GPS fix. **Always use this, never `Date.now()` on the client.** |
| `speed` | number (km/h) | optional | Omitted if device reports speed=0 with invalid GPS fix (per [[teltonika]] convention). |
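The Phase 1 `deviceId` divergence implies a specific client-side join; a minimal sketch of what the SPA's lookup amounts to (types hypothetical — the real SPA resolves this through TanStack Query, not an array scan):

```typescript
// Frames carry the IMEI in Phase 1, so entry/vehicle/crew resolution must
// go through devices.imei, not devices.id. Types are illustrative only.
type Device = { id: string; imei: string };
type PositionFrame = { deviceId: string }; // Phase 1: IMEI text

function resolveBusinessId(
  frame: PositionFrame,
  devices: Device[],
): string | undefined {
  return devices.find((d) => d.imei === frame.deviceId)?.id;
}
```

If the divergence is ever closed (uuid on the wire), this join disappears and the SPA keys directly on `devices.id`.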
@@ -203,6 +203,43 @@ On reconnect, the client **must re-subscribe to all previously-active topics**.
The server should accept reconnects from the same user without rate-limiting at pilot scale. Phase 3 may add a per-user concurrent-connection cap.
## Server-side data resolution
The producer (currently [[processor]]; potentially the lifted-out gateway service in [[live-channel-architecture]] §"Scale considerations") must answer two queries against [[postgres-timescaledb]] to honour this contract. Both are documented here so any replacement implementation reproduces the join shape rather than rediscovering it.
### 1. Snapshot at subscribe time
For `event:<eventId>`, return the **latest non-faulty position per device** registered to the event. Implemented as `DISTINCT ON (p.device_id) ... ORDER BY p.device_id, p.ts DESC`:
```sql
SELECT DISTINCT ON (p.device_id)
  p.device_id, p.latitude, p.longitude, p.ts, p.speed, p.angle
FROM positions p
JOIN devices d        ON d.imei = p.device_id   -- text = text
JOIN entry_devices ed ON ed.device_id = d.id    -- uuid = uuid
JOIN entries e        ON e.id = ed.entry_id
WHERE e.event_id = $1
  AND p.faulty = false
ORDER BY p.device_id, p.ts DESC;
```
### 2. Device → events map (for streaming fan-out)
Refreshed on a configurable cadence (default 30 s, env var `LIVE_DEVICE_EVENT_REFRESH_MS`). The map is keyed on **IMEI** because [[redis-streams]] payloads carry the IMEI as `position.device_id`:
```sql
SELECT d.imei AS device_id, e.event_id
FROM entry_devices ed
JOIN devices d ON d.id = ed.device_id
JOIN entries e ON e.id = ed.entry_id;
```
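The rows that query returns feed an in-memory map; a sketch of the IMEI-keyed shape it implies (the real `device-event-map.ts` may structure this differently):

```typescript
// IMEI-keyed device→events map built from the query above. A device entered
// in several events fans out to all of them. Shape assumed, not verbatim.
type Row = { device_id: string; event_id: string }; // device_id = d.imei alias

function buildDeviceEventMap(rows: Row[]): Map<string, Set<string>> {
  const map = new Map<string, Set<string>>();
  for (const r of rows) {
    const events = map.get(r.device_id) ?? new Set<string>();
    events.add(r.event_id);
    map.set(r.device_id, events);
  }
  return map;
}
```

Keying on the IMEI is what lets the streaming path look up `position.device_id` directly, with no per-record translation.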
### Why the `devices` translation hop is mandatory
`positions.device_id` is **text/IMEI**; `entry_devices.device_id` is **uuid FK** to `devices.id`. Joining them directly produces Postgres `42883` (`operator does not exist: uuid = text`). The `devices` table is the bridge — it carries the only column (`devices.imei`) that has the same type as `positions.device_id` *and* the only column (`devices.id`) the entry/event chain references. Skip the hop and the snapshot crashes and the device→event cache silently misses every record. See [[position-record]] §"`device_id` is the IMEI — not the business `devices.id`" for the underlying reason.
> Lesson banked from the 2026-05-03 fix: **integration-test fixture schemas must mirror production join shapes**, not "simplified" approximations. Phase 1.5 shipped with `test/fixtures/test-schema.sql` setting `entry_devices.device_id text` to match the broken join — so 178/178 unit tests + 6 integration scenarios passed against fiction. Production exposed it on first real subscribe.
## Multi-instance behaviour
When [[processor]] (or the gateway service) runs more than one replica: