diff --git a/.claude/agent-memory/directus-devops-architect/MEMORY.md b/.claude/agent-memory/directus-devops-architect/MEMORY.md index cfe3803..2f1122d 100644 --- a/.claude/agent-memory/directus-devops-architect/MEMORY.md +++ b/.claude/agent-memory/directus-devops-architect/MEMORY.md @@ -3,3 +3,6 @@ - [Directus image facts](reference_directus_image.md) — directus/directus:11.17.4 confirmed on Docker Hub; Alpine-based (node:22-alpine); runs as user `node`; CMD is `node cli.js bootstrap && pm2-runtime start ecosystem.config.cjs` - [TimescaleDB-HA image facts](reference_timescaledb_ha.md) — timescale/timescaledb-ha:pg16-latest includes PostGIS by default; PGDATA=/pgdata (not /var/lib/postgresql/data); env vars: POSTGRES_USER, POSTGRES_PASSWORD, POSTGRES_DB - [Phase 1 scaffold decisions](project_phase1_scaffold.md) — key implementation decisions made during task 1.1 (image pins, volume paths, entrypoint command, apk package name) +- [Phase 1 task 1.2 decisions](project_phase1_task12.md) — apply-db-init.sh: .gitattributes required for LF on Windows, chmod+x via git update-index, collision detection pattern, psql error capture idiom +- [Phase 1 task 1.3 decisions](project_phase1_task13.md) — positions table: 8 divergences between spec and processor migration; processor wins; assertion block patterns for information_schema + pg_indexes checks +- [Phase 1 task 1.8 decisions](project_phase1_task18.md) — CI workflow: no build-push-action (image must be in local daemon for docker run); --network host requires DB_HOST=localhost not postgres; curl -fsS for Portainer webhook diff --git a/.claude/agent-memory/directus-devops-architect/project_phase1_task12.md b/.claude/agent-memory/directus-devops-architect/project_phase1_task12.md new file mode 100644 index 0000000..3f75ec4 --- /dev/null +++ b/.claude/agent-memory/directus-devops-architect/project_phase1_task12.md @@ -0,0 +1,26 @@ +--- +name: Phase 1 task 1.2 — db-init runner decisions +description: Implementation decisions and gotchas from writing apply-db-init.sh +type: project +--- + +# Phase 1 task 1.2 — db-init runner decisions + +## Key decisions + +- **`.gitattributes` required**: `core.autocrlf=true` on the Windows dev machine would corrupt shell scripts at checkout (turning LF → CRLF → `bad interpreter: /usr/bin/env bash^M`). Added `directus/.gitattributes` with `*.sh text eol=lf` and `*.sql text eol=lf`. This is a non-negotiable requirement for any new shell script in the directus service. +- **`git update-index --chmod=+x`**: The only way to set executable bit on Windows without WSL. Must be done after `git add` (not before — the file must already be indexed). If re-staged after a chmod, the bit is preserved as long as `--chmod=+x` was called on the indexed file. +- **Filename collision detection uses `[[ -v "ARRAY[key]" ]]`**: The Bash 4.2+ idiom for checking associative array key existence. Safe in the `node:22-alpine` base image (ships Bash 5.x). Do not use `[[ "${ARRAY[$key]+_}" ]]` — less readable. +- **`set -euo pipefail` + `|| psql_exit=$?`**: The `||` short-circuits `set -e` for that single statement, which is correct and idiomatic Bash. This is the approved pattern for capturing psql exit status without disabling `set -e` globally. +- **`run_psql` wrapper vs inline**: The guard-table bootstrap and the record-insert use `run_psql` / inline psql respectively. The apply step cannot use `run_psql` because it needs `-v ON_ERROR_STOP=1 -1 --file=` which are not generic flags. This duplication is intentional. +- **`compgen -G` for glob check**: Used to detect whether any `*.sql` files exist before calling `ls | sort`. Avoids `ls: cannot access ... No such file or directory` error under `set -e`. +- **Filename quoting in SQL**: Basenames are passed directly into SQL strings with single quotes. This is safe for filenames matching `[0-9]+_[a-z0-9_]+.sql` (the project convention). If someone introduces a filename with a single quote, it would break. Acceptable for now — document if it becomes a concern. + +## What was NOT done (and why) + +- No `trap ERR` — the spec says "if it adds clarity". With `set -euo pipefail` and explicit exit codes on every failure path, ERR trapping would only add noise. +- No lock file for concurrent-boot safety — the task spec acknowledges this risk and defers it to Phase 3+. + +## Acceptance testing status + +Docker was not available in the agent's shell environment. Acceptance tests must be run manually — see the task report for the exact commands. diff --git a/.claude/agent-memory/directus-devops-architect/project_phase1_task13.md b/.claude/agent-memory/directus-devops-architect/project_phase1_task13.md new file mode 100644 index 0000000..63120d5 --- /dev/null +++ b/.claude/agent-memory/directus-devops-architect/project_phase1_task13.md @@ -0,0 +1,48 @@ +--- +name: Phase 1 task 1.3 decisions +description: positions hypertable schema divergences from spec, cross-check against processor migration, and assertion block patterns +type: project +--- + +Task 1.3 authored three SQL migrations under directus/db-init/. The critical finding was a substantial divergence between the task spec (03-initial-migrations.md) and the processor's actual migration (processor/src/db/migrations/0001_positions.sql). The processor migration wins in all cases. + +**Why:** The processor is the sole writer for positions. If the table schema doesn't match what the processor inserts, writes fail at runtime with NOT NULL violations or column-not-found errors. + +**How to apply:** Always read processor/src/db/migrations/0001_positions.sql before writing or modifying the positions table schema. Do not trust 03-initial-migrations.md column list without cross-checking. + +## Divergences found (spec vs. processor ground truth) + +1. **ingested_at** — not in spec, present in processor migration as `timestamptz NOT NULL DEFAULT now()`. Required. +2. **codec** — not in spec, present in processor migration as `text NOT NULL`. Required. +3. **altitude/angle/speed** — spec says `DOUBLE PRECISION nullable`; processor has `real NOT NULL`. Use `real NOT NULL`. +4. **satellites/priority** — spec says nullable; processor has `NOT NULL`. Use NOT NULL. +5. **attributes DEFAULT** — spec adds `DEFAULT '{}'::jsonb`; processor has no default. No default in the migration (processor always supplies the value). +6. **PRIMARY KEY vs UNIQUE INDEX** — spec uses `PRIMARY KEY (device_id, ts)`; processor uses `CREATE UNIQUE INDEX positions_device_ts ON positions (device_id, ts)` (no PK). TimescaleDB idiomatic: unique index, no PK. Index name is `positions_device_ts` (no `_idx` suffix). +7. **Chunk interval** — spec says `INTERVAL '7 days'`; processor uses `INTERVAL '1 day'`. Use 1 day. +8. **Indexes** — spec has one composite `(device_id, ts DESC)` index; processor has two: `positions_device_ts (device_id, ts)` (unique) and `positions_ts (ts DESC)`. Both are required. + +## Assertion block pattern established + +Each migration ends with a `DO $$ DECLARE ... BEGIN ... END $$;` block that: +- Checks table/column existence via `information_schema.columns` +- Checks hypertable registration via `timescaledb_information.hypertables` +- Checks index existence via `pg_indexes` +- Checks index predicate via `pg_indexes.indexdef ILIKE '%where (faulty = false)%'` +- Raises named exceptions on any mismatch + +For the faulty column NOT NULL check, the pattern is: +```sql +SELECT data_type, is_nullable = 'NO', column_default +INTO _col_type, _col_notnull, _col_default +FROM information_schema.columns ... +``` + +The column_default for `DEFAULT FALSE` is stored by Postgres as the string `'false'` (lower-case, no quotes) in information_schema. + +## Files created + +- directus/db-init/001_extensions.sql +- directus/db-init/002_positions_hypertable.sql +- directus/db-init/003_faulty_column.sql + +All three are append-only once applied. The runner's checksum guard (exit 2) enforces this. diff --git a/.claude/agent-memory/directus-devops-architect/project_phase1_task18.md b/.claude/agent-memory/directus-devops-architect/project_phase1_task18.md new file mode 100644 index 0000000..9da3697 --- /dev/null +++ b/.claude/agent-memory/directus-devops-architect/project_phase1_task18.md @@ -0,0 +1,33 @@ +--- +name: Phase 1 task 1.8 decisions +description: Key implementation decisions and divergences from spec made during task 1.8 (Gitea CI dry-run workflow) +type: project +--- + +# Phase 1 task 1.8 — Gitea CI dry-run workflow decisions + +## Core decisions + +**No `docker/build-push-action`**: Used plain `docker build -t trm-directus:ci .` instead of `docker/build-push-action`. +**Why**: `build-push-action` with the docker-container Buildx driver exports the image into a separate buildkitd cache that is NOT accessible to a subsequent `docker run`. The dry-run step needs the image in the local Docker daemon. The processor workflow uses `build-push-action` but it has no post-build dry-run step. +**How to apply**: Any Directus workflow variant that needs to run the image after building must use plain `docker build`, not `build-push-action`. + +**`--network host` + `DB_HOST=localhost`**: Service container is bound via `ports: ['5432:5432']` to the runner's loopback (127.0.0.1:5432). The `docker run` container uses `--network host` to share that namespace, making Postgres reachable as `localhost:5432`. +**Why**: The spec draft had a bug — it used `--network host` but set `DB_HOST: postgres`. With host networking, service containers are NOT reachable by their service name; only `localhost` works. The service name (`postgres`) is only resolvable in bridge-network mode. +**How to apply**: Always use `DB_HOST=localhost` when pairing `--network host` with a `services:` port-mapped container. + +**`health-retries 20`**: Raised from spec's default of 10. +**Why**: The timescaledb-ha image has a slower startup than plain postgres (init script runs TimescaleDB preload). 10 retries at 5s = 50s max wait; 20 retries = 100s, safer margin. + +**Portainer step uses `curl -fsS`**: Added `-f` (fail on HTTP error) and `-sS` (silent but show errors). +**Why**: Bare `curl -X POST` exits 0 even on a 4xx/5xx response. `-f` makes curl exit non-zero on server errors, so a misconfigured webhook URL surfaces as a workflow failure rather than a silent no-op. + +**`--health-cmd` includes `-d directus`**: Spec draft had `pg_isready -U directus` without `-d directus`. Added the `-d` flag for precision. + +**Deliberate divergences from processor workflow**: +- No `actions/setup-node`, no `corepack enable`, no `pnpm install` — Directus is not a Node project; no TypeScript to compile or test. +- No `docker/setup-buildx-action` — Buildx with docker-container driver sequesters images from `docker run`. +- No typecheck/lint/test steps — Phase 1 has no extensions. Phase 5 will add these. +- Added `services:` block — processor has no service dependency. +- Separate build + dry-run + push steps instead of single `build-push-action`. +- `runs-on: ubuntu-22.04` (pinned) vs processor's `ubuntu-latest` (floating). diff --git a/.claude/agent-memory/directus-devops-architect/reference_timescaledb_ha.md b/.claude/agent-memory/directus-devops-architect/reference_timescaledb_ha.md index 97c31a0..73331da 100644 --- a/.claude/agent-memory/directus-devops-architect/reference_timescaledb_ha.md +++ b/.claude/agent-memory/directus-devops-architect/reference_timescaledb_ha.md @@ -1,16 +1,24 @@ --- name: TimescaleDB-HA Docker image facts -description: Key facts about timescale/timescaledb-ha needed for compose and CI configuration +description: Empirically verified facts about timescale/timescaledb-ha used in TRM compose files type: reference --- # TimescaleDB-HA Docker image facts -- **Image used**: `timescale/timescaledb-ha:pg16-latest` -- **PostGIS**: Included by default in all (non-oss) tags. The `-oss` suffix variants ship only OSS-licensed extensions (PostGIS is Apache-licensed, included even in `-oss`). The GitHub README confirms: "By default, the Docker image contains many extensions, including TimescaleDB and PostGIS." -- **`-all` suffix meaning**: Contains multiple Postgres major versions in one image (e.g. pg13+pg14+pg15). NOT about extra extensions. Not needed for TRM. -- **PGDATA directory**: `/pgdata` — set via `PGDATA=/pgdata` env var. NOT the standard `/var/lib/postgresql/data`. Volume mounts must target `/pgdata`. +**CORRECTED after task 1.1 live boot (2026-05-01).** + +- **Image pinned in TRM**: `timescale/timescaledb-ha:pg16.6-ts2.17.2-all` + - `:pg16-latest` does NOT exist on Docker Hub — it 404s at pull time. + - The `-all` suffix here means "all extensions bundled" (TimescaleDB + PostGIS + more), NOT "all PG versions". + - Always pin a concrete `pgX.Y-tsA.B.C-all` tag; floating tags are unreliable for this image. +- **PGDATA directory**: `/home/postgres/pgdata/data` + - NOT `/pgdata`, NOT `/var/lib/postgresql/data`. + - Setting `PGDATA=/pgdata` and mounting a volume to `/pgdata` causes initdb to fail with "could not change permissions of directory". + - compose.dev.yaml correctly sets `PGDATA: /home/postgres/pgdata/data` and mounts to the same path. +- **PostGIS**: Binaries are bundled in the `-all` image but the extension is NOT auto-created on user databases. + - Directus logs a benign warning: `PostGIS isn't installed. Geometry type support will be limited.` + - Must be created explicitly: `CREATE EXTENSION IF NOT EXISTS postgis;` — lands in db-init Phase 2. - **Environment variables**: `POSTGRES_USER`, `POSTGRES_PASSWORD`, `POSTGRES_DB` (same as official postgres image). -- **Healthcheck**: `pg_isready -U $$POSTGRES_USER -d $$POSTGRES_DB` (double-$ in compose YAML to escape interpolation; the container sees single $ at runtime). -- **Compatible with `CREATE EXTENSION postgis`**: Yes, binaries present. -- **Compatible with `CREATE EXTENSION timescaledb`**: Yes, it's the whole point of the image. +- **Healthcheck**: `pg_isready -U $$POSTGRES_USER -d $$POSTGRES_DB` (double-$ in compose YAML to escape interpolation). +- **`psql` / `pg_isready` in the Directus container**: Installed via `apk add --no-cache postgresql16-client` in the Dockerfile (task 1.1). Available on PATH at runtime.