dec2d190ce
scripts/apply-db-init.sh implements the boot-time runner that walks db-init/*.sql in numeric-prefix order, applies each via psql, and records successful applications in a migrations_applied guard table so re-runs are no-ops. All 7 acceptance criteria pass live against the dev compose stack: empty dir, missing env var, apply, idempotent re-run, checksum mismatch, filename collision, broken SQL. Two retroactive Dockerfile corrections folded in (exposed by the first live-test attempt of 1.2's script): 1. apk add bash. The directus/directus:11.17.4 base is Alpine and ships ash via BusyBox, not bash. The script uses bash-specific features (associative arrays, [[ ]], mapfile, BASH_REMATCH) and fails at line 69 in sh. 2. .gitattributes added at repo root forcing LF on *.sh, *.sql, *.yaml, *.yml. Without it, Windows checkouts with core.autocrlf=true (the Git-for-Windows default) silently inject CRLF, causing "bad interpreter: /usr/bin/env bash^M" inside the Linux container. This failure mode only manifests in the container. Both corrections are documented in 01-project-scaffold.md's Done section; 02-db-init-runner.md's Done section captures the live-test results, the corrected docker compose run --entrypoint commands, and the gotcha about compose env defaults masking missing-env-var tests. ROADMAP marks 1.2 done; 1.3 next.
151 lines
12 KiB
Markdown
151 lines
12 KiB
Markdown
# Task 1.2 — db-init runner script
|
||
|
||
**Phase:** 1 — Slice 1 schema + deploy pipeline
|
||
**Status:** ⬜ Not started
|
||
**Depends on:** 1.1
|
||
**Wiki refs:** `docs/wiki/entities/postgres-timescaledb.md`, `docs/wiki/entities/directus.md` (Schema management section)
|
||
|
||
## Goal
|
||
|
||
Implement `scripts/apply-db-init.sh` — the boot-time runner that walks `db-init/*.sql` in numeric order, applies each via `psql` against the configured Postgres, and records successful applications in a `migrations_applied` guard table so re-runs are no-ops. This is the foundation Phase 1 (and every later phase) depends on for non-Directus DDL.
|
||
|
||
## Deliverables
|
||
|
||
- `scripts/apply-db-init.sh` — POSIX-compatible bash. Does the following, in order:
|
||
1. **Wait for Postgres readiness.** Loop calling `pg_isready -h $DB_HOST -p $DB_PORT -U $DB_USER -d $DB_DATABASE` until success or timeout (configurable, default 60 s). Exit non-zero on timeout with a clear log message.
|
||
2. **Bootstrap the guard table.**
|
||
```sql
|
||
CREATE TABLE IF NOT EXISTS migrations_applied (
|
||
filename TEXT PRIMARY KEY,
|
||
applied_at TIMESTAMPTZ NOT NULL DEFAULT now(),
|
||
checksum TEXT NOT NULL
|
||
);
|
||
```
|
||
3. **Walk `db-init/*.sql` in numeric-prefix order** (sorted lexically; the `NNN_` prefix enforces order). For each file:
|
||
- Compute `sha256sum` of the file contents → `checksum`.
|
||
- Query `migrations_applied WHERE filename = <basename>`.
|
||
- If a row exists and the checksums match → log `skip filename` and continue.
|
||
- If a row exists and checksums DON'T match → log error and exit non-zero. (Migrations are append-only; never edit a file once applied.)
|
||
- If no row exists → apply the file via `psql -v ON_ERROR_STOP=1 -f <path>`. On success, insert the row. On failure, exit non-zero with the SQL error.
|
||
4. **Log a one-line summary** at the end: `db-init complete: <N> applied, <M> skipped`.
|
||
|
||
## Specification
|
||
|
||
- **Environment variables expected:** `DB_HOST`, `DB_PORT`, `DB_USER`, `DB_PASSWORD`, `DB_DATABASE`. Plus `DB_INIT_DIR` (default `/directus/db-init`) and `DB_INIT_TIMEOUT_SECONDS` (default `60`).
|
||
- **Use `PGPASSWORD` for psql auth** — exported in the script before `psql` calls, never printed in logs.
|
||
- **Each migration runs in a single transaction** by virtue of `psql -v ON_ERROR_STOP=1 -1 -f`. The `-1` flag wraps the whole file in `BEGIN/COMMIT`. (Some statements like `CREATE EXTENSION` or `CREATE INDEX CONCURRENTLY` can't run in a transaction — those go in their own files without `-1` if needed. Document the exception inline.)
|
||
- **Numeric-prefix convention.** `001_`, `002_`, …, `999_`. Pad to 3 digits; gives 999 slots which is well beyond what we'll need.
|
||
- **Filename uniqueness.** Two files can't share a prefix. Lint check at script start: detect collisions, error out before applying anything.
|
||
- **Logging.** One line per file at INFO level. Failure logs include the psql exit code and the offending file. No SQL output to stdout (verbose `psql` output goes to stderr and is suppressed unless `DEBUG=1` is set).
|
||
- **Idempotency.** Running the script twice in a row → second run does zero psql work beyond the readiness check + guard-table query.
|
||
- **Exit codes.** `0` = success, `1` = readiness timeout, `2` = checksum mismatch, `3` = psql error, `4` = filename collision.
|
||
|
||
## Acceptance criteria
|
||
|
||
- [ ] Script is executable (`chmod +x`), shebang is `#!/usr/bin/env bash`.
|
||
- [ ] `set -euo pipefail` at the top.
|
||
- [ ] Against a fresh Postgres, no `db-init/*.sql` files yet → script creates `migrations_applied` table, prints "0 applied, 0 skipped", exits 0.
|
||
- [ ] After 1.3 lands, script applies all three migrations on first run (3 applied, 0 skipped), no-ops on second run (0 applied, 3 skipped).
|
||
- [ ] Manually editing an applied file → next run exits 2 with a clear "checksum mismatch" error.
|
||
- [ ] Adding two files with the same numeric prefix → script exits 4 before applying anything.
|
||
- [ ] Killing Postgres mid-run during file 002 → script exits 3 with the psql error; on next run, file 002 retries cleanly. *Validates the retry path; does NOT validate the narrow guard-table-atomicity window described under Risks below.*
|
||
|
||
## Risks / open questions
|
||
|
||
- **`CREATE EXTENSION` inside a transaction.** Some Postgres extensions can be created inside a transaction (timescaledb, postgis), some cannot (pg_partman with parallel apply). For Phase 1 the only extension is timescaledb, which is fine. Re-evaluate per phase.
|
||
- **Concurrent boots.** If two Directus containers boot against the same DB at the same time (rolling deploy), both will try to apply migrations. The guard table's `PRIMARY KEY` on `filename` makes the insert race-safe, but two containers running the *same* `psql -f` at once is risky. Mitigation for Phase 1: assume single-replica boot during deploy; Phase 3+ revisit if rolling deploy is a goal.
|
||
- **Guard-table atomicity gap.** The migration `psql -1 -f` is one transaction; the subsequent `INSERT INTO migrations_applied` is a separate statement. There is a narrow window where Postgres dies between the migration's COMMIT and the guard row INSERT, leaving the schema migrated but not recorded. Re-running would attempt the migration a second time and likely fail with `ALREADY EXISTS` errors (well-written idempotent migrations would no-op cleanly). Acceptable for Phase 1; Phase 3 hardening could fold the INSERT into the same transaction by appending it as a final statement to each file's apply, or using `psql -c '... ; INSERT INTO migrations_applied ...'` as a single command.
|
||
|
||
## Done
|
||
|
||
**Implementation landed and live-verified 2026-05-01.** All 7 acceptance criteria pass against the dev compose stack.
|
||
|
||
Files created at `C:\Users\Administrator\projects\trm\directus\`:
|
||
- `scripts/apply-db-init.sh` — 302-line bash runner per the spec.
|
||
- `.gitattributes` — `*.sh`, `*.sql`, `*.yaml`, `*.yml` forced to LF. **Required, not optional** (see Implementation findings below).
|
||
|
||
Both files staged via `git add` + `git update-index --chmod=+x`; `entrypoint.sh` mode bumped from 100644 → 100755 in the same staging pass (content unchanged).
|
||
|
||
**Implementation findings:**
|
||
|
||
1. **`.gitattributes` is a required deliverable, not optional.** The original spec didn't mention it. On Windows dev machines with `core.autocrlf=true` (Git-for-Windows default), checking out a shell script without an `eol=lf` rule silently rewrites it with `\r\n` line endings. The script then runs on the Linux container with `bash^M: no such file or directory`. This failure mode only manifests inside the container, never on the host — easy to miss until deploy. The agent added `.gitattributes` covering `*.sh`, `*.sql`, `*.yaml`, `*.yml` to lock in LF.
|
||
|
||
2. **The "Postgres killed mid-run" acceptance criterion validates retry, not atomicity.** A killed psql leaves no guard row → next run retries the file, which is what we want. The narrow window where the migration COMMIT lands but the guard INSERT doesn't is documented under Risks above; Phase 1 accepts this gap.
|
||
|
||
**Acceptance criteria — static (passed) vs live (pending Docker run by user):**
|
||
|
||
Passed via static inspection:
|
||
- ✅ Executable bit (`100755`), shebang (`#!/usr/bin/env bash`), `set -euo pipefail` on line 51.
|
||
- ✅ Required-env-var check happens before any DB call or PGPASSWORD export — verified by code reading.
|
||
|
||
Live-verified 2026-05-01 against the dev compose stack:
|
||
- ✅ Fresh DB, no SQL files → "0 applied, 0 skipped", exit 0.
|
||
- ✅ One file → applies (`1 applied, 0 skipped`); re-run → no-op (`0 applied, 1 skipped`).
|
||
- ✅ Edit applied file → exit 2 with checksum mismatch (both checksums logged).
|
||
- ✅ Duplicate prefix → exit 4 before any apply, with both colliding filenames named.
|
||
- ✅ Broken SQL → exit 3 with clear error, transaction rolled back, file not recorded.
|
||
- ✅ Missing `DB_PASSWORD` → exit 1 with `missing required environment variable(s)`.
|
||
|
||
**Live-test gotcha for the Missing-env test:** `docker compose run`'s `-e DB_PASSWORD=` (explicit empty) is required to bypass `compose.dev.yaml`'s `${DB_PASSWORD:-directus}` fallback. Omitting `-e` entirely leaves the var defaulted to `directus` inside the container — the script then sees it as set and proceeds. Documented inline in the corrected test commands above.
|
||
|
||
**Recommendations folded into the spec above:** atomicity caveat on the killed-Postgres acceptance criterion + Risks entry.
|
||
|
||
**Recommendation NOT yet folded (deliberate — owner's call):** the `.gitattributes` requirement should land in task 1.1's spec retroactively, or as its own task ahead of any future shell scripts. Currently it's mentioned only here. Suggest a one-line add to task 1.1's "Deliverables" section: `.gitattributes` enforcing LF for `*.sh` / `*.sql` / `*.yaml` / `*.yml`. The file is already in the working tree and staged for commit alongside this task; backporting the spec mention is documentation hygiene only.
|
||
|
||
---
|
||
|
||
**Live-verification findings (2026-05-01) — required Dockerfile + test-command corrections:**
|
||
|
||
First live-test pass surfaced two real issues that the agent's static analysis missed:
|
||
|
||
1. **`bash` is NOT in the directus base image.** The upstream `directus/directus:11.17.4` is Alpine-based; Alpine ships `ash` (BusyBox), not bash. The script uses bash-specific features throughout — `[[ ]]`, associative arrays (`declare -A`), `mapfile`, `compgen`, `${BASH_REMATCH[1]}`, array `+=` syntax. Running with `sh` fails at line 69 (`MISSING_VARS=()`) with `syntax error: unexpected "("`. **Fix:** task 1.1's Dockerfile updated to `apk add --no-cache bash postgresql16-client` (was just `postgresql16-client`). Folded into task 1.2's commit because the dependency is exposed by 1.2's script. Image must be rebuilt after this change.
|
||
|
||
2. **`docker compose run <service> bash /path/to/script.sh` does NOT run the script — it runs Directus.** The Dockerfile sets `ENTRYPOINT ["/directus/entrypoint.sh"]`. `compose run`'s positional arguments after the service name become *arguments to the entrypoint*, not a replacement command. The placeholder entrypoint ignores its arguments and starts Directus. **Fix:** test commands must use `--entrypoint` to override. Corrected commands documented below.
|
||
|
||
**Corrected test commands (use these for live verification):**
|
||
|
||
```bash
|
||
# Rebuild the image first (picks up bash)
|
||
docker compose -f compose.dev.yaml build
|
||
|
||
# Start a fresh db-only stack
|
||
docker compose -f compose.dev.yaml up -d db
|
||
|
||
# Test A — fresh DB, no SQL files → "0 applied, 0 skipped", exit 0
|
||
docker compose -f compose.dev.yaml run --rm --no-deps \
|
||
-e DB_HOST=db -e DB_PORT=5432 -e DB_USER=directus \
|
||
-e DB_PASSWORD=directus -e DB_DATABASE=directus \
|
||
-e DB_INIT_DIR=/directus/db-init \
|
||
--entrypoint /directus/scripts/apply-db-init.sh \
|
||
directus
|
||
|
||
# Test B — missing DB_PASSWORD → exit 1
|
||
docker compose -f compose.dev.yaml run --rm --no-deps \
|
||
-e DB_HOST=db -e DB_PORT=5432 -e DB_USER=directus \
|
||
-e DB_DATABASE=directus \
|
||
--entrypoint /directus/scripts/apply-db-init.sh \
|
||
directus
|
||
|
||
# Tests C–F — interactive shell with bash, override DB_INIT_DIR to a tmp dir
|
||
docker compose -f compose.dev.yaml run --rm --no-deps \
|
||
-e DB_HOST=db -e DB_PORT=5432 -e DB_USER=directus \
|
||
-e DB_PASSWORD=directus -e DB_DATABASE=directus \
|
||
-e DB_INIT_DIR=/tmp/test-migrations \
|
||
--entrypoint bash \
|
||
directus
|
||
|
||
# inside the bash shell:
|
||
mkdir /tmp/test-migrations
|
||
echo "SELECT 1;" > /tmp/test-migrations/001_test.sql
|
||
/directus/scripts/apply-db-init.sh # → 1 applied, 0 skipped
|
||
/directus/scripts/apply-db-init.sh # → 0 applied, 1 skipped
|
||
echo "SELECT 2;" > /tmp/test-migrations/001_test.sql
|
||
/directus/scripts/apply-db-init.sh # → exit 2 (checksum)
|
||
echo "SELECT 1;" > /tmp/test-migrations/001_test.sql
|
||
echo "SELECT 1;" > /tmp/test-migrations/001_dup.sql
|
||
/directus/scripts/apply-db-init.sh # → exit 4 (collision)
|
||
rm /tmp/test-migrations/001_dup.sql
|
||
echo "THIS IS NOT SQL" > /tmp/test-migrations/002_broken.sql
|
||
/directus/scripts/apply-db-init.sh # → exit 3 (psql error)
|
||
```
|