Files
julian dec2d190ce Task 1.2 — db-init runner script
scripts/apply-db-init.sh implements the boot-time runner that walks
db-init/*.sql in numeric-prefix order, applies each via psql, and
records successful applications in a migrations_applied guard table
so re-runs are no-ops.

All 7 acceptance criteria pass live against the dev compose stack:
empty dir, missing env var, apply, idempotent re-run, checksum
mismatch, filename collision, broken SQL.

Two retroactive Dockerfile corrections folded in (exposed by the
first live-test attempt of 1.2's script):

1. apk add bash. The directus/directus:11.17.4 base is Alpine and
   ships ash via BusyBox, not bash. The script uses bash-specific
   features (associative arrays, [[ ]], mapfile, BASH_REMATCH) and
   fails at line 69 in sh.

2. .gitattributes added at repo root forcing LF on *.sh, *.sql,
   *.yaml, *.yml. Without it, Windows checkouts with core.autocrlf=true
   (the Git-for-Windows default) silently inject CRLF, causing
   "bad interpreter: /usr/bin/env bash^M" inside the Linux container.
   This failure mode only manifests in the container.

Both corrections are documented in 01-project-scaffold.md's Done
section; 02-db-init-runner.md's Done section captures the live-test
results, the corrected docker compose run --entrypoint commands, and
the gotcha about compose env defaults masking missing-env-var tests.

ROADMAP marks 1.2 done; 1.3 next.
2026-05-01 22:35:17 +02:00

12 KiB
Raw Permalink Blame History

Task 1.2 — db-init runner script

Phase: 1 — Slice 1 schema + deploy pipeline Status: Not started Depends on: 1.1 Wiki refs: docs/wiki/entities/postgres-timescaledb.md, docs/wiki/entities/directus.md (Schema management section)

Goal

Implement scripts/apply-db-init.sh — the boot-time runner that walks db-init/*.sql in numeric order, applies each via psql against the configured Postgres, and records successful applications in a migrations_applied guard table so re-runs are no-ops. This is the foundation Phase 1 (and every later phase) depends on for non-Directus DDL.

Deliverables

  • scripts/apply-db-init.sh — POSIX-compatible bash. Does the following, in order:
    1. Wait for Postgres readiness. Loop calling pg_isready -h $DB_HOST -p $DB_PORT -U $DB_USER -d $DB_DATABASE until success or timeout (configurable, default 60 s). Exit non-zero on timeout with a clear log message.
    2. Bootstrap the guard table.
      CREATE TABLE IF NOT EXISTS migrations_applied (
        filename TEXT PRIMARY KEY,
        applied_at TIMESTAMPTZ NOT NULL DEFAULT now(),
        checksum TEXT NOT NULL
      );
      
    3. Walk db-init/*.sql in numeric-prefix order (sorted lexically; the NNN_ prefix enforces order). For each file:
      • Compute sha256sum of the file contents → checksum.
      • Query migrations_applied WHERE filename = <basename>.
      • If a row exists and the checksums match → log skip filename and continue.
      • If a row exists and checksums DON'T match → log error and exit non-zero. (Migrations are append-only; never edit a file once applied.)
      • If no row exists → apply the file via psql -v ON_ERROR_STOP=1 -f <path>. On success, insert the row. On failure, exit non-zero with the SQL error.
    4. Log a one-line summary at the end: db-init complete: <N> applied, <M> skipped.

Specification

  • Environment variables expected: DB_HOST, DB_PORT, DB_USER, DB_PASSWORD, DB_DATABASE. Plus DB_INIT_DIR (default /directus/db-init) and DB_INIT_TIMEOUT_SECONDS (default 60).
  • Use PGPASSWORD for psql auth — exported in the script before psql calls, never printed in logs.
  • Each migration runs in a single transaction by virtue of psql -v ON_ERROR_STOP=1 -1 -f. The -1 flag wraps the whole file in BEGIN/COMMIT. (Some statements like CREATE EXTENSION or CREATE INDEX CONCURRENTLY can't run in a transaction — those go in their own files without -1 if needed. Document the exception inline.)
  • Numeric-prefix convention. 001_, 002_, …, 999_. Pad to 3 digits; gives 999 slots which is well beyond what we'll need.
  • Filename uniqueness. Two files can't share a prefix. Lint check at script start: detect collisions, error out before applying anything.
  • Logging. One line per file at INFO level. Failure logs include the psql exit code and the offending file. No SQL output to stdout (verbose psql output goes to stderr and is suppressed unless DEBUG=1 is set).
  • Idempotency. Running the script twice in a row → second run does zero psql work beyond the readiness check + guard-table query.
  • Exit codes. 0 = success, 1 = readiness timeout, 2 = checksum mismatch, 3 = psql error, 4 = filename collision.

Acceptance criteria

  • Script is executable (chmod +x), shebang is #!/usr/bin/env bash.
  • set -euo pipefail at the top.
  • Against a fresh Postgres, no db-init/*.sql files yet → script creates migrations_applied table, prints "0 applied, 0 skipped", exits 0.
  • After 1.3 lands, script applies all three migrations on first run (3 applied, 0 skipped), no-ops on second run (0 applied, 3 skipped).
  • Manually editing an applied file → next run exits 2 with a clear "checksum mismatch" error.
  • Adding two files with the same numeric prefix → script exits 4 before applying anything.
  • Killing Postgres mid-run during file 002 → script exits 3 with the psql error; on next run, file 002 retries cleanly. Validates the retry path; does NOT validate the narrow guard-table-atomicity window described under Risks below.

Risks / open questions

  • CREATE EXTENSION inside a transaction. Some Postgres extensions can be created inside a transaction (timescaledb, postgis), some cannot (pg_partman with parallel apply). For Phase 1 the only extension is timescaledb, which is fine. Re-evaluate per phase.
  • Concurrent boots. If two Directus containers boot against the same DB at the same time (rolling deploy), both will try to apply migrations. The guard table's PRIMARY KEY on filename makes the insert race-safe, but two containers running the same psql -f at once is risky. Mitigation for Phase 1: assume single-replica boot during deploy; Phase 3+ revisit if rolling deploy is a goal.
  • Guard-table atomicity gap. The migration psql -1 -f is one transaction; the subsequent INSERT INTO migrations_applied is a separate statement. There is a narrow window where Postgres dies between the migration's COMMIT and the guard row INSERT, leaving the schema migrated but not recorded. Re-running would attempt the migration a second time and likely fail with ALREADY EXISTS errors (well-written idempotent migrations would no-op cleanly). Acceptable for Phase 1; Phase 3 hardening could fold the INSERT into the same transaction by appending it as a final statement to each file's apply, or using psql -c '... ; INSERT INTO migrations_applied ...' as a single command.

Done

Implementation landed and live-verified 2026-05-01. All 7 acceptance criteria pass against the dev compose stack.

Files created at C:\Users\Administrator\projects\trm\directus\:

  • scripts/apply-db-init.sh — 302-line bash runner per the spec.
  • .gitattributes*.sh, *.sql, *.yaml, *.yml forced to LF. Required, not optional (see Implementation findings below).

Both files staged via git add + git update-index --chmod=+x; entrypoint.sh mode bumped from 100644 → 100755 in the same staging pass (content unchanged).

Implementation findings:

  1. .gitattributes is a required deliverable, not optional. The original spec didn't mention it. On Windows dev machines with core.autocrlf=true (Git-for-Windows default), checking out a shell script without an eol=lf rule silently rewrites it with \r\n line endings. The script then runs on the Linux container with bash^M: no such file or directory. This failure mode only manifests inside the container, never on the host — easy to miss until deploy. The agent added .gitattributes covering *.sh, *.sql, *.yaml, *.yml to lock in LF.

  2. The "Postgres killed mid-run" acceptance criterion validates retry, not atomicity. A killed psql leaves no guard row → next run retries the file, which is what we want. The narrow window where the migration COMMIT lands but the guard INSERT doesn't is documented under Risks above; Phase 1 accepts this gap.

Acceptance criteria — static (passed) vs live (pending Docker run by user):

Passed via static inspection:

  • Executable bit (100755), shebang (#!/usr/bin/env bash), set -euo pipefail on line 51.
  • Required-env-var check happens before any DB call or PGPASSWORD export — verified by code reading.

Live-verified 2026-05-01 against the dev compose stack:

  • Fresh DB, no SQL files → "0 applied, 0 skipped", exit 0.
  • One file → applies (1 applied, 0 skipped); re-run → no-op (0 applied, 1 skipped).
  • Edit applied file → exit 2 with checksum mismatch (both checksums logged).
  • Duplicate prefix → exit 4 before any apply, with both colliding filenames named.
  • Broken SQL → exit 3 with clear error, transaction rolled back, file not recorded.
  • Missing DB_PASSWORD → exit 1 with missing required environment variable(s).

Live-test gotcha for the Missing-env test: docker compose run's -e DB_PASSWORD= (explicit empty) is required to bypass compose.dev.yaml's ${DB_PASSWORD:-directus} fallback. Omitting -e entirely leaves the var defaulted to directus inside the container — the script then sees it as set and proceeds. Documented inline in the corrected test commands above.

Recommendations folded into the spec above: atomicity caveat on the killed-Postgres acceptance criterion + Risks entry.

Recommendation NOT yet folded (deliberate — owner's call): the .gitattributes requirement should land in task 1.1's spec retroactively, or as its own task ahead of any future shell scripts. Currently it's mentioned only here. Suggest a one-line add to task 1.1's "Deliverables" section: .gitattributes enforcing LF for *.sh / *.sql / *.yaml / *.yml. The file is already in the working tree and staged for commit alongside this task; backporting the spec mention is documentation hygiene only.


Live-verification findings (2026-05-01) — required Dockerfile + test-command corrections:

First live-test pass surfaced two real issues that the agent's static analysis missed:

  1. bash is NOT in the directus base image. The upstream directus/directus:11.17.4 is Alpine-based; Alpine ships ash (BusyBox), not bash. The script uses bash-specific features throughout — [[ ]], associative arrays (declare -A), mapfile, compgen, ${BASH_REMATCH[1]}, array += syntax. Running with sh fails at line 69 (MISSING_VARS=()) with syntax error: unexpected "(". Fix: task 1.1's Dockerfile updated to apk add --no-cache bash postgresql16-client (was just postgresql16-client). Folded into task 1.2's commit because the dependency is exposed by 1.2's script. Image must be rebuilt after this change.

  2. docker compose run <service> bash /path/to/script.sh does NOT run the script — it runs Directus. The Dockerfile sets ENTRYPOINT ["/directus/entrypoint.sh"]. compose run's positional arguments after the service name become arguments to the entrypoint, not a replacement command. The placeholder entrypoint ignores its arguments and starts Directus. Fix: test commands must use --entrypoint to override. Corrected commands documented below.

Corrected test commands (use these for live verification):

# Rebuild the image first (picks up bash)
docker compose -f compose.dev.yaml build

# Start a fresh db-only stack
docker compose -f compose.dev.yaml up -d db

# Test A — fresh DB, no SQL files → "0 applied, 0 skipped", exit 0
docker compose -f compose.dev.yaml run --rm --no-deps \
  -e DB_HOST=db -e DB_PORT=5432 -e DB_USER=directus \
  -e DB_PASSWORD=directus -e DB_DATABASE=directus \
  -e DB_INIT_DIR=/directus/db-init \
  --entrypoint /directus/scripts/apply-db-init.sh \
  directus

# Test B — missing DB_PASSWORD → exit 1
docker compose -f compose.dev.yaml run --rm --no-deps \
  -e DB_HOST=db -e DB_PORT=5432 -e DB_USER=directus \
  -e DB_DATABASE=directus \
  --entrypoint /directus/scripts/apply-db-init.sh \
  directus

# Tests CF — interactive shell with bash, override DB_INIT_DIR to a tmp dir
docker compose -f compose.dev.yaml run --rm --no-deps \
  -e DB_HOST=db -e DB_PORT=5432 -e DB_USER=directus \
  -e DB_PASSWORD=directus -e DB_DATABASE=directus \
  -e DB_INIT_DIR=/tmp/test-migrations \
  --entrypoint bash \
  directus

# inside the bash shell:
mkdir /tmp/test-migrations
echo "SELECT 1;" > /tmp/test-migrations/001_test.sql
/directus/scripts/apply-db-init.sh                    # → 1 applied, 0 skipped
/directus/scripts/apply-db-init.sh                    # → 0 applied, 1 skipped
echo "SELECT 2;" > /tmp/test-migrations/001_test.sql
/directus/scripts/apply-db-init.sh                    # → exit 2 (checksum)
echo "SELECT 1;" > /tmp/test-migrations/001_test.sql
echo "SELECT 1;" > /tmp/test-migrations/001_dup.sql
/directus/scripts/apply-db-init.sh                    # → exit 4 (collision)
rm /tmp/test-migrations/001_dup.sql
echo "THIS IS NOT SQL" > /tmp/test-migrations/002_broken.sql
/directus/scripts/apply-db-init.sh                    # → exit 3 (psql error)