directus/.planning/phase-1-slice-1-schema/08-gitea-ci-dryrun.md
julian ec119af274 Fix CI port collision — remap throwaway Postgres to host port 15432
The runner host typically has another Postgres listening on 5432
(local dev stack, stage instance, etc.), which made the services:
postgres container fail at start with "port already allocated."

Remap the host-side port from 5432:5432 to 15432:5432. The service
container still listens on 5432 internally; only the runner host
binding changes. Dry-run's DB_PORT updated to 15432 to match.

--network host semantics preserved: DB_HOST=localhost reaches the
service on the runner's loopback at the new port.

Why we still need a Postgres container at all: the dry-run gate
applies db-init/*.sql migrations and the directus schema snapshot
against a real DB to catch breakage before pushing the image. No
Postgres = no validation = the gate is bypassed.

Inline comment in the workflow now explains the choice; task spec's
Done section captures the correction so future readers don't
re-discover this.
2026-05-02 10:38:26 +02:00

# Task 1.8 — Gitea CI dry-run workflow
**Phase:** 1 — Slice 1 schema + deploy pipeline
**Status:** ⬜ Not started
**Depends on:** 1.7
**Wiki refs:** `docs/wiki/entities/directus.md` (Schema management section)
## Goal
Build a Gitea Actions workflow that, on push to `main` (when relevant paths change), builds the image, spins up a throwaway Postgres + TimescaleDB in CI, runs the entrypoint flow as a **dry-run** to catch snapshot/migration breakage, and publishes the image to the registry only if the dry-run succeeds. It mirrors the shape of the `processor` and `tcp-ingestion` workflows.
## Deliverables
- `.gitea/workflows/build.yml`:
```yaml
name: Build directus image
on:
  push:
    branches: [main]
    paths:
      - 'snapshots/**'
      - 'db-init/**'
      - 'extensions/**'
      - 'scripts/**'
      - 'entrypoint.sh'
      - 'Dockerfile'
      - '.gitea/workflows/build.yml'
  workflow_dispatch:
jobs:
  build-and-publish:
    runs-on: ubuntu-22.04
    services:
      postgres:
        image: timescale/timescaledb-ha:pg16.6-ts2.17.2-all # match compose.dev.yaml; :pg16-latest does NOT exist on Docker Hub
        env:
          POSTGRES_USER: directus
          POSTGRES_PASSWORD: directus
          POSTGRES_DB: directus
        ports: ['5432:5432']
        options: >-
          --health-cmd "pg_isready -U directus"
          --health-interval 5s
          --health-timeout 5s
          --health-retries 10
    steps:
      - uses: actions/checkout@v4
      - name: Build image
        run: docker build -t trm-directus:ci .
      - name: Dry-run boot against throwaway Postgres
        env:
          DB_HOST: postgres
          DB_PORT: 5432
          DB_USER: directus
          DB_PASSWORD: directus
          DB_DATABASE: directus
          KEY: ci-key-not-secret
          SECRET: ci-secret-not-secret
          ADMIN_EMAIL: ci@example.com
          ADMIN_PASSWORD: ci-password-not-secret
          PUBLIC_URL: http://localhost:8055
        run: |
          docker run --rm \
            -e DB_CLIENT=pg \
            -e DB_HOST=$DB_HOST -e DB_PORT=$DB_PORT \
            -e DB_USER=$DB_USER -e DB_PASSWORD=$DB_PASSWORD -e DB_DATABASE=$DB_DATABASE \
            -e KEY=$KEY -e SECRET=$SECRET \
            -e ADMIN_EMAIL=$ADMIN_EMAIL -e ADMIN_PASSWORD=$ADMIN_PASSWORD \
            -e PUBLIC_URL=$PUBLIC_URL \
            --network host \
            --entrypoint bash \
            trm-directus:ci \
            -c '/directus/scripts/apply-db-init.sh && /directus/scripts/schema-apply.sh && echo "dry-run ok"'
      - name: Login to Gitea registry
        uses: docker/login-action@v3
        with:
          registry: git.dev.microservices.al
          username: ${{ secrets.REGISTRY_USERNAME }}
          password: ${{ secrets.REGISTRY_PASSWORD }}
      - name: Tag and push
        run: |
          docker tag trm-directus:ci git.dev.microservices.al/trm/directus:main
          docker tag trm-directus:ci git.dev.microservices.al/trm/directus:${{ github.sha }}
          docker push git.dev.microservices.al/trm/directus:main
          docker push git.dev.microservices.al/trm/directus:${{ github.sha }}
      - name: Trigger Portainer redeploy (optional)
        if: secrets.PORTAINER_WEBHOOK_URL != ''
        run: curl -X POST "${{ secrets.PORTAINER_WEBHOOK_URL }}"
```
## Specification
- **Dry-run runs the entrypoint scripts only**, not `directus start`. Starting the server and waiting for it to serve is slow and unnecessary — the goal is to catch DDL / snapshot apply errors. Override the `ENTRYPOINT` and run the two scripts directly.
- **Service container is the throwaway Postgres.** It is declared in the `services:` block of the Gitea Actions workflow (the syntax is compatible with GitHub Actions). Use the pinned TimescaleDB image; a version mismatch with prod hides bugs.
- **Path filter on `on.push.paths`** keeps CI quiet for unrelated repo changes (docs-only commits, etc.). Mirrors the processor workflow.
- **Two image tags published:** `:main` (always points at latest main) and `:<sha>` (specific commit, immutable). The deploy stack can pin to either.
- **Portainer webhook is optional** (gated by secret presence). If unset, no auto-deploy.
- **No integration tests in CI for Phase 1.** The dry-run boot *is* the integration test — it proves the snapshot+db-init combination works against a fresh Postgres. Phase 5+ adds extension-specific tests as those land.
- **Required Gitea secrets:**
- `REGISTRY_USERNAME`, `REGISTRY_PASSWORD` — for the image push.
- `PORTAINER_WEBHOOK_URL` — optional, for auto-deploy.
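
The two-tag scheme described above can be sketched as a small shell helper. This is an illustration only: the function name `image_refs` is hypothetical and does not exist in the workflow, which tags and pushes with inline `docker tag` / `docker push` commands.

```shell
# Hypothetical helper (not part of the workflow): print the two image
# references published for a commit, one per line.
#   $1 = repository path, e.g. git.dev.microservices.al/trm/directus
#   $2 = full commit SHA
image_refs() {
  local repo="$1" sha="$2"
  printf '%s:main\n' "$repo"       # moving tag: latest build of main
  printf '%s:%s\n' "$repo" "$sha"  # per-commit tag, suitable for pinning
}
```

A "Tag and push" step could then loop over the output, running `docker tag trm-directus:ci "$ref"` followed by `docker push "$ref"` for each line, so both tags always refer to the same locally built image.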
## Acceptance criteria
- [ ] Workflow file is committed at `.gitea/workflows/build.yml`.
- [ ] First push to `main` after this lands triggers the workflow.
- [ ] Workflow steps in order: checkout → build → dry-run boot → registry login → tag/push → optional Portainer ping.
- [ ] Dry-run step exits 0 with logs showing "db-init complete" and "schema apply: no changes". Note that every CI run starts from a fresh Postgres, so the snapshot is applied from scratch each time; verify that the apply step succeeds both on a fresh database and on one where the snapshot has already been applied.
- [ ] Intentionally break the snapshot (manually edit `snapshots/schema.yaml` to a malformed YAML) → workflow fails at the dry-run step → image is NOT pushed.
- [ ] Intentionally break a migration (introduce SQL syntax error in `db-init/`) → workflow fails at the dry-run step → image is NOT pushed.
- [ ] Push a docs-only change → workflow does NOT trigger.
- [ ] Image pushed to registry under `git.dev.microservices.al/trm/directus:main` and `:<sha>`.
- [ ] Portainer webhook fires if configured.
## Risks / open questions
- **Gitea Actions `services:` syntax compatibility.** Gitea's runner is mostly GitHub-Actions-compatible but has historically had quirks with the `services:` block (especially around image pulls from private registries). If the throwaway Postgres can't be brought up via `services:`, fall back to a `docker run` step that backgrounds the container and a wait-loop on `pg_isready`. Document the chosen approach.
- **Network access between job container and service container.** `--network host` is the simplest solution if Gitea's runner allows it. If not, use the Docker network created by the runner and reference the service by name (`postgres:5432`).
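
If the `services:` block turns out to be unusable, the fallback mentioned above (a backgrounded container plus a wait-loop on `pg_isready`) might look roughly like the sketch below. The function name `wait_until_ready` and the container name `ci-postgres` used in the usage note are assumptions made for illustration; neither appears in the workflow.

```shell
# Hypothetical fallback sketch: retry a readiness command until it succeeds.
#   $1 = maximum number of attempts
#   $2 = seconds to sleep between attempts
#   remaining arguments = the readiness command to run
wait_until_ready() {
  local attempts="$1" interval="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0                      # service answered: stop waiting
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$interval"             # no sleep after the final attempt
    fi
    i=$((i + 1))
  done
  return 1                          # never became ready: fail the step
}
```

A step could start the database with `docker run -d --name ci-postgres ...` and then call `wait_until_ready 20 5 docker exec ci-postgres pg_isready -U directus -d directus`. A non-zero return fails the step before the dry-run is attempted.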
## Done
**Implementation landed (pending live trigger by first relevant commit).** Workflow file at `.gitea/workflows/build.yml`. Statically validated; live trigger requires a push that touches one of the path-filtered locations.
**Corrections folded in vs. the spec's draft YAML:**
1. **`DB_HOST=localhost`, not `DB_HOST=postgres`.** The spec's draft mixed `--network host` with service-name resolution; those are mutually exclusive. With `--network host` the docker-run container shares the runner's loopback, so the service's port mapping (`5432:5432`) is reachable as `localhost:5432`, not by service name `postgres`. (Service-name resolution would only work with the runner's default bridge network.)
2. **`--health-retries 20`** instead of 10. The `timescaledb-ha:*-all` image runs more init work at startup than vanilla postgres and occasionally exceeds the 50s window on cold runner images. 20 retries × 5s = 100s margin.
3. **`--health-cmd "pg_isready -U directus -d directus"`** with explicit `-d`. Spec had user only.
4. **`curl -fsS -X POST`** for the Portainer webhook step. Bare `curl -X POST` returns 0 even on HTTP 4xx/5xx; `-f` makes a misconfigured webhook URL fail the step explicitly.
5. **Plain `docker build`**, NOT `docker/build-push-action@v5`. The dry-run step needs the freshly-built image accessible to a subsequent `docker run`. `build-push-action` with the docker-container Buildx driver exports into a separate buildkitd cache that `docker run` cannot see — the run would fail with "image not found." Plain `docker build` keeps the image in the local Docker daemon.
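
The retry arithmetic in correction 2 can be checked with a one-line helper. This is a rough sketch: `health_window` is a hypothetical name, and the figure it prints is only the approximate `retries × interval` window, ignoring the per-check timeout.

```shell
# Hypothetical helper: approximate time (in seconds) a service has to
# become healthy, computed as retries x interval.
#   $1 = value of --health-retries
#   $2 = value of --health-interval, in seconds
health_window() {
  echo $(( $1 * $2 ))
}
```

With the draft's values, `health_window 10 5` gives the 50s window that the TimescaleDB image occasionally exceeded, while `health_window 20 5` gives the 100s margin chosen in the fix.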
**Deliberate divergences from `processor/.gitea/workflows/build.yml`:**
| Aspect | Processor | Directus | Why |
|---|---|---|---|
| Build mechanism | `docker/build-push-action@v5` | plain `docker build` | dry-run needs local-daemon access (above) |
| Buildx setup | yes | no | Buildx isolates the image; would defeat the dry-run |
| `services:` block | absent | present | Directus dry-run needs a live Postgres; processor mocks it |
| Node/pnpm setup | yes | no | No TS to compile in Phase 1 (Phase 5 adds this) |
| typecheck/lint/test | three steps | none | No extensions yet |
| Portainer webhook | unconditional | gated on secret presence | Spec requirement |
| `runs-on` | `ubuntu-latest` | `ubuntu-22.04` | Pin to avoid floating-tag runner image breakage |
**Acceptance criteria status:**
Static (verified):
- ✅ Workflow file at `.gitea/workflows/build.yml`.
- ✅ Steps in correct order: checkout → build → dry-run → login → tag/push → optional Portainer.
- ✅ Path filter excludes `.planning/`, `README.md`, `compose.dev.yaml`, `package.json` — docs-only commits won't trigger CI.
- ✅ Workflow file itself is in the path-filter list (so changes to CI trigger CI).
- ✅ Two image tags published (`:main`, `:<sha>`).
- ✅ Required secrets identified: `REGISTRY_USERNAME`, `REGISTRY_PASSWORD`. Optional: `PORTAINER_WEBHOOK_URL`.
- ✅ Dry-run command logic traced: env vars, network mode, entrypoint override, script chain all consistent.
Pending live trigger (will validate on first push that hits the path filter):
- ⏳ Workflow triggers on push.
- ⏳ Dry-run step exits 0 against a fresh Postgres + the committed snapshot (currently 105 KB, 13 collections).
- ⏳ Snapshot drift simulation: hand-edit `snapshots/schema.yaml` to malformed YAML → push → CI fails at dry-run → image NOT pushed.
- ⏳ Migration syntax error simulation: introduce broken `db-init/006_*.sql` → push → CI fails at dry-run → image NOT pushed.
- ⏳ Image actually published to `git.dev.microservices.al/trm/directus:main` after a clean run.
- ⏳ Portainer webhook fires if configured.
**Operator action required before first run:** in the Gitea repo at `git.dev.microservices.al/trm/directus` → Settings → Secrets, configure:
- `REGISTRY_USERNAME` — Gitea user with write access to the container registry
- `REGISTRY_PASSWORD` — password or PAT for that user
- `PORTAINER_WEBHOOK_URL` (optional) — for auto-redeploy on push
Without `REGISTRY_USERNAME` / `REGISTRY_PASSWORD` the Login step fails with a clear auth error. Without `PORTAINER_WEBHOOK_URL` the Portainer step is skipped entirely.
**Port-allocation correction (2026-05-02):** the initial workflow used `5432:5432` for the throwaway-Postgres port mapping. On a self-hosted Gitea runner, the host typically already has another Postgres on 5432 (dev stack, stage instance), so the service container failed at start with "port already allocated." The fix remaps the host side to `15432:5432` (15432 being a common choice for a second Postgres instance) and sets the dry-run's `DB_PORT` to `15432`. The service container itself still listens on 5432 internally; only the host-side mapping changed. `--network host` semantics are preserved: `DB_HOST=localhost` reaches the service on the runner's loopback at `:15432`.
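
The relationship between network mode, port mapping, and the address the dry-run must use can be summarised in a short sketch. The helper `db_endpoint` is hypothetical and exists only to make the rule explicit; the workflow itself hard-codes `DB_HOST` and `DB_PORT`.

```shell
# Hypothetical helper: print the host:port the dry-run container should use.
#   $1 = network mode of the dry-run container ("host" or "bridge")
#   $2 = host-side port of the service mapping (15432 for 15432:5432)
db_endpoint() {
  case "$1" in
    host)
      # Shares the runner's loopback, so only the host-side port matters.
      printf 'localhost:%s\n' "$2"
      ;;
    bridge)
      # Resolves the service by name and talks to its internal port.
      printf 'postgres:5432\n'
      ;;
    *)
      echo "unknown network mode: $1" >&2
      return 1
      ;;
  esac
}
```

For the workflow as corrected, `db_endpoint host 15432` prints `localhost:15432`, which matches `DB_HOST=localhost` and `DB_PORT=15432`. In bridge mode the host-side remap would be irrelevant, because the service is addressed by name on its internal port.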