directus/.planning/phase-1-slice-1-schema/08-gitea-ci-dryrun.md
julian 0f89fea913 Task 1.8 — Gitea CI dry-run workflow
.gitea/workflows/build.yml builds the directus image on path-filtered
pushes to main and validates the boot pipeline against a throwaway
Postgres before pushing the image to the registry. The dry-run is the
gate that catches snapshot drift, broken db-init scripts, or
incompatible schema changes before they reach stage.

Workflow shape (mirrors processor's CI but tailored to Directus):
- Path filter: snapshots/, db-init/, extensions/, scripts/,
  entrypoint.sh, Dockerfile, the workflow file itself.
  Docs-only commits (.planning/, README.md, compose.dev.yaml,
  package.json) do NOT trigger CI.
- Throwaway Postgres via services: block, pinned to the same
  timescale/timescaledb-ha:pg16.6-ts2.17.2-all tag as compose.dev.yaml.
- Plain `docker build` (NOT build-push-action) so the image stays in
  the local daemon for the subsequent docker run dry-run.
- Dry-run: --network host + --entrypoint bash to override the upstream
  entrypoint and run only apply-db-init.sh && schema-apply.sh.
  Skips bootstrap and pm2-runtime — the schema apply is the gate.
- Two image tags: :main (mutable) and :<sha> (immutable).
- Optional Portainer webhook gated on secret presence; curl -fsS so a
  misconfigured URL fails the step explicitly.

Spec corrections folded in (the spec's draft had two contradictions
that would have failed at runtime):
1. DB_HOST=localhost (not 'postgres'). With --network host, service
   containers are reachable on the runner's loopback by their port
   mapping, NOT by service name. Service-name resolution requires the
   default bridge network; --network host overrides it.
2. health-retries 20 (not 10). timescaledb-ha:*-all does more init
   work at boot than vanilla postgres; 50s isn't always enough.

Operator action required in the Gitea repo Settings before first run:
configure REGISTRY_USERNAME and REGISTRY_PASSWORD secrets (required for
push); optionally PORTAINER_WEBHOOK_URL (for auto-deploy).

Live verification deferred to first relevant commit. Documented in the
task spec's Done section: positive (clean snapshot → push succeeds)
and negative (malformed snapshot → halt before push) cases to validate
once CI runs.

ROADMAP marks 1.8 done. Phase 1 progress: 8/9 tasks complete (1.1–1.8);
only 1.9 (Rally Albania 2026 dogfood seed) remains before Phase 1 ships.
2026-05-02 10:04:39 +02:00

# Task 1.8 — Gitea CI dry-run workflow
**Phase:** 1 — Slice 1 schema + deploy pipeline
**Status:** ⬜ Not started
**Depends on:** 1.7
**Wiki refs:** `docs/wiki/entities/directus.md` (Schema management section)
## Goal
Build a Gitea Actions workflow that on push to `main` (when relevant paths change): builds the image, spins up a throwaway Postgres + TimescaleDB in CI, runs the entrypoint flow as a **dry-run** to catch snapshot/migration breakage, and only publishes the image to the registry if the dry-run succeeds. Mirrors the `processor` and `tcp-ingestion` workflow shape.
## Deliverables
- `.gitea/workflows/build.yml`:
```yaml
name: Build directus image
on:
  push:
    branches: [main]
    paths:
      - 'snapshots/**'
      - 'db-init/**'
      - 'extensions/**'
      - 'scripts/**'
      - 'entrypoint.sh'
      - 'Dockerfile'
      - '.gitea/workflows/build.yml'
  workflow_dispatch:
jobs:
  build-and-publish:
    runs-on: ubuntu-22.04
    services:
      postgres:
        image: timescale/timescaledb-ha:pg16.6-ts2.17.2-all  # match compose.dev.yaml; :pg16-latest does NOT exist on Docker Hub
        env:
          POSTGRES_USER: directus
          POSTGRES_PASSWORD: directus
          POSTGRES_DB: directus
        ports: ['5432:5432']
        options: >-
          --health-cmd "pg_isready -U directus"
          --health-interval 5s
          --health-timeout 5s
          --health-retries 10
    steps:
      - uses: actions/checkout@v4
      - name: Build image
        run: docker build -t trm-directus:ci .
      - name: Dry-run boot against throwaway Postgres
        env:
          DB_HOST: postgres
          DB_PORT: 5432
          DB_USER: directus
          DB_PASSWORD: directus
          DB_DATABASE: directus
          KEY: ci-key-not-secret
          SECRET: ci-secret-not-secret
          ADMIN_EMAIL: ci@example.com
          ADMIN_PASSWORD: ci-password-not-secret
          PUBLIC_URL: http://localhost:8055
        run: |
          docker run --rm \
            -e DB_CLIENT=pg \
            -e DB_HOST=$DB_HOST -e DB_PORT=$DB_PORT \
            -e DB_USER=$DB_USER -e DB_PASSWORD=$DB_PASSWORD -e DB_DATABASE=$DB_DATABASE \
            -e KEY=$KEY -e SECRET=$SECRET \
            -e ADMIN_EMAIL=$ADMIN_EMAIL -e ADMIN_PASSWORD=$ADMIN_PASSWORD \
            -e PUBLIC_URL=$PUBLIC_URL \
            --network host \
            --entrypoint bash \
            trm-directus:ci \
            -c '/directus/scripts/apply-db-init.sh && /directus/scripts/schema-apply.sh && echo "dry-run ok"'
      - name: Login to Gitea registry
        uses: docker/login-action@v3
        with:
          registry: git.dev.microservices.al
          username: ${{ secrets.REGISTRY_USERNAME }}
          password: ${{ secrets.REGISTRY_PASSWORD }}
      - name: Tag and push
        run: |
          docker tag trm-directus:ci git.dev.microservices.al/trm/directus:main
          docker tag trm-directus:ci git.dev.microservices.al/trm/directus:${{ github.sha }}
          docker push git.dev.microservices.al/trm/directus:main
          docker push git.dev.microservices.al/trm/directus:${{ github.sha }}
      - name: Trigger Portainer redeploy (optional)
        if: secrets.PORTAINER_WEBHOOK_URL != ''
        run: curl -X POST "${{ secrets.PORTAINER_WEBHOOK_URL }}"
```
## Specification
- **Dry-run runs the entrypoint scripts only**, not `directus start`. Starting the server and waiting for it to serve is slow and unnecessary — the goal is to catch DDL / snapshot apply errors. Override the `ENTRYPOINT` and run the two scripts directly.
- **Service container is the throwaway Postgres.** Declare it in a `services:` block, which Gitea Actions accepts with GitHub-Actions-compatible syntax. Use the pinned TimescaleDB image; a version mismatch with prod hides bugs.
- **Path filter on `on.push.paths`** keeps CI quiet for unrelated repo changes (docs-only commits, etc.). Mirrors the processor workflow.
- **Two image tags published:** `:main` (always points at latest main) and `:<sha>` (specific commit, immutable). The deploy stack can pin to either.
- **Portainer webhook is optional** (gated by secret presence). If unset, no auto-deploy.
- **No integration tests in CI for Phase 1.** The dry-run boot *is* the integration test — it proves the snapshot+db-init combination works against a fresh Postgres. Phase 5+ adds extension-specific tests as those land.
- **Required Gitea secrets:**
  - `REGISTRY_USERNAME`, `REGISTRY_PASSWORD`: for the image push.
  - `PORTAINER_WEBHOOK_URL`: optional, for auto-deploy.
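The two-tag scheme in the list above can be sketched as a small shell helper. This is an illustration rather than the workflow's actual code: the function name `image_refs` and the commit value `abc1234` are hypothetical placeholders, and the registry path is the one used in the draft workflow.

```shell
# image_refs REPO SHA: print the mutable and the immutable image reference,
# one per line. `image_refs` is a hypothetical helper for illustration only.
image_refs() {
  repo=$1
  sha=$2
  printf '%s:main\n' "$repo"       # mutable: moves with every successful build
  printf '%s:%s\n' "$repo" "$sha"  # immutable: pinned to one commit
}

# Example: list both references for a locally built image.
# The docker calls are commented out so the sketch runs without a daemon;
# abc1234 is a placeholder commit id.
image_refs git.dev.microservices.al/trm/directus abc1234 | while read -r ref; do
  echo "would tag and push: $ref"
  # docker tag trm-directus:ci "$ref" && docker push "$ref"
done
```

A deploy stack that wants reproducible rollbacks would pin the `:<sha>` reference; one that should always follow `main` would use `:main`.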
## Acceptance criteria
- [ ] Workflow file is committed at `.gitea/workflows/build.yml`.
- [ ] First push to `main` after this lands triggers the workflow.
- [ ] Workflow steps in order: checkout → build → dry-run boot → registry login → tag/push → optional Portainer ping.
- [ ] Dry-run step exits 0 with logs showing "db-init complete" and a successful schema apply. Every CI run starts from a fresh Postgres, so the snapshot is applied from scratch there; "schema apply: no changes" is expected only when the scripts re-run against a database where the snapshot has already been applied. Verify the apply step works in both cases.
- [ ] Intentionally break the snapshot (manually edit `snapshots/schema.yaml` to a malformed YAML) → workflow fails at the dry-run step → image is NOT pushed.
- [ ] Intentionally break a migration (introduce SQL syntax error in `db-init/`) → workflow fails at the dry-run step → image is NOT pushed.
- [ ] Push a docs-only change → workflow does NOT trigger.
- [ ] Image pushed to registry under `git.dev.microservices.al/trm/directus:main` and `:<sha>`.
- [ ] Portainer webhook fires if configured.
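The two failure criteria above rest on `&&` short-circuiting in the dry-run command: if either script exits non-zero, the chain stops, the step fails, and the later push steps never run. A minimal sketch of that behavior, using hypothetical stand-in functions in place of the real scripts:

```shell
# Stand-ins for the real scripts; both function names are hypothetical.
apply_db_init() { echo "db-init complete"; }
schema_apply_broken() { echo "schema apply failed" >&2; return 1; }

# Mirrors the shape of the dry-run command: A && B && echo "dry-run ok".
# The second stage is passed in so a passing or failing variant can be tried.
dry_run() {
  apply_db_init && "$1" && echo "dry-run ok"
}

# A failing schema step makes the whole chain exit non-zero,
# so "dry-run ok" is never printed and the gate stays closed.
if dry_run schema_apply_broken; then
  echo "gate passed"
else
  echo "gate failed with status $?"
fi
```

In the workflow the same exit status propagates out of `bash -c '...'` and `docker run`, which is what marks the step as failed.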
## Risks / open questions
- **Gitea Actions `services:` syntax compatibility.** Gitea's runner is mostly GitHub-Actions-compatible but has historically had quirks with the `services:` block (especially around image pulls from private registries). If the throwaway Postgres can't be brought up via `services:`, fall back to a `docker run` step that backgrounds the container and a wait-loop on `pg_isready`. Document the chosen approach.
- **Network access between job container and service container.** `--network host` is the simplest solution if Gitea's runner allows it. If not, use the Docker network created by the runner and reference the service by name (`postgres:5432`).
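The fallback named in the first risk (a backgrounded `docker run` plus a wait-loop on `pg_isready`) could look roughly like the sketch below. It rests on assumptions: `wait_until` and the container name `ci-postgres` are hypothetical, and the 20 attempts at 5s intervals simply reuse the health-check budget chosen for the `services:` block.

```shell
# wait_until ATTEMPTS DELAY CMD...: run CMD until it succeeds or the
# attempts are used up. Returns 0 on success, 1 on timeout.
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Only sleep when another attempt follows.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Hypothetical use in a workflow step (commented out: needs a Docker daemon):
#   docker run -d --name ci-postgres -p 5432:5432 \
#     -e POSTGRES_USER=directus -e POSTGRES_PASSWORD=directus -e POSTGRES_DB=directus \
#     timescale/timescaledb-ha:pg16.6-ts2.17.2-all
#   wait_until 20 5 docker exec ci-postgres pg_isready -U directus -d directus

# Demo with a command that succeeds immediately.
wait_until 3 0 true && echo "ready"
```

Unlike the `services:` block, this approach leaves container cleanup to the workflow, so a final `docker rm -f` step would also be needed.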
## Done
**Implementation landed (pending live trigger by first relevant commit).** Workflow file at `.gitea/workflows/build.yml`. Statically validated; live trigger requires a push that touches one of the path-filtered locations.
**Corrections folded in vs. the spec's draft YAML:**
1. **`DB_HOST=localhost`, not `DB_HOST=postgres`.** The spec's draft mixed `--network host` with service-name resolution; those are mutually exclusive. With `--network host` the docker-run container shares the runner host's loopback, so the service's port mapping (`5432:5432`) is reachable as `localhost:5432`, not by the service name `postgres`. (Service-name resolution only works for containers attached to the Docker network the runner creates for the job, which `--network host` bypasses.)
2. **`--health-retries 20`** instead of 10. The `timescaledb-ha:*-all` image runs more init work at startup than vanilla postgres and occasionally exceeds the 50s window on cold runner images. 20 retries × 5s = 100s margin.
3. **`--health-cmd "pg_isready -U directus -d directus"`** with explicit `-d`. Spec had user only.
4. **`curl -fsS -X POST`** for the Portainer webhook step. Bare `curl -X POST` returns 0 even on HTTP 4xx/5xx; `-f` makes a misconfigured webhook URL fail the step explicitly.
5. **Plain `docker build`**, NOT `docker/build-push-action@v5`. The dry-run step needs the freshly built image accessible to a subsequent `docker run`. `build-push-action` with the docker-container Buildx driver keeps the result inside the separate BuildKit builder unless it is explicitly loaded into the daemon (`load: true`), so `docker run` would not find the image locally and the run would fail with "image not found." Plain `docker build` keeps the image in the local Docker daemon.
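Correction 4 and the secret gating can be combined in one shell helper, sketched below. The function name `trigger_redeploy` is hypothetical, and passing the secret in through an environment variable named `PORTAINER_WEBHOOK_URL` is an assumption about how the step would be wired.

```shell
# trigger_redeploy URL: POST to the webhook if a URL is configured.
# `trigger_redeploy` is a hypothetical helper name.
trigger_redeploy() {
  url=$1
  if [ -z "$url" ]; then
    echo "no webhook configured, skipping"
    return 0
  fi
  # -f: exit non-zero (status 22) on HTTP 4xx/5xx instead of returning 0;
  # -sS: suppress the progress meter but still print error messages.
  curl -fsS -X POST "$url"
}

# In a workflow step the URL would come from the secret via an env var
# (assumed name). With the variable unset, the call is a no-op.
trigger_redeploy "${PORTAINER_WEBHOOK_URL:-}"
```

Because the function returns curl's exit status, a wrong or expired webhook URL fails the step instead of passing silently.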
**Deliberate divergences from `processor/.gitea/workflows/build.yml`:**
| Aspect | Processor | Directus | Why |
|---|---|---|---|
| Build mechanism | `docker/build-push-action@v5` | plain `docker build` | dry-run needs local-daemon access (above) |
| Buildx setup | yes | no | Buildx isolates the image; would defeat the dry-run |
| `services:` block | absent | present | Directus dry-run needs a live Postgres; processor mocks it |
| Node/pnpm setup | yes | no | No TS to compile in Phase 1 (Phase 5 adds this) |
| typecheck/lint/test | three steps | none | No extensions yet |
| Portainer webhook | unconditional | gated on secret presence | Spec requirement |
| `runs-on` | `ubuntu-latest` | `ubuntu-22.04` | Pin to avoid floating-tag runner image breakage |
**Acceptance criteria status:**
Static (verified):
- ✅ Workflow file at `.gitea/workflows/build.yml`.
- ✅ Steps in correct order: checkout → build → dry-run → login → tag/push → optional Portainer.
- ✅ Path filter excludes `.planning/`, `README.md`, `compose.dev.yaml`, `package.json` — docs-only commits won't trigger CI.
- ✅ Workflow file itself is in the path-filter list (so changes to CI trigger CI).
- ✅ Two image tags published (`:main`, `:<sha>`).
- ✅ Required secrets identified: `REGISTRY_USERNAME`, `REGISTRY_PASSWORD`. Optional: `PORTAINER_WEBHOOK_URL`.
- ✅ Dry-run command logic traced: env vars, network mode, entrypoint override, script chain all consistent.
Pending live trigger (will validate on first push that hits the path filter):
- ⏳ Workflow triggers on push.
- ⏳ Dry-run step exits 0 against a fresh Postgres + the committed snapshot (currently 105 KB, 13 collections).
- ⏳ Snapshot drift simulation: hand-edit `snapshots/schema.yaml` to malformed YAML → push → CI fails at dry-run → image NOT pushed.
- ⏳ Migration syntax error simulation: introduce broken `db-init/006_*.sql` → push → CI fails at dry-run → image NOT pushed.
- ⏳ Image actually published to `git.dev.microservices.al/trm/directus:main` after a clean run.
- ⏳ Portainer webhook fires if configured.
**Operator action required before first run:** in the Gitea repo at `git.dev.microservices.al/trm/directus` → Settings → Secrets, configure:
- `REGISTRY_USERNAME` — Gitea user with write access to the container registry
- `REGISTRY_PASSWORD` — password or PAT for that user
- `PORTAINER_WEBHOOK_URL` (optional) — for auto-redeploy on push
Without `REGISTRY_USERNAME` / `REGISTRY_PASSWORD` the Login step fails with a clear auth error. Without `PORTAINER_WEBHOOK_URL` the Portainer step is skipped entirely.
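If an earlier and more explicit failure is preferred over the Login step's auth error, a preflight check along these lines could run first. This is an optional sketch and not part of the committed workflow; `require_env` is a hypothetical helper, and the values assigned below are placeholders standing in for the Gitea secrets.

```shell
# require_env NAME...: return non-zero if any named variable is unset or empty,
# printing one message per missing name to stderr.
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable called $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example with placeholder values.
REGISTRY_USERNAME=ci-user
REGISTRY_PASSWORD=placeholder
require_env REGISTRY_USERNAME REGISTRY_PASSWORD && echo "registry credentials present"
```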