Platform-Control Release Promotion Policy

Purpose:

  • prevent release/platform-control drift from master
  • prevent piecemeal release-only hotfixes from silently dropping runtime files, config keys, generated artifacts, or tests
  • make platform-control deploys reproducible from one exact source SHA

Required policy

release/platform-control is a promotion branch, not a development branch.

It must always point to one exact source commit that is already integrated in master, unless an explicit temporary divergence override is being used during incident response.

Do not:

  • cherry-pick individual files onto release/platform-control
  • make release-only edits directly in the release worktree as a normal workflow
  • treat release deploy success as proof that release contents are synchronized with master

Required promotion flow

  1. Integrate the intended change into master.
  2. Promote one exact source SHA to release/platform-control.
  3. Deploy from the frozen release candidate generated from that same SHA.

Use:

make platform-control-local-preflight
scripts/ci/platform_control_promote_release_branch.sh origin/master

Required sequence:

  • confirm CI is green on the exact source SHA you intend to promote
  • run make platform-control-local-preflight
  • review any local preflight failures and decide whether they represent a real release blocker
  • then run scripts/ci/platform_control_promote_release_branch.sh origin/master

The local preflight is required as a signal for platform-control release work, but it is advisory by default. Green CI on the exact source SHA is the primary promotion gate. A local preflight failure should block promotion only when it demonstrates a real release-risking defect rather than local-environment drift or local-only e2e instability.

Release execution mode defaults:

  • full is the default trigger mode for platform-control release pipelines.
  • use deploy only as an explicit lighter-weight override when post-deploy report-only security/runtime evidence is intentionally being skipped.

Temporary release profile override for demo-week web iteration:

  • PLATFORM_CONTROL_RELEASE_PROFILE=web-fast is allowed only for low-risk web-only changes.
  • web-fast keeps the promotion-branch discipline, but release CI intentionally narrows to:
       • contract checks
       • frontend build/test
       • web-only runtime publish/deploy
       • lightweight remote web smoke/runtime validation
  • do not use web-fast for backend/API/schema/worker/infra changes.
  • preferred wrapper:

scripts/ci/platform_control_release_web_fast.sh origin/master

Platform-Control Dev-Test Bypass

Some platform features cannot be validated meaningfully in kind because the real behavior depends on MAAS, physical GPU nodes, node-agent lifecycle, platform Funnel/auth exposure, or platform-control-only secrets. For those cases, a bounded dev-test bypass is allowed to shorten the feedback loop.

Use this only when:

  • the change needs platform-control hardware/runtime behavior to validate;
  • the source commit is disposable or under active development;
  • the result is not being treated as a release;
  • the follow-up plan is to merge to master and run the normal promotion flow after the implementation is meaningful and reviewed.

Use:

scripts/ci/platform_control_dev_test_deploy.sh <source-ref>

What the wrapper does:

  • requires a clean working tree;
  • allows the source SHA to be outside origin/master;
  • skips local preflight by default;
  • force-promotes that SHA to release/platform-control;
  • triggers a deploy-mode GitLab pipeline with PLATFORM_CONTROL_ALLOW_RELEASE_BRANCH_DIVERGENCE=true and PLATFORM_CONTROL_DEV_TEST_DEPLOY=true.

What it does not do:

  • it does not replace CI on master;
  • it does not mark a change released;
  • it does not waive contract/schema/test ownership for the final change;
  • it does not allow release-only fixes to remain unmerged.

After a successful dev-test deploy, the required path is still:

git switch master
git merge --ff-only <validated-branch>
git push origin master
scripts/ci/platform_control_promote_release_branch.sh origin/master
PLATFORM_CONTROL_RELEASE_MODE=deploy PLATFORM_CONTROL_RELEASE_PROFILE=standard scripts/ci/gitlab_pipeline_trigger.sh

The local preflight still exists to catch issue classes that recently escaped into deploy:

  • frontend admin route-guard regressions
  • platform-role bind/list/revoke authz drift
  • audit-log visibility regressions on the admin read path

Current default local preflight shape:

  • full frontend Playwright e2e
  • role authz smoke
  • kind parity validation

Use scoped frontend preflight only as an explicit debugging override:

PLATFORM_CONTROL_LOCAL_PREFLIGHT_FRONTEND_MODE=scoped make platform-control-local-preflight

The promotion script:

  • resolves one source SHA
  • verifies it is reachable from origin/master by default
  • runs scripts/ci/platform_control_local_preflight.sh unless PLATFORM_CONTROL_SKIP_LOCAL_PREFLIGHT=true is set intentionally
  • treats local preflight failure as advisory by default and continues with a warning
  • can be switched back to a blocking local gate with PLATFORM_CONTROL_REQUIRE_LOCAL_PREFLIGHT=true
  • force-updates release/platform-control to that exact SHA
  • regenerates dist/platform-control-release-candidate.env
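The advisory-versus-blocking behavior described above can be sketched as a small shell function. This is only an illustration of the decision logic, not the contents of the real promotion script: decide_preflight_outcome and its output words are hypothetical, and only the two environment variable names come from this policy.

```shell
# Sketch of the local preflight gate decision (hypothetical helper).
# $1 is the exit status of the local preflight run.
# Prints one of: skipped, passed, advisory-failure, blocked.
# Returns non-zero only when promotion should stop.
decide_preflight_outcome() {
  # An intentional skip wins over everything else; the status is ignored.
  if [ "${PLATFORM_CONTROL_SKIP_LOCAL_PREFLIGHT:-false}" = "true" ]; then
    echo "skipped"
    return 0
  fi
  if [ "$1" -eq 0 ]; then
    echo "passed"
    return 0
  fi
  # A failure blocks only when the blocking gate was requested explicitly.
  if [ "${PLATFORM_CONTROL_REQUIRE_LOCAL_PREFLIGHT:-false}" = "true" ]; then
    echo "blocked"
    return 1
  fi
  # Default: warn and continue, because green CI is the primary gate.
  echo "advisory-failure"
  return 0
}
```

With neither variable set, a failing preflight yields advisory-failure and a zero return status, which matches the default of continuing with a warning.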

CI enforcement

Release branch CI now includes:

  • scripts/ci/platform_control_release_branch_guard.sh

This guard fails release/platform-control pipelines if the branch contains commits not reachable from origin/master.

Temporary escape hatch:

PLATFORM_CONTROL_ALLOW_RELEASE_BRANCH_DIVERGENCE=true

Use this only for bounded incident response, and merge the release-only fixes back to master immediately after stabilization.
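For readers who want to reproduce the guard's check locally, the following is a minimal sketch of the reachability test and the override. It is an assumption about how such a guard could work, not the actual contents of platform_control_release_branch_guard.sh; the function names and messages are hypothetical.

```shell
# Count commits reachable from the release ref but not from the integration ref.
# $1 = integration ref (for example origin/master), $2 = release ref.
count_release_only_commits() {
  git rev-list --count "$1..$2"
}

# Fail when the release ref carries commits that master does not contain,
# unless the temporary divergence override is set.
check_release_branch() {
  release_only="$(count_release_only_commits "$1" "$2")" || return 2
  if [ "$release_only" -eq 0 ]; then
    echo "ok: $2 is fully reachable from $1"
    return 0
  fi
  if [ "${PLATFORM_CONTROL_ALLOW_RELEASE_BRANCH_DIVERGENCE:-false}" = "true" ]; then
    echo "warning: $release_only release-only commit(s) allowed by override"
    return 0
  fi
  echo "error: $release_only commit(s) on $2 are not reachable from $1" >&2
  return 1
}
```

A local run might look like check_release_branch origin/master origin/release/platform-control after a git fetch.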

Incident lesson captured

Recent terminal and bootstrap recovery work exposed the failure mode this policy is meant to prevent:

  • release branch lost terminal-gateway internal listener/runtime files
  • release branch lost config keys required for internal terminal WebSocket transport
  • release branch had stale generated SDK artifacts
  • release branch had stale web test helpers/tests
  • deploy/runtime failures were caused by release drift rather than by the owning feature code

The corrective rule is simple:

  • fix on the source branch
  • merge to master
  • promote one exact SHA

Operator checklist before release

Before triggering a platform-control deploy:

  1. Confirm the intended fix is on master.
  2. Confirm promotion will use scripts/ci/platform_control_promote_release_branch.sh.
  3. Confirm CI is green on the exact SHA being promoted.
  4. Run make platform-control-local-preflight and review the result.
  5. If local preflight fails, confirm whether the failure is a real release blocker or local-environment-only noise before continuing.
  6. Confirm no manual release-only edits are pending in the release worktree.
  7. Confirm generated artifacts required by CI are committed.
  8. Confirm bootstrap/config/runtime files for changed services are present on the promoted SHA, not only in local worktrees.

Release order of operations

Use this order for platform-control releases, especially after app/runtime/API/schema work.

  1. Stabilize on master.
       • Commit all code, docs, schema, seed, generated SDK artifacts, and web test updates on master.
       • Push master to the canonical remotes before touching release/platform-control.
       • Do not patch release/platform-control directly.

  2. Run targeted local checks for the files changed.
       • API contract changes: run codegen/SDK checks and the relevant API tests.
       • Web/UI changes: run TypeScript plus the relevant Vitest/Playwright slice.
       • Runtime image changes: build the changed image locally when practical and inspect required OCI labels.
       • Schema/seed changes: verify the schema is idempotent for existing databases, not only correct for fresh databases.

  3. Run release preflight.
       • Preferred: make platform-control-local-preflight.
       • If preflight is skipped because the local environment is noisy, record that decision and run the targeted checks from step 2 first.

  4. Promote the exact source SHA.
       • Use scripts/ci/platform_control_promote_release_branch.sh origin/master.
       • Confirm the release candidate file reports the intended SHA.
       • Confirm release/platform-control points at that same SHA.

  5. Trigger the release pipeline explicitly.
       • Standard deploy mode: PLATFORM_CONTROL_RELEASE_MODE=deploy PLATFORM_CONTROL_RELEASE_PROFILE=standard scripts/ci/gitlab_pipeline_trigger.sh.
       • Use full when post-deploy report-only security/runtime evidence is required.
       • Use web-fast only for web-only changes.

  6. Watch the pipeline by stage, not only final status.
       • Contracts: OpenAPI/AsyncAPI and release branch guard must pass.
       • Build/test: backend, frontend, integration, SDK, and security gates must pass.
       • Package: every runtime in scripts/ci/platform_control_runtime_matrix.sh must have a matching publish job or an intentional filter.
       • Digest fan-in: platform_control_assemble_runtime_digests must find every runtime digest snippet.
       • Deploy: schema, seed, manifests, image annotations, rollout, and controller secret reconciliation must pass.
       • Remote validation: public endpoints, observability, runtime image conformance, app/runtime smoke, and disposable cleanup must pass.

  7. Fix failures at the owning layer.
       • Frontend failures: update the UI contract/test expectation together.
       • SDK failures: regenerate and commit generated artifacts from the API contract.
       • Missing runtime artifact failures: update the runtime matrix, publish job graph, and digest fan-in together.
       • Image label failures: fix the runtime Dockerfile/build path, not the publish check.
       • Schema/seed failures: make db_schema_v1.sql safe for existing databases and keep scripts/seed.sql aligned.
       • Remote cleanup failures: treat them as lifecycle/read-model defects unless proven to be external infrastructure failure.

  8. Repeat from master.
       • Commit the fix on master.
       • Push all remotes.
       • Promote the new exact SHA.
       • Trigger a new pipeline.
       • Do not retry the same failed release SHA after code/schema fixes unless the failure was proven to be transient infrastructure.

  9. Close the release only after final validation.
       • Confirm the final pipeline status is success.
       • Confirm deploy and remote validation jobs passed.
       • Record the final SHA and pipeline URL in the handoff or task notes.
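The "confirm the release candidate file reports the intended SHA" check in the promotion step can be scripted. The sketch below assumes the candidate file is a plain KEY=value env file. The key name PLATFORM_CONTROL_SOURCE_SHA used in the comments is a placeholder, since this policy does not name the actual key, and both helper functions are hypothetical.

```shell
# Print the value of the first "KEY=value" line in an env file.
# $1 = file path, $2 = key name.
read_env_value() {
  sed -n "s/^$2=//p" "$1" | head -n 1
}

# Succeed only when the file records exactly the expected SHA under the key.
# $1 = file path, $2 = key name, $3 = expected SHA.
candidate_matches() {
  actual="$(read_env_value "$1" "$2")"
  [ -n "$actual" ] && [ "$actual" = "$3" ]
}

# Possible usage after promotion (the key name is an assumption):
#   expected="$(git rev-parse origin/master)"
#   candidate_matches dist/platform-control-release-candidate.env \
#     PLATFORM_CONTROL_SOURCE_SHA "$expected" || echo "candidate SHA mismatch"
#   [ "$(git rev-parse origin/release/platform-control)" = "$expected" ] \
#     || echo "release branch points at a different SHA"
```

Comparing full SHAs as strings avoids the ambiguity of abbreviated hashes, which fits the one-exact-SHA rule.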

Common platform-control release failure classes

These are release blockers observed in recent platform-control work:

  • Stale frontend tests after UI/API payload changes.
  • Stale SDK/generated artifacts after OpenAPI changes.
  • Runtime present in platform_control_runtime_matrix.sh but missing a split publish CI job.
  • Runtime Dockerfile missing required OCI labels: org.opencontainers.image.version, org.opencontainers.image.revision, and org.opencontainers.image.created.
  • Fresh-schema-only changes that do not backfill existing deployed databases with ALTER TABLE ... ADD COLUMN IF NOT EXISTS or refreshed constraints.
  • Seed data that introduces a new enum/check value before the deployed database constraint allows it.
  • Disposable validation cleanup surfacing read-model or lifecycle API defects after the main smoke path passed.
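The OCI label failure class is easy to catch before pushing. As a rough sketch, the hypothetical helper below reads key=value label lines on standard input and reports which of the three required labels are absent or empty. How the labels are extracted from the image is left to the caller; the docker invocation in the comment is one possible way and is an assumption about the local tooling.

```shell
# Print each required OCI label that is missing or empty in the
# "key=value" lines read from stdin. Returns non-zero if any are missing.
missing_oci_labels() {
  labels="$(cat)"
  missing=0
  for key in \
    org.opencontainers.image.version \
    org.opencontainers.image.revision \
    org.opencontainers.image.created
  do
    # Exact key match with a non-empty value; awk exits 0 only when found.
    if ! printf '%s\n' "$labels" |
      awk -F= -v k="$key" '$1 == k && $2 != "" { found = 1 } END { exit !found }'
    then
      echo "$key"
      missing=1
    fi
  done
  return "$missing"
}

# Possible usage against a locally built image (IMAGE is a placeholder):
#   docker image inspect \
#     --format '{{range $k, $v := .Config.Labels}}{{$k}}={{$v}}{{"\n"}}{{end}}' \
#     IMAGE | missing_oci_labels
```

Keeping the check independent of the image tooling means the same function can be fed from any source that can list labels.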