Platform-Control Release Promotion Policy¶
Purpose:
- prevent release/platform-control drift from master
- prevent piecemeal release-only hotfixes from silently dropping runtime files, config keys, generated artifacts, or tests
- make platform-control deploys reproducible from one exact source SHA
Required policy¶
release/platform-control is a promotion branch, not a development branch.
It must always point to one exact source commit that is already integrated in master, unless an explicit temporary divergence override is being used during incident response.
Do not:
- cherry-pick individual files onto release/platform-control
- make release-only edits directly in the release worktree as a normal workflow
- treat release deploy success as proof that release contents are synchronized with master
Required promotion flow¶
- Integrate the intended change into
master. - Promote one exact source SHA to
release/platform-control. - Deploy from the frozen release candidate generated from that same SHA.
Use:
make platform-control-local-preflight
scripts/ci/platform_control_promote_release_branch.sh origin/master
Required sequence:
- confirm CI is green on the exact source SHA you intend to promote
- run make platform-control-local-preflight
- review any local preflight failures and decide whether they represent a real release blocker
- then run scripts/ci/platform_control_promote_release_branch.sh origin/master
The local preflight is required as a signal for platform-control release work, but it is advisory by default. Green CI on the exact source SHA is the primary promotion gate. A local preflight failure should block promotion only when it demonstrates a real release-risking defect rather than local-environment drift or local-only e2e instability.
Release execution mode defaults:
- full is the default trigger mode for platform-control release pipelines.
- use deploy only as an explicit lighter-weight override when post-deploy report-only security/runtime evidence is intentionally being skipped.
Temporary release profile override for demo-week web iteration:
- PLATFORM_CONTROL_RELEASE_PROFILE=web-fast is allowed only for low-risk web-only changes.
- web-fast keeps the promotion-branch discipline, but release CI intentionally narrows to:
- contract checks
- frontend build/test
- web-only runtime publish/deploy
- lightweight remote web smoke/runtime validation
- do not use web-fast for backend/API/schema/worker/infra changes.
- preferred wrapper:
Platform-Control Dev-Test Bypass¶
Some platform features cannot be validated meaningfully in kind because the real behavior depends on MAAS, physical GPU nodes, node-agent lifecycle, platform Funnel/auth exposure, or platform-control-only secrets. For those cases, a bounded dev-test bypass is allowed to shorten the feedback loop.
Use this only when:
- the change needs platform-control hardware/runtime behavior to validate;
- the source commit is disposable or under active development;
- the result is not being treated as a release;
- the follow-up plan is to merge to
masterand run the normal promotion flow after the implementation is meaningful and reviewed.
Use:
What the wrapper does:
- requires a clean working tree;
- allows the source SHA to be outside
origin/master; - skips local preflight by default;
- force-promotes that SHA to
release/platform-control; - triggers a deploy-mode GitLab pipeline with
PLATFORM_CONTROL_ALLOW_RELEASE_BRANCH_DIVERGENCE=trueandPLATFORM_CONTROL_DEV_TEST_DEPLOY=true.
What it does not do:
- it does not replace CI on
master; - it does not mark a change released;
- it does not waive contract/schema/test ownership for the final change;
- it does not allow release-only fixes to remain unmerged.
After a successful dev-test deploy, the required path is still:
git switch master
git merge --ff-only <validated-branch>
git push origin master
scripts/ci/platform_control_promote_release_branch.sh origin/master
PLATFORM_CONTROL_RELEASE_MODE=deploy PLATFORM_CONTROL_RELEASE_PROFILE=standard scripts/ci/gitlab_pipeline_trigger.sh
The local preflight still exists to catch issue classes that recently escaped into deploy: - frontend admin route-guard regressions - platform-role bind/list/revoke authz drift - audit-log visibility regressions on the admin read path
Current default local preflight shape: - full frontend Playwright e2e - role authz smoke - kind parity validation
Use scoped frontend preflight only as an explicit debugging override:
The promotion script:
- resolves one source SHA
- verifies it is reachable from origin/master by default
- runs scripts/ci/platform_control_local_preflight.sh unless PLATFORM_CONTROL_SKIP_LOCAL_PREFLIGHT=true is set intentionally
- treats local preflight failure as advisory by default and continues with a warning
- can be switched back to a blocking local gate with PLATFORM_CONTROL_REQUIRE_LOCAL_PREFLIGHT=true
- force-updates release/platform-control to that exact SHA
- regenerates dist/platform-control-release-candidate.env
CI enforcement¶
Release branch CI now includes:
scripts/ci/platform_control_release_branch_guard.sh
This guard fails release/platform-control pipelines if the branch contains commits not reachable from origin/master.
Temporary escape hatch:
Use this only for bounded incident response, and merge the release-only fixes back to master immediately after stabilization.
Incident lesson captured¶
Recent terminal and bootstrap recovery work exposed the failure mode this policy is meant to prevent:
- release branch lost terminal-gateway internal listener/runtime files
- release branch lost config keys required for internal terminal WebSocket transport
- release branch had stale generated SDK artifacts
- release branch had stale web test helpers/tests
- deploy/runtime failures were caused by release drift rather than by the owning feature code
The corrective rule is simple:
- fix on the source branch
- merge to
master - promote one exact SHA
Operator checklist before release¶
Before triggering a platform-control deploy:
- Confirm the intended fix is on
master. - Confirm promotion will use
scripts/ci/platform_control_promote_release_branch.sh. - Confirm CI is green on the exact SHA being promoted.
- Run
make platform-control-local-preflightand review the result. - If local preflight fails, confirm whether the failure is a real release blocker or local-environment-only noise before continuing.
- Confirm no manual release-only edits are pending in the release worktree.
- Confirm generated artifacts required by CI are committed.
- Confirm bootstrap/config/runtime files for changed services are present on the promoted SHA, not only in local worktrees.
Release order of operations¶
Use this order for platform-control releases, especially after app/runtime/API/schema work.
- Stabilize on
master. - Commit all code, docs, schema, seed, generated SDK artifacts, and web test updates on
master. - Push
masterto the canonical remotes before touchingrelease/platform-control. -
Do not patch
release/platform-controldirectly. -
Run targeted local checks for the files changed.
- API contract changes: run codegen/SDK checks and the relevant API tests.
- Web/UI changes: run TypeScript plus the relevant Vitest/Playwright slice.
- Runtime image changes: build the changed image locally when practical and inspect required OCI labels.
-
Schema/seed changes: verify the schema is idempotent for existing databases, not only correct for fresh databases.
-
Run release preflight.
- Preferred:
make platform-control-local-preflight. -
If preflight is skipped because the local environment is noisy, record that decision and run the targeted checks from step 2 first.
-
Promote the exact source SHA.
- Use
scripts/ci/platform_control_promote_release_branch.sh origin/master. - Confirm the release candidate file reports the intended SHA.
-
Confirm
release/platform-controlpoints at that same SHA. -
Trigger the release pipeline explicitly.
- Standard deploy mode:
PLATFORM_CONTROL_RELEASE_MODE=deploy PLATFORM_CONTROL_RELEASE_PROFILE=standard scripts/ci/gitlab_pipeline_trigger.sh. - Use
fullwhen post-deploy report-only security/runtime evidence is required. -
Use
web-fastonly for web-only changes. -
Watch the pipeline by stage, not only final status.
- Contracts: OpenAPI/AsyncAPI and release branch guard must pass.
- Build/test: backend, frontend, integration, SDK, and security gates must pass.
- Package: every runtime in
scripts/ci/platform_control_runtime_matrix.shmust have a matching publish job or an intentional filter. - Digest fan-in:
platform_control_assemble_runtime_digestsmust find every runtime digest snippet. - Deploy: schema, seed, manifests, image annotations, rollout, and controller secret reconciliation must pass.
-
Remote validation: public endpoints, observability, runtime image conformance, app/runtime smoke, and disposable cleanup must pass.
-
Fix failures at the owning layer.
- Frontend failures: update the UI contract/test expectation together.
- SDK failures: regenerate and commit generated artifacts from the API contract.
- Missing runtime artifact failures: update the runtime matrix, publish job graph, and digest fan-in together.
- Image label failures: fix the runtime Dockerfile/build path, not the publish check.
- Schema/seed failures: make
db_schema_v1.sqlsafe for existing databases and keepscripts/seed.sqlaligned. -
Remote cleanup failures: treat them as lifecycle/read-model defects unless proven to be external infrastructure failure.
-
Repeat from
master. - Commit the fix on
master. - Push all remotes.
- Promote the new exact SHA.
- Trigger a new pipeline.
-
Do not retry the same failed release SHA after code/schema fixes unless the failure was proven to be transient infrastructure.
-
Close the release only after final validation.
- Confirm the final pipeline status is
success. - Confirm deploy and remote validation jobs passed.
- Record the final SHA and pipeline URL in the handoff or task notes.
Common platform-control release failure classes¶
These are release blockers observed in recent platform-control work:
- Stale frontend tests after UI/API payload changes.
- Stale SDK/generated artifacts after OpenAPI changes.
- Runtime present in
platform_control_runtime_matrix.shbut missing a split publish CI job. - Runtime Dockerfile missing required OCI labels:
org.opencontainers.image.version,org.opencontainers.image.revision, andorg.opencontainers.image.created. - Fresh-schema-only changes that do not backfill existing deployed databases with
ALTER TABLE ... ADD COLUMN IF NOT EXISTSor refreshed constraints. - Seed data that introduces a new enum/check value before the deployed database constraint allows it.
- Disposable validation cleanup surfacing read-model or lifecycle API defects after the main smoke path passed.