Runbook: App Artifact Lifecycle Incident¶
Trigger¶
- A project admin, operator service account, CLI flow, or SDK flow cannot issue an app-artifact publish intent.
- Artifact registration, promotion, deprecation, or retirement returns an error envelope with `correlation_id`.
- An artifact shows `trust_state=failed_verification` or `revoked` and the app team needs operator triage.
- Operators need a deterministic path for artifact lifecycle incidents after the project app-artifact API baseline landed.
Scope¶
- Endpoints:
  - `GET /api/v1/projects/{project_id}/app-artifacts`
  - `POST /api/v1/projects/{project_id}/app-artifacts/publish-intents`
  - `POST /api/v1/projects/{project_id}/app-artifacts`
  - `POST /api/v1/projects/{project_id}/app-artifacts/{artifact_id}/promote`
  - `POST /api/v1/projects/{project_id}/app-artifacts/{artifact_id}/deprecate`
  - `POST /api/v1/projects/{project_id}/app-artifacts/{artifact_id}/retire`
- Lifecycle states:
  - `published`
  - `promoted`
  - `deprecated`
  - `retired`
- Trust states:
  - `unverified`
  - `verified`
  - `failed_verification`
  - `revoked`
- This runbook covers control-plane lifecycle and trust metadata only.
- It does not replace runtime deployment triage once an artifact has already been selected by an app instance.
Required Context¶
- Error envelope fields:
  - `code`
  - `message`
  - `correlation_id`
  - `details`
- Identity and scope:
  - `org_id`
  - `project_id`
  - `artifact_id` if one exists
  - `app_slug`
  - `app_version`
  - `actor_user_id` or `operator_service_account_id`
- Artifact metadata:
  - `repository`
  - `digest`
  - `digest_algorithm`
  - `artifact_kind`
  - `source_type`
  - `lifecycle_state`
  - `trust_state`
- Request context:
  - `X-Project-ID`
  - `X-Idempotency-Key` for mutation paths
  - intended promotion `channel` when promotion failed
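
The envelope fields above can be pulled out of a failing response before any further triage. The following is a minimal sketch, assuming the envelope arrives as a flat JSON object; the function name `parse_error_envelope` and the flat shape are illustrative assumptions, not part of the API contract.

```python
import json

# Field names taken from the "Error envelope fields" list in this runbook.
ENVELOPE_FIELDS = ("code", "message", "correlation_id", "details")


def parse_error_envelope(body):
    """Return (envelope, missing) for a JSON error body.

    Assumption: the envelope is a flat JSON object. Adjust the lookup
    if the real contract nests these fields under another key.
    """
    data = json.loads(body)
    if not isinstance(data, dict):
        raise ValueError("error envelope must be a JSON object")
    # Record which expected fields were absent so the gap is visible in triage notes.
    missing = [field for field in ENVELOPE_FIELDS if field not in data]
    envelope = {field: data.get(field) for field in ENVELOPE_FIELDS}
    return envelope, missing
```

A missing `correlation_id` is worth noting early, since the query flow later in this runbook depends on it.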
Immediate Triage¶
- Confirm the failing path and method:
  - publish intent
  - registration
  - list/read
  - promote
  - deprecate
  - retire
- Confirm `project_id` in the route matches `X-Project-ID`.
- Confirm the actor is allowed to mutate project-owned artifacts.
- Capture whether the issue is:
  - new artifact cannot enter the lifecycle
  - lifecycle transition rejected
  - trust verification failed
  - artifact inventory read path degraded
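
The first two triage checks can be done mechanically from the request line and headers. This is a minimal sketch, assuming the route shapes listed under Scope; `classify_request` is a hypothetical helper, not an existing tool.

```python
import re

# Route shapes copied from the Scope section; the helper itself is hypothetical.
_BASE = r"/api/v1/projects/(?P<project_id>[^/]+)/app-artifacts"
_ROUTES = [
    ("GET", re.compile(_BASE), "list/read"),
    ("POST", re.compile(_BASE + r"/publish-intents"), "publish intent"),
    ("POST", re.compile(_BASE), "registration"),
    ("POST", re.compile(_BASE + r"/[^/]+/promote"), "promote"),
    ("POST", re.compile(_BASE + r"/[^/]+/deprecate"), "deprecate"),
    ("POST", re.compile(_BASE + r"/[^/]+/retire"), "retire"),
]


def classify_request(method, path, x_project_id):
    """Return (operation, project_matches).

    operation is None when the request is not an app-artifact route;
    project_matches compares the route project_id with X-Project-ID.
    """
    for route_method, pattern, operation in _ROUTES:
        match = pattern.fullmatch(path)
        if match and route_method == method.upper():
            return operation, match.group("project_id") == x_project_id
    return None, False
```

For example, `classify_request("POST", "/api/v1/projects/p1/app-artifacts/a1/promote", "p2")` reports a promote call whose header and route disagree on the project.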
Correlation-First Query Flow¶
- API logs by `correlation_id`:
  - `{service="gpuaas-api"} | json | correlation_id="<CORRELATION_ID>"`
- Narrow to artifact endpoints:
  - `path=~".*/app-artifacts.*"`
- Audit evidence for privileged mutations:
  - search `audit_logs` for actions:
    - `app_artifact.register`
    - `app_artifact.promote`
    - `app_artifact.deprecate`
    - `app_artifact.retire`
    - `app_artifact.verify`
    - `app_artifact.revoke`
- If `trace_id` exists, confirm the control-plane path in Tempo before assuming registry or policy defects.
- If the failure was raised from CLI or SDK, pivot to the corresponding client runbook only after the API evidence is captured.
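
The two log filters above can be combined into one query string. The sketch below assumes the path filter is appended as a label filter after the `json` stage; `build_logql` is a hypothetical helper and only produces the query text.

```python
def build_logql(correlation_id, artifact_only=True):
    """Build the log query from this runbook for one correlation_id.

    Assumption: the path filter is chained as a label filter after `| json`.
    """
    if not correlation_id or '"' in correlation_id:
        # Refuse values that would break out of the quoted string.
        raise ValueError("correlation_id must be non-empty and contain no double quotes")
    query = '{service="gpuaas-api"} | json | correlation_id="%s"' % correlation_id
    if artifact_only:
        query += ' | path=~".*/app-artifacts.*"'
    return query
```

Run the broader query (`artifact_only=False`) first if the narrowed one returns nothing, since the failing request may have been rejected before routing.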
Expected Error Classes¶
- `invalid_request`
  - malformed repository, digest, media type, source metadata, or missing project context
- `insufficient_permissions`
  - actor or service account cannot mutate this project or requested promotion target
- `app_artifact_not_found`
  - wrong `artifact_id`, wrong project scope, or stale client state
- `app_artifact_already_exists`
  - digest already registered for the same project
- `app_artifact_state_invalid`
  - attempted promotion/deprecation/retirement is not valid for the current lifecycle or trust state
- `service_unavailable` or `upstream_error`
  - dependency failure in storage/registry/policy verification path
- `internal_error`
  - control-plane defect
Failure Class Triage¶
Publish Intent Failure¶
- Confirm the request used an idempotency key and same project context on retry.
- Confirm the returned repository path matches the platform-owned naming model.
- If the path fails before upload begins, treat this as a control-plane issue, not a registry blob-transfer issue.
Registration Failure¶
- Confirm the digest is immutable and formatted canonically.
- Confirm the same digest is not already registered in the target project.
- Confirm `artifact_kind` and `source_type` are explicit and allowed by policy.
- If registration succeeds but trust remains `unverified`, that is not automatically an outage unless policy requires `verified` for the next action.
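
The digest format check can be scripted. This runbook does not define the canonical form, so the sketch below assumes an OCI-style `sha256:<64 lowercase hex characters>` digest; both the pattern and the name `is_canonical_digest` are assumptions to adjust against the actual contract.

```python
import re

# Assumed canonical form: algorithm prefix plus 64 lowercase hex characters.
_SHA256_DIGEST = re.compile(r"sha256:[0-9a-f]{64}")


def is_canonical_digest(digest):
    """Return True when the digest matches the assumed canonical form exactly."""
    # fullmatch rejects trailing newlines and surrounding whitespace.
    return _SHA256_DIGEST.fullmatch(digest) is not None
```

Uppercase hex, a missing algorithm prefix, or a truncated value all fail this check and would be candidates for `invalid_request`.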
Promotion Failure¶
- Confirm the artifact is not `deprecated` or `retired`.
- Confirm the target `channel` is valid and the actor is allowed to promote into it.
- Confirm project policy does not require a stronger trust state than the artifact currently has.
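
The promotion checks above can be expressed as a pre-check that lists every blocker at once. This is a sketch only: `promotion_blockers`, its parameters, and the reduction of project policy to a single `require_verified` flag are illustrative assumptions.

```python
def promotion_blockers(lifecycle_state, trust_state, channel,
                       allowed_channels, require_verified):
    """Return the names of the checks that would block a promotion.

    allowed_channels and require_verified stand in for project policy;
    how policy is actually stored is not covered by this runbook.
    """
    blockers = []
    if lifecycle_state in ("deprecated", "retired"):
        blockers.append("lifecycle_state")
    if channel not in allowed_channels:
        blockers.append("channel")
    if trust_state in ("failed_verification", "revoked"):
        # Trust failures always block; they are never bypassed by operators.
        blockers.append("trust_state")
    elif require_verified and trust_state != "verified":
        blockers.append("trust_state")
    return blockers
```

An empty list means none of the documented preconditions explains the rejection, which points toward authorization or a control-plane defect.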
Trust Failure¶
- Treat `trust_state=failed_verification` as the primary pivot, not as a generic lifecycle failure.
- Confirm whether the failure came from digest mismatch, source allowlist rejection, or signature/provenance policy.
- Do not promote, deprecate around, or otherwise bypass a trust failure as a normal operator step.
Retirement or Deprecation Failure¶
- Confirm the artifact belongs to the project in the route.
- Confirm the current state allows the requested transition.
- If the artifact must be blocked immediately for safety, escalate toward revoke/trust-policy ownership instead of forcing lifecycle drift.
Boundary Validation¶
- Project ownership:
  - artifact belongs to the same `project_id`
- Contract alignment:
  - request shape matches `doc/api/openapi.draft.yaml`
- Policy alignment:
  - source type and promotion target comply with policy
- Audit path:
  - privileged mutation wrote an `audit_logs` row with the same `correlation_id`
- Root-cause ownership:
  - distinguish control-plane lifecycle defect from downstream registry/storage defect before mitigation
Recovery Guidance¶
- `invalid_request`
  - correct the request shape or project context and retry with a new idempotency key only if the prior request was malformed
- `insufficient_permissions`
  - fix project/admin or service-account scope, then retry the exact intended action
- `app_artifact_already_exists`
  - reuse the existing artifact record; do not register duplicate digests to work around the error
- `app_artifact_state_invalid`
  - move the artifact through a valid lifecycle path or stop if the request violates trust/lifecycle invariants
- `failed_verification` or `revoked`
  - quarantine the artifact from further promotion and escalate to artifact trust/policy ownership
- Dependency outage
  - restore the owning storage/registry/policy dependency before retrying artifact lifecycle mutations
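
The guidance above can be kept as a lookup so that triage notes quote it consistently. This is a minimal sketch; `recovery_for` is a hypothetical helper, and mapping `service_unavailable` and `upstream_error` to the dependency-outage entry follows the Expected Error Classes section.

```python
# Guidance text condensed from the Recovery Guidance list in this runbook.
_DEPENDENCY = "restore the owning storage/registry/policy dependency before retrying"
_RECOVERY = {
    "invalid_request": "correct the request shape or project context, then retry",
    "insufficient_permissions": "fix project/admin or service-account scope, then retry",
    "app_artifact_already_exists": "reuse the existing artifact record",
    "app_artifact_state_invalid": "move through a valid lifecycle path or stop",
    "service_unavailable": _DEPENDENCY,
    "upstream_error": _DEPENDENCY,
}
_TRUST = "quarantine the artifact from promotion and escalate to trust/policy ownership"
_FALLBACK = "no documented recovery; follow the Escalation Rule"


def recovery_for(code, trust_state=None):
    """Return the documented recovery step for an error code.

    A failed or revoked trust state takes priority over the error code.
    """
    if trust_state in ("failed_verification", "revoked"):
        return _TRUST
    return _RECOVERY.get(code, _FALLBACK)
```

Codes with no entry, such as `internal_error`, fall through to the escalation text rather than to an improvised workaround.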
Escalation Map¶
- Project-context or membership issue:
  - `doc/operations/runbooks/Tenant_Project_Authorization_Runbook.md`
- Client-only reproduction issue with API healthy:
  - `doc/operations/runbooks/CLI_Incident_and_Support_Triage_Runbook.md`
  - `doc/operations/runbooks/Python_SDK_Incident_and_Observability_Runbook.md`
- Broad API degradation:
  - `doc/operations/runbooks/API_Degradation_Runbook.md`
- App instance deploy/runtime impact after artifact selection:
  - `doc/operations/runbooks/App_Runtime_Lifecycle_Incident_Runbook.md`
Evidence to Capture¶
- Exact failing endpoint and method
- `correlation_id` and `trace_id`
- `project_id`, `artifact_id`, `app_slug`, `app_version`
- `repository`, `digest`, `artifact_kind`, `source_type`
- Previous and current `lifecycle_state` and `trust_state`
- Audit evidence for any privileged mutation
- Whether the owning layer is:
  - request/client misuse
  - authz/policy
  - control-plane lifecycle implementation
  - storage/registry dependency
  - trust verification path
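
Before handing an incident off, the evidence list above can be checked for gaps. The sketch below assumes evidence is collected into a plain dict keyed by the field names in this runbook; the key names `endpoint`, `method`, and `owning_layer`, and the helper `missing_evidence`, are illustrative assumptions. `trace_id` and `artifact_id` are treated as optional because they may not exist.

```python
# Keys treated as mandatory for a hand-off record in this sketch.
REQUIRED_EVIDENCE = (
    "endpoint", "method", "correlation_id",
    "project_id", "app_slug", "app_version",
    "repository", "digest", "artifact_kind", "source_type",
    "lifecycle_state", "trust_state", "owning_layer",
)

# Owning layers as listed in this runbook.
OWNING_LAYERS = (
    "request/client misuse",
    "authz/policy",
    "control-plane lifecycle implementation",
    "storage/registry dependency",
    "trust verification path",
)


def missing_evidence(record):
    """Return required keys that are absent or empty, plus an invalid owning layer."""
    missing = [key for key in REQUIRED_EVIDENCE if not record.get(key)]
    layer = record.get("owning_layer")
    if layer and layer not in OWNING_LAYERS:
        missing.append("owning_layer (unrecognized value)")
    return missing
```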
Escalation Rule¶
If the only apparent fix is to bypass digest-only registration, suppress trust failure handling, or mutate lifecycle state outside the contract, stop and file a control-plane defect. Do not normalize that workaround in operations.