App Runtime Instance Lifecycle v1
Purpose
Define the contract-first lifecycle for project-scoped app instances so teams can deploy and operate apps without bypassing IAM, policy, audit, or observability controls.
Scope
Includes:
1. app instance lifecycle states and transitions
2. REST endpoint surface (v1 target)
3. domain event surface (apps.instance.*)
4. authorization and audit requirements
5. observability and incident triage requirements
6. effective runtime operating-mode metadata
Excludes:
1. provider-specific runtime implementation details (K8s/Slurm/Ray specifics)
2. MaaS internals
3. app-specific custom workflows
Companion operating-mode model:
- doc/architecture/App_Runtime_Operating_Modes_v1.md
Lifecycle State Model
Canonical states:
1. requested
2. deploying
3. running
4. upgrading
5. rolling_back
6. decommissioning
7. decommissioned
8. failed
Terminal states:
1. decommissioned
2. failed (recoverable only via explicit retry/redeploy action)
Transition Rules
- requested -> deploying -> running
- running -> upgrading -> running
- running -> rolling_back -> running
- running|failed -> decommissioning -> decommissioned
- any in-flight state may transition to failed on irrecoverable error
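The transition rules can be encoded as an explicit allow-list so the server rejects illegal moves before touching storage. This is a minimal sketch, not a prescribed implementation. Two points are assumptions: "in-flight" is read as every non-terminal state other than `running` (including `requested`), and the retry/redeploy path out of `failed` is left out because its target state is not defined in v1. `can_transition`, `transition`, and `InvalidTransition` are hypothetical names.

```python
from enum import Enum


class InstanceState(str, Enum):
    REQUESTED = "requested"
    DEPLOYING = "deploying"
    RUNNING = "running"
    UPGRADING = "upgrading"
    ROLLING_BACK = "rolling_back"
    DECOMMISSIONING = "decommissioning"
    DECOMMISSIONED = "decommissioned"
    FAILED = "failed"


S = InstanceState

# Edges taken from the v1 transition rules (excluding the generic "-> failed" rule).
_EDGES = {
    S.REQUESTED: {S.DEPLOYING},
    S.DEPLOYING: {S.RUNNING},
    S.RUNNING: {S.UPGRADING, S.ROLLING_BACK, S.DECOMMISSIONING},
    S.UPGRADING: {S.RUNNING},
    S.ROLLING_BACK: {S.RUNNING},
    S.DECOMMISSIONING: {S.DECOMMISSIONED},
    S.FAILED: {S.DECOMMISSIONING},  # retry/redeploy path intentionally not modeled
    S.DECOMMISSIONED: set(),
}

# Assumption: "in-flight" = every non-terminal state except the steady running state.
IN_FLIGHT = {S.REQUESTED, S.DEPLOYING, S.UPGRADING, S.ROLLING_BACK, S.DECOMMISSIONING}


class InvalidTransition(Exception):
    """Raised when a requested state change violates the lifecycle rules."""


def can_transition(current: InstanceState, target: InstanceState) -> bool:
    # Any in-flight state may fail on an irrecoverable error.
    if target is S.FAILED and current in IN_FLIGHT:
        return True
    return target in _EDGES[current]


def transition(current: InstanceState, target: InstanceState) -> InstanceState:
    if not can_transition(current, target):
        raise InvalidTransition(f"{current.value} -> {target.value} is not allowed")
    return target
```

Keeping the table as data rather than scattered `if` checks makes it easy to diff against this document when the state model changes.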
Operating-Mode Metadata
App lifecycle does not imply a single deployment topology.
Effective instance metadata should include:
1. operating_mode (tenant_dedicated | platform_managed)
2. control_plane_scope (project | tenant | platform)
3. runtime_backend (k8s | slurm | ray | bare_metal)
4. tenant_boundary_mode (tenant_isolated | shared_service)
Rules:
1. app instance ownership remains project-scoped
2. runtime control plane may be project-, tenant-, or platform-scoped
3. server computes effective mode/scope from app policy and backend rules
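The metadata fields and their allowed values can be pinned down as closed enums, with project ownership as a required field that never changes with the control plane scope. The sketch below only validates an already-computed result; the actual policy and backend rules that produce it live in the companion operating-mode model and are not reproduced here. `EffectiveInstanceMetadata` and `from_dict` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class OperatingMode(str, Enum):
    TENANT_DEDICATED = "tenant_dedicated"
    PLATFORM_MANAGED = "platform_managed"


class ControlPlaneScope(str, Enum):
    PROJECT = "project"
    TENANT = "tenant"
    PLATFORM = "platform"


class RuntimeBackend(str, Enum):
    K8S = "k8s"
    SLURM = "slurm"
    RAY = "ray"
    BARE_METAL = "bare_metal"


class TenantBoundaryMode(str, Enum):
    TENANT_ISOLATED = "tenant_isolated"
    SHARED_SERVICE = "shared_service"


@dataclass(frozen=True)
class EffectiveInstanceMetadata:
    project_id: str  # ownership stays project-scoped whatever the control plane scope
    operating_mode: OperatingMode
    control_plane_scope: ControlPlaneScope
    runtime_backend: RuntimeBackend
    tenant_boundary_mode: TenantBoundaryMode

    def __post_init__(self) -> None:
        if not self.project_id:
            raise ValueError("app instances must be owned by a project")

    @classmethod
    def from_dict(cls, project_id: str, raw: dict) -> "EffectiveInstanceMetadata":
        # Each Enum constructor raises ValueError for values outside the documented set.
        return cls(
            project_id=project_id,
            operating_mode=OperatingMode(raw["operating_mode"]),
            control_plane_scope=ControlPlaneScope(raw["control_plane_scope"]),
            runtime_backend=RuntimeBackend(raw["runtime_backend"]),
            tenant_boundary_mode=TenantBoundaryMode(raw["tenant_boundary_mode"]),
        )
```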
API Surface (v1 target)
Project-scoped operations:
1. GET /api/v1/projects/{project_id}/apps/instances
2. POST /api/v1/projects/{project_id}/apps/instances
3. GET /api/v1/projects/{project_id}/apps/instances/{instance_id}
4. POST /api/v1/projects/{project_id}/apps/instances/{instance_id}/upgrade
5. POST /api/v1/projects/{project_id}/apps/instances/{instance_id}/rollback
6. POST /api/v1/projects/{project_id}/apps/instances/{instance_id}/decommission
Admin/operator read-only surface:
1. GET /api/v1/admin/apps/instances
2. GET /api/v1/admin/apps/instances/{instance_id}
Contract requirements:
1. canonical error envelope for all failures (code, message, correlation_id, details)
2. idempotency-key semantics on mutation endpoints
3. explicit X-Project-ID context enforcement where applicable
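The three contract requirements can be sketched together for a mutation handler: every failure returns the canonical error envelope, a replayed idempotency key returns the stored response without a second side effect, and the `X-Project-ID` header must match the path project. This is a sketch under assumptions. The error codes, the HTTP status codes, scoping keys per project, and rejecting a reused key with a different body (409) are all hypothetical choices, not part of the v1 contract as written.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Callable, Optional, Tuple

Response = Tuple[int, dict]  # (HTTP status, JSON body)


def error_envelope(code: str, message: str, correlation_id: str, details: Optional[dict] = None) -> dict:
    """Canonical error body with the four required fields."""
    return {"code": code, "message": message, "correlation_id": correlation_id, "details": details or {}}


def check_project_context(path_project_id: str, header_project_id: Optional[str], correlation_id: str) -> Optional[Response]:
    """Return an error response if X-Project-ID is missing or disagrees with the path."""
    if header_project_id != path_project_id:
        return 403, error_envelope(
            "project_context_mismatch",  # hypothetical error code
            "X-Project-ID does not match the project in the request path",
            correlation_id,
            {"path_project_id": path_project_id, "header_project_id": header_project_id},
        )
    return None


@dataclass
class IdempotencyStore:
    # (project_id, idempotency key) -> (request fingerprint, stored response)
    entries: dict = field(default_factory=dict)

    def execute(self, project_id: str, idempotency_key: Optional[str], body: dict,
                handler: Callable[[dict], Response], correlation_id: str) -> Response:
        if not idempotency_key:
            return 400, error_envelope("idempotency_key_required", "Idempotency-Key header is required", correlation_id)
        fingerprint = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        scoped_key = (project_id, idempotency_key)
        if scoped_key in self.entries:
            stored_fingerprint, stored_response = self.entries[scoped_key]
            if stored_fingerprint != fingerprint:
                # Same key, different payload: refuse rather than silently replay.
                return 409, error_envelope(
                    "idempotency_key_conflict", "Idempotency-Key reused with a different request body",
                    correlation_id, {"idempotency_key": idempotency_key},
                )
            return stored_response  # replay: no second side effect
        response = handler(body)
        self.entries[scoped_key] = (fingerprint, response)
        return response
```

A production store would need persistence and expiry; the in-memory dict only shows the semantics.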
Event Surface (v1 target)
All events use the canonical envelope defined in doc/api/asyncapi.draft.yaml.
Event types:
1. apps.instance.requested
2. apps.instance.deploying
3. apps.instance.running
4. apps.instance.upgrade_requested
5. apps.instance.upgraded
6. apps.instance.rollback_requested
7. apps.instance.rolled_back
8. apps.instance.decommission_requested
9. apps.instance.decommissioned
10. apps.instance.failed
Outbox requirement:
1. lifecycle state change and outbox row must commit in the same DB transaction.
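The outbox requirement means the state update and the event row either both commit or neither does. The sketch below uses SQLite from the Python standard library purely for illustration; the table names, columns, and payload shape are hypothetical and are not the canonical AsyncAPI envelope. A separate relay (not shown) would read `outbox` and publish to the broker. The `UPDATE ... WHERE state = ?` guard is an added optimistic-concurrency check, also an assumption.

```python
import json
import sqlite3
import uuid

# Hypothetical schema for illustration only.
SCHEMA = """
CREATE TABLE app_instances (id TEXT PRIMARY KEY, project_id TEXT NOT NULL, state TEXT NOT NULL);
CREATE TABLE outbox (id TEXT PRIMARY KEY, event_type TEXT NOT NULL, payload TEXT NOT NULL);
"""


class StaleStateError(Exception):
    """The instance was not in the expected state (or does not exist)."""


def change_state(conn: sqlite3.Connection, instance_id: str, expected_state: str,
                 new_state: str, event_type: str, correlation_id: str) -> None:
    # The connection context manager commits on success and rolls back if anything raises,
    # so the UPDATE and the outbox INSERT land atomically.
    with conn:
        cur = conn.execute(
            "UPDATE app_instances SET state = ? WHERE id = ? AND state = ?",
            (new_state, instance_id, expected_state),
        )
        if cur.rowcount != 1:
            raise StaleStateError(f"{instance_id} is not in state {expected_state}")
        payload = {"instance_id": instance_id, "state": new_state, "correlation_id": correlation_id}
        conn.execute(
            "INSERT INTO outbox (id, event_type, payload) VALUES (?, ?, ?)",
            (str(uuid.uuid4()), event_type, json.dumps(payload)),
        )
```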
Authorization Baseline
- all instance mutations are project-scoped
- actor must satisfy project role permissions from role-policy model
- platform-admin paths are explicit and auditable; no hidden bypass
- action authorization evaluates against the canonical decision interface (actor, tenant, project, action, resource)
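A decision call over the five-part interface might look like the sketch below. The role names and the role-to-action map are hypothetical stand-ins for the role-policy model. Roles are looked up per (tenant, project), so an actor's role in one project grants nothing elsewhere, and there is deliberately no implicit platform-admin branch: any admin path would be a separate, explicit, audited entry point.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class DecisionRequest:
    actor: str
    tenant: str
    project: str
    action: str
    resource: str


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str  # recorded alongside the audit entry for explainability


# Hypothetical role -> permitted actions; the real map comes from the role-policy model.
ROLE_ACTIONS = {
    "project_viewer": {"read"},
    "project_operator": {"read", "deploy", "upgrade", "rollback"},
    "project_admin": {"read", "deploy", "upgrade", "rollback", "decommission"},
}


def decide(request: DecisionRequest, actor_roles: Dict[Tuple[str, str], str]) -> Decision:
    """actor_roles maps (tenant, project) -> role for this actor (hypothetical shape)."""
    role = actor_roles.get((request.tenant, request.project))
    if role is None:
        return Decision(False, f"{request.actor} has no role in {request.tenant}/{request.project}")
    if request.action not in ROLE_ACTIONS.get(role, set()):
        return Decision(False, f"role {role} does not permit {request.action}")
    return Decision(True, f"role {role} permits {request.action} on {request.resource}")
```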
Audit Baseline
Every privileged mutation writes audit_logs with:
1. actor identity/role
2. target app_instance_id
3. action (deploy, upgrade, rollback, decommission)
4. result
5. correlation_id
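One way to guarantee that every privileged mutation leaves an audit record, including failed attempts, is to wrap the mutation so the record is written on both paths. In this sketch a Python list stands in for the `audit_logs` table, and the `"success"`/`"failure"` result values and the `run_audited` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, TypeVar

T = TypeVar("T")

PRIVILEGED_ACTIONS = {"deploy", "upgrade", "rollback", "decommission"}


@dataclass(frozen=True)
class AuditRecord:
    actor_id: str
    actor_role: str
    app_instance_id: str
    action: str
    result: str  # hypothetical values: "success" | "failure"
    correlation_id: str


def run_audited(sink: List[AuditRecord], *, actor_id: str, actor_role: str, app_instance_id: str,
                action: str, correlation_id: str, mutation: Callable[[], T]) -> T:
    if action not in PRIVILEGED_ACTIONS:
        raise ValueError(f"unknown privileged action: {action}")

    def record(result: str) -> None:
        sink.append(AuditRecord(actor_id, actor_role, app_instance_id, action, result, correlation_id))

    try:
        outcome = mutation()
    except Exception:
        record("failure")  # failed attempts are audited too, then the error propagates
        raise
    record("success")
    return outcome
```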
Observability Baseline
Required logging fields:
1. correlation_id
2. trace_id
3. org_id
4. project_id
5. resource_name (when resolved)
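The required fields can be attached once per request with the standard library's `logging.LoggerAdapter`, so individual log calls cannot forget them. This is a sketch assuming JSON-lines output; the formatter, the helper names, and emitting `null` for a missing required field (rather than dropping the line) are hypothetical choices.

```python
import io
import json
import logging
from typing import Optional

REQUIRED_FIELDS = ("correlation_id", "trace_id", "org_id", "project_id")


class JsonContextFormatter(logging.Formatter):
    """Emit one JSON object per record carrying the baseline context fields."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {"level": record.levelname, "message": record.getMessage()}
        for name in REQUIRED_FIELDS:
            entry[name] = getattr(record, name, None)  # null makes a missing field visible
        resource_name = getattr(record, "resource_name", None)
        if resource_name is not None:  # only present once the resource is resolved
            entry["resource_name"] = resource_name
        return json.dumps(entry)


def build_logger(name: str, stream: io.TextIOBase) -> logging.Logger:
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonContextFormatter())
    logger.addHandler(handler)
    return logger


def with_request_context(logger: logging.Logger, *, correlation_id: str, trace_id: str, org_id: str,
                         project_id: str, resource_name: Optional[str] = None) -> logging.LoggerAdapter:
    context = {"correlation_id": correlation_id, "trace_id": trace_id, "org_id": org_id, "project_id": project_id}
    if resource_name is not None:
        context["resource_name"] = resource_name
    # LoggerAdapter passes this dict as `extra`, so each key becomes a LogRecord attribute.
    return logging.LoggerAdapter(logger, context)
```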
Triage baseline:
1. start from UI/API correlation_id
2. pivot to logs by correlation_id
3. pivot to trace by trace_id
4. reconcile with apps.instance.* event timeline
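The pivot-and-reconcile steps amount to filtering logs and lifecycle events by one `correlation_id`, collecting the trace IDs to open, and merging both sources into a single time-ordered view. This sketch assumes hypothetical record shapes (`ts`, `trace_id`, `message` on logs; `occurred_at`, `type` on events); a real tool would query the log store and event archive instead of in-memory lists.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass(frozen=True)
class TimelineEntry:
    at: datetime
    source: str  # "log" or "event"
    summary: str


def trace_ids(correlation_id: str, logs: Iterable[dict]) -> List[str]:
    """Step 3: the trace IDs to open for this correlation_id."""
    return sorted({l["trace_id"] for l in logs if l.get("correlation_id") == correlation_id and l.get("trace_id")})


def build_timeline(correlation_id: str, logs: Iterable[dict], events: Iterable[dict]) -> List[TimelineEntry]:
    """Step 4: merge matching log lines and apps.instance.* events, oldest first."""
    entries = [
        TimelineEntry(l["ts"], "log", l["message"])
        for l in logs
        if l.get("correlation_id") == correlation_id
    ]
    entries += [
        TimelineEntry(e["occurred_at"], "event", e["type"])
        for e in events
        if e.get("correlation_id") == correlation_id and e["type"].startswith("apps.instance.")
    ]
    return sorted(entries, key=lambda entry: entry.at)
```

Gaps in the merged view, such as an `apps.instance.upgrade_requested` with no later `upgraded` or `failed`, point at where to look next.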
v1 Deliverables
- OpenAPI contract updates for endpoints/schemas
- AsyncAPI contract updates for lifecycle events
- queue task split for A/B/C implementation and ops runbook coverage
- local smoke path covering create -> running -> decommission lifecycle