Runbook: CLI Incident and Support Triage
Trigger
A gpuaas CLI command fails on the auth, project context, allocation, terminal, or billing path.
- User reports non-zero exit and provides correlation_id.
- CLI command invoked (with sensitive values redacted).
- CLI stderr output with:
  - code
  - message
  - correlation_id
- Active project context used by CLI (--project-id or active default).
- Timestamp and user identity.
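
Redaction of the invoked command can be scripted before it is pasted into a ticket. The following is a minimal sketch, not part of the gpuaas tooling: the flag names in SENSITIVE_FLAGS and the helper redact_command are hypothetical, so substitute the flags the CLI actually accepts.

```python
import shlex

# Hypothetical flag names (assumption); replace with the sensitive flags
# that the CLI in use actually defines.
SENSITIVE_FLAGS = {"--token", "--api-key", "--password"}


def redact_command(command: str) -> str:
    """Return the command line with the values of sensitive flags replaced."""
    redacted = []
    skip_next = False
    for token in shlex.split(command):
        if skip_next:
            # This token is the value of a sensitive flag given as "--flag value".
            redacted.append("REDACTED")
            skip_next = False
            continue
        flag, sep, _value = token.partition("=")
        if flag in SENSITIVE_FLAGS:
            if sep:
                # Form "--flag=value": keep the flag, drop the value.
                redacted.append(f"{flag}=REDACTED")
            else:
                redacted.append(flag)
                skip_next = True
            continue
        redacted.append(token)
    return shlex.join(redacted)
```

Non-sensitive arguments such as --project-id pass through unchanged, so the active project context remains visible in the report.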
Correlation-First Triage
- API log lookup:
{service="gpuaas-api"} | json | correlation_id="<CORRELATION_ID>"
- If terminal command path involved:
{service=~"gpuaas-(terminal-gateway|api)"} | json | correlation_id="<CORRELATION_ID>"
- If allocation provisioning path involved:
{service=~"gpuaas-(provisioning-worker|api)"} | json | correlation_id="<CORRELATION_ID>"
- If billing path involved:
{service=~"gpuaas-(billing-worker|webhook-worker|api)"} | json | correlation_id="<CORRELATION_ID>"
- Pivot to Tempo using trace_id from matching log line when present.
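
The path-specific queries above differ only in the service selector, so they can be generated from one table. This is a sketch under that assumption: the selectors are copied from the queries listed above, while the path keys ("api", "terminal", "provisioning", "billing") and the function build_logql are illustrative names, not existing tooling.

```python
# Service selectors copied from the queries above, keyed by an illustrative path name.
SERVICE_SELECTORS = {
    "api": '{service="gpuaas-api"}',
    "terminal": '{service=~"gpuaas-(terminal-gateway|api)"}',
    "provisioning": '{service=~"gpuaas-(provisioning-worker|api)"}',
    "billing": '{service=~"gpuaas-(billing-worker|webhook-worker|api)"}',
}


def build_logql(path: str, correlation_id: str) -> str:
    """Build the LogQL query for the given path and correlation_id."""
    # Reject characters that would break out of the quoted LogQL string.
    if '"' in correlation_id or "\\" in correlation_id:
        raise ValueError("correlation_id contains characters that need escaping")
    # Unknown paths fall back to the plain API log lookup.
    selector = SERVICE_SELECTORS.get(path, SERVICE_SELECTORS["api"])
    return f'{selector} | json | correlation_id="{correlation_id}"'
```

Falling back to the API selector for unrecognized paths mirrors the order of the list above, where the API lookup is the first step regardless of path.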
Common Failure Classes and Routing
token_missing|token_invalid|token_expired
- route: auth/session owner.
invalid_request with project-context message
- route: tenant/project authz runbook.
insufficient_permissions|admin_required
- route: IAM membership/role assignment runbook.
allocation_*|sku_unavailable|node_*
- route: provisioning/inventory owner.
service_unavailable|upstream_error|internal_error
- route: API degradation runbook.
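
The routing table above can be expressed as a small lookup for use in triage scripts. This is a sketch, not an existing tool: the function route_for is a hypothetical name, and detecting a "project-context message" by searching for the word "project" is an assumption, since the runbook does not specify the message text.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Patterns and routes copied from the failure classes above.
ROUTES = [
    (("token_missing", "token_invalid", "token_expired"),
     "auth/session owner"),
    (("insufficient_permissions", "admin_required"),
     "IAM membership/role assignment runbook"),
    (("allocation_*", "sku_unavailable", "node_*"),
     "provisioning/inventory owner"),
    (("service_unavailable", "upstream_error", "internal_error"),
     "API degradation runbook"),
]


def route_for(code: str, message: str = "") -> Optional[str]:
    """Return the route for an error code, or None if no class matches."""
    # Assumption: a project-context message mentions the word "project".
    if code == "invalid_request" and "project" in message.lower():
        return "tenant/project authz runbook"
    for patterns, route in ROUTES:
        if any(fnmatchcase(code, pattern) for pattern in patterns):
            return route
    return None
```

A None result means the code falls outside the classes listed here and needs manual routing.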
Operator Response Baseline
- Do not request DB access from user.
- Use correlation_id as primary key for all investigation notes.
- Capture canonical resource_name when present and include it in incident handoff.
- Provide user-facing remediation step tied to the specific error code class.
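
One way to keep investigation notes keyed by correlation_id is sketched below. The InvestigationNote type, the in-memory notes store, and the record helper are all hypothetical; a real setup would likely persist notes in the incident tracker instead.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class InvestigationNote:
    correlation_id: str                   # primary key for the investigation
    error_code: str                       # code reported on CLI stderr
    resource_name: Optional[str] = None   # canonical resource_name, when present
    entries: List[str] = field(default_factory=list)


# In-memory store keyed by correlation_id (illustrative only).
notes: Dict[str, InvestigationNote] = {}


def record(correlation_id: str, error_code: str, entry: str,
           resource_name: Optional[str] = None) -> InvestigationNote:
    """Append an entry to the note for this correlation_id, creating it if needed."""
    note = notes.setdefault(
        correlation_id, InvestigationNote(correlation_id, error_code)
    )
    # Keep the first resource_name seen so the handoff carries a single value.
    if resource_name and note.resource_name is None:
        note.resource_name = resource_name
    note.entries.append(entry)
    return note
```

Keying on correlation_id means repeated findings for the same failure accumulate in one note that can be handed off as a unit.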
Escalation Runbooks
doc/operations/runbooks/API_Degradation_Runbook.md
doc/operations/runbooks/Tenant_Project_Authorization_Runbook.md
doc/operations/runbooks/IAM_Role_Assignment_and_Membership_Incident_Runbook.md
doc/operations/runbooks/Provisioning_Workflow_Stuck_Runbook.md
doc/operations/runbooks/Terminal_Gateway_Incident_Runbook.md
doc/operations/runbooks/Billing_Worker_Failure_Runbook.md