IAM Role Assignment and Membership Incident Runbook
Trigger
- Spike in IAM mutation failures (bind/revoke membership, role assignment).
- Reports of unexpected 403 insufficient_permissions.
- Reports of confusion about role ceiling denials (for example, owner grants being denied).
Required Context
- correlation_id from API/UI error envelope.
- Actor identity (actor_user_id) and target identity (target_user_id) where applicable.
- Scope context:
- tenant_id
- project_id
- attempted tenant_role / project_role / platform role.
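The context above can be captured in a small record so that gaps are visible before diagnosis starts. This is a minimal sketch: the class name IncidentContext, the single attempted_role field, and the missing_fields helper are illustrative assumptions, not part of any existing tooling. Only the identifier names come from this runbook.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IncidentContext:
    """Context to collect before diagnosis (hypothetical helper)."""
    correlation_id: str
    actor_user_id: str
    target_user_id: Optional[str] = None  # only where applicable
    tenant_id: Optional[str] = None
    project_id: Optional[str] = None
    # Assumption: one field holds the attempted tenant_role,
    # project_role, or platform role.
    attempted_role: Optional[str] = None

    def missing_fields(self) -> List[str]:
        """Return the names of fields that are still empty."""
        return [name for name, value in vars(self).items() if not value]


ctx = IncidentContext(correlation_id="c-1", actor_user_id="u-1", tenant_id="t-1")
print(ctx.missing_fields())
```

Fields such as target_user_id or project_id may legitimately stay empty for some incidents, so treat the result as a checklist rather than a hard validation.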
Immediate Actions
- Classify scope:
- platform-role mutation path vs tenant/project membership path.
- Confirm whether denial is expected policy behavior or system degradation.
- If widespread unexpected denials:
- pause bulk role/membership operations until root cause is identified.
Diagnosis (Correlation-First)
- Query API logs by correlation_id.
- Confirm canonical error code/message:
- expected authz ceiling: insufficient_permissions
- malformed request: invalid_request
- backend dependency issue: service_unavailable / internal_error
- Verify role assignment rules:
- tenant admin must not grant tenant owner.
- tenant admin must not grant project owner.
- platform-role mutations require platform-admin authorization.
- Verify membership state in DB:
- active tenant membership exists in expected tenant.
- project belongs to expected tenant.
- no cross-tenant membership conflict in strict mode.
- Verify audit coverage:
- privileged mutation should produce an audit_logs row with matching correlation_id.
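As a first pass on the error-code step above, the canonical codes can be mapped to the category this runbook assigns them. The sketch below is illustrative only: the Triage enum and the triage_error_code function are hypothetical names, and only the four error codes are taken from this runbook.

```python
from enum import Enum


class Triage(Enum):
    """First-pass categories, named after the list in this runbook."""
    AUTHZ_CEILING = "expected authz ceiling"
    MALFORMED_REQUEST = "malformed request"
    BACKEND_DEPENDENCY = "backend dependency issue"
    UNKNOWN = "unrecognized code"


# Error codes as listed in this runbook.
CODE_TO_TRIAGE = {
    "insufficient_permissions": Triage.AUTHZ_CEILING,
    "invalid_request": Triage.MALFORMED_REQUEST,
    "service_unavailable": Triage.BACKEND_DEPENDENCY,
    "internal_error": Triage.BACKEND_DEPENDENCY,
}


def triage_error_code(code: str) -> Triage:
    """Map an error code from the envelope to a first-pass category."""
    return CODE_TO_TRIAGE.get(code.strip().lower(), Triage.UNKNOWN)


print(triage_error_code("service_unavailable").value)
```

An insufficient_permissions result only says which path the request took. Whether the denial was expected still depends on the role assignment rules and the membership state.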
Common Failure Classes
- Expected role ceiling denial:
- actor role lacks grant ceiling for requested target role.
- Scope mismatch:
- target project outside actor tenant boundary.
- Cross-tenant binding conflict:
- user already has active membership in another tenant (strict mode).
- Platform-role binding unavailable:
- role-binding store/dependency unavailable.
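To separate an expected role ceiling denial from the other classes, the three role assignment rules stated in this runbook can be written as a small check. This is a sketch under assumptions: the role strings (tenant_admin, tenant_owner, project_owner, platform_admin) and the scope labels are hypothetical placeholders for whatever identifiers the deployment uses, and the function covers only those three rules, not the full policy.

```python
from typing import Optional

# Hypothetical role identifiers; substitute the deployment's real names.
TENANT_ADMIN = "tenant_admin"
TENANT_OWNER = "tenant_owner"
PROJECT_OWNER = "project_owner"
PLATFORM_ADMIN = "platform_admin"

# Grants a tenant admin must not perform, as (scope, target_role) pairs.
TENANT_ADMIN_FORBIDDEN = {
    ("tenant", TENANT_OWNER),
    ("project", PROJECT_OWNER),
}


def expected_denial_reason(actor_role: str, scope: str,
                           target_role: str) -> Optional[str]:
    """Return the rule that explains a denial, or None if no rule applies.

    None does not mean the grant is allowed. It means the denial is not
    explained by the three rules in this runbook and needs more diagnosis.
    """
    if scope == "platform" and actor_role != PLATFORM_ADMIN:
        return "platform-role mutations require platform-admin authorization"
    if actor_role == TENANT_ADMIN and (scope, target_role) in TENANT_ADMIN_FORBIDDEN:
        return f"tenant admin must not grant {target_role}"
    return None


print(expected_denial_reason(TENANT_ADMIN, "tenant", TENANT_OWNER))
```

A None result for a request that was denied in production is a signal to look at scope mismatch, cross-tenant conflicts, or dependency health next.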
Mitigation
- Expected denial:
- communicate correct grant ceiling and retry with permitted role.
- Scope mismatch:
- correct tenant/project selection in UI/CLI and retry.
- Cross-tenant user move:
- use approved rehome flow where allowed and audited.
- Dependency degradation:
- restore backing store/service before retrying IAM mutations.
Recovery Criteria
- IAM mutation paths return expected deterministic outcomes.
- Audit logs are present for successful privileged mutations.
- No unresolved incidents with ambiguous scope/ceiling behavior.
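The audit criterion can be checked by comparing the correlation IDs of successful privileged mutations against those found in audit_logs. The sketch below assumes both sets of IDs have already been exported as plain strings; the function name missing_audit_rows is hypothetical, and how the IDs are queried depends on the deployment.

```python
from typing import Iterable, List, Set


def missing_audit_rows(mutation_ids: Iterable[str],
                       audit_ids: Iterable[str]) -> List[str]:
    """Return mutation correlation IDs with no matching audit row.

    Order follows first appearance in mutation_ids; duplicates are
    reported once.
    """
    audited: Set[str] = set(audit_ids)
    reported: Set[str] = set()
    missing: List[str] = []
    for cid in mutation_ids:
        if cid not in audited and cid not in reported:
            missing.append(cid)
            reported.add(cid)
    return missing


# Example with made-up IDs: "b" has no audit row.
print(missing_audit_rows(["a", "b", "c", "b"], ["a", "c"]))
```

An empty result supports the audit criterion; any returned ID is worth attaching to the incident evidence.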
Evidence to Capture
- correlation_id, trace_id, actor/target IDs, scope IDs.
- Error envelopes and log excerpts showing decision path.
- Audit log rows for successful privileged mutations.
- Runbook decision and final remediation action.