IAM Role Assignment and Membership Incident Runbook

Trigger

  1. Spike in IAM mutation failures (bind/revoke membership, role assignment).
  2. Reports of unexpected 403 insufficient_permissions.
  3. Reports of confusion about role ceiling denials (for example, owner grants denied).

Required Context

  1. correlation_id from API/UI error envelope.
  2. Actor identity (actor_user_id) and target identity (target_user_id) where applicable.
  3. Scope context:
     - tenant_id
     - project_id
     - attempted tenant_role / project_role / platform role.

Immediate Actions

  1. Classify scope: platform-role mutation path vs tenant/project membership path.
  2. Confirm whether denial is expected policy behavior or system degradation.
  3. If widespread unexpected denials: pause bulk role/membership operations until root cause is identified.

Diagnosis (Correlation-First)

  1. Query API logs by correlation_id.
  2. Confirm canonical error code/message:
     - expected authz ceiling: insufficient_permissions
     - malformed request: invalid_request
     - backend dependency issue: service_unavailable / internal_error
  3. Verify role assignment rules:
     - tenant admin must not grant tenant owner.
     - tenant admin must not grant project owner.
     - platform-role mutations require platform-admin authorization.
  4. Verify membership state in DB:
     - active tenant membership exists in expected tenant.
     - project belongs to expected tenant.
     - no cross-tenant membership conflict in strict mode.
  5. Verify audit coverage:
     - privileged mutation should produce an audit_logs row with matching correlation_id.
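
The error-code step of this diagnosis can be sketched as a small lookup. This is a minimal illustration, not the service's actual code: the `Triage` enum and `triage_error_code` function are hypothetical names, and only the four error codes come from this runbook. The result is a starting classification; an insufficient_permissions response still has to be checked against the role assignment rules before it is treated as expected.

```python
from enum import Enum


class Triage(Enum):
    """Hypothetical triage classes for IAM mutation errors."""
    EXPECTED_DENIAL = "expected authz ceiling"
    MALFORMED_REQUEST = "malformed request"
    DEPENDENCY_ISSUE = "backend dependency issue"
    UNKNOWN = "unclassified"


# Canonical error codes named in this runbook, mapped to a triage class.
_CODE_TO_TRIAGE = {
    "insufficient_permissions": Triage.EXPECTED_DENIAL,
    "invalid_request": Triage.MALFORMED_REQUEST,
    "service_unavailable": Triage.DEPENDENCY_ISSUE,
    "internal_error": Triage.DEPENDENCY_ISSUE,
}


def triage_error_code(code: str) -> Triage:
    """Return the triage class for an error code taken from the error envelope."""
    # Normalize whitespace and case so copied log values still match.
    return _CODE_TO_TRIAGE.get(code.strip().lower(), Triage.UNKNOWN)
```

Codes that are not in the table fall through to `Triage.UNKNOWN` so that they are escalated rather than silently treated as policy denials.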

Common Failure Classes

  1. Expected role ceiling denial: actor role lacks grant ceiling for requested target role.
  2. Scope mismatch: target project outside actor tenant boundary.
  3. Cross-tenant binding conflict: user already has active membership in another tenant (strict mode).
  4. Platform-role binding unavailable: role-binding store/dependency unavailable.
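
To decide whether a reported denial falls into the expected role ceiling class, it can help to replay the request against the documented rules. The sketch below is an assumption-heavy illustration: the role names, the `ROLE_RANK` ordering, the `GrantRequest` type, and the rule that an actor may grant roles up to their own rank are all hypothetical. The only rules taken from this runbook are that a tenant admin must not grant tenant owner or project owner, and that platform-role mutations require platform-admin authorization.

```python
from dataclasses import dataclass

# Hypothetical role ordering; higher rank means more privilege.
ROLE_RANK = {"member": 1, "admin": 2, "owner": 3}


@dataclass(frozen=True)
class GrantRequest:
    scope: str                     # "tenant", "project", or "platform"
    requested_role: str            # role the actor is trying to grant
    actor_tenant_role: str = ""    # actor's role in the tenant, if any
    actor_is_platform_admin: bool = False


def is_grant_allowed(req: GrantRequest) -> bool:
    """Return True if the grant is within the actor's ceiling."""
    if req.scope == "platform":
        # Platform-role mutations require platform-admin authorization.
        return req.actor_is_platform_admin
    if req.scope not in ("tenant", "project"):
        return False  # unknown scope: deny by default
    actor_rank = ROLE_RANK.get(req.actor_tenant_role, 0)
    requested_rank = ROLE_RANK.get(req.requested_role)
    if requested_rank is None:
        return False  # unknown role: deny by default
    # Assumed ceiling: an actor may grant roles up to their own rank, so a
    # tenant admin can never grant tenant owner or project owner.
    return requested_rank <= actor_rank
```

If this replay says a grant should be allowed but the API returned insufficient_permissions, the denial is a candidate for one of the other failure classes rather than an expected ceiling denial.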

Mitigation

  1. Expected denial: communicate correct grant ceiling and retry with permitted role.
  2. Scope mismatch: correct tenant/project selection in UI/CLI and retry.
  3. Cross-tenant user move: use approved rehome flow where allowed and audited.
  4. Dependency degradation: restore backing store/service before retrying IAM mutations.

Recovery Criteria

  1. IAM mutation paths return expected deterministic outcomes.
  2. Audit logs are present for privileged successful mutations.
  3. No unresolved incidents with ambiguous scope/ceiling behavior.
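
The audit criterion can be spot-checked by joining mutation records to audit rows on correlation_id. The sketch below works on plain dictionaries already exported from logs and the audit_logs table. The `privileged` and `succeeded` field names and the `missing_audit_rows` helper are hypothetical; only correlation_id and audit_logs are named in this runbook.

```python
from typing import Iterable, List, Mapping


def missing_audit_rows(
    mutations: Iterable[Mapping[str, object]],
    audit_rows: Iterable[Mapping[str, object]],
) -> List[object]:
    """Return correlation_ids of successful privileged mutations with no audit row."""
    # Collect every correlation_id that appears in the exported audit rows.
    audited = {row["correlation_id"] for row in audit_rows}
    return [
        m["correlation_id"]
        for m in mutations
        # Only successful privileged mutations are expected to be audited.
        if m["privileged"] and m["succeeded"] and m["correlation_id"] not in audited
    ]
```

An empty result over the incident window is consistent with the audit criterion above; any returned correlation_id is worth capturing as evidence.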

Evidence to Capture

  1. correlation_id, trace_id, actor/target IDs, scope IDs.
  2. Error envelopes and log excerpts showing decision path.
  3. Audit log rows for successful privileged mutations.
  4. Runbook decision and final remediation action.