Skip to content

Platform IAM Model v1

Purpose

Define the platform IAM model in terms of:

  1. what the platform already implements,
  2. what should be treated as the canonical long-term model,
  3. what must still be built or modified to avoid hardening the wrong assumptions.

This document is intentionally not a Keycloak design doc. In GPUaaS, Keycloak is an authentication and federation component, not the authoritative product IAM model.

Core Design Rule

Model IAM in three dimensions:

  1. resource hierarchy
  2. subject model
  3. scoped role bindings over capability bundles

Do not start from a large catalog of named roles. Default roles are product bundles built on top of capability families.

Boundary: What Keycloak Does vs What Platform IAM Owns

Keycloak in current GPUaaS

Keycloak is currently responsible for:

  1. OIDC login and auth-code exchange
  2. refresh-token exchange
  3. logout / token revocation
  4. JWT issuance for browser/API sessions
  5. JWKS publication for token validation
  6. identity federation entry point for OIDC/SAML-style flows

Keycloak is not the authoritative source for:

  1. tenant/project membership
  2. tenant/project/platform scoped role bindings
  3. service-account ownership and platform authorization
  4. scoped audit visibility
  5. project/tenant governance semantics

The platform database is the product IAM authority.

Canonical Objects

1. Principals

Canonical product actors.

Principal types:

  1. human
  2. service_account
  3. group

Future:

  1. external group references
  2. workload identities if they become first-class beyond service accounts

2. External Identity Bindings

Authentication anchors attached to principals.

Examples:

  1. OIDC issuer + subject
  2. local password credential
  3. tenant federation provider binding

These are authn bindings, not authorization truth.

3. Memberships

Memberships place a principal into tenant/project scope.

Current conceptual model:

  1. tenant membership
  2. project membership

Memberships are the current authorization root for tenant/project access.

4. Role Bindings

Role bindings attach a subject to a role bundle at a scope.

Scope hierarchy:

  1. platform
  2. tenant
  3. project

5. Role Bundles / Capability Sets

Roles should be treated as named bundles over capabilities, not the base model.

Examples of capability families:

  1. iam.*
  2. billing.*
  3. ops.*
  4. project.*
  5. resource.*
  6. audit.*

6. Invitations

Invitation flows should be first-class IAM objects, not implicit user creation side effects.

Examples:

  1. tenant invite
  2. project invite
  3. tenant-admin invite

7. Integration References

External tenant-owned systems should appear as integrations, not as platform-owned user stores.

Examples:

  1. tenant Kubernetes cluster integration
  2. tenant database integration
  3. tenant external IdP configuration

The platform may store integration metadata and delegated credential references, but it should not mirror the external system's full user/role model.

Resource Hierarchy

Canonical hierarchy:

  1. platform
  2. tenant
  3. project

This hierarchy defines where authority is bound.

Important rule:

Hierarchy does not imply universal content visibility.

Example:

  1. a tenant_admin may be allowed to create/delete projects
  2. that does not automatically mean they can inspect all data/content inside every child project

Management rights and content visibility must remain separable.

Subject Model

Subjects are the principals or group-like identities that receive bindings.

Initial subject types:

  1. user
  2. service account
  3. group

The platform should not assume a single global human username namespace as the main identity boundary.

Safer model:

  1. immutable principal identity
  2. tenant membership as the real product access boundary
  3. project membership nested under tenant/project scope

Default Role Families

These are default product bundles, not the full permission grammar.

Platform scope

  1. platform_admin
  2. platform_ops
  3. platform_viewer
  4. platform_iam_admin
  5. platform_billing_admin

Tenant scope

  1. tenant_owner
  2. tenant_admin
  3. tenant_ops
  4. tenant_viewer
  5. tenant_iam_admin
  6. tenant_billing_admin

Project scope

  1. project_owner
  2. project_admin
  3. project_operator
  4. project_member
  5. project_viewer

Important rule:

These defaults should be built from capability bundles and kept small. The platform should not expose an AWS-style explosion of role labels as the primary mental model.

Capability Separation Rules

The model must support these separations:

  1. read vs mutate
  2. management rights vs content visibility
  3. IAM authority vs billing authority vs ops authority
  4. tenant-wide governance vs project-local authoring

Examples:

  1. platform_ops may investigate incidents and use admin read surfaces without having full IAM mutation rights.
  2. tenant_admin may manage tenant users and projects without automatically seeing all project content.
  3. tenant_billing_admin may view or manage billing without holding general tenant IAM authority.
  4. project_admin may manage project members and service accounts without tenant-level user governance.

External System Identity Boundary

For tenant-owned infra systems such as Kubernetes or databases:

  1. if the external system has its own SSO or IAM model, that remains the tenant's responsibility
  2. GPUaaS may store an integration reference or delegated credential/configuration if needed
  3. GPUaaS should not try to become the canonical IAM model for tenant-owned external systems

So:

  1. platform IAM owns platform principals, memberships, and platform-managed identities
  2. tenant-owned infra IAM stays external

What Exists Today

Already implemented or partially implemented

  1. users
  2. stores product users
  3. includes oidc_issuer and oidc_subject
  4. still has transitional role and org_id fields

  5. tenant_memberships

  6. tenant-scoped membership baseline exists

  7. project_memberships

  8. project-scoped membership baseline exists

  9. tenant_identity_providers

  10. tenant OIDC/SAML provider config exists in schema

  11. tenant_federation_domain_bindings

  12. tenant federation domain binding exists in schema

  13. auth_federation_states

  14. provider/org-bound auth flow state exists

  15. role_definitions

  16. role_definition_versions
  17. platform_role_bindings
  18. tenant_role_bindings
  19. project_role_bindings
  20. the platform already has the skeleton for a richer scoped role-binding model

  21. service_accounts

  22. project-scoped service-account model exists today
  23. tenant-owned shared runtimes still need a separate delegated machine-identity model

  24. scoped access-credential model

  25. useful as a related platform primitive, but separate from IAM role design

Existing documented direction

Relevant docs already move in this direction:

  1. Role_and_Policy_Lifecycle_Model.md
  2. User_Onboarding_Model.md
  3. ADR-008-tenant-project-ownership-baseline.md
  4. ADR-010-tenant-federation-sso-model.md

Current Gaps / Mismatches

1. Global username uniqueness is still too strong

Current schema:

  1. users.username text not null unique
  2. partial MVP constraint on tenant_memberships(user_id) also still enforces single-tenant active membership

This is too restrictive for the intended tenant-scoped identity model.

2. users.role is still transitional and too coarse

Current role field only supports:

  1. user
  2. admin

This is insufficient for:

  1. platform read-only admin visibility
  2. ops-only investigation
  3. tenant IAM admin
  4. tenant billing admin
  5. project admin/operator separation

3. Role-binding model exists but is not authoritative

The schema supports richer role bindings, but most runtime behavior still depends on:

  1. membership tables
  2. coarse users.role
  3. endpoint-specific assumptions

4. No first-class invitation model yet

IAM needs invitation and delegated onboarding as explicit objects/workflows.

5. No first-class group model yet

Groups are implied by future need but not implemented as platform IAM primitives.

6. No formal external identity binding object beyond current OIDC fields

Current oidc_issuer / oidc_subject fields work, but richer multi-provider identity binding will need a clearer model.

7. Audit visibility is not yet fully scope-aware

Scoped audit is required for:

  1. platform admin
  2. tenant admin
  3. project admin
  4. future cross-project sharing/grant visibility

What Must Be Built or Modified

Phase 1: clarify authority and remove bad assumptions

  1. document that platform DB, not Keycloak, is IAM authority
  2. make role-binding/capability model the target authority in docs
  3. keep Keycloak as auth/federation component only

Phase 2: fix identity scope assumptions

  1. remove or relax single-tenant active-membership constraint when multi-tenant user support is enabled
  2. revisit global username uniqueness
  3. separate principal identity from tenant membership more clearly in read/write paths

Phase 3: make scoped role bindings real

  1. move runtime authorization toward role bindings + capability evaluation
  2. reduce users.role to compatibility/read-model only
  3. introduce default role bundles for:
  4. platform
  5. tenant
  6. project

Phase 4: add missing IAM primitives

  1. invitations
  2. groups
  3. richer external identity bindings
  4. scoped audit presentation
  5. cross-project sharing/grants

Design Constraints To Preserve

  1. do not make Keycloak the canonical product user store
  2. do not require global human-readable username uniqueness as the long-term tenant model
  3. do not conflate admin page visibility with mutation authority
  4. do not assume parent-scope admin implies child-scope content visibility
  5. do not turn tenant-owned external infra IAM into platform-owned IAM

This document should be followed by:

  1. cross-project access/sharing model
  2. scoped audit model
  3. IAM API/resource contract slices
  4. UX IA alignment for platform/tenant/project admin modes
  5. delegated shared-runtime operator authz model for tenant-owned shared app runtimes