IAM Token Issuer v1

Purpose

Define a single control-plane model for short-lived token and credential issuance before adding more machine identities for WEKA, runtime controllers, node tasks, and operator automation.

The current repo already has two machine-token paths:

  • project-scoped service-account tokens from POST /api/v1/auth/service-account/token
  • delegated shared-runtime operator tokens from POST /api/v1/auth/shared-runtime-operator/token

Those paths are useful first slices, but they duplicate issuer logic and do not yet provide one policy/audit surface for all future token types. IAM Token Issuer v1 is the consolidation contract.

Scope

In scope:

  • GPUaaS-issued short-lived access tokens for machine actors.
  • Provider credential brokering requests that return one-time short-lived credential material, such as storage S3/STS credentials.
  • Common policy checks for TTL, audience, actor class, resource binding, and allowed scopes.
  • Common audit/event metadata for every issuance and denial.
  • A small package/interface that existing token endpoints can delegate to.

Out of scope for v1:

  • Long-lived API keys.
  • Browser session cookies and platform-proxy browser sessions.
  • Terminal single-use tokens.
  • Node bootstrap enrollment tokens.
  • Replacing Keycloak human-user auth.
  • Storing raw provider credentials in read models.

Those token types may later share validation or audit conventions, but they should not be forced through the first IAM token issuer package.

Current State

Service-account tokens

packages/services/auth/service_accounts.go validates a project-scoped service account credential, reads auth.service_account_token_ttl_seconds, and signs an HS256 JWT with these claims:

  • sub: service account ID
  • actor_type: service_account
  • org_id
  • project_id
  • scope
  • aud, iss, iat, exp, jti

Shared-runtime operator tokens

packages/services/auth/shared_runtime_operator_tokens.go validates a tenant-owned shared-runtime operator credential, reads the same TTL policy, and signs an HS256 JWT with these claims:

  • sub: shared runtime ID
  • actor_type: shared_runtime_operator
  • org_id
  • shared_runtime_id
  • scope
  • aud, iss, iat, exp, jti

Validation

packages/shared/middleware/auth.go has separate resolvers for service-account and shared-runtime operator tokens. Both validate HS256 signatures using the envelope key material and then enforce actor_type.
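
The sign-then-validate behavior described above can be sketched with the standard library alone. This is a minimal illustration, not the code in service_accounts.go or auth.go: the Claims struct, the function names, and the fixed header are assumptions, and only a subset of the documented claims is modeled.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"encoding/json"
	"errors"
	"strings"
)

// Claims is an illustrative subset of the documented claim set.
type Claims struct {
	Sub       string `json:"sub"`
	ActorType string `json:"actor_type"`
	OrgID     string `json:"org_id"`
	Scope     string `json:"scope"`
	Aud       string `json:"aud"`
	Exp       int64  `json:"exp"`
}

var b64 = base64.RawURLEncoding

// sign returns the HMAC-SHA256 of the JWT signing input.
func sign(key []byte, signingInput string) []byte {
	h := hmac.New(sha256.New, key)
	h.Write([]byte(signingInput))
	return h.Sum(nil)
}

// SignHS256 serializes the claims as a compact JWT (header.payload.signature).
func SignHS256(key []byte, c Claims) (string, error) {
	payload, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	input := b64.EncodeToString([]byte(`{"alg":"HS256","typ":"JWT"}`)) + "." + b64.EncodeToString(payload)
	return input + "." + b64.EncodeToString(sign(key, input)), nil
}

// VerifyHS256 checks the algorithm, signature, expiry, and actor_type, in that order.
func VerifyHS256(key []byte, token, wantActorType string, nowUnix int64) (*Claims, error) {
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return nil, errors.New("malformed token")
	}
	var header struct {
		Alg string `json:"alg"`
	}
	rawHeader, err := b64.DecodeString(parts[0])
	if err != nil || json.Unmarshal(rawHeader, &header) != nil || header.Alg != "HS256" {
		return nil, errors.New("unsupported or malformed header")
	}
	// hmac.Equal compares in constant time.
	sig, err := b64.DecodeString(parts[2])
	if err != nil || !hmac.Equal(sig, sign(key, parts[0]+"."+parts[1])) {
		return nil, errors.New("invalid signature")
	}
	var c Claims
	rawPayload, err := b64.DecodeString(parts[1])
	if err != nil || json.Unmarshal(rawPayload, &c) != nil {
		return nil, errors.New("malformed payload")
	}
	if nowUnix >= c.Exp {
		return nil, errors.New("token expired")
	}
	if c.ActorType != wantActorType {
		return nil, errors.New("unexpected actor_type")
	}
	return &c, nil
}
```

Because the two resolvers differ only in the expected actor_type, a shared verifier parameterized by that value is one plausible shape for the consolidation.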

Core Model

IAM Token Issuer is an issuer, not a permissions bypass.

Every issuance starts with a requested actor, resource binding, audience, and scope. The issuer verifies that the requested token is allowed, normalizes the claim set, signs or brokers the credential, and writes an audit row. It does not decide endpoint authorization at request time; endpoint authorization still belongs to middleware and handler/service policy checks.

caller credential
  -> IAM token issuer request
  -> issuer policy evaluation
  -> credential verification or delegated actor authorization
  -> token/credential material
  -> audit + optional issuance evidence row
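
The flow above can be sketched as a small pipeline. All type names, field names, and hooks here are hypothetical; the point is only the ordering (policy, then credential verification, then minting) and that both outcomes reach the audit hook.

```go
package main

import "errors"

// ErrDenied is returned when policy or credential checks reject a request.
var ErrDenied = errors.New("issuance denied")

// Request carries a few of the common request fields (illustrative subset).
type Request struct {
	ActorType, ActorID, Audience, Scope string
}

// AuditEntry holds metadata only; it has no field for credential material.
type AuditEntry struct {
	Action, ActorType, ActorID, Audience, Scope string
}

// Pipeline wires the issuance steps together. Each field is a hypothetical hook.
type Pipeline struct {
	EvaluatePolicy   func(Request) error           // TTL, audience, and scope limits
	VerifyCredential func(Request) error           // caller credential or delegated authorization
	Mint             func(Request) (string, error) // sign a token or broker a credential
	Audit            func(AuditEntry)              // append an audit row
}

// Issue runs the checks in order, mints on success, and audits both outcomes.
func (p Pipeline) Issue(r Request) (string, error) {
	entry := AuditEntry{ActorType: r.ActorType, ActorID: r.ActorID, Audience: r.Audience, Scope: r.Scope}
	for _, check := range []func(Request) error{p.EvaluatePolicy, p.VerifyCredential} {
		if err := check(r); err != nil {
			entry.Action = "auth.token.deny"
			p.Audit(entry)
			return "", err
		}
	}
	token, err := p.Mint(r)
	if err != nil {
		// A minting failure is an internal error, not a denial.
		return "", err
	}
	entry.Action = "auth.token.issue"
	p.Audit(entry)
	return token, nil
}
```

Nothing in Issue inspects the target endpoint, which matches the rule that request-time authorization stays with middleware and handlers.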

Actor Classes

| Actor class | Binding | Primary use |
|---|---|---|
| service_account | org_id, project_id, service_account_id | Project automation and project-owned app workers. |
| shared_runtime_operator | org_id, shared_runtime_id | Tenant-owned shared runtime workers. |
| workload | org_id, project_id, workload_id or app_instance_id | Runtime-local workload identity for storage and app integrations. |
| node_agent | node_id, mTLS subject | Node task polling and host-local execution. |
| platform_operator | platform role plus operation target | Operator automation and recovery workflows. |
| provider_session | provider backend plus GPUaaS grant/session ID | Short-lived external provider credentials such as WEKA S3/STS. |

Only the first two actor classes are implemented today.

Issuance Contract

Common request fields:

  • actor_type
  • actor_id
  • org_id
  • project_id when project-scoped
  • resource_type
  • resource_id
  • audience
  • scope
  • ttl_seconds (optional request hint)
  • reason
  • source_workflow_id (optional)
  • idempotency_key for API-backed mutating issuance

Common response fields:

  • credential_type: jwt_bearer, provider_session, vault_wrapped_secret
  • expires_in_seconds
  • expires_at
  • token_type when applicable
  • access_token only for bearer tokens
  • delivery only for wrapped or provider credentials
  • issued_claims_summary
  • credential_session_id when an evidence row is created

Responses that contain credential material are one-time responses and must not be stored in read models, browser local storage, logs, or traces.
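
One way to make accidental logging harder is to wrap credential material in a type that prints a placeholder. The sketch below is an assumption about how that could look; Secret, IssueResponse, Describe, and Leaks are hypothetical names, and only three response fields are modeled.

```go
package main

import (
	"fmt"
	"strings"
)

// Secret wraps credential material so that fmt-based logging prints a placeholder.
type Secret string

// String and GoString cover the %v, %+v, %s, and %#v verbs.
func (Secret) String() string   { return "[REDACTED]" }
func (Secret) GoString() string { return "[REDACTED]" }

// Reveal returns the raw value; call it only when writing the one-time response.
func (s Secret) Reveal() string { return string(s) }

// IssueResponse is an illustrative subset of the common response fields.
type IssueResponse struct {
	CredentialType   string
	ExpiresInSeconds int
	AccessToken      Secret
}

// Describe renders the response the way a careless log line might.
func Describe(r IssueResponse) string { return fmt.Sprintf("%+v", r) }

// Leaks reports whether rendered text contains the raw credential.
func Leaks(rendered string, s Secret) bool { return strings.Contains(rendered, s.Reveal()) }
```

This only guards fmt-style formatting. encoding/json still emits the raw string for a named string type, which the one-time response needs, so loggers that JSON-encode structs require their own redaction.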

Claim Baseline

All GPUaaS-issued bearer tokens should include:

  • sub
  • actor_type
  • iss
  • aud
  • iat
  • exp
  • jti
  • scope

Scoped claims are actor-specific:

  • project actors include org_id and project_id
  • tenant-shared runtime actors include org_id and shared_runtime_id
  • workload actors include org_id, project_id, and workload/app identifiers
  • node actors include node_id and certificate-bound identity evidence when token issuance is added for them

The UI and clients must treat token claims as opaque except where the public API contract explicitly documents them.
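
A claim-set check for the baseline and the two implemented actor classes could look like the following. It is a sketch: MissingClaims and the lookup tables are hypothetical, and workload and node actors are omitted because their claim names are not final.

```go
package main

import "sort"

// baselineClaims are required on every GPUaaS-issued bearer token.
var baselineClaims = []string{"sub", "actor_type", "iss", "aud", "iat", "exp", "jti", "scope"}

// scopedClaims lists the actor-specific claims for the implemented actor classes.
var scopedClaims = map[string][]string{
	"service_account":         {"org_id", "project_id"},
	"shared_runtime_operator": {"org_id", "shared_runtime_id"},
}

// MissingClaims returns the sorted names of required claims absent from the set.
func MissingClaims(claims map[string]any) []string {
	required := append([]string{}, baselineClaims...)
	if actorType, ok := claims["actor_type"].(string); ok {
		required = append(required, scopedClaims[actorType]...)
	}
	var missing []string
	for _, name := range required {
		if _, present := claims[name]; !present {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing)
	return missing
}
```

A check like this fits an issuer-side unit test; clients should still not depend on claim contents beyond the documented API contract.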

Policy Authority

IAM Token Issuer must read policy through PolicyClient; no token TTL or scope limit should be hardcoded in issuer code.

Initial policy keys:

  • auth.service_account_token_ttl_seconds: existing TTL authority for service-account and shared-runtime operator tokens.

Proposed follow-on keys:

  • auth.iam_token_issuer.default_ttl_seconds
  • auth.iam_token_issuer.max_ttl_seconds
  • auth.iam_token_issuer.allowed_audiences
  • auth.iam_token_issuer.allowed_scopes
  • auth.iam_token_issuer.provider_session_ttl_seconds

Do not add these keys until the OpenAPI/service slice needs them. Existing implemented paths should continue using auth.service_account_token_ttl_seconds until a migration task updates both seed data and docs.
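
Reading the TTL through policy rather than a constant might look like this. The PolicyClient method shown is a hypothetical shape, since the real interface is not reproduced in this document, and treating the existing key as both default and ceiling is an assumption.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// PolicyClient is a hypothetical shape; the real interface may differ.
type PolicyClient interface {
	GetInt(ctx context.Context, key string) (int, error)
}

const ttlPolicyKey = "auth.service_account_token_ttl_seconds"

// ResolveTTL treats the policy value as default and ceiling: the request
// hint may shorten the TTL but never extend it.
func ResolveTTL(ctx context.Context, pc PolicyClient, hintSeconds int) (int, error) {
	limit, err := pc.GetInt(ctx, ttlPolicyKey)
	if err != nil {
		return 0, fmt.Errorf("read policy %s: %w", ttlPolicyKey, err)
	}
	if limit <= 0 {
		return 0, errors.New("policy TTL must be positive")
	}
	if hintSeconds < 0 {
		return 0, errors.New("ttl_seconds must not be negative")
	}
	if hintSeconds == 0 || hintSeconds > limit {
		return limit, nil
	}
	return hintSeconds, nil
}

// StaticPolicy is an in-memory PolicyClient for tests and examples.
type StaticPolicy map[string]int

// GetInt returns the stored value or an error when the key is absent.
func (p StaticPolicy) GetInt(_ context.Context, key string) (int, error) {
	v, ok := p[key]
	if !ok {
		return 0, fmt.Errorf("policy key %q not found", key)
	}
	return v, nil
}
```

A missing key fails closed here instead of falling back to a built-in default, which keeps the no-hardcoded-TTL rule intact. Whether an oversized hint should be clamped or rejected is a design choice this sketch does not settle.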

Audit And Evidence

Every successful token or provider credential issuance must write an audit_logs row. Denials should also be auditable when a privileged caller or reusable credential attempts an invalid issuance.

Reserved audit actions:

  • auth.token.issue
  • auth.token.deny
  • auth.token.revoke
  • auth.provider_credential.issue
  • auth.provider_credential.deny
  • auth.provider_credential.revoke

Minimum audit metadata:

  • actor_type
  • actor_id
  • requested_actor_type
  • requested_actor_id
  • audience
  • scope
  • resource_type
  • resource_id
  • expires_at
  • credential_type
  • source_workflow_id when present

Audit metadata must never include raw tokens, client secrets, private keys, provider secret keys, wrapped tokens, or refresh tokens.
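
An allowlist is one way to enforce this. The sketch below accepts only the minimum metadata keys listed above, so a field such as access_token cannot reach the audit row by accident. BuildAuditMetadata is a hypothetical helper, and representing values as strings is an assumption.

```go
package main

import "fmt"

// allowedAuditKeys mirrors the minimum audit metadata list.
var allowedAuditKeys = map[string]bool{
	"actor_type": true, "actor_id": true,
	"requested_actor_type": true, "requested_actor_id": true,
	"audience": true, "scope": true,
	"resource_type": true, "resource_id": true,
	"expires_at": true, "credential_type": true,
	"source_workflow_id": true,
}

// BuildAuditMetadata copies allowlisted, non-empty fields and rejects any other key.
func BuildAuditMetadata(fields map[string]string) (map[string]string, error) {
	out := make(map[string]string, len(fields))
	for key, value := range fields {
		if !allowedAuditKeys[key] {
			return nil, fmt.Errorf("audit metadata key %q is not allowlisted", key)
		}
		if value != "" {
			out[key] = value
		}
	}
	return out, nil
}
```

An allowlist fails closed when a new field is introduced, whereas a denylist of secret-looking key names would silently pass anything it did not anticipate.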

Evidence rows are optional for bearer tokens in v1 because audit may be enough. Provider credential sessions should have durable evidence rows because storage and WEKA operations need revocation posture, expiry, and diagnostics without storing secret material.

Storage provider credential evidence is owned by the storage domain in storage_credential_sessions. IAM Token Issuer records the common issuance metadata and audit action, while storage records bucket/grant scope and provider-session posture. The evidence row is user-safe operational data and must contain only:

  • credential_session_id
  • GPUaaS actor and requested provider-session actor summary
  • bucket_id, prefixes, permissions, and provider backend
  • credential type, client kind, issued/expiry timestamps
  • provider session/policy references that are not themselves credentials
  • revocation state and source workflow/correlation metadata

The evidence row must not contain raw provider access keys, secret keys, session tokens, wrapped-token bytes, provider admin credentials, raw provider policy JSON, or any material that can be replayed as a credential.

Storage And WEKA

IAM Token Issuer should not know WEKA policy internals. Storage remains the owning domain for grant checks and provider adapter behavior.

For direct S3/SDK credentials:

storage service
  -> validates bucket/grant/project access
  -> calls the IAM Token Issuer's provider-session issuer
  -> provider adapter issues short-lived credential or wrapped delivery
  -> storage_credential_sessions evidence row
  -> audit action auth.provider_credential.issue + storage.credential.issue
  -> one-time response with credential material + credential_session_id
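
The ordering above can be sketched as follows. Every name here is hypothetical (Deps, Broker, IssueStorageCredential), the IAM issuer and provider adapter steps are collapsed into a single Broker hook, and the evidence row carries only a subset of the fields listed earlier.

```go
package main

import "errors"

// ErrNoGrant is returned when the storage grant check fails.
var ErrNoGrant = errors.New("no storage grant for bucket")

// ProviderCredential is returned once to the caller and never persisted.
type ProviderCredential struct {
	AccessKeyID, SecretAccessKey, SessionToken string
	SessionRef                                 string // provider session reference; not itself a credential
	ExpiresAt                                  int64
}

// EvidenceRow has no field that could hold secret material.
type EvidenceRow struct {
	CredentialSessionID, BucketID, ProviderBackend, SessionRef string
	ExpiresAt                                                  int64
}

// Deps groups the hypothetical collaborators of the storage service.
type Deps struct {
	GrantAllowed func(projectID, bucketID string) bool
	Broker       func(bucketID string) (ProviderCredential, error)
	SaveEvidence func(EvidenceRow) error
	Audit        func(action string)
	NewSessionID func() string
}

// IssueStorageCredential checks the grant first, then brokers, records evidence, and audits.
func IssueStorageCredential(d Deps, projectID, bucketID, backend string) (ProviderCredential, string, error) {
	if !d.GrantAllowed(projectID, bucketID) {
		d.Audit("auth.provider_credential.deny")
		return ProviderCredential{}, "", ErrNoGrant
	}
	cred, err := d.Broker(bucketID)
	if err != nil {
		return ProviderCredential{}, "", err
	}
	row := EvidenceRow{
		CredentialSessionID: d.NewSessionID(),
		BucketID:            bucketID,
		ProviderBackend:     backend,
		SessionRef:          cred.SessionRef,
		ExpiresAt:           cred.ExpiresAt,
	}
	// If evidence cannot be stored, the credential is withheld; a real
	// implementation would likely also revoke it at the provider.
	if err := d.SaveEvidence(row); err != nil {
		return ProviderCredential{}, "", err
	}
	d.Audit("auth.provider_credential.issue")
	d.Audit("storage.credential.issue")
	return cred, row.CredentialSessionID, nil
}
```

Keeping the secret fields on a type that is never passed to SaveEvidence makes the no-secrets-in-evidence rule a property of the types rather than of reviewer attention.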

For WEKAFS/POSIX mounts:

  • IAM Token Issuer may issue workload/service-account tokens used by workers or controllers to fetch mount plans.
  • The node-agent should still receive typed node tasks, not raw provider admin credentials.
  • Mount plans and provider endpoints are user-safe operational data; mount secrets or provider admin credentials must remain in Vault or wrapped delivery.

Endpoint Strategy

Do not add a broad public /iam-token-issuer endpoint in v1.

Preferred path:

  1. Add an internal Go package/interface used by existing auth endpoints.
  2. Migrate service-account token issuance to the issuer.
  3. Migrate shared-runtime operator issuance to the issuer.
  4. Add narrowly scoped API endpoints only when a domain needs them, for example:
       • existing /api/v1/auth/service-account/token
       • existing /api/v1/auth/shared-runtime-operator/token
       • existing/planned storage credential endpoint /api/v1/v3/storage/{bucket_id}/credentials
  5. Keep provider credential issuance behind the owning domain API, not a generic user-facing token endpoint.

Package Boundary

Proposed package:

packages/services/auth/iamtokenissuer

Initial interface:

type Issuer interface {
    IssueServiceAccountToken(ctx context.Context, input ServiceAccountTokenInput) (*BearerToken, error)
    IssueSharedRuntimeOperatorToken(ctx context.Context, input SharedRuntimeOperatorTokenInput) (*BearerToken, error)
}

Follow-on provider credential interface should be added only after the storage adapter needs it:

type ProviderCredentialIssuer interface {
    IssueProviderCredential(ctx context.Context, input ProviderCredentialInput) (*ProviderCredential, error)
}

The issuer may live inside packages/services/auth at first if that avoids a premature package split. The key invariant is a single issuer path and single audit/policy contract.

Security Rules

  1. No token or credential material in query parameters.
  2. No credential material in logs, traces, read models, or durable UI state.
  3. Every issuer path uses constant-time credential comparison where secrets are verified.
  4. Every issuer path has a max TTL enforced by policy.
  5. Every issuer path has an explicit audience.
  6. Scope normalization is centralized.
  7. Actor/resource binding is server-authoritative; clients cannot mint cross-project or cross-tenant tokens by choosing claims.
  8. Token signing keys must remain in platform custody. Production hardening should move from local envelope-key HS256 to KMS/Vault/JWKS-backed signing.
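
Rules 3 and 6 lend themselves to small shared helpers. The sketch below assumes space-delimited scope strings and SHA-256 digests of stored secrets. Neither is confirmed by this document, and the function names are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"errors"
	"fmt"
	"sort"
	"strings"
)

// NormalizeScope splits on whitespace, rejects scopes outside the allowed set,
// removes duplicates, and returns the remainder sorted and space-joined.
func NormalizeScope(requested string, allowed map[string]bool) (string, error) {
	seen := make(map[string]bool)
	var scopes []string
	for _, s := range strings.Fields(requested) {
		if !allowed[s] {
			return "", fmt.Errorf("scope %q is not allowed", s)
		}
		if !seen[s] {
			seen[s] = true
			scopes = append(scopes, s)
		}
	}
	if len(scopes) == 0 {
		return "", errors.New("at least one scope is required")
	}
	sort.Strings(scopes)
	return strings.Join(scopes, " "), nil
}

// HashSecret returns the SHA-256 digest of a secret (assumed storage format).
func HashSecret(secret string) []byte {
	sum := sha256.Sum256([]byte(secret))
	return sum[:]
}

// SecretMatches compares digests in constant time to avoid timing side channels.
func SecretMatches(presented string, storedDigest []byte) bool {
	return subtle.ConstantTimeCompare(HashSecret(presented), storedDigest) == 1
}
```

With one normalizer, two requests that differ only in scope order or duplication produce the same scope claim, which keeps audit rows and claim comparisons stable.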

Migration Plan

Phase 1: Extract Current Issuers

  • Add an IAM token issuer service path used by both current token endpoints.
  • Keep wire contracts unchanged.
  • Preserve current TTL policy and claim shapes.
  • Add unit tests proving both old endpoints emit equivalent claims.
  • Add audit rows for successful and failed token issuance if not already present.
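
The equivalence test in Phase 1 needs a comparison that ignores claims expected to differ between two issuances. A minimal sketch follows, assuming iat, exp, and jti are the only volatile claims; the helper names are hypothetical.

```go
package main

import "reflect"

// volatileClaims change on every issuance and are excluded from comparison.
var volatileClaims = map[string]bool{"iat": true, "exp": true, "jti": true}

// StableClaims returns a copy of the claim set without the volatile claims.
func StableClaims(claims map[string]any) map[string]any {
	out := make(map[string]any, len(claims))
	for name, value := range claims {
		if !volatileClaims[name] {
			out[name] = value
		}
	}
	return out
}

// EquivalentClaims reports whether the legacy and extracted issuer paths
// produced the same claims, ignoring the volatile ones.
func EquivalentClaims(legacy, extracted map[string]any) bool {
	return reflect.DeepEqual(StableClaims(legacy), StableClaims(extracted))
}
```

A fuller test would also compare exp minus iat so that TTL drift between the two paths is caught.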

Phase 2: Normalize Validation And Claims

  • Consolidate duplicated middleware resolver logic where safe.
  • Keep actor-specific validation checks.
  • Document endpoint allowlists for every machine actor.
  • Add denylist or credential-state validation strategy for revocation-sensitive privileged tokens.

Phase 3: Storage Provider Sessions

  • Integrate with POST /api/v1/v3/storage/{bucket_id}/credentials.
  • Store provider credential session evidence without raw secret material.
  • Add WEKA/S3 capability checks before exposing issuance in dev.

Phase 4: Workload-Bound Identity

  • Add workload/app-instance actor class.
  • Bind workload tokens to project, app instance, and storage attachment/mount context.
  • Use for runtime controllers and external workers where project service accounts are too broad.

Open Questions

  1. Should GPUaaS continue HS256 envelope-key signing for all machine tokens, or move IAM token issuer signing to asymmetric JWKS-backed keys before more actors are added?
  2. Should successful bearer-token issuance have a durable evidence table, or is audit_logs sufficient until revocation/inspection needs grow?
  3. Should provider credential issuance audit live only under storage actions, or should auth also own a parallel auth.provider_credential.* audit action?
  4. What is the first workload-bound actor: app instance, allocation, allocation group, or storage attachment?

First Implementation Tasks

  1. Extract service-account and shared-runtime operator issuer logic behind one IAM token issuer interface without changing public API contracts.
  2. Add issuance audit for current token endpoints.
  3. Define storage provider credential session evidence before enabling WEKA S3 credential issuance.
  4. Decide signing-key hardening path before adding workload-bound tokens.