IAM Token Issuer v1
Purpose
Define a single control-plane model for short-lived token and credential issuance before adding more machine identities for WEKA, runtime controllers, node tasks, and operator automation.
The current repo already has two machine-token paths:
- project-scoped service-account tokens from POST /api/v1/auth/service-account/token
- delegated shared-runtime operator tokens from POST /api/v1/auth/shared-runtime-operator/token
Those paths are useful first slices, but they duplicate issuer logic and do not yet provide one policy/audit surface for all future token types. IAM Token Issuer v1 is the consolidation contract.
Scope
In scope:
- GPUaaS-issued short-lived access tokens for machine actors.
- Provider credential brokering requests that return one-time short-lived credential material, such as storage S3/STS credentials.
- Common policy checks for TTL, audience, actor class, resource binding, and allowed scopes.
- Common audit/event metadata for every issuance and denial.
- A small package/interface that existing token endpoints can delegate to.
Out of scope for v1:
- Long-lived API keys.
- Browser session cookies and platform-proxy browser sessions.
- Terminal single-use tokens.
- Node bootstrap enrollment tokens.
- Replacing Keycloak human-user auth.
- Storing raw provider credentials in read models.
Those token types may later share validation or audit conventions, but they should not be forced through the first IAM token issuer package.
Current State
Service-account tokens
packages/services/auth/service_accounts.go validates a project-scoped service
account credential, reads auth.service_account_token_ttl_seconds, and signs an
HS256 JWT with these claims:
- sub: service account ID
- actor_type: service_account
- org_id
- project_id
- scope
- aud, iss, iat, exp, jti
Shared-runtime operator tokens
packages/services/auth/shared_runtime_operator_tokens.go validates a
tenant-owned shared-runtime operator credential, reads the same TTL policy, and
signs an HS256 JWT with these claims:
- sub: shared runtime ID
- actor_type: shared_runtime_operator
- org_id
- shared_runtime_id
- scope
- aud, iss, iat, exp, jti
Validation
packages/shared/middleware/auth.go has separate resolvers for service-account
and shared-runtime operator tokens. Both validate HS256 signatures using the
envelope key material and then enforce actor_type.
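The validation path described above can be sketched with the Go standard library alone. This is a minimal illustration, not the code in packages/shared/middleware/auth.go: the Claims struct, the signHS256 and verifyHS256 helpers, and the demo key are all hypothetical, and a real resolver would also check aud and iss.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
	"time"
)

// Claims uses the claim names listed above; the struct itself is hypothetical.
type Claims struct {
	Sub             string `json:"sub"`
	ActorType       string `json:"actor_type"`
	OrgID           string `json:"org_id"`
	ProjectID       string `json:"project_id,omitempty"`
	SharedRuntimeID string `json:"shared_runtime_id,omitempty"`
	Scope           string `json:"scope"`
	Aud             string `json:"aud"`
	Iss             string `json:"iss"`
	Iat             int64  `json:"iat"`
	Exp             int64  `json:"exp"`
	Jti             string `json:"jti"`
}

var b64 = base64.RawURLEncoding

// mac computes HMAC-SHA256 over the JWS signing input.
func mac(key []byte, signingInput string) []byte {
	h := hmac.New(sha256.New, key)
	h.Write([]byte(signingInput))
	return h.Sum(nil)
}

// signHS256 builds header.payload.signature in compact JWS form.
func signHS256(key []byte, c Claims) (string, error) {
	payload, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	input := b64.EncodeToString([]byte(`{"alg":"HS256","typ":"JWT"}`)) + "." + b64.EncodeToString(payload)
	return input + "." + b64.EncodeToString(mac(key, input)), nil
}

// verifyHS256 pins the algorithm, checks signature and expiry, and then
// enforces the actor_type expected by the calling resolver.
func verifyHS256(key []byte, token, wantActorType string, now int64) (Claims, error) {
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return Claims{}, errors.New("malformed token")
	}
	var hdr struct {
		Alg string `json:"alg"`
	}
	hdrBytes, err := b64.DecodeString(parts[0])
	if err != nil || json.Unmarshal(hdrBytes, &hdr) != nil || hdr.Alg != "HS256" {
		return Claims{}, errors.New("unexpected header")
	}
	sig, err := b64.DecodeString(parts[2])
	if err != nil || !hmac.Equal(sig, mac(key, parts[0]+"."+parts[1])) {
		return Claims{}, errors.New("bad signature")
	}
	var c Claims
	payload, err := b64.DecodeString(parts[1])
	if err != nil || json.Unmarshal(payload, &c) != nil {
		return Claims{}, errors.New("bad payload")
	}
	if now >= c.Exp {
		return Claims{}, errors.New("token expired")
	}
	if c.ActorType != wantActorType {
		return Claims{}, errors.New("actor_type mismatch")
	}
	return c, nil
}

func main() {
	key := []byte("demo-only-key") // placeholder, not real envelope key material
	now := time.Now().Unix()
	tok, _ := signHS256(key, Claims{Sub: "sa-123", ActorType: "service_account", Iat: now, Exp: now + 900})
	_, err := verifyHS256(key, tok, "shared_runtime_operator", now)
	fmt.Println(err) // a service-account token is rejected by the operator resolver
}
```

Because both resolvers differ only in the expected actor_type, passing it as a parameter is one way the duplicated logic could collapse into a single verifier.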
Core Model
IAM Token Issuer is an issuer, not a permissions bypass.
Every issuance starts with a requested actor, resource binding, audience, and scope. The issuer verifies that the requested token is allowed, normalizes the claim set, signs or brokers the credential, and writes an audit row. It does not decide endpoint authorization at request time; endpoint authorization still belongs to middleware and handler/service policy checks.
caller credential
-> IAM token issuer request
-> issuer policy evaluation
-> credential verification or delegated actor authorization
-> token/credential material
-> audit + optional issuance evidence row
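The flow above can be sketched as a small Go pipeline. Everything here is an assumption made for illustration: the Request, AuditEvent, and Steps types, the Issue function, and the "gpuaas-api" audience value are hypothetical. Only the audit action names come from this document.

```go
package main

import (
	"errors"
	"fmt"
)

// Request carries a subset of the common request fields.
type Request struct {
	ActorType, ActorID, Audience, Scope string
}

// AuditEvent is a hypothetical stand-in for an audit_logs row.
type AuditEvent struct {
	Action, ActorType, ActorID, Reason string
}

// Steps holds the pluggable stages of the flow as plain functions.
type Steps struct {
	EvaluatePolicy func(Request) error           // audience, scope, TTL checks
	VerifyCaller   func(Request) error           // credential or delegation check
	Mint           func(Request) (string, error) // sign or broker the credential
	Audit          func(AuditEvent)              // write the audit row
}

var ErrAudienceNotAllowed = errors.New("audience not allowed")

// Issue runs the stages in order and audits both issuance and denial.
func Issue(s Steps, r Request) (string, error) {
	deny := func(err error) (string, error) {
		s.Audit(AuditEvent{Action: "auth.token.deny", ActorType: r.ActorType, ActorID: r.ActorID, Reason: err.Error()})
		return "", err
	}
	if err := s.EvaluatePolicy(r); err != nil {
		return deny(err)
	}
	if err := s.VerifyCaller(r); err != nil {
		return deny(err)
	}
	token, err := s.Mint(r)
	if err != nil {
		return "", err // internal failure, not a policy denial
	}
	s.Audit(AuditEvent{Action: "auth.token.issue", ActorType: r.ActorType, ActorID: r.ActorID})
	return token, nil
}

func main() {
	s := Steps{
		EvaluatePolicy: func(r Request) error {
			if r.Audience != "gpuaas-api" { // hypothetical audience value
				return ErrAudienceNotAllowed
			}
			return nil
		},
		VerifyCaller: func(Request) error { return nil },
		Mint:         func(Request) (string, error) { return "signed-token", nil },
		Audit:        func(e AuditEvent) { fmt.Println("audit:", e.Action, e.Reason) },
	}
	_, err := Issue(s, Request{ActorType: "service_account", ActorID: "sa-123", Audience: "other"})
	fmt.Println("error:", err)
}
```

Note that the sketch never inspects the target endpoint: it decides only whether a token may be minted, which matches the statement that endpoint authorization stays with middleware and handlers.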
Actor Classes
| Actor class | Binding | Primary use |
|---|---|---|
| service_account | org_id, project_id, service_account_id | Project automation and project-owned app workers. |
| shared_runtime_operator | org_id, shared_runtime_id | Tenant-owned shared runtime workers. |
| workload | org_id, project_id, workload_id or app_instance_id | Runtime-local workload identity for storage and app integrations. |
| node_agent | node_id, mTLS subject | Node task polling and host-local execution. |
| platform_operator | platform role plus operation target | Operator automation and recovery workflows. |
| provider_session | provider backend plus GPUaaS grant/session ID | Short-lived external provider credentials such as WEKA S3/STS. |
Only the first two actor classes are implemented today.
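As an illustration of how the binding column could be enforced, the following Go sketch checks that a request carries every binding field required for its actor class and rejects classes that are not implemented yet. The ActorClass type, the requiredBindings map, and MissingBindings are hypothetical names; the field lists are copied from the table.

```go
package main

import "fmt"

// ActorClass names come from the table above.
type ActorClass string

const (
	ServiceAccount        ActorClass = "service_account"
	SharedRuntimeOperator ActorClass = "shared_runtime_operator"
)

// requiredBindings covers only the two classes implemented today.
var requiredBindings = map[ActorClass][]string{
	ServiceAccount:        {"org_id", "project_id", "service_account_id"},
	SharedRuntimeOperator: {"org_id", "shared_runtime_id"},
}

// MissingBindings returns the required binding fields that are absent or
// empty, in table order. Unimplemented actor classes are rejected outright.
func MissingBindings(class ActorClass, binding map[string]string) ([]string, error) {
	required, ok := requiredBindings[class]
	if !ok {
		return nil, fmt.Errorf("actor class %q is not implemented", class)
	}
	var missing []string
	for _, field := range required {
		if binding[field] == "" {
			missing = append(missing, field)
		}
	}
	return missing, nil
}

func main() {
	missing, err := MissingBindings(ServiceAccount, map[string]string{"org_id": "org-1"})
	fmt.Println(missing, err)

	_, err = MissingBindings("workload", nil)
	fmt.Println(err)
}
```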
Issuance Contract
Common request fields:
- actor_type
- actor_id
- org_id
- project_id when project-scoped
- resource_type
- resource_id
- audience
- scope
- ttl_seconds (optional request hint)
- reason
- source_workflow_id (optional)
- idempotency_key for API-backed mutating issuance
Common response fields:
- credential_type: jwt_bearer, provider_session, vault_wrapped_secret
- expires_in_seconds
- expires_at
- token_type when applicable
- access_token only for bearer tokens
- delivery only for wrapped or provider credentials
- issued_claims_summary
- credential_session_id when an evidence row is created
Responses that contain credential material are one-time responses and must not be stored in read models, browser local storage, logs, or traces.
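One way to make the no-logging rule hard to violate by accident is to wrap credential material in a type that redacts itself when formatted or marshaled. The sketch below is an assumption, not an existing type in the repo: Secret, BearerResponse, WireJSON, LogLine, LogJSON, and the "Bearer" token_type value are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

const redacted = "[REDACTED]"

// Secret hides its value from fmt verbs and from encoding/json.
type Secret struct{ value string }

func NewSecret(v string) Secret               { return Secret{value: v} }
func (s Secret) String() string               { return redacted }
func (s Secret) GoString() string             { return redacted }
func (s Secret) MarshalJSON() ([]byte, error) { return json.Marshal(redacted) }
func (s Secret) Reveal() string               { return s.value }

// BearerResponse models a subset of the common response fields.
type BearerResponse struct {
	CredentialType   string `json:"credential_type"`
	TokenType        string `json:"token_type"`
	ExpiresInSeconds int    `json:"expires_in_seconds"`
	AccessToken      Secret `json:"access_token"`
}

// WireJSON builds the one-time response body. It is the only place in this
// sketch where the token is revealed.
func (r BearerResponse) WireJSON() ([]byte, error) {
	return json.Marshal(struct {
		CredentialType   string `json:"credential_type"`
		TokenType        string `json:"token_type"`
		ExpiresInSeconds int    `json:"expires_in_seconds"`
		AccessToken      string `json:"access_token"`
	}{r.CredentialType, r.TokenType, r.ExpiresInSeconds, r.AccessToken.Reveal()})
}

// LogLine formats the response the way a careless log call would.
func LogLine(r BearerResponse) string { return fmt.Sprintf("%+v", r) }

// LogJSON marshals the response the way a structured logger would.
func LogJSON(r BearerResponse) string {
	b, _ := json.Marshal(r)
	return string(b)
}

func main() {
	r := BearerResponse{
		CredentialType:   "jwt_bearer",
		TokenType:        "Bearer", // assumed value
		ExpiresInSeconds: 900,
		AccessToken:      NewSecret("header.payload.signature"),
	}
	fmt.Println(LogLine(r))
	fmt.Println(LogJSON(r))
	body, _ := r.WireJSON()
	fmt.Println(string(body))
}
```

The design choice is that leaking requires an explicit Reveal call, so a review can search for that one method instead of auditing every log statement.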
Claim Baseline
All GPUaaS-issued bearer tokens should include:
- sub
- actor_type
- iss
- aud
- iat
- exp
- jti
- scope
Scoped claims are actor-specific:
- project actors include org_id and project_id
- tenant-shared runtime actors include org_id and shared_runtime_id
- workload actors include org_id, project_id, and workload/app identifiers
- node actors include node_id and certificate-bound identity evidence when token issuance is added for them
The UI and clients must treat token claims as opaque except where the public API contract explicitly documents them.
Policy Authority
IAM Token Issuer must read policy through PolicyClient; no token TTL or scope
limit should be hardcoded in issuer code.
Initial policy keys:
auth.service_account_token_ttl_seconds: existing TTL authority for service-account and shared-runtime operator tokens.
Proposed follow-on keys:
- auth.iam_token_issuer.default_ttl_seconds
- auth.iam_token_issuer.max_ttl_seconds
- auth.iam_token_issuer.allowed_audiences
- auth.iam_token_issuer.allowed_scopes
- auth.iam_token_issuer.provider_session_ttl_seconds
Do not add these keys until the OpenAPI/service slice needs them. Existing
implemented paths should continue using auth.service_account_token_ttl_seconds
until a migration task updates both seed data and docs.
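A minimal sketch of TTL resolution under these rules follows. The single-method PolicyClient shape, the staticPolicy test double, and the clamping behavior (a ttl_seconds hint may shorten the lifetime but never extend it past policy) are assumptions for illustration; only the policy key name comes from this document.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// PolicyClient is named in the text; this method shape is an assumption.
type PolicyClient interface {
	GetInt(ctx context.Context, key string) (int, error)
}

const ttlPolicyKey = "auth.service_account_token_ttl_seconds"

// clampTTL applies the policy TTL as an upper bound. A zero hint means
// "use the policy value".
func clampTTL(policyTTL, requested int) (int, error) {
	if policyTTL <= 0 {
		return 0, errors.New("policy TTL must be positive")
	}
	if requested < 0 {
		return 0, errors.New("ttl_seconds must not be negative")
	}
	if requested == 0 || requested > policyTTL {
		return policyTTL, nil
	}
	return requested, nil
}

// ResolveTTL reads the TTL from policy so that no limit lives in issuer code.
func ResolveTTL(ctx context.Context, pc PolicyClient, requested int) (int, error) {
	policyTTL, err := pc.GetInt(ctx, ttlPolicyKey)
	if err != nil {
		return 0, fmt.Errorf("read %s: %w", ttlPolicyKey, err)
	}
	return clampTTL(policyTTL, requested)
}

// staticPolicy is an in-memory PolicyClient used only for this demo.
type staticPolicy map[string]int

func (p staticPolicy) GetInt(_ context.Context, key string) (int, error) {
	v, ok := p[key]
	if !ok {
		return 0, errors.New("policy key not found: " + key)
	}
	return v, nil
}

func main() {
	pc := staticPolicy{ttlPolicyKey: 900}
	ttl, err := ResolveTTL(context.Background(), pc, 3600)
	fmt.Println(ttl, err) // the 3600-second hint is clamped to the policy value
}
```

Rejecting an over-long hint instead of clamping it would be an equally valid contract; the document does not say which one the issuer should use.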
Audit And Evidence
Every successful token or provider credential issuance must write audit_logs.
Denials should also be auditable when a privileged caller or reusable credential
attempts an invalid issuance.
Reserved audit actions:
- auth.token.issue
- auth.token.deny
- auth.token.revoke
- auth.provider_credential.issue
- auth.provider_credential.deny
- auth.provider_credential.revoke
Minimum audit metadata:
- actor_type
- actor_id
- requested_actor_type
- requested_actor_id
- audience
- scope
- resource_type
- resource_id
- expires_at
- credential_type
- source_workflow_id when present
Audit metadata must never include raw tokens, client secrets, private keys, provider secret keys, wrapped tokens, or refresh tokens.
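One way to enforce this rule is an allowlist applied before the audit row is written, so that unexpected keys are dropped instead of relying on a list of forbidden names. The sketch below is hypothetical: SanitizeAuditMetadata and the map-based metadata shape are assumptions, and the allowed keys are the minimum metadata fields listed above.

```go
package main

import (
	"fmt"
	"sort"
)

// allowedAuditKeys mirrors the minimum audit metadata list.
var allowedAuditKeys = map[string]bool{
	"actor_type":           true,
	"actor_id":             true,
	"requested_actor_type": true,
	"requested_actor_id":   true,
	"audience":             true,
	"scope":                true,
	"resource_type":        true,
	"resource_id":          true,
	"expires_at":           true,
	"credential_type":      true,
	"source_workflow_id":   true,
}

// SanitizeAuditMetadata keeps only allowlisted keys and reports the names
// (never the values) of everything it dropped, sorted for stable output.
func SanitizeAuditMetadata(in map[string]string) (kept map[string]string, dropped []string) {
	kept = make(map[string]string, len(in))
	for k, v := range in {
		if allowedAuditKeys[k] {
			kept[k] = v
		} else {
			dropped = append(dropped, k)
		}
	}
	sort.Strings(dropped)
	return kept, dropped
}

func main() {
	kept, dropped := SanitizeAuditMetadata(map[string]string{
		"actor_type":   "service_account",
		"scope":        "storage:read", // hypothetical scope value
		"access_token": "header.payload.signature",
	})
	fmt.Println(len(kept), dropped)
}
```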
Evidence rows are optional for bearer tokens in v1 because audit may be enough. Provider credential sessions should have durable evidence rows because storage and WEKA operations need revocation posture, expiry, and diagnostics without storing secret material.
Storage provider credential evidence is owned by the storage domain in
storage_credential_sessions. IAM Token Issuer records the common issuance
metadata and audit action, while storage records bucket/grant scope and
provider-session posture. The evidence row is user-safe operational data and
must contain only:
- credential_session_id
- GPUaaS actor and requested provider-session actor summary
- bucket_id, prefixes, permissions, and provider backend
- credential type, client kind, issued/expiry timestamps
- provider session/policy references that are not themselves credentials
- revocation state and source workflow/correlation metadata
The evidence row must not contain raw provider access keys, secret keys, session tokens, wrapped-token bytes, provider admin credentials, raw provider policy JSON, or any material that can be replayed as a credential.
Storage And WEKA
IAM Token Issuer should not know WEKA policy internals. Storage remains the owning domain for grant checks and provider adapter behavior.
For direct S3/SDK credentials:
storage service
-> validates bucket/grant/project access
-> calls IAM token issuer provider-session issuer
-> provider adapter issues short-lived credential or wrapped delivery
-> storage_credential_sessions evidence row
-> audit action auth.provider_credential.issue + storage.credential.issue
-> one-time response with credential material + credential_session_id
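The ordering in the flow above can be sketched as follows. All names here (Deps, CredentialResponse, IssueBucketCredential, ErrNoGrant, and the demo IDs) are hypothetical; only the audit action names and the step order come from this document. The sketch fails closed: if the evidence row cannot be written, the credential material is not returned.

```go
package main

import (
	"errors"
	"fmt"
)

// Deps holds the collaborators of the storage service as plain functions.
type Deps struct {
	CheckGrant   func(actorID, bucketID string) error
	IssueSession func(bucketID string) (material string, expiresAt int64, err error)
	SaveEvidence func(bucketID string, expiresAt int64) (sessionID string, err error)
	Audit        func(action string)
}

// CredentialResponse is the one-time response; Material must not be persisted.
type CredentialResponse struct {
	CredentialSessionID string
	Material            string
	ExpiresAt           int64
}

var ErrNoGrant = errors.New("no storage grant for bucket")

// IssueBucketCredential follows the documented order: grant check, provider
// issuance, evidence row, audit actions, then the one-time response.
func IssueBucketCredential(d Deps, actorID, bucketID string) (CredentialResponse, error) {
	if err := d.CheckGrant(actorID, bucketID); err != nil {
		d.Audit("auth.provider_credential.deny")
		return CredentialResponse{}, err
	}
	material, expiresAt, err := d.IssueSession(bucketID)
	if err != nil {
		return CredentialResponse{}, err
	}
	sessionID, err := d.SaveEvidence(bucketID, expiresAt) // no secret material passed in
	if err != nil {
		return CredentialResponse{}, err
	}
	d.Audit("auth.provider_credential.issue")
	d.Audit("storage.credential.issue")
	return CredentialResponse{CredentialSessionID: sessionID, Material: material, ExpiresAt: expiresAt}, nil
}

func main() {
	d := Deps{
		CheckGrant: func(actorID, bucketID string) error {
			if bucketID != "bucket-1" {
				return ErrNoGrant
			}
			return nil
		},
		IssueSession: func(string) (string, int64, error) { return "one-time-material", 1700000900, nil },
		SaveEvidence: func(string, int64) (string, error) { return "cs-1", nil },
		Audit:        func(action string) { fmt.Println("audit:", action) },
	}
	resp, err := IssueBucketCredential(d, "sa-123", "bucket-1")
	fmt.Println(resp.CredentialSessionID, err)
}
```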
For WEKAFS/POSIX mounts:
- IAM Token Issuer may issue workload/service-account tokens used by workers or controllers to fetch mount plans.
- The node-agent should still receive typed node tasks, not raw provider admin credentials.
- Mount plans and provider endpoints are user-safe operational data; mount secrets or provider admin credentials must remain in Vault or wrapped delivery.
Endpoint Strategy
Do not add a broad public /iam-token-issuer endpoint in v1.
Preferred path:
- Add an internal Go package/interface used by existing auth endpoints.
- Migrate service-account token issuance to the issuer.
- Migrate shared-runtime operator issuance to the issuer.
- Add narrowly scoped API endpoints only when a domain needs them, for example:
  - existing /api/v1/auth/service-account/token
  - existing /api/v1/auth/shared-runtime-operator/token
  - existing/planned storage credential endpoint /api/v1/v3/storage/{bucket_id}/credentials
- Keep provider credential issuance behind the owning domain API, not a generic user-facing token endpoint.
Package Boundary
Proposed package:
Initial interface:
type Issuer interface {
    IssueServiceAccountToken(ctx context.Context, input ServiceAccountTokenInput) (*BearerToken, error)
    IssueSharedRuntimeOperatorToken(ctx context.Context, input SharedRuntimeOperatorTokenInput) (*BearerToken, error)
}
Follow-on provider credential interface should be added only after the storage adapter needs it:
type ProviderCredentialIssuer interface {
    IssueProviderCredential(ctx context.Context, input ProviderCredentialInput) (*ProviderCredential, error)
}
The issuer may live inside packages/services/auth at first if that avoids a
premature package split. The key invariant is a single issuer path and single
audit/policy contract.
Security Rules
- No token or credential material in query parameters.
- No credential material in logs, traces, read models, or durable UI state.
- Every issuer path uses constant-time credential comparison where secrets are verified.
- Every issuer path has a max TTL enforced by policy.
- Every issuer path has an explicit audience.
- Scope normalization is centralized.
- Actor/resource binding is server-authoritative; clients cannot mint cross-project or cross-tenant tokens by choosing claims.
- Token signing keys must remain in platform custody. Production hardening should move from local envelope-key HS256 to KMS/Vault/JWKS-backed signing.
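Two of the rules above, constant-time secret comparison and centralized scope normalization, can be sketched with the Go standard library. SecretMatches and NormalizeScope are hypothetical helpers, and the sketch assumes scope is a space-delimited string whose order and duplicates carry no meaning.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
	"sort"
	"strings"
)

// SecretMatches compares fixed-length SHA-256 digests in constant time, so
// neither the position of the first differing byte nor the secret length
// influences timing.
func SecretMatches(presented, stored string) bool {
	a := sha256.Sum256([]byte(presented))
	b := sha256.Sum256([]byte(stored))
	return subtle.ConstantTimeCompare(a[:], b[:]) == 1
}

// NormalizeScope trims whitespace, removes duplicates, and sorts scope
// entries so every issuer path emits the same canonical claim value.
func NormalizeScope(raw string) string {
	seen := make(map[string]bool)
	var scopes []string
	for _, s := range strings.Fields(raw) {
		if !seen[s] {
			seen[s] = true
			scopes = append(scopes, s)
		}
	}
	sort.Strings(scopes)
	return strings.Join(scopes, " ")
}

func main() {
	fmt.Println(SecretMatches("s3cret", "s3cret"))
	fmt.Printf("%q\n", NormalizeScope("  storage:write storage:read  storage:write "))
}
```

A production issuer would normally compare against a stored hash of the credential instead of a stored plaintext secret; the constant-time comparison applies either way.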
Migration Plan
Phase 1: Extract Current Issuers
- Add an IAM token issuer service path used by both current token endpoints.
- Keep wire contracts unchanged.
- Preserve current TTL policy and claim shapes.
- Add unit tests proving both old endpoints emit equivalent claims.
- Add audit rows for successful and failed token issuance if not already present.
Phase 2: Normalize Validation And Claims
- Consolidate duplicated middleware resolver logic where safe.
- Keep actor-specific validation checks.
- Document endpoint allowlists for every machine actor.
- Add denylist or credential-state validation strategy for revocation-sensitive privileged tokens.
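As one possible shape for the denylist option, the sketch below tracks revoked jti values until the corresponding token would have expired anyway. The Denylist type and its methods are hypothetical, and the in-memory map is only for illustration; a multi-instance deployment would need a shared store.

```go
package main

import (
	"fmt"
	"sync"
)

// Denylist remembers revoked token IDs together with the token expiry, so
// entries can be dropped once the token is no longer valid on its own.
type Denylist struct {
	mu      sync.Mutex
	revoked map[string]int64 // jti -> exp (Unix seconds)
}

func NewDenylist() *Denylist {
	return &Denylist{revoked: make(map[string]int64)}
}

// Revoke marks a token ID as revoked until exp.
func (d *Denylist) Revoke(jti string, exp int64) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.revoked[jti] = exp
}

// IsRevoked is the check a middleware resolver would run after signature
// and expiry validation.
func (d *Denylist) IsRevoked(jti string) bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	_, ok := d.revoked[jti]
	return ok
}

// Prune removes entries whose tokens have expired and returns how many
// were removed.
func (d *Denylist) Prune(now int64) int {
	d.mu.Lock()
	defer d.mu.Unlock()
	removed := 0
	for jti, exp := range d.revoked {
		if exp <= now {
			delete(d.revoked, jti)
			removed++
		}
	}
	return removed
}

func main() {
	d := NewDenylist()
	d.Revoke("jti-1", 100)
	fmt.Println(d.IsRevoked("jti-1"), d.IsRevoked("jti-2"))
	fmt.Println(d.Prune(200))
}
```

Because tokens are short-lived, the denylist stays small: an entry only needs to live as long as the policy TTL.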
Phase 3: Storage Provider Sessions
- Integrate with POST /api/v1/v3/storage/{bucket_id}/credentials.
- Store provider credential session evidence without raw secret material.
- Add WEKA/S3 capability checks before exposing issuance in dev.
Phase 4: Workload-Bound Identity
- Add workload/app-instance actor class.
- Bind workload tokens to project, app instance, and storage attachment/mount context.
- Use for runtime controllers and external workers where project service accounts are too broad.
Open Questions
- Should GPUaaS continue HS256 envelope-key signing for all machine tokens, or move IAM token issuer signing to asymmetric JWKS-backed keys before more actors are added?
- Should successful bearer-token issuance have a durable evidence table, or is audit_logs sufficient until revocation/inspection needs grow?
- Should provider credential issuance audit live only under storage actions, or should auth also own a parallel auth.provider_credential.* audit action?
- What is the first workload-bound actor: app instance, allocation, allocation group, or storage attachment?
First Implementation Tasks
- Extract service-account and shared-runtime operator issuer logic behind one IAM token issuer interface without changing public API contracts.
- Add issuance audit for current token endpoints.
- Define storage provider credential session evidence before enabling WEKA S3 credential issuance.
- Decide signing-key hardening path before adding workload-bound tokens.