Storage Sharing and IAM Model v1
Date: 2026-04-27
Purpose
Define the storage ownership, sharing, and IAM model before integrating WEKA as the first production storage provider. The model must support project-scoped storage by default while allowing controlled sharing of datasets, checkpoints, model artifacts, and workspaces across projects.
This document is provider-neutral. WEKA is the first backend, but the product model should also work for VAST or S3-compatible providers later.
Concrete user, workload, CLI, S3-client, sharing, and revocation scenarios are
documented in doc/architecture/Storage_IAM_User_Flows_v1.md.
Lessons from the earlier Scality-oriented IAM project are captured in
doc/architecture/Storage_IAM_External_Reference_Lessons_v1.md. The key
carry-forward is to keep GPUaaS IAM as source of truth, compile grants into
provider policy, and avoid creating long-lived provider users for every human.
WEKA-specific feasibility, limits, and validation items are tracked in
doc/architecture/Storage_WEKA_Capability_Assessment_v1.md.
Important WEKA constraint: the integration is dual-protocol. WEKAFS/POSIX is the primary high-performance workload data path for training, notebooks, Kubernetes PV/PVC, and apps that need filesystem semantics. S3/STS should be enabled and validated for bucket/object workflows, SDK access, external clients, and apps that expect S3 semantics.
Core Decision
Storage ownership is project-first:
- tenant
  - project
    - bucket / namespace
      - shared/
      - users/<user-id>/
      - workloads/<workload-id>/
      - datasets/<dataset-id>/
      - checkpoints/<workload-id>/
      - artifacts/
Users are actors inside one or more projects. A user's personal files are scoped
to a project, for example Research/users/subash/ and
Sandbox/users/subash/, not a global users/subash/projects/* hierarchy.
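To make the layout concrete, here is a small path builder. It is a sketch that assumes the project name is the first segment of the bucket-relative path, as in the Research/users/subash/ example. The function project_prefix and its validation rules are hypothetical and not part of the storage package.

```python
from pathlib import PurePosixPath
from typing import Optional

# Prefix kinds come from the layout above; the helper itself is hypothetical.
ID_SCOPED = {"users", "workloads", "datasets", "checkpoints"}
UNSCOPED = {"shared", "artifacts"}


def _segment(value: str, what: str) -> str:
    # Reject empty values, nested paths, and traversal segments.
    if not value or "/" in value or value in (".", ".."):
        raise ValueError(f"invalid {what}: {value!r}")
    return value


def project_prefix(project: str, kind: str, entity_id: Optional[str] = None) -> str:
    """Build a project-scoped prefix such as 'Research/users/subash/'."""
    parts = [_segment(project, "project")]
    if kind in ID_SCOPED:
        if entity_id is None:
            raise ValueError(f"{kind}/ requires an id")
        parts += [kind, _segment(entity_id, f"{kind} id")]
    elif kind in UNSCOPED:
        if entity_id is not None:
            raise ValueError(f"{kind}/ does not take an id")
        parts.append(kind)
    else:
        raise ValueError(f"unknown prefix kind: {kind!r}")
    return f"{PurePosixPath(*parts)}/"
```

No prefix kind starts a global users/ root, so a users/subash/projects/* path cannot be expressed. Every personal prefix begins with a project.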
Deferred Decision: User Home Storage
Project-owned storage and user home storage are related, but they are not the same product primitive.
The first WEKAFS implementation should focus on project storage objects that are explicitly attached to workloads and apps. Before adding persistent home directories, revisit the user-home model as a separate design decision.
Open questions:
- whether every project member receives a separate project-scoped home directory, for example projects/{project_id}/users/{user_id}/home
- whether a shared project home is ever appropriate, and if so whether it is a separate shared storage object rather than /home
- whether the default workload mount should include only the launching user's home, all project member homes, or no home mount unless requested
- what happens when a user is added to or removed from a project after a workload has already been allocated
- whether a running multi-user workload can safely reconcile OS users, home directories, filesystem ACLs, and mount visibility without restart
- how shared app runtimes such as notebooks, training dashboards, or inference workbenches distinguish the launching user from later collaborators
- whether user-home storage is quota-attributed to the user, project, tenant, or a combination of project and user
Default posture until this is designed:
- do not mount one writable project-wide home directory for every user
- do not automatically grant newly added project members access to already running workloads
- treat shared datasets, artifacts, checkpoints, and app data as project storage objects with explicit grants
- treat per-user persistent home directories as a future storage/IAM feature requiring explicit UX, backend state, node-agent behavior, and audit events
This prevents the first WEKAFS integration from accidentally baking in a single-home or all-users-visible model that would be hard to unwind later.
Why Project-First
- GPUaaS IAM, service accounts, app launches, workloads, quotas, and accounting are project-scoped.
- A user can have different roles in different projects.
- Datasets and model artifacts are usually project assets, not personal assets.
- Releasing a workload or deleting a project has a clear cleanup boundary.
- Provider policies can be compiled from project + principal + prefix.
Ownership vs Access
Every bucket or storage namespace has one owning project.
The owning project controls:
- quota attribution
- billing attribution
- lifecycle policy
- deletion authority
- default access policy
- provider placement and capability selection
Access can be granted outside the owning project through explicit grants.
Sharing Model
A bucket or prefix can grant access to:
- another project
- a user within a project
- a GPUaaS service account
- a workload or app instance
- a tenant-managed shared dataset group, once that exists
Example:
- owner project: Training
- bucket: training:imagenet
- grants:
  - Training project: read/write
  - Inference project: read
  - Sandbox project: read until 2026-05-31
  - sa_training_pipeline: read/write checkpoints/
  - sa_vllm_inference: read artifacts/model/
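The example grants can be checked with a deny-by-default evaluator. This is a minimal sketch under two assumptions: "until 2026-05-31" is treated as inclusive of that day, and a grant without a prefix covers the whole bucket. The Grant type and is_allowed function are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Grant:
    subject: str                       # project name or service-account id
    permissions: frozenset[str]        # e.g. {"read"} or {"read", "write"}
    prefix: str = ""                   # "" means the whole bucket
    expires_on: Optional[date] = None  # assumed: last day the grant is valid


# The example grants on training:imagenet (owner project: Training).
IMAGENET_GRANTS = [
    Grant("Training", frozenset({"read", "write"})),
    Grant("Inference", frozenset({"read"})),
    Grant("Sandbox", frozenset({"read"}), expires_on=date(2026, 5, 31)),
    Grant("sa_training_pipeline", frozenset({"read", "write"}), prefix="checkpoints/"),
    Grant("sa_vllm_inference", frozenset({"read"}), prefix="artifacts/model/"),
]


def is_allowed(grants, subject: str, op: str, key: str, today: date) -> bool:
    """Deny by default; allow only if an unexpired grant covers subject, op, and key."""
    for g in grants:
        if g.subject != subject or op not in g.permissions:
            continue
        if g.expires_on is not None and today > g.expires_on:
            continue  # expired grants are ignored, not deleted
        if key.startswith(g.prefix):
            return True
    return False
```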
Default sharing posture:
- no cross-project access unless explicitly granted
- read-only is the default cross-project grant
- cross-project write requires project admin or tenant admin authority
- grants may target a full bucket or a prefix
- grants may have expiration
- grants must be audited
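The default posture can be enforced as a validation step before a grant is created. This is a sketch, assuming "write", "delete", and "admin" count as write-class operations and that role names are project_admin and tenant_admin. GrantRequest, validate_grant, and those role strings are hypothetical. Auditing is left to the caller.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

WRITE_OPS = {"write", "delete", "admin"}  # assumed write-class operations


@dataclass
class GrantRequest:
    owner_project_id: str
    target_project_id: str
    permissions: set[str] = field(default_factory=lambda: {"read"})  # read-only default
    prefixes: list[str] = field(default_factory=list)  # empty list = full bucket
    expires_at: Optional[datetime] = None               # None = no expiration
    requester_roles: set[str] = field(default_factory=set)


def validate_grant(req: GrantRequest, now: datetime) -> list[str]:
    """Return policy violations; an empty list means the grant may be created."""
    errors = []
    cross_project = req.target_project_id != req.owner_project_id
    if cross_project and req.permissions & WRITE_OPS:
        if not req.requester_roles & {"project_admin", "tenant_admin"}:
            errors.append("cross-project write requires project admin or tenant admin")
    if req.expires_at is not None and req.expires_at <= now:
        errors.append("expiration must be in the future")
    for p in req.prefixes:
        if p.startswith("/") or not p.endswith("/"):
            errors.append(f"prefix must be relative and end with '/': {p!r}")
    return errors
```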
Principal Mapping
Human users do not become long-lived WEKA users by default.
- Human user -> GPUaaS auth/session -> GPUaaS IAM check -> optional short-lived provider credentials only when direct S3/client access is enabled
- Workload/app runtime -> GPUaaS service account -> GPUaaS-generated WEKAFS/POSIX mount plan for filesystem workloads -> optional provider-derived S3 service account or STS credential for object access
Principal classes:
| GPUaaS principal | Provider credential posture |
|---|---|
| Human user | Short-lived scoped STS/session credentials only when requested for direct S3 client use. |
| Workload/app instance | Project-scoped service account with a GPUaaS-controlled mount plan for WEKAFS/POSIX; provider-derived credentials only when object/S3 access is enabled. |
| Project automation | Explicit project service account with scoped provider credentials. |
| Platform operations | Platform storage-admin credential in platform custody, never exposed to users/workloads. |
Direct S3 Client Flow
Users may need to use aws s3, Python SDKs, data loaders, or other S3 clients.
This requires temporary credentials, not long-lived provider users.
This flow is separate from WEKAFS/POSIX mounts. It is required when the selected storage object is exposed through S3 semantics or when a user/app needs direct object-client access.
Flow:
- User authenticates to GPUaaS.
- User requests storage credentials for a project, bucket, and optional prefix.
- GPUaaS checks project role and storage grants.
- GPUaaS asks the provider adapter to issue temporary credentials or a short-lived scoped equivalent.
- GPUaaS returns endpoint, access key, secret key, session token when applicable, expiration, and the allowed scope summary.
- GPUaaS writes an audit log entry.
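The issuance flow can be sketched with the IAM check, provider adapter, and audit sink injected as callables. This is an illustration, not the service implementation: issue_direct_s3_credentials, IssuedCredentials, the dictionary keys returned by provider_issue, and the 12-hour MAX_TTL ceiling are all assumptions. The audit action names come from this document.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional

DEFAULT_TTL = timedelta(hours=1)  # "short, for example 1 hour"
MAX_TTL = timedelta(hours=12)     # assumed ceiling; not specified by this design


@dataclass(frozen=True)
class IssuedCredentials:
    endpoint: str
    access_key_id: str
    secret_access_key: str
    session_token: Optional[str]
    expires_at: datetime
    scope_summary: str
    credential_session_id: str


def issue_direct_s3_credentials(
    user_id: str,
    project_id: str,
    bucket_id: str,
    prefix: str,
    authorize: Callable[[str, str, str, str], bool],        # project role + grant check
    provider_issue: Callable[[str, str, timedelta], dict],  # hypothetical adapter call
    audit: Callable[[dict], None],
    ttl: timedelta = DEFAULT_TTL,
) -> IssuedCredentials:
    if not authorize(user_id, project_id, bucket_id, prefix):
        audit({"action": "auth.provider_credential.deny", "actor": user_id,
               "project_id": project_id, "bucket_id": bucket_id, "prefix": prefix})
        raise PermissionError("no project role or storage grant covers this scope")
    if ttl <= timedelta(0):
        raise ValueError("ttl must be positive")
    ttl = min(ttl, MAX_TTL)
    raw = provider_issue(bucket_id, prefix, ttl)
    issued = IssuedCredentials(
        endpoint=raw["endpoint"],
        access_key_id=raw["access_key_id"],
        secret_access_key=raw["secret_access_key"],
        session_token=raw.get("session_token"),
        # Prefer the provider-reported expiry when the adapter returns one.
        expires_at=raw.get("expires_at") or datetime.now(timezone.utc) + ttl,
        scope_summary=f"{bucket_id}/{prefix or '*'}",
        credential_session_id=str(uuid.uuid4()),
    )
    # The audit entry references the session, never the key material.
    audit({"action": "storage.credential.issue", "actor": user_id,
           "project_id": project_id, "bucket_id": bucket_id, "prefix": prefix,
           "credential_session_id": issued.credential_session_id,
           "expires_at": issued.expires_at.isoformat()})
    return issued
```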
Rules:
- default TTL should be short, for example 1 hour
- no long-lived human provider keys
- no provider credentials stored in browser local storage
- revocation of project membership or grant stops future issuance
- existing temporary credentials expire naturally unless provider-side session invalidation is available
Workload/App Credential Flow
Workloads should not use human credentials.
Flow:
- User launches a workload or app.
- GPUaaS authorizes the user and resolves requested storage mounts.
- GPUaaS creates or selects a project-scoped service account for the runtime.
- GPUaaS compiles bucket/prefix policy from workload intent and grants.
- Provider adapter creates provider-derived credentials.
- Credential material is stored in Vault or wrapped delivery, not plaintext DB.
- Node-agent/app controller receives sanitized mount instructions or wrapped credential delivery.
- On release/decommission, GPUaaS revokes or disables provider-derived access.
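The authorization and sanitization steps can be pictured as compiling requested mounts into node-agent instructions that carry only a credential reference. This sketch assumes grants are looked up by exact (bucket_id, prefix) pair and that mounts land under /mnt/storage/. MountRequest, MountInstruction, build_mount_plan, and that mount root are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MountRequest:
    bucket_id: str
    prefix: str
    writable: bool


@dataclass(frozen=True)
class MountInstruction:
    # Sanitized: what a node-agent needs, never raw credential material.
    bucket_id: str
    prefix: str
    mount_path: str
    read_only: bool
    credential_ref: Optional[str]  # e.g. a Vault or wrapped-delivery handle, if any


def build_mount_plan(requests, granted, credential_ref=None):
    """granted maps (bucket_id, prefix) to the runtime service account's permissions."""
    plan = []
    for req in requests:
        perms = granted.get((req.bucket_id, req.prefix), set())
        if "mount" not in perms:
            raise PermissionError(f"no mount grant for {req.bucket_id}:{req.prefix}")
        if req.writable and "write" not in perms:
            raise PermissionError(f"write not granted for {req.bucket_id}:{req.prefix}")
        plan.append(MountInstruction(
            bucket_id=req.bucket_id,
            prefix=req.prefix,
            mount_path=f"/mnt/storage/{req.bucket_id}/{req.prefix}".rstrip("/"),
            read_only=not req.writable,
            credential_ref=credential_ref,
        ))
    return plan
```

A real resolver would match prefixes hierarchically rather than by exact key. Exact matching keeps the sketch short.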
WEKA First Backend Mapping
WEKA integration should follow this mapping:
| GPUaaS concept | WEKA concept |
|---|---|
| Owning project bucket/namespace | WEKA filesystem directory namespace or project prefix for WEKAFS/POSIX; WEKA S3 bucket when S3 is enabled. |
| GPUaaS service account | Workload/app identity that receives a GPUaaS-generated mount plan; optional WEKA S3 service account for object access. |
| GPUaaS storage grant | Mount authorization and read-only/read-write mode for WEKAFS/POSIX; optional WEKA IAM/bucket policy for S3. |
| Direct user S3 access | Optional short-lived WEKA STS/session credentials if S3 is enabled for the selected mode. |
| Provider admin operations | Platform-owned WEKA admin/API credential in Vault custody. |
Provider-specific details must stay behind packages/services/storage.
Read models expose only user-safe capability hints and access summaries.
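One way to keep provider details out of read models is an allowlist projection that derives capability hints from backend flags. This is a sketch: the field names in SAFE_FIELDS and the wekafs_enabled, s3_enabled, and sts_enabled flags are assumptions, not a confirmed schema.

```python
from typing import Any

# Fields a v3 read model may carry (assumed allowlist). Anything else, such as
# provider policy JSON or admin endpoints, stays inside packages/services/storage.
SAFE_FIELDS = ("bucket_id", "owner_project_id", "access_summary")


def to_read_model(provider_record: dict[str, Any]) -> dict[str, Any]:
    view = {k: provider_record[k] for k in SAFE_FIELDS if k in provider_record}
    s3 = bool(provider_record.get("s3_enabled"))
    view["capability_hints"] = {
        "posix_mount": bool(provider_record.get("wekafs_enabled")),
        "s3_access": s3,
        # Direct user STS credentials only make sense when S3 is enabled.
        "direct_user_sts": s3 and bool(provider_record.get("sts_enabled")),
    }
    return view
```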
Policy Compiler Inputs
The storage policy compiler needs:
- tenant ID
- owning project ID
- requesting project ID
- principal type: user, service account, workload, platform
- principal ID
- bucket ID or provider namespace
- prefix list
- operations: list, read, write, delete, mount, admin
- expiration
- reason / source workflow
Compiler output should be provider-specific policy material plus a user-safe summary for audit and read models.
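The compiler contract can be sketched for an S3 backend that accepts AWS-style IAM policy documents. Whether WEKA S3 accepts exactly this grammar must be confirmed in the capability assessment. The operation-to-action mapping, PolicyInput, and compile_s3_policy are assumptions. mount and admin are not compiled here because mount is a WEKAFS/POSIX concern and admin stays platform-side. Expiration is assumed to be enforced by GPUaaS and only appears in the summary.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Assumed mapping from GPUaaS operations to S3 actions.
S3_ACTIONS = {
    "list": ["s3:ListBucket"],
    "read": ["s3:GetObject"],
    "write": ["s3:PutObject"],
    "delete": ["s3:DeleteObject"],
}


@dataclass
class PolicyInput:
    tenant_id: str
    owning_project_id: str
    requesting_project_id: str
    principal_type: str  # user | service_account | workload | platform
    principal_id: str
    bucket: str
    prefixes: list[str]
    operations: set[str]
    expiration: Optional[datetime]
    reason: str


def compile_s3_policy(inp: PolicyInput) -> tuple[str, str]:
    """Return (provider policy JSON, user-safe summary)."""
    prefixes = inp.prefixes or [""]  # empty list = whole bucket
    statements = []
    if "list" in inp.operations:
        statements.append({
            "Effect": "Allow",
            "Action": S3_ACTIONS["list"],
            "Resource": [f"arn:aws:s3:::{inp.bucket}"],
            "Condition": {"StringLike": {"s3:prefix": [f"{p}*" for p in prefixes]}},
        })
    object_actions = sorted(a for op in inp.operations - {"list"} for a in S3_ACTIONS.get(op, []))
    if object_actions:
        statements.append({
            "Effect": "Allow",
            "Action": object_actions,
            "Resource": [f"arn:aws:s3:::{inp.bucket}/{p}*" for p in prefixes],
        })
    policy = json.dumps({"Version": "2012-10-17", "Statement": statements}, sort_keys=True)
    expires = inp.expiration.isoformat() if inp.expiration else "never"
    summary = (f"{inp.principal_type}:{inp.principal_id} -> {inp.bucket} "
               f"[{', '.join(p or '*' for p in prefixes)}] "
               f"ops={','.join(sorted(inp.operations))} expires={expires}")
    return policy, summary
```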
Contract Baseline
The first API contract slice is provider-neutral and uses GPUaaS grants as the source of truth:
- GET /api/v1/v3/storage/{bucket_id}/grants lists user-safe grant posture.
- POST /api/v1/v3/storage/{bucket_id}/grants creates a bucket or prefix grant.
- DELETE /api/v1/v3/storage/{bucket_id}/grants/{grant_id} revokes a grant.
- POST /api/v1/v3/storage/{bucket_id}/credentials issues short-lived direct S3 credentials for CLI/SDK/workload mount use after IAM checks.
Storage grants compile into provider policy:
storage_grants:
- id
- owner_project_id
- bucket_id
- prefixes[]
- subject_kind: project | user | service_account | workload | tenant_group
- subject_id
- subject_project_id
- permissions[]: list | read | write | delete | mount | admin
- provider_backend: weka | vast | s3_compatible | ...
- provider_policy_ref
- expires_at
- revoked_at
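The storage_grants record can be typed so that only effective grants are compiled into provider policy. This is a sketch: the enum values come from the field list above, while StorageGrant, is_effective, and the rule that a grant stops being effective at expires_at or revoked_at are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class SubjectKind(str, Enum):
    PROJECT = "project"
    USER = "user"
    SERVICE_ACCOUNT = "service_account"
    WORKLOAD = "workload"
    TENANT_GROUP = "tenant_group"


class Permission(str, Enum):
    LIST = "list"
    READ = "read"
    WRITE = "write"
    DELETE = "delete"
    MOUNT = "mount"
    ADMIN = "admin"


@dataclass(frozen=True)
class StorageGrant:
    id: str
    owner_project_id: str
    bucket_id: str
    prefixes: tuple[str, ...]
    subject_kind: SubjectKind
    subject_id: str
    subject_project_id: Optional[str]
    permissions: frozenset[Permission]
    provider_backend: str               # "weka" | "vast" | "s3_compatible" | ...
    provider_policy_ref: Optional[str]  # reference only, never policy JSON
    expires_at: Optional[datetime]
    revoked_at: Optional[datetime]

    def is_effective(self, now: datetime) -> bool:
        """Compile into provider policy only while unrevoked and unexpired."""
        if self.revoked_at is not None and self.revoked_at <= now:
            return False
        return self.expires_at is None or now < self.expires_at
```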
Credential sessions are operational evidence, not durable secrets:
storage_credential_sessions:
- id
- org_id
- project_id
- grant_id
- subject_kind
- subject_id
- subject_project_id
- bucket_id
- client_kind: s3_cli | sdk | workload_mount
- credential_type: s3_session | provider_session | vault_wrapped_secret
- prefixes[]
- permissions[]
- provider_backend
- provider_session_ref
- provider_policy_ref
- status: active | expired | revocation_pending | revoked | failed
- issued_by_user_id or issued_by_service_account_id
- issued_at
- expires_at
- revoked_at
- revoked_by_user_id
- revoke_reason
- source_workflow_id
- idempotency_key
- audit_log_id
- metadata
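The status field implies a small state machine. The transitions below are an assumed reading of the documented statuses, not a confirmed design. failed is treated as terminal, and a session whose issuance fails is assumed to be written directly as failed. TRANSITIONS, transition, and effective_status are hypothetical names.

```python
from datetime import datetime

# Assumed state machine over the documented statuses.
TRANSITIONS = {
    "active": {"expired", "revocation_pending", "revoked"},
    "revocation_pending": {"revoked", "expired", "failed"},
    "expired": set(),
    "revoked": set(),
    "failed": set(),
}


def transition(current: str, new: str) -> str:
    if new not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal session transition {current} -> {new}")
    return new


def effective_status(stored: str, expires_at: datetime, now: datetime) -> str:
    # Credentials expire naturally even if revocation never reached the provider.
    if stored in ("active", "revocation_pending") and now >= expires_at:
        return "expired"
    return stored
```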
Raw access keys, secret keys, session tokens, provider admin credentials, and provider policy JSON must not be stored in read models. If any credential material is persisted for workload delivery, it must be Vault-wrapped or stored through the platform secret path, never plaintext in Postgres.
The storage_credential_sessions row is the durable evidence boundary for a
one-time credential response. It may store provider session references,
policy references, scope summaries, issuer identity, expiry, revocation posture,
and diagnostic metadata. It must not store access_key_id, secret_access_key,
session_token, wrapped-token bytes, provider admin credentials, raw provider
policy JSON, or any recoverable credential material.
POST /api/v1/v3/storage/{bucket_id}/credentials creates or reuses the
idempotent evidence row before returning the one-time credential material. The
response includes credential_session_id and a user-safe session evidence
summary so operators and clients can correlate later audit/history without
needing the secret-bearing response again.
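The create-or-reuse step can be sketched with an in-memory store keyed by idempotency_key. In Postgres this would likely map to a unique constraint on that column, which is an assumption here. EvidenceStore and credential_response are hypothetical. Whether a retried request receives fresh credential material is not specified in this document, so the sketch only guarantees that the evidence row is shared and never holds secrets.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class EvidenceStore:
    """In-memory stand-in for storage_credential_sessions, keyed by idempotency_key."""
    rows: dict = field(default_factory=dict)

    def create_or_reuse(self, idempotency_key: str, scope: dict) -> tuple[dict, bool]:
        existing = self.rows.get(idempotency_key)
        if existing is not None:
            if existing["scope"] != scope:
                raise ValueError("idempotency key reused with a different scope")
            return existing, False
        row = {"id": str(uuid.uuid4()), "scope": dict(scope), "status": "active"}
        self.rows[idempotency_key] = row
        return row, True


def credential_response(store: EvidenceStore, idempotency_key: str,
                        scope: dict, secret_material: dict) -> dict:
    row, _created = store.create_or_reuse(idempotency_key, scope)
    # Secret material is returned once in the response and never written to the row.
    return {
        "credential_session_id": row["id"],
        "session_evidence": {"scope": row["scope"], "status": row["status"]},
        **secret_material,
    }
```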
The service adapter boundary lives under packages/services/storage:
- ProviderAdapter.CompileGrantPolicy compiles GPUaaS grants into provider policy material.
- ProviderAdapter.ApplyGrant and RevokeGrant reconcile provider policy.
- ProviderAdapter.IssueCredential returns one-time short-lived credential material for direct S3/SDK/workload use.
- ProviderAdapter.RevokeCredential is best-effort because active STS session revocation depends on provider capability.
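The adapter boundary can be expressed as a structural interface with an in-memory test double. The snake_case method names below mirror CompileGrantPolicy, ApplyGrant, RevokeGrant, IssueCredential, and RevokeCredential, but the signatures, CompiledPolicy, InMemoryAdapter, and revoke_session are assumptions. revoke_session shows one way best-effort revocation could map onto the session statuses.

```python
import uuid
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class CompiledPolicy:
    provider_material: str  # provider-specific; stays inside the storage package
    summary: str            # user-safe; may flow to read models and audit


class ProviderAdapter(Protocol):
    def compile_grant_policy(self, grant: dict) -> CompiledPolicy: ...
    def apply_grant(self, grant_id: str, policy: CompiledPolicy) -> str: ...
    def revoke_grant(self, grant_id: str, provider_policy_ref: str) -> None: ...
    def issue_credential(self, grant_id: str, ttl_seconds: int) -> dict: ...
    def revoke_credential(self, provider_session_ref: str) -> bool: ...


class InMemoryAdapter:
    """Test double that satisfies ProviderAdapter structurally."""

    def __init__(self, supports_session_revocation: bool = False):
        self.supports_session_revocation = supports_session_revocation
        self.policies: dict[str, CompiledPolicy] = {}

    def compile_grant_policy(self, grant: dict) -> CompiledPolicy:
        perms = ",".join(sorted(grant["permissions"]))
        return CompiledPolicy(f"fake-policy:{grant['bucket_id']}:{perms}",
                              f"{grant['bucket_id']}: {perms}")

    def apply_grant(self, grant_id: str, policy: CompiledPolicy) -> str:
        ref = f"policy-{grant_id}"
        self.policies[ref] = policy
        return ref

    def revoke_grant(self, grant_id: str, provider_policy_ref: str) -> None:
        self.policies.pop(provider_policy_ref, None)

    def issue_credential(self, grant_id: str, ttl_seconds: int) -> dict:
        return {"access_key_id": uuid.uuid4().hex, "secret_access_key": uuid.uuid4().hex,
                "provider_session_ref": f"session-{grant_id}", "ttl_seconds": ttl_seconds}

    def revoke_credential(self, provider_session_ref: str) -> bool:
        # Best effort: False means the session stays valid until it expires.
        return self.supports_session_revocation


def revoke_session(adapter: ProviderAdapter, provider_session_ref: str) -> str:
    """Map best-effort provider revocation onto a credential-session status."""
    return "revoked" if adapter.revoke_credential(provider_session_ref) else "revocation_pending"
```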
Audit Requirements
Audit every privileged storage/IAM mutation:
- storage.bucket.create
- storage.bucket.delete
- storage.grant.create
- storage.grant.revoke
- storage.credential.issue
- storage.credential.rotate
- storage.credential.revoke
- storage.mount.attach
- storage.mount.detach
Audit rows must preserve:
- actor user or service account
- owning project
- granted principal
- bucket/prefix
- permissions
- expiration
- provider backend type
- credential session ID when provider credential material is issued
- correlation ID
Do not log raw provider credentials.
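An audit row builder can enforce the action list and required fields above and strip credential fields from free-form context. This is a sketch: build_audit_row, the extra parameter, and the rule that rotate also requires a credential session ID (on the assumption that rotation issues new material) are illustrative.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional

AUDIT_ACTIONS = frozenset({
    "storage.bucket.create", "storage.bucket.delete",
    "storage.grant.create", "storage.grant.revoke",
    "storage.credential.issue", "storage.credential.rotate", "storage.credential.revoke",
    "storage.mount.attach", "storage.mount.detach",
})
SECRET_FIELDS = frozenset({"access_key_id", "secret_access_key", "session_token"})
ISSUING_ACTIONS = {"storage.credential.issue", "storage.credential.rotate"}  # assumed


def build_audit_row(action: str, *, actor: str, owning_project: str, principal: str,
                    bucket: str, prefixes: list[str], permissions: list[str],
                    provider_backend: str, expires_at: Optional[datetime] = None,
                    credential_session_id: Optional[str] = None,
                    correlation_id: Optional[str] = None,
                    extra: Optional[dict] = None) -> dict:
    if action not in AUDIT_ACTIONS:
        raise ValueError(f"not a storage audit action: {action}")
    if action in ISSUING_ACTIONS and credential_session_id is None:
        raise ValueError("credential issuance must reference a credential session")
    # Drop raw provider credentials from free-form context.
    safe_extra = {k: v for k, v in (extra or {}).items() if k not in SECRET_FIELDS}
    return {
        "action": action,
        "actor": actor,
        "owning_project": owning_project,
        "granted_principal": principal,
        "bucket": bucket,
        "prefixes": list(prefixes),
        "permissions": list(permissions),
        "expires_at": expires_at.isoformat() if expires_at else None,
        "provider_backend": provider_backend,
        "credential_session_id": credential_session_id,
        "correlation_id": correlation_id or str(uuid.uuid4()),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "extra": safe_extra,
    }
```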
Provider credential issuance also uses IAM Token Issuer audit actions:
- successful direct provider credential issuance writes auth.provider_credential.issue and storage.credential.issue
- denied issuance writes auth.provider_credential.deny when the caller or reusable credential identity can be safely attributed
- provider-side or evidence-only revocation writes auth.provider_credential.revoke and storage.credential.revoke
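The mapping from issuance outcome to audit actions fits in a small lookup. This is a sketch: the outcome labels and the token_issuer_audit_actions name are hypothetical. Returning no actions for an unattributable denial is an assumption, since this document does not say how such denials are recorded.

```python
def token_issuer_audit_actions(outcome: str, attributable: bool = True) -> list[str]:
    """Return the audit actions written for a provider credential outcome."""
    if outcome == "issued":
        return ["auth.provider_credential.issue", "storage.credential.issue"]
    if outcome == "denied":
        # Only written when the caller or credential identity can be safely attributed.
        return ["auth.provider_credential.deny"] if attributable else []
    if outcome == "revoked":  # provider-side or evidence-only revocation
        return ["auth.provider_credential.revoke", "storage.credential.revoke"]
    raise ValueError(f"unknown outcome: {outcome}")
```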
V3 UI Implications
Storage workbench needs to show:
- owned buckets
- shared-with-this-project buckets
- bucket owner project
- grants and audiences
- attached workloads
- credential posture, without revealing secrets
- provider capability hints
Launch flows need inline bucket selection across:
- owned project buckets
- buckets shared with the active project
- datasets/artifacts that are read-only
- output/checkpoint buckets that are writable
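The launch picker's grouping can be derived from ownership and the permissions the active project holds. This is a sketch that assumes the read model supplies bucket_id, owner_project_id, and permissions per visible bucket. launch_options and the owned/shared and read_only/read_write labels are hypothetical.

```python
def launch_options(active_project: str, visible_buckets: list[dict]) -> list[dict]:
    """Group buckets for inline selection: owned vs shared, read-only vs writable."""
    options = []
    for b in visible_buckets:
        perms = set(b["permissions"])
        if "read" not in perms:
            continue  # nothing the launch could read or mount
        options.append({
            "bucket_id": b["bucket_id"],
            "owner_project_id": b["owner_project_id"],
            "source": "owned" if b["owner_project_id"] == active_project else "shared",
            "mode": "read_write" if "write" in perms else "read_only",
        })
    # Owned buckets first, then shared; alphabetical within each group.
    return sorted(options, key=lambda o: (o["source"] != "owned", o["bucket_id"]))
```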
Direct S3 credential issuance should be a deliberate action with visible TTL and scope summary.
Non-Goals For First Slice
- global user home buckets outside projects
- public buckets
- anonymous S3 access
- broad tenant-wide write grants
- exposing WEKA admin concepts directly in the UI
- making WEKA the source of truth for GPUaaS IAM
First Implementation Slice
- Project-owned buckets/namespaces.
- Prefix-capable grants between projects.
- Workload/app service-account credentials for mounts and S3 access.
- Short-lived user credentials for direct S3 client access.
- WEKA provider adapter hidden behind storage interfaces.
- v3 read models for owner, shared audiences, mounts, flags, and provider capability hints.