
Storage Sharing and IAM Model v1

Date: 2026-04-27

Purpose

Define the storage ownership, sharing, and IAM model before integrating WEKA as the first production storage provider. The model must support project-scoped storage by default while allowing controlled sharing of datasets, checkpoints, model artifacts, and workspaces across projects.

This document is provider-neutral. WEKA is the first backend, but the product model should also work for VAST or S3-compatible providers later.

Concrete user, workload, CLI, S3-client, sharing, and revocation scenarios are documented in doc/architecture/Storage_IAM_User_Flows_v1.md.

Lessons from the earlier Scality-oriented IAM project are captured in doc/architecture/Storage_IAM_External_Reference_Lessons_v1.md. The key carry-forwards are to keep GPUaaS IAM as the source of truth, compile grants into provider policy, and avoid creating long-lived provider users for every human.

WEKA-specific feasibility, limits, and validation items are tracked in doc/architecture/Storage_WEKA_Capability_Assessment_v1.md.

Important WEKA constraint: the integration is dual-protocol. WEKAFS/POSIX is the primary high-performance workload data path for training, notebooks, Kubernetes PV/PVC, and apps that need filesystem semantics. S3/STS should be enabled and validated for bucket/object workflows, SDK access, external clients, and apps that expect S3 semantics.

Core Decision

Storage ownership is project-first:

tenant
  project
    bucket / namespace
      shared/
      users/<user-id>/
      workloads/<workload-id>/
      datasets/<dataset-id>/
      checkpoints/<workload-id>/
      artifacts/

Users are actors inside one or more projects. A user's personal files are scoped to a project, for example Research/users/subash/ and Sandbox/users/subash/, not a global users/subash/projects/* hierarchy.
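The project-first scoping above reduces to a simple prefix rule. A minimal sketch, with a hypothetical helper name, showing that the same user resolves to independent prefixes in each project:

```python
def user_prefix(project: str, user_id: str) -> str:
    """Build the project-scoped personal prefix for a user.

    Personal files live under the owning project's namespace, never under
    a global per-user hierarchy (hypothetical helper, names assumed).
    """
    return f"{project}/users/{user_id}/"

# The same user in two projects gets two unrelated prefixes:
assert user_prefix("Research", "subash") == "Research/users/subash/"
assert user_prefix("Sandbox", "subash") == "Sandbox/users/subash/"
```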

Deferred Decision: User Home Storage

Project-owned storage and user home storage are related, but they are not the same product primitive.

The first WEKAFS implementation should focus on project storage objects that are explicitly attached to workloads and apps. Before adding persistent home directories, revisit the user-home model as a separate design decision.

Open questions:

  • whether every project member receives a separate project-scoped home directory, for example projects/{project_id}/users/{user_id}/home
  • whether a shared project home is ever appropriate, and if so whether it is a separate shared storage object rather than /home
  • whether the default workload mount should include only the launching user's home, all project member homes, or no home mount unless requested
  • what happens when a user is added to or removed from a project after a workload has already been allocated
  • whether a running multi-user workload can safely reconcile OS users, home directories, filesystem ACLs, and mount visibility without restart
  • how shared app runtimes such as notebooks, training dashboards, or inference workbenches distinguish the launching user from later collaborators
  • whether user-home storage is quota-attributed to the user, project, tenant, or a combination of project and user

Default posture until this is designed:

  • do not mount one writable project-wide home directory for every user
  • do not automatically grant newly added project members access to already running workloads
  • treat shared datasets, artifacts, checkpoints, and app data as project storage objects with explicit grants
  • treat per-user persistent home directories as a future storage/IAM feature requiring explicit UX, backend state, node-agent behavior, and audit events

This prevents the first WEKAFS integration from accidentally baking in a single-home or all-users-visible model that would be hard to unwind later.

Why Project-First

  • GPUaaS IAM, service accounts, app launches, workloads, quotas, and accounting are project-scoped.
  • A user can have different roles in different projects.
  • Datasets and model artifacts are usually project assets, not personal assets.
  • Releasing a workload or deleting a project has a clear cleanup boundary.
  • Provider policies can be compiled from project + principal + prefix.

Ownership vs Access

Every bucket or storage namespace has one owning project.

The owning project controls:

  • quota attribution
  • billing attribution
  • lifecycle policy
  • deletion authority
  • default access policy
  • provider placement and capability selection

Access can be granted outside the owning project through explicit grants.

Sharing Model

A bucket or prefix can grant access to:

  • another project
  • a user within a project
  • a GPUaaS service account
  • a workload or app instance
  • a tenant-managed shared dataset group, once that exists

Example:

owner project: Training
bucket: training:imagenet

grants:
  Training project: read/write
  Inference project: read
  Sandbox project: read until 2026-05-31
  sa_training_pipeline: read/write checkpoints/
  sa_vllm_inference: read artifacts/model/

Default sharing posture:

  • no cross-project access unless explicitly granted
  • read-only is the default cross-project grant
  • cross-project write requires project admin or tenant admin authority
  • grants may target a full bucket or a prefix
  • grants may have expiration
  • grants must be audited

Principal Mapping

Human users do not become long-lived WEKA users by default.

Human user
  -> GPUaaS auth/session
  -> GPUaaS IAM check
  -> optional short-lived provider credentials only when direct S3/client access is enabled

Workload/app runtime
  -> GPUaaS service account
  -> GPUaaS-generated WEKAFS/POSIX mount plan for filesystem workloads
  -> optional provider-derived S3 service account or STS credential for object access

Principal classes and their provider credential posture:

  • Human user: short-lived scoped STS/session credentials only when requested for direct S3 client use.
  • Workload/app instance: project-scoped service account with a GPUaaS-controlled mount plan for WEKAFS/POSIX; provider-derived credentials only when object/S3 access is enabled.
  • Project automation: explicit project service account with scoped provider credentials.
  • Platform operations: platform storage-admin credential in platform custody, never exposed to users/workloads.

Direct S3 Client Flow

Users may need to use aws s3, Python SDKs, data loaders, or other S3 clients. This requires temporary credentials, not long-lived provider users.

This flow is separate from WEKAFS/POSIX mounts. It is required when the selected storage object is exposed through S3 semantics or when a user/app needs direct object-client access.

Flow:

  1. User authenticates to GPUaaS.
  2. User requests storage credentials for a project, bucket, and optional prefix.
  3. GPUaaS checks project role and storage grants.
  4. GPUaaS asks the provider adapter to issue temporary credentials or a short-lived scoped equivalent.
  5. GPUaaS returns endpoint, access key, secret key, session token when applicable, expiration, and the allowed scope summary.
  6. GPUaaS writes an audit log entry.

Rules:

  • default TTL should be short, for example 1 hour
  • no long-lived human provider keys
  • no provider credentials stored in browser local storage
  • revocation of project membership or grant stops future issuance
  • existing temporary credentials expire naturally unless provider-side session invalidation is available
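The response in step 5 can be consumed by any S3 client. A sketch of mapping it onto boto3-style client arguments; the response field names (`endpoint`, `access_key_id`, and so on) are assumptions based on the flow description, not a fixed contract:

```python
from datetime import datetime, timedelta, timezone

def build_client_config(cred_response: dict) -> dict:
    """Map a GPUaaS credential response onto S3-client keyword arguments
    (boto3-style names; nothing is persisted, per the rules above)."""
    expires = datetime.fromisoformat(cred_response["expires_at"])
    if expires <= datetime.now(timezone.utc):
        raise ValueError("credentials expired; request a new session")
    return {
        "endpoint_url": cred_response["endpoint"],
        "aws_access_key_id": cred_response["access_key_id"],
        "aws_secret_access_key": cred_response["secret_access_key"],
        "aws_session_token": cred_response["session_token"],
    }

# Example response shape from step 5 (all values are placeholders):
resp = {
    "endpoint": "https://s3.storage.internal",
    "access_key_id": "ASIAPLACEHOLDER",
    "secret_access_key": "placeholder-secret",
    "session_token": "placeholder-token",
    "expires_at": (datetime.now(timezone.utc) + timedelta(hours=1)).isoformat(),
}
cfg = build_client_config(resp)
assert cfg["aws_session_token"] == "placeholder-token"
```

With real boto3 these kwargs would be passed as `boto3.client("s3", **cfg)`; the session token is what distinguishes a temporary credential from a long-lived key.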

Workload/App Credential Flow

Workloads should not use human credentials.

Flow:

  1. User launches a workload or app.
  2. GPUaaS authorizes the user and resolves requested storage mounts.
  3. GPUaaS creates or selects a project-scoped service account for the runtime.
  4. GPUaaS compiles bucket/prefix policy from workload intent and grants.
  5. Provider adapter creates provider-derived credentials.
  6. Credential material is stored in Vault or delivered wrapped, never as plaintext in the database.
  7. Node-agent/app controller receives sanitized mount instructions or wrapped credential delivery.
  8. On release/decommission, GPUaaS revokes or disables provider-derived access.
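The sanitization requirement in step 7 can be made checkable. A sketch, assuming a mount-plan shape and Vault reference scheme that are illustrative only:

```python
FORBIDDEN_FIELDS = {"access_key_id", "secret_access_key", "session_token"}

def is_sanitized(plan) -> bool:
    """True if no raw credential field appears anywhere in the plan;
    credentials travel only as opaque references (e.g. a Vault path)."""
    if isinstance(plan, dict):
        return all(k not in FORBIDDEN_FIELDS and is_sanitized(v)
                   for k, v in plan.items())
    if isinstance(plan, list):
        return all(is_sanitized(v) for v in plan)
    return True

# Hypothetical instructions a node-agent might receive:
mount_plan = {
    "workload_id": "wl-0042",
    "service_account": "sa_training_pipeline",
    "mounts": [{
        "backend": "weka",
        "source": "training/imagenet/checkpoints/",
        "target": "/mnt/checkpoints",
        "mode": "rw",
        "credential_ref": "vault://storage/sessions/cs-0042",  # reference, never material
    }],
}
assert is_sanitized(mount_plan)
assert not is_sanitized({"secret_access_key": "oops"})
```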

WEKA First Backend Mapping

WEKA integration should follow this mapping:

  • Owning project bucket/namespace: WEKA filesystem directory namespace or project prefix for WEKAFS/POSIX; WEKA S3 bucket when S3 is enabled.
  • GPUaaS service account: workload/app identity that receives a GPUaaS-generated mount plan; optional WEKA S3 service account for object access.
  • GPUaaS storage grant: mount authorization and read-only/read-write mode for WEKAFS/POSIX; optional WEKA IAM/bucket policy for S3.
  • Direct user S3 access: optional short-lived WEKA STS/session credentials if S3 is enabled for the selected mode.
  • Provider admin operations: platform-owned WEKA admin/API credential in Vault custody.

Provider-specific details must stay behind packages/services/storage. Read models expose only user-safe capability hints and access summaries.

Policy Compiler Inputs

The storage policy compiler needs:

  • tenant ID
  • owning project ID
  • requesting project ID
  • principal type: user, service account, workload, platform
  • principal ID
  • bucket ID or provider namespace
  • prefix list
  • operations: list, read, write, delete, mount, admin
  • expiration
  • reason / source workflow

Compiler output should be provider-specific policy material plus a user-safe summary for audit and read models.
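A sketch of such a compiler for an S3-style backend. The operation-to-action mapping and ARN form are illustrative; real output is provider-specific and lives behind the adapter.

```python
# Illustrative operation-to-action mapping; the real table is provider-specific.
OP_TO_S3 = {
    "list": ["s3:ListBucket"],
    "read": ["s3:GetObject"],
    "write": ["s3:PutObject"],
    "delete": ["s3:DeleteObject"],
}

def compile_policy(bucket: str, prefixes: list, operations: list):
    """Compile one grant into S3-style policy material plus the
    user-safe summary required for audit and read models."""
    actions = sorted({a for op in operations for a in OP_TO_S3.get(op, [])})
    resources = ([f"arn:aws:s3:::{bucket}/{p}*" for p in prefixes]
                 or [f"arn:aws:s3:::{bucket}/*"])  # no prefixes means whole bucket
    policy = {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": actions, "Resource": resources}],
    }
    summary = f"{','.join(operations)} on {bucket} prefixes {prefixes or ['*']}"
    return policy, summary

policy, summary = compile_policy("training-imagenet", ["checkpoints/"], ["read", "write"])
assert policy["Statement"][0]["Resource"] == ["arn:aws:s3:::training-imagenet/checkpoints/*"]
assert "s3:GetObject" in policy["Statement"][0]["Action"]
```

Operations without an object-store equivalent here (mount, admin) compile to nothing in this sketch; in the real compiler they map to filesystem-side mount authorization or platform-only paths.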

Contract Baseline

The first API contract slice is provider-neutral and uses GPUaaS grants as the source of truth:

  • GET /api/v1/v3/storage/{bucket_id}/grants lists user-safe grant posture.
  • POST /api/v1/v3/storage/{bucket_id}/grants creates a bucket or prefix grant.
  • DELETE /api/v1/v3/storage/{bucket_id}/grants/{grant_id} revokes a grant.
  • POST /api/v1/v3/storage/{bucket_id}/credentials issues short-lived direct S3 credentials for CLI/SDK/workload mount use after IAM checks.
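A hypothetical request body for the grant-creation endpoint, with field names taken from the storage_grants record; the exact contract fields are not fixed by this document.

```python
import json

# Hypothetical POST /api/v1/v3/storage/{bucket_id}/grants body, mirroring
# the cross-project read-until-expiry grant in the sharing example above.
grant_request = {
    "subject_kind": "project",
    "subject_id": "sandbox",
    "prefixes": ["datasets/"],
    "permissions": ["list", "read"],
    "expires_at": "2026-05-31T00:00:00Z",
}
# The body survives a JSON round-trip unchanged:
assert json.loads(json.dumps(grant_request)) == grant_request
```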

Storage grants compile into provider policy:

storage_grants
  id
  owner_project_id
  bucket_id
  prefixes[]
  subject_kind: project | user | service_account | workload | tenant_group
  subject_id
  subject_project_id
  permissions[]: list | read | write | delete | mount | admin
  provider_backend: weka | vast | s3_compatible | ...
  provider_policy_ref
  expires_at
  revoked_at

Credential sessions are operational evidence, not durable secrets:

storage_credential_sessions
  id
  org_id
  project_id
  grant_id
  subject_kind
  subject_id
  subject_project_id
  bucket_id
  client_kind: s3_cli | sdk | workload_mount
  credential_type: s3_session | provider_session | vault_wrapped_secret
  prefixes[]
  permissions[]
  provider_backend
  provider_session_ref
  provider_policy_ref
  status: active | expired | revocation_pending | revoked | failed
  issued_by_user_id or issued_by_service_account_id
  issued_at
  expires_at
  revoked_at
  revoked_by_user_id
  revoke_reason
  source_workflow_id
  idempotency_key
  audit_log_id
  metadata

Raw access keys, secret keys, session tokens, provider admin credentials, and provider policy JSON must not be stored in read models. If any credential material is persisted for workload delivery, it must be Vault-wrapped or stored through the platform secret path, never plaintext in Postgres.

The storage_credential_sessions row is the durable evidence boundary for a one-time credential response. It may store provider session references, policy references, scope summaries, issuer identity, expiry, revocation posture, and diagnostic metadata. It must not store access_key_id, secret_access_key, session_token, wrapped-token bytes, provider admin credentials, raw provider policy JSON, or any recoverable credential material.

POST /api/v1/v3/storage/{bucket_id}/credentials creates or reuses the idempotent evidence row before returning the one-time credential material. The response includes credential_session_id and a user-safe session evidence summary so operators and clients can correlate later audit/history without needing the secret-bearing response again.
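The create-or-reuse behavior can be sketched with an in-memory stand-in for the evidence table; the names and row shape are illustrative only.

```python
_sessions = {}  # idempotency_key -> evidence row (stand-in for the Postgres table)

def ensure_credential_session(idempotency_key: str, row: dict) -> dict:
    """Create or reuse the evidence row keyed by idempotency_key, before any
    secret material is returned, so a retried request correlates to the same
    credential_session_id without needing the secret-bearing response again."""
    return _sessions.setdefault(
        idempotency_key,
        {**row, "credential_session_id": f"cs-{len(_sessions) + 1}"},
    )

first = ensure_credential_session("req-abc", {"bucket_id": "b-1", "status": "active"})
retry = ensure_credential_session("req-abc", {"bucket_id": "b-1", "status": "active"})
assert first["credential_session_id"] == retry["credential_session_id"]
```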

The service adapter boundary lives under packages/services/storage:

  • ProviderAdapter.CompileGrantPolicy compiles GPUaaS grants into provider policy material.
  • ProviderAdapter.ApplyGrant and RevokeGrant reconcile provider policy.
  • ProviderAdapter.IssueCredential returns one-time short-lived credential material for direct S3/SDK/workload use.
  • ProviderAdapter.RevokeCredential is best-effort because active STS session revocation depends on provider capability.
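The adapter boundary can be expressed as a structural interface. A Python sketch with assumed signatures (the document only names the methods; parameter and return types here are guesses):

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class ProviderAdapter(Protocol):
    """Boundary under packages/services/storage; signatures are assumptions."""

    def CompileGrantPolicy(self, grant: dict) -> dict:
        """Compile a GPUaaS grant into provider policy material."""
        ...

    def ApplyGrant(self, grant: dict, policy: dict) -> str:
        """Reconcile provider policy; returns a provider_policy_ref."""
        ...

    def RevokeGrant(self, provider_policy_ref: str) -> None: ...

    def IssueCredential(self, grant: dict, ttl_seconds: int) -> dict:
        """Return one-time short-lived credential material."""
        ...

    def RevokeCredential(self, provider_session_ref: str) -> bool:
        """Best-effort; False when the provider cannot revoke live sessions."""
        ...

class NullAdapter:
    """Minimal fake demonstrating structural conformance (not a real backend)."""
    def CompileGrantPolicy(self, grant): return {}
    def ApplyGrant(self, grant, policy): return "ref-0"
    def RevokeGrant(self, provider_policy_ref): return None
    def IssueCredential(self, grant, ttl_seconds): return {}
    def RevokeCredential(self, provider_session_ref): return False  # best-effort

assert isinstance(NullAdapter(), ProviderAdapter)
```

A Protocol keeps the WEKA, VAST, and S3-compatible adapters behind one seam without forcing an inheritance hierarchy onto provider code.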

Audit Requirements

Audit every privileged storage/IAM mutation:

  • storage.bucket.create
  • storage.bucket.delete
  • storage.grant.create
  • storage.grant.revoke
  • storage.credential.issue
  • storage.credential.rotate
  • storage.credential.revoke
  • storage.mount.attach
  • storage.mount.detach

Audit rows must preserve:

  • actor user or service account
  • owning project
  • granted principal
  • bucket/prefix
  • permissions
  • expiration
  • provider backend type
  • credential session ID when provider credential material is issued
  • correlation ID

Do not log raw provider credentials.

Provider credential issuance also uses IAM Token Issuer audit actions:

  • successful direct provider credential issuance writes auth.provider_credential.issue and storage.credential.issue
  • denied issuance writes auth.provider_credential.deny when the caller or reusable credential identity can be safely attributed
  • provider-side or evidence-only revocation writes auth.provider_credential.revoke and storage.credential.revoke

V3 UI Implications

Storage workbench needs to show:

  • owned buckets
  • shared-with-this-project buckets
  • bucket owner project
  • grants and audiences
  • attached workloads
  • credential posture, without revealing secrets
  • provider capability hints

Launch flows need inline bucket selection across:

  • owned project buckets
  • buckets shared with the active project
  • datasets/artifacts that are read-only
  • output/checkpoint buckets that are writable

Direct S3 credential issuance should be a deliberate action with visible TTL and scope summary.

Non-Goals For First Slice

  • global user home buckets outside projects
  • public buckets
  • anonymous S3 access
  • broad tenant-wide write grants
  • exposing WEKA admin concepts directly in the UI
  • making WEKA the source of truth for GPUaaS IAM

First Implementation Slice

  1. Project-owned buckets/namespaces.
  2. Prefix-capable grants between projects.
  3. Workload/app service-account credentials for mounts and S3 access.
  4. Short-lived user credentials for direct S3 client access.
  5. WEKA provider adapter hidden behind storage interfaces.
  6. v3 read models for owner, shared audiences, mounts, flags, and provider capability hints.