
Storage IAM User Flows v1

Date: 2026-04-27

Purpose

Explain how GPUaaS storage access works from the user, workload, CLI, and provider perspectives. This document answers the recurring questions:

  • Does every GPUaaS user become a WEKA user?
  • Does every project use one shared service account?
  • How does a direct S3 client get access?
  • How are shared datasets and artifacts used across projects?
  • Who enforces access: GPUaaS or WEKA?

Short answer:

GPUaaS decides access intent.
GPUaaS compiles provider policy.
WEKA issues/enforces scoped provider credentials.

Human users stay in GPUaaS IAM. WEKA identities are provider-derived enforcement principals, usually service accounts or STS sessions.

Core Vocabulary

GPUaaS user: Human identity authenticated by GPUaaS/OIDC.
GPUaaS service account: Project-scoped machine identity used by workloads, apps, and automation.
WEKA principal: Provider-side S3 user, service account, or STS session used for enforcement.
Storage grant: GPUaaS-owned record of truth stating that a principal may access a bucket/prefix with specific actions.
Policy compiler: GPUaaS component that converts storage grants into WEKA IAM/bucket/session policy.
STS credentials: Temporary S3 credentials whose provider session carries or references a scoped policy.
Owning project: Project that owns quota, billing, lifecycle, deletion authority, and the default policy for a bucket.
Shared-with project: Project that has explicit read/write access to another project's bucket or prefix.

Authority Boundary

GPUaaS IAM
  - users
  - project membership
  - service accounts
  - storage grants
  - audit and policy intent

WEKA IAM / S3
  - provider principal/session
  - attached or session policy
  - S3 request enforcement
  - credential expiration

WEKA does not need to know GPUaaS users. It only needs a scoped provider principal or STS session with an enforceable policy.

Storage Hierarchy

Storage is project-owned by default:

tenant
  project
    bucket / namespace
      shared/
      users/<user-id>/
      workloads/<workload-id>/
      datasets/<dataset-id>/
      checkpoints/<workload-id>/
      artifacts/

Example:

Research/users/subash/
Research/users/priya/
Research/datasets/imagenet/
Research/artifacts/llama-3-70b/

A user has separate project-scoped personal areas:

Research/users/subash/
Sandbox/users/subash/

There is no global users/subash/projects/* ownership hierarchy.

Scenario 1: User Opens Storage In The UI

  1. User logs into GPUaaS.
  2. User selects tenant, project, and region.
  3. UI calls v3 storage read models.
  4. GPUaaS returns:
     • buckets owned by the active project
     • buckets shared with the active project
     • owner project
     • grants/audiences summary
     • attached workloads
     • provider capability hints
  5. The UI does not receive WEKA credentials.

Expected UI labels:

  • Owned by Research
  • Shared from Training
  • Read-only dataset
  • Writable checkpoint output
  • Provider: WEKA

Scenario 2: User Creates A Bucket In The UI

  1. User opens Storage → New bucket.
  2. User selects:
     • project
     • purpose: workspace, dataset, checkpoint, artifact, generic
     • quota
     • lifecycle
     • access defaults
  3. GPUaaS checks the user's project role.
  4. GPUaaS creates a project-owned bucket/namespace.
  5. Provider adapter creates the WEKA-side bucket/filesystem/prefix as needed.
  6. GPUaaS records bucket metadata and the provider reference.
  7. GPUaaS writes audit: storage.bucket.create

No human WEKA user is created.
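The create flow above can be sketched in code. Everything here is an illustrative assumption, not the actual GPUaaS API: the function signature, role names, and the FakeWekaAdapter stand-in for the provider adapter.

```python
# Sketch of the bucket-create flow; names and signatures are assumptions.
class FakeWekaAdapter:
    """Stand-in for the provider adapter that creates the WEKA-side object."""
    def create_bucket(self, name):
        return f"weka://fs/{name}"  # hypothetical provider reference

def create_bucket(user, project, role, spec, adapter, store, audit):
    # GPUaaS checks the user's project role before touching the provider.
    if role not in ("project-admin", "project-owner"):
        raise PermissionError("insufficient project role")
    bucket = {"project": project, "name": spec["name"], "purpose": spec["purpose"]}
    # Provider adapter creates the bucket/filesystem/prefix; no human WEKA user.
    bucket["provider_ref"] = adapter.create_bucket(spec["name"])
    store.append(bucket)  # GPUaaS-owned metadata record
    audit.append({"event": "storage.bucket.create",
                  "actor": user, "bucket": spec["name"]})
    return bucket

store, audit = [], []
b = create_bucket("subash", "Research", "project-admin",
                  {"name": "research", "purpose": "workspace"},
                  FakeWekaAdapter(), store, audit)
```

Note that the only provider-side artifact is the bucket reference; the human user never becomes a WEKA identity.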

Scenario 3: User Shares A Dataset With Another Project

Example:

Training project owns: training:imagenet
Inference project needs read access.

Flow:

  1. Training project admin opens the bucket detail view.
  2. Admin adds a grant:
     • target: Inference project
     • prefix: datasets/imagenet/
     • permissions: read/list
     • expiration: optional
  3. GPUaaS writes a storage grant.
  4. GPUaaS compiles WEKA policy updates for future provider credentials.
  5. GPUaaS writes audit: storage.grant.create

The shared project does not own the bucket. It only has access according to the grant.
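A grant record and the policy statement it might compile to can be sketched as follows. The record shape, field names, and compile_grant helper are assumptions for illustration, not the real GPUaaS schema; the statement format follows the standard S3 policy shape.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class StorageGrant:
    # Hypothetical grant record; field names are assumptions.
    owner_project: str
    target_project: str
    bucket: str
    prefix: str
    permissions: Tuple[str, ...]
    expires_at: Optional[str] = None
    active: bool = True

# Illustrative mapping from GPUaaS permission names to S3 actions.
ACTION_MAP = {
    "read": ["s3:GetObject"],
    "list": ["s3:ListBucket"],
    "write": ["s3:PutObject", "s3:DeleteObject"],
}

def compile_grant(grant):
    """Compile one grant into an S3-style statement for future credentials."""
    actions = sorted({a for p in grant.permissions for a in ACTION_MAP[p]})
    return {
        "Effect": "Allow",
        "Action": actions,
        "Resource": [
            f"arn:aws:s3:::{grant.bucket}",
            f"arn:aws:s3:::{grant.bucket}/{grant.prefix}*",
        ],
    }

stmt = compile_grant(StorageGrant("Training", "Inference", "training",
                                  "datasets/imagenet/", ("read", "list")))
```

Because the grant carries read/list only, the compiled statement never includes write actions, which is what keeps the shared project from mutating the dataset.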

Scenario 4: User Uses A Shared Bucket In A Launch Wizard

Example:

Inference project launches vLLM using model artifacts shared from Training.

Flow:

  1. User opens the app/compute launch wizard in the Inference project.
  2. The storage picker shows:
     • Inference-owned writable buckets
     • Training-shared read-only artifacts/datasets
  3. User selects training:artifacts/llama-3-70b/ as read-only input.
  4. GPUaaS checks that:
     • the user can launch in Inference
     • Inference has a read grant to the Training prefix
  5. GPUaaS creates or selects a workload service account.
  6. GPUaaS compiles WEKA policy:
     • read/list on the shared Training artifact prefix
     • write only to the Inference output/checkpoint prefix, if requested
  7. Provider credentials are delivered to the workload, not to the human user.
  8. Audit records both the user-initiated launch and the storage credential issuance.

Scenario 5: Workload Uses Storage

Workloads should not run with human credentials.

Flow:

  1. User launches a workload.
  2. GPUaaS creates a workload-bound service account, for example sa_jupyter_wl_123.
  3. GPUaaS attaches explicit storage grants to that runtime:
     • Research/users/subash/wl_123/* read/write
     • Research/datasets/imagenet/* read-only
  4. GPUaaS asks WEKA for provider credentials or creates a provider service account with that policy.
  5. The node-agent or app controller receives only scoped mount/S3 instructions.
  6. On workload release, GPUaaS revokes or disables provider access.

Important rule:

project-scoped service account = may exist in project
storage access = explicit bucket/prefix grants only

A project service account does not automatically see every user's data.
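The rule can be sketched as an authorization check that consults explicit grants only. The grant table shape and is_allowed helper are illustrative assumptions; the point is that project membership never appears in the decision.

```python
# Hypothetical grant table: principal -> (bucket, prefix, allowed actions).
# Project membership alone confers no data access.
GRANTS = {
    "sa_jupyter_wl_123": [
        ("research", "users/subash/wl_123/", {"read", "write"}),
        ("research", "datasets/imagenet/", {"read"}),
    ],
}

def is_allowed(principal, bucket, key, action):
    """Allow only if an explicit grant covers the bucket, prefix, and action."""
    for g_bucket, g_prefix, g_actions in GRANTS.get(principal, []):
        if bucket == g_bucket and key.startswith(g_prefix) and action in g_actions:
            return True
    return False

# Writable only inside the workload's own prefix:
ok = is_allowed("sa_jupyter_wl_123", "research",
                "users/subash/wl_123/ckpt.pt", "write")
# Another user's prefix is invisible despite same project:
denied = is_allowed("sa_jupyter_wl_123", "research",
                    "users/priya/notes.txt", "read")
```

In this sketch ok is True and denied is False: the service account exists in the Research project but sees only what its grants name.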

Scenario 6: User Uses GPUaaS CLI For Storage

Example command shape:

gpuaas storage ls --project Research research:users/subash/
gpuaas storage cp ./data.csv research:users/subash/data.csv

Flow:

  1. CLI uses GPUaaS auth token.
  2. CLI calls GPUaaS storage APIs.
  3. GPUaaS checks user IAM and storage grants.
  4. GPUaaS performs the operation through the storage service or returns a controlled transfer path.
  5. User never handles WEKA credentials directly.

This is the safest UX for common users because GPUaaS mediates the operation.

Scenario 7: User Uses Any S3 Client

Users may need aws s3, boto3, PyTorch data loaders, or other S3-compatible clients.

Example request:

gpuaas storage credentials issue \
  --project Research \
  --bucket research \
  --prefix users/subash/ \
  --mode read-write \
  --ttl 1h

Flow:

  1. User authenticates to GPUaaS.
  2. User requests temporary S3 credentials for a bucket/prefix.
  3. GPUaaS checks:
     • project membership
     • storage grants
     • requested mode
     • TTL policy
  4. GPUaaS compiles a scoped WEKA policy.
  5. GPUaaS calls the WEKA STS or provider API.
  6. WEKA issues temporary credentials whose session carries or references the policy.
  7. GPUaaS returns:
     • endpoint
     • access key
     • secret key
     • session token when applicable
     • expiration
     • allowed bucket/prefix summary
  8. User configures any S3 client with those credentials.
  9. WEKA enforces the scoped policy on every S3 request.

The user does not pass policy to the S3 client. The credentials are already bound to the provider session/policy.
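Step 4 above, compiling the scoped policy that accompanies the STS request, can be sketched like this. The helper and its mode names are assumptions; the JSON shape is the standard S3/STS session-policy format.

```python
import json

def compile_session_policy(bucket, prefix, mode):
    """Build the scoped session policy sent with the WEKA STS request.

    Illustrative sketch, not the real compiler: read-only always gets
    get/list, and read-write adds put/delete within the same prefix."""
    actions = ["s3:GetObject", "s3:ListBucket"]
    if mode == "read-write":
        actions += ["s3:PutObject", "s3:DeleteObject"]
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": actions,
            "Resource": [
                f"arn:aws:s3:::{bucket}",
                f"arn:aws:s3:::{bucket}/{prefix}*",
            ],
        }],
    })

rw = json.loads(compile_session_policy("research", "users/subash/", "read-write"))
ro = json.loads(compile_session_policy("research", "users/subash/", "read-only"))
```

The compiled policy travels with the STS call, never to the client: whatever S3 tool the user plugs the credentials into, WEKA evaluates this statement on each request.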

Scenario 8: User Uses WEKA/S3 CLI Directly

The preferred path is still through GPUaaS-issued temporary credentials.

Example:

export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export AWS_SESSION_TOKEN=...

aws --endpoint-url https://s3.example.internal \
  s3 ls s3://research/users/subash/

WEKA sees only the temporary provider session. GPUaaS knows why it was issued because it records:

credential_issuance_id
user_id
project_id
bucket
prefixes
permissions
expires_at
provider_session_id
policy_hash
correlation_id
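The issuance record above can be sketched as a single function. The field names mirror the list; the hashing scheme (SHA-256 over a canonical JSON serialization of the compiled policy) is an assumption, chosen so later drift checks can compare provider state against what was issued.

```python
import hashlib
import json

def issuance_record(user_id, project_id, bucket, prefixes, permissions,
                    expires_at, provider_session_id, correlation_id, policy):
    """Audit record GPUaaS keeps for one credential issuance (sketch)."""
    # Canonical serialization makes the hash deterministic across key order.
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":"))
    return {
        "credential_issuance_id": f"iss_{correlation_id}",  # assumed id scheme
        "user_id": user_id,
        "project_id": project_id,
        "bucket": bucket,
        "prefixes": prefixes,
        "permissions": permissions,
        "expires_at": expires_at,
        "provider_session_id": provider_session_id,
        "policy_hash": hashlib.sha256(canonical.encode()).hexdigest(),
        "correlation_id": correlation_id,
    }

rec = issuance_record("u_subash", "p_research", "research",
                      ["users/subash/"], ["read", "write"],
                      "2026-04-27T13:00:00Z", "sess_42", "corr_7",
                      {"Version": "2012-10-17", "Statement": []})
```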

Scenario 9: Project Automation Uses Storage

Project automation should use GPUaaS service accounts, not human users.

Flow:

  1. Project admin creates a service account, for example sa_training_pipeline.
  2. Admin grants:
     • read on datasets/imagenet/
     • write on checkpoints/pipeline/
  3. Automation exchanges service-account credentials for a GPUaaS token.
  4. Automation requests temporary provider credentials or uses GPUaaS storage APIs.
  5. GPUaaS logs the actor as the service account.

Scenario 10: Grant Revocation

Example:

Training revokes Inference read access to training:imagenet.

Flow:

  1. Admin revokes the storage grant.
  2. GPUaaS marks the grant inactive.
  3. GPUaaS prevents future credential issuance for that grant.
  4. GPUaaS disables or updates provider principals where practical.
  5. Existing STS credentials expire at TTL unless WEKA supports provider-side session invalidation.
  6. GPUaaS writes audit:
     • storage.grant.revoke
     • storage.credential.revoke, when provider access is actively revoked

Policy recommendation:

  • keep user direct-access STS TTL short
  • use workload-bound credentials that can be revoked on release
  • rotate provider credentials on suspicious activity
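The revocation flow reduces to a small state change plus an issuance gate. This is a minimal sketch with assumed record shapes; the real flow would also call the provider adapter to disable principals where practical.

```python
# Sketch of grant revocation; grant/audit shapes are assumptions.
def revoke_grant(grant, audit):
    """Mark the grant inactive and record the audit event."""
    grant["active"] = False
    audit.append({"event": "storage.grant.revoke", "grant_id": grant["id"]})

def can_issue_credentials(grant):
    """Issuance is refused once the backing grant is inactive; already-issued
    STS sessions still run to their TTL unless the provider can invalidate."""
    return grant["active"]

grant = {"id": "g_imagenet_inference", "active": True}
audit = []
revoke_grant(grant, audit)
```

This is why short TTLs matter: the gate stops new credentials immediately, but the TTL bounds how long old ones survive.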

Scenario 11: User Leaves Project

Flow:

  1. Tenant/project admin removes user from project.
  2. GPUaaS blocks future storage read/write/list credential issuance in that project.
  3. Any user-owned project prefixes remain project data unless lifecycle policy says otherwise.
  4. Existing direct STS credentials expire by TTL.
  5. Workload credentials are unaffected unless the workload was owned by that user and policy requires release/reassignment.

Scenario 12: Workload Is Released

Flow:

  1. Allocation/app release starts.
  2. GPUaaS detaches storage mounts.
  3. GPUaaS disables/deletes workload-bound provider principal or credential.
  4. Output/checkpoint data remains according to bucket lifecycle policy.
  5. GPUaaS writes audit:
     • storage.mount.detach
     • storage.credential.revoke

Scenario 13: Provider Drift Is Detected

Drift examples:

  • WEKA service account exists but GPUaaS grant is deleted.
  • WEKA policy hash differs from GPUaaS compiled policy.
  • Bucket exists in WEKA but not in GPUaaS.
  • GPUaaS bucket exists but provider object is missing.

Flow:

  1. The storage reconciler scans provider state.
  2. The reconciler compares provider principals and policies with GPUaaS grant truth.
  3. Drift is surfaced in v3 Storage as:
     • permission_drift
     • failed_mount
     • provider_missing
  4. Operators can reconcile from GPUaaS.
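The comparison step can be sketched with the policy-hash idea: hash both sides' compiled policies and diff. The state shapes (principal name mapped to compiled policy) are assumptions for illustration.

```python
import hashlib
import json

def policy_hash(policy):
    """Deterministic digest of a compiled policy (assumed hashing scheme)."""
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

def detect_drift(gpuaas_state, provider_state):
    """Compare GPUaaS grant truth with provider principals/policies."""
    findings = []
    for principal, policy in gpuaas_state.items():
        if principal not in provider_state:
            findings.append((principal, "provider_missing"))
        elif policy_hash(policy) != policy_hash(provider_state[principal]):
            findings.append((principal, "permission_drift"))
    return findings

gpuaas = {
    "sa_jupyter_wl_123": {"Action": ["s3:GetObject"]},
    "sa_training_pipeline": {"Action": ["s3:PutObject"]},
}
provider = {
    # Provider side gained an extra action -> permission_drift.
    "sa_jupyter_wl_123": {"Action": ["s3:GetObject", "s3:PutObject"]},
    # sa_training_pipeline absent on the provider -> provider_missing.
}
findings = detect_drift(gpuaas, provider)
```

The reverse case, a WEKA principal with no backing GPUaaS grant, would be surfaced by iterating provider_state the same way.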

Scenario 14: Platform Operator Access

Platform operators may need provider-admin operations, but those credentials are not user/runtime credentials.

Flow:

  1. Platform stores WEKA admin/API credential in platform custody.
  2. Storage adapter uses it only for approved provider-admin workflows.
  3. Provider-admin secret is never returned to frontend, workload, or user CLI.
  4. Every provider-admin action is audited.

Policy Propagation Summary

For direct S3 clients:

GPUaaS user intent
  -> GPUaaS policy compiler
  -> WEKA STS request with scoped policy
  -> temporary credentials
  -> S3 client uses credentials
  -> WEKA enforces policy

For workloads:

GPUaaS launch intent
  -> workload service account
  -> GPUaaS policy compiler
  -> WEKA service account or STS credentials
  -> node-agent/app receives scoped access
  -> WEKA enforces policy

What We Must Not Do

  • Do not create long-lived WEKA users for every GPUaaS user by default.
  • Do not use one shared project-wide provider credential for all workloads.
  • Do not give a project service account implicit access to all project data.
  • Do not expose provider admin credentials to UI, CLI, workloads, or app code.
  • Do not make WEKA the source of truth for GPUaaS IAM.
  • Do not issue direct S3 credentials without an audit record and expiration.
Related Documents

  • doc/architecture/Storage_Sharing_and_IAM_Model_v1.md
  • doc/architecture/Storage_Provider_Capability_Model_v1.md
  • doc/architecture/Service_Account_Model.md
  • doc/architecture/Platform_Access_Credential_Model_v1.md
  • doc/product/V3_Mock_To_Production_Data_Parity_v1.md