
Python SDK — gpuaas-sdk

Implemented

Source: sdk/python/ · package gpuaas_sdk · doc/architecture/CLI_PythonSDK_v1_Plan.md

A strictly typed Python client (checked with mypy --strict) built on httpx. It shares the CLI's authentication paths, OpenAPI contract, and error envelope.

Install

pip install gpuaas-sdk
# or from the repo
cd sdk/python && pip install -e ".[dev]"   # quotes keep zsh from expanding the brackets

Requires Python 3.10+. Single runtime dependency: httpx.

Module map

mindmap
  root((gpuaas_sdk))
    client.py
      Client class
      sync + async
      retries
      idempotency keys
      pluggable auth
    auth.py
      OIDC PKCE
      dev-login
      service-account-token
      refresh
    catalog.py
      list_skus
      get_sku
    allocations.py
      list / get
      create / release / connect
      terminal_token
    apps_catalog.py
      list / get manifests
    apps_instances.py
      create / start / stop / release
      members
      events
    apps_artifacts.py
      list / promote
    apps_entitlements.py
      list / grant
    billing.py
      balance
      usage
      payment_sessions
      refunds
    storage.py
      list / upload / download
      rename / delete
      mkdir
    iam.py
      tenants / projects
      memberships
    service_accounts.py
      create / rotate / revoke
    nodes.py
      list / get (admin)
    projects.py
      list / create / memberships
    ops.py
      overview (admin)
    errors.py
      typed exceptions per error_code

Authentication

from pathlib import Path

from gpuaas_sdk import Client
from gpuaas_sdk.auth import BrowserOIDCAuth, DevPasswordAuth, ServiceAccountAuth

# Browser PKCE (primary for humans)
client = Client(
    base_url="https://api.gpuaas.example.com",
    auth=BrowserOIDCAuth(tenant_hint="acme"),
)

# Dev-only password flow
client = Client(
    base_url="http://localhost:8080",
    auth=DevPasswordAuth(username="dev-user", password="dev123"),
)

# Automation: service-account signing key
client = Client(
    base_url="https://api.gpuaas.example.com",
    auth=ServiceAccountAuth(
        sa_id="sa-1234",
        key_id="key-2024-01",
        # read_text() closes the file, unlike a bare open(...).read()
        signing_key_pem=Path("/etc/gpuaas/sa.key").read_text(),
    ),
)

Auth handlers transparently:

  • Refresh the access token when it is within 60 s of expiry
  • Persist the rotated refresh token returned by each refresh
  • Surface token_expired / token_invalid API errors as typed exceptions (TokenExpired, TokenInvalid)

Auth refresh flow

sequenceDiagram
    autonumber
    participant App as Your code
    participant SDK as gpuaas_sdk.Client
    participant AUTH as auth handler
    participant API as cmd/api

    App->>SDK: client.allocations.list()
    SDK->>AUTH: get_access_token()
    AUTH->>AUTH: check exp - now < 60s?
    alt token near expiry
        AUTH->>API: POST /auth/token/refresh
        API-->>AUTH: new tokens (rotated refresh)
        AUTH->>AUTH: persist
    end
    AUTH-->>SDK: access_token
    SDK->>API: GET /allocations Authorization: Bearer
    API-->>SDK: response
    SDK-->>App: typed model: list[Allocation]
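The "refresh if expiring within 60 s" step in the diagram can be sketched as below. This is a minimal illustration, not the gpuaas_sdk auth handler: it assumes tokens are held as an access/refresh pair with an absolute expiry time, and `TokenPair`, `RefreshingTokenSource`, and the injected `refresh` and `persist` callables (standing in for `POST /auth/token/refresh` and credential storage) are hypothetical names.

```python
import time
from dataclasses import dataclass
from typing import Callable

REFRESH_SKEW_SECONDS = 60  # refresh when the access token has less than 60 s left


@dataclass
class TokenPair:
    access_token: str
    refresh_token: str
    expires_at: float  # absolute UNIX timestamp of access-token expiry


class RefreshingTokenSource:
    """Hypothetical sketch of the refresh check; not the SDK's real handler."""

    def __init__(
        self,
        tokens: TokenPair,
        refresh: Callable[[str], TokenPair],  # stand-in for POST /auth/token/refresh
        persist: Callable[[TokenPair], None],  # stand-in for credential storage
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._tokens = tokens
        self._refresh = refresh
        self._persist = persist
        self._clock = clock

    def get_access_token(self) -> str:
        if self._tokens.expires_at - self._clock() < REFRESH_SKEW_SECONDS:
            # The server rotates the refresh token, so the whole pair is replaced
            # and persisted; the old refresh token must not be reused.
            new_tokens = self._refresh(self._tokens.refresh_token)
            self._persist(new_tokens)
            self._tokens = new_tokens
        return self._tokens.access_token
```

Because the refresh token rotates, a production handler would also need a lock so that two concurrent calls do not both try to refresh with the same, already-consumed refresh token.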

Quick examples

Provision and wait for active

from gpuaas_sdk import Client

client = Client.from_default_credentials()

alloc = client.allocations.create(
    sku="h200-sxm-slice",
    gpus_total=1,
    region_code="us-buffalo-1",
    ssh_key_ids=["7f2e..."],
    idempotency_key="my-run-001",
)
print(f"allocation {alloc.id} status={alloc.status}")

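# poll until active or failed (terminal)
final = client.allocations.wait_for(alloc.id, states={"active", "failed"}, timeout=600)
if final.status == "failed":
    raise RuntimeError(f"allocation {alloc.id} failed to provision")
print(f"final state: {final.status}, private_ip={final.connection.private_ip}")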
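This page does not describe how wait_for polls. A generic poll loop with a deadline and capped exponential backoff looks roughly like the sketch below; `poll_until` and its parameters are hypothetical and only assume that the fetched object exposes a `status` attribute, as the allocation model above does.

```python
import time
from typing import Callable, Protocol, TypeVar


class HasStatus(Protocol):
    status: str


T = TypeVar("T", bound=HasStatus)


def poll_until(
    fetch: Callable[[], T],
    states: set[str],
    timeout: float,
    interval: float = 2.0,
    max_interval: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Call fetch() until its status is in `states` or `timeout` seconds pass."""
    deadline = clock() + timeout
    delay = interval
    while True:
        obj = fetch()
        if obj.status in states:
            return obj
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(f"status {obj.status!r} not in {sorted(states)} after {timeout}s")
        # Never sleep past the deadline; double the delay up to max_interval.
        sleep(min(delay, remaining))
        delay = min(delay * 2, max_interval)
```

Including a terminal failure state such as "failed" in `states`, as the example above does, stops the loop early instead of waiting out the full timeout.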

Mint a terminal token + connect

tok = client.allocations.mint_terminal_token(alloc.id)
print(f"ws_url={tok.ws_url}  expires_in={tok.expires_in}s")
# Then open the WS yourself, or use the connect helper:
# client.allocations.connect(alloc.id, mode="terminal")  # opens browser

Streaming usage and balance

balance = client.billing.balance()
# amount_minor is in minor units (e.g. cents); dividing by 100 assumes a two-decimal currency
print(f"balance: {balance.amount_minor / 100:.2f} {balance.currency}")

for record in client.billing.usage(since="2026-04-01T00:00:00Z"):
    print(record.interval_start, record.allocation_id, record.cost_minor)
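Dividing by 100 with float arithmetic is fine for display in two-decimal currencies, but it is wrong for currencies such as JPY (no minor unit) and can accumulate rounding error when summing. A minimal sketch that stays in integers and Decimal, assuming `currency` is an ISO 4217 code and that usage records expose `allocation_id` and `cost_minor` as in the loop above; `format_minor`, `cost_by_allocation`, and the partial exponent table are illustrative helpers, not part of gpuaas_sdk.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Iterable, Protocol

# ISO 4217 minor-unit exponents for a few currencies (partial, illustrative table)
MINOR_UNIT_EXPONENT = {"USD": 2, "EUR": 2, "JPY": 0, "KWD": 3}


def format_minor(amount_minor: int, currency: str) -> str:
    """Format an integer minor-unit amount without going through float."""
    exp = MINOR_UNIT_EXPONENT[currency]  # KeyError for currencies not in the table
    value = Decimal(amount_minor).scaleb(-exp)
    return f"{value:.{exp}f} {currency}"


class UsageRecord(Protocol):
    allocation_id: str
    cost_minor: int


def cost_by_allocation(records: Iterable[UsageRecord]) -> dict[str, int]:
    """Sum usage cost per allocation, staying in integer minor units."""
    totals: defaultdict[str, int] = defaultdict(int)
    for record in records:
        totals[record.allocation_id] += record.cost_minor
    return dict(totals)
```

With a client in hand, usage would look like `format_minor(balance.amount_minor, balance.currency)` and `cost_by_allocation(client.billing.usage(since="2026-04-01T00:00:00Z"))`; only convert to a decimal string at the final display step.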

App-instance lifecycle

inst = client.apps.instances.create(
    app_slug="jupyter-cuda-dev",
    project_id=client.context.project_id,
    target_allocation_id=alloc.id,
)
client.apps.instances.wait_for(inst.id, states={"running", "failed"})

# add a worker member (e.g. a Slurm worker);
# alloc2 is a second allocation, created the same way as alloc above
client.apps.instances.members.add(
    instance_id=inst.id,
    allocation_id=alloc2.id,
    role="worker",
)

# stream events
for ev in client.apps.instances.events(inst.id, follow=True):
    print(ev.event_type, ev.payload)

Storage CRUD

client.storage.mkdir(path="/datasets/imagenet/")
client.storage.upload(local_path="train.parquet", remote_path="/datasets/imagenet/train.parquet")
for obj in client.storage.list(prefix="/datasets/"):
    print(obj.path, obj.size_bytes, obj.modified_at)
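upload takes one local file and one remote path. To mirror a whole local directory you can compute the file pairs first and then upload them one by one. The sketch below assumes only the `upload(local_path=..., remote_path=...)` call shown above; `plan_uploads` is a hypothetical helper, and whether upload creates intermediate remote directories is not stated here, so you may need `client.storage.mkdir` for nested prefixes.

```python
from pathlib import Path, PurePosixPath


def plan_uploads(local_root: Path, remote_prefix: str) -> list[tuple[str, str]]:
    """Pair every file under local_root with a remote path under remote_prefix."""
    prefix = PurePosixPath(remote_prefix)
    pairs: list[tuple[str, str]] = []
    for path in sorted(local_root.rglob("*")):
        if path.is_file():
            # Remote paths always use "/" separators, even on Windows.
            rel = path.relative_to(local_root).as_posix()
            pairs.append((str(path), str(prefix / rel)))
    return pairs


# Hypothetical usage with an SDK client:
# for local, remote in plan_uploads(Path("imagenet"), "/datasets/imagenet/"):
#     client.storage.upload(local_path=local, remote_path=remote)
```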

Error handling

The SDK maps the API error catalog to typed exceptions:

flowchart LR
    API[API error_code] --> EX{Map to exception}
    EX --> E1[TokenExpired<br/>TokenInvalid<br/>InsufficientPermissions]
    EX --> E2[ValidationError<br/>InvalidRequest]
    EX --> E3[AllocationNotFound<br/>AllocationNotActive<br/>SKUUnavailable]
    EX --> E4[InsufficientBalance<br/>RefundWindowExceeded]
    EX --> E5[RateLimitExceeded]
    EX --> E6[UpstreamError<br/>ServiceUnavailable<br/>InternalError]

    classDef tx fill:#f8d7da,stroke:#42101e
    class E1,E2,E3,E4,E5,E6 tx

from gpuaas_sdk.errors import (
    SDKError, TokenExpired, AllocationNotFound, RateLimitExceeded,
    InsufficientBalance,
)

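try:
    client.allocations.release(alloc.id)
except AllocationNotFound:
    print("already gone")
except TokenExpired:
    print("session expired and could not be refreshed; log in again")
except InsufficientBalance as e:
    print(f"balance too low: {e.message}")
except RateLimitExceeded as e:
    print(f"slow down; retry after {e.retry_after}s")
except SDKError as e:
    # last-resort fallback; every SDKError carries e.code, e.message, e.correlation_id, e.details
    print(f"unexpected {e.code} (correlation_id={e.correlation_id})")
    raise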

Every exception carries code, message, correlation_id, and details: the full API error envelope.

Async usage

import asyncio
from gpuaas_sdk import AsyncClient

async def main():
    async with AsyncClient.from_default_credentials() as client:
        allocs = await client.allocations.list()
        results = await asyncio.gather(*(client.allocations.get(a.id) for a in allocs))
        for r in results:
            print(r.id, r.status)

asyncio.run(main())

AsyncClient mirrors Client's module shape: every synchronous method has an async def counterpart.
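The asyncio.gather call above starts one request per allocation at the same moment, which can trip the API rate limit for large lists. A common way to cap in-flight requests is an asyncio.Semaphore; `gather_bounded` is a hypothetical helper, not part of gpuaas_sdk, and the limit of 8 is an arbitrary default.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def gather_bounded(
    fn: Callable[[T], Awaitable[R]],
    items: Iterable[T],
    limit: int = 8,
) -> list[R]:
    """Like asyncio.gather over fn(item), but with at most `limit` calls in flight."""
    sem = asyncio.Semaphore(limit)

    async def run(item: T) -> R:
        async with sem:  # waits here while `limit` calls are already running
            return await fn(item)

    # gather preserves input order in its result list
    return list(await asyncio.gather(*(run(item) for item in items)))
```

Inside `main()` above, this would replace the unbounded gather with something like `await gather_bounded(lambda a: client.allocations.get(a.id), allocs)`.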

Testing your code

The SDK ships pytest fixtures, built on respx, for testing code that uses it:

import pytest
from gpuaas_sdk.testing import mock_client

def test_provisioning(mock_client):
    mock_client.allocations.fake_create("alloc-1", status="active")
    result = mock_client.allocations.create(sku="h200-sxm-slice", gpus_total=1)
    assert result.id == "alloc-1"
    assert result.status == "active"

The SDK's own tests live in sdk/python/tests/: test_allocations.py, test_apps.py, test_auth.py, test_billing.py, test_catalog.py, test_errors.py, test_iam.py, test_service_accounts.py, and test_storage.py.

Where to look next