End-to-end quick start

Implemented

Get from zero to a working allocation with a terminal in under 5 minutes. The same flow is shown in three variants: CLI, Python SDK, and curl.

The path

flowchart LR
    A[1. Login] --> B[2. Browse SKUs]
    B --> C[3. Register SSH key]
    C --> D[4. Create allocation]
    D --> E[5. Wait until active]
    E --> F[6. Open terminal /<br/>SSH in]
    F --> G[7. Release]

    classDef step fill:#e8f5e9,stroke:#2e7d32
    class A,B,C,D,E,F,G step

Prereqs

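A GPUaaS tenancy (your own account or a dev environment)
For CLI: the gpuaas binary (run make build-cli, then invoke ./bin/gpuaas)
For SDK: pip install gpuaas-sdk
For curl: jq, which the curl examples (and step 4 of the CLI variant) use to parse JSON responses
An SSH public key on hand (the examples use ~/.ssh/id_ed25519.pub)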

Variant A — CLI

# 1. login (opens browser)
gpuaas auth login --tenant-hint acme

# 2. browse SKUs
gpuaas catalog list

# 3. register your SSH key (one-time)
gpuaas iam ssh-keys add \
  --title "laptop-ed25519" \
  --key "$(cat ~/.ssh/id_ed25519.pub)"
# → saves a key id, e.g. ssh-key-7f2e...

# 4. create allocation (one-GPU H200 slice)
#    replace the --ssh-key-id value below with the key id printed in step 3
ALLOC=$(gpuaas allocations create \
  --sku h200-sxm-slice \
  --gpus 1 \
  --region us-buffalo-1 \
  --ssh-key-id ssh-key-7f2e... \
  --idempotency-key quickstart-001 \
  --output json | jq -r .id)
echo "allocation: $ALLOC"

# 5. wait until active
gpuaas allocations get $ALLOC --watch

# 6. open browser terminal (mints token + opens browser)
gpuaas allocations connect $ALLOC --mode terminal

#    or get the SSH command
gpuaas allocations connect $ALLOC --mode ssh
# → prints something like: ssh ubuntu@10.100.0.10

# 7. release when done
gpuaas allocations release $ALLOC

Variant B — Python SDK

from pathlib import Path

from gpuaas_sdk import Client

# 1. login (browser PKCE) and persist credentials
client = Client.from_default_credentials()  # uses ~/.gpuaas/credentials.json
# or fresh login: Client(base_url=..., auth=BrowserOIDCAuth(tenant_hint="acme"))

# 2. browse SKUs
for sku in client.catalog.list():
    print(f"{sku.sku:30} gpus_total={sku.gpus_total} price/hr={sku.hourly_price_minor/100}")

# 3. register SSH key (one-time)
# open() does not expand "~", so resolve the home directory explicitly
key = client.iam.ssh_keys.add(
    title="laptop-ed25519",
    key_body=Path("~/.ssh/id_ed25519.pub").expanduser().read_text().strip(),
)
print(f"ssh key id: {key.id}")

# 4. create allocation
alloc = client.allocations.create(
    sku="h200-sxm-slice",
    gpus_total=1,
    region_code="us-buffalo-1",
    ssh_key_ids=[key.id],
    idempotency_key="quickstart-001",
)
print(f"allocation: {alloc.id} status={alloc.status}")

# 5. wait until active
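final = client.allocations.wait_for(alloc.id, states={"active", "failed"}, timeout=600)
if final.status != "active":
    # wait_for also returns on "failed"; stop before reading connection details
    raise SystemExit(f"allocation ended in status={final.status}")
print(f"final: {final.status}, ip={final.connection.private_ip}")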

# 6. open browser terminal (mints token + opens browser)
client.allocations.connect(alloc.id, mode="terminal")
#    or get the WS URL + token for your own client:
tok = client.allocations.mint_terminal_token(alloc.id)
print(f"ws_url={tok.ws_url}  token={tok.token[:8]}...  expires_in={tok.expires_in}s")

# 7. release
client.allocations.release(alloc.id)
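
If a script raises between steps 4 and 7, the allocation is never released, and per the sequence diagram on this page accrual only stops after release. A minimal sketch of a guard, assuming only the `client.allocations.create` and `client.allocations.release` calls shown above; `held_allocation` is a hypothetical helper and not part of gpuaas-sdk:

```python
from contextlib import contextmanager


@contextmanager
def held_allocation(client, **create_kwargs):
    """Create an allocation and release it on exit, even if the body raises.

    `client` is any object exposing allocations.create(...) and
    allocations.release(id), as used in the variant above.
    This helper is illustrative; it is not shipped with the SDK.
    """
    alloc = client.allocations.create(**create_kwargs)
    try:
        yield alloc
    finally:
        # Runs on normal exit and on exceptions alike.
        client.allocations.release(alloc.id)
```

Usage would wrap steps 5 and 6 in `with held_allocation(client, sku="h200-sxm-slice", gpus_total=1, region_code="us-buffalo-1", ssh_key_ids=[key.id], idempotency_key="quickstart-001") as alloc:`, so step 7 happens automatically when the block ends.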

Variant C — curl

# 1. dev login (works against local stack)
TOKEN=$(curl -s -X POST http://localhost:8080/realms/gpuaas/protocol/openid-connect/token \
  -d "grant_type=password&client_id=gpuaas-api&client_secret=dev-client-secret&username=dev-user&password=dev123" \
  | jq -r .access_token)

API=http://localhost:8443/api/v1

# 2. browse SKUs
curl -sH "Authorization: Bearer $TOKEN" $API/catalog | jq '.items[].sku'

# 3. register SSH key
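# build the JSON body with jq so quotes or backslashes in the key comment stay valid JSON
KEY_ID=$(jq -n --arg title "laptop-ed25519" --arg key "$(cat ~/.ssh/id_ed25519.pub)" \
    '{title: $title, key_body: $key}' \
  | curl -s -X POST "$API/iam/ssh-keys" \
      -H "Authorization: Bearer $TOKEN" \
      -H "Content-Type: application/json" \
      -d @- \
  | jq -r .id)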
echo "ssh key id: $KEY_ID"

# 4. create allocation
ALLOC=$(curl -s -X POST $API/allocations \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -H "X-Idempotency-Key: quickstart-001" \
  -d "{
    \"sku\":\"h200-sxm-slice\",
    \"gpus_total\":1,
    \"region_code\":\"us-buffalo-1\",
    \"ssh_key_ids\":[\"$KEY_ID\"]
  }" | jq -r .id)
echo "allocation: $ALLOC"

# 5. poll until active
while true; do
  STATUS=$(curl -sH "Authorization: Bearer $TOKEN" "$API/allocations/$ALLOC" | jq -r .status)
  echo "status=$STATUS"
  [ "$STATUS" = "active" ] && break
  [ "$STATUS" = "failed" ] && { echo "alloc failed"; exit 1; }
  sleep 5
done

# 6. mint terminal token + show WS URL
curl -s -X POST "$API/allocations/$ALLOC/terminal-token" \
  -H "Authorization: Bearer $TOKEN" | jq

# 7. release
curl -s -X POST "$API/allocations/$ALLOC/release" \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Idempotency-Key: quickstart-release-001" | jq
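
The loop in step 5 never gives up, so an allocation that stays in `requested` would be polled forever. A minimal sketch of the same loop with a deadline, written in Python to match Variant B; `wait_for_status` and its `fetch_status` callback are hypothetical names, and the terminal states are the two the loop above checks for:

```python
import time


def wait_for_status(fetch_status, terminal=("active", "failed"),
                    timeout=600.0, interval=5.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until it returns a terminal state or time runs out.

    fetch_status is any zero-argument callable returning the allocation's
    status string, e.g. a wrapper around GET $API/allocations/$ALLOC.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in terminal:
            return status
        # Give up if the next poll would land past the deadline.
        if clock() + interval > deadline:
            raise TimeoutError(f"still {status!r} after {timeout}s")
        sleep(interval)
```

The caller still has to decide what to do with `"failed"`, exactly as the shell loop does.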

What just happened (under the hood)

sequenceDiagram
    autonumber
    participant U as you
    participant API as cmd/api
    participant ORCH as orchestrator
    participant DB as Postgres
    participant OR as outbox-relay
    participant NATS as NATS
    participant PW as provisioning-worker
    participant NA as node-agent
    participant VM as Slice VM
    participant BW as billing-worker

    U->>API: POST /allocations
    API->>ORCH: place
    ORCH->>DB: reserve N slots + insert allocation + outbox row<br/>(one tx)
    DB-->>ORCH: ok
    ORCH-->>API: allocation_id
    API-->>U: 201 status=requested

    OR->>DB: poll outbox
    OR->>NATS: provisioning.requested
    NATS-->>PW: deliver
    PW->>NA: slice.vm_provision (mTLS)
    NA->>VM: 17 phases — virt-install + cloud-init
    VM-->>NA: SSH + readiness marker
    NA-->>PW: result
    PW->>DB: status=active + outbox: provisioning.active
    OR->>NATS: publish
    NATS-->>BW: start accrual

    U->>API: POST /terminal-token
    API-->>U: token + ws_url
    U->>API: WSS upgrade
    API->>VM: relay (via terminal-gateway)

    Note over U,VM: tenant works

    U->>API: POST /release
    API->>PW: dispatch slice.vm_release
    NA->>VM: shutdown + destroy + wipe
    PW->>DB: status=released
    OR->>NATS: provisioning.releasing.completed
    NATS-->>BW: stop accrual
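
The "insert allocation + outbox row (one tx)" step and the outbox-relay steps in the diagram follow the transactional outbox pattern: the state change and its event are committed together, and a separate relay publishes the event afterwards. A minimal sketch of that idea, using in-memory SQLite in place of Postgres and a plain callback in place of NATS; the table and column names are illustrative assumptions, not the service's schema:

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE allocations (id TEXT PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE outbox (
        seq INTEGER PRIMARY KEY AUTOINCREMENT,
        subject TEXT NOT NULL,
        payload TEXT NOT NULL,
        published INTEGER NOT NULL DEFAULT 0
    );
""")


def create_allocation(conn, alloc_id):
    # One transaction: either both rows are committed or neither is.
    with conn:
        conn.execute("INSERT INTO allocations VALUES (?, 'requested')", (alloc_id,))
        conn.execute(
            "INSERT INTO outbox (subject, payload) VALUES (?, ?)",
            ("provisioning.requested", json.dumps({"allocation_id": alloc_id})),
        )


def relay_once(conn, publish):
    """Publish every unpublished outbox row in order, then mark it published."""
    rows = conn.execute(
        "SELECT seq, subject, payload FROM outbox WHERE published = 0 ORDER BY seq"
    ).fetchall()
    for seq, subject, payload in rows:
        publish(subject, json.loads(payload))  # a crash after this line re-publishes later
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE seq = ?", (seq,))
    return len(rows)
```

Because this sketch marks a row only after publishing it, delivery is at-least-once, so a consumer built this way has to tolerate duplicate events.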

Common gotchas

Symptom	Reason	Fix
`sku_unavailable` on create	No node has enough free slots in your region	Try a different SKU or region, or wait for capacity
`insufficient_balance` on create	Account balance is below the expected hourly cost	Top up via `gpuaas billing topup` or the Stripe portal
Allocation stuck in `requested` for >2 min	Provisioning-worker backlog or a disconnected node-agent	Check `gpuaas allocations get $ALLOC --watch`; see the Provisioning Workflow Stuck Runbook
Allocation `failed` immediately	Image / VFIO / readiness issue	Run `gpuaas allocations get $ALLOC --output json` and look at `failure_reason`
Terminal WS closes immediately	Token expired, or the single-use token was already consumed	Mint a fresh one
`429 rate_limit_exceeded`	Per-user requests-per-minute (RPM) cap exceeded	Honor `Retry-After`; reduce concurrent calls
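
Honoring `Retry-After` on a 429 can be sketched as a small helper that turns the header into a sleep duration. In HTTP the header may carry either a number of seconds or an HTTP date; the exponential fallback, its defaults, and the `retry_delay` name are assumptions for illustration, not documented GPUaaS behavior:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay(retry_after, attempt, now=None, base=1.0, cap=60.0):
    """Return the number of seconds to wait before retrying a 429 response.

    retry_after is the raw Retry-After header value (or None if absent);
    attempt is the zero-based retry count used for the fallback backoff.
    """
    if retry_after:
        value = retry_after.strip()
        if value.isdigit():
            return float(value)  # delta-seconds form
        try:
            when = parsedate_to_datetime(value)  # HTTP-date form
        except (TypeError, ValueError):
            when = None
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            now = now or datetime.now(timezone.utc)
            return max(0.0, (when - now).total_seconds())
    # No usable header: capped exponential backoff.
    return min(cap, base * (2 ** attempt))
```

A retry loop would call `time.sleep(retry_delay(response_headers.get("Retry-After"), attempt))` before resending the request, reusing the same idempotency key.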

Where to look next

CLI — full command coverage
Python SDK — typed surface
Direct REST API — patterns for any language
App SDK — build apps that run on GPUaaS
Slice trail — the 14-step deep dive into what happens behind allocations create