End-to-end quick start
Implemented
Go from zero to a working allocation with a terminal in under 5 minutes. The same flow is shown three ways: CLI, Python SDK, and curl.
The path
flowchart LR
A[1. Login] --> B[2. Browse SKUs]
B --> C[3. Register SSH key]
C --> D[4. Create allocation]
D --> E[5. Wait until active]
E --> F[6. Open terminal /<br/>SSH in]
F --> G[7. Release]
classDef step fill:#e8f5e9,stroke:#2e7d32
class A,B,C,D,E,F,G step
Prereqs
- A GPUaaS tenancy (your account or dev environment)
- For CLI: `gpuaas` binary: `make build-cli`, then `./bin/gpuaas`
- For SDK: `pip install gpuaas-sdk`
- For curl: `jq` recommended
- An SSH public key handy (`~/.ssh/id_ed25519.pub`)
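Before registering a key, it can help to confirm that the file really holds an OpenSSH public key rather than a private key or a truncated paste. The sketch below is only an illustration: `check_public_key` is a hypothetical helper, not part of the GPUaaS CLI or SDK. It checks that the line has the `<type> <base64> [comment]` shape and that the key type embedded in the decoded blob matches the declared type.

```python
import base64
import struct


def check_public_key(line: str) -> str:
    """Return the key type if `line` looks like an OpenSSH public key.

    Raises ValueError otherwise. Hypothetical helper for illustration only.
    """
    fields = line.split()
    if len(fields) < 2:
        raise ValueError("expected '<type> <base64> [comment]'")
    key_type, blob_b64 = fields[0], fields[1]
    try:
        # binascii.Error (raised on bad input) is a subclass of ValueError.
        blob = base64.b64decode(blob_b64, validate=True)
    except ValueError as exc:
        raise ValueError("key body is not valid base64") from exc
    if len(blob) < 4:
        raise ValueError("key body is too short")
    # The blob starts with a 4-byte big-endian length followed by the key type.
    (name_len,) = struct.unpack(">I", blob[:4])
    embedded = blob[4:4 + name_len].decode("ascii", errors="replace")
    if embedded != key_type:
        raise ValueError(f"declared type {key_type!r} does not match blob type {embedded!r}")
    return key_type
```

To check the key from the prereqs, pass it the file contents, for example `check_public_key(Path("~/.ssh/id_ed25519.pub").expanduser().read_text())` with `Path` imported from `pathlib`.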
Variant A: CLI
# 1. login (opens browser)
gpuaas auth login --tenant-hint acme
# 2. browse SKUs
gpuaas catalog list
# 3. register your SSH key (one-time)
gpuaas iam ssh-keys add \
  --title "laptop-ed25519" \
  --key "$(cat ~/.ssh/id_ed25519.pub)"
# → saves a key id, e.g. ssh-key-7f2e...
# 4. create allocation (1 GPU H200 slice); replace ssh-key-7f2e... with the key id from step 3
ALLOC=$(gpuaas allocations create \
  --sku h200-sxm-slice \
  --gpus 1 \
  --region us-buffalo-1 \
  --ssh-key-id ssh-key-7f2e... \
  --idempotency-key quickstart-001 \
  --output json | jq -r .id)
echo "allocation: $ALLOC"
# 5. wait until active
gpuaas allocations get $ALLOC --watch
# 6. open browser terminal (mints token + opens browser)
gpuaas allocations connect $ALLOC --mode terminal
# or get the SSH command
gpuaas allocations connect $ALLOC --mode ssh
# → prints something like: ssh ubuntu@10.100.0.10
# 7. release when done
gpuaas allocations release $ALLOC
Variant B: Python SDK
from pathlib import Path
from gpuaas_sdk import Client
# 1. login (browser PKCE) and persist credentials
client = Client.from_default_credentials()  # uses ~/.gpuaas/credentials.json
# or fresh login: Client(base_url=..., auth=BrowserOIDCAuth(tenant_hint="acme"))
# 2. browse SKUs
for sku in client.catalog.list():
    print(f"{sku.sku:30} gpus_total={sku.gpus_total} price/hr={sku.hourly_price_minor/100}")
# 3. register SSH key (one-time)
key = client.iam.ssh_keys.add(
    title="laptop-ed25519",
    # open() does not expand "~", so resolve the home directory explicitly
    key_body=Path("~/.ssh/id_ed25519.pub").expanduser().read_text().strip(),
)
print(f"ssh key id: {key.id}")
# 4. create allocation
alloc = client.allocations.create(
    sku="h200-sxm-slice",
    gpus_total=1,
    region_code="us-buffalo-1",
    ssh_key_ids=[key.id],
    idempotency_key="quickstart-001",
)
print(f"allocation: {alloc.id} status={alloc.status}")
# 5. wait until active
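final = client.allocations.wait_for(alloc.id, states={"active", "failed"}, timeout=600)
# wait_for also returns on "failed", so stop here instead of reading connection details
if final.status != "active":
    raise SystemExit(f"allocation ended in status={final.status}")
print(f"final: {final.status}, ip={final.connection.private_ip}")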
# 6. open browser terminal (mints token + opens browser)
client.allocations.connect(alloc.id, mode="terminal")
# or get the WS URL + token for your own client:
tok = client.allocations.mint_terminal_token(alloc.id)
print(f"ws_url={tok.ws_url} token={tok.token[:8]}... expires_in={tok.expires_in}s")
# 7. release
client.allocations.release(alloc.id)
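If the script fails anywhere between steps 4 and 7, the allocation stays up and accrual keeps running until something releases it. One way to guard against that is a small context manager, sketched here under stated assumptions: `held_allocation` is a hypothetical helper, and `create` and `release` are whatever callables you pass in, for example thin wrappers around the SDK calls shown above.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def held_allocation(
    create: Callable[[], str],
    release: Callable[[str], None],
) -> Iterator[str]:
    """Create an allocation, yield its id, and always release it on exit."""
    alloc_id = create()
    try:
        yield alloc_id
    finally:
        # Runs on normal exit and when the with-block raises.
        release(alloc_id)
```

With the SDK client from this variant, `create` could be a lambda that calls `client.allocations.create` with the step 4 arguments and returns `.id`, and `release` could be `client.allocations.release`. Steps 5 and 6 then go inside the `with` block.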
Variant C: curl
# 1. dev login (works against local stack)
TOKEN=$(curl -s -X POST http://localhost:8080/realms/gpuaas/protocol/openid-connect/token \
  -d "grant_type=password&client_id=gpuaas-api&client_secret=dev-client-secret&username=dev-user&password=dev123" \
  | jq -r .access_token)
API=http://localhost:8443/api/v1
# 2. browse SKUs
curl -sH "Authorization: Bearer $TOKEN" "$API/catalog" | jq '.items[].sku'
# 3. register SSH key
KEY_ID=$(curl -s -X POST "$API/iam/ssh-keys" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d "{\"title\":\"laptop-ed25519\",\"key_body\":\"$(cat ~/.ssh/id_ed25519.pub)\"}" \
  | jq -r .id)
echo "ssh key id: $KEY_ID"
# 4. create allocation
ALLOC=$(curl -s -X POST "$API/allocations" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -H "X-Idempotency-Key: quickstart-001" \
  -d "{
    \"sku\":\"h200-sxm-slice\",
    \"gpus_total\":1,
    \"region_code\":\"us-buffalo-1\",
    \"ssh_key_ids\":[\"$KEY_ID\"]
  }" | jq -r .id)
echo "allocation: $ALLOC"
# 5. poll until active
while true; do
  STATUS=$(curl -sH "Authorization: Bearer $TOKEN" "$API/allocations/$ALLOC" | jq -r .status)
  echo "status=$STATUS"
  [ "$STATUS" = "active" ] && break
  [ "$STATUS" = "failed" ] && { echo "alloc failed"; exit 1; }
  sleep 5
done
# 6. mint terminal token + show WS URL
curl -s -X POST "$API/allocations/$ALLOC/terminal-token" \
  -H "Authorization: Bearer $TOKEN" | jq
# 7. release
curl -s -X POST "$API/allocations/$ALLOC/release" \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Idempotency-Key: quickstart-release-001" | jq
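The polling loop in step 5 never gives up if the allocation stays in a non-terminal state. A bounded version of the same loop might look like the sketch below. `poll_until` and its parameters are illustrative names rather than an existing API, and `fetch_status` stands in for whatever call returns the allocation's `status` string. The defaults reuse the 5-second interval from the loop above and the 600-second timeout from the SDK variant.

```python
import time
from typing import Callable, Collection


def poll_until(
    fetch_status: Callable[[], str],
    terminal: Collection[str] = ("active", "failed"),
    timeout: float = 600.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call fetch_status() until it returns a terminal state or time runs out."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in terminal:
            return status
        # Give up if another full interval would overshoot the deadline.
        if clock() + interval > deadline:
            raise TimeoutError(f"still {status!r} after {timeout}s")
        sleep(interval)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting; callers can leave them at their defaults.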
What just happened (under the hood)
sequenceDiagram
autonumber
participant U as you
participant API as cmd/api
participant ORCH as orchestrator
participant DB as Postgres
participant OR as outbox-relay
participant NATS as NATS
participant PW as provisioning-worker
participant NA as node-agent
participant VM as Slice VM
participant BW as billing-worker
U->>API: POST /allocations
API->>ORCH: place
ORCH->>DB: reserve N slots + insert allocation + outbox row<br/>(one tx)
DB-->>ORCH: ok
ORCH-->>API: allocation_id
API-->>U: 201 status=requested
OR->>DB: poll outbox
OR->>NATS: provisioning.requested
NATS-->>PW: deliver
PW->>NA: slice.vm_provision (mTLS)
NA->>VM: 17 phases — virt-install + cloud-init
VM-->>NA: SSH + readiness marker
NA-->>PW: result
PW->>DB: status=active + outbox: provisioning.active
OR->>NATS: publish
NATS-->>BW: start accrual
U->>API: POST /terminal-token
API-->>U: token + ws_url
U->>API: WSS upgrade
API->>VM: relay (via terminal-gateway)
Note over U,VM: tenant works
U->>API: POST /release
API->>PW: dispatch slice.vm_release
NA->>VM: shutdown + destroy + wipe
PW->>DB: status=released
OR->>NATS: provisioning.releasing.completed
NATS-->>BW: stop accrual
Common gotchas
| Symptom | Reason | Fix |
|---|---|---|
| `sku_unavailable` on create | No node has enough free slots in your region | Different SKU, different region, or wait |
| `insufficient_balance` on create | Account balance < expected hourly cost | Top up via `gpuaas billing topup` or Stripe portal |
| Allocation stuck in `requested` for >2 min | Worker pile-up or node-agent disconnected | Check `gpuaas allocations get $ID --watch`; see Provisioning Workflow Stuck Runbook |
| Allocation `failed` immediately | Image / VFIO / readiness issue | Use `gpuaas allocations get $ID --output json` and look at `failure_reason` |
| Terminal WS closes immediately | Token expired, or the single-use token was already validated | Mint a fresh one |
| `429 rate_limit_exceeded` | Per-user RPM cap | Honor `Retry-After`; reduce concurrent calls |
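For the `429 rate_limit_exceeded` row, honoring `Retry-After` could look like the sketch below. It is transport-agnostic: `send` is a hypothetical callable that performs one request and returns `(status_code, headers, body)`, and `call_with_retry_after` is an illustrative name. Only the delta-seconds form of `Retry-After` is handled; an HTTP-date value falls back to `default_wait`.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], bytes]


def call_with_retry_after(
    send: Callable[[], Response],
    max_attempts: int = 5,
    default_wait: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Retry `send` while it returns HTTP 429, waiting as Retry-After asks."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    response = send()
    attempts = 1
    while response[0] == 429 and attempts < max_attempts:
        # Header names are case-insensitive, so normalize before the lookup.
        headers = {k.lower(): v for k, v in response[1].items()}
        raw = headers.get("retry-after", "").strip()
        sleep(float(raw) if raw.isdigit() else default_wait)
        response = send()
        attempts += 1
    return response
```

When the retried call is a create or release, sending the same `X-Idempotency-Key` on every attempt is the usual way to keep a retry from acting twice. That assumes the API deduplicates on the header, which the examples above imply but do not state.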
Where to look next
- CLI: full command coverage
- Python SDK: typed surface
- Direct REST API: patterns for any language
- App SDK: build apps that run on GPUaaS
- Slice trail: the 14-step deep dive into what happens behind `allocations create`