
Direct REST API

Contract

Source: doc/api/openapi.draft.yaml (33,132 lines) · doc/api/asyncapi.draft.yaml (2,296 lines) · packages/shared/errors

If you don't have a language SDK, call the REST API directly. It uses the same authentication model and the same error envelope. This page shows the minimum patterns.

Authentication overview

flowchart TB
    Q{Auth mode}
    Q --> A1[Human user — browser OIDC PKCE]
    Q --> A2[Dev / local — password grant]
    Q --> A3[Automation — service account signed assertion]

    A1 --> P1[Best for users + their own scripts]
    A2 --> P2[Local dev only; never production]
    A3 --> P3[CI / agents / cron]

    classDef ok fill:#d1e7dd,stroke:#0a3622
    class P1,P2,P3 ok

Dev login (fastest for testing locally)

TOKEN=$(curl -s -X POST http://localhost:8080/realms/gpuaas/protocol/openid-connect/token \
  -d "grant_type=password" \
  -d "client_id=gpuaas-api" \
  -d "client_secret=dev-client-secret" \
  -d "username=dev-user" \
  -d "password=dev123" | jq -r .access_token)

curl -sH "Authorization: Bearer $TOKEN" http://localhost:8443/api/v1/me | jq .

Dev users: dev-user / dev123 (role: user) · dev-admin / admin123 (roles: user + admin).

Browser OIDC PKCE (production human flow)

sequenceDiagram
    autonumber
    participant C as Your client
    participant LH as localhost callback
    participant BR as Browser
    participant API as cmd/api
    participant IDP as IdP

    C->>C: generate verifier + S256 challenge
    C->>LH: start ephemeral callback listener
    C->>API: GET /auth/oidc/authorize?redirect_uri&code_challenge&method=S256
    API-->>C: {authorize_url, state}
    C->>BR: open authorize_url
    BR->>IDP: user logs in
    IDP-->>BR: redirect to localhost with code+state
    BR->>LH: GET /callback?code&state
    LH->>C: capture code, validate state
    C->>API: POST /auth/oidc/exchange {code, verifier, redirect_uri}
    API-->>C: access_token + refresh_token + exp
    C->>LH: stop listener

Endpoints: GET /api/v1/auth/oidc/authorize, POST /api/v1/auth/oidc/exchange.

Service-account flow (automation)

# 1. admin pre-provisions the SA and gives you (sa_id, key_id, signing_key.pem)
# 2. assert + exchange every TTL (default 900s)

ASSERTION=$(python3 -c "
import jwt, time
payload = {
  'iss': 'sa-1234', 'sub': 'sa-1234',
  'aud': 'gpuaas-api',
  'iat': int(time.time()),
  'exp': int(time.time()) + 60,
  'kid': 'key-2024-01'
}
key = open('/etc/gpuaas/sa.key').read()
print(jwt.encode(payload, key, algorithm='RS256', headers={'kid': 'key-2024-01'}))
")

TOKEN=$(curl -s -X POST https://api.gpuaas.example.com/api/v1/auth/sa/token \
  -H 'Content-Type: application/json' \
  -d "{\"sa_id\":\"sa-1234\",\"assertion\":\"$ASSERTION\"}" \
  | jq -r .access_token)
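
Because the access token has to be re-minted every TTL, long-running automation usually wraps the two steps above in a small cache. The sketch below assumes a hypothetical `mint` callable that performs the assert + exchange and returns the token together with its lifetime in seconds; how the real response reports the lifetime is not shown on this page, so treat that part as an assumption.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Caches an access token and re-mints it shortly before it expires."""

    def __init__(
        self,
        mint: Callable[[], Tuple[str, float]],  # hypothetical: returns (token, ttl_seconds)
        skew: float = 30.0,                     # refresh this many seconds early
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._mint = mint
        self._skew = skew
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return a token that is valid for at least `skew` more seconds."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._skew:
            token, ttl = self._mint()
            self._token = token
            self._expires_at = now + ttl
        return self._token
```

Call `cache.get()` before every request instead of storing `TOKEN` once; with the default 900s TTL and a 30s skew, a new assertion is signed roughly every 870 seconds.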

Request shape

# Standard authenticated GET
curl -s "https://api.gpuaas.example.com/api/v1/catalog" \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Correlation-Id: my-trace-001"

# Idempotent mutation
curl -s -X POST "https://api.gpuaas.example.com/api/v1/allocations" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -H "X-Idempotency-Key: my-run-001" \
  -d '{
    "sku": "h200-sxm-slice",
    "gpus_total": 1,
    "region_code": "us-buffalo-1",
    "ssh_key_ids": ["7f2e..."]
  }'

Required / important headers:

| Header | When | Why |
| --- | --- | --- |
| Authorization: Bearer <token> | Every authenticated request | JWT verified against cached JWKS |
| X-Correlation-Id | Optional | Carries through audit + traces + logs; useful for debugging |
| X-Idempotency-Key | All mutations | Safe retries; exception: terminal token mint (single-use) |
| Content-Type: application/json | Mutations with body | Standard |
| Stripe-Signature | Stripe webhook | Verified on raw body before parse |
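
The client-side headers in the table can be assembled in one place. The helper below is a sketch; `build_headers` and its parameter names are invented for illustration. The point it makes is that the idempotency key is chosen once per logical operation and reused on every retry of that operation.

```python
from typing import Dict, Optional


def build_headers(
    token: str,
    *,
    idempotency_key: Optional[str] = None,  # pass for mutations; reuse on retries
    correlation_id: Optional[str] = None,   # optional trace id
    json_body: bool = False,                # True when sending a JSON body
) -> Dict[str, str]:
    """Build the request headers described in the table above."""
    headers = {"Authorization": f"Bearer {token}"}
    if correlation_id is not None:
        headers["X-Correlation-Id"] = correlation_id
    if idempotency_key is not None:
        headers["X-Idempotency-Key"] = idempotency_key
    if json_body:
        headers["Content-Type"] = "application/json"
    return headers


if __name__ == "__main__":
    # Same key for the first attempt and any retry of the same allocation request.
    print(build_headers("<token>", idempotency_key="my-run-001",
                        correlation_id="my-trace-001", json_body=True))
```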

Error envelope

Every error response follows this shape:

{
  "code": "<catalog_code>",
  "message": "Human-readable text",
  "correlation_id": "uuid-of-this-request",
  "details": { /* required for validation_error */ }
}

flowchart LR
    R[REST call] --> ERR{Status code}
    ERR -- 200/201/202/204 --> OK[Success body]
    ERR -- 400/401/403/404/409/429 --> CLIENT[Client error<br/>code from catalog]
    ERR -- 500/502/503/504 --> SERVER[Server error<br/>code = internal_error<br/>or upstream_error]

    CLIENT --> CAT[Match against<br/>doc/architecture/Error_Code_Catalog.md]
    SERVER --> CAT

    classDef ok fill:#d1e7dd,stroke:#0a3622
    classDef warn fill:#fff3cd,stroke:#332701
    classDef bad fill:#f8d7da,stroke:#42101e
    class OK ok
    class CLIENT warn
    class SERVER bad

→ Full catalog: Error codes
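
On the client, the envelope maps naturally onto one exception type. The sketch below assumes only the four fields shown above; the class name `ApiError`, the helper `parse_error`, and the fallback code `"unknown"` (used when the body is not a valid envelope, for example an HTML page from a proxy) are choices made for this example.

```python
import json
from typing import Any, Dict, Optional


class ApiError(Exception):
    """Client-side view of the error envelope."""

    def __init__(self, status: int, code: str, message: str,
                 correlation_id: Optional[str] = None,
                 details: Optional[Dict[str, Any]] = None) -> None:
        super().__init__(f"{status} {code}: {message} (correlation_id={correlation_id})")
        self.status = status
        self.code = code
        self.message = message
        self.correlation_id = correlation_id
        self.details = details or {}

    @property
    def is_server_error(self) -> bool:
        return self.status >= 500


def parse_error(status: int, body: bytes) -> ApiError:
    """Turn a non-2xx response into an ApiError, tolerating malformed bodies."""
    try:
        data = json.loads(body)
    except ValueError:  # not JSON (or not valid UTF-8)
        data = {}
    if not isinstance(data, dict):
        data = {}
    return ApiError(
        status,
        code=str(data.get("code", "unknown")),  # "unknown" is a local fallback, not a catalog code
        message=str(data.get("message", "")),
        correlation_id=data.get("correlation_id"),
        details=data.get("details"),
    )
```

Branch on `err.code` rather than on the message text, and log `err.correlation_id` so the request can be matched against platform traces.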

Rate limit headers

429 responses include:

| Header | Meaning |
| --- | --- |
| X-RateLimit-Limit | Configured limit in this window |
| X-RateLimit-Remaining | Calls left in the current window |
| Retry-After | Seconds to wait before retrying |

hdrs=$(mktemp)
trap 'rm -f "$hdrs"' EXIT
while true; do
  # -D saves the headers of THIS response, so Retry-After is read from the 429 itself
  resp=$(curl -s -D "$hdrs" -w "\n%{http_code}" "$URL" -H "Authorization: Bearer $TOKEN")
  body=$(printf '%s\n' "$resp" | sed '$d')    # everything except the status line
  code=$(printf '%s\n' "$resp" | tail -n 1)   # status code appended by -w
  if [ "$code" = "429" ]; then
    delay=$(awk 'tolower($1) == "retry-after:" {print $2}' "$hdrs" | tr -d '\r')
    sleep "${delay:-5}"
  else
    printf '%s\n' "$body"
    break
  fi
done
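
The same retry policy in Python, as a sketch: `send` is a hypothetical callable that performs one HTTP request and returns `(status, headers, body)`, so the loop works with any HTTP client. The attempt cap and the 5-second fallback are assumptions, not platform defaults.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], bytes]  # (status, headers, body)


def retry_after_seconds(headers: Mapping[str, str], default: float) -> float:
    """Read Retry-After (in seconds) case-insensitively; fall back to `default`."""
    lowered = {k.lower(): v for k, v in headers.items()}
    try:
        return max(0.0, float(lowered["retry-after"]))
    except (KeyError, ValueError):
        return default


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    default_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send`, waiting out 429 responses; return the last response seen."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        response = send()
        status, headers, _body = response
        if status != 429 or attempt == max_attempts:
            return response
        sleep(retry_after_seconds(headers, default_delay))
    raise AssertionError("unreachable")
```

For mutations, have `send` reuse the same `X-Idempotency-Key` on every attempt so a retried request is not applied twice.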

Provisioning flow (end-to-end via curl)

sequenceDiagram
    autonumber
    participant U as your script
    participant API as cmd/api

    U->>API: POST /api/v1/allocations<br/>{sku, gpus_total, region_code, ssh_key_ids}<br/>X-Idempotency-Key
    API-->>U: 201 {id, status=requested}
    loop poll
        U->>API: GET /api/v1/allocations/{id}
        API-->>U: {status, connection?}
    end
    Note over U,API: status=active → ready
    U->>API: POST /api/v1/allocations/{id}/terminal-token
    API-->>U: {token, ws_url, expires_in: 300}
    U->>API: connect WS to ws_url<br/>Sec-WebSocket-Protocol: token
    Note over U,API: terminal session
    U->>API: POST /api/v1/allocations/{id}/release<br/>X-Idempotency-Key
    API-->>U: 202 {status=releasing}
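
The poll loop in the middle of the diagram could look like the sketch below. `get_allocation` is a hypothetical callable that performs `GET /api/v1/allocations/{id}` and returns the decoded JSON body; the timeout and interval values are placeholders. Only the `active` status is taken from the diagram, so a real client should also stop on whatever failure statuses the API defines.

```python
import time
from typing import Any, Callable, Dict


def wait_until_active(
    get_allocation: Callable[[], Dict[str, Any]],  # hypothetical: one GET, decoded JSON
    timeout: float = 600.0,
    interval: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the allocation reports status == "active" or the timeout passes."""
    deadline = clock() + timeout
    while True:
        allocation = get_allocation()
        if allocation.get("status") == "active":
            return allocation
        if clock() >= deadline:
            raise TimeoutError(
                f"allocation still {allocation.get('status')!r} after {timeout}s"
            )
        sleep(interval)
```

Once this returns, the allocation is ready for the terminal-token step.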

WebSocket terminal (browser-style)

The auth token rides in the WebSocket subprotocol, never in the URL:

const ws = new WebSocket(tokenResponse.ws_url, [tokenResponse.token]);
// ↑ second arg = Sec-WebSocket-Protocol — that's how auth is delivered
ws.onmessage = (e) => xterm.write(e.data);
xterm.onData((d) => ws.send(d));

Server-side rules:

  • No ?token= in URL — would leak through logs/proxies.
  • Token is single-use, 300s TTL, deleted on first validation.
  • Rate-limited per user via rate_limit.terminal_token_requests_per_minute (default 10).
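
To make the single-use rule concrete, here is a small in-memory model of a token that is deleted on first validation and rejected after its TTL. It illustrates the rules listed above and is not the platform's implementation; `OneTimeTokenStore` and its method names are invented for the example.

```python
import secrets
import time
from typing import Callable, Dict, Optional, Tuple


class OneTimeTokenStore:
    """In-memory model of single-use tokens with a fixed TTL."""

    def __init__(self, ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock
        self._tokens: Dict[str, Tuple[str, float]] = {}  # token -> (user_id, expires_at)

    def mint(self, user_id: str) -> str:
        """Create a fresh token bound to `user_id`."""
        token = secrets.token_urlsafe(32)
        self._tokens[token] = (user_id, self._clock() + self._ttl)
        return token

    def consume(self, token: str) -> Optional[str]:
        """Validate a token once; return its user_id, or None if unknown or expired."""
        # pop() removes the token on the first validation attempt, valid or not.
        entry = self._tokens.pop(token, None)
        if entry is None:
            return None
        user_id, expires_at = entry
        if self._clock() >= expires_at:
            return None
        return user_id
```

The practical consequence for clients: a dropped WebSocket cannot be reopened with the old token, so request a new terminal token before every connection attempt.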

Stripe webhook (raw-body-first)

Receiving end (if you build something webhook-shaped):

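# Sketch of the inbound side of a Stripe-style webhook on YOUR system (Flask + stripe-python).
import os

import stripe
from flask import Flask, request

app = Flask(__name__)
secret = os.environ["STRIPE_WEBHOOK_SECRET"]  # endpoint signing secret; the variable name is an assumption

@app.route("/webhook", methods=["POST"])
def webhook():
    raw_body = request.get_data()  # capture BEFORE any JSON parse
    signature = request.headers.get("Stripe-Signature", "")
    try:
        # verifies the signature on the EXACT bytes, then parses the event
        event = stripe.Webhook.construct_event(raw_body, signature, secret)
    except (ValueError, stripe.error.SignatureVerificationError):
        # newer stripe-python releases also export stripe.SignatureVerificationError
        return ("bad signature", 400)
    # dedupe by event["id"] before any side effect
    ...
    return ("", 204)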

cmd/api's /api/v1/payments/webhook handler follows the same raw-body-first order; see Coding_Standards.md §7.
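
Why the raw bytes matter is easiest to see with the signature computed by hand. The sketch below follows Stripe's publicly documented scheme (header of the form `t=<timestamp>,v1=<hex HMAC-SHA256>` over `"<timestamp>.<raw body>"`); the function names are invented here, and production code should prefer the official library.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign(raw_body: bytes, secret: str, timestamp: int) -> str:
    """Build a Stripe-style signature header for `raw_body` (useful in tests)."""
    signed = str(timestamp).encode("ascii") + b"." + raw_body
    digest = hmac.new(secret.encode("utf-8"), signed, hashlib.sha256).hexdigest()
    return f"t={timestamp},v1={digest}"


def verify(raw_body: bytes, header: str, secret: str,
           tolerance: int = 300, now: Optional[float] = None) -> bool:
    """Check the header against the EXACT bytes received; reject stale timestamps."""
    pairs = [item.split("=", 1) for item in header.split(",") if "=" in item]
    timestamps = [v for k, v in pairs if k.strip() == "t"]
    candidates = [v for k, v in pairs if k.strip() == "v1"]
    if not timestamps or not candidates:
        return False
    try:
        timestamp = int(timestamps[0])
    except ValueError:
        return False
    current = time.time() if now is None else now
    if current - timestamp > tolerance:
        return False  # too old: possible replay
    signed = str(timestamp).encode("ascii") + b"." + raw_body
    expected = hmac.new(secret.encode("utf-8"), signed, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the match position through timing
    return any(hmac.compare_digest(expected.encode(), c.encode()) for c in candidates)
```

Parsing the body and serializing it again usually changes whitespace or key order, which changes the bytes and therefore the HMAC. That is the reason verification has to run before any JSON parse.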

NATS / event subscription (server-to-server)

If you operate a downstream system that needs platform events, subscribe via the AsyncAPI contract (provisioning.*, billing.*, payments.*):

flowchart LR
    PROD[GPUaaS outbox-relay] --> NATS[(NATS JetStream)]
    NATS --> YC[Your consumer]
    YC --> YC2[Process envelope:<br/>event_id, event_type, occurred_at,<br/>version, correlation_id, payload]
    YC2 --> YC3[Dedupe by event_id<br/>idempotent handler]

    classDef plat fill:#e3f2fd,stroke:#1565c0
    classDef yours fill:#e8f5e9,stroke:#2e7d32
    class PROD,NATS plat
    class YC,YC2,YC3 yours

→ Full subject + payload reference: NATS subjects. Live spec render: AsyncAPI explorer.
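
The last two boxes of the diagram (process the envelope, dedupe by event_id) can be sketched without any NATS client code. The envelope field names come from the diagram; the class name and the in-memory set are assumptions for illustration, and a real consumer would keep seen IDs in a durable store so deduplication survives restarts.

```python
from typing import Any, Callable, Dict, Set


class IdempotentConsumer:
    """Runs a handler at most once per event_id."""

    def __init__(self, handler: Callable[[str, Dict[str, Any]], None]) -> None:
        self._handler = handler
        self._seen: Set[str] = set()  # illustration only; use durable storage in practice

    def process(self, envelope: Dict[str, Any]) -> bool:
        """Handle one envelope; return False if it was a duplicate delivery."""
        event_id = envelope["event_id"]
        if event_id in self._seen:
            return False
        self._handler(envelope["event_type"], envelope["payload"])
        # Record the ID only after the handler succeeds, so a failed
        # attempt is retried on redelivery instead of being dropped.
        self._seen.add(event_id)
        return True
```

Because the ID is recorded after the handler runs, a crash between the two steps leads to a second invocation on redelivery, so the handler itself should still be safe to repeat.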

Live API explorer

The full REST API is rendered with Swagger UI at:

Where to look next