CLI + Python SDK v1 Plan
Status: CLI v1 baseline implemented; Python SDK v1 baseline implemented and expanded for app-platform shared runtime workflows
1. Scope
This plan covers:
- gpuaas CLI v1
- Go SDK baseline used by the CLI
- Python SDK v1
Out of scope:
- MaaS integration (deferred)
- CLI v2 agent-operable control-plane direction (tracked separately in doc/architecture/CLI_Agent_Operable_Control_Plane_v2.md)
2. Why this order
- CLI first:
    - fastest path to demonstrate full platform workflows
    - validates contracts, auth, and error handling end-to-end
- Python SDK second:
    - highest utility for AI/ML users
    - can reuse validated CLI/API flow decisions
3. CLI v1 command surface (implemented)
- gpuaas auth login [--provider huggingface|github|google] [--tenant-hint <tenant>] [--identity-hint <email>] [--no-browser] [--base-url <url>]
- gpuaas auth dev-login --username <u> --password <p> [--base-url <url>]
- gpuaas auth keycloak-login --username <u> --password <p> [--base-url <api>] [--kc-url <kc>] [--realm <realm>] [--client-id <id>] [--client-secret <secret>]
- gpuaas auth logout
- gpuaas auth whoami
- gpuaas catalog list [--output table|csv|json] [--no-heading]
- gpuaas nodes list [--status <status>] [--output table|csv|json] [--no-heading]
- gpuaas projects list [--output table|csv|json] [--no-heading]
- gpuaas projects create --name <name> [--slug <slug>]
- gpuaas projects use --id <project_id>
- gpuaas allocations list [--status <status>] [--project-id <id>] [--output table|csv|json] [--no-heading]
- gpuaas allocations create [--scheduler-type bare_metal|slurm|k8s|ray] [--node-id <id>] [--project-id <id>]
- gpuaas allocations release --id <allocation_id> [--project-id <id>]
- gpuaas billing balance
- gpuaas apps shared-runtimes list|get|create|delete --org-id <org_id> ...
- gpuaas apps shared-runtimes attachments list|get|create|delete --org-id <org_id> --runtime-id <id> ...
- gpuaas apps shared-runtimes workers list|get --org-id <org_id> --runtime-id <id> ...
- gpuaas apps shared-runtimes worker-operations list|get|create --org-id <org_id> --runtime-id <id> ...
- gpuaas schema <resource>
- gpuaas explain <command>
- gpuaas mcp list-tools
- gpuaas mcp serve
Required behavior:
- every API failure prints code, message, correlation_id
- deterministic non-zero exit codes by error class
- explicit project context flag (--project) plus active-default behavior
4. Python SDK v1 surface
Core client modules:
- auth
- catalog
- allocations
- terminal
- billing
- shared_runtimes
Required operations:
- list catalog SKUs
- create/list/release allocations
- request terminal token
- read balance
- list/get/create/delete shared runtimes
- list/get/create/delete shared runtime attachments
- list/get shared runtime workers
- list/get/create shared runtime worker operations

Status note:
1. Python SDK v1 baseline is now implemented.
2. Shared runtime and shared worker control-plane coverage is now implemented.
3. Remaining work is iterative expansion, not initial delivery.
Contract readiness guarantees for SDK generation:
- paginated list endpoints expose deterministic cursor + page_size parameters and stable envelope shapes
- mutation endpoints that are SDK-critical document Idempotency-Key behavior
- project-scoped endpoints require explicit X-Project-ID (no automatic default on the server side)
- canonical error envelope is preserved (code, message, correlation_id, optional details)
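To illustrate how a client might rely on these guarantees, here is a small sketch of cursor pagination and of the headers sent with a project-scoped mutation. The envelope field names `items` and `next_cursor`, the `fetch_page` callable, and the choice of a UUID as the idempotency key are assumptions made for the example; only `cursor`, `page_size`, `Idempotency-Key`, and `X-Project-ID` come from the contract described above.

```python
import uuid
from typing import Any, Callable, Dict, Iterator, Optional


def mutation_headers(project_id: str,
                     idempotency_key: Optional[str] = None) -> Dict[str, str]:
    """Headers for a project-scoped mutation.

    X-Project-ID is always sent explicitly because the server applies no
    default. The key defaults to a fresh UUID (an assumption); a retry of the
    same logical request should reuse the key from the first attempt.
    """
    return {
        "X-Project-ID": project_id,
        "Idempotency-Key": idempotency_key or str(uuid.uuid4()),
    }


def iter_pages(fetch_page: Callable[..., Dict[str, Any]],
               page_size: int = 50) -> Iterator[Any]:
    """Yield every item from a cursor-paginated list endpoint.

    fetch_page(cursor=..., page_size=...) is a hypothetical callable returning
    an envelope shaped like {"items": [...], "next_cursor": "..." or None}.
    """
    cursor = None
    while True:
        page = fetch_page(cursor=cursor, page_size=page_size)
        yield from page["items"]
        cursor = page.get("next_cursor")
        if not cursor:
            return
```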
Design requirements:
- generated typed models from OpenAPI
- thin ergonomic wrappers for polling/wait helpers
- exceptions must expose error_code and correlation_id
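The last two requirements could look roughly like the sketch below. The class name `GpuaasApiError`, the helper name `wait_until`, and its default timeout and interval are hypothetical; only the envelope fields (code, message, correlation_id, optional details) and the `error_code` / `correlation_id` attributes are taken from this plan.

```python
import time
from typing import Any, Callable, Dict, Optional, TypeVar

T = TypeVar("T")


class GpuaasApiError(Exception):
    """Hypothetical SDK exception built from the canonical error envelope."""

    def __init__(self, error_code: str, message: str, correlation_id: str,
                 details: Optional[Any] = None) -> None:
        super().__init__(
            f"{error_code}: {message} (correlation_id={correlation_id})")
        self.error_code = error_code          # server code, not rewritten
        self.message = message
        self.correlation_id = correlation_id
        self.details = details

    @classmethod
    def from_envelope(cls, envelope: Dict[str, Any]) -> "GpuaasApiError":
        return cls(envelope["code"], envelope["message"],
                   envelope["correlation_id"], envelope.get("details"))


def wait_until(fetch: Callable[[], T], is_done: Callable[[T], bool],
               timeout_s: float = 300.0, interval_s: float = 2.0,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> T:
    """Poll fetch() until is_done(value) is true or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        value = fetch()
        if is_done(value):
            return value
        if clock() >= deadline:
            raise TimeoutError(f"condition not met within {timeout_s} seconds")
        sleep(interval_s)
```

Keeping the wrapper this thin means the generated models stay the source of truth; `sleep` and `clock` are injectable only so the helper can be tested without real delays.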
Current app-platform note:
- example apps are still API-first today
- app developers should treat the public API as authoritative and the SDK as convenience
- app-specific SDK/UI helper layers should only be standardized after the example-app workflow is stable
Companion reference:
- doc/architecture/Example_App_Developer_Reference_Workflow_v1.md
5. Auth model
MVP for CLI v1:
- personal flow: POST /api/v1/auth/personal/login for local/dev bootstrap
- OIDC flow: auth keycloak-login obtains Keycloak access/refresh token directly for operator/dev workflows
- no URL/query token transport
- refresh/session renewal uses POST /api/v1/auth/token/refresh semantics (CLI command wiring is a follow-up)
Decision:
- device code flow is deferred; the current CLI baseline is password-based (personal login + Keycloak token flow) to keep the MVP deterministic in local and shared environments.

Future:
- API keys / service-account credentials
- OIDC device code flow for non-browser production CLI login
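For the refresh endpoint named in the MVP list, a request that respects the "no URL/query token transport" rule might be built as in the sketch below. The JSON body shape (`{"refresh_token": ...}`) and the function name are assumptions; the plan only fixes the method and path. The request is constructed but not sent.

```python
import json
import urllib.request

REFRESH_PATH = "/api/v1/auth/token/refresh"


def build_refresh_request(base_url: str,
                          refresh_token: str) -> urllib.request.Request:
    """Build (but do not send) a token refresh request.

    The token travels in the request body, never in the URL or query string.
    The body field name "refresh_token" is an assumption for illustration.
    """
    body = json.dumps({"refresh_token": refresh_token}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + REFRESH_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```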
6. Observability and support requirements
Both CLI and SDK must:
- surface correlation_id for support triage
- preserve server error code values (no string rewriting)
- document a standard troubleshooting flow from correlation_id to Loki/Tempo queries
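As one possible shape for the Loki side of that flow, a helper can turn a correlation_id into a LogQL line-filter query. The stream selector `{app="gpuaas-api"}` is a placeholder assumption; the real label names depend on how the platform's logs are labeled.

```python
def loki_query_for(correlation_id: str,
                   selector: str = '{app="gpuaas-api"}') -> str:
    """Return a LogQL query matching log lines that contain the correlation_id.

    The default selector is hypothetical; pass the labels your deployment uses.
    """
    # Escape backslashes and double quotes so the ID is a valid LogQL string.
    escaped = correlation_id.replace("\\", "\\\\").replace('"', '\\"')
    return f'{selector} |= "{escaped}"'
```

Printing this query next to the error output gives support staff something they can paste directly into a log explorer.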
7. Delivery order
- A-CLI-001 (backend readiness)
- B-CLI-001 (CLI implementation)
- C-CLI-OPS-001 (runbook/support)
- A-PYSDK-001 (backend + contract readiness)
- B-PYSDK-001 (Python SDK)
- C-PYSDK-OPS-001 (runbook/support)
8. Definition of done (for each delivery slice)
- contract-valid behavior (OpenAPI-consistent)
- canonical error envelope retained
- correlation-first troubleshooting path documented
- CI/tests pass for touched package(s)
9. Follow-on Direction
After v1, CLI evolution should follow:
- doc/architecture/CLI_Agent_Operable_Control_Plane_v2.md
That direction keeps:
1. curated workflow commands as the primary UX,
2. introspection and machine-readable behavior as first-class for agents,
3. role-gated ops/debug capabilities on the same control-plane client model.