CLI Agent-Operable Control Plane v2¶
As of: March 30, 2026
Purpose¶
Define the next direction for the GPUaaS CLI after the v1 baseline.
This document answers three questions:
1. What should a modern GPUaaS CLI optimize for?
2. How should the CLI evolve for agent use, not only human shell use?
3. How do we expose platform workflows safely to operators, support tooling, and future app-building agents?
Decision Summary¶
- GPUaaS CLI should evolve into an agent-operable control-plane client, not remain a thin REST wrapper.
- The primary mode remains curated workflows, not raw API-first command generation.
- Machine-readable output, deterministic errors, and explicit context must be first-class for every command.
- Debug/ops capability may be exposed through the same client model, but only through role-gated surfaces.
- The same control-plane client model should be usable by:
- humans at a terminal,
- automation pipelines,
- MCP/skill/LLM integrations,
- app-team operators using the App Control Plane.
Current Implementation Status¶
The direction in this document is no longer purely hypothetical. The implemented baseline now includes:
1. curated CLI workflow groups for auth, projects, allocations, billing, apps, service accounts, IAM, ops, storage, and context
2. explicit tenant-shared app runtime workflows under:
- gpuaas apps shared-runtimes
- gpuaas apps shared-runtimes attachments
- gpuaas apps shared-runtimes workers
- gpuaas apps shared-runtimes worker-operations
3. introspection commands:
- gpuaas schema <resource> backed by OpenAPI
- gpuaas explain <command> with JSON-capable command metadata
4. MCP transport baseline:
- gpuaas mcp list-tools
- gpuaas mcp serve over stdio
5. SDK parity for the same shared-runtime control-plane surface in:
- Go SDK
- Python SDK
Remaining agent-operability gaps are now mostly about richer typed MCP tool contracts and deeper command metadata, not missing tenant-shared app control-plane coverage.
Why This Direction¶
CLI expectations have changed:
1. Users expect workflows, not endpoint memorization.
2. Automation expects structured output and stable error contracts.
3. Agent systems expect introspection, deterministic behavior, and scoped capability exposure.
GPUaaS is in a strong position to adopt this because:
1. APIs are already contract-first.
2. Canonical error envelopes and correlation IDs are already part of the platform.
3. Project-scoped and service-account-scoped operations already exist.
4. The App Control Plane and the service-account model already imply non-human operators as first-class actors.
Non-Goals¶
- Do not turn GPUaaS CLI into a fully dynamic discovery-only shell.
- Do not expose unrestricted debug or raw internal API surfaces to general users.
- Do not make CLI behavior depend on hidden server defaults for project/tenant context.
- Do not create a separate privileged agent-only control plane outside the normal IAM/policy model.
Core Design Principle¶
Use a hybrid model:
1. Curated workflow commands for high-value user and operator tasks.
2. Introspection and generic control-plane access for advanced automation and agent use.
This is better than a fully dynamic CLI for GPUaaS because the platform is workflow-heavy:
1. allocations have lifecycle semantics
2. billing has reconciliation semantics
3. apps have entitlement and runtime lifecycle semantics
4. nodes have lifecycle and support implications
5. IAM has scope and grant-ceiling semantics
These are better expressed as curated commands than as direct endpoint mirrors.
Command Model¶
1. Workflow-first command groups¶
These should remain the primary UX:
1. auth
2. projects
3. allocations
4. billing
5. apps
6. nodes
7. iam
8. service-accounts
Each command group should map to user intent, not HTTP nouns.
Examples:
1. gpuaas allocations create
2. gpuaas apps deploy
3. gpuaas apps instances rollback
4. gpuaas apps artifacts publish-intent
5. gpuaas apps artifacts register
6. gpuaas apps artifacts promote
7. gpuaas iam members add
8. gpuaas service-accounts token mint
2. Introspection and advanced automation layer¶
These are for agents, platform engineers, and advanced users:
1. gpuaas schema <resource>
2. gpuaas explain <command>
3. gpuaas api get|post|put|delete ...
4. gpuaas context show
5. gpuaas auth whoami --json
Rules:
1. api commands are secondary and explicit.
2. Raw API mode must still preserve canonical error output.
3. Introspection must expose enough information for agents to use the CLI without source-code scraping.
Human vs Agent UX Rules¶
Human mode¶
Optimize for:
1. readable default output
2. safe prompts/confirmations for destructive actions
3. clear next-step guidance
4. opinionated workflow verbs
Agent mode¶
Optimize for:
1. JSON-first output
2. no interactive prompts unless explicitly requested
3. stable exit codes
4. canonical error envelopes
5. explicit resource and context identity
Required flags/behavior:
1. --output json
2. --no-input
3. deterministic stderr/stdout contract
4. correlation_id surfaced on every failure
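A minimal sketch of how an agent-side wrapper might honor these rules. It assumes success payloads arrive as JSON on stdout, failure envelopes arrive as JSON on stderr, and a non-zero exit code signals failure; this document does not fix which stream carries the envelope, so treat that as an assumption. `build_argv`, `interpret`, and `CliResult` are hypothetical names used only for illustration.

```python
import json
from dataclasses import dataclass
from typing import Any, List, Optional, Sequence


def build_argv(command: Sequence[str]) -> List[str]:
    """Append the agent-mode flags named above to a gpuaas command."""
    return ["gpuaas", *command, "--output", "json", "--no-input"]


@dataclass
class CliResult:
    ok: bool
    data: Optional[Any]
    correlation_id: Optional[str]


def interpret(returncode: int, stdout: str, stderr: str) -> CliResult:
    """Map a finished CLI process to a structured result.

    Assumption: stdout holds the success payload, stderr holds the
    failure envelope. The real stream contract may differ.
    """
    if returncode == 0:
        return CliResult(ok=True, data=json.loads(stdout), correlation_id=None)
    try:
        envelope = json.loads(stderr)
    except json.JSONDecodeError:
        envelope = {}
    if not isinstance(envelope, dict):
        envelope = {}
    return CliResult(
        ok=False,
        data=envelope or None,
        correlation_id=envelope.get("correlation_id"),
    )
```

A caller would pass `build_argv(["allocations", "create"])` to its process runner and hand the exit code and both streams to `interpret`. A failure without a `correlation_id` is then visible as a contract violation rather than being silently ignored.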
Role-Activated Capability Model¶
The CLI should expose more than end-user workflows, but only according to role and endpoint policy.
General rule¶
Capability activation is policy/IAM-gated, not client-gated.
The CLI may render or enable commands based on effective identity, but enforcement remains server-side.
Example capability tiers¶
- project member / project admin: app, allocation, billing, and project-scoped workflows
- tenant admin: tenant membership, project administration, federation setup
- platform ops: observability triage, support-oriented read paths, runbook lookup
- platform superadmin: break-glass platform controls
- service account: project-scoped automation only
Service-account token rules must stay explicit in the CLI surface:
1. token mint remains bounded by server-side TTL policy,
2. CLI must expose the effective expiry returned by the API,
3. any future scoped-down token request must be an explicit server contract, not a client-side convention.
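As an illustration of the effective-expiry rule, the sketch below reports the expiry the server actually granted next to what the caller asked for. It assumes the caller can request a TTL and that the API response carries an expiry timestamp; both the function name `describe_minted_token` and the output field names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict


def describe_minted_token(
    requested_ttl: timedelta,
    issued_at: datetime,
    expires_at: datetime,
) -> Dict[str, object]:
    """Summarize the expiry the server granted, not the one requested.

    `expires_at` is assumed to come straight from the API response.
    """
    effective_ttl = expires_at - issued_at
    return {
        "expires_at": expires_at.isoformat(),
        "effective_ttl_seconds": int(effective_ttl.total_seconds()),
        # True when server-side TTL policy granted less than was requested.
        "shortened_by_server": effective_ttl < requested_ttl,
    }


issued = datetime(2026, 3, 30, 12, 0, tzinfo=timezone.utc)
summary = describe_minted_token(
    requested_ttl=timedelta(hours=24),
    issued_at=issued,
    expires_at=issued + timedelta(hours=1),
)
```

Here the caller asked for 24 hours and the server granted one, so the summary flags the token as shortened. The client never computes the expiry itself; it only reports what the server returned.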
Ops/debug extension¶
The same client model can expose support/debug workflows such as:
1. gpuaas ops incident lookup --correlation-id <id>
2. gpuaas ops trace open --trace-id <id>
3. gpuaas ops runbook show <runbook_id>
4. gpuaas ops app-instance inspect <id>
5. gpuaas ops fleet health
6. gpuaas ops node metrics <node_id>
Rules:
1. These must be implemented using public/admin contracts, not undocumented DB shortcuts.
2. Availability depends on role.
3. Output must remain safe for logs and copy/paste into tickets.
MCP / Skill / LLM Integration Direction¶
The CLI should become the stable execution substrate for MCP/skill/LLM integrations.
Why¶
- It centralizes auth, context handling, idempotency, and output normalization.
- It prevents each agent integration from re-implementing control-plane semantics.
- It makes support/debug actions auditable through the same platform paths.
Required properties¶
- every command has a stable JSON mode
- every error preserves code, message, correlation_id
- commands can run non-interactively
- output includes explicit identifiers: org_id, project_id, resource_name, and trace_id where available
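The error-field requirement above could be checked mechanically, for example in a contract test that replays captured CLI failures. The sketch below assumes the envelope is a flat JSON object with those three keys at the top level; the actual nesting is not specified here, and `missing_error_fields` is a hypothetical helper.

```python
from typing import Any, Dict, List

# Field names taken from the required properties listed above.
REQUIRED_ERROR_FIELDS = ("code", "message", "correlation_id")


def missing_error_fields(envelope: Dict[str, Any]) -> List[str]:
    """Return the required error fields that are absent or empty."""
    return [name for name in REQUIRED_ERROR_FIELDS if not envelope.get(name)]


# Example: an envelope that dropped its correlation ID fails the check.
gaps = missing_error_fields({"code": "quota_exceeded", "message": "quota exceeded"})
```

An empty result means the envelope satisfies the contract; anything else names exactly which fields a command failed to preserve.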
Recommended pattern¶
- agents call curated CLI workflows first
- agents fall back to introspection/generic API mode only when no curated workflow exists
- privileged ops commands are exposed only when caller role authorizes them
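The curated-first, fallback-second ordering can be sketched as a small planning function on the agent side. The intent names and the `plan_invocation` helper are hypothetical; the curated command paths come from the workflow examples earlier in this document, and the arguments after `api` are left to the caller because their layout is not specified here.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Intent names are hypothetical; command paths mirror the curated examples.
CURATED_WORKFLOWS: Dict[str, List[str]] = {
    "create_allocation": ["allocations", "create"],
    "deploy_app": ["apps", "deploy"],
    "rollback_app_instance": ["apps", "instances", "rollback"],
}


def plan_invocation(
    intent: str,
    api_fallback: Optional[Sequence[str]] = None,
) -> Tuple[str, List[str]]:
    """Return (mode, argv), preferring a curated workflow over raw API mode."""
    curated = CURATED_WORKFLOWS.get(intent)
    if curated is not None:
        return "curated", ["gpuaas", *curated]
    if api_fallback:
        # The caller supplies everything after `api`; no layout is assumed.
        return "api", ["gpuaas", "api", *api_fallback]
    raise LookupError(f"no curated workflow or API fallback for {intent!r}")
```

Note that the planner does no authorization of its own. Whether a privileged command succeeds is still decided server-side by the caller's role.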
App Control Plane Implications¶
Another agent instructed by an app team should be able to build and operate software on GPUaaS through the App Control Plane.
That means the CLI model must support:
1. app catalog discovery
2. entitlement-aware deployment workflows
3. app instance lifecycle operations
4. service-account-centered automation
5. project boundary enforcement
6. artifact publication and promotion
Required app-team workflows¶
- gpuaas apps catalog list
- gpuaas apps deploy
- gpuaas apps instances list|get|upgrade|rollback|decommission
- gpuaas service-accounts create|list|token
- gpuaas iam project-members list
- gpuaas apps artifacts list
- gpuaas apps artifacts publish-intent
- gpuaas apps artifacts register
- gpuaas apps artifacts verify|revoke
- gpuaas apps artifacts promote|deprecate|retire
- gpuaas apps shared-runtimes list|get|create|delete
- gpuaas apps shared-runtimes attachments list|get|create|delete
- gpuaas apps shared-runtimes workers list|get
- gpuaas apps shared-runtimes worker-operations list|get|create
Required artifact workflow shape¶
The CLI should treat artifact publication as a curated multi-step workflow, not just raw endpoint calls.
Minimum required commands:
1. gpuaas apps artifacts publish-intent
- returns repository or upload path plus credential-delivery metadata
2. gpuaas apps artifacts push-oci
- optional helper that consumes a wrapped credential and performs the registry push
3. gpuaas apps artifacts register
- registers the immutable digest or blob source URI with the control plane
4. gpuaas apps artifacts promote
- requires verified trust state and explicit channel/environment
Rules:
1. digest or canonical source URI must remain explicit in command output,
2. wrapped secrets must never be echoed in human-readable default output,
3. JSON mode must preserve the full publish-intent response for agent use,
4. the CLI must not invent hidden side-channel artifact flows outside the App Control Plane.
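The two output rules for publish-intent pull in opposite directions, so a rendering sketch may help: human output redacts wrapped secrets, while JSON mode passes the response through untouched. The field name `wrapped_credential`, the sample response, and the `render_publish_intent` helper are all hypothetical; the real publish-intent response shape is not defined in this document.

```python
import json
from typing import Any, Dict

# Hypothetical name for the field carrying the wrapped credential.
SECRET_FIELDS = {"wrapped_credential"}


def render_publish_intent(response: Dict[str, Any], json_mode: bool) -> str:
    """Render a publish-intent response for humans or for agents."""
    if json_mode:
        # Agents receive the full response, secrets included.
        return json.dumps(response, sort_keys=True)
    lines = []
    for key in sorted(response):
        value = "<redacted>" if key in SECRET_FIELDS else response[key]
        lines.append(f"{key}: {value}")
    return "\n".join(lines)


sample = {
    "upload_path": "registry.example/team/app",
    "wrapped_credential": "s3cr3t",
}
human_text = render_publish_intent(sample, json_mode=False)
agent_text = render_publish_intent(sample, json_mode=True)
```

Keeping redaction in the renderer rather than in the fetch path means the agent-facing JSON stays complete, and the human default cannot leak the secret into a terminal scrollback or a ticket.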
Required invariants¶
- app-building agents must use the same contracts as internal teams
- no hidden internal-app privilege path
- app-team automation must be service-account compatible
- app lifecycle failures must remain correlation-first and supportable
Data Contract Requirements for CLI v2¶
To make the CLI truly agent-operable, future commands should prefer envelopes with:
1. explicit scope metadata
2. stable pagination fields
3. operation status and async lifecycle hints
4. correlation and trace references where meaningful
Preferred command response shape:
1. primary resource/object payload
2. optional meta
3. optional links
4. optional next_actions
This does not replace current API contracts immediately. It is a direction for future CLI-facing surfaces.
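One way to picture the preferred response shape is as a small typed envelope. This is a sketch under assumptions: the key `data` for the primary payload and the value types chosen for `meta`, `links`, and `next_actions` are illustrative, and `CommandResponse` is a hypothetical name, not an existing SDK type.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class CommandResponse:
    data: Any                                # primary resource/object payload
    meta: Optional[Dict[str, Any]] = None    # e.g. scope, pagination, status
    links: Optional[Dict[str, str]] = None
    next_actions: Optional[List[str]] = None

    def to_dict(self) -> Dict[str, Any]:
        """Serialize, omitting optional sections that were not set."""
        body: Dict[str, Any] = {"data": self.data}
        for name in ("meta", "links", "next_actions"):
            value = getattr(self, name)
            if value is not None:
                body[name] = value
        return body


response = CommandResponse(
    data={"id": "alloc-1"},
    meta={"project_id": "proj-1"},
    next_actions=["gpuaas allocations get"],
)
```

Omitting unset sections keeps existing payloads unchanged for commands that have nothing to add, which fits the stated intent of not replacing current API contracts immediately.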
What We Should Explicitly Not Do¶
- Do not build a fully dynamic Google-discovery-style CLI as the primary interface.
- Do not expose internal-only debugging primitives directly in end-user command groups.
- Do not let agent workflows bypass IAM by running with hidden shared credentials.
- Do not duplicate app-operator APIs in a separate CLI-only backend.
Proposed Next Iteration¶
After the CLI v1 baseline, the next CLI work should be:
1. evolve mcp serve from CLI-proxy tool execution toward richer typed tool metadata and safer structured tool arguments
2. expose more command metadata in explain --output json, including examples and required-flag hints
3. add shared-runtime contribution/operator workflows where API coverage exists
4. add full apps artifacts workflow coverage, including publish intent and promotion
5. continue role-gated ops/debug workflow expansion
Related Docs¶
- doc/architecture/CLI_PythonSDK_v1_Plan.md
- doc/architecture/App_Control_Plane_v1.md
- doc/architecture/Build_an_App_for_GPUaaS_v1.md
- doc/architecture/Scheduler_as_Platform_App_v1.md
- doc/architecture/Service_Account_Model.md