CLI Agent-Operable Control Plane v2

As of: March 30, 2026

Purpose

Define the next direction for the GPUaaS CLI after the v1 baseline.

This document answers three questions:
1. What should a modern GPUaaS CLI optimize for?
2. How should the CLI evolve for agent use, not only for human shell use?
3. How do we expose platform workflows safely to operators, support tooling, and future app-building agents?

Decision Summary

The GPUaaS CLI should evolve into an agent-operable control-plane client rather than remain a thin REST wrapper.
The primary mode remains curated workflows, not raw API-first command generation.
Machine-readable output, deterministic errors, and explicit context must be first-class for every command.
Debug/ops capability may be exposed through the same client model, but only through role-gated surfaces.
The same control-plane client model should be usable by:
humans at a terminal,
automation pipelines,
MCP/skill/LLM integrations,
app-team operators using the App Control Plane.

Current Implementation Status

The direction in this document is no longer purely hypothetical. The currently implemented baseline includes:
1. curated CLI workflow groups for auth, projects, allocations, billing, apps, service accounts, IAM, ops, storage, and context
2. explicit tenant-shared app runtime workflows under:
   - gpuaas apps shared-runtimes
   - gpuaas apps shared-runtimes attachments
   - gpuaas apps shared-runtimes workers
   - gpuaas apps shared-runtimes worker-operations
3. introspection commands:
   - gpuaas schema <resource>, backed by OpenAPI
   - gpuaas explain <command>, with JSON-capable command metadata
4. an MCP transport baseline:
   - gpuaas mcp list-tools
   - gpuaas mcp serve over stdio
5. SDK parity for the same shared-runtime control-plane surface in:
   - the Go SDK
   - the Python SDK

Remaining agent-operability gaps are now mostly about richer typed MCP tool contracts and deeper command metadata, not missing tenant-shared app control-plane coverage.
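For orientation, MCP's stdio transport exchanges newline-delimited JSON-RPC 2.0 messages, and tools are invoked with the `tools/call` method. The sketch below frames one such request and parses the reply, which is roughly what an agent host does when talking to `gpuaas mcp serve`. The tool name `apps_shared_runtimes_list` and its `project_id` argument are hypothetical, and the `initialize` handshake that must precede `tools/call` is omitted for brevity.

```python
import json


def encode_tools_call(request_id: int, tool_name: str, arguments: dict) -> bytes:
    """Frame one MCP tools/call request for a stdio transport.

    MCP stdio messages are JSON-RPC 2.0 objects written one per line,
    so the serialized JSON must not contain a raw newline.
    """
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # json.dumps escapes newlines inside string values, so the frame
    # stays on a single line; the trailing "\n" delimits the message.
    line = json.dumps(message, separators=(",", ":"))
    return (line + "\n").encode("utf-8")


def decode_response(line: bytes) -> dict:
    """Parse one JSON-RPC response line read from the server's stdout."""
    message = json.loads(line.decode("utf-8"))
    if "error" in message:
        err = message["error"]
        raise RuntimeError(f"MCP error {err.get('code')}: {err.get('message')}")
    return message["result"]
```

A host would write `encode_tools_call(1, "apps_shared_runtimes_list", {"project_id": "proj-123"})` to the server's stdin and pass each stdout line to `decode_response`.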

Why This Direction

CLI expectations have changed:
1. Users expect workflows, not endpoint memorization.
2. Automation expects structured output and stable error contracts.
3. Agent systems expect introspection, deterministic behavior, and scoped capability exposure.

GPUaaS is well positioned to adopt this direction because:
1. The API is already designed contract-first.
2. Canonical error envelopes and correlation IDs are already part of the platform.
3. Project-scoped and service-account-scoped operations already exist.
4. The App Control Plane and the service-account model already imply that non-human operators are first-class actors.

Non-Goals

Do not turn GPUaaS CLI into a fully dynamic discovery-only shell.
Do not expose unrestricted debug or raw internal API surfaces to general users.
Do not make CLI behavior depend on hidden server defaults for project/tenant context.
Do not create a separate privileged agent-only control plane outside the normal IAM/policy model.

Core Design Principle

Use a hybrid model:
1. Curated workflow commands for high-value user and operator tasks.
2. Introspection and generic control-plane access for advanced automation and agent use.

This is better than a fully dynamic CLI for GPUaaS because the platform is workflow-heavy:
1. allocations have lifecycle semantics
2. billing has reconciliation semantics
3. apps have entitlement and runtime lifecycle semantics
4. nodes have lifecycle and support implications
5. IAM has scope and grant-ceiling semantics

These are better expressed as curated commands than as direct endpoint mirrors.

Command Model

1. Workflow-first command groups

These should remain the primary UX:
1. auth
2. projects
3. allocations
4. billing
5. apps
6. nodes
7. iam
8. service-accounts

Each command group should map to user intent, not HTTP nouns.

Examples:
1. gpuaas allocations create
2. gpuaas apps deploy
3. gpuaas apps instances rollback
4. gpuaas apps artifacts publish-intent
5. gpuaas apps artifacts register
6. gpuaas apps artifacts promote
7. gpuaas iam members add
8. gpuaas service-accounts token mint

2. Introspection and advanced automation layer

These are for agents, platform engineers, and advanced users:
1. gpuaas schema <resource>
2. gpuaas explain <command>
3. gpuaas api get|post|put|delete ...
4. gpuaas context show
5. gpuaas auth whoami --json

Rules:
1. api commands are secondary and explicit.
2. Raw API mode must still preserve canonical error output.
3. Introspection must expose enough information for agents to use the CLI without scraping source code.
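To illustrate "without source-code scraping", the sketch below builds a non-interactive argv purely from metadata an agent could read from `gpuaas explain <command> --output json`. The metadata keys `command` and `required_flags`, and the `project-id` flag used in the usage note, are hypothetical; the real explain schema may differ.

```python
def build_agent_argv(explain_meta: dict, flags: dict) -> list:
    """Build a non-interactive argv from `gpuaas explain <command> --output json` metadata.

    ASSUMPTION: the metadata keys "command" (list of command words) and
    "required_flags" (flag names without leading dashes) are hypothetical.
    """
    # Fail before invoking the CLI if the metadata marks a flag as mandatory.
    missing = [name for name in explain_meta.get("required_flags", []) if name not in flags]
    if missing:
        raise ValueError("missing required flags: " + ", ".join(missing))

    argv = ["gpuaas", *explain_meta["command"]]
    # Sorted so the same inputs always produce the same argv in agent logs.
    for name in sorted(flags):
        argv += ["--" + name, str(flags[name])]
    # Agent mode: JSON output and no interactive prompts.
    argv += ["--output", "json", "--no-input"]
    return argv
```

For example, metadata `{"command": ["allocations", "create"], "required_flags": ["project-id"]}` plus `{"project-id": "proj-123"}` yields `gpuaas allocations create --project-id proj-123 --output json --no-input`.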

Human vs Agent UX Rules

Human mode

Optimize for:
1. readable default output
2. safe prompts/confirmations for destructive actions
3. clear next-step guidance
4. opinionated workflow verbs

Agent mode

Optimize for:
1. JSON-first output
2. no interactive prompts unless explicitly requested
3. stable exit codes
4. canonical error envelopes
5. explicit resource and context identity

Required flags/behavior:
1. --output json
2. --no-input
3. deterministic stderr/stdout contract
4. correlation_id surfaced on every failure
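The stdout/stderr contract above can be consumed by a small wrapper like the one below, a minimal sketch assuming that success writes one JSON document to stdout and that failure exits non-zero with a flat `{"code", "message", "correlation_id"}` object on stderr. The envelope layout and the `non_canonical_error` code are assumptions, not the platform's documented format.

```python
import json


class CliError(Exception):
    """A failed CLI invocation, carrying the canonical error fields."""

    def __init__(self, exit_code: int, code: str, message: str, correlation_id: str):
        super().__init__(f"{code}: {message} (correlation_id={correlation_id or 'missing'})")
        self.exit_code = exit_code
        self.code = code
        self.message = message
        self.correlation_id = correlation_id


def parse_agent_result(exit_code: int, stdout: str, stderr: str) -> dict:
    """Interpret one `--output json --no-input` invocation.

    ASSUMPTION: success prints JSON on stdout; failure prints a flat
    error envelope on stderr and exits non-zero.
    """
    if exit_code == 0:
        return json.loads(stdout)
    try:
        envelope = json.loads(stderr)
    except json.JSONDecodeError:
        # Non-JSON stderr violates the contract; keep the raw text for debugging.
        raise CliError(exit_code, "non_canonical_error", stderr.strip(), "") from None
    raise CliError(
        exit_code,
        envelope.get("code", "unknown"),
        envelope.get("message", ""),
        envelope.get("correlation_id", ""),
    )
```

Keeping results and errors on separate streams means the caller never has to guess which parser applies; the exit code alone decides.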

Role-Activated Capability Model

The CLI should expose more than end-user workflows, but only according to role and endpoint policy.

General rule

Capability activation is policy/IAM-gated, not client-gated.

The CLI may render or enable commands based on effective identity, but enforcement remains server-side.

Example capability tiers

project member / project admin
app, allocation, billing, project-scoped workflows
tenant admin
tenant membership, project administration, federation setup
platform ops
observability triage, support-oriented read paths, runbook lookup
platform superadmin
break-glass platform controls
service account
project-scoped automation only

Service-account token rules must stay explicit in the CLI surface:
1. token mint remains bounded by server-side TTL policy,
2. the CLI must expose the effective expiry returned by the API,
3. any future scoped-down token request must be an explicit server contract, not a client-side convention.
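Rule 2 means the CLI reports the expiry the server granted, not one computed from the TTL the caller asked for. A minimal sketch, assuming the mint response carries an RFC 3339 `expires_at` field (a hypothetical name):

```python
from datetime import datetime, timezone
from typing import Optional


def expiry_summary(mint_response: dict, now: Optional[datetime] = None) -> dict:
    """Summarize the server-reported expiry of a minted service-account token.

    ASSUMPTION: the mint response carries an RFC 3339 "expires_at" field.
    Expiry is never derived from the requested TTL, because server-side
    policy may grant a shorter lifetime.
    """
    raw = mint_response.get("expires_at")
    if not raw:
        raise ValueError("mint response has no expires_at; refusing to guess an expiry")
    # fromisoformat() before Python 3.11 rejects a trailing "Z", so normalize it.
    expires_at = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return {
        "expires_at": expires_at.isoformat(),
        "expires_in_s": int((expires_at - now).total_seconds()),
    }
```

Note that the summary deliberately excludes the token itself, so it is safe to print in human mode.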

Ops/debug extension

The same client model can expose support/debug workflows such as:
1. gpuaas ops incident lookup --correlation-id <id>
2. gpuaas ops trace open --trace-id <id>
3. gpuaas ops runbook show <runbook_id>
4. gpuaas ops app-instance inspect <id>
5. gpuaas ops fleet health
6. gpuaas ops node metrics <node_id>

Rules:
1. These must be implemented using public/admin contracts, not undocumented DB shortcuts.
2. Availability depends on role.
3. Output must remain safe for logs and copy/paste into tickets.
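Because every failure carries a correlation_id, a client can turn an error directly into the ops follow-ups listed above and into a ticket-safe summary. The sketch assumes the error envelope may also carry an optional `trace_id`; that field and the helper names are hypothetical. Whether the caller may actually run the suggested commands is still decided server-side by role.

```python
import shlex


def support_followups(error: dict) -> list:
    """Suggest role-gated ops commands for a canonical error envelope.

    ASSUMPTION: the envelope may carry an optional "trace_id" alongside
    code, message, and correlation_id.
    """
    commands = [["gpuaas", "ops", "incident", "lookup", "--correlation-id", error["correlation_id"]]]
    if error.get("trace_id"):
        commands.append(["gpuaas", "ops", "trace", "open", "--trace-id", error["trace_id"]])
    # shlex.join quotes any argument that would otherwise break a shell paste.
    return [shlex.join(cmd) for cmd in commands]


def ticket_line(error: dict) -> str:
    """One ticket-safe line: only canonical fields, never request details or credentials."""
    return f"[{error['code']}] {error['message']} correlation_id={error['correlation_id']}"
```

Building the ticket line from an allow-list of canonical fields, rather than redacting a full dump, keeps new envelope fields from leaking into tickets by default.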

MCP / Skill / LLM Integration Direction

The CLI should become the stable execution substrate for MCP/skill/LLM integrations.

Why

It centralizes auth, context handling, idempotency, and output normalization.
It prevents each agent integration from re-implementing control-plane semantics.
It makes support/debug actions auditable through the same platform paths.

Required properties

every command has a stable JSON mode
every error preserves code, message, correlation_id
commands can run non-interactively
output includes explicit identifiers:
org_id
project_id
resource_name
trace_id where available
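These properties are checkable mechanically, for instance in CI over recorded command outputs. The sketch below assumes the identifiers sit at the top level of a single-resource JSON payload, which is an assumption rather than the current output layout; list outputs or org-level resources would need their own rules.

```python
REQUIRED_ON_SUCCESS = ("org_id", "project_id", "resource_name")
REQUIRED_ON_ERROR = ("code", "message", "correlation_id")


def contract_violations(payload: dict, is_error: bool) -> list:
    """List the contract fields missing (or empty) in one JSON-mode output object.

    trace_id is intentionally not required: it is present only where available.
    """
    required = REQUIRED_ON_ERROR if is_error else REQUIRED_ON_SUCCESS
    return [key for key in required if not payload.get(key)]
```

An empty list means the payload meets the contract; a non-empty list names exactly which identifiers a command forgot to emit.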

Recommended pattern

agents call curated CLI workflows first
agents fall back to introspection/generic API mode only when no curated workflow exists
privileged ops commands are exposed only when caller role authorizes them
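The recommended order can be expressed as a small planner: look up a curated workflow first, gate ops commands on the caller's role, and only then fall back to raw API mode. The intent table, the `platform_ops` tier name, and the API path in the usage note are hypothetical, and restricting the fallback to `get` is a choice made for this sketch, not a stated rule.

```python
from typing import Optional

AGENT_FLAGS = ["--output", "json", "--no-input"]

# HYPOTHETICAL intent table mapping agent intents to curated workflows.
CURATED = {
    "list_app_instances": ["apps", "instances", "list"],
    "show_context": ["context", "show"],
    "fleet_health": ["ops", "fleet", "health"],
}


def plan_command(intent: str, api_path: Optional[str] = None, caller_tiers: tuple = ()) -> list:
    """Prefer a curated workflow; fall back to a read-only raw API call."""
    workflow = CURATED.get(intent)
    if workflow is not None:
        # Only offer ops commands when the caller's role allows them;
        # the server still enforces this independently.
        if workflow[0] == "ops" and "platform_ops" not in caller_tiers:
            raise PermissionError(f"{intent} requires an ops-capable role")
        return ["gpuaas", *workflow, *AGENT_FLAGS]
    if api_path is None:
        raise LookupError(f"no curated workflow for {intent!r} and no API path given")
    # Fallback is restricted to GET in this sketch so it cannot mutate state.
    return ["gpuaas", "api", "get", api_path, *AGENT_FLAGS]
```

For example, an intent with no curated workflow, such as a quota lookup at a hypothetical `/v1/projects/proj-123/quota`, becomes `gpuaas api get /v1/projects/proj-123/quota --output json --no-input`.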

App Control Plane Implications

An agent acting on instructions from an app team should be able to build and operate software on GPUaaS through the App Control Plane.

That means the CLI model must support:
1. app catalog discovery
2. entitlement-aware deployment workflows
3. app instance lifecycle operations
4. service-account-centered automation
5. project boundary enforcement
6. artifact publication and promotion

Required app-team workflows

gpuaas apps catalog list
gpuaas apps deploy
gpuaas apps instances list|get|upgrade|rollback|decommission
gpuaas service-accounts create|list|token
gpuaas iam project-members list
gpuaas apps artifacts list
gpuaas apps artifacts publish-intent
gpuaas apps artifacts register
gpuaas apps artifacts verify|revoke
gpuaas apps artifacts promote|deprecate|retire
gpuaas apps shared-runtimes list|get|create|delete
gpuaas apps shared-runtimes attachments list|get|create|delete
gpuaas apps shared-runtimes workers list|get
gpuaas apps shared-runtimes worker-operations list|get|create

Required artifact workflow shape

The CLI should treat artifact publication as a curated multi-step workflow, not just raw endpoint calls.

Minimum required commands:
1. gpuaas apps artifacts publish-intent
   - returns a repository or upload path plus credential-delivery metadata
2. gpuaas apps artifacts push-oci
   - optional helper that consumes a wrapped credential and performs the registry push
3. gpuaas apps artifacts register
   - registers the immutable digest or blob source URI with the control plane
4. gpuaas apps artifacts promote
   - requires a verified trust state and an explicit channel/environment
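Treating this as a curated workflow lets the CLI fail fast when steps run out of order. The sketch below is a hypothetical client-side guard: the server remains the authority on trust state and promotion, the optional push-oci step is omitted, and the `sha256:` digest prefix and `://` URI check are simplifying assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ArtifactWorkflow:
    """Client-side ordering guard for publish-intent, register, verify, promote.

    HYPOTHETICAL sketch: it only rejects calls the server would refuse anyway.
    """

    intent_id: Optional[str] = None
    source: Optional[str] = None  # immutable digest or blob source URI
    verified: bool = False
    channels: List[str] = field(default_factory=list)

    def publish_intent(self, intent_id: str) -> None:
        self.intent_id = intent_id

    def register(self, source: str) -> None:
        if self.intent_id is None:
            raise RuntimeError("register requires a publish intent")
        # ASSUMPTION: digests look like "sha256:<hex>"; blob sources are URIs.
        if not (source.startswith("sha256:") or "://" in source):
            raise ValueError("register needs an immutable digest or a source URI, not a tag")
        self.source = source

    def record_verification(self, verified: bool) -> None:
        if self.source is None:
            raise RuntimeError("verify requires a registered artifact")
        self.verified = verified

    def promote(self, channel: str) -> None:
        if not self.verified:
            raise RuntimeError("promote requires a verified trust state")
        if not channel:
            raise ValueError("promote requires an explicit channel/environment")
        self.channels.append(channel)
```

Rejecting mutable tags at register time is what keeps the digest or source URI explicit all the way through promotion.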

Rules:
1. the digest or canonical source URI must remain explicit in command output,
2. wrapped secrets must never be echoed in human-readable default output,
3. JSON mode must preserve the full publish-intent response for agent use,
4. the CLI must not invent hidden side-channel artifact flows outside the App Control Plane.
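Rules 2 and 3 pull in opposite directions, so the renderer has to branch on output mode. A minimal sketch, assuming the wrapped credential appears under a key named `wrapped_secret` (a hypothetical name) somewhere in the publish-intent response:

```python
import json

# ASSUMPTION: "wrapped_secret" is a hypothetical field name for the wrapped credential.
SECRET_KEYS = {"wrapped_secret"}


def redact(value):
    """Recursively replace secret-bearing fields with a placeholder."""
    if isinstance(value, dict):
        return {k: "<redacted>" if k in SECRET_KEYS else redact(v) for k, v in value.items()}
    if isinstance(value, list):
        return [redact(v) for v in value]
    return value


def render_publish_intent(response: dict, output: str = "human") -> str:
    if output == "json":
        # Full response, including the wrapped credential, so an agent can
        # hand it to the push step.
        return json.dumps(response, sort_keys=True)
    safe = redact(response)
    # Human mode: one key per line, secrets masked, repository/path left visible.
    return "\n".join(f"{key}: {safe[key]}" for key in sorted(safe))
```

Redaction applies only to the human renderer, so the JSON contract stays lossless while terminal output stays safe to paste.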

Required invariants

app-building agents must use the same contracts as internal teams
no hidden internal-app privilege path
app-team automation must be service-account compatible
app lifecycle failures must remain correlation-first and supportable

Data Contract Requirements for CLI v2

To make the CLI truly agent-operable, future commands should prefer envelopes with:
1. explicit scope metadata
2. stable pagination fields
3. operation status and async lifecycle hints
4. correlation and trace references where meaningful
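Stable pagination fields let an agent drain a listing without command-specific logic. The sketch below follows a cursor until it runs out; the page shape `{"items": [...], "meta": {"next_cursor": ...}}` is a hypothetical layout, and `fetch_page` stands in for whatever runs the CLI or SDK call.

```python
from typing import Callable, Iterator, Optional


def iter_items(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every item across pages by following a cursor.

    ASSUMPTION: each page looks like {"items": [...], "meta": {"next_cursor": ...}}
    and a missing or empty next_cursor marks the last page.
    """
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield from page.get("items", [])
        cursor = page.get("meta", {}).get("next_cursor")
        if not cursor:
            return
```

An opaque cursor, rather than page numbers, stays correct even when items are added between calls.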

Preferred command response shape:
1. primary resource/object payload
2. optional meta
3. optional links
4. optional next_actions

This does not replace current API contracts immediately. It is a direction for future CLI-facing surfaces.
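A minimal sketch of that shape as a typed envelope follows. The top-level key `data` for the primary payload and the use of plain command strings in `next_actions` are hypothetical choices; the point is that optional sections are emitted only when populated.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class CliResponse:
    """Hypothetical CLI-facing envelope: primary payload plus optional sections."""

    data: Any
    meta: Dict[str, Any] = field(default_factory=dict)
    links: Dict[str, str] = field(default_factory=dict)
    next_actions: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        body = {"data": self.data}
        # Emit optional sections only when populated, so their presence
        # always carries information.
        for key in ("meta", "links", "next_actions"):
            value = getattr(self, key)
            if value:
                body[key] = value
        return json.dumps(body, sort_keys=True)
```

Scope metadata such as `project_id` or `correlation_id` would live in `meta`, keeping the primary payload identical to the underlying API object.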

What We Should Explicitly Not Do

Do not build a fully dynamic Google-discovery-style CLI as the primary interface.
Do not expose internal-only debugging primitives directly in end-user command groups.
Do not let agent workflows bypass IAM by running with hidden shared credentials.
Do not duplicate app-operator APIs in a separate CLI-only backend.

Proposed Next Iteration

After the CLI v1 baseline, the next CLI work should be:
1. evolve mcp serve from CLI-proxy tool execution to richer typed tool metadata and safer structured tool arguments
2. expose more command metadata in explain --output json, including examples and required-flag hints
3. add shared-runtime contribution/operator workflows where API coverage exists
4. add full apps artifacts workflow coverage, including publish intent and promotion
5. continue role-gated ops/debug workflow expansion
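For item 1, MCP tool definitions already carry a `name`, a `description`, and a JSON Schema `inputSchema`, which is a natural place for typed arguments. The sketch below declares one hypothetical tool wrapping `gpuaas apps instances rollback`, checks a small subset of JSON Schema before proxying, and maps the result to argv. The tool name, the positional instance argument, and the `--project-id` flag are assumptions; a production server would use a full JSON Schema validator.

```python
# HYPOTHETICAL typed tool definition in MCP's tool shape.
ROLLBACK_TOOL = {
    "name": "apps_instances_rollback",
    "description": "Roll back an app instance (proxies `gpuaas apps instances rollback`).",
    "inputSchema": {
        "type": "object",
        "properties": {
            "project_id": {"type": "string"},
            "instance_id": {"type": "string"},
        },
        "required": ["project_id", "instance_id"],
        "additionalProperties": False,
    },
}


def validate_arguments(schema: dict, args: dict) -> None:
    """Check a small subset of JSON Schema: required, additionalProperties, string type."""
    missing = [key for key in schema.get("required", []) if key not in args]
    if missing:
        raise ValueError("missing arguments: " + ", ".join(missing))
    properties = schema.get("properties", {})
    if schema.get("additionalProperties") is False:
        unexpected = sorted(set(args) - set(properties))
        if unexpected:
            raise ValueError("unexpected arguments: " + ", ".join(unexpected))
    for key, value in args.items():
        if properties.get(key, {}).get("type") == "string" and not isinstance(value, str):
            raise TypeError(f"argument {key!r} must be a string")


def rollback_argv(args: dict) -> list:
    """Validate, then translate arguments into the curated CLI command (flags hypothetical)."""
    validate_arguments(ROLLBACK_TOOL["inputSchema"], args)
    return ["gpuaas", "apps", "instances", "rollback", args["instance_id"],
            "--project-id", args["project_id"], "--output", "json", "--no-input"]
```

Rejecting unknown keys up front is what makes the arguments "safer": a model cannot smuggle extra flags through to the CLI.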

doc/architecture/CLI_PythonSDK_v1_Plan.md
doc/architecture/App_Control_Plane_v1.md
doc/architecture/Build_an_App_for_GPUaaS_v1.md
doc/architecture/Scheduler_as_Platform_App_v1.md
doc/architecture/Service_Account_Model.md