App Developer Starter Pack v1

As of: March 30, 2026

Purpose

Provide one entrypoint for an app developer who wants to build against the GPUaaS App Platform without reverse-engineering the rest of the repo.

This document answers:

1. what to read first,
2. which APIs are the source of truth,
3. how IAM and resource hierarchy work,
4. what SDK and CLI support exists today,
5. what is implemented now versus still directional.

This is the shortest document set that should be handed to:

- an internal app team,
- an external platform-app team,
- an agent building or operating an app on behalf of a team.

Start Here

Read these in order:

1. API contract:
   - doc/api/openapi.draft.yaml
   - doc/api/asyncapi.draft.yaml
2. build path:
   - doc/architecture/Build_an_App_for_GPUaaS_v1.md
3. external integration boundary:
   - doc/architecture/External_App_Team_Integration_Guide_v1.md
4. app control plane:
   - doc/architecture/App_Control_Plane_v1.md
5. app worker contract direction:
   - doc/architecture/App_Runtime_External_Worker_Contract_v1.md
6. quickstart:
   - doc/architecture/App_Platform_Quickstart_v1.md
7. UI integration:
   - doc/architecture/App_UI_Extension_Model_v1.md
8. manifest and version onboarding:
   - doc/architecture/App_Manifest_Registration_Guide_v1.md

If you are building a clustered or scheduler-style app, also read:

- doc/architecture/Example_App_Developer_Reference_Workflow_v1.md
- doc/architecture/Slurm_Tenant_Scope_Semantics_v1.md
- doc/architecture/App_Tenant_Shared_Attachment_Model_v1.md

What The Platform Gives You

GPUaaS gives an app developer:

- identity and IAM
- project and tenant resource hierarchy
- app catalog and entitlement surfaces
- app instance and shared runtime resource models
- allocation and placement primitives
- service-account and delegated machine identity
- access-credential custody and delivery
- app-managed bootstrap SSH trust reconcile for app-instance-bound node bootstrap
- billing attribution primitives
- audit and correlation surfaces
- CLI and SDK clients over the same public API

The platform does not expect the app developer to:

- access the database directly
- patch platform-core code for every new runtime
- rely on undocumented routes
- build against internal Go package behavior as the contract
- require operators to edit authorized_keys manually as part of the normal app bootstrap path

Platform Mental Model

GPUaaS is a control plane for infrastructure and platform apps.

That means:

- the platform owns identity, IAM, resource ownership, allocation lifecycle, billing attribution, audit, and common UX shells
- the app developer owns runtime-specific controller logic, runtime-specific bootstrap, runtime-specific health, and runtime-specific operational behavior

The easiest way to think about GPUaaS is:

Infrastructure control plane
  -> capacity, allocations, identity, billing, audit

App control plane
  -> catalog, entitlements, app instances, shared runtimes, operations

App-owned runtime logic
  -> install, configure, bootstrap, reconcile, recover, report

The app platform is not asking the app developer to invent tenancy, auth, billing, or secure credential delivery. It is asking the app developer to implement runtime intelligence on top of those primitives.

Resource Hierarchy

The core ownership hierarchy is:

Organization (tenant)
  -> Project
    -> Users and memberships
    -> Service accounts
    -> Project-owned app instances

For tenant-shared runtimes, the hierarchy extends to:

Organization (tenant)
  -> Shared app runtime
    -> Shared runtime attachments
      -> Attached consumer projects
    -> Shared workers
    -> Shared worker operations
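
The two trees above can be read as a plain containment model. The following is a minimal sketch only: the class and field names are illustrative assumptions, not the schema in doc/api/openapi.draft.yaml, which remains authoritative.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Project:
    project_id: str
    members: List[str] = field(default_factory=list)           # users and memberships
    service_accounts: List[str] = field(default_factory=list)
    app_instances: List[str] = field(default_factory=list)     # project-owned app instances


@dataclass
class SharedRuntimeAttachment:
    attachment_id: str
    consumer_project_id: str  # the attached consumer project


@dataclass
class SharedAppRuntime:
    shared_runtime_id: str
    attachments: List[SharedRuntimeAttachment] = field(default_factory=list)
    workers: List[str] = field(default_factory=list)
    worker_operations: List[str] = field(default_factory=list)

    def attached_project_ids(self) -> List[str]:
        """Projects that consume this tenant-owned runtime through an attachment."""
        return [a.consumer_project_id for a in self.attachments]


@dataclass
class Organization:
    org_id: str  # the tenant
    projects: List[Project] = field(default_factory=list)
    shared_runtimes: List[SharedAppRuntime] = field(default_factory=list)


# Hypothetical identifiers, for illustration only.
runtime = SharedAppRuntime(
    shared_runtime_id="rt-1",
    attachments=[SharedRuntimeAttachment("att-1", "proj-a")],
)
org = Organization(
    org_id="org-1",
    projects=[Project("proj-a"), Project("proj-b")],
    shared_runtimes=[runtime],
)
```

Note that the shared runtime hangs off the organization, not off any single project. Projects reach it only through an explicit attachment.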

Read:

- doc/architecture/App_Control_Plane_v1.md
- doc/architecture/App_Tenant_Shared_Attachment_Model_v1.md
- doc/architecture/App_Tenant_Shared_Runtime_API_Direction_v1.md

Implementation Model

An app on GPUaaS is not just a UI card in a catalog.

The implementation model has four layers:

1. Catalog layer

Defines:

- app slug
- published versions
- entitlement rules
- optional version metadata and artifact references

2. Control-plane resource layer

Defines operator-facing resources such as:

- project app instances
- tenant-shared runtimes
- attachments
- workers
- operations

These resources are what humans, agents, SDKs, and the platform shell interact with.

3. Worker/operator layer

This is app-owned runtime logic.

The worker/operator:

- reads runtime state from public APIs
- consumes placement and credentials
- reconciles any app-owned bootstrap trust it needs onto the selected node user through the supported platform path
- bootstraps or reconfigures the runtime
- reports progress and failure back through public APIs
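
The worker/operator responsibilities above can be sketched as an ordered pass of steps with reporting after each one. This is an illustration under stated assumptions: the step functions are stand-ins that only mutate a dictionary, the status strings `succeeded` and `failed` are hypothetical, and a real worker would call the public API routes defined in OpenAPI instead.

```python
from typing import Callable, Dict, List, Tuple

# A step takes the working context and returns an updated copy.
Step = Tuple[str, Callable[[Dict], Dict]]
Reporter = Callable[[str, str, str], None]


def run_worker_pass(steps: List[Step], report: Reporter) -> bool:
    """Run steps in order, report each outcome, and stop at the first failure."""
    context: Dict = {}
    for name, step in steps:
        try:
            context = step(context)
        except Exception as exc:
            report(name, "failed", str(exc))
            return False
        report(name, "succeeded", "")
    return True


# Stand-in steps mirroring the list above (hypothetical names and payloads).
def read_runtime_state(ctx: Dict) -> Dict:
    return {**ctx, "desired_state": "running"}


def consume_placement_and_credentials(ctx: Dict) -> Dict:
    return {**ctx, "node": "node-1", "credential_ref": "cred-1"}


def reconcile_bootstrap_trust(ctx: Dict) -> Dict:
    return {**ctx, "trust_reconciled": True}


def bootstrap_runtime(ctx: Dict) -> Dict:
    if not ctx.get("trust_reconciled"):
        raise RuntimeError("bootstrap trust not reconciled")
    return {**ctx, "bootstrapped": True}


events: List[Tuple[str, str, str]] = []
ok = run_worker_pass(
    [
        ("read_runtime_state", read_runtime_state),
        ("consume_placement_and_credentials", consume_placement_and_credentials),
        ("reconcile_bootstrap_trust", reconcile_bootstrap_trust),
        ("bootstrap_runtime", bootstrap_runtime),
    ],
    report=lambda name, status, detail: events.append((name, status, detail)),
)
```

Reporting after every step, including the failing one, keeps progress and failure visible through the control plane rather than only in worker logs.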

4. Runtime/data-plane layer

This is the actual software the app team cares about:

- Slurm
- Ray
- MLflow
- model gateways
- other distributed runtimes

GPUaaS does not want to absorb that runtime-specific SME logic into the platform core unless it proves to be a reusable primitive.

Three-Axis Runtime Model

Every serious app team needs to understand these three fields because they describe how the runtime is actually deployed and governed.

1. `operating_mode`

This says who operates the service shape.

Current values:

- tenant_dedicated
- platform_managed

Meaning:

- tenant_dedicated: the runtime is tenant-owned or tenant-isolated
- platform_managed: future shared platform-operated service model

2. `control_plane_scope`

This says where the runtime control plane lives.

Current values:

- project
- tenant
- platform

Meaning:

- project: one project owns and operates its own runtime
- tenant: one tenant-owned runtime may serve multiple attached projects
- platform: future shared platform-operated runtime

3. `tenant_boundary_mode`

This says what isolation guarantee the runtime is expected to provide.

Current values are documented in the API/model docs and should be read back from the effective resource shape.

This field exists because "tenant scope" and "shared substrate" are not the same thing.

Product-Facing Placement And Ownership Modes

For practical conversation with app teams, these combinations are the useful shorthand:

Project-scoped mode

Usually means:

- operating_mode = tenant_dedicated
- control_plane_scope = project

Use when:

- each project wants its own isolated runtime
- cross-project sharing is not wanted
- billing and ownership should stay simple

Tenant-owned shared mode

Usually means:

- operating_mode = tenant_dedicated
- control_plane_scope = tenant

Use when:

- one tenant-owned control plane should serve multiple projects
- sharing is explicit and policy-controlled
- worker contribution or job submission may come from attached projects

Platform-managed shared mode

Usually means:

- operating_mode = platform_managed
- control_plane_scope = platform

Use when:

- the runtime is eventually offered as a platform-operated shared service

This is modeled directionally but should not be treated as the default shipped path for new apps yet.
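
The shorthand above is a mapping from two of the three axes to a mode name. A minimal sketch follows, using the field values listed in this document. The enum classes, the function name, and the returned labels are illustrative assumptions, and `tenant_boundary_mode` is left out because its values are defined in the API/model docs.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class OperatingMode(str, Enum):
    TENANT_DEDICATED = "tenant_dedicated"
    PLATFORM_MANAGED = "platform_managed"


class ControlPlaneScope(str, Enum):
    PROJECT = "project"
    TENANT = "tenant"
    PLATFORM = "platform"


# Only the three combinations named in this section are treated as known.
_SHORTHAND: Dict[Tuple[OperatingMode, ControlPlaneScope], str] = {
    (OperatingMode.TENANT_DEDICATED, ControlPlaneScope.PROJECT): "project-scoped",
    (OperatingMode.TENANT_DEDICATED, ControlPlaneScope.TENANT): "tenant-owned shared",
    (OperatingMode.PLATFORM_MANAGED, ControlPlaneScope.PLATFORM): "platform-managed shared",
}


def placement_shorthand(operating_mode: str, control_plane_scope: str) -> Optional[str]:
    """Return the shorthand mode name, or None for unknown values or combinations."""
    try:
        key = (OperatingMode(operating_mode), ControlPlaneScope(control_plane_scope))
    except ValueError:
        return None
    return _SHORTHAND.get(key)


mode = placement_shorthand("tenant_dedicated", "tenant")
```

Returning `None` for unlisted combinations, instead of guessing, matches the "usually means" wording: the effective resource shape read back from the API is what counts.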

IAM And Machine Identity

App developers need two identity models today:

1. Project-scoped service account

Use for:

- project-owned app instances
- project-scoped automation
- project-scoped access-credential delivery

Read:

- doc/architecture/Service_Account_Model.md
- doc/architecture/Role_and_Policy_Lifecycle_Model.md

2. Tenant-shared runtime operator identity

Use for:

- tenant-owned shared runtimes
- shared runtime read/report flows
- shared worker and attachment reporting

Read:

- doc/architecture/Tenant_Scoped_App_Machine_Identity_v1.md
- doc/architecture/Shared_Runtime_Operator_Authz_Model_v1.md

API Surfaces App Developers Should Use

The source of truth is always:

- doc/api/openapi.draft.yaml
- doc/api/asyncapi.draft.yaml

The main API families relevant to app developers are:

Catalog and entitlement

GET /api/v1/apps/catalog
GET /api/v1/apps/catalog/{app_slug}/versions
GET /api/v1/projects/{project_id}/apps/entitlements
PUT /api/v1/projects/{project_id}/apps/entitlements/{app_slug}

Project-owned app instances

GET /api/v1/projects/{project_id}/app-instances
POST /api/v1/projects/{project_id}/app-instances
GET /api/v1/projects/{project_id}/app-instances/{app_instance_id}
DELETE /api/v1/projects/{project_id}/app-instances/{app_instance_id}
POST /api/v1/projects/{project_id}/app-instances/{app_instance_id}/upgrade
POST /api/v1/projects/{project_id}/app-instances/{app_instance_id}/rollback
POST /api/v1/projects/{project_id}/app-instances/{app_instance_id}/decommission

Generic clustered app member flows

GET /api/v1/projects/{project_id}/app-instances/{app_instance_id}/members
GET /api/v1/projects/{project_id}/app-instances/{app_instance_id}/members/{member_id}
POST /api/v1/projects/{project_id}/app-instances/{app_instance_id}/member-operations
GET /api/v1/projects/{project_id}/app-instances/{app_instance_id}/member-operations/{operation_id}
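
The POST-then-GET shape of the member-operation routes suggests submitting an operation and then reading it back until it settles. The sketch below shows one way to poll, with the HTTP read injected as a callable. The `status` field and the terminal values `succeeded` and `failed` are assumptions for illustration; check the operation schema in OpenAPI for the real field names and states.

```python
import time
from typing import Callable, Dict, Iterable


def wait_for_operation(
    fetch: Callable[[], Dict],
    terminal_statuses: Iterable[str] = ("succeeded", "failed"),
    interval_s: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Call fetch() until the operation reports a terminal status or attempts run out."""
    terminal = set(terminal_statuses)
    for attempt in range(max_attempts):
        operation = fetch()
        if operation.get("status") in terminal:
            return operation
        if attempt < max_attempts - 1:
            sleep(interval_s)  # no sleep after the final attempt
    raise TimeoutError(f"operation not terminal after {max_attempts} attempts")


# Demo with canned responses instead of a real GET on the operation route.
_responses = iter([{"status": "running"}, {"status": "running"}, {"status": "succeeded"}])
final = wait_for_operation(lambda: next(_responses), sleep=lambda _s: None)
```

Injecting `fetch` and `sleep` keeps the helper independent of any HTTP client and makes it testable without waiting.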

Tenant-shared runtimes

GET /api/v1/orgs/{org_id}/shared-app-runtimes
POST /api/v1/orgs/{org_id}/shared-app-runtimes
GET /api/v1/orgs/{org_id}/shared-app-runtimes/{shared_runtime_id}
DELETE /api/v1/orgs/{org_id}/shared-app-runtimes/{shared_runtime_id}

Tenant-shared attachments

GET /api/v1/orgs/{org_id}/shared-app-runtimes/{shared_runtime_id}/attachments
POST /api/v1/orgs/{org_id}/shared-app-runtimes/{shared_runtime_id}/attachments
GET /api/v1/orgs/{org_id}/shared-app-runtimes/{shared_runtime_id}/attachments/{attachment_id}
DELETE /api/v1/orgs/{org_id}/shared-app-runtimes/{shared_runtime_id}/attachments/{attachment_id}

Tenant-shared workers and operations

GET /api/v1/orgs/{org_id}/shared-app-runtimes/{shared_runtime_id}/workers
GET /api/v1/orgs/{org_id}/shared-app-runtimes/{shared_runtime_id}/workers/{worker_id}
POST /api/v1/orgs/{org_id}/shared-app-runtimes/{shared_runtime_id}/worker-operations
GET /api/v1/orgs/{org_id}/shared-app-runtimes/{shared_runtime_id}/worker-operations/{operation_id}

Project-contributed worker flow

POST /api/v1/projects/{project_id}/shared-runtime-attachments/{attachment_id}/worker-operations

Reporting and credential/placement support

- service-account token mint
- shared-runtime operator token mint
- allocation reads
- access-credential delivery

The exact request and response bodies live in OpenAPI and remain authoritative.
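
To make the split between the project-owned and tenant-shared route families concrete, here is a small sketch that builds paths for both and prepares an authenticated GET without sending it. The paths come from the lists above. The helper names, the example base URL, and the use of an `Authorization: Bearer` header are assumptions; the auth scheme in particular should be confirmed against OpenAPI.

```python
from typing import Optional
from urllib.parse import quote
from urllib.request import Request


def _seg(value: str) -> str:
    """Percent-encode one path segment so IDs cannot inject extra slashes."""
    return quote(value, safe="")


def project_app_instance_path(project_id: str, app_instance_id: Optional[str] = None) -> str:
    path = f"/api/v1/projects/{_seg(project_id)}/app-instances"
    if app_instance_id is not None:
        path += f"/{_seg(app_instance_id)}"
    return path


def shared_runtime_path(
    org_id: str,
    shared_runtime_id: Optional[str] = None,
    collection: Optional[str] = None,  # e.g. "attachments", "workers", "worker-operations"
) -> str:
    if collection is not None and shared_runtime_id is None:
        raise ValueError("collection requires shared_runtime_id")
    path = f"/api/v1/orgs/{_seg(org_id)}/shared-app-runtimes"
    if shared_runtime_id is not None:
        path += f"/{_seg(shared_runtime_id)}"
    if collection is not None:
        path += f"/{collection}"
    return path


def build_get(base_url: str, path: str, token: str) -> Request:
    """Prepare (but do not send) a GET request. Bearer auth is an assumption."""
    request = Request(base_url.rstrip("/") + path, method="GET")
    request.add_header("Authorization", f"Bearer {token}")
    request.add_header("Accept", "application/json")
    return request


# Hypothetical base URL, IDs, and token, for illustration only.
req = build_get(
    "https://gpuaas.example.com/",
    shared_runtime_path("org-1", "rt-1", "workers"),
    token="example-token",
)
```

Passing the prepared request to `urllib.request.urlopen` would send it. Project-owned routes are keyed by `project_id`, while tenant-shared routes are keyed by `org_id`, which is the quickest way to tell the two families apart.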

SDK And CLI Support

The current operator-facing tooling is implemented and usable; it is not a placeholder.

Go SDK

Use:

- pkg/sdk

Current relevant support includes:

- shared runtimes
- attachments
- shared workers
- shared worker operations

Example:

- pkg/sdk/shared_runtimes.go

Python SDK

Use:

- packages/python-sdk

Current relevant support includes:

- catalog
- allocations
- billing
- terminal token minting
- shared runtimes
- attachments
- shared workers
- shared worker operations

Read:

- packages/python-sdk/README.md

CLI

Use:

- cmd/gpuaas-cli

Current relevant support includes:

- gpuaas apps shared-runtimes ...
- gpuaas schema <resource>
- gpuaas explain <command>
- gpuaas mcp serve

Read:

- doc/architecture/CLI_Agent_Operable_Control_Plane_v2.md
- doc/architecture/CLI_PythonSDK_v1_Plan.md

Implemented Now vs Still Directional

Implemented now

- app catalog and entitlement APIs
- project app instance lifecycle APIs
- tenant-shared runtime, attachment, worker, and worker-operation APIs
- service-account auth for project automation
- shared-runtime operator auth for tenant-shared runtime automation
- allocation read APIs for placement and bootstrap
- access-credential delivery APIs
- app shell extension seam
- Go SDK and Python SDK coverage for shared runtimes
- CLI coverage for shared runtimes and introspection

Still directional or incomplete

- fully externalized app-worker delivery model
- manifest-based third-party app registration flow
- schema-backed app manifest validation and deploy-form generation as the primary onboarding path
- final external app-worker packaging story
- final public transport choice for app-worker delivery if NATS is exposed directly

Minimum Package To Hand To An App Team Today

If you had to hand an app team only one short package today, it should be:

1. doc/api/openapi.draft.yaml
2. doc/api/asyncapi.draft.yaml
3. doc/architecture/App_Developer_Starter_Pack_v1.md
4. doc/architecture/Build_an_App_for_GPUaaS_v1.md
5. doc/architecture/External_App_Team_Integration_Guide_v1.md
6. doc/architecture/App_Runtime_External_Worker_Contract_v1.md
7. packages/python-sdk/README.md
8. doc/architecture/App_Manifest_Registration_Guide_v1.md

For UI-heavy apps, also include:

9. doc/architecture/App_UI_Extension_Model_v1.md

Dry-Run Questions This Package Should Answer

An app developer should be able to answer these from the docs:

1. How do I authenticate my app worker?
2. Which API family do I use for project-owned versus tenant-shared runtimes?
3. How do I read placement and allocation data?
4. How do I get bootstrap credentials securely?
5. How do I report runtime status and operation outcomes?
6. Which SDK or CLI surface already exists for these flows?
7. Which parts are fully implemented versus still architectural direction?

If the package cannot answer one of those clearly, the docs are not ready yet.

Current Recommendation

Use this starter pack as the top-level handoff document for app-platform work until a richer external developer portal exists.

The next documentation step after this should be the canonical manifest schema and registration API/import contract, once that contract is explicit enough to support real third-party onboarding.