Skip to content

Node Bootstrap Script and Token v1

As of: March 10, 2026

Purpose

Define the first operator-usable bootstrap flow for node-agent installation.

The goal is to stop exposing raw OCI/Vault bootstrap metadata directly to operators and instead provide: 1. a single-use bootstrap token, 2. a control-plane-brokered bootstrap script, 3. one command for manual onboarding, 4. a cloud-init-compatible path for unattended onboarding.

Problem Statement

The current onboarding panel is now structurally correct, but it still exposes low-level bootstrap package fields that are useful for debugging, not for routine operator execution.

Today an operator still has to reason about: 1. OCI package reference, 2. wrapped pull credential, 3. Vault unwrap location, 4. install root and env file paths, 5. CA bundle material.

That is too much operational surface for: 1. manual onboarding, 2. cloud-init, 3. MaaS-delivered bootstrap.

Core Model

Use a dedicated bootstrap token, not a normal service account.

The bootstrap token is: 1. node-bound, 2. short-lived, 3. single-use or tightly bounded use, 4. valid only for bootstrap endpoints, 5. invalid for general API access.

The control plane uses that token to broker: 1. bootstrap package reference, 2. pull credential material, 3. trust bundle, 4. verifier set, 5. install instructions.

The node never talks directly to Vault for first bootstrap.

Identity Separation

Three identities remain distinct:

1. Bootstrap token

  • one-time provisioning credential
  • used only to fetch bootstrap script or bootstrap package material

2. Node certificate identity

  • long-lived runtime identity after enrollment
  • used for poll, renew, task execution, terminal stream

3. Service account identity

  • used for app operators, CI/CD, and platform automation
  • not used for first-hop node bootstrap

Do not collapse bootstrap token into a general service account.

Required Endpoints

The first usable implementation should add a brokered bootstrap-script endpoint.

1. Bootstrap script issuance

POST /api/v1/admin/nodes/{node_id}/bootstrap-script

Purpose: - mint or refresh a node bootstrap token, - resolve the current bootstrap package, - render a ready-to-run shell script for the operator or cloud-init path.

Expected response shape:

{
  "node_id": "uuid",
  "bootstrap_token": "single-use-token",
  "expires_at": "RFC3339",
  "script": "#!/usr/bin/env bash\n...",
  "script_sha256": "hex",
  "mode": "manual|cloud_init",
  "metadata": {
    "trust_bundle_version": "v1",
    "package_ref": "registry.example/platform/node-agent-bootstrap",
    "package_digest": "sha256:..."
  }
}

2. Brokered bootstrap fetch

POST /api/v1/bootstrap/nodes/{bootstrap_token}/script

Purpose: - return the rendered install script using only the scoped bootstrap token.

This endpoint is the one cloud-init or a node-side bootstrap helper should call.

Rules: 1. token must be scoped to one node, 2. token TTL must be short, 3. token must not authorize any non-bootstrap route, 4. token consumption must be auditable.

Script Responsibilities

The bootstrap script should do the following, in order:

  1. create base directories:
  2. /etc/gpuaas
  3. install root

  4. write trust material:

  5. control-plane CA bundle
  6. node-cert CA bundle path stub if needed

  7. write environment file:

  8. node id
  9. direct node API URL
  10. task-signing pubkey
  11. cert/key paths
  12. install paths

  13. obtain the bootstrap OCI package

  14. using control-plane-brokered short-lived pull credentials
  15. by digest, not tag

  16. verify expected digest

  17. unpack/install:

  18. node-agent binary
  19. env template resolution
  20. systemd unit
  21. approved container runtime prerequisite for allocation-local OCI workloads
  22. registry host trust and Docker pull login when registry credentials are configured for the environment

  23. enable/start:

  24. systemctl daemon-reload
  25. systemctl enable --now gpuaas-node-agent

  26. print success/failure guidance

Manual Operator Flow

The operator should see one command, not raw OCI metadata.

Target UX:

curl -fsSL -X POST \
  -H "Authorization: Bearer <admin-token>" \
  https://api.example.com/api/v1/admin/nodes/<node_id>/bootstrap-script?mode=manual \
  | sudo bash

Or, if the script is returned as JSON:

curl -fsSL -X POST \
  -H "Authorization: Bearer <admin-token>" \
  https://api.example.com/api/v1/admin/nodes/<node_id>/bootstrap-script?mode=manual \
  | jq -r .script | sudo bash

The UI should primarily show: 1. copyable bootstrap command, 2. optional download script, 3. advanced/debug details hidden behind a disclosure.

Cloud-Init Flow

Cloud-init should not reconstruct bundle state.

It should only need: 1. bootstrap token, 2. bootstrap-script endpoint, 3. optional stable API URL.

First-hop bootstrap rule

The first bootstrap fetch is expected to be a trust-establishing entrypoint, not a fully trusted runtime call.

Required invariants: 1. manual bootstrap and cloud-init bootstrap use the same first-hop model, 2. the first hop may remain plain HTTP when the node does not yet trust the platform CA or does not yet have the required bootstrap host override, 3. the fetched bootstrap script is responsible for writing: - /etc/gpuaas/ca-bundle.crt - the bootstrap /etc/hosts override derived from NODE_BOOTSTRAP_RESOLVE_ADDRESS 4. only after those trust/bootstrap steps are in place should subsequent package or API fetches require HTTPS trust, 5. changing the first-hop URL shape or transport is an API/bootstrap contract change and must be updated in the bootstrap rendering path, docs, and tests together.

Example shape:

write_files:
  - path: /etc/gpuaas/bootstrap-token
    permissions: "0600"
    content: "<bootstrap-token>"
runcmd:
  - curl -fsSL https://api.example.com/api/v1/bootstrap/nodes/$(cat /etc/gpuaas/bootstrap-token)/script | bash

Or, if MaaS/custom userdata templates are used:

runcmd:
  - /usr/local/bin/gpuaas-node-bootstrap --token "<bootstrap-token>"

Cloud-init should default to installing the systemd unit.

Container Runtime Bootstrap

Launchable OCI workloads require an approved local container runtime on the target node. Until infra owns this in the site cloud-init package, the node-agent bootstrap installer is responsible for a minimal runtime prerequisite:

  1. check for docker, podman, or nerdctl,
  2. if none is present and runtime bootstrap is enabled, install the configured package with apt,
  3. default package: docker.io,
  4. enable/start the installed runtime service when systemd is available.
  5. configure Docker registry trust from /etc/gpuaas/ca-bundle.crt and perform docker login using a bootstrap-token-brokered registry credential when the control plane exposes one.

This runs only during fresh bootstrap or rebootstrap. App deploy must not install host packages as a side effect. Existing nodes need rebootstrap or an infra-managed runtime install before workload.oci_launch can succeed.

The bootstrap script fetches runtime registry credentials from:

GET /api/v1/bootstrap/nodes/{bootstrap_token}/registry-credential

The endpoint is bootstrap-token scoped. It returns 204 when the environment has no registry pull credential configured, and the installer proceeds without a registry login. The script writes credentials only to a temporary root-owned env file and removes that file after installer execution.

The bootstrap renderer passes:

GPUAAS_NODE_AGENT_INSTALL_CONTAINER_RUNTIME=true
GPUAAS_NODE_AGENT_CONTAINER_RUNTIME_PACKAGE=docker.io
GPUAAS_NODE_AGENT_REGISTRY_ENV_FILE=/tmp/gpuaas-registry.env
GPUAAS_NODE_AGENT_REGISTRY_CA_BUNDLE=/etc/gpuaas/ca-bundle.crt

The API process can disable or change this with:

NODE_BOOTSTRAP_INSTALL_CONTAINER_RUNTIME=false
NODE_BOOTSTRAP_CONTAINER_RUNTIME_PACKAGE=podman

Security Requirements

  1. bootstrap token must be node-bound
  2. bootstrap token must have short TTL
  3. bootstrap token must not be reusable indefinitely
  4. bootstrap token must not grant access to general admin/user routes
  5. script must pull OCI package by digest
  6. script must not embed long-lived registry credentials
  7. Vault remains source of truth; node bootstrap obtains registry credentials only through the bootstrap-token-brokered control-plane endpoint, not by calling Vault directly.

Delivery Modes

Manual

  • rendered shell script
  • operator runs with sudo bash

Cloud-Init

  • rendered shell script or helper invocation
  • systemd install is required

MaaS

  • same script/content injected into provider userdata
  • no separate bootstrap semantics

Bootstrap Rendering Ownership

The API owns bootstrap script rendering and bootstrap topology.

UI, cloud-init, and provider userdata integrations may: 1. present the already-owned bootstrap command or metadata in an operator-friendly form, 2. pass through the bootstrap-script endpoint and token, 3. add thin entrypoint wiring for their environment.

They must not: 1. reconstruct package URLs independently, 2. reconstruct CA state or host overrides independently, 3. invent a second bootstrap implementation with different first-hop semantics.

What the UI Should Show

Primary: 1. Run this on the node 2. Download bootstrap script

Secondary: 1. trust bundle version 2. package digest 3. expiry

Hidden under Advanced / debug: 1. raw OCI ref 2. wrapped pull metadata 3. full CA PEM 4. env file content

Immediate Implementation Tasks

  1. A-NODE-BOOTSTRAP-SCRIPT-ENDPOINT-001
  2. add bootstrap-script issuance and brokered token endpoint

  3. B-NODE-ONBOARDING-UX-BOOTSTRAP-COMMAND-001

  4. replace raw onboarding focus with primary bootstrap command UX

  5. A-NODE-BOOTSTRAP-CLOUD-INIT-001

  6. add cloud-init rendering mode using the same bootstrap token and script model
  1. doc/architecture/Node_Agent_OCI_Distribution_v1.md
  2. doc/architecture/Node_Bootstrap_Trust_Delivery_v1.md
  3. doc/architecture/Platform_Vault_Secrets_Baseline_v1.md
  4. doc/operations/runbooks/Node_Onboarding_Runbook.md