Node Bootstrap Script and Token v1¶
As of: March 10, 2026
Purpose¶
Define the first operator-usable bootstrap flow for node-agent installation.
The goal is to stop exposing raw OCI/Vault bootstrap metadata directly to operators and instead provide: 1. a single-use bootstrap token, 2. a control-plane-brokered bootstrap script, 3. one command for manual onboarding, 4. a cloud-init-compatible path for unattended onboarding.
Problem Statement¶
The current onboarding panel is now structurally correct, but it still exposes low-level bootstrap package fields that are useful for debugging, not for routine operator execution.
Today an operator still has to reason about: 1. OCI package reference, 2. wrapped pull credential, 3. Vault unwrap location, 4. install root and env file paths, 5. CA bundle material.
That is too much operational surface for: 1. manual onboarding, 2. cloud-init, 3. MaaS-delivered bootstrap.
Core Model¶
Use a dedicated bootstrap token, not a normal service account.
The bootstrap token is: 1. node-bound, 2. short-lived, 3. single-use or tightly bounded use, 4. valid only for bootstrap endpoints, 5. invalid for general API access.
The control plane uses that token to broker: 1. bootstrap package reference, 2. pull credential material, 3. trust bundle, 4. verifier set, 5. install instructions.
The node never talks directly to Vault for first bootstrap.
Identity Separation¶
Three identities remain distinct:
1. Bootstrap token¶
- one-time provisioning credential
- used only to fetch bootstrap script or bootstrap package material
2. Node certificate identity¶
- long-lived runtime identity after enrollment
- used for poll, renew, task execution, terminal stream
3. Service account identity¶
- used for app operators, CI/CD, and platform automation
- not used for first-hop node bootstrap
Do not collapse bootstrap token into a general service account.
Required Endpoints¶
The first usable implementation should add a brokered bootstrap-script endpoint.
1. Bootstrap script issuance¶
POST /api/v1/admin/nodes/{node_id}/bootstrap-script
Purpose: - mint or refresh a node bootstrap token, - resolve the current bootstrap package, - render a ready-to-run shell script for the operator or cloud-init path.
Expected response shape:
{
"node_id": "uuid",
"bootstrap_token": "single-use-token",
"expires_at": "RFC3339",
"script": "#!/usr/bin/env bash\n...",
"script_sha256": "hex",
"mode": "manual|cloud_init",
"metadata": {
"trust_bundle_version": "v1",
"package_ref": "registry.example/platform/node-agent-bootstrap",
"package_digest": "sha256:..."
}
}
2. Brokered bootstrap fetch¶
POST /api/v1/bootstrap/nodes/{bootstrap_token}/script
Purpose: - return the rendered install script using only the scoped bootstrap token.
This endpoint is the one cloud-init or a node-side bootstrap helper should call.
Rules: 1. token must be scoped to one node, 2. token TTL must be short, 3. token must not authorize any non-bootstrap route, 4. token consumption must be auditable.
Script Responsibilities¶
The bootstrap script should do the following, in order:
- create base directories:
/etc/gpuaas-
install root
-
write trust material:
- control-plane CA bundle
-
node-cert CA bundle path stub if needed
-
write environment file:
- node id
- direct node API URL
- task-signing pubkey
- cert/key paths
-
install paths
-
obtain the bootstrap OCI package
- using control-plane-brokered short-lived pull credentials
-
by digest, not tag
-
verify expected digest
-
unpack/install:
- node-agent binary
- env template resolution
- systemd unit
- approved container runtime prerequisite for allocation-local OCI workloads
-
registry host trust and Docker pull login when registry credentials are configured for the environment
-
enable/start:
systemctl daemon-reload-
systemctl enable --now gpuaas-node-agent -
print success/failure guidance
Manual Operator Flow¶
The operator should see one command, not raw OCI metadata.
Target UX:
curl -fsSL -X POST \
-H "Authorization: Bearer <admin-token>" \
https://api.example.com/api/v1/admin/nodes/<node_id>/bootstrap-script?mode=manual \
| sudo bash
Or, if the script is returned as JSON:
curl -fsSL -X POST \
-H "Authorization: Bearer <admin-token>" \
https://api.example.com/api/v1/admin/nodes/<node_id>/bootstrap-script?mode=manual \
| jq -r .script | sudo bash
The UI should primarily show: 1. copyable bootstrap command, 2. optional download script, 3. advanced/debug details hidden behind a disclosure.
Cloud-Init Flow¶
Cloud-init should not reconstruct bundle state.
It should only need: 1. bootstrap token, 2. bootstrap-script endpoint, 3. optional stable API URL.
First-hop bootstrap rule¶
The first bootstrap fetch is expected to be a trust-establishing entrypoint, not a fully trusted runtime call.
Required invariants:
1. manual bootstrap and cloud-init bootstrap use the same first-hop model,
2. the first hop may remain plain HTTP when the node does not yet trust the platform CA or does not yet have the required bootstrap host override,
3. the fetched bootstrap script is responsible for writing:
- /etc/gpuaas/ca-bundle.crt
- the bootstrap /etc/hosts override derived from NODE_BOOTSTRAP_RESOLVE_ADDRESS
4. only after those trust/bootstrap steps are in place should subsequent package or API fetches require HTTPS trust,
5. changing the first-hop URL shape or transport is an API/bootstrap contract change and must be updated in the bootstrap rendering path, docs, and tests together.
Example shape:
write_files:
- path: /etc/gpuaas/bootstrap-token
permissions: "0600"
content: "<bootstrap-token>"
runcmd:
- curl -fsSL https://api.example.com/api/v1/bootstrap/nodes/$(cat /etc/gpuaas/bootstrap-token)/script | bash
Or, if MaaS/custom userdata templates are used:
Cloud-init should default to installing the systemd unit.
Container Runtime Bootstrap¶
Launchable OCI workloads require an approved local container runtime on the target node. Until infra owns this in the site cloud-init package, the node-agent bootstrap installer is responsible for a minimal runtime prerequisite:
- check for
docker,podman, ornerdctl, - if none is present and runtime bootstrap is enabled, install the configured package with apt,
- default package:
docker.io, - enable/start the installed runtime service when systemd is available.
- configure Docker registry trust from
/etc/gpuaas/ca-bundle.crtand performdocker loginusing a bootstrap-token-brokered registry credential when the control plane exposes one.
This runs only during fresh bootstrap or rebootstrap. App deploy must not
install host packages as a side effect. Existing nodes need rebootstrap or an
infra-managed runtime install before workload.oci_launch can succeed.
The bootstrap script fetches runtime registry credentials from:
The endpoint is bootstrap-token scoped. It returns 204 when the environment has
no registry pull credential configured, and the installer proceeds without a
registry login. The script writes credentials only to a temporary root-owned env
file and removes that file after installer execution.
The bootstrap renderer passes:
GPUAAS_NODE_AGENT_INSTALL_CONTAINER_RUNTIME=true
GPUAAS_NODE_AGENT_CONTAINER_RUNTIME_PACKAGE=docker.io
GPUAAS_NODE_AGENT_REGISTRY_ENV_FILE=/tmp/gpuaas-registry.env
GPUAAS_NODE_AGENT_REGISTRY_CA_BUNDLE=/etc/gpuaas/ca-bundle.crt
The API process can disable or change this with:
Security Requirements¶
- bootstrap token must be node-bound
- bootstrap token must have short TTL
- bootstrap token must not be reusable indefinitely
- bootstrap token must not grant access to general admin/user routes
- script must pull OCI package by digest
- script must not embed long-lived registry credentials
- Vault remains source of truth; node bootstrap obtains registry credentials only through the bootstrap-token-brokered control-plane endpoint, not by calling Vault directly.
Delivery Modes¶
Manual¶
- rendered shell script
- operator runs with
sudo bash
Cloud-Init¶
- rendered shell script or helper invocation
- systemd install is required
MaaS¶
- same script/content injected into provider userdata
- no separate bootstrap semantics
Bootstrap Rendering Ownership¶
The API owns bootstrap script rendering and bootstrap topology.
UI, cloud-init, and provider userdata integrations may: 1. present the already-owned bootstrap command or metadata in an operator-friendly form, 2. pass through the bootstrap-script endpoint and token, 3. add thin entrypoint wiring for their environment.
They must not: 1. reconstruct package URLs independently, 2. reconstruct CA state or host overrides independently, 3. invent a second bootstrap implementation with different first-hop semantics.
What the UI Should Show¶
Primary:
1. Run this on the node
2. Download bootstrap script
Secondary: 1. trust bundle version 2. package digest 3. expiry
Hidden under Advanced / debug:
1. raw OCI ref
2. wrapped pull metadata
3. full CA PEM
4. env file content
Immediate Implementation Tasks¶
A-NODE-BOOTSTRAP-SCRIPT-ENDPOINT-001-
add bootstrap-script issuance and brokered token endpoint
-
B-NODE-ONBOARDING-UX-BOOTSTRAP-COMMAND-001 -
replace raw onboarding focus with primary bootstrap command UX
-
A-NODE-BOOTSTRAP-CLOUD-INIT-001 - add cloud-init rendering mode using the same bootstrap token and script model
Related Docs¶
doc/architecture/Node_Agent_OCI_Distribution_v1.mddoc/architecture/Node_Bootstrap_Trust_Delivery_v1.mddoc/architecture/Platform_Vault_Secrets_Baseline_v1.mddoc/operations/runbooks/Node_Onboarding_Runbook.md