Vault Bootstrap and Root Token Runbook
Purpose:
- Define the one-time bootstrap, unseal, and immediate post-bootstrap handling for platform-control Vault.
Scope
- First initialization of persistent platform-control Vault
- Unseal key and root token handling
- Transition from bootstrap/root access to intended operational access
- Recovery checks after restart or host migration
Required Policy
- The generated Vault root token is break-glass material, not a steady-state runtime credential.
- The unseal key must be escrowed outside the cluster in an operator-controlled location.
- The root token must be recorded as bootstrap evidence, then rotated or replaced by a limited-scope operational token.
- Runtime services should consume only the minimum token/policy required for their Vault paths.
Bootstrap Procedure
- Confirm the persistent Vault pod is healthy and using persistent storage.
- Initialize Vault once.
- Escrow:
  - unseal key
  - initial root token
- Unseal Vault.
- Enable required mounts such as kv/.
- Seed required baseline secrets for the environment.
- Create or issue the intended limited-scope operational token/policy.
- Patch runtime secrets/config to use the operational token, not the root token.
- Verify restart behavior:
  - pod restart
  - unseal procedure
  - mount availability
  - secret readability
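The mount, policy, and operational-token steps above can be sketched as follows. This is a minimal sketch, not the project's actual provisioning script: the policy name, the KV v2 path prefix, the KV version, and the token TTL are all assumptions chosen for illustration. It expects VAULT_ADDR and a break-glass VAULT_TOKEN to be exported in the operator's shell already.

```shell
# Sketch only. Hypothetical defaults: policy name, KV v2 path prefix, 720h TTL.
provision_operational_access() {
    policy_name=${1:-platform-control-runtime}    # hypothetical policy name
    secret_prefix=${2:-kv/data/platform-control}  # hypothetical KV v2 data path

    # Enable the kv/ mount (KV version 2 is an assumption).
    vault secrets enable -path=kv -version=2 kv >/dev/null || return 1

    # Read-only policy limited to the runtime's own paths; read from stdin.
    vault policy write "$policy_name" - >/dev/null <<EOF || return 1
path "${secret_prefix}/*" {
  capabilities = ["read"]
}
EOF

    # -orphan keeps the token alive when the root token that created it is
    # later revoked. Only the token itself is written to stdout, so the
    # caller can pipe it into a secret store instead of a terminal.
    vault token create -policy="$policy_name" -no-default-policy \
        -orphan -ttl=720h -field=token
}
```

Pipe the function's output directly into whatever patches the runtime secret, so the operational token never lands in shell history or scrollback.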
Recovery Checks
- vault status shows Initialized=true
- vault status shows Sealed=false after unseal
- expected mounts exist (kv/ at minimum)
- platform-control registry secrets readable
- MAAS site secret paths readable if configured
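A minimal sketch of these checks as one function, assuming VAULT_ADDR and a suitably scoped VAULT_TOKEN are set. The secret paths are passed as arguments because this runbook does not name them; any path shown in the usage line is a placeholder.

```shell
# Sketch only. Arguments: secret paths that must be readable (placeholders).
recovery_checks() {
    # `vault status` exits 0 when unsealed, 2 when sealed, 1 on error.
    vault status >/dev/null 2>&1 \
        || { echo "FAIL: Vault is sealed or unreachable" >&2; return 1; }

    # Mounts are the top-level keys of the JSON listing, e.g. "kv/".
    vault secrets list -format=json | grep -q '"kv/"' \
        || { echo "FAIL: kv/ mount missing" >&2; return 1; }

    for path in "$@"; do
        # Discard the secret value; only the exit status matters.
        vault kv get "$path" >/dev/null 2>&1 \
            || { echo "FAIL: cannot read $path" >&2; return 1; }
    done
    echo "recovery checks passed"
}
```

Example call with placeholder paths: `recovery_checks kv/example/registry kv/example/maas-site`.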
Deploy Preflight Behavior
scripts/ci/platform_control_deploy.sh performs a Vault readiness preflight before
Vault-backed registry secrets are reconciled. The deploy may continue only when
vault status reports Initialized=true and Sealed=false.
After registry secrets are reconciled, the deploy verifies the publisher and puller
Vault paths are readable. The preflight logs only safe status strings such as
Vault readiness: Initialized=true Sealed=false; it must never print unseal keys,
root tokens, operational tokens, registry passwords, or secret values.
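
The readiness gate described above could look roughly like this. It is a sketch under assumptions, not the code in scripts/ci/platform_control_deploy.sh: the function name is hypothetical, and it relies on `vault status -format=json` exposing boolean `initialized` and `sealed` fields.

```shell
# Sketch only. Reads `vault status -format=json` output on stdin, prints the
# safe status line, and returns 0 only when Initialized=true and Sealed=false.
vault_readiness() {
    # Collapse whitespace so the field match works on pretty-printed JSON.
    json=$(tr -d ' \t\r\n')
    initialized=$(printf '%s' "$json" | grep -o '"initialized":[a-z]*' | head -n 1 | cut -d: -f2)
    sealed=$(printf '%s' "$json" | grep -o '"sealed":[a-z]*' | head -n 1 | cut -d: -f2)

    # Only these two booleans are logged; nothing else from the payload.
    echo "Vault readiness: Initialized=${initialized:-unknown} Sealed=${sealed:-unknown}"
    [ "$initialized" = "true" ] && [ "$sealed" = "false" ]
}
```

A deploy script might gate on it with `vault status -format=json | vault_readiness || exit 1`. Because `vault status` exits non-zero when sealed, the gate uses the parsed fields rather than the pipeline's first exit code.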
Platform-Control Deploy Failure: Vault Sealed
Use this when a platform-control release or deploy fails while reconciling Vault-backed
secrets and the deploy trace contains "Vault is sealed".
Typical symptoms:
- platform_control_deploy fails before Kubernetes manifests are applied or validated.
- The failing step is near "reconcile platform-control Vault registry secrets".
- vault status reports Initialized=true and Sealed=true.
Safe status check:
ssh hpcadmin@100.90.157.34 \
'sudo k3s kubectl -n gpuaas-infra exec vault-0 -c vault -- sh -c "export VAULT_ADDR=http://127.0.0.1:8200; vault status -format=json"'
Break-glass unseal from the persistent Vault init material:
ssh hpcadmin@100.90.157.34 'sudo k3s kubectl -n gpuaas-infra exec -i vault-0 -c vault -- sh -eu' <<'EOF'
export VAULT_ADDR=http://127.0.0.1:8200
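init_file=/vault/data/init.json
# Abort if the init material is missing or empty.
if [ ! -s "$init_file" ]; then
    echo "init file missing" >&2
    exit 1
fi
# Take the first entry of unseal_keys_b64 (assumes pretty-printed JSON, one key per line).
unseal_key=$(awk -F'"' '/"unseal_keys_b64"/ {getline; print $2; exit}' "$init_file")
if [ -z "$unseal_key" ]; then
    echo "unseal key parse failed" >&2
    exit 1
fi
# Discard the unseal command's output; the status call below reports the result.
vault operator unseal "$unseal_key" >/dev/null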
vault status -format=json
EOF
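To make the awk step in the script above easier to reason about, here is the same expression applied to a fabricated file. The layout imitates pretty-printed `vault operator init -format=json` output; every value is a placeholder, and the single key share is an assumption.

```shell
# Same awk expression as the break-glass script, wrapped for reuse.
extract_first_unseal_key() {
    awk -F'"' '/"unseal_keys_b64"/ {getline; print $2; exit}' "$1"
}

# Fabricated init file: placeholder values only, never real material.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{
  "unseal_keys_b64": [
    "PLACEHOLDER-KEY-1"
  ],
  "unseal_keys_hex": [
    "PLACEHOLDER-HEX-1"
  ],
  "unseal_shares": 1,
  "unseal_threshold": 1,
  "root_token": "PLACEHOLDER-ROOT-TOKEN"
}
EOF

extract_first_unseal_key "$sample"   # prints PLACEHOLDER-KEY-1
rm -f "$sample"
```

The expression reads only the line after the `unseal_keys_b64` key, so it returns a single share. If Vault was initialized with a threshold above one, a single share will not unseal it. On compact single-line JSON the expression returns the field name rather than a key, which the empty-string check does not catch.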
After unseal:
1. Confirm Sealed=false.
2. Retry the failed GitLab deploy job instead of hand-applying manifests.
3. Confirm the release pipeline reaches remote validation.
4. Confirm platform-control pods are healthy.
Security notes:
- Do not print, copy, or paste the unseal key or root token into chat, CI logs, shell
history, or documentation.
- /vault/data/init.json is sensitive break-glass material. It is not a substitute
for operator-controlled escrow outside the cluster.
- The gpuaas-infra-secrets Kubernetes secret may contain bootstrap/root-token
material, but it must not be treated as the long-term unseal-key authority.
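
One way to spot-check that a saved CI log or trace has not leaked a token is a pattern scan like the sketch below. It assumes Vault 1.10 or later, where tokens carry the `hvs.`, `hvb.`, or `hvr.` prefix; older token formats and base64 unseal keys are not detected, so a clean result is not proof of absence. The function name is hypothetical.

```shell
# Sketch only. Reports line numbers, never line content, so the scan itself
# does not re-print a leaked token.
scan_for_vault_tokens() {
    file=$1
    hits=$(grep -nE 'hv[sbr]\.[A-Za-z0-9_-]{20,}' "$file" | cut -d: -f1 | tr '\n' ' ')
    if [ -n "$hits" ]; then
        echo "token-like string on line(s): $hits" >&2
        return 1
    fi
    return 0
}
```

Example: `scan_for_vault_tokens deploy-trace.log || echo "rotate the exposed token"`, where the log file name is a placeholder.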
Follow-up Required After Any Manual Bootstrap
- Record the event in doc/operations/evidence/secrets_key_ops.md.
- Replace root-token runtime usage with the intended operational token.
- Confirm no cluster secret or runtime config still depends on the root token unless explicitly documented as break-glass.
- If this was triggered by a deployment, add evidence to the related release notes and verify the deploy job was retried from CI rather than manually bypassed.
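
To confirm that a runtime credential is not the root token, its policies can be inspected without printing the token. The sketch below assumes `vault token lookup -format=json` output with a `policies` array under `data`; the function name is hypothetical, and how the runtime token is loaded into VAULT_TOKEN is left to the operator.

```shell
# Sketch only. Reads `vault token lookup -format=json` output on stdin.
# Returns 0 if the token lacks the root policy, 1 if it has it,
# 2 if no policies array could be found.
token_is_not_root() {
    # Collapse whitespace, then isolate the "policies" array.
    policies=$(tr -d ' \t\r\n' | grep -o '"policies":\[[^]]*]' | head -n 1)
    if [ -z "$policies" ]; then
        echo "could not read policies from lookup output" >&2
        return 2
    fi
    case "$policies" in
        *'"root"'*) echo "token has the root policy" >&2; return 1 ;;
    esac
    echo "token policies: ${policies#*:}"
}
```

With the runtime token exported as VAULT_TOKEN, `vault token lookup -format=json | token_is_not_root` looks up the current token and prints only its policy names.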