Vault Bootstrap and Root Token Runbook

Purpose: Define the one-time bootstrap, unseal, and immediate post-bootstrap handling for platform-control Vault.

Scope

- First initialization of persistent platform-control Vault
- Unseal key and root token handling
- Transition from bootstrap/root access to intended operational access
- Recovery checks after restart or host migration

Required Policy

- The generated Vault root token is break-glass material, not a steady-state runtime credential.
- The unseal key must be escrowed outside the cluster in an operator-controlled location.
- The root token's issuance must be recorded as bootstrap evidence (the event, never the token value), after which the token must be rotated or replaced by a limited-scope operational token.
- Runtime services should consume only the minimum token/policy required for their Vault paths.

Bootstrap Procedure

1. Confirm the persistent Vault pod is healthy and using persistent storage.
2. Initialize Vault once.
3. Escrow:
   - unseal key
   - initial root token
4. Unseal Vault.
5. Enable required mounts such as kv/.
6. Seed required baseline secrets for the environment.
7. Create or issue the intended limited-scope operational token/policy.
8. Patch runtime secrets/config to use the operational token, not the root token.
9. Verify restart behavior:
   - pod restart
   - unseal procedure
   - mount availability
   - secret readability
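
The initialization and operational-access steps above could be scripted roughly as follows. This is a minimal sketch, not the project's actual bootstrap tooling. The function names, the single key share, the kv-v2 mount type, the policy path kv/data/platform-control/*, and the 24h token period are all assumptions chosen for illustration; seeding baseline secrets is left out because the values are environment-specific.

```shell
#!/bin/sh
# Hypothetical bootstrap helpers. Run where the vault CLI can reach the
# server (VAULT_ADDR set). All names and parameters below are assumptions.

bootstrap_init() {
  # $1: file that receives the init JSON (unseal key + initial root token).
  # umask 077 keeps the file owner-only; nothing sensitive goes to stdout.
  init_file=$1
  ( umask 077 && vault operator init -key-shares=1 -key-threshold=1 -format=json > "$init_file" )
}

bootstrap_operational_access() {
  # $1: policy name, $2: file that receives the new token JSON.
  # Requires an unsealed Vault and VAULT_TOKEN set to the root token.
  policy_name=$1
  token_file=$2
  vault secrets enable -path=kv kv-v2 || return 1
  # Read-only policy on an assumed KV v2 path; "-" reads the policy from stdin.
  vault policy write "$policy_name" - <<'POLICY' || return 1
path "kv/data/platform-control/*" {
  capabilities = ["read"]
}
POLICY
  # The token is written to an owner-only file instead of the terminal.
  ( umask 077 && vault token create -policy="$policy_name" -period=24h -format=json > "$token_file" )
}
```

Both functions write sensitive output to files rather than stdout so that neither the root token nor the operational token can end up in a CI log or terminal scrollback. The files still need to be escrowed and removed according to the policy above.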

Recovery Checks

- vault status shows Initialized=true
- vault status shows Sealed=false after unseal
- expected mounts exist (kv/ at minimum)
- platform-control registry secrets readable
- MAAS site secret paths readable if configured
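
The mount and readability checks above can be automated without exposing any secret value. The following is a sketch under stated assumptions: the function names are hypothetical, and the secret paths are passed in by the caller because this runbook does not name them.

```shell
#!/bin/sh
# Hypothetical recovery-check helpers. Require VAULT_ADDR and a token that
# may list mounts and read the given paths.

mount_exists() {
  # $1: mount path with trailing slash, e.g. kv/
  # `vault secrets list -format=json` keys its output by mount path.
  vault secrets list -format=json | grep -q "\"$1\""
}

secret_readable() {
  # $1: secret path under a KV mount. Output is discarded so that no
  # secret value reaches the terminal or a log.
  vault kv get "$1" >/dev/null 2>&1
}

recovery_checks() {
  # Usage: recovery_checks kv/ <secret-path>...
  mount=$1
  shift
  mount_exists "$mount" || { echo "mount missing: $mount" >&2; return 1; }
  for path in "$@"; do
    secret_readable "$path" || { echo "secret not readable: $path" >&2; return 1; }
  done
  echo "recovery checks passed"
}
```

Only mount and path names are ever printed; the secret payloads returned by vault kv get go to /dev/null.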

Deploy Preflight Behavior

scripts/ci/platform_control_deploy.sh performs a Vault readiness preflight before Vault-backed registry secrets are reconciled. The deploy may continue only when vault status reports Initialized=true and Sealed=false.

After registry secrets are reconciled, the deploy verifies the publisher and puller Vault paths are readable. The preflight logs only safe status strings such as Vault readiness: Initialized=true Sealed=false; it must never print unseal keys, root tokens, operational tokens, registry passwords, or secret values.
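
A readiness gate of this kind might look like the sketch below. It is not the actual logic of scripts/ci/platform_control_deploy.sh; the function names json_bool and vault_preflight are hypothetical, and the JSON is parsed with sed only to avoid assuming that jq is installed in the pod.

```shell
#!/bin/sh
# Hypothetical preflight gate. Reads `vault status -format=json` on stdin
# and prints only the safe readiness string.

json_bool() {
  # $1: JSON text, $2: top-level boolean field name.
  # Prints true or false, or nothing if the field is absent.
  printf '%s\n' "$1" |
    sed -n "s/.*\"$2\"[[:space:]]*:[[:space:]]*\([a-z]*\).*/\1/p" |
    head -n 1
}

vault_preflight() {
  status_json=$(cat)
  initialized=$(json_bool "$status_json" initialized)
  sealed=$(json_bool "$status_json" sealed)
  echo "Vault readiness: Initialized=${initialized:-unknown} Sealed=${sealed:-unknown}"
  # Succeed only in the one state in which the deploy may continue.
  [ "$initialized" = "true" ] && [ "$sealed" = "false" ]
}

# Usage (inside the Vault pod). `vault status` exits 2 when sealed, so its
# exit code is ignored and the decision is made from the JSON instead:
#   { vault status -format=json || true; } | vault_preflight || exit 1
```

The gate fails closed: if either field cannot be parsed it is reported as unknown and the function returns non-zero.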

Platform-Control Deploy Failure: Vault Sealed

Use this when a platform-control release or deploy fails while reconciling Vault-backed secrets and the deploy trace contains the message "Vault is sealed".

Typical symptoms:

- platform_control_deploy fails before Kubernetes manifests are applied or validated.
- The failing step is near reconcile platform-control Vault registry secrets.
- vault status reports Initialized=true and Sealed=true.

Safe status check:

ssh hpcadmin@100.90.157.34 \
  'sudo k3s kubectl -n gpuaas-infra exec vault-0 -c vault -- sh -c "export VAULT_ADDR=http://127.0.0.1:8200; vault status -format=json"'

Break-glass unseal from the persistent Vault init material:

ssh hpcadmin@100.90.157.34 'sudo k3s kubectl -n gpuaas-infra exec -i vault-0 -c vault -- sh -eu' <<'EOF'
export VAULT_ADDR=http://127.0.0.1:8200
init_file=/vault/data/init.json
if [ ! -s "$init_file" ]; then
  echo "init file missing" >&2
  exit 1
fi
unseal_key=$(awk -F'"' '/"unseal_keys_b64"/ {getline; print $2; exit}' "$init_file")
if [ -z "$unseal_key" ]; then
  echo "unseal key parse failed" >&2
  exit 1
fi
vault operator unseal "$unseal_key" >/dev/null
vault status -format=json
EOF

After unseal:

1. Confirm Sealed=false.
2. Retry the failed GitLab deploy job instead of hand-applying manifests.
3. Confirm the release pipeline reaches remote validation.
4. Confirm platform-control pods are healthy.

Security notes:

- Do not print, copy, or paste the unseal key or root token into chat, CI logs, shell history, or documentation.
- /vault/data/init.json is sensitive break-glass material. It is not a substitute for operator-controlled escrow outside the cluster.
- The gpuaas-infra-secrets Kubernetes secret may contain bootstrap/root-token material, but it must not be treated as the long-term unseal-key authority.

Follow-up Required After Any Manual Bootstrap

- Record the event in doc/operations/evidence/secrets_key_ops.md.
- Replace root-token runtime usage with the intended operational token.
- Confirm no cluster secret or runtime config still depends on the root token unless explicitly documented as break-glass.
- If this was triggered by a deployment, add evidence to the related release notes and verify the deploy job was retried from CI rather than manually bypassed.
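
One way to confirm that a runtime credential is no longer the root token is to inspect its policies, since root tokens carry the built-in root policy. The helper below is a sketch; the name token_is_root is hypothetical, and how the runtime token is obtained from cluster secrets is left to the operator.

```shell
#!/bin/sh
# Hypothetical check: does the `vault token lookup -format=json` output on
# stdin describe a root token?

token_is_root() {
  # Strip whitespace so the policies array matches even when the JSON is
  # pretty-printed across several lines, then look for "root" inside it.
  tr -d ' \n\t' | grep -q '"policies":\[[^]]*"root"'
}

# Usage, with VAULT_TOKEN set to the runtime token under review. The lookup
# output includes the token id, so it is piped straight into the check and
# never printed:
#   if vault token lookup -format=json | token_is_root; then
#     echo "runtime token is still a root token" >&2
#   fi
```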