Node Task Signing Lifecycle v1¶
As of: March 9, 2026
Purpose¶
Define the platform lifecycle for node task-signing material:
1. signer custody
2. signer versioning
3. verifier rollout
4. rotation and rollback
5. audit requirements
This document makes node task signing a platform primitive instead of an environment-local secret.
Scope¶
In scope:
1. Ed25519 node task-signing key lifecycle
2. active and staged verifier set model
3. control-plane and node-agent rollout rules
4. transition path from env-managed seeds to Vault-backed custody
Out of scope:
1. full Vault implementation details
2. bootstrap HTTPS CA delivery
3. provider-specific PKI rollout
Roles¶
Control plane¶
Owns:
1. signer generation
2. signer custody
3. task signing
4. verifier set publication
5. rotation scheduling
6. rollback
Node agent¶
Owns:
1. verifier loading
2. signature verification
3. old/new verifier grace handling
4. reporting verifier version acceptance in audit/telemetry
Vault / KMS¶
Owns:
1. signer private material custody
2. future signing API or transit path
3. rotation record source of truth later in the lifecycle
Nodes never talk directly to Vault or KMS.
Baseline Model¶
Algorithm¶
- Ed25519 only in v1.
Signer identity¶
Each signer version has:
1. signer_version_id
2. public_key
3. state
4. published_at
5. effective_at
6. retires_at
State values:
1. staged
2. active
3. grace
4. retired
5. revoked
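The signer record and its states can be sketched as a small data model. This is a minimal illustration, not the platform's schema: the Python field types, the optional `retires_at`, and the `ALLOWED_TRANSITIONS` table are assumptions inferred from the rotation procedure later in this document, which lists the states but does not define a transition table.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class SignerState(str, Enum):
    STAGED = "staged"
    ACTIVE = "active"
    GRACE = "grace"
    RETIRED = "retired"
    REVOKED = "revoked"


# Assumed transition table (not defined by this document). Rollback paths
# are deliberately left out; retired and revoked are treated as terminal.
ALLOWED_TRANSITIONS = {
    SignerState.STAGED: {SignerState.ACTIVE, SignerState.REVOKED},
    SignerState.ACTIVE: {SignerState.GRACE, SignerState.REVOKED},
    SignerState.GRACE: {SignerState.RETIRED, SignerState.REVOKED},
    SignerState.RETIRED: set(),
    SignerState.REVOKED: set(),
}


@dataclass(frozen=True)
class SignerVersion:
    signer_version_id: str
    public_key: bytes  # raw Ed25519 public key bytes (assumed representation)
    state: SignerState
    published_at: datetime
    effective_at: datetime
    retires_at: Optional[datetime] = None  # assumed optional until scheduled


def can_transition(current: SignerState, target: SignerState) -> bool:
    """Return True if the assumed table allows moving from current to target."""
    return target in ALLOWED_TRANSITIONS[current]
```

Keeping the table as data rather than scattered conditionals makes it easier to audit if the real lifecycle rules differ from this guess.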
Verifier set¶
Node agents must not rely on one immutable verifier string forever.
The verifier model is:
1. one active signer version
2. optional staged next signer version
3. optional grace signer version for rollback/stragglers
At steady state, nodes should accept:
1. the active signer public key
2. a staged next signer public key during rollout
3. a grace signer public key during rollback/straggler windows
2026-05-03 implementation note: node-agent accepts verifier sets through
GPUAAS_TASK_SIGNING_PUBKEYS as a comma- or space-separated list. API bootstrap publication
uses NODE_BOOTSTRAP_TASK_SIGNING_PUBKEYS. Signer custody is still env/configmap based;
Vault/KMS custody and auditable rotation remain follow-up work.
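As an illustration of the list format described in the implementation note, the following sketch splits a comma- or space-separated verifier list. It is not the node-agent's actual parser: the function names are hypothetical, de-duplication is an added assumption, and the key encoding is left untouched because the note does not state it.

```python
import os
import re


def parse_verifier_list(raw: str) -> list[str]:
    """Split a comma/space separated list, dropping empty entries.

    Duplicates are removed while preserving first-seen order (an assumption;
    the note does not say how duplicates are handled).
    """
    seen: set[str] = set()
    keys: list[str] = []
    for item in re.split(r"[,\s]+", raw.strip()):
        if item and item not in seen:
            seen.add(item)
            keys.append(item)
    return keys


def load_verifiers_from_env(var: str = "GPUAAS_TASK_SIGNING_PUBKEYS") -> list[str]:
    """Read the verifier list from the environment; unset means an empty list."""
    return parse_verifier_list(os.environ.get(var, ""))
```

A caller would typically refuse to start verification with an empty list rather than fall back to accepting unsigned tasks.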
Distribution Model¶
Current transition baseline¶
Verifier material may be delivered through:
1. onboarding/bootstrap configuration
2. node runtime configuration refresh
3. explicit signed rotate task
This is acceptable only if:
1. verifier versions are explicit
2. rollout is auditable
3. normal rotation does not require binary rebuild
4. per-node shell edits are not the normal rotation path
Long-term direction¶
Normal verifier rollout should be platform-controlled through:
1. bootstrap config for first trust
2. signed rotate task for lifecycle update
3. config refresh only as safety/repair path
Rotation Procedure¶
Phase 1: Generate and stage¶
- Generate a new Ed25519 keypair under platform custody.
- Assign a new signer_version_id.
- Mark it staged.
- Publish the new public verifier in the control-plane verifier set.
Phase 2: Roll out verifier¶
- Issue node.rotate_signing_key to enrolled nodes, signed by the current active signer.
- Payload must include:
  - new_signer_version_id
  - new_public_key
  - effective_at
  - grace_until
- Nodes store the new verifier as staged and continue accepting the current active verifier until effective_at.
Phase 3: Activate¶
- At effective_at, control plane begins signing new tasks with the new private key.
- Nodes must accept:
  - new active signer,
  - old signer until grace_until.
Phase 4: Retire old signer¶
- After grace_until, control plane retires the old signer.
- Nodes must stop accepting the old signer.
- Old signer state becomes retired or revoked.
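The time windows in Phases 2 to 4 can be sketched as two small functions: one for which signer the control plane signs with, one for which signer versions a node accepts. This is a hedged illustration with hypothetical function names. Two points are assumptions: whether a node already accepts signatures from the staged signer before effective_at is exposed as the `accept_staged` flag (the Verifier set section suggests it does), and the old signer is treated as accepted up to and including grace_until.

```python
from datetime import datetime, timedelta, timezone


def signing_signer(now: datetime, old_id: str, new_id: str,
                   effective_at: datetime) -> str:
    """Signer the control plane uses for new tasks at time `now`."""
    return new_id if now >= effective_at else old_id


def accepted_signers(now: datetime, old_id: str, new_id: str,
                     effective_at: datetime, grace_until: datetime,
                     accept_staged: bool = True) -> set[str]:
    """Signer version ids a node accepts at time `now` during a rotation."""
    accepted: set[str] = set()
    # Old signer: active before effective_at, then grace until grace_until.
    if now <= grace_until:
        accepted.add(old_id)
    # New signer: always accepted once active; before that only if the
    # node treats staged verifiers as acceptable (assumption, see lead-in).
    if now >= effective_at or accept_staged:
        accepted.add(new_id)
    return accepted
```

Because the signing signer is always a member of the accepted set under these rules, a node that received the rotate task in time never sees a task it cannot verify.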
Rollback Procedure¶
Rollback must be explicit and time-bounded.
Rules:
1. If rotation fails before effective_at, discard the staged signer and keep current active signer.
2. If rotation fails after effective_at but before grace_until, control plane may temporarily re-sign with the old signer and re-issue rollout.
3. After grace_until, rollback requires a new explicit rotation event, not silent indefinite reuse.
4. Emergency revocation is allowed, but it must be auditable and may temporarily disable nodes that do not have the replacement verifier.
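Rules 1 to 3 above depend only on where the failure falls relative to effective_at and grace_until, which can be sketched as a single decision function. The enum and function names are hypothetical, emergency revocation (rule 4) is not modeled, and a failure at exactly grace_until is treated conservatively as requiring a new rotation, since the rules do not define that boundary.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum


class RollbackAction(Enum):
    DISCARD_STAGED = "discard_staged"
    RESIGN_WITH_OLD = "resign_with_old_and_reissue"
    NEW_ROTATION_REQUIRED = "new_rotation_required"


def rollback_action(now: datetime, effective_at: datetime,
                    grace_until: datetime) -> RollbackAction:
    """Map the time of a rotation failure to the permitted rollback path."""
    if now < effective_at:
        # Rule 1: the new signer never became active; drop it.
        return RollbackAction.DISCARD_STAGED
    if now < grace_until:
        # Rule 2: nodes still accept the old signer during grace.
        return RollbackAction.RESIGN_WITH_OLD
    # Rule 3: the old signer is no longer trusted; rotate explicitly.
    return RollbackAction.NEW_ROTATION_REQUIRED
```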
Signed Rotate Task Contract¶
node.rotate_signing_key is itself a signed task and must be accepted only if signed by a currently trusted signer.
Minimum payload:
{
  "task_type": "node.rotate_signing_key",
  "params": {
    "new_signer_version_id": "uuid-or-version-string",
    "new_public_key": "base64url-ed25519-public-key",
    "effective_at": "RFC3339",
    "grace_until": "RFC3339"
  }
}
Validation rules:
1. effective_at must be in the future by a minimum policy-controlled lead time.
2. grace_until must be later than effective_at.
3. new_public_key must parse as Ed25519 public key.
4. node agent must reject malformed or self-contradictory payloads.
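The validation rules above can be sketched as a node-side check. This is a minimal illustration under stated assumptions, not the node-agent's implementation: `MIN_LEAD_TIME` is a made-up policy value, `InvalidRotatePayload` and the function names are hypothetical, timestamps are parsed with `datetime.fromisoformat` as an approximation of RFC 3339, and the public key check only confirms base64url decoding to 32 bytes rather than full Ed25519 point validation. Signature verification of the task itself is out of scope here.

```python
import base64
import binascii
from datetime import datetime, timedelta, timezone

MIN_LEAD_TIME = timedelta(minutes=15)  # hypothetical policy value


class InvalidRotatePayload(ValueError):
    """Raised when a rotate task payload fails validation."""


def _parse_timestamp(value) -> datetime:
    if not isinstance(value, str):
        raise InvalidRotatePayload("timestamp must be a string")
    try:
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError as exc:
        raise InvalidRotatePayload(f"bad timestamp: {value!r}") from exc
    if parsed.tzinfo is None:
        raise InvalidRotatePayload("timestamp must carry a UTC offset")
    return parsed


def _decode_public_key(value) -> bytes:
    if not isinstance(value, str):
        raise InvalidRotatePayload("new_public_key must be a string")
    padded = value + "=" * (-len(value) % 4)  # base64url often omits padding
    try:
        raw = base64.urlsafe_b64decode(padded)
    except (binascii.Error, ValueError) as exc:
        raise InvalidRotatePayload("new_public_key is not base64url") from exc
    if len(raw) != 32:  # Ed25519 public keys are 32 bytes
        raise InvalidRotatePayload("new_public_key must decode to 32 bytes")
    return raw


def validate_rotate_payload(task: dict, now: datetime,
                            min_lead: timedelta = MIN_LEAD_TIME):
    """Apply validation rules 1-4; `now` must be timezone-aware."""
    if task.get("task_type") != "node.rotate_signing_key":
        raise InvalidRotatePayload("unexpected task_type")
    params = task.get("params")
    if not isinstance(params, dict):
        raise InvalidRotatePayload("params must be an object")
    version_id = params.get("new_signer_version_id")
    if not isinstance(version_id, str) or not version_id:
        raise InvalidRotatePayload("new_signer_version_id is required")
    effective_at = _parse_timestamp(params.get("effective_at"))
    grace_until = _parse_timestamp(params.get("grace_until"))
    if effective_at < now + min_lead:
        raise InvalidRotatePayload("effective_at violates minimum lead time")
    if grace_until <= effective_at:
        raise InvalidRotatePayload("grace_until must be later than effective_at")
    public_key = _decode_public_key(params.get("new_public_key"))
    return version_id, public_key, effective_at, grace_until
```

Raising a single exception type for every failure keeps the rejection path uniform, which makes it straightforward to count rejected rotate tasks in telemetry.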
Audit and Telemetry Requirements¶
Every signer lifecycle mutation must be auditable:
1. node_task_signer.create
2. node_task_signer.stage
3. node_task_signer.activate
4. node_task_signer.retire
5. node_task_signer.revoke
6. node_task_signer.rollback
Recorded fields:
1. actor
2. signer version id
3. previous signer version id when applicable
4. effective/grace timestamps
5. correlation id
6. result
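One possible shape for an audit record carrying these fields is sketched below. The action names come from the list above; the class name, the Python field names, and the JSON serialization are assumptions for illustration only.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional

AUDIT_ACTIONS = frozenset({
    "node_task_signer.create",
    "node_task_signer.stage",
    "node_task_signer.activate",
    "node_task_signer.retire",
    "node_task_signer.revoke",
    "node_task_signer.rollback",
})


@dataclass(frozen=True)
class SignerAuditRecord:
    action: str
    actor: str
    signer_version_id: str
    correlation_id: str
    result: str
    previous_signer_version_id: Optional[str] = None  # when applicable
    effective_at: Optional[str] = None  # RFC 3339 string (assumed format)
    grace_until: Optional[str] = None   # RFC 3339 string (assumed format)

    def __post_init__(self) -> None:
        # Reject actions outside the documented lifecycle mutations.
        if self.action not in AUDIT_ACTIONS:
            raise ValueError(f"unknown audit action: {self.action!r}")

    def to_json(self) -> str:
        """Serialize with sorted keys so records diff cleanly."""
        return json.dumps(asdict(self), sort_keys=True)
```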
Node-side reporting should include:
1. current accepted signer version set
2. last successful rotate task
3. verification failures by signer version when possible
Transitional Reality¶
Current platform environments may temporarily still expose one verifier through environment/bootstrap configuration.
That is acceptable only if:
1. the canonical lifecycle is this document
2. new work does not hard-code single-verifier-forever semantics
3. future agent/runtime changes move toward versioned verifier rollout rather than permanent manual edits
Non-Negotiable Invariants¶
- private signer material remains platform-custodied only,
- node agents never receive signer private material,
- verifier rollout must not depend on binary rebuild as the normal path,
- active and grace windows must be explicit,
- rollback must be auditable,
- nodes do not talk directly to Vault or KMS.
Follow-on Work¶
- A-NODE-BOOTSTRAP-TRUST-DELIVERY-001
- Vault-backed signer custody implementation under A-VAULT-PLATFORM-SECRETS-001 follow-ons
Related Docs¶
- doc/architecture/PKI_Spec.md
- doc/architecture/Node_Agent_Spec.md
- doc/architecture/Platform_Signing_and_Bootstrap_Trust_v1.md
- doc/architecture/Platform_Vault_Secrets_Baseline_v1.md