Node Task Signing Lifecycle v1

As of: March 9, 2026

Purpose

Define the platform lifecycle for node task-signing material: 1. signer custody, 2. signer versioning, 3. verifier rollout, 4. rotation and rollback, 5. audit requirements.

This document makes node task signing a platform primitive instead of an environment-local secret.

Scope

In scope: 1. Ed25519 node task-signing key lifecycle, 2. active and staged verifier set model, 3. control-plane and node-agent rollout rules, 4. transition path from env-managed seeds to Vault-backed custody.

Out of scope: 1. full Vault implementation details, 2. bootstrap HTTPS CA delivery, 3. provider-specific PKI rollout.

Roles

Control plane

Owns: 1. signer generation, 2. signer custody, 3. task signing, 4. verifier set publication, 5. rotation scheduling, 6. rollback.

Node agent

Owns: 1. verifier loading, 2. signature verification, 3. old/new verifier grace handling, 4. reporting verifier version acceptance in audit/telemetry.

Vault / KMS

Owns: 1. signer private material custody, 2. a future signing API or transit path, 3. eventually, the source of truth for rotation records.

Nodes never talk directly to Vault or KMS.

Baseline Model

Algorithm

  1. Ed25519 only in v1.

Signer identity

Each signer version has: 1. signer_version_id, 2. public_key, 3. state, 4. published_at, 5. effective_at, 6. retires_at.

State values: 1. staged, 2. active, 3. grace, 4. retired, 5. revoked.
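As a minimal Python sketch, a signer version record and its states might be modeled as follows. The class names and the `ALLOWED_TRANSITIONS` table are hypothetical: the transitions are inferred from the rotation and rollback procedures in this document rather than specified by it. Making `retires_at` optional until retirement is scheduled is likewise an assumption, and rollback re-signing is not modeled as a state change.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class SignerState(str, Enum):
    STAGED = "staged"
    ACTIVE = "active"
    GRACE = "grace"
    RETIRED = "retired"
    REVOKED = "revoked"


# Assumed transitions (not defined by the spec): a staged signer activates or is
# discarded (modeled as retired), an active signer enters grace on rotation, a
# grace signer retires, and any non-terminal signer may be revoked.
ALLOWED_TRANSITIONS = {
    SignerState.STAGED: {SignerState.ACTIVE, SignerState.RETIRED, SignerState.REVOKED},
    SignerState.ACTIVE: {SignerState.GRACE, SignerState.REVOKED},
    SignerState.GRACE: {SignerState.RETIRED, SignerState.REVOKED},
    SignerState.RETIRED: set(),
    SignerState.REVOKED: set(),
}


@dataclass
class SignerVersion:
    signer_version_id: str
    public_key: str                        # assumed base64url Ed25519 public key
    state: SignerState
    published_at: datetime
    effective_at: datetime
    retires_at: Optional[datetime] = None  # assumed unset until retirement is scheduled

    def validate(self) -> None:
        # Assumed ordering: published_at <= effective_at < retires_at.
        if self.effective_at < self.published_at:
            raise ValueError("effective_at precedes published_at")
        if self.retires_at is not None and self.retires_at <= self.effective_at:
            raise ValueError("retires_at must be later than effective_at")

    def transition(self, new_state: SignerState) -> None:
        # Reject any state change outside the assumed transition table.
        if new_state not in ALLOWED_TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.value} -> {new_state.value}")
        self.state = new_state
```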

Verifier set

Node agents must not rely on one immutable verifier string forever.

The verifier model is: 1. one active signer version, 2. optional staged next signer version, 3. optional grace signer version for rollback/stragglers.

In the target model, nodes should accept: 1. the active signer public key, 2. a staged next signer public key while a rollout is in progress, 3. a grace signer public key during rollback or straggler windows.
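The acceptance rule can be sketched as a function of the verifier set and the current time. This is an illustrative sketch only: `VerifierSet` and `accepted_keys` are hypothetical names, keys are treated as opaque strings, and the choice to stop honoring the grace key exactly at `grace_until` is an assumption about the boundary.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Set


@dataclass
class VerifierSet:
    active: str                            # active signer public key
    staged: Optional[str] = None           # next signer, present during rollout
    grace: Optional[str] = None            # previous signer kept for rollback/stragglers
    grace_until: Optional[datetime] = None

    def accepted_keys(self, now: datetime) -> Set[str]:
        keys = {self.active}
        if self.staged is not None:
            keys.add(self.staged)
        # Grace windows must be explicit: without grace_until the grace key is
        # not honored, and it stops being honored at grace_until (assumed boundary).
        if self.grace is not None and self.grace_until is not None and now < self.grace_until:
            keys.add(self.grace)
        return keys
```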

2026-05-03 implementation note: node-agent accepts verifier sets through GPUAAS_TASK_SIGNING_PUBKEYS as a comma/space separated list. API bootstrap publication uses NODE_BOOTSTRAP_TASK_SIGNING_PUBKEYS. Signer custody is still env/configmap based; Vault/KMS custody and auditable rotation remain follow-up work.
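A sketch of how a node agent could parse GPUAAS_TASK_SIGNING_PUBKEYS, assuming each entry uses the same base64url encoding as `new_public_key` in the rotate task contract (the implementation note does not state the encoding). `load_verifier_keys` and `decode_b64url` are hypothetical helpers, and the 32-byte length check stands in for full Ed25519 key parsing.

```python
import base64
import os
import re
from typing import List

ED25519_PUBLIC_KEY_LEN = 32


def decode_b64url(value: str) -> bytes:
    # Restore any stripped "=" padding before decoding.
    padded = value + "=" * (-len(value) % 4)
    return base64.urlsafe_b64decode(padded)


def load_verifier_keys(env_var: str = "GPUAAS_TASK_SIGNING_PUBKEYS") -> List[bytes]:
    raw = os.environ.get(env_var, "")
    keys: List[bytes] = []
    # Entries may be separated by commas, whitespace, or both.
    for token in re.split(r"[,\s]+", raw.strip()):
        if not token:
            continue
        key = decode_b64url(token)
        if len(key) != ED25519_PUBLIC_KEY_LEN:
            raise ValueError(f"{env_var}: entry is not a 32-byte Ed25519 public key")
        if key not in keys:  # keep order, drop duplicates
            keys.append(key)
    return keys
```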

Distribution Model

Current transition baseline

Verifier material may be delivered through: 1. onboarding/bootstrap configuration, 2. node runtime configuration refresh, 3. explicit signed rotate task.

This is acceptable only if: 1. verifier versions are explicit, 2. rollout is auditable, 3. normal rotation does not require binary rebuild, 4. per-node shell edits are not the normal rotation path.

Long-term direction

Normal verifier rollout should be platform-controlled through: 1. bootstrap configuration for first trust, 2. the signed rotate task for lifecycle updates, 3. configuration refresh only as a safety/repair path.

Rotation Procedure

Phase 1: Generate and stage

  1. Generate new Ed25519 keypair under platform custody.
  2. Assign new signer_version_id.
  3. Mark it staged.
  4. Publish the new public verifier in the control-plane verifier set.
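On the control-plane side, Phase 1 might look like the following sketch, which records a staged signer version and enforces the "at most one staged next signer" rule from the verifier model. `SignerRegistry`, `SignerRecord`, and `stage` are hypothetical names; key generation is assumed to happen inside custody so that only the public key reaches this code.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class SignerRecord:
    signer_version_id: str
    public_key: str
    state: str
    published_at: datetime


@dataclass
class SignerRegistry:
    """Hypothetical control-plane verifier set; private keys stay in custody."""
    records: Dict[str, SignerRecord] = field(default_factory=dict)

    def staged(self) -> Optional[SignerRecord]:
        return next((r for r in self.records.values() if r.state == "staged"), None)

    def stage(self, public_key: str, now: Optional[datetime] = None) -> SignerRecord:
        # The verifier model allows at most one staged next signer.
        if self.staged() is not None:
            raise RuntimeError("a staged signer already exists")
        record = SignerRecord(
            signer_version_id=str(uuid.uuid4()),             # step 2: new version id
            public_key=public_key,                           # step 1 ran in custody
            state="staged",                                  # step 3
            published_at=now or datetime.now(timezone.utc),  # step 4: published
        )
        self.records[record.signer_version_id] = record
        return record
```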

Phase 2: Roll out verifier

  1. Issue node.rotate_signing_key to enrolled nodes, signed by the current active signer.
  2. The payload must include new_signer_version_id, new_public_key, effective_at, and grace_until.
  3. Nodes store the new verifier as staged and continue accepting the current active verifier until effective_at.
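A sketch of assembling the Phase 2 task body so that it matches the minimum payload in the Signed Rotate Task Contract section. `build_rotate_task` and `rfc3339` are hypothetical helpers; signing the serialized body with the current active signer is assumed to happen in a separate step and is not shown.

```python
import json
from datetime import datetime, timezone


def rfc3339(ts: datetime) -> str:
    # Normalize an aware datetime to UTC and render it with a trailing "Z".
    return ts.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")


def build_rotate_task(new_signer_version_id: str, new_public_key: str,
                      effective_at: datetime, grace_until: datetime) -> str:
    """Serialize a node.rotate_signing_key task body (unsigned)."""
    task = {
        "task_type": "node.rotate_signing_key",
        "params": {
            "new_signer_version_id": new_signer_version_id,
            "new_public_key": new_public_key,
            "effective_at": rfc3339(effective_at),
            "grace_until": rfc3339(grace_until),
        },
    }
    # sort_keys gives a stable key order for the bytes that get signed.
    return json.dumps(task, sort_keys=True)
```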

Phase 3: Activate

  1. At effective_at, control plane begins signing new tasks with the new private key.
  2. Nodes must accept the new active signer, and the old signer until grace_until.
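Activation can be sketched as an idempotent step the control plane runs on a schedule: before effective_at it keeps signing with the current signer, and at or after effective_at it moves the staged signer to active and the old one to grace. The `activate` function and its plain-dict state store are hypothetical.

```python
from datetime import datetime
from typing import Dict


def activate(states: Dict[str, str], current_id: str, next_id: str,
             now: datetime, effective_at: datetime) -> str:
    """Update signer states at effective_at and return the version to sign with."""
    if now < effective_at:
        return current_id                 # keep signing with the current active signer
    state = states.get(next_id)
    if state == "staged":
        states[next_id] = "active"        # new signer becomes active
        states[current_id] = "grace"      # old signer enters its grace window
    elif state != "active":
        # A discarded or revoked signer must never start signing.
        raise RuntimeError(f"signer {next_id} is {state}, cannot activate")
    return next_id
```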

Phase 4: Retire old signer

  1. After grace_until, control plane retires the old signer.
  2. Nodes must stop accepting the old signer.
  3. Old signer state becomes retired or revoked.
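Phase 4 can be sketched as a sweep that retires grace signers whose window has closed and returns their ids, for example so the caller can emit node_task_signer.retire audit events. `retire_expired` is a hypothetical name, and revocation is assumed to be a separate emergency path rather than part of this sweep.

```python
from datetime import datetime
from typing import Dict, List


def retire_expired(states: Dict[str, str], grace_until: Dict[str, datetime],
                   now: datetime) -> List[str]:
    """Move grace signers past their grace_until to retired; return their ids."""
    retired: List[str] = []
    for version_id, state in list(states.items()):
        deadline = grace_until.get(version_id)
        # Signers without an explicit deadline are left alone for review.
        if state == "grace" and deadline is not None and now >= deadline:
            states[version_id] = "retired"
            retired.append(version_id)
    return retired
```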

Rollback Procedure

Rollback must be explicit and time-bounded.

Rules:

  1. If rotation fails before effective_at, discard the staged signer and keep the current active signer.
  2. If rotation fails after effective_at but before grace_until, control plane may temporarily re-sign with the old signer and re-issue the rollout.
  3. After grace_until, rollback requires a new explicit rotation event, not silent indefinite reuse.
  4. Emergency revocation is allowed, but it must be auditable and may temporarily disable nodes that do not yet have the replacement verifier.
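Rules 1 to 3 reduce to a time-based decision, sketched here; rule 4 (emergency revocation) is independent of the rotation timeline and is not modeled. `RollbackAction` and `rollback_action` are hypothetical names.

```python
from datetime import datetime
from enum import Enum


class RollbackAction(Enum):
    DISCARD_STAGED = "discard staged signer, keep current active signer"
    RESIGN_WITH_OLD = "temporarily re-sign with old signer and re-issue rollout"
    NEW_ROTATION_REQUIRED = "open a new explicit rotation event"


def rollback_action(now: datetime, effective_at: datetime,
                    grace_until: datetime) -> RollbackAction:
    """Map the time of a failed rotation to the permitted rollback path."""
    if now < effective_at:
        return RollbackAction.DISCARD_STAGED          # rule 1
    if now < grace_until:
        return RollbackAction.RESIGN_WITH_OLD         # rule 2
    return RollbackAction.NEW_ROTATION_REQUIRED       # rule 3
```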

Signed Rotate Task Contract

node.rotate_signing_key is itself a signed task and must be accepted only if signed by a currently trusted signer.

Minimum payload:

{
  "task_type": "node.rotate_signing_key",
  "params": {
    "new_signer_version_id": "uuid-or-version-string",
    "new_public_key": "base64url-ed25519-public-key",
    "effective_at": "RFC3339",
    "grace_until": "RFC3339"
  }
}

Validation rules:

  1. effective_at must be in the future by at least a minimum, policy-controlled lead time.
  2. grace_until must be later than effective_at.
  3. new_public_key must parse as an Ed25519 public key.
  4. The node agent must reject malformed or self-contradictory payloads.
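A node-side validator for these rules might look like the following sketch. `validate_rotate_params` and `parse_rfc3339` are hypothetical names, `MIN_LEAD_TIME` is a hypothetical placeholder for the policy-controlled lead time, and decoding `new_public_key` to 32 bytes of base64url is a length check standing in for full Ed25519 parsing. A real agent would also verify the task signature against a currently trusted signer before acting on it.

```python
import base64
import binascii
from datetime import datetime, timedelta
from typing import Any, Dict

MIN_LEAD_TIME = timedelta(minutes=15)  # hypothetical policy value


def parse_rfc3339(value: str) -> datetime:
    # fromisoformat accepts "+00:00"; normalize a trailing "Z" first.
    ts = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if ts.tzinfo is None:
        raise ValueError("timestamp must carry a UTC offset")
    return ts


def validate_rotate_params(params: Dict[str, Any], now: datetime,
                           min_lead: timedelta = MIN_LEAD_TIME) -> None:
    """Raise ValueError if a node.rotate_signing_key payload breaks the rules."""
    required = ("new_signer_version_id", "new_public_key", "effective_at", "grace_until")
    missing = [k for k in required if not isinstance(params.get(k), str) or not params[k]]
    if missing:
        raise ValueError(f"missing or empty fields: {missing}")
    effective_at = parse_rfc3339(params["effective_at"])
    grace_until = parse_rfc3339(params["grace_until"])
    if effective_at < now + min_lead:                      # rule 1
        raise ValueError("effective_at is inside the minimum lead time")
    if grace_until <= effective_at:                        # rule 2
        raise ValueError("grace_until must be later than effective_at")
    key = params["new_public_key"]
    try:                                                   # rule 3 (length check only)
        raw = base64.urlsafe_b64decode(key + "=" * (-len(key) % 4))
    except binascii.Error as exc:
        raise ValueError("new_public_key is not base64url") from exc
    if len(raw) != 32:
        raise ValueError("new_public_key is not a 32-byte Ed25519 key")
```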

Audit and Telemetry Requirements

Every signer lifecycle mutation must be auditable: 1. node_task_signer.create, 2. node_task_signer.stage, 3. node_task_signer.activate, 4. node_task_signer.retire, 5. node_task_signer.revoke, 6. node_task_signer.rollback.

Recorded fields: 1. actor, 2. signer version id, 3. previous signer version id (when applicable), 4. effective/grace timestamps, 5. correlation id, 6. result.
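An audit record carrying these fields could be sketched as an immutable dataclass that rejects unknown actions. `SignerAuditEvent`, `AUDIT_ACTIONS`, the `result` values, and generating a fresh UUID when no correlation id is supplied are assumptions.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

AUDIT_ACTIONS = frozenset({
    "node_task_signer.create", "node_task_signer.stage", "node_task_signer.activate",
    "node_task_signer.retire", "node_task_signer.revoke", "node_task_signer.rollback",
})


@dataclass(frozen=True)
class SignerAuditEvent:
    action: str
    actor: str
    signer_version_id: str
    result: str                                    # e.g. "success" / "failure" (assumed)
    previous_signer_version_id: Optional[str] = None
    effective_at: Optional[datetime] = None
    grace_until: Optional[datetime] = None
    correlation_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def __post_init__(self) -> None:
        # Only the six lifecycle mutations listed in the spec are recordable.
        if self.action not in AUDIT_ACTIONS:
            raise ValueError(f"unknown signer lifecycle action: {self.action}")
```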

Node-side reporting should include: 1. current accepted signer version set, 2. last successful rotate task, 3. verification failures by signer version when possible.
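Node-side reporting could be kept in a small in-memory structure like this sketch. `VerifierTelemetry` and its report keys are hypothetical; attributing a failure to a signer version assumes the task identifies its signer, so failures without that information are counted under "unknown".

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VerifierTelemetry:
    accepted_versions: List[str] = field(default_factory=list)
    last_rotate_task_id: Optional[str] = None
    failures_by_version: Counter = field(default_factory=Counter)

    def record_rotate_success(self, task_id: str, accepted_versions: List[str]) -> None:
        # Called after a rotate task is verified and applied.
        self.last_rotate_task_id = task_id
        self.accepted_versions = sorted(accepted_versions)

    def record_verification_failure(self, signer_version_id: Optional[str]) -> None:
        # Failures whose signer version cannot be determined go under "unknown".
        self.failures_by_version[signer_version_id or "unknown"] += 1

    def report(self) -> Dict[str, object]:
        return {
            "accepted_signer_versions": list(self.accepted_versions),
            "last_rotate_task_id": self.last_rotate_task_id,
            "verification_failures": dict(self.failures_by_version),
        }
```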

Transitional Reality

Current platform environments may still temporarily expose a single verifier through environment/bootstrap configuration.

That is acceptable only if: 1. this document remains the canonical lifecycle, 2. new work does not hard-code permanent single-verifier semantics, 3. future agent/runtime changes move toward versioned verifier rollout rather than permanent manual edits.

Non-Negotiable Invariants

  1. private signer material remains platform-custodied only,
  2. node agents never receive signer private material,
  3. verifier rollout must not depend on binary rebuild as the normal path,
  4. active and grace windows must be explicit,
  5. rollback must be auditable,
  6. nodes do not talk directly to Vault or KMS.

Follow-on Work

  1. A-NODE-BOOTSTRAP-TRUST-DELIVERY-001
  2. Vault-backed signer custody implementation under A-VAULT-PLATFORM-SECRETS-001 follow-ons

Related documents

  1. doc/architecture/PKI_Spec.md
  2. doc/architecture/Node_Agent_Spec.md
  3. doc/architecture/Platform_Signing_and_Bootstrap_Trust_v1.md
  4. doc/architecture/Platform_Vault_Secrets_Baseline_v1.md