Node-agent task catalog
Implemented
cmd/node-agent/agent.go:32-64 (task type registry) · cmd/node-agent/*.go (handlers)
The node-agent is a bounded typed-task executor. It is not a remote shell. Every task has a typed input, a structured output, and a signed contract. Tasks come from cmd/api over mTLS pull (GET /internal/v1/nodes/{id}/tasks/wait) and results are posted back (POST .../tasks/{id}/result).
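The "bounded typed-task executor" idea can be sketched as a registry keyed by task type, where anything unregistered is rejected. This is a minimal illustration, not the code in cmd/node-agent/agent.go: the `Handler` signature, `Registry`, `newRegistry`, and the `include_gpu` / `gpu_checked` fields are all assumptions made for the example.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// Handler runs one task type: raw JSON params in, structured result out.
// Hypothetical signature, not the one used in cmd/node-agent.
type Handler func(params json.RawMessage) (any, error)

// ErrUnknownTaskType is returned for any type without a registered handler.
var ErrUnknownTaskType = errors.New("unknown task type")

// Registry maps task type names to handlers. It has no catch-all entry.
type Registry map[string]Handler

// Dispatch runs the handler registered for taskType, or rejects the task.
func (r Registry) Dispatch(taskType string, params json.RawMessage) (any, error) {
	h, ok := r[taskType]
	if !ok {
		return nil, fmt.Errorf("%w: %q", ErrUnknownTaskType, taskType)
	}
	return h(params)
}

// Illustrative typed input and output for diag.health_probe (field names assumed).
type healthProbeInput struct {
	IncludeGPU bool `json:"include_gpu"`
}

type healthProbeOutput struct {
	GPUChecked bool `json:"gpu_checked"`
}

// healthProbe decodes params strictly, so unexpected fields fail the task.
func healthProbe(params json.RawMessage) (any, error) {
	var in healthProbeInput
	dec := json.NewDecoder(bytes.NewReader(params))
	dec.DisallowUnknownFields()
	if err := dec.Decode(&in); err != nil {
		return nil, fmt.Errorf("decode params: %w", err)
	}
	return healthProbeOutput{GPUChecked: in.IncludeGPU}, nil
}

// newRegistry builds the (here: one-entry) set of allowed task types.
func newRegistry() Registry {
	return Registry{"diag.health_probe": healthProbe}
}

func main() {
	reg := newRegistry()
	out, err := reg.Dispatch("diag.health_probe", json.RawMessage(`{"include_gpu":true}`))
	fmt.Println(out, err)
	_, err = reg.Dispatch("exec.task", json.RawMessage(`{}`))
	fmt.Println(err)
}
```

Strict decoding is a design choice in this sketch: a payload that carries fields the handler does not declare fails instead of being silently ignored.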
sequenceDiagram
autonumber
participant W as cmd/provisioning-worker
participant A as cmd/api
participant PG as Postgres
participant N as cmd/node-agent
participant H as Host OS
W->>A: enqueueTask(node_id, type, params)
A->>PG: INSERT node_tasks (status='queued')
N->>A: GET /tasks/wait (long-poll, mTLS)
A->>PG: CTE: claim queued task → dispatched
A-->>N: {task_id, task_type, params, signature, expires_at}
N->>N: verify signature + expiry
N->>H: execute typed handler
N->>A: POST /tasks/{id}/result
A->>PG: UPDATE node_tasks status='completed'
W->>PG: poll node_tasks (250ms)
W-->>W: terminal → continue workflow
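The last two steps of the sequence (the worker polling node_tasks until the task is terminal) could look roughly like the loop below. It is a sketch under assumptions: `WaitTerminal`, `StatusFunc`, and `waitScripted` are hypothetical names, the `failed` status is assumed (the diagram only shows `queued`, `dispatched`, and `completed`), and the real worker reads from Postgres rather than from a scripted slice.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// TaskStatus is the node_tasks.status value.
type TaskStatus string

const (
	StatusQueued     TaskStatus = "queued"
	StatusDispatched TaskStatus = "dispatched"
	StatusCompleted  TaskStatus = "completed"
	StatusFailed     TaskStatus = "failed" // assumed; not shown in the diagram
)

// Terminal reports whether the workflow can stop polling.
func (s TaskStatus) Terminal() bool {
	return s == StatusCompleted || s == StatusFailed
}

// StatusFunc reads the current status of one task, for example with
// SELECT status FROM node_tasks WHERE id = $1.
type StatusFunc func(ctx context.Context) (TaskStatus, error)

// WaitTerminal polls read every interval until the status is terminal,
// the read fails, or ctx is cancelled.
func WaitTerminal(ctx context.Context, interval time.Duration, read StatusFunc) (TaskStatus, error) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		st, err := read(ctx)
		if err != nil {
			return "", err
		}
		if st.Terminal() {
			return st, nil
		}
		select {
		case <-ctx.Done():
			return st, ctx.Err()
		case <-ticker.C:
		}
	}
}

// waitScripted drives WaitTerminal with a fixed list of statuses and
// returns the final status and the number of reads it took.
func waitScripted(script []TaskStatus) (TaskStatus, int, error) {
	reads := 0
	read := func(context.Context) (TaskStatus, error) {
		st := script[len(script)-1]
		if reads < len(script) {
			st = script[reads]
		}
		reads++
		return st, nil
	}
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	st, err := WaitTerminal(ctx, time.Millisecond, read)
	return st, reads, err
}

func main() {
	st, reads, err := waitScripted([]TaskStatus{StatusQueued, StatusDispatched, StatusCompleted})
	fmt.Println(st, reads, err)
}
```

In the worker the interval would be the 250 ms shown in the diagram; the demo uses 1 ms only to finish quickly.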
Task taxonomy
flowchart LR
classDef slice fill:#fff3e0,stroke:#e65100
classDef bm fill:#e3f2fd,stroke:#1565c0
classDef user fill:#e8f5e9,stroke:#2e7d32
classDef diag fill:#f3e5f5,stroke:#6a1b9a
subgraph Slice["GPU slice family"]
T1[slice.topology_discover]:::slice
T2[slice.vm_provision]:::slice
T3[slice.vm_release]:::slice
end
subgraph BM["Baremetal allocation"]
T4[allocation.provision_user]:::bm
T5[allocation.deprovision_user]:::bm
end
subgraph Diag["Diagnostics"]
T6[diag.health_probe]:::diag
end
Slice family
slice.topology_discover
Implemented
Operator-onboarding task. Scans the host (/sys/bus/pci/devices, /sys/block, OVS, IPoIB, kernel cmdline) and returns a candidate slot map plus host readiness blockers.
| Output key | Meaning |
|---|---|
| prerequisites.kvm_available | /dev/kvm exists |
| prerequisites.iommu_group_count | non-zero ⇒ VFIO usable |
| prerequisites.iommu_kernel_args | intel_iommu=on \| amd_iommu=on \| iommu=pt present |
| prerequisites.cpu_virtualization | {vmx, svm} flags |
| prerequisites.loaded_modules | vfio_pci, vfio_iommu_type1, etc. |
| prerequisites.required_commands | cloud-localds, qemu-img, virt-install, virsh, ovs-vsctl, findmnt |
| prerequisites.slice_network | OVS bridge, NAT, IP forwarding, IPoIB netplan checks |
| prerequisites.reboot_required | marker files present |
| gpu_devices[] | NVIDIA GPUs (vendor 0x10de) with PCI/IOMMU/NUMA |
| fabric_devices[] | Mellanox (vendor 0x15b3) with VF detection |
| nvme_devices[] | NVMe namespaces with stable /dev/disk/by-id/ path |
| candidate_slots[] | GPU↔NVMe↔fabric VF pairs, deterministic MAC + IP |
| candidate_summary.blockers[] | List of why slots are not schedulable |
Approval required: approval_required: true is always set on the output. Discovery never writes node_resource_slots rows.
Source: cmd/node-agent/slice_topology.go.
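As an illustration of how a prerequisite such as prerequisites.iommu_kernel_args might be derived and turned into a blocker, here is a small sketch. It is not taken from slice_topology.go: `iommuKernelArgs`, the `Prerequisites` struct, and the blocker strings are hypothetical, and only three of the output keys are modelled.

```go
package main

import (
	"fmt"
	"strings"
)

// iommuKernelArgs returns the IOMMU-related arguments found on a kernel
// command line (the contents of /proc/cmdline on Linux), in order.
func iommuKernelArgs(cmdline string) []string {
	wanted := map[string]bool{
		"intel_iommu=on": true,
		"amd_iommu=on":   true,
		"iommu=pt":       true,
	}
	var found []string
	for _, arg := range strings.Fields(cmdline) {
		if wanted[arg] {
			found = append(found, arg)
		}
	}
	return found
}

// Prerequisites models three keys of the discovery output.
type Prerequisites struct {
	KVMAvailable    bool     `json:"kvm_available"`
	IOMMUGroupCount int      `json:"iommu_group_count"`
	IOMMUKernelArgs []string `json:"iommu_kernel_args"`
}

// Blockers lists reasons the host is not ready. The strings are invented
// for this example and are not the agent's real blocker codes.
func (p Prerequisites) Blockers() []string {
	var out []string
	if !p.KVMAvailable {
		out = append(out, "kvm_unavailable")
	}
	if p.IOMMUGroupCount == 0 {
		out = append(out, "no_iommu_groups")
	}
	if len(p.IOMMUKernelArgs) == 0 {
		out = append(out, "iommu_kernel_args_missing")
	}
	return out
}

func main() {
	// Sample command line; a real probe would read /proc/cmdline.
	cmdline := "BOOT_IMAGE=/vmlinuz root=/dev/sda1 intel_iommu=on iommu=pt quiet"
	p := Prerequisites{
		KVMAvailable:    true,
		IOMMUGroupCount: 0,
		IOMMUKernelArgs: iommuKernelArgs(cmdline),
	}
	fmt.Println(p.IOMMUKernelArgs, p.Blockers())
}
```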
slice.vm_provision
Implemented
Provisions a slice VM with N slots passed through. Each phase is timed, and a failure triggers deferred cleanup.
flowchart TB
P0([slice.vm_provision])
P0 --> P1[1. lease_acquire<br/>per-slot exclusive JSON lease]
P1 --> P2[2. host_dependencies<br/>apt-install qemu-kvm/libvirt/ovs]
P2 --> P3[3. host_passthrough_check<br/>/dev/kvm + modprobe vfio-pci]
P3 --> P4[4. vfio_bind_check<br/>GPU + fabric VF → vfio-pci]
P4 --> P5[5. image_stat_download<br/>fetch from image_url if missing]
P5 --> P6[6. image_digest_verify]
P6 --> P7[7. cloud_init_dir]
P7 --> P8[8. terminal_key]
P8 --> P9[9. guest_telemetry_register]
P9 --> P10[10. cloud_init_seed_files]
P10 --> P11[11. cloud_localds]
P11 --> P12[12. runtime_validate<br/>OVS br-exists + NVMe unmounted]
P12 --> P13[13. dhcp_reservation]
P13 --> P14[14. image_write_convert<br/>qemu-img convert → boot NVMe]
P14 --> P15[15. virt_install]
P15 --> P16[16. readiness<br/>SSH + /var/lib/gpuaas/slice-ready]
P16 --> P17[17. performance_probe]
P17 --> OK([success])
P1 & P4 & P14 & P15 & P16 -.fail.-> CL[cleanup:<br/>drop DHCP / unregister telemetry /<br/>release leases / rm cloud_init_dir]
CL --> ERR([error])
Inputs (selected): allocation_id, vm_name, image_path, image_sha256, image_trusted, driver_strategy (cloud-init|preinstalled|none), default_username, ssh_public_keys[], slots[], ovs_bridge, graceful_timeout_seconds (30–900).
Each slot carries slot_id, slot_index, pci_address (GPU), fabric_device (VF PCI), nvme_device, numa_node, vcpu_count (default 12), memory_mib (default 65536), mac_address, private_ip.
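The documented input constraints (the driver_strategy enum, the 30–900 second graceful timeout, and the per-slot defaults) can be expressed as a normalization step. The sketch below assumes details the text does not state: that out-of-range values are rejected rather than clamped, that a zero value means "unset", and that at least one slot is required. `ProvisionInput`, `Slot`, `Normalize`, and `normalizeJSON` are hypothetical names covering only a subset of the fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Slot holds a subset of the per-slot fields.
type Slot struct {
	SlotID    string `json:"slot_id"`
	VCPUCount int    `json:"vcpu_count"`
	MemoryMiB int    `json:"memory_mib"`
}

// ProvisionInput holds a subset of the slice.vm_provision inputs.
type ProvisionInput struct {
	DriverStrategy         string `json:"driver_strategy"`
	GracefulTimeoutSeconds int    `json:"graceful_timeout_seconds"`
	Slots                  []Slot `json:"slots"`
}

// Normalize validates the input and fills the documented slot defaults.
func (in *ProvisionInput) Normalize() error {
	switch in.DriverStrategy {
	case "cloud-init", "preinstalled", "none":
	default:
		return fmt.Errorf("driver_strategy %q is not allowed", in.DriverStrategy)
	}
	if in.GracefulTimeoutSeconds < 30 || in.GracefulTimeoutSeconds > 900 {
		return fmt.Errorf("graceful_timeout_seconds %d outside 30..900", in.GracefulTimeoutSeconds)
	}
	if len(in.Slots) == 0 { // assumption: an empty slot list is invalid
		return fmt.Errorf("slots must not be empty")
	}
	for i := range in.Slots {
		if in.Slots[i].VCPUCount == 0 {
			in.Slots[i].VCPUCount = 12 // documented default
		}
		if in.Slots[i].MemoryMiB == 0 {
			in.Slots[i].MemoryMiB = 65536 // documented default
		}
	}
	return nil
}

// normalizeJSON decodes raw params and normalizes them in one step.
func normalizeJSON(raw string) (ProvisionInput, error) {
	var in ProvisionInput
	if err := json.Unmarshal([]byte(raw), &in); err != nil {
		return in, err
	}
	return in, in.Normalize()
}

func main() {
	in, err := normalizeJSON(`{"driver_strategy":"cloud-init","graceful_timeout_seconds":120,"slots":[{"slot_id":"s0"}]}`)
	fmt.Printf("%+v %v\n", in, err)
}
```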
Output includes vm_name, default_user, private_ip, ssh_port=22, slot_count, readiness, performance, timings.<phase>_ms, raw_vnc=false, console_model=gateway_required.
Source: cmd/node-agent/slice_vm.go:113-635 (handler + provision flow).
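The "phase-timed, cleanup on failure" pattern can be sketched as a small runner that records a `<phase>_ms` timing per phase and unwinds on error. This is an illustration only, not the flow in slice_vm.go: `Phase`, `RunPhases`, and `demoRun` are hypothetical, undoing in reverse order is an assumption, and as a simplification only phases that completed are undone.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Phase is one named provisioning step with an optional undo action.
type Phase struct {
	Name string
	Run  func() error
	Undo func() // nil when the phase leaves nothing to clean up
}

// RunPhases runs phases in order and records "<name>_ms" per phase.
// On failure it runs the Undo of every completed phase, newest first.
func RunPhases(phases []Phase) (timings map[string]int64, err error) {
	timings = make(map[string]int64)
	var undo []func()
	defer func() {
		if err == nil {
			return
		}
		for i := len(undo) - 1; i >= 0; i-- {
			undo[i]()
		}
	}()
	for _, p := range phases {
		start := time.Now()
		runErr := p.Run()
		timings[p.Name+"_ms"] = time.Since(start).Milliseconds()
		if runErr != nil {
			err = fmt.Errorf("phase %s: %w", p.Name, runErr)
			return timings, err
		}
		if p.Undo != nil {
			undo = append(undo, p.Undo)
		}
	}
	return timings, nil
}

// demoRun wires three phases from the flow and fails the one named failAt.
// It returns the cleanup actions in the order they ran.
func demoRun(failAt string) (undone []string, timings map[string]int64, err error) {
	step := func(name string) func() error {
		return func() error {
			if name == failAt {
				return errors.New("simulated failure")
			}
			return nil
		}
	}
	note := func(action string) func() {
		return func() { undone = append(undone, action) }
	}
	phases := []Phase{
		{Name: "lease_acquire", Run: step("lease_acquire"), Undo: note("release leases")},
		{Name: "cloud_init_dir", Run: step("cloud_init_dir"), Undo: note("rm cloud_init_dir")},
		{Name: "virt_install", Run: step("virt_install")},
	}
	timings, err = RunPhases(phases)
	return undone, timings, err
}

func main() {
	undone, timings, err := demoRun("virt_install")
	fmt.Println(undone, len(timings), err)
}
```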
slice.vm_release
Implemented
flowchart TB
R0([slice.vm_release]) --> R1[virsh shutdown]
R1 --> R2{running after<br/>graceful_timeout?}
R2 -- yes --> R3[virsh destroy<br/>hard_stopped=true]
R2 -- no --> R4[virsh undefine<br/>--nvram fallback]
R3 --> R4
R4 --> R5[re-bind GPU + fabric VF<br/>to vfio-pci]
R5 --> R6[rm cloud_init_dir<br/>rm DHCP reservation]
R6 --> R7{wipe=true?}
R7 -- yes --> R8[zero each NVMe]
R7 -- no --> R9
R8 --> R9[release slot leases<br/>unregister guest telemetry]
R9 --> DONE([result])
Source: cmd/node-agent/slice_vm.go:1175-1221.
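The first branch of the release flow (graceful shutdown, then a hard stop if the VM is still running after graceful_timeout) can be sketched against a small interface. The `Hypervisor` interface, `StopVM`, and the fake used for the demo are hypothetical; the real handler shells out to virsh and continues with undefine, re-bind, and cleanup, which are omitted here.

```go
package main

import (
	"fmt"
	"time"
)

// Hypervisor abstracts the virsh calls used while stopping a VM.
type Hypervisor interface {
	Shutdown(vm string) error        // virsh shutdown
	Running(vm string) (bool, error) // is the domain still running?
	Destroy(vm string) error         // virsh destroy
}

// StopVM requests a graceful shutdown and polls until the VM stops or
// gracefulTimeout elapses, then falls back to a hard stop.
// hardStopped mirrors the hard_stopped flag in the flow above.
func StopVM(h Hypervisor, vm string, gracefulTimeout, pollEvery time.Duration) (hardStopped bool, err error) {
	if err := h.Shutdown(vm); err != nil {
		return false, err
	}
	deadline := time.Now().Add(gracefulTimeout)
	for {
		running, err := h.Running(vm)
		if err != nil {
			return false, err
		}
		if !running {
			return false, nil
		}
		if !time.Now().Before(deadline) {
			break
		}
		time.Sleep(pollEvery)
	}
	if err := h.Destroy(vm); err != nil {
		return false, err
	}
	return true, nil
}

// fakeHypervisor stops after a fixed number of polls, or never if negative.
type fakeHypervisor struct {
	pollsUntilStopped int
	polls             int
	destroyed         bool
}

func (f *fakeHypervisor) Shutdown(string) error { return nil }

func (f *fakeHypervisor) Running(string) (bool, error) {
	f.polls++
	if f.destroyed {
		return false, nil
	}
	return f.pollsUntilStopped < 0 || f.polls <= f.pollsUntilStopped, nil
}

func (f *fakeHypervisor) Destroy(string) error {
	f.destroyed = true
	return nil
}

// stopFake runs StopVM against the fake with short demo timings.
func stopFake(pollsUntilStopped int) (hardStopped, destroyed bool) {
	f := &fakeHypervisor{pollsUntilStopped: pollsUntilStopped}
	hardStopped, _ = StopVM(f, "demo-vm", 50*time.Millisecond, time.Millisecond)
	return hardStopped, f.destroyed
}

func main() {
	fmt.Println(stopFake(2))  // guest shuts down on its own
	fmt.Println(stopFake(-1)) // guest ignores the shutdown request
}
```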
Baremetal family
allocation.provision_user
Implemented RCA
Creates the OS user on a bare-metal host, applies SSH authorized keys, verifies user/home presence.
Drove RCA 2026-03-node-api-mtls-identity-handoff: the DB recorded the task as completed while the node never executed it. The fix made the task pull idempotent and identity-bound.
allocation.deprovision_user
Implemented
Inverse of allocation.provision_user: removes the OS user and revokes SSH access.
Diagnostics
diag.health_probe
Implemented
Periodic node health check. Reports kernel version, free RAM, KVM presence, VFIO module status, GPU detection, fabric link state.
Security model
| Property | Mechanism |
|---|---|
| Transport | mTLS — node identity is its enrollment cert (24 h TTL, X5C renewal via step-ca) |
| Task signing | Every task params payload is signed by cmd/api; node-agent verifies before execution |
| Expiry | Each task carries expires_at; node-agent rejects expired tasks |
| No remote shell | Only the registered handlers run. No exec.task or equivalent |
| Sandboxed paths | Image paths restricted to /var/lib/gpuaas/slice-images or /var/lib/libvirt/images; cloud-init dirs to /var/lib/gpuaas/slices/ |
| Single-use terminal keys | Each provision injects the host's per-instance ed25519 pubkey; the gateway uses it to broker SSH |
| Audit | API writes audit_logs for every task enqueue with actor, role, action, target, result |
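The "Sandboxed paths" row can be illustrated with a lexical containment check. This is a sketch, not the agent's implementation: `withinRoots` and `allowedImageRoots` are hypothetical names, the check assumes Linux-style absolute paths, and it does not resolve symlinks, which a production check would also need to consider.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// allowedImageRoots are the two image directories named in the table.
var allowedImageRoots = []string{
	"/var/lib/gpuaas/slice-images",
	"/var/lib/libvirt/images",
}

// withinRoots reports whether p, after lexical cleaning, is a path
// strictly inside one of roots. Relative paths are rejected outright.
func withinRoots(p string, roots []string) bool {
	if !filepath.IsAbs(p) {
		return false
	}
	clean := filepath.Clean(p)
	for _, root := range roots {
		rel, err := filepath.Rel(root, clean)
		if err != nil {
			continue
		}
		escapes := rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator))
		if rel != "." && !escapes {
			return true
		}
	}
	return false
}

func main() {
	for _, p := range []string{
		"/var/lib/gpuaas/slice-images/ubuntu.qcow2",
		"/var/lib/gpuaas/slice-images/../../../etc/passwd",
		"/var/lib/gpuaas/slice-images-evil/x.qcow2",
	} {
		fmt.Println(p, withinRoots(p, allowedImageRoots))
	}
}
```

Using `filepath.Rel` rather than a plain string prefix test avoids accepting sibling directories that merely share a prefix with an allowed root.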
Detail: Node Task Signing Lifecycle (source).
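To make the "verify signature + expiry" step concrete, here is a sketch of the check a node might run before executing a task. Several things are assumed because this page does not specify them: the signature algorithm (Ed25519 is used purely for illustration), the exact bytes that are signed (`signedBytes` is an invented canonical form that also binds task_id, task_type, and expires_at), and all Go names.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"encoding/json"
	"errors"
	"fmt"
	"time"
)

// Task mirrors the fields delivered by the long-poll response.
type Task struct {
	TaskID    string          `json:"task_id"`
	TaskType  string          `json:"task_type"`
	Params    json.RawMessage `json:"params"`
	Signature []byte          `json:"signature"`
	ExpiresAt time.Time       `json:"expires_at"`
}

var (
	ErrBadSignature = errors.New("task signature invalid")
	ErrExpired      = errors.New("task expired")
)

// signedBytes is a hypothetical canonical form of the signed message.
func signedBytes(t Task) []byte {
	head := fmt.Sprintf("%s\n%s\n%d\n", t.TaskID, t.TaskType, t.ExpiresAt.Unix())
	return append([]byte(head), t.Params...)
}

// Verify checks the signature first, so expires_at is only trusted once
// it is known to be covered by a valid signature, then checks expiry.
func Verify(pub ed25519.PublicKey, t Task, now time.Time) error {
	if !ed25519.Verify(pub, signedBytes(t), t.Signature) {
		return ErrBadSignature
	}
	if !now.Before(t.ExpiresAt) {
		return ErrExpired
	}
	return nil
}

// demoVerify signs a sample task with a fresh key, optionally tampers
// with params or backdates expiry, and returns the Verify result.
func demoVerify(tamper, expired bool) error {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		return err
	}
	now := time.Now()
	t := Task{
		TaskID:    "t-1",
		TaskType:  "slice.vm_release",
		Params:    json.RawMessage(`{"wipe":false}`),
		ExpiresAt: now.Add(time.Minute),
	}
	if expired {
		t.ExpiresAt = now.Add(-time.Minute)
	}
	t.Signature = ed25519.Sign(priv, signedBytes(t))
	if tamper {
		t.Params = json.RawMessage(`{"wipe":true}`)
	}
	return Verify(pub, t, now)
}

func main() {
	fmt.Println(demoVerify(false, false))
	fmt.Println(demoVerify(true, false))
	fmt.Println(demoVerify(false, true))
}
```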
Where to look next
- GPU slice as-built: end-to-end flow, including the orchestrator side
- GPU slice runbooks: manual bootstrap, image pipeline, cleanup-blocked
- Outbox & event flow: how task completion maps back to allocation state