Glossary
Implemented
Every term used in this portal, defined once. Source-cited.
A
- Allocation
- A customer's lease on compute. The unit of billing and lifecycle. Realized as a whole node (baremetal) or as one or more slots on a node (gpu_slice). Table: allocations.
- Allocation resource claim
- A row in allocation_resource_claims binding an allocation to either the whole node (claim_kind=node_exclusive) or specific slots (claim_kind=slot). The durable source of placement truth.
- App instance
- A running app inside an allocation. Table: app_instances. Lifecycle managed by cmd/app-runtime-worker.
- Audit log
- An immutable row in audit_logs written for every privileged mutation. Required fields: actor_user_id, actor_role, action, target_type, target_id, result, correlation_id.
B
- Baremetal
- A capacity_shape value meaning the allocation claims an entire physical node.
- BFF
- Backend-for-frontend. cmd/api plays this role.
C
- Capacity shape
- The type of allocation realization. Today: baremetal or gpu_slice. Reserved future shapes include gpu_partition and gpu_shared.
- Cleanup-blocked
- Slot state meaning destructive cleanup failed (e.g., mounted host storage detected, wipe verification failed). The slot is not reusable until an operator intervenes. → Runbook.
- Cloud-init seed ISO
- A small ISO image (seed.iso) attached to a slice VM at first boot. Contains user-data.yaml (users, keys, runcmd) and meta-data.yaml. Generated by cloud-localds. Located at /var/lib/gpuaas/slices/<allocation_id>/.
- Contract
- An OpenAPI or AsyncAPI definition. Authoritative: code follows the contract, not the other way around.
- Correlation ID
- A UUID propagated across every request, span, log, audit row, and outbox event for one logical operation. Required in every error response.
E
- Envelope
- The standard event payload wrapper: {event_id, event_type, occurred_at, version, correlation_id, payload}.
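The Envelope entry lists six fields, and a small constructor makes the shape concrete. This is a minimal sketch, not the portal's implementation: the helper name `make_envelope`, the UUID `event_id`, the ISO 8601 `occurred_at`, the integer `version`, and the payload contents are all assumptions made for illustration.

```python
import json
import uuid
from datetime import datetime, timezone


def make_envelope(event_type, payload, correlation_id, version=1):
    """Wrap a payload in the six envelope fields named in the glossary."""
    return {
        "event_id": str(uuid.uuid4()),  # assumed: one fresh UUID per event
        "event_type": event_type,
        "occurred_at": datetime.now(timezone.utc).isoformat(),  # assumed format
        "version": version,  # assumed: integer schema version
        "correlation_id": correlation_id,
        "payload": payload,
    }


# Hypothetical payload for an event published on a subject from this glossary.
envelope = make_envelope(
    "provisioning.requested",
    {"allocation_id": "alloc-123"},
    correlation_id=str(uuid.uuid4()),
)
print(json.dumps(envelope, indent=2))
```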
F
- Fabric VF
- A Mellanox SR-IOV virtual function passed through to a slice VM for InfiniBand/RoCE. Each slot must have its own. Identified by capacity_metadata.fabric_vf_pci_address.
- FOR UPDATE SKIP LOCKED
- Postgres row-locking clause used for slot reservation, outbox claim, and queued task claim. Concurrent workers skip rows that another worker has already locked, so they neither race on nor block behind the same row.
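The claim pattern behind FOR UPDATE SKIP LOCKED can be sketched as a single statement run by each worker. The table `queued_tasks` and the columns `status` and `claimed_by` are hypothetical names chosen for this example; the glossary does not give the real schema. The function accepts any DB-API cursor on a Postgres connection (for example one from psycopg) and is expected to run inside a transaction that the caller commits.

```python
# Hypothetical schema: queued_tasks(id, status, claimed_by).
CLAIM_SQL = """
UPDATE queued_tasks
   SET status = 'claimed', claimed_by = %s
 WHERE id = (
        SELECT id
          FROM queued_tasks
         WHERE status = 'queued'
         ORDER BY id
         LIMIT 1
           FOR UPDATE SKIP LOCKED
       )
RETURNING id
"""


def claim_next_task(cur, worker_id):
    """Claim one queued row for worker_id.

    Rows already locked by another transaction are skipped rather than
    waited on, so two workers never claim the same row. Returns the
    claimed id, or None when nothing is currently claimable.
    """
    cur.execute(CLAIM_SQL, (worker_id,))
    row = cur.fetchone()
    return row[0] if row else None
```

The lock is held until the surrounding transaction ends, so the status update and the commit should happen together.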
G
- gpu_slice
- A capacity_shape value. VM-based tenancy with N slots passed through.
I
- Idempotency key
- X-Idempotency-Key header. Mutations are safe to retry with the same key. Exception: terminal token mint (single-use by design).
- IPoIB
- IP-over-InfiniBand. Used by the host for east-west fabric on slice nodes. Configured by netplan in /etc/netplan/60-ipoib.yaml.
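For the Idempotency key entry in this section, the retry-safety property can be illustrated with an in-memory wrapper that replays the stored response when a key repeats. This is a sketch under assumptions: the class `IdempotentHandler`, the handler `create_allocation`, and the request fields are hypothetical, and a real service would persist keys durably rather than in a dict.

```python
class IdempotentHandler:
    """Run a mutation at most once per idempotency key."""

    def __init__(self, handler):
        self._handler = handler
        self._responses = {}  # idempotency key -> stored response

    def handle(self, idempotency_key, request):
        # A repeated key returns the first response without re-running
        # the mutation.
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]
        response = self._handler(request)
        self._responses[idempotency_key] = response
        return response


calls = []


def create_allocation(request):
    # Hypothetical mutation; records each real execution.
    calls.append(request)
    return {"allocation_id": f"alloc-{len(calls)}"}


api = IdempotentHandler(create_allocation)
first = api.handle("key-1", {"sku": "example"})
retry = api.handle("key-1", {"sku": "example"})  # same key: replayed
other = api.handle("key-2", {"sku": "example"})  # new key: executed
```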
J
- JWKS
- JSON Web Key Set. Public keys for verifying Keycloak-issued JWTs. Cached in cmd/api for 5 minutes; no per-request Keycloak call.
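The five-minute JWKS cache amounts to a time-to-live check around a fetch. The sketch below assumes a caller-supplied `fetch` function and an injectable clock; the class name `JwksCache` is hypothetical and says nothing about how cmd/api actually structures its cache.

```python
import time


class JwksCache:
    """Cache a fetched key set for ttl_seconds before fetching again."""

    def __init__(self, fetch, ttl_seconds=300, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._keys = None
        self._fetched_at = 0.0

    def get(self):
        now = self._clock()
        # Refetch on first use or once the cached copy is ttl_seconds old.
        if self._keys is None or now - self._fetched_at >= self._ttl:
            self._keys = self._fetch()
            self._fetched_at = now
        return self._keys
```

Passing the clock in keeps expiry behaviour testable without sleeping.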
L
- Lease (slot lease)
- A JSON file at /var/lib/gpuaas/node-scheduler/leases/{slot_id}.json used by node-agent as a host-local mutex to prevent two provisioning tasks from racing on the same physical resources.
- Ledger entry
- An immutable row in ledger_entries. Never UPDATE, never DELETE. Balance is computed by summing entries. Corrections add a new entry with metadata.corrects_entry_id.
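The append-only rule for ledger entries can be shown with a list that is only ever appended to. This sketch assumes a correction is a compensating delta that points at the entry it fixes; the glossary names `metadata.corrects_entry_id` but does not say whether corrections are deltas or replacements, and the `id` and `amount` field names and all values are illustrative.

```python
from decimal import Decimal

ledger = []  # append-only: entries are never updated or deleted


def append_entry(entry_id, amount, metadata=None):
    ledger.append(
        {"id": entry_id, "amount": Decimal(amount), "metadata": metadata or {}}
    )


def balance():
    # The balance is derived by summing entries, never stored.
    return sum((entry["amount"] for entry in ledger), Decimal("0"))


append_entry("e1", "100.00")  # credit
append_entry("e2", "-30.00")  # debit recorded in error; should have been -20.00
# Assumed convention: the correction adds the difference and references e2.
append_entry("e3", "10.00", {"corrects_entry_id": "e2"})

print(balance())  # 80.00
```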
M
- MAAS
- Canonical's Metal as a Service. Used for bare-metal commissioning/deploy. Optional, gated by MAAS_ENABLED.
- mTLS
- Mutual TLS. Used for every internal call between node-agent and cmd/api. Node identity = enrollment cert (24 h TTL, X5C renewal).
N
- NATS subject
- A topic on NATS JetStream. Examples: provisioning.requested, billing.low_balance_warning, payments.balance_credited.
- Node
- A physical host enrolled in GPUaaS. Table: nodes. Hosts cmd/node-agent.
- Node-agent
- cmd/node-agent. Pull-based typed-task executor on each host. mTLS. Not a remote shell.
- Node task
- A typed unit of work for node-agent. Persisted in node_tasks with params, signature, expires_at. Examples: slice.vm_provision, slice.vm_release, allocation.provision_user.
- node_exclusive claim
- A claim kind for baremetal allocations. Locks the entire node.
- node_resource_slots
- Table holding the approved slot map for slice-mode nodes. Schedulable only when full capacity_metadata is present.
O
- OIDC
- OpenID Connect. Authentication protocol. Backed by Keycloak in dev; any compliant IdP in production.
- Outbox
- outbox_events table. Written in the same DB transaction as a domain change, then published to NATS by cmd/outbox-relay. Solves the "wrote DB but failed to publish event" problem.
- OVS
- Open vSwitch. Used as the host-side bridge for the slice VM management plane. Bridge name: ovsbr0.
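The Outbox entry in this section describes two steps: write the event in the same transaction as the domain change, then publish it from a relay. The sketch below demonstrates both with an in-memory SQLite database standing in for Postgres and a callback standing in for NATS. The column layout of both tables and the `published` flag are assumptions made for the example.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE allocations (id TEXT PRIMARY KEY, state TEXT)")
db.execute(
    "CREATE TABLE outbox_events ("
    "id INTEGER PRIMARY KEY, subject TEXT, body TEXT, "
    "published INTEGER DEFAULT 0)"
)


def request_allocation(allocation_id):
    # One transaction: the domain row and the event row commit together,
    # or neither does.
    with db:
        db.execute(
            "INSERT INTO allocations VALUES (?, 'requested')", (allocation_id,)
        )
        db.execute(
            "INSERT INTO outbox_events (subject, body) VALUES (?, ?)",
            ("provisioning.requested", json.dumps({"allocation_id": allocation_id})),
        )


def relay_once(publish):
    """Publish unpublished events in order and mark each one as sent."""
    rows = db.execute(
        "SELECT id, subject, body FROM outbox_events "
        "WHERE published = 0 ORDER BY id"
    ).fetchall()
    for event_id, subject, body in rows:
        publish(subject, body)
        with db:
            db.execute(
                "UPDATE outbox_events SET published = 1 WHERE id = ?", (event_id,)
            )
    return len(rows)


sent = []
request_allocation("alloc-1")
relay_once(lambda subject, body: sent.append((subject, body)))
```

If the relay crashes after publishing but before marking the row, the event is sent again on the next pass, so consumers of this pattern generally need to tolerate duplicates.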
P
- Policy client
- packages/shared/policy.PolicyClient. The only legitimate source for runtime business policy values. Reads the policy_values table with scoped resolution global → tenant → project.
- Project
- Operational scope inside a tenant. Resources (allocations, app instances, storage) belong to a project. Default project auto-created at signup.
- Provisioning worker
- cmd/provisioning-worker. Temporal worker that runs allocation/slice provisioning + release workflows.
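The scoped resolution named in the Policy client entry can be sketched as a lookup that tries scopes in order of specificity. This assumes the most specific scope wins (a project value overrides a tenant value, which overrides the global one); the glossary gives the chain global → tenant → project but not the precedence. The row shape, the function name `resolve_policy`, and the values are hypothetical.

```python
def resolve_policy(rows, key, tenant_id=None, project_id=None):
    """Return the value for key from the most specific matching scope."""
    # Assumed precedence: project, then tenant, then global.
    candidates = (("project", project_id), ("tenant", tenant_id), ("global", None))
    for scope, scope_id in candidates:
        if scope != "global" and scope_id is None:
            continue  # caller did not supply this scope
        for row in rows:
            if (row["key"], row["scope"], row["scope_id"]) == (key, scope, scope_id):
                return row["value"]
    raise KeyError(key)


KEY = "auth.service_account_token_ttl_seconds"
# Illustrative rows; the real policy_values columns are not documented here.
rows = [
    {"key": KEY, "scope": "global", "scope_id": None, "value": 3600},
    {"key": KEY, "scope": "tenant", "scope_id": "t1", "value": 1800},
    {"key": KEY, "scope": "project", "scope_id": "p1", "value": 900},
]

print(resolve_policy(rows, KEY, tenant_id="t1", project_id="p1"))  # 900
print(resolve_policy(rows, KEY, tenant_id="t1", project_id="p2"))  # 1800
print(resolve_policy(rows, KEY, tenant_id="t2"))  # 3600
```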
R
- Read-model cache
- packages/shared/readcache. Cross-domain UI read path.
- Release-failed
- Allocation state meaning cleanup retries exhausted. Billing stops. Admin or user can retry.
- Resource identifier
- Canonical name: core42:aicloud:{region}:{tenant_id}:{project_id}:{resource_type}:{resource_id}.
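The resource identifier template can be exercised with a formatter and a matching parser. Both helper names and the sample values are hypothetical, and rejecting colons inside a segment is an assumption made so that parsing stays unambiguous.

```python
PREFIX = ("core42", "aicloud")
FIELDS = ("region", "tenant_id", "project_id", "resource_type", "resource_id")


def format_resource_identifier(region, tenant_id, project_id, resource_type, resource_id):
    parts = (region, tenant_id, project_id, resource_type, resource_id)
    # Assumed rule: segments are non-empty and contain no ':' separator.
    if any(not part or ":" in part for part in parts):
        raise ValueError("segments must be non-empty and must not contain ':'")
    return ":".join(PREFIX + parts)


def parse_resource_identifier(identifier):
    parts = identifier.split(":")
    if len(parts) != len(PREFIX) + len(FIELDS) or tuple(parts[:2]) != PREFIX:
        raise ValueError(f"not a resource identifier: {identifier!r}")
    return dict(zip(FIELDS, parts[2:]))


rid = format_resource_identifier("region-1", "t-1", "p-1", "allocation", "a-1")
print(rid)  # core42:aicloud:region-1:t-1:p-1:allocation:a-1
```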
S
- Sanitize-first
- Mandatory rule: every log line and trace span passes through middleware.Sanitize() before emission. Replaces blocklisted field values with [REDACTED].
- Service account
- A project-scoped non-human identity. Token TTL controlled by auth.service_account_token_ttl_seconds.
- Slice
- See gpu_slice. A tenant VM with N slot bundles passed through.
- Slot
- A row in node_resource_slots. One GPU + one fabric VF + one NVMe + one private IP + deterministic MAC. Operator-approved.
- Slot lease
- See Lease.
- SKU
- Sellable product. Table: sku_catalog. Carries capacity_shape, gpus_total, allowed_gpu_counts, resource_profile, hourly price.
- SR-IOV VF
- Single Root I/O Virtualization Virtual Function. Used for the per-slot fabric attachment.
- Step-ca
- Smallstep CA used today for node enrollment certs. Vault PKI migration path exists.
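The Sanitize-first rule in this section replaces blocklisted field values before anything is emitted. A Python sketch of that idea follows; it is not the portal's middleware.Sanitize(), and the blocklist contents, case-insensitive matching, and recursion into nested dicts are assumptions.

```python
# Hypothetical blocklist; the real list is not given in this glossary.
BLOCKLIST = {"password", "token", "authorization"}


def sanitize(fields):
    """Return a copy of fields with blocklisted values replaced."""
    clean = {}
    for key, value in fields.items():
        if key.lower() in BLOCKLIST:
            clean[key] = "[REDACTED]"
        elif isinstance(value, dict):
            clean[key] = sanitize(value)  # assumed: nested fields are checked too
        else:
            clean[key] = value
    return clean


record = {"user": "alice", "password": "hunter2", "request": {"Token": "abc"}}
print(sanitize(record))
```

The function returns a new dict rather than mutating its input, so the caller's original data stays usable for the request itself.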
T
- Tenant
- The ownership root. Table: organizations. Carries stripe_customer_id. Survives user churn.
- Terminal token
- Opaque 256-bit random, single-use, 300 s TTL. Stored in Redis as terminal_token:{token}. Minted via POST /api/v1/allocations/{id}/terminal-token.
- Topology discovery
- The slice.topology_discover task. Scans a host and returns a candidate slot map with approval_required: true. Never inserts slot rows directly.
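The three properties of a terminal token in this section (256-bit random, single-use, 300 s TTL) can be sketched with an in-memory store in place of Redis. The class and method names are hypothetical, the stored value is assumed to be the allocation id, and the URL-safe encoding is an assumption about how the opaque value is rendered.

```python
import secrets
import time


class TerminalTokenStore:
    """In-memory stand-in for the Redis keys terminal_token:{token}."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}

    def mint(self, allocation_id):
        token = secrets.token_urlsafe(32)  # 32 random bytes = 256 bits
        expires_at = self._clock() + self._ttl
        self._entries[f"terminal_token:{token}"] = (allocation_id, expires_at)
        return token

    def redeem(self, token):
        # pop() removes the entry, so a second redeem finds nothing.
        entry = self._entries.pop(f"terminal_token:{token}", None)
        if entry is None:
            return None
        allocation_id, expires_at = entry
        return allocation_id if self._clock() < expires_at else None
```

Because every mint produces a fresh single-use value, retrying a mint cannot return the same token, which fits the exception noted under Idempotency key.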
V
- V3 redesign
- The in-flight UX/route redesign. /v3 is the design mock; /v3-prod is the shipped implementation. v1 routes are a frozen demo/internal surface during migration.
- VFIO
- Virtual Function I/O, the Linux kernel passthrough subsystem. Slice GPUs and fabric VFs are bound to vfio-pci so they can be passed through to the VM with no host kernel access to the device.
W
- Wipe policy
- capacity_metadata.destructive_wipe_policy value on a slot. Declares the erase contract on release. Required to be non-empty for the slot to be schedulable.
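The schedulability rule (a slot needs a non-empty wipe policy and complete capacity_metadata) can be expressed as a simple predicate. This sketch checks only the two capacity_metadata keys that this glossary names; the full required set is not listed here and is likely longer, and the dict shape of a slot is an assumption.

```python
# Only the keys named in this glossary; the real required set is unknown.
REQUIRED_KEYS = ("fabric_vf_pci_address", "destructive_wipe_policy")


def is_schedulable(slot):
    """True when every required capacity_metadata value is a non-empty string."""
    metadata = slot.get("capacity_metadata") or {}
    for key in REQUIRED_KEYS:
        value = metadata.get(key)
        if not isinstance(value, str) or not value.strip():
            return False
    return True


# Placeholder values, for illustration only.
complete = {
    "capacity_metadata": {
        "fabric_vf_pci_address": "0000:00:00.1",
        "destructive_wipe_policy": "example-policy",
    }
}
missing_policy = {
    "capacity_metadata": {
        "fabric_vf_pci_address": "0000:00:00.1",
        "destructive_wipe_policy": "",
    }
}
print(is_schedulable(complete), is_schedulable(missing_policy))  # True False
```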