Glossary

Implemented

Every term used in this portal, defined once. Source-cited.

A

Allocation
A customer's lease on compute. The unit of billing and lifecycle. Realized as a whole node (baremetal) or as one or more slots on a node (gpu_slice). Table: allocations.
Allocation resource claim
A row in allocation_resource_claims binding an allocation to either the whole node (claim_kind=node_exclusive) or specific slots (claim_kind=slot). The durable source of placement truth.
App instance
A running app inside an allocation. Table: app_instances. Lifecycle managed by cmd/app-runtime-worker.
Audit log
An immutable row in audit_logs written for every privileged mutation. Required fields: actor_user_id, actor_role, action, target_type, target_id, result, correlation_id.
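The required-field list above can be enforced before a row is written. The following is a minimal sketch: the `AuditLog` struct, its Go field types, and the `Validate` method are hypothetical and only mirror the field names from the entry; the real audit_logs schema may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// AuditLog carries the required fields named in the glossary entry.
// The Go types are assumptions, not the actual schema.
type AuditLog struct {
	ActorUserID   string
	ActorRole     string
	Action        string
	TargetType    string
	TargetID      string
	Result        string
	CorrelationID string
}

// Validate returns an error naming the first required field that is empty.
func (a AuditLog) Validate() error {
	required := []struct{ name, value string }{
		{"actor_user_id", a.ActorUserID},
		{"actor_role", a.ActorRole},
		{"action", a.Action},
		{"target_type", a.TargetType},
		{"target_id", a.TargetID},
		{"result", a.Result},
		{"correlation_id", a.CorrelationID},
	}
	for _, f := range required {
		if f.value == "" {
			return errors.New("audit log: missing required field " + f.name)
		}
	}
	return nil
}

func main() {
	// Placeholder values; correlation_id is deliberately left empty.
	row := AuditLog{
		ActorUserID: "user-1",
		ActorRole:   "admin",
		Action:      "allocation.release",
		TargetType:  "allocation",
		TargetID:    "alloc-1",
		Result:      "success",
	}
	fmt.Println(row.Validate())
}
```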

B

Baremetal
A capacity_shape value meaning the allocation claims an entire physical node.
BFF
Backend-for-frontend. cmd/api plays this role.

C

Capacity shape
The type of allocation realization. Today: baremetal or gpu_slice. Reserved future shapes include gpu_partition and gpu_shared.
Cleanup-blocked
Slot state meaning destructive cleanup failed (e.g., mounted host storage was detected or wipe verification failed). The slot is not reusable until an operator intervenes. See the runbook.
Cloud-init seed ISO
A small ISO image (seed.iso) attached to a slice VM at first boot. Contains user-data.yaml (users, keys, runcmd) and meta-data.yaml. Generated by cloud-localds. Located at /var/lib/gpuaas/slices/<allocation_id>/.
Contract
An OpenAPI or AsyncAPI definition. Authoritative — code follows the contract, not the other way around.
Correlation ID
A UUID propagated across every request, span, log, audit row, and outbox event for one logical operation. Required in every error response.

E

Envelope
The standard event payload wrapper: {event_id, event_type, occurred_at, version, correlation_id, payload}.

F

Fabric VF
A Mellanox SR-IOV virtual function passed through to a slice VM for InfiniBand/RoCE. Each slot must have its own. Identified by capacity_metadata.fabric_vf_pci_address.
FOR UPDATE SKIP LOCKED
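Postgres row-locking clause used for slot reservation, outbox claim, and queued task claim. A worker locks the rows it selects and skips rows already locked by another transaction instead of blocking, so concurrent workers never claim the same row.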

G

gpu_slice
A capacity_shape value. VM-based tenancy with N slots passed through.

I

Idempotency key
X-Idempotency-Key header. Mutations are safe to retry with the same key. Exception: terminal token mint (single-use by design).
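The retry-safety guarantee can be illustrated with an in-memory sketch. `IdempotencyStore` and its `Do` method are hypothetical names; a real service would read the key from the X-Idempotency-Key header and persist results durably. Caching only successful results is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"sync"
)

// IdempotencyStore remembers the result of the first successful call per key.
type IdempotencyStore struct {
	mu      sync.Mutex
	results map[string]string
}

func NewIdempotencyStore() *IdempotencyStore {
	return &IdempotencyStore{results: make(map[string]string)}
}

// Do runs fn only if no successful result is stored for key.
// A retry with the same key gets the stored result and fn is not re-run.
// Failed calls are not cached, so they can be retried.
func (s *IdempotencyStore) Do(key string, fn func() (string, error)) (string, error) {
	s.mu.Lock()
	defer s.mu.Unlock()

	if res, ok := s.results[key]; ok {
		return res, nil
	}
	res, err := fn()
	if err != nil {
		return "", err
	}
	s.results[key] = res
	return res, nil
}

func main() {
	store := NewIdempotencyStore()
	created := 0
	create := func() (string, error) {
		created++
		return fmt.Sprintf("allocation-%d", created), nil
	}

	first, _ := store.Do("key-123", create)
	retry, _ := store.Do("key-123", create) // same key: mutation is not repeated
	fmt.Println(first, retry, created)
}
```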
IPoIB
IP-over-InfiniBand. Used by the host for east-west fabric on slice nodes. Configured through netplan in /etc/netplan/60-ipoib.yaml.

J

JWKS
JSON Web Key Set. Public keys for verifying Keycloak-issued JWTs. Cached in cmd/api for 5 minutes; no per-request Keycloak call.

L

Lease (slot lease)
A JSON file under /var/lib/gpuaas/node-scheduler/leases/{slot_id}.json used by node-agent as a host-local mutex to prevent two provisioning tasks from racing on the same physical resources.
Ledger entry
An immutable row in ledger_entries. Never UPDATE, never DELETE. Balance is computed by summing entries. Corrections add a new entry with metadata.corrects_entry_id.
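The append-only rule can be sketched as follows. `LedgerEntry`, `Balance`, and `Correct` are hypothetical in-memory stand-ins for rows in ledger_entries. Representing a correction as a signed delta is an assumption; the entry above only says that a correction is a new entry referencing the one it corrects.

```go
package main

import "fmt"

// LedgerEntry is a hypothetical in-memory stand-in for a ledger_entries row.
type LedgerEntry struct {
	ID              string
	AmountCents     int64  // signed: credits positive, debits negative
	CorrectsEntryID string // mirrors metadata.corrects_entry_id; empty otherwise
}

// Balance sums every entry. There is no stored balance to update.
func Balance(entries []LedgerEntry) int64 {
	var total int64
	for _, e := range entries {
		total += e.AmountCents
	}
	return total
}

// Correct appends a compensating entry instead of editing the wrong one.
func Correct(entries []LedgerEntry, id, wrongID string, deltaCents int64) []LedgerEntry {
	return append(entries, LedgerEntry{
		ID:              id,
		AmountCents:     deltaCents,
		CorrectsEntryID: wrongID,
	})
}

func main() {
	entries := []LedgerEntry{
		{ID: "e1", AmountCents: 5000},  // credit
		{ID: "e2", AmountCents: -1200}, // debit that should have been -1000
	}
	entries = Correct(entries, "e3", "e2", 200) // give back the overcharge
	fmt.Println(Balance(entries))               // prints 4000
}
```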

M

MAAS
Metal as a Service, from Canonical. Used for bare-metal commissioning and deployment. Optional, gated by MAAS_ENABLED.
mTLS
Mutual TLS. Used for every internal call between node-agent and cmd/api. Node identity is the enrollment cert (24 h TTL, X5C renewal).

N

NATS subject
A topic on NATS JetStream. Examples: provisioning.requested, billing.low_balance_warning, payments.balance_credited.
Node
A physical host enrolled in GPUaaS. Table: nodes. Hosts cmd/node-agent.
Node-agent
cmd/node-agent. Pull-based typed-task executor on each host. mTLS. Not a remote shell.
Node task
A typed unit of work for node-agent. Persisted in node_tasks with params, signature, expires_at. Examples: slice.vm_provision, slice.vm_release, allocation.provision_user.
node_exclusive claim
A claim kind for baremetal allocations. Locks the entire node.
node_resource_slots
Table holding the approved slot map for slice-mode nodes. Schedulable only when full capacity_metadata is present.

O

OIDC
OpenID Connect. Authentication protocol. Backed by Keycloak in dev; any compliant IdP in production.
Outbox
outbox_events table. Written in the same DB transaction as a domain change, then published to NATS by cmd/outbox-relay. Solves the "wrote DB but failed to publish event" problem.
OVS
Open vSwitch. Used as the host-side bridge for slice VM management plane. Bridge name: ovsbr0.

P

Policy client
packages/shared/policy.PolicyClient. The only legitimate source for runtime business policy values. Reads policy_values table with scoped resolution global → tenant → project.
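Scoped resolution can be sketched as a lookup that falls back from the narrowest scope to the widest. This assumes the most specific scope wins (project over tenant over global), which the entry above implies but does not state. `PolicyStore`, `Scope`, and `Resolve` are hypothetical and are not the actual PolicyClient API; the values are placeholders.

```go
package main

import "fmt"

// Scope identifies the caller's tenant and project.
type Scope struct {
	TenantID  string
	ProjectID string
}

// PolicyStore is an in-memory stand-in for the policy_values table.
type PolicyStore struct {
	global  map[string]string
	tenant  map[string]map[string]string // tenant ID -> key -> value
	project map[string]map[string]string // project ID -> key -> value
}

// Resolve returns the most specific value for key: project first,
// then tenant, then global. The bool reports whether any scope had it.
func (p PolicyStore) Resolve(key string, s Scope) (string, bool) {
	if v, ok := p.project[s.ProjectID][key]; ok {
		return v, true
	}
	if v, ok := p.tenant[s.TenantID][key]; ok {
		return v, true
	}
	v, ok := p.global[key]
	return v, ok
}

func main() {
	const key = "auth.service_account_token_ttl_seconds"
	store := PolicyStore{
		global:  map[string]string{key: "3600"},
		tenant:  map[string]map[string]string{"tenant-a": {key: "1800"}},
		project: map[string]map[string]string{"project-x": {key: "900"}},
	}

	fmt.Println(store.Resolve(key, Scope{TenantID: "tenant-a", ProjectID: "project-x"}))
	fmt.Println(store.Resolve(key, Scope{TenantID: "tenant-a", ProjectID: "project-y"}))
	fmt.Println(store.Resolve(key, Scope{TenantID: "tenant-b", ProjectID: "project-z"}))
}
```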
Project
Operational scope inside a tenant. Resources (allocations, app instances, storage) belong to a project. Default project auto-created at signup.
Provisioning worker
cmd/provisioning-worker. Temporal worker that runs the allocation and slice provisioning and release workflows.

R

Read-model cache
packages/shared/readcache. Cross-domain UI read path.
Release-failed
Allocation state meaning cleanup retries have been exhausted. Billing stops. An admin or the user can retry.
Resource identifier
Canonical name: core42:aicloud:{region}:{tenant_id}:{project_id}:{resource_type}:{resource_id}.
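A sketch of building and parsing this identifier follows. The `ResourceID` type and `ParseResourceID` function are hypothetical, and the sketch assumes no component contains a colon. The sample values are placeholders.

```go
package main

import (
	"fmt"
	"strings"
)

// ResourceID holds the five variable components of the canonical name.
type ResourceID struct {
	Region       string
	TenantID     string
	ProjectID    string
	ResourceType string
	ID           string
}

// String renders core42:aicloud:{region}:{tenant_id}:{project_id}:{resource_type}:{resource_id}.
func (r ResourceID) String() string {
	return strings.Join([]string{
		"core42", "aicloud",
		r.Region, r.TenantID, r.ProjectID, r.ResourceType, r.ID,
	}, ":")
}

// ParseResourceID splits a canonical name and rejects malformed input.
func ParseResourceID(s string) (ResourceID, error) {
	parts := strings.Split(s, ":")
	if len(parts) != 7 || parts[0] != "core42" || parts[1] != "aicloud" {
		return ResourceID{}, fmt.Errorf("malformed resource identifier %q", s)
	}
	for _, p := range parts[2:] {
		if p == "" {
			return ResourceID{}, fmt.Errorf("empty component in resource identifier %q", s)
		}
	}
	return ResourceID{
		Region:       parts[2],
		TenantID:     parts[3],
		ProjectID:    parts[4],
		ResourceType: parts[5],
		ID:           parts[6],
	}, nil
}

func main() {
	id := ResourceID{
		Region:       "region-1",
		TenantID:     "tenant-a",
		ProjectID:    "project-x",
		ResourceType: "allocation",
		ID:           "alloc-001",
	}
	fmt.Println(id.String())

	parsed, err := ParseResourceID(id.String())
	fmt.Println(parsed == id, err)
}
```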

S

Sanitize-first
Mandatory rule: every log line and trace span passes through middleware.Sanitize() before emission. Replaces blocklisted field values with [REDACTED].
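The redaction step might look like the sketch below. `sanitizeFields` is a hypothetical helper, not the signature of middleware.Sanitize(), and the blocklist contents are assumptions. The function returns a copy, so the caller's data is left untouched.

```go
package main

import (
	"fmt"
	"strings"
)

const redacted = "[REDACTED]"

// blocklist holds lower-cased field names whose values must never be
// emitted. These entries are illustrative assumptions.
var blocklist = map[string]bool{
	"password":      true,
	"token":         true,
	"authorization": true,
}

// sanitizeFields returns a copy of fields with blocklisted values replaced.
// Nested maps are sanitized recursively; the input is not modified.
func sanitizeFields(fields map[string]any) map[string]any {
	out := make(map[string]any, len(fields))
	for k, v := range fields {
		if blocklist[strings.ToLower(k)] {
			out[k] = redacted
			continue
		}
		if nested, ok := v.(map[string]any); ok {
			out[k] = sanitizeFields(nested)
			continue
		}
		out[k] = v
	}
	return out
}

func main() {
	fields := map[string]any{
		"user":     "alice",
		"password": "hunter2",
		"request":  map[string]any{"Authorization": "Bearer abc", "path": "/api/v1/allocations"},
	}
	fmt.Println(sanitizeFields(fields))
}
```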
Service account
A project-scoped non-human identity. Token TTL controlled by auth.service_account_token_ttl_seconds.
Slice
See gpu_slice. A tenant VM with N slot bundles passed through.
Slot
A row in node_resource_slots. One GPU + one fabric VF + one NVMe + one private IP + deterministic MAC. Operator-approved.
Slot lease
See Lease.
SKU
Sellable product. Table: sku_catalog. Carries capacity_shape, gpus_total, allowed_gpu_counts, resource_profile, hourly price.
SR-IOV VF
Single Root I/O Virtualization virtual function. Used for the per-slot fabric attachment.
Step-ca
Smallstep CA used today for node enrollment certs. Vault PKI migration path exists.

T

Tenant
The ownership root. Table: organizations. Carries stripe_customer_id. Survives user churn.
Terminal token
An opaque 256-bit random value, single-use, with a 300 s TTL. Stored in Redis as terminal_token:{token}. Minted via POST /api/v1/allocations/{id}/terminal-token.
Topology discovery
The slice.topology_discover task. Scans a host and returns a candidate slot map with approval_required: true. Never inserts slot rows directly.

V

V3 redesign
The in-flight UX/route redesign. /v3 is the design mock; /v3-prod is the shipped implementation. v1 routes are a frozen demo/internal surface during migration.
VFIO
Virtual Function I/O, the Linux kernel framework for device passthrough. Slice GPUs and fabric VFs are bound to vfio-pci so they can be passed through to the VM, leaving the host kernel with no access to the device.

W

Wipe policy
capacity_metadata.destructive_wipe_policy value on a slot. Declares the erase contract on release. Required to be non-empty for the slot to be schedulable.