GPUaaS Review Portal
A fact-based review surface for the Core42 GPUaaS platform. Curated for product, architecture, operations, security, and developer teams.
Every claim on this site is traceable to code (cmd/, packages/), a contract (openapi.draft.yaml, asyncapi.draft.yaml), the schema (db_schema_v1.sql), a runbook (doc/operations/runbooks/), or an RCA (doc/rca/). The portal carries no roadmap or aspirational content, only what exists today and what has been formally designed.
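As an illustration of the traceability rule above, the sketch below maps a repo-relative path to the kind of evidence it represents. The path prefixes and file names come from the text above; the `EvidenceKind` enum, the `classify_evidence` helper, and the choice to match contract and schema files by file name (their directories are not stated here) are assumptions made for this example, not part of the portal's tooling.

```python
from enum import Enum
from typing import Optional


class EvidenceKind(Enum):
    CODE = "code"
    CONTRACT = "contract"
    SCHEMA = "schema"
    RUNBOOK = "runbook"
    RCA = "rca"


# Prefixes named on this page. The matching logic itself is hypothetical.
_PREFIXES = (
    ("cmd/", EvidenceKind.CODE),
    ("packages/", EvidenceKind.CODE),
    ("doc/operations/runbooks/", EvidenceKind.RUNBOOK),
    ("doc/rca/", EvidenceKind.RCA),
)

# Matched by file name only, because this page does not say where they live.
_FILENAMES = {
    "openapi.draft.yaml": EvidenceKind.CONTRACT,
    "asyncapi.draft.yaml": EvidenceKind.CONTRACT,
    "db_schema_v1.sql": EvidenceKind.SCHEMA,
}


def classify_evidence(path: str) -> Optional[EvidenceKind]:
    """Return the evidence kind for a repo-relative path, or None if untraceable."""
    normalized = path.strip()
    if normalized.startswith("./"):
        normalized = normalized[2:]
    for prefix, kind in _PREFIXES:
        if normalized.startswith(prefix):
            return kind
    filename = normalized.rsplit("/", 1)[-1]
    return _FILENAMES.get(filename)
```

A path that returns `None` under this scheme would point to a claim with no recognised evidence source, which is the case the portal says it excludes.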
Pick your entry point
Product
PRD, personas, user journeys, UX information architecture, V3 redesign status.
Architecture
As-built C4, domain ownership, allocation lifecycle, outbox & events, GPU slice internals.
Developers
Repo layout, coding/test patterns, contract workflow, CI gates, quick-start.
Operations
Runbook index, observability, incident severity, lab topology, local-dev setup.
Security & Governance
Threat model, governance precedence, audit & compliance, sanitize-first rules.
What exists today
Fact dashboard: binaries, packages, SKUs, contracts, runbooks, RCAs.
Status legend
Every detail page carries one or more of these badges so reviewers instantly know what level of evidence backs the page.
Implemented
Backed by code in cmd/ or packages/. Callable today.
Contract
Defined in openapi.draft.yaml or asyncapi.draft.yaml.
Designed
A spec exists in doc/ but is not yet end-to-end in code.
Decided
Decision recorded; implementation may be partial.
Runbook
Has an operational runbook.
RCA
Documented post-incident analysis.
Deprecated
Being retired.
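The legend above can be read as a small closed vocabulary. The sketch below shows one way a page's badge labels might be checked against it, including the rule that every detail page carries at least one badge. The badge names are taken from the legend; the `Badge` enum and `parse_badges` helper are hypothetical and only illustrate the idea.

```python
from enum import Enum
from typing import Iterable, List


class Badge(Enum):
    IMPLEMENTED = "Implemented"
    CONTRACT = "Contract"
    DESIGNED = "Designed"
    DECIDED = "Decided"
    RUNBOOK = "Runbook"
    RCA = "RCA"
    DEPRECATED = "Deprecated"


def parse_badges(labels: Iterable[str]) -> List[Badge]:
    """Convert badge labels to Badge members, rejecting empty or unknown input."""
    known = {badge.value.lower(): badge for badge in Badge}
    parsed: List[Badge] = []
    for label in labels:
        key = label.strip().lower()
        if key not in known:
            raise ValueError(f"unknown badge: {label!r}")
        if known[key] not in parsed:  # drop duplicates, keep first-seen order
            parsed.append(known[key])
    if not parsed:
        raise ValueError("a detail page needs at least one badge")
    return parsed
```

Matching is case-insensitive here purely for convenience; a stricter check could require the exact spelling used in the legend.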
Portal map
```mermaid
flowchart TB
    classDef section fill:#e3f2fd,stroke:#1565c0,color:#0d47a1
    classDef trail fill:#fff3e0,stroke:#e65100,color:#bf360c
    classDef ref fill:#f3e5f5,stroke:#6a1b9a,color:#4a148c
    LP[Landing]
    LP --> NOW[What exists today]:::section
    LP --> ARCH[Architecture as-built]:::section
    LP --> PROD[Product]:::section
    LP --> OPS[Operations]:::section
    LP --> DEV[Developers]:::section
    LP --> SEC[Security & Governance]:::section
    LP --> TR[Cross-cutting trails]:::trail
    TR --> TR1[GPU Slice]:::trail
    TR --> TR2[App platform]:::trail
    TR --> TR3[Billing & payments]:::trail
    TR --> TR4[IAM & tenancy]:::trail
    TR --> TR5[Node & MAAS]:::trail
    TR --> TR6[Terminal & sessions]:::trail
    LP --> REF[Reference]:::ref
    REF --> R1[Glossary]:::ref
    REF --> R2[Error codes]:::ref
    REF --> R3[Policy keys]:::ref
    REF --> R4[NATS subjects]:::ref
    REF --> R5[REST API explorer]:::ref
    REF --> R6[AsyncAPI explorer]:::ref
```
What the platform is
Core42 GPUaaS is a contract-first GPU cloud control plane. Users discover GPU capacity, provision allocations (full bare-metal nodes or GPU slice VMs), open browser terminals or SSH into them, run platform apps (Jupyter, vLLM, Slurm, RKE2), and pay by usage. Admins manage inventory, audit, and refunds.
```mermaid
flowchart LR
    USER([End User]):::actor
    ADMIN([Admin]):::actor
    OPS_USER([Billing Operator]):::actor
    subgraph EDGE[Public Edge]
        WAF[WAF + API Gateway]
    end
    subgraph CP[Control Plane]
        BFF[cmd/api<br/>BFF + all REST routes]
        TG[cmd/terminal-gateway<br/>WS terminal]
        ORCH[Provisioning orchestrator]
        BILL[Billing + ledger]
        PAY[Payments / Stripe]
        AUTH[OIDC / Keycloak]
        ADM[Admin]
    end
    subgraph WK[Workers]
        PW[provisioning-worker]
        BW[billing-worker]
        WW[webhook-worker]
        NR[notification-relay]
        OR[outbox-relay]
        ARW[app-runtime-worker]
    end
    subgraph DATA[Data Plane]
        PG[(PostgreSQL)]
        REDIS[(Redis)]
        NATS[(NATS JetStream)]
        S3[(Object Storage)]
        VAULT[(Vault / KMS)]
    end
    subgraph FLEET[Fleet]
        NA[node-agent on each host]
        MAAS[MAAS bare-metal]
        STRIPE[Stripe]
    end
    USER --> WAF
    ADMIN --> WAF
    OPS_USER --> WAF
    WAF --> BFF
    WAF --> TG
    BFF --> AUTH
    BFF --> ORCH
    BFF --> BILL
    BFF --> PAY
    BFF --> ADM
    BFF --> PG
    BFF --> REDIS
    BFF --> S3
    ORCH --> PG
    ORCH --> NATS
    BILL --> PG
    PAY --> STRIPE
    PW <--> NATS
    BW <--> NATS
    WW <--> NATS
    OR --> NATS
    NR --> NATS
    ARW <--> NATS
    PW <--> NA
    TG <--> NA
    NA --> MAAS
    classDef actor fill:#fff8e1,stroke:#f57f17
```
See full system context → Browse what exists today →
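For reviewers who prefer to query the topology rather than read it, the edges of the diagram above can be transcribed into a small graph. The sketch below does that and answers two questions: which components have an arrow pointing at a given node, and what is reachable from a given starting point. The node IDs and edges are copied from the diagram; the helper names (`build_graph`, `callers_of`, `reachable`) are illustrative, and the arrows are treated as plain directed edges, which may not match real data-flow direction in every case.

```python
from collections import deque
from typing import Dict, List, Set

# One-way arrows (A --> B) from the diagram.
ONE_WAY = [
    ("USER", "WAF"), ("ADMIN", "WAF"), ("OPS_USER", "WAF"),
    ("WAF", "BFF"), ("WAF", "TG"),
    ("BFF", "AUTH"), ("BFF", "ORCH"), ("BFF", "BILL"), ("BFF", "PAY"),
    ("BFF", "ADM"), ("BFF", "PG"), ("BFF", "REDIS"), ("BFF", "S3"),
    ("ORCH", "PG"), ("ORCH", "NATS"), ("BILL", "PG"), ("PAY", "STRIPE"),
    ("OR", "NATS"), ("NR", "NATS"), ("NA", "MAAS"),
]

# Two-way arrows (A <--> B) from the diagram.
TWO_WAY = [
    ("PW", "NATS"), ("BW", "NATS"), ("WW", "NATS"), ("ARW", "NATS"),
    ("PW", "NA"), ("TG", "NA"),
]


def build_graph() -> Dict[str, Set[str]]:
    """Build an adjacency map; two-way arrows become a pair of directed edges."""
    graph: Dict[str, Set[str]] = {}
    for src, dst in ONE_WAY:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())
    for a, b in TWO_WAY:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def callers_of(graph: Dict[str, Set[str]], target: str) -> List[str]:
    """Return the sorted nodes that have an arrow pointing at target."""
    return sorted(node for node, outs in graph.items() if target in outs)


def reachable(graph: Dict[str, Set[str]], start: str) -> Set[str]:
    """Breadth-first search: every node reachable from start, excluding start."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen


GRAPH = build_graph()
```

For example, `callers_of(GRAPH, "PG")` lists the components drawn with a direct arrow to PostgreSQL, and `reachable(GRAPH, "USER")` shows how far a request entering through the WAF can propagate along the drawn edges. VAULT is absent from the graph because the diagram draws no arrows to or from it.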
Reading suggestions
| Audience | Suggested first pages |
|---|---|
| Product manager | PRD distilled → Personas & journeys → V3 redesign status |
| Architect | System context → Allocation lifecycle → GPU slice as-built |
| SRE / Ops | Runbook index → Observability stack → Incident severity model |
| Developer | Quick start → Coding patterns → Contract workflow |
| Security reviewer | Threat model → Sanitize-first rules → Audit & compliance |
| External reviewer | What exists today → Position vs other clouds → GPU slice trail |