NATS subjects
Contract: doc/api/asyncapi.draft.yaml · packages/shared/events · doc/architecture/NATS_Stream_Config.md
Streams
| Stream | Subject pattern | Retention | Purpose |
|---|---|---|---|
| `PROVISIONING` | `provisioning.>` | Limits-based | Allocation lifecycle events |
| `BILLING` | `billing.>` | Limits-based | Balance & billing events |
| `PAYMENTS` | `payments.>` | Limits-based | Payment credit / refund events |
| `DLQ` | `dlq.>` | Long retention | Poison messages from any consumer |
Stream init: packages/shared/events.InitStreams().
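To illustrate how the subject patterns above partition traffic, the following sketch matches a subject against NATS-style wildcards (`*` for exactly one token, `>` for one or more trailing tokens) and returns the owning stream. The names `streamPatterns`, `matchSubject`, and `streamFor` are hypothetical and not taken from packages/shared/events; the actual `InitStreams()` implementation is not shown in this document.

```go
package main

import "strings"

// streamPatterns mirrors the Streams table. Hypothetical helper data,
// not the configuration actually used by InitStreams().
var streamPatterns = []struct {
	Stream  string
	Pattern string
}{
	{"PROVISIONING", "provisioning.>"},
	{"BILLING", "billing.>"},
	{"PAYMENTS", "payments.>"},
	{"DLQ", "dlq.>"},
}

// matchSubject reports whether subject matches a NATS-style pattern:
// "*" matches exactly one token, ">" matches one or more trailing tokens.
func matchSubject(pattern, subject string) bool {
	p := strings.Split(pattern, ".")
	s := strings.Split(subject, ".")
	for i, tok := range p {
		if tok == ">" {
			// ">" must be the last pattern token and needs at least one token left.
			return i == len(p)-1 && len(s) > i
		}
		if i >= len(s) {
			return false
		}
		if tok != "*" && tok != s[i] {
			return false
		}
	}
	return len(p) == len(s)
}

// streamFor returns the first stream whose pattern matches, or "" if none does.
func streamFor(subject string) string {
	for _, sp := range streamPatterns {
		if matchSubject(sp.Pattern, subject) {
			return sp.Stream
		}
	}
	return ""
}
```

Under these rules `provisioning.releasing.requested` lands in `PROVISIONING`, while the bare subject `provisioning` matches nothing because `>` requires at least one further token.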
Subject matrix
flowchart LR
classDef prod fill:#e3f2fd,stroke:#1565c0
classDef sub fill:#fff3e0,stroke:#e65100
%% provisioning
P_REQ[provisioning.requested]:::sub --> C_PW[provisioning-worker]:::prod
P_ACT[provisioning.active]:::sub --> C_BW1[billing-worker]:::prod
P_ACT --> C_NR1[notification-relay]:::prod
P_FAIL[provisioning.failed]:::sub --> C_NR2[notification-relay]:::prod
P_REL[provisioning.releasing.requested]:::sub --> C_PW2[provisioning-worker]:::prod
P_RDONE[provisioning.releasing.completed]:::sub --> C_BW2[billing-worker]:::prod
P_RDONE --> C_NR3[notification-relay]:::prod
P_RFAIL[provisioning.release_failed]:::sub --> C_BW3[billing-worker]:::prod
P_RFAIL --> C_NR4[notification-relay]:::prod
P_FORCE[provisioning.force_release_requested]:::sub --> C_PW3[provisioning-worker]:::prod
%% billing
B_LOW[billing.low_balance_warning]:::sub --> C_NR5[notification-relay]:::prod
B_AUTO[billing.auto_release_pending]:::sub --> C_NR6[notification-relay]:::prod
B_DEP[billing.balance_depleted]:::sub --> C_NR7[notification-relay]:::prod
B_DEP --> C_PW4[provisioning-worker<br/>force-release]:::prod
%% payments
Y_CR[payments.balance_credited]:::sub --> C_BW4[billing-worker]:::prod
Y_CR --> C_NR8[notification-relay]:::prod
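The flowchart above can also be read as a routing table. The sketch below restates it as plain Go data so that a consumer's subscription list can be derived in code or checked in a test. `consumersBySubject` and `subjectsFor` are hypothetical names used only for illustration.

```go
package main

import "sort"

// consumersBySubject restates the subject matrix flowchart as data.
// Hypothetical helper; the services do not necessarily share such a map.
var consumersBySubject = map[string][]string{
	"provisioning.requested":               {"provisioning-worker"},
	"provisioning.active":                  {"billing-worker", "notification-relay"},
	"provisioning.failed":                  {"notification-relay"},
	"provisioning.releasing.requested":     {"provisioning-worker"},
	"provisioning.releasing.completed":     {"billing-worker", "notification-relay"},
	"provisioning.release_failed":          {"billing-worker", "notification-relay"},
	"provisioning.force_release_requested": {"provisioning-worker"},
	"billing.low_balance_warning":          {"notification-relay"},
	"billing.auto_release_pending":         {"notification-relay"},
	"billing.balance_depleted":             {"notification-relay", "provisioning-worker"},
	"payments.balance_credited":            {"billing-worker", "notification-relay"},
}

// subjectsFor returns, in sorted order, the subjects a consumer receives.
func subjectsFor(consumer string) []string {
	var out []string
	for subject, consumers := range consumersBySubject {
		for _, c := range consumers {
			if c == consumer {
				out = append(out, subject)
				break
			}
		}
	}
	sort.Strings(out)
	return out
}
```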
Subject contracts
provisioning.requested
| Field | Type | Notes |
|---|---|---|
| `allocation_id` | uuid | Allocation just created |
| `capacity_shape` | string | baremetal or gpu_slice |
| `sku` | string | Selected SKU |
| `node_id` | uuid | Chosen node |
| `slot_ids` | uuid[] | For gpu_slice; empty for baremetal |
| `requested_by_user_id` | uuid | |
| `org_id` | uuid | nullable |
Producer: orchestrator. Consumer: cmd/provisioning-worker (starts workflow provisioning-{event_id}).
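A minimal sketch of how a consumer might decode and sanity-check this payload, using only the standard library. The struct, `Validate`, and `decodeProvisioningRequested` are hypothetical and are not the types in packages/shared/events; UUIDs are kept as strings here, and the checks cover only the rules stated in the Notes column.

```go
package main

import (
	"encoding/json"
	"errors"
)

// ProvisioningRequested mirrors the provisioning.requested payload table.
// Hypothetical type; UUIDs are plain strings to stay within the standard library.
type ProvisioningRequested struct {
	AllocationID      string   `json:"allocation_id"`
	CapacityShape     string   `json:"capacity_shape"`
	SKU               string   `json:"sku"`
	NodeID            string   `json:"node_id"`
	SlotIDs           []string `json:"slot_ids"`
	RequestedByUserID string   `json:"requested_by_user_id"`
	OrgID             *string  `json:"org_id"` // nullable
}

// Validate enforces only what the table states: a known capacity shape,
// and no slot IDs for baremetal allocations.
func (p ProvisioningRequested) Validate() error {
	switch p.CapacityShape {
	case "baremetal":
		if len(p.SlotIDs) != 0 {
			return errors.New("slot_ids must be empty for baremetal")
		}
	case "gpu_slice":
		// slot_ids applies here; no further rule is documented.
	default:
		return errors.New("unknown capacity_shape: " + p.CapacityShape)
	}
	return nil
}

// decodeProvisioningRequested unmarshals the payload and validates it.
func decodeProvisioningRequested(raw []byte) (ProvisioningRequested, error) {
	var p ProvisioningRequested
	if err := json.Unmarshal(raw, &p); err != nil {
		return p, err
	}
	return p, p.Validate()
}
```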
provisioning.active
| Field | Type | Notes |
|---|---|---|
| `allocation_id` | uuid | |
| `ready_at` | RFC3339 | clock_timestamp() |
| `private_ip` | string | For terminal/SSH |
| `default_user` | string | For SSH command |
Producer: cmd/provisioning-worker (after node-agent reports readiness). Consumers: billing-worker (start accrual), notification-relay (WS notify tenant).
provisioning.failed
| Field | Type | Notes |
|---|---|---|
| `allocation_id` | uuid | |
| `failure_reason` | string | machine-readable code |
| `details` | object | optional context |
Producer: cmd/provisioning-worker. Consumers: notification-relay; metrics.
provisioning.releasing.requested
| Field | Type | Notes |
|---|---|---|
| `allocation_id` | uuid | |
| `requested_by` | string | user / admin / billing |
| `reason` | string | optional |
Producer: orchestrator. Consumer: provisioning-worker (release flow).
provisioning.releasing.completed
| Field | Type | Notes |
|---|---|---|
| `allocation_id` | uuid | |
| `released_at` | RFC3339 | clock_timestamp() |
| `hard_stopped` | bool | True if graceful shutdown timed out |
| `wiped` | bool | True if NVMe wipe was performed |
Producer: provisioning-worker. Consumers: billing-worker (stop accrual), notification-relay.
provisioning.release_failed
| Field | Type | Notes |
|---|---|---|
| `allocation_id` | uuid | |
| `error` | string | what went wrong |
| `last_attempt_at` | RFC3339 | |
Producer: provisioning-worker. Consumers: billing-worker (stop accrual immediately), notification-relay.
provisioning.force_release_requested
| Field | Type | Notes |
|---|---|---|
| `allocation_id` | uuid | |
| `reason` | string | balance_depleted / admin_force / etc. |
Producer: billing-worker (auto), admin API (manual). Consumer: provisioning-worker.
billing.low_balance_warning
| Field | Type | Notes |
|---|---|---|
| `user_id` | uuid | |
| `balance_minor` | int | |
| `threshold_minor` | int | from policy |
| `currency` | string | |
Producer: billing-worker. Consumer: notification-relay (WS + email).
billing.auto_release_pending
| Field | Type | Notes |
|---|---|---|
| `user_id` | uuid | |
| `projected_depletion_at` | RFC3339 | |
Producer: billing-worker. Consumer: notification-relay.
billing.balance_depleted
| Field | Type | Notes |
|---|---|---|
| `user_id` | uuid | |
| `balance_minor` | int | typically ≤ 0 |
Producer: billing-worker. Consumers: notification-relay; provisioning-worker (triggers force-release).
payments.balance_credited
| Field | Type | Notes |
|---|---|---|
| `user_id` | uuid | |
| `amount_minor` | int | always positive |
| `currency` | string | |
| `source` | string | stripe / internal_credit / admin_grant |
| `provider_ref` | string | dedupe key |
Producer: webhook-worker / payments service. Consumers: billing-worker (record credit), notification-relay.
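The two constraints in this table (`amount_minor` is always positive, `provider_ref` is the dedupe key) suggest how a consumer could apply credits safely under redelivery. The sketch below is one possible shape, not billing-worker's actual logic: `creditLedger` is a hypothetical in-memory stand-in for persistent state, it assumes `provider_ref` alone identifies a credit, and it ignores `currency` for brevity.

```go
package main

import "errors"

// BalanceCredited mirrors the payments.balance_credited payload table.
type BalanceCredited struct {
	UserID      string `json:"user_id"`
	AmountMinor int64  `json:"amount_minor"`
	Currency    string `json:"currency"`
	Source      string `json:"source"`
	ProviderRef string `json:"provider_ref"`
}

// creditLedger is a hypothetical in-memory stand-in for billing state.
type creditLedger struct {
	balances map[string]int64    // user_id -> balance in minor units
	applied  map[string]struct{} // provider_ref values already credited
}

func newCreditLedger() *creditLedger {
	return &creditLedger{
		balances: make(map[string]int64),
		applied:  make(map[string]struct{}),
	}
}

// Apply credits the user at most once per provider_ref and reports
// whether the balance changed. Duplicates are ignored without error.
func (l *creditLedger) Apply(ev BalanceCredited) (bool, error) {
	if ev.AmountMinor <= 0 {
		return false, errors.New("amount_minor must be positive")
	}
	if ev.ProviderRef == "" {
		return false, errors.New("provider_ref is required for dedupe")
	}
	if _, dup := l.applied[ev.ProviderRef]; dup {
		return false, nil
	}
	l.applied[ev.ProviderRef] = struct{}{}
	l.balances[ev.UserID] += ev.AmountMinor
	return true, nil
}
```

In a real service the duplicate check and the balance update would need to happen in one database transaction; the maps here only show the control flow.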
Envelope
All subjects use the same envelope:
{
"event_id": "uuid",
"event_type": "provisioning.active",
"occurred_at": "2026-05-12T12:34:56Z",
"version": "1.0",
"correlation_id": "uuid",
"payload": { /* per-subject schema above */ }
}
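As a rough illustration, the envelope can be decoded in two steps: first the common fields, then the payload according to `event_type`. The `Envelope` type and `decodeEnvelope` below are hypothetical names; the payload is left as raw JSON so each consumer can unmarshal it into its own per-subject struct.

```go
package main

import (
	"encoding/json"
	"time"
)

// Envelope mirrors the common envelope shown above. Hypothetical type,
// not necessarily the one defined in packages/shared/events.
type Envelope struct {
	EventID       string          `json:"event_id"`
	EventType     string          `json:"event_type"`
	OccurredAt    time.Time       `json:"occurred_at"` // parsed as RFC3339
	Version       string          `json:"version"`
	CorrelationID string          `json:"correlation_id"`
	Payload       json.RawMessage `json:"payload"` // decoded later per subject
}

// decodeEnvelope parses the common fields and defers payload decoding.
func decodeEnvelope(raw []byte) (Envelope, error) {
	var e Envelope
	err := json.Unmarshal(raw, &e)
	return e, err
}
```

A consumer would then switch on `EventType` and unmarshal `Payload` into the matching struct.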
Idempotency
- Producer: outbox-relay uses `FOR UPDATE SKIP LOCKED` on `outbox_events`, so multiple relay replicas don't double-publish.
- Consumer: the Temporal workflow ID is derived from `event_id` for natural idempotency. Other consumers use a processed-events table.
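The consumer-side rules can be sketched as follows. `workflowID` follows the `provisioning-{event_id}` naming noted for provisioning.requested, and `processedEvents` is a hypothetical in-memory stand-in for a processed-events table. A real consumer would record the event ID in the same database transaction as its side effects.

```go
package main

import "sync"

// workflowID derives a deterministic workflow ID from the event ID, so a
// redelivered event maps to the same workflow instead of starting a new one.
func workflowID(eventID string) string {
	return "provisioning-" + eventID
}

// processedEvents is an in-memory stand-in for a processed-events table.
type processedEvents struct {
	mu   sync.Mutex
	seen map[string]struct{}
}

func newProcessedEvents() *processedEvents {
	return &processedEvents{seen: make(map[string]struct{})}
}

// handleOnce runs handler only for event IDs not yet recorded. It reports
// whether the handler ran. A failed handler is not recorded, so a later
// redelivery retries it. The lock is held during the handler for simplicity.
func (p *processedEvents) handleOnce(eventID string, handler func() error) (bool, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if _, dup := p.seen[eventID]; dup {
		return false, nil
	}
	if err := handler(); err != nil {
		return true, err
	}
	p.seen[eventID] = struct{}{}
	return true, nil
}
```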