Audit & compliance¶
Implemented
packages/services/admin/ · doc/architecture/Audit_Presentation_Model_v1.md · scripts/ci/audit_*.sh · doc/architecture/Encryption_Envelope_Spec.md · doc/architecture/Partitioning_and_Retention_Strategy.md
GPUaaS records an immutable audit row for every privileged mutation, with allowlisted metadata and per-tenant partitioning. This page covers what is audited, how immutability is enforced at multiple layers, the metadata allowlist, the API surface, retention, and the compliance posture.
What gets audited¶
mindmap
  root((Audited<br/>actions))
    User mutations
      user.create
      user.balance.adjust
      user.disable
      user.delete
    Allocation
      allocation.create
      allocation.release
      allocation.force_release
      allocation.restart
    Node
      node.create
      node.delete
      node.drain
      node.slot.approve
      node.slot.disable
    Payments
      refund.create
      payment.reconcile
      payment.session.cancel
    Auth
      service_account.create
      service_account.token.revoke
      session.revoke
    Policy
      policy.update
      policy.bound_violation
    Storage
      storage.namespace.create
      storage.object.delete (privileged path)
    Privilege denials
      authz.deny
      rate_limit.block
Every action above writes a row to audit_logs in the same transaction as the domain change.
Row shape¶
erDiagram
audit_logs {
uuid id PK
uuid actor_user_id "claim sub or service_account.id"
text actor_role "user|admin|service_account|system"
text action "allocation.force_release"
text target_type "allocation|user|node|policy|..."
text target_id "canonical id"
text result "success|failure"
text correlation_id "uuid"
jsonb metadata "allowlisted keys only"
timestamp created_at
text actor_ip "may be null for system actions"
text user_agent
}
Hard properties: the table has no updated_at or deleted_at column, and the per-service role grants allow only reads and inserts:
-- per-service Postgres role grants:
-- GPUaaS audit table:
-- SELECT, INSERT → svc_admin
-- No UPDATE, no DELETE for any role
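To make the row shape above concrete, here is a minimal Python sketch of an in-memory audit record. It is an illustration only, not the service's actual type: the class name `AuditRow` and the validation in `__post_init__` are assumptions, while the field names and the enumerated values for `actor_role` and `result` are taken from the diagram.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional

# Enumerated values copied from the row-shape diagram.
ACTOR_ROLES = {"user", "admin", "service_account", "system"}
RESULTS = {"success", "failure"}


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after construction
class AuditRow:
    """Hypothetical in-memory mirror of one audit_logs row."""

    actor_user_id: uuid.UUID
    actor_role: str
    action: str
    target_type: str
    target_id: str
    result: str
    correlation_id: str
    metadata: dict[str, Any] = field(default_factory=dict)
    actor_ip: Optional[str] = None  # may be null for system actions
    user_agent: Optional[str] = None
    id: uuid.UUID = field(default_factory=uuid.uuid4)
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def __post_init__(self) -> None:
        # Reject values outside the enumerations shown in the diagram.
        if self.actor_role not in ACTOR_ROLES:
            raise ValueError(f"unknown actor_role: {self.actor_role!r}")
        if self.result not in RESULTS:
            raise ValueError(f"unknown result: {self.result!r}")


row = AuditRow(
    actor_user_id=uuid.uuid4(),
    actor_role="admin",
    action="allocation.force_release",
    target_type="allocation",
    target_id="alloc-1",
    result="success",
    correlation_id=str(uuid.uuid4()),
)
```

Note that there is deliberately no `updated_at` or `deleted_at` field, mirroring the table. `frozen=True` only blocks attribute reassignment; it does not make the `metadata` dict itself immutable.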
Immutability — multi-layer¶
flowchart LR
classDef block fill:#f8d7da,stroke:#42101e
classDef ok fill:#d1e7dd,stroke:#0a3622
UPDATE[UPDATE audit_logs] --> L1{Application<br/>code review}
L1 -- caught --> B1[PR blocked]:::ok
L1 -- missed --> L2{"ORM layer<br/>repo functions only<br/>expose Insert"}
L2 -- caught --> B2[Compile/runtime error]:::ok
L2 -- missed --> L3{DB grants}
L3 -- caught --> B3[Postgres deny]:::ok
L3 -- missed --> L4{Replica/<br/>partitioned table is read-only<br/>for older partitions}
L4 --> B4[Old partitions read-only<br/>via permissions]:::ok
DELETE[DELETE audit_logs] -.same layers.-> L1
Four layers (any one of which would block the attempt):
- Code review + CI gates (scripts/ci/audit_mandatory_guard.sh, audit_presence_guard.sh)
- ORM/repo layer: only Insert is exposed
- Postgres grants: no UPDATE / DELETE for any application role
- Partition table permissions: historical partitions are read-only
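The repo and database layers can be illustrated with a small, self-contained Python sketch. It is not the project's code: it uses SQLite from the standard library, and because SQLite has no GRANT/REVOKE, triggers stand in for the Postgres grant layer. The class name `AuditRepo` and the reduced column set are assumptions made for the example.

```python
import sqlite3

SCHEMA = """
CREATE TABLE audit_logs (
    id INTEGER PRIMARY KEY,
    action TEXT NOT NULL,
    target_id TEXT NOT NULL,
    result TEXT NOT NULL
);
-- Stand-in for the Postgres grant layer: abort any UPDATE or DELETE.
CREATE TRIGGER audit_logs_no_update BEFORE UPDATE ON audit_logs
BEGIN SELECT RAISE(ABORT, 'audit_logs is append-only'); END;
CREATE TRIGGER audit_logs_no_delete BEFORE DELETE ON audit_logs
BEGIN SELECT RAISE(ABORT, 'audit_logs is append-only'); END;
"""


class AuditRepo:
    """Repo layer: insert is the only write method exposed."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def insert(self, action: str, target_id: str, result: str) -> int:
        cur = self._conn.execute(
            "INSERT INTO audit_logs (action, target_id, result) VALUES (?, ?, ?)",
            (action, target_id, result),
        )
        return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
repo = AuditRepo(conn)
row_id = repo.insert("allocation.force_release", "alloc-1", "success")

# Bypassing the repo with raw SQL still fails at the database layer.
try:
    conn.execute("UPDATE audit_logs SET result = 'failure' WHERE id = ?", (row_id,))
    update_blocked = False
except sqlite3.DatabaseError:
    update_blocked = True
```

The point of the sketch is the layering: callers that go through `AuditRepo` have no update or delete method to call, and callers that bypass it are still refused by the database.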
Metadata allowlist¶
audit_logs.metadata (jsonb) has an explicit allowlist. Unknown keys are rejected at write time:
| Allowed key | Purpose |
|---|---|
| reason | Operator-supplied reason |
| policy_key | When action changes a policy value |
| old_value | Before-image (policy / balance / status) |
| new_value | After-image |
| status_from | Lifecycle transitions |
| status_to | Lifecycle transitions |
| error_code | When result=failure |
| request_scope | Resolved scope (tenant/project) |
| idempotency_key_hash | Hashed key, not raw |
| provider_ref | External id (Stripe payment id, etc.) |
| allocation_id | Cross-reference |
| node_id | Cross-reference |
Forbidden (never appears in audit_logs.metadata):
- Raw tokens (access / refresh / id)
- Raw credentials (passwords, API keys)
- SSH private or public key material
- Full request / response payload dumps
- Direct payment instrument data (PAN, CVV)
- End-user PII beyond stable IDs
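A write-time allowlist check along these lines could look like the following Python sketch. The key set is copied from the table above; the function names, the exception type, and the choice of SHA-256 for `idempotency_key_hash` are assumptions for illustration, since this page does not specify them.

```python
import hashlib

# Keys copied from the allowlist table.
ALLOWED_METADATA_KEYS = frozenset({
    "reason", "policy_key", "old_value", "new_value",
    "status_from", "status_to", "error_code", "request_scope",
    "idempotency_key_hash", "provider_ref", "allocation_id", "node_id",
})


class MetadataRejected(ValueError):
    """Raised when metadata contains a key that is not on the allowlist."""


def validate_metadata(metadata: dict) -> dict:
    """Return a copy of metadata, or raise if any key is not allowlisted."""
    unknown = sorted(set(metadata) - ALLOWED_METADATA_KEYS)
    if unknown:
        raise MetadataRejected(f"metadata keys not on allowlist: {unknown}")
    return dict(metadata)


def hash_idempotency_key(raw_key: str) -> str:
    """Hash the raw key so only the digest is stored (SHA-256 is an assumption)."""
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


ok = validate_metadata({
    "reason": "node maintenance",
    "idempotency_key_hash": hash_idempotency_key("client-key-123"),
})

try:
    validate_metadata({"reason": "debug", "access_token": "raw-token-value"})
    rejected = False
except MetadataRejected:
    rejected = True
```

A key allowlist constrains which keys may appear, not what their values contain, so callers still need to keep forbidden material out of values such as `reason`.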
Audit flow per mutation¶
sequenceDiagram
autonumber
participant U as User / Admin
participant API as Handler
participant SVC as Service
participant DB as Postgres
participant SAN as middleware.Sanitize
U->>API: privileged mutation request
API->>SAN: scrub PII/credentials from log line
SAN-->>API: sanitized
API->>SVC: domain call
SVC->>DB: BEGIN
SVC->>DB: domain mutation
SVC->>DB: INSERT audit_logs<br/>(actor, action, target, result, correlation_id, metadata)
SVC->>DB: INSERT outbox row
SVC->>DB: COMMIT
SVC-->>API: outcome
API-->>U: response with correlation_id
Note over API,DB: If COMMIT fails, NOTHING happened —<br/>audit + domain + outbox all rolled back.<br/>If COMMIT succeeds, ALL three durable.
This transactional triple-write is the single most important property of the audit subsystem.
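The all-or-nothing behaviour can be demonstrated with a short Python sketch. It uses an in-memory SQLite database in place of Postgres, and the table layouts, the `force_release` function, and the outbox topic name are hypothetical simplifications rather than the service's real schema.

```python
import json
import sqlite3
import uuid

SCHEMA = """
CREATE TABLE allocations (id TEXT PRIMARY KEY, status TEXT NOT NULL);
CREATE TABLE audit_logs (
    id TEXT PRIMARY KEY, actor_user_id TEXT, action TEXT, target_type TEXT,
    target_id TEXT, result TEXT, correlation_id TEXT, metadata TEXT
);
CREATE TABLE outbox (id TEXT PRIMARY KEY, topic TEXT, payload TEXT);
"""


def force_release(conn, allocation_id, actor_user_id, reason, fail_before_commit=False):
    """Apply the domain change, audit row, and outbox row in one transaction."""
    correlation_id = str(uuid.uuid4())
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "UPDATE allocations SET status = 'released' WHERE id = ?",
            (allocation_id,),
        )
        conn.execute(
            "INSERT INTO audit_logs VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
            (str(uuid.uuid4()), actor_user_id, "allocation.force_release",
             "allocation", allocation_id, "success", correlation_id,
             json.dumps({"reason": reason})),
        )
        conn.execute(
            "INSERT INTO outbox VALUES (?, ?, ?)",
            (str(uuid.uuid4()), "allocation.released",  # hypothetical topic name
             json.dumps({"allocation_id": allocation_id})),
        )
        if fail_before_commit:
            raise RuntimeError("simulated failure before COMMIT")
    return correlation_id


def snapshot(conn):
    """Return (allocation status, audit row count, outbox row count)."""
    status = conn.execute("SELECT status FROM allocations").fetchone()[0]
    audits = conn.execute("SELECT COUNT(*) FROM audit_logs").fetchone()[0]
    events = conn.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
    return status, audits, events


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
with conn:
    conn.execute("INSERT INTO allocations VALUES ('alloc-1', 'active')")

try:
    force_release(conn, "alloc-1", "admin-1", "node maintenance", fail_before_commit=True)
except RuntimeError:
    pass
after_failure = snapshot(conn)  # ('active', 0, 0): nothing was persisted

force_release(conn, "alloc-1", "admin-1", "node maintenance")
after_success = snapshot(conn)  # ('released', 1, 1): all three writes are durable
```

Because the three writes share one transaction, there is no window in which a domain change exists without its audit row, or an outbox event exists for a change that was rolled back.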
CI enforcement¶
flowchart TB
PR[PR opened] --> GATE1[audit_mandatory_guard.sh]
GATE1 --> CHK1{Privileged handler<br/>writes audit_logs<br/>in same tx?}
CHK1 -- no --> BLOCK1[Block PR]
CHK1 -- yes --> GATE2[audit_presence_guard.sh]
GATE2 --> CHK2{Integration test<br/>asserts audit row?}
CHK2 -- no --> BLOCK2[Block PR]
CHK2 -- yes --> OK([gates pass])
classDef block fill:#f8d7da,stroke:#42101e
classDef ok fill:#d1e7dd,stroke:#0a3622
class BLOCK1,BLOCK2 block
class OK ok
Acceptance matrix (PRD)¶
| AT | Check |
|---|---|
| AT-080 | Privileged mutations (provision, release, refund, admin node ops) each produce a structured audit entry |
| AT-081 | Audit log entries contain actor_user_id, actor_role, action, target_type, target_id, result, correlation_id |
| AT-082 | Failed authorization attempts recorded with result=failure |
| AT-083 | Audit log entries immutable — no update/delete path exposed |
API surface¶
| Endpoint | Auth | Purpose |
|---|---|---|
| GET /api/v1/admin/audit-logs | admin | Paginated list with filter: actor, action, target_type, target_id, from, to, result, correlation_id |
| GET /api/v1/admin/audit-logs/{id} | admin | Single entry |
| GET /api/v1/admin/audit-logs.csv | admin | CSV export |
| GET /api/v1/admin/audit-logs/by-correlation/{cid} | admin | All rows sharing a correlation id (incident reconstruction) |
PRD §FR-11: admin can query and export audit logs for compliance and incident response.
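As a rough illustration of how a client might build a filtered list request, the Python sketch below assembles the query string from the filter names in the table. The helper name `audit_logs_url`, the host `gpuaas.example`, and the ISO 8601 timestamp format are assumptions; pagination parameters are omitted because this page does not name them.

```python
from urllib.parse import urlencode

# Filter names copied from the API surface table.
FILTERS = ("actor", "action", "target_type", "target_id",
           "from", "to", "result", "correlation_id")


def audit_logs_url(base_url: str, **filters) -> str:
    """Build the list URL; pass the 'from' filter as from_ (Python keyword)."""
    params = {}
    for key, value in filters.items():
        name = key.rstrip("_")  # from_ -> from
        if name not in FILTERS:
            raise ValueError(f"unsupported filter: {name}")
        if value is not None:
            params[name] = value
    path = f"{base_url.rstrip('/')}/api/v1/admin/audit-logs"
    query = urlencode(sorted(params.items()))  # sorted for a stable URL
    return f"{path}?{query}" if query else path


url = audit_logs_url(
    "https://gpuaas.example",  # hypothetical host
    action="allocation.force_release",
    result="failure",
    from_="2026-05-01T00:00:00Z",  # timestamp format is an assumption
)
```

Rejecting unknown filter names on the client side keeps typos from silently turning a narrow compliance query into an unfiltered one.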
Retention model¶
flowchart LR
classDef active fill:#d1e7dd,stroke:#0a3622
classDef hot fill:#fff3e0,stroke:#e65100
classDef cold fill:#e3f2fd,stroke:#1565c0
classDef archived fill:#eceff1,stroke:#455a64
M0[Current month<br/>audit_logs_y2026m05]:::active
M1[Last month]:::hot
M2[Up to 12 months ago]:::hot
M3[12-24 months]:::cold
M4[Archived to object storage<br/>read-only restore path]:::archived
M0 --> M1 --> M2 --> M3 --> M4
Partitioning model (see Partitioning_and_Retention_Strategy.md):
- audit_logs: partitioned by month, retained long-term to meet compliance.
- usage_records: partitioned by month, medium retention.
- ledger_entries: partitioned by year, retained indefinitely (compliance).
- node_tasks: short retention; older than 30 days archived.
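The monthly naming scheme and the tiers in the diagram can be sketched in Python as follows. The partition name format follows the `audit_logs_y2026m05` example above; the function names and the exact tier boundaries (for instance, whether month 12 counts as hot or cold) are assumptions, and the authoritative rules are in Partitioning_and_Retention_Strategy.md.

```python
from datetime import date


def partition_name(day: date) -> str:
    """Name of the monthly audit_logs partition that holds the given day."""
    return f"audit_logs_y{day.year:04d}m{day.month:02d}"


def months_between(older: date, newer: date) -> int:
    """Whole calendar months from older's month to newer's month."""
    return (newer.year - older.year) * 12 + (newer.month - older.month)


def retention_tier(partition_month: date, today: date) -> str:
    """Classify a partition by age; the boundaries here are assumed."""
    age = months_between(partition_month, today)
    if age < 0:
        raise ValueError("partition month is in the future")
    if age == 0:
        return "active"    # current month
    if age <= 12:
        return "hot"       # last month, up to 12 months ago
    if age <= 24:
        return "cold"      # 12-24 months
    return "archived"      # moved to object storage


today = date(2026, 5, 17)
current = partition_name(today)                  # 'audit_logs_y2026m05'
tier = retention_tier(date(2025, 1, 1), today)   # 16 months old -> 'cold'
```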
Compliance posture¶
| Property | How |
|---|---|
| Immutable financial ledger | ledger_entries never UPDATE/DELETE |
| Immutable audit | audit_logs never UPDATE/DELETE |
| Right-to-erasure (GDPR-style) | Tenant-level deletion + audit retention exemption — design captured in assumptions register, not implemented in MVP |
| Data residency | region_code first-class on resources; canonical resource identifier carries region |
| Encryption at rest | KMS-backed envelope encryption — Encryption_Envelope_Spec.md |
| Encryption in transit | TLS everywhere; mTLS internal |
| Access reviews | Project + tenant memberships have auditable lifecycle (grant + revoke + soft-delete) |
| Separation of duties | Admin actions auditable; payment refunds require dedicated API (not generic balance adjustment) |
| Webhook integrity | Stripe signature on raw body; dedupe by event_id; AT-053 |
Encryption envelope (high level)¶
flowchart LR
DATA[Field value<br/>e.g. SSH key, payment ref] --> WRAP[wrap with DEK<br/>data encryption key]
WRAP --> CIPH[ciphertext + IV + tag]
CIPH --> STORE[("Stored in DB:<br/>ciphertext + key_version")]
DEK[Per-record DEK] --> WRAP
KEK[KEK in KMS] -.wraps DEKs.-> DEK
KEK -.rotation.-> KMS[(KMS)]
classDef secret fill:#fff3e0,stroke:#e65100
class DEK,KEK,KMS secret
→ Detail: Encryption_Envelope_Spec.md. Key rotation runbook: Key Rotation and Compromise Response.
Operator queries (common patterns)¶
-- Who force-released which allocations in the last 24h?
SELECT created_at, actor_user_id, target_id, metadata->>'reason'
FROM audit_logs
WHERE action = 'allocation.force_release'
AND created_at > now() - interval '24 hours'
ORDER BY created_at DESC;
-- All actions in one incident
SELECT created_at, actor_user_id, action, target_type, target_id, result
FROM audit_logs
WHERE correlation_id = $1
ORDER BY created_at;
-- Failed authorization attempts on financial routes (per AT-082)
SELECT created_at, actor_user_id, action, metadata->>'error_code'
FROM audit_logs
WHERE result = 'failure'
AND target_type IN ('payment_session', 'refund_record', 'ledger_entry')
AND created_at > now() - interval '7 days';