Slurm Tenant Scope Semantics v1
Purpose
Define what a tenant-scoped Slurm product actually means on GPUaaS, how it differs from the current project-scoped reference path, and which platform/app responsibilities must exist before implementation starts.
This document exists because the code and contracts now prove the project-scoped Slurm example path, but tenant-scoped and multi-project Slurm behavior is still only implied across several documents.
Current Status
Tenant-scoped Slurm now has a real backend/control-plane path, but it is not yet a fully productized operator flow.
What exists today:
- project-scoped app instances,
- project-scoped service accounts,
- project-scoped access-credential custody and delivery,
- project-scoped allocation selection and worker placement,
- project-scoped Slurm controller discovery by app_slug,
- tenant-owned shared runtime resources and attachments,
- delegated shared-runtime operator identity,
- shared worker and shared worker-operation resources,
- tenant-shared Slurm controller reconcile for shared worker add/drain/remove,
- attached-project worker contribution request path.
Canonical attachment-model follow-on:
- doc/architecture/App_Tenant_Shared_Attachment_Model_v1.md
What does not exist today:
- multi-project Slurm queue and account semantics,
- tenant-scoped operator workflow in the platform shell,
- submitted-job attribution and queue/account policy surfaced in product UI,
- end-to-end tenant-shared parity/UI proof.
Definitions
Project-scoped Slurm
A Slurm instance belongs to one project and serves only that project.
Properties:
- jobs are submitted within one project boundary,
- controller and worker allocations are selected from that project,
- the operator service account is project-scoped,
- the SSH/bootstrap credential is project-scoped,
- billing attribution is straightforwardly project-local.
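Taken together, these properties amount to a single-boundary invariant: every resource the instance touches belongs to its owning project. A minimal sketch of that check follows, assuming a hypothetical `ProjectScopedSlurmInstance` record; the type and field names are illustrative and are not part of any existing platform contract.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Allocation:
    allocation_id: str
    project_id: str  # project that owns this allocation


@dataclass
class ProjectScopedSlurmInstance:
    # Hypothetical record: one app instance owned by exactly one project.
    project_id: str
    service_account_project_id: str
    credential_project_id: str
    controller: Allocation
    workers: list[Allocation] = field(default_factory=list)


def project_scope_violations(instance: ProjectScopedSlurmInstance) -> list[str]:
    """Return a message for every resource outside the instance's project."""
    violations = []
    if instance.service_account_project_id != instance.project_id:
        violations.append("operator service account is owned by another project")
    if instance.credential_project_id != instance.project_id:
        violations.append("SSH/bootstrap credential is owned by another project")
    for alloc in [instance.controller, *instance.workers]:
        if alloc.project_id != instance.project_id:
            violations.append(
                f"allocation {alloc.allocation_id} belongs to {alloc.project_id}"
            )
    return violations


instance = ProjectScopedSlurmInstance(
    project_id="proj-a",
    service_account_project_id="proj-a",
    credential_project_id="proj-a",
    controller=Allocation("ctl-1", "proj-a"),
    workers=[Allocation("w-1", "proj-a"), Allocation("w-2", "proj-b")],
)
print(project_scope_violations(instance))  # ['allocation w-2 belongs to proj-b']
```

A tenant-scoped scheduler fails this check by design, which is why tenant scope needs its own ownership semantics rather than a relaxed version of this one.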
Tenant-scoped Slurm
A Slurm control plane is shared across multiple projects inside one tenant.
Properties:
- one Slurm control plane may serve more than one project,
- cross-project submission is denied by default and must be enabled by explicit policy,
- queue/partition/account visibility is app-owned policy on top of platform-owned tenant/project identity,
- operator actions may be tenant-scoped even if individual consuming app instances remain project-owned.
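Deny-by-default cross-project submission can be expressed as an explicit grant table that stays empty unless policy populates it. The sketch below assumes, hypothetically, that each queue has a single owning project and that same-project submission is always allowed; `QueueRef`, `SubmissionPolicy`, and `may_submit` are illustrative names only. How grants map onto Slurm accounts, partitions, or QoS remains app-owned policy and is not modeled here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class QueueRef:
    tenant_id: str
    owner_project_id: str  # hypothetical: every queue has one owning project


@dataclass
class SubmissionPolicy:
    tenant_id: str
    # Explicit (submitting_project, queue_owner_project) grants; empty by default.
    cross_project_grants: set[tuple[str, str]] = field(default_factory=set)


def may_submit(policy: SubmissionPolicy, submitter_tenant: str,
               submitter_project: str, queue: QueueRef) -> bool:
    # A tenant-scoped control plane never crosses the tenant boundary.
    if submitter_tenant != policy.tenant_id or queue.tenant_id != policy.tenant_id:
        return False
    # Assumption: submitting into your own project's queues is always allowed.
    if submitter_project == queue.owner_project_id:
        return True
    # Cross-project submission requires an explicit, directional grant.
    return (submitter_project, queue.owner_project_id) in policy.cross_project_grants
```

Keeping grants directional means that allowing project A into project B's queues does not silently allow the reverse.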
Required Product Semantics
Before tenant-scoped Slurm can be called supported, the product must define:
- Control-plane ownership
  - is the Slurm control plane represented by one tenant-scoped app instance,
  - or by multiple project-owned app instances attached to one tenant-scoped runtime?
- Allocation ownership
  - which projects may contribute controller/worker allocations,
  - whether allocations stay project-owned while attached to a tenant-scoped scheduler,
  - how allocation visibility is presented to the operator.
- Job submission policy
  - whether project A may submit into queues owned by project B,
  - how project membership maps into Slurm accounts, partitions, QoS, or associations,
  - what the default deny rules are.
- Identity model
  - whether the Slurm controller runs under a tenant-scoped service account,
  - or under a project-owned service account with explicit tenant-wide grants,
  - how submitted jobs are attributed back to platform identities and projects.
- Credential/data custody
  - whether bootstrap and runtime credentials remain project-scoped,
  - whether tenant-scoped schedulers require tenant-scoped custody,
  - how secrets and runtime config are separated between platform-owned and app-owned storage.
- Billing attribution
  - whether controller costs are charged to one owner project,
  - split across attached projects,
  - or charged to a tenant-level shared cost center,
  - and how worker/runtime costs are apportioned.
First implemented baseline:
- tenant-owned controller and tenant-reserved capacity are charged to the tenant-shared runtime owner record,
- project-contributed workers remain charged to the contributing source project.
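The first implemented billing baseline can be read as a routing rule over usage records. A minimal sketch follows, assuming hypothetical usage kinds (`controller`, `reserved`, `contributed_worker`) and integer cost units so totals are exact; the record shape and the owner-key format are illustrative, not the billing model's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UsageRecord:
    resource_id: str
    kind: str  # hypothetical kinds: "controller", "reserved", "contributed_worker"
    cost_units: int  # integer units (e.g. cents) keep totals exact
    source_project_id: Optional[str] = None  # set only for contributed workers


def attribute_costs(records: list[UsageRecord], runtime_owner_key: str) -> dict[str, int]:
    """Route each usage record to the party charged under the baseline rule."""
    totals = defaultdict(int)
    for record in records:
        if record.kind in ("controller", "reserved"):
            # Tenant-owned controller and tenant-reserved capacity.
            totals[runtime_owner_key] += record.cost_units
        elif record.kind == "contributed_worker":
            if record.source_project_id is None:
                raise ValueError(f"{record.resource_id}: contributed worker has no source project")
            # Project-contributed workers stay with the contributing project.
            totals[record.source_project_id] += record.cost_units
        else:
            raise ValueError(f"{record.resource_id}: unknown usage kind {record.kind!r}")
    return dict(totals)
```

Splitting controller cost across attached projects, one of the open options above, would change only the first branch.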
Recommended First Product Shape
If tenant-scoped Slurm is implemented, the first productized version should be narrow:
- One tenant-scoped Slurm control plane per tenant environment.
- Controller node(s) chosen explicitly by a tenant admin.
- Worker allocations chosen explicitly from an allowlisted set of projects in that tenant.
- Cross-project submission disabled by default.
- Project-to-Slurm-account mapping stored as app-owned state.
- Billing policy explicit and visible before deploy.
This keeps the first tenant-scoped slice understandable and avoids pretending that every cross-project scheduler question is solved automatically by the project-scoped app model.
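The recommended first shape is narrow enough to check mechanically before deploy. The sketch below assumes a hypothetical `TenantSlurmSpec` request; it enforces one control plane per tenant environment, an explicit controller choice, allowlisted worker projects, and an explicit billing policy, while leaving cross-project submission off unless the request turns it on. Field names and error strings are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TenantSlurmSpec:
    tenant_id: str
    environment: str
    controller_allocation_ids: list[str]  # chosen explicitly by a tenant admin
    allowlisted_projects: set[str]
    worker_allocations: dict[str, str]  # allocation_id -> owning project_id
    billing_policy: Optional[str] = None  # must be explicit before deploy
    cross_project_submission: bool = False  # disabled by default
    # App-owned project -> Slurm account mapping (hypothetical shape).
    project_accounts: dict[str, str] = field(default_factory=dict)


def first_shape_errors(spec: TenantSlurmSpec,
                       existing_control_planes: set[tuple[str, str]]) -> list[str]:
    """Return reasons the spec falls outside the first product shape."""
    errors = []
    if (spec.tenant_id, spec.environment) in existing_control_planes:
        errors.append("environment already has a tenant-scoped control plane")
    if not spec.controller_allocation_ids:
        errors.append("controller allocation(s) must be chosen explicitly")
    for alloc_id, project_id in sorted(spec.worker_allocations.items()):
        if project_id not in spec.allowlisted_projects:
            errors.append(f"worker {alloc_id} is from non-allowlisted project {project_id}")
    if spec.billing_policy is None:
        errors.append("billing policy must be explicit before deploy")
    return errors
```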
Platform Responsibilities
Platform must provide:
- tenant/project identity and membership truth,
- explicit authorization for tenant-scoped app/operator actions,
- allocation read surfaces that can operate across eligible projects when policy allows,
- service-account and access-credential custody semantics that match the chosen scope,
- auditable read models for attached projects and placement choices.
Platform must not own:
- Slurm account/partition/QoS mapping logic,
- Slurm queue policy semantics,
- runtime-specific scheduler recovery behavior.
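On the platform side, the cross-project allocation read surface reduces to an authorized filter plus an audit entry, and it needs no knowledge of Slurm accounts or queues. This is a sketch under assumptions: `AllocationView`, `read_eligible_allocations`, and the audit tuple layout are hypothetical, and a real implementation would take authorization from the platform's own policy checks rather than a boolean argument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AllocationView:
    allocation_id: str
    tenant_id: str
    project_id: str


def read_eligible_allocations(
    allocations: list[AllocationView],
    caller_id: str,
    tenant_id: str,
    attached_projects: set[str],
    caller_has_tenant_grant: bool,
    audit_log: list[tuple[str, str, tuple[str, ...]]],
) -> list[AllocationView]:
    """Return allocations from attached projects of one tenant, and audit the read."""
    if not caller_has_tenant_grant:
        raise PermissionError(f"{caller_id} lacks tenant-scoped authorization for {tenant_id}")
    visible = [
        a for a in allocations
        if a.tenant_id == tenant_id and a.project_id in attached_projects
    ]
    # Record who saw which allocations so placement choices stay auditable.
    audit_log.append((caller_id, tenant_id, tuple(a.allocation_id for a in visible)))
    return visible
```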
App Responsibilities
The Slurm app must own:
- mapping platform tenant/project identity into Slurm-native concepts,
- runtime config generation for tenant-shared schedulers,
- attached-project bookkeeping,
- queue/account policy enforcement inside the Slurm runtime,
- scheduler-native operational state.
This likely implies app-owned persistent state even for a tenant-dedicated scheduler product.
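One way the app could map platform identity into Slurm-native concepts is a two-level account hierarchy: a parent account per tenant and a child account per attached project, with a reverse map kept in app-owned state so the account on a job record can be attributed back to a project. The sketch below only builds `sacctmgr` argument lists and does not execute them; the `t_`/`p_` naming scheme, the sanitization rule, and the function names are assumptions, and the options should be checked against the deployed Slurm version.

```python
import re


def slurm_account_name(prefix: str, platform_id: str) -> str:
    # Assumption: keep account names to lowercase letters, digits, underscores.
    return prefix + "_" + re.sub(r"[^a-z0-9]+", "_", platform_id.lower()).strip("_")


def account_plan(tenant_id: str, attached_projects: set[str]):
    """Build sacctmgr commands plus a reverse map from Slurm account to project."""
    parent = slurm_account_name("t", tenant_id)
    commands = [["sacctmgr", "-i", "add", "account", parent,
                 f"Description=GPUaaS tenant {tenant_id}"]]
    reverse: dict[str, str] = {}
    for project_id in sorted(attached_projects):
        child = slurm_account_name("p", project_id)
        if child in reverse:
            # Two project ids sanitized to one name would break attribution.
            raise ValueError(f"{project_id} collides with {reverse[child]} as {child}")
        commands.append(["sacctmgr", "-i", "add", "account", child, f"Parent={parent}"])
        reverse[child] = project_id
    return commands, reverse


def attribute_job(slurm_account: str, reverse: dict[str, str]) -> str:
    """Map the account on a Slurm job record back to its platform project."""
    try:
        return reverse[slurm_account]
    except KeyError:
        raise LookupError(f"job account {slurm_account!r} is not mapped to a project") from None
```

Rejecting sanitization collisions up front matters because attribution is only as reliable as the reverse map.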
UI Expectations
Tenant-scoped Slurm should not reuse the current project-scoped UI unchanged.
The operator workflow will need explicit UI for:
- choosing tenant-shared mode,
- selecting attached projects,
- selecting controller and worker allocations across those projects,
- showing which projects may submit jobs,
- showing billing/ownership consequences.
This should be built through the platform shell extension model, but it is a distinct product flow, not a hidden variant of the current single-project Slurm deploy form.
Readiness Judgment
Tenant-scoped Slurm is no longer only a design task.
The platform/backend path is now real enough to support:
- tenant-owned shared runtime lifecycle,
- attached-project contribution,
- shared worker topology,
- delegated operator reconcile.
What is still not finished is the product surface:
- operator UI for tenant-shared deploy/attach/contribute,
- clearer queue/account/job-submission policy presentation,
- live parity validation through that UI path.
Immediate Next Decisions
- Decide whether tenant-scoped Slurm is modeled as:
  - one tenant-owned scheduler control plane,
  - or project-owned app instances attached to a tenant-shared runtime.
- Decide the first billing rule for a tenant-shared scheduler.
- Decide the identity model for the app controller in tenant scope.
- Decide the first attached-project and queue-visibility rules.
- Only after those decisions, define the tenant-scoped deploy contract and UI.
Related Docs
- doc/architecture/App_Tenant_Shared_Attachment_Model_v1.md
- doc/architecture/App_Runtime_Operating_Modes_v1.md
- doc/architecture/App_Runtime_Billing_Model_v1.md
- doc/architecture/Scheduler_as_Platform_App_v1.md
- doc/architecture/Shared_Runtime_Worker_Topology_v1.md