Implement SLURM cluster autodiscovery #7

@gregorweiss

Description

Query a local SLURM scheduler and return a structured representation of available resources (partitions, node types, accounts, QOS policies, GPU types).

Motivation

Every SLURM-facing feature requires the user to manually specify partition names, account strings, GPU types, and core
counts. SLURM already exposes this via sinfo and sacctmgr. Autodiscovery removes redundant manual configuration and
enables downstream features to make resource-aware decisions automatically.

No dependencies. This issue also creates the mdfactory/performance/ package.

Scope

  • Create mdfactory/performance/__init__.py
  • Create mdfactory/performance/cluster.py
  • Dataclasses: NodeType, Partition, ClusterInfo
  • discover_cluster() -> ClusterInfo — parses sinfo/sacctmgr output
  • select_partition(cluster, needs_gpu, min_cpus, min_mem_gb) -> Partition | None
  • Graceful degradation: discover_cluster() returns None when SLURM commands are unavailable
  • Session-level caching (cluster topology doesn't change mid-session)
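
The data model and the selection helper listed above could look roughly like the sketch below. The field names (cpus, mem_gb, gpus, gpu_type, time_limit, qos) and the "first matching partition wins" policy are assumptions made for illustration; the issue only fixes the class names and the select_partition signature.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class NodeType:
    # Field names are illustrative assumptions, not part of the issue text.
    cpus: int
    mem_gb: float
    gpus: int = 0
    gpu_type: str | None = None


@dataclass
class Partition:
    name: str
    node_types: list[NodeType] = field(default_factory=list)
    time_limit: str | None = None


@dataclass
class ClusterInfo:
    partitions: list[Partition] = field(default_factory=list)
    accounts: list[str] = field(default_factory=list)
    qos: list[str] = field(default_factory=list)


def select_partition(
    cluster: ClusterInfo,
    needs_gpu: bool = False,
    min_cpus: int = 1,
    min_mem_gb: float = 0.0,
) -> Partition | None:
    """Return the first partition with a node type meeting all requirements."""
    for partition in cluster.partitions:
        for node in partition.node_types:
            if needs_gpu and node.gpus == 0:
                continue
            if node.cpus >= min_cpus and node.mem_gb >= min_mem_gb:
                return partition
    return None
```

Returning the first match keeps the helper predictable; a real implementation might instead rank candidates, for example preferring CPU-only partitions when no GPU is requested.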

SLURM commands used

Command | Information extracted
sinfo -N --noheader -o "%P %n %c %m %G %f %l %T" | Partition, node, CPUs, memory (MB), GPUs (gres), features, time limit, state
sacctmgr show assoc user=$USER format=Account --noheader --parsable2 | Accounts available to current user
sacctmgr show qos format=Name,MaxWall,MaxTRES --noheader --parsable2 | QOS policies with limits
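
As a rough illustration of how the sinfo command above might be consumed, the sketch below parses one output row and wraps the call with graceful degradation and session-level caching. The helper names parse_sinfo_line and sinfo_rows are hypothetical, and the sketch assumes every row has exactly eight whitespace-separated fields, which may not hold on every site.

```python
from __future__ import annotations

import functools
import shutil
import subprocess

SINFO_CMD = ["sinfo", "-N", "--noheader", "-o", "%P %n %c %m %G %f %l %T"]


def parse_sinfo_line(line: str) -> dict | None:
    """Split one sinfo row into a dict; return None for malformed rows."""
    fields = line.split()
    if len(fields) != 8:
        return None
    partition, node, cpus, mem_mb, gres, features, timelimit, state = fields
    try:
        cpu_count = int(cpus)
        mem_gb = int(mem_mb.rstrip("+")) / 1024  # %m reports megabytes
    except ValueError:
        return None
    return {
        "partition": partition.rstrip("*"),  # "*" marks the default partition
        "node": node,
        "cpus": cpu_count,
        "mem_gb": mem_gb,
        "gres": None if gres == "(null)" else gres,
        "features": [] if features == "(null)" else features.split(","),
        "timelimit": timelimit,
        "state": state,
    }


@functools.lru_cache(maxsize=1)
def sinfo_rows() -> tuple | None:
    """Run sinfo once per session; return None when SLURM is unavailable."""
    if shutil.which("sinfo") is None:
        return None
    try:
        result = subprocess.run(
            SINFO_CMD, capture_output=True, text=True, check=True, timeout=30
        )
    except (OSError, subprocess.SubprocessError):
        return None
    rows = (parse_sinfo_line(line) for line in result.stdout.splitlines())
    return tuple(row for row in rows if row is not None)
```

functools.lru_cache also caches a None result, so a machine without SLURM is probed only once per session. The two sacctmgr commands use --parsable2, whose pipe-delimited output can be split with line.split("|").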

Test structure

Tests use flat files: mdfactory/tests/test_cluster.py (not a subdirectory).
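
A test in that file might mock subprocess.run along the lines below. query_accounts is a hypothetical stand-in defined inline so the example is self-contained; in the real test the function would be imported from mdfactory.performance.cluster, and the patch target would likely need to name that module instead.

```python
import os
import subprocess
import unittest
from unittest import mock


def query_accounts():
    """Hypothetical stand-in for the code under test.

    Returns a sorted list of account names, or None without SLURM.
    """
    cmd = [
        "sacctmgr", "show", "assoc", "user=" + os.environ.get("USER", ""),
        "format=Account", "--noheader", "--parsable2",
    ]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    except (OSError, subprocess.SubprocessError):
        return None
    names = {line.strip() for line in result.stdout.splitlines() if line.strip()}
    return sorted(names)


class TestQueryAccounts(unittest.TestCase):
    @mock.patch("subprocess.run")
    def test_parses_accounts(self, run):
        # Simulated sacctmgr output with a duplicate association.
        run.return_value = subprocess.CompletedProcess(
            args=[], returncode=0, stdout="proj_a\nproj_b\nproj_a\n", stderr=""
        )
        self.assertEqual(query_accounts(), ["proj_a", "proj_b"])

    @mock.patch("subprocess.run", side_effect=FileNotFoundError)
    def test_returns_none_without_slurm(self, run):
        # A missing sacctmgr binary surfaces as FileNotFoundError.
        self.assertIsNone(query_accounts())
```

Because the class subclasses unittest.TestCase, pytest and the standard unittest runner would both be expected to collect it.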

Acceptance criteria

  • from mdfactory.performance import cluster imports without error
  • Unit tests with mocked sinfo/sacctmgr output pass
  • Returns None gracefully on a non-SLURM machine (e.g. a laptop)
  • Parses real sinfo output format correctly (multi-GPU nodes, mixed partitions)
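
For the multi-GPU criterion, the GRES column (%G) is the awkward part. One possible way to turn it into per-type GPU counts is sketched below. The function name parse_gpus is hypothetical, and the sketch assumes entries of the form gpu:COUNT or gpu:TYPE:COUNT, optionally followed by a parenthesised suffix; other site-specific formats would need extra handling.

```python
import re


def parse_gpus(gres):
    """Return {gpu_type: count} from a GRES string.

    Untyped GPUs are reported under the key "gpu". Entries without an
    explicit numeric count (a bare "gpu") are skipped in this sketch.
    """
    counts = {}
    if not gres or gres == "(null)":
        return counts
    # Drop parenthesised suffixes such as "(S:0-1)" before splitting on commas.
    cleaned = re.sub(r"\([^)]*\)", "", gres)
    for entry in cleaned.split(","):
        parts = entry.split(":")
        if parts[0] != "gpu" or not parts[-1].isdigit():
            continue  # ignore non-GPU resources and unexpected shapes
        gpu_type = parts[1] if len(parts) == 3 else "gpu"
        counts[gpu_type] = counts.get(gpu_type, 0) + int(parts[-1])
    return counts
```

A node advertising two GPU models yields one key per model, which gives the unit tests a direct way to cover the multi-GPU case with canned strings.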

Metadata

Labels

enhancement: New feature or request