feat: add scheduler backends (local/k8s/slurm) with unified CLI by TerrenceZhangX · Pull Request #10 · microsoft/FlowSim

TerrenceZhangX · 2026-03-19T19:11:09Z

Summary

Add a schedulers/ package with three job scheduler backends for stage profiling, plus a unified flowsim CLI.

What's New

Scheduler Backends (`schedulers/`)

local.py: run profiling in a local Docker container
k8s.py: submit as Kubernetes Jobs (PVC/hostPath storage)
slurm.py: submit via sbatch/squeue/scancel (CLI mode); supports docker, enroot, and bare-metal (no container) execution; auto-mounts the output dir; uses #SBATCH --exclusive for GPU isolation during profiling
base.py: abstract base class + ProfileJobSpec dataclass; auto-generated job names include a timestamp for uniqueness
config.py: YAML config loading from ~/.flowsim/ with env var overrides
templates/: annotated config templates for K8s and Slurm

CLI (`scripts/cli/`)

flowsim init {k8s,slurm}: install a config template into ~/.flowsim/ (or pass --config to install your own file)
flowsim submit: submit profiling jobs (local/k8s/slurm), with --dry-run and --sweep-file
flowsim status/logs/cancel/list: job lifecycle management
--sweep: profile multiple BS:INPUT_LEN:CTX points in a single job

Test Infrastructure (`tests/integration/infra/`)

Kind cluster config + GPU passthrough (CDI mode)
Slurm Docker Compose cluster (slurmctld + compute node with runtime: nvidia); host paths parameterized via $HOST_WORKSPACE
dev-setup.sh / dev-teardown.sh for one-click test environments

Tests

45 unit tests (test_scheduler_cli.py)
Integration tests for all 3 backends (test_scheduler.py)

Stats

23 new files, 5 modified
+4,908 / −179 lines

Usage

```bash
# Install
pip install -e .

# Initialize config
flowsim init k8s
flowsim init slurm --config my-slurm.yaml

# Submit — same workload, different schedulers
flowsim submit --scheduler local \
    --collect all \
    --model-path workload/models/configs/Qwen3-235B-A22B \
    --tp 1 --bs 1 --input-len 2048 --existing-ctx 0 --decode-tokens 2 --gpus 1 \
    --extra-server-opts "--load-format dummy"

flowsim submit --scheduler k8s \
    --collect all \
    --model-path workload/models/configs/Qwen3-235B-A22B \
    --tp 1 --bs 1 --input-len 2048 --existing-ctx 0 --decode-tokens 2 --gpus 1 \
    --extra-server-opts "--load-format dummy"

flowsim submit --scheduler slurm \
    --collect all \
    --model-path workload/models/configs/Qwen3-235B-A22B \
    --tp 1 --bs 1 --input-len 2048 --existing-ctx 0 --decode-tokens 2 --gpus 1 \
    --slurm-partition gpu \
    --extra-server-opts "--load-format dummy"

# Multi-point sweep (works with any scheduler)
flowsim submit --scheduler local \
    --collect all \
    --model-path workload/models/configs/Qwen3-235B-A22B \
    --sweep 1:2048:0 4:2048:0 8:2048:0 \
    --decode-tokens 2 --gpus 1 \
    --extra-server-opts "--load-format dummy"

# Job management
flowsim status --scheduler k8s --job <job-name>
flowsim logs   --scheduler k8s --job <job-name>
flowsim cancel --scheduler k8s --job <job-name>
flowsim list   --scheduler slurm
```

Add a scheduler package with:
- ProfileJobSpec dataclass for all profiling parameters
- BaseScheduler ABC with a render/submit/dry_run interface
- K8sScheduler: generates valid K8s Job YAML with GPU resources, PVC/hostPath volumes, nodeSelector, and serviceAccount support
- SlurmScheduler: generates sbatch scripts with docker/enroot/bare-metal container runtimes, module loading, and custom #SBATCH directives
- scripts/submit_profile.py: unified CLI entry point with --scheduler {k8s,slurm}, --dry-run (default) and --submit modes

Zero external dependencies: uses only the Python stdlib.
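
A minimal Python sketch of the spec/ABC split described above. The fields are a subset drawn from the CLI flags in this PR; the profile_command() method, the assumption that scripts/run_stage_profile.py accepts those same flags, and the toy EchoScheduler backend are hypothetical and not taken from schedulers/base.py.

```python
# Hypothetical sketch: fields and method bodies are illustrative,
# not the actual schedulers/base.py implementation.
from __future__ import annotations

import abc
import shlex
from dataclasses import dataclass


@dataclass
class ProfileJobSpec:
    """Parameters for one stage-profiling job (illustrative subset)."""
    model_path: str
    collect: str = "all"
    tp: int = 1
    bs: int = 1
    input_len: int = 2048
    existing_ctx: int = 0
    decode_tokens: int = 2
    gpus: int = 1
    extra_server_opts: str = ""

    def profile_command(self) -> list[str]:
        # Assumption: the profiling script takes the same flags as `flowsim submit`.
        cmd = [
            "python3", "scripts/run_stage_profile.py",
            "--collect", self.collect,
            "--model-path", self.model_path,
            "--tp", str(self.tp),
            "--bs", str(self.bs),
            "--input-len", str(self.input_len),
            "--existing-ctx", str(self.existing_ctx),
            "--decode-tokens", str(self.decode_tokens),
        ]
        if self.extra_server_opts:
            cmd += ["--extra-server-opts", self.extra_server_opts]
        return cmd


class BaseScheduler(abc.ABC):
    """Each backend turns a spec into a backend-specific artifact."""

    @abc.abstractmethod
    def render(self, spec: ProfileJobSpec) -> str:
        """Return the manifest or script that would be submitted."""

    @abc.abstractmethod
    def submit(self, spec: ProfileJobSpec) -> str:
        """Submit the job and return an identifier."""

    def dry_run(self, spec: ProfileJobSpec) -> str:
        # A dry run is render() without contacting any cluster.
        return self.render(spec)


class EchoScheduler(BaseScheduler):
    """Toy backend (hypothetical) that only renders a shell script."""

    def render(self, spec: ProfileJobSpec) -> str:
        return "#!/bin/bash\n" + shlex.join(spec.profile_command()) + "\n"

    def submit(self, spec: ProfileJobSpec) -> str:
        raise NotImplementedError("toy backend supports dry-run only")
```

Keeping render() pure is what lets dry_run() work with no cluster or config at all.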

K8s:
- render() now builds a dict and serializes it via yaml.safe_dump (falling back to json.dumps if PyYAML is absent). This fixes YAML injection when values contain ':', '#', or quotes.
- submit() uses the 'kubernetes' Python client (kubeconfig / in-cluster).
- New args: --k8s-kubeconfig, --k8s-context.

Slurm:
- submit() now posts to the slurmrestd REST API via urllib.request (stdlib).
- Supports JWT auth, a configurable API version (v0.0.39–v0.0.41+), and a TLS certificate verification toggle.
- New args: --slurm-rest-url, --slurm-jwt-token, --slurm-api-version, --slurm-no-verify-ssl.

render() / dry-run remain zero-dependency (stdlib only). submit() requires the 'kubernetes' package for K8s; Slurm uses only the stdlib.
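
To illustrate why building a dict and serializing it avoids the injection problem, here is a sketch that renders a bare-bones batch/v1 Job. The container name, backoffLimit: 0, and the function signature are assumptions (the real k8s.py also adds volumes, nodeSelector, and so on); the app: flowsim label matches the label selector mentioned later in this PR. Because the serializer escapes values, a command containing ':' or '#' cannot break the document structure the way string templating can.

```python
# Sketch of "build a dict, then serialize" rendering for a K8s Job.
# Names and field choices are illustrative, not the real k8s.py.
from __future__ import annotations

import json

try:
    import yaml  # PyYAML, optional in this sketch
except ImportError:  # JSON output is also valid YAML
    yaml = None


def render_job_manifest(name: str, namespace: str, image: str,
                        command: list[str], gpus: int) -> str:
    manifest = {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name, "namespace": namespace,
                     "labels": {"app": "flowsim"}},
        "spec": {
            "backoffLimit": 0,  # assumption: no automatic retries
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "profile",
                        "image": image,
                        "command": command,
                        "resources": {"limits": {"nvidia.com/gpu": gpus}},
                    }],
                },
            },
        },
    }
    if yaml is not None:
        # safe_dump quotes values containing ':', '#', or quotes as needed
        return yaml.safe_dump(manifest, sort_keys=False)
    return json.dumps(manifest, indent=2)
```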

- Core deps: requests, perfetto, numpy, pandas
- Optional dependency groups:
  - k8s: kubernetes>=27.0, PyYAML>=6.0
  - slurm: stdlib only, no extra deps
  - sim: scalesim, scipy, torch
  - viz: matplotlib, seaborn
  - api: fastapi, pydantic, uvicorn
  - dev: black, pytest
  - all: everything
- Entry point: flowsim-submit -> scripts.submit_profile:main
- requires-python >= 3.10

- Add scripts/__init__.py so 'scripts' is a findable package
- Remove the sys.path hack from submit_profile.py (not needed after install)
- Add [tool.setuptools.packages.find] with an explicit include list (excludes tests/ and backend/ from the installable package)
- Improve the K8s submit error: catch both kubeconfig and in-cluster failures and show a single clear message with a --k8s-kubeconfig hint

Verified: pip install -e '.[k8s]' -> flowsim-submit --dry-run works.

- Add scripts/cli.py with subcommand routing (flowsim {submit, ...})
- Entry point changed: flowsim-submit -> flowsim
- 'flowsim submit' delegates to submit_profile.main()
- Extensible for future subcommands (profile, parse, simulate)
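
A minimal sketch of subcommand routing with argparse subparsers. The handlers are placeholders, not the actual scripts/cli code; the pattern binds each subcommand to a handler via set_defaults(func=...), so adding profile/parse/simulate later is one more add_parser() call.

```python
# Sketch of `flowsim {submit, init, ...}` routing; handler bodies are placeholders.
from __future__ import annotations

import argparse


def _submit(args: argparse.Namespace) -> int:
    print(f"submit via {args.scheduler} (dry_run={args.dry_run})")
    return 0


def _init(args: argparse.Namespace) -> int:
    print(f"init {args.target}")
    return 0


def main(argv: list[str] | None = None) -> int:
    parser = argparse.ArgumentParser(prog="flowsim")
    sub = parser.add_subparsers(dest="command", required=True)

    p_submit = sub.add_parser("submit", help="submit a profiling job")
    p_submit.add_argument("--scheduler", choices=["local", "k8s", "slurm"], required=True)
    p_submit.add_argument("--dry-run", action="store_true")
    p_submit.set_defaults(func=_submit)

    p_init = sub.add_parser("init", help="install a scheduler config")
    p_init.add_argument("target", choices=["k8s", "slurm"])
    p_init.set_defaults(func=_init)

    args = parser.parse_args(argv)
    return args.func(args)  # dispatch to the selected subcommand's handler
```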

Removed the redundant --submit flag. The subcommand name already implies submission; --dry-run is the opt-out.

- Slurm: fail fast if --slurm-rest-url or --slurm-jwt-token is missing
- K8s: warn on stderr when no explicit kubeconfig/context is provided
- --dry-run skips validation (no cluster needed for manifest preview)

Connection params are now read from environment variables as defaults, so you don't have to pass them on every invocation:

K8s:
- KUBECONFIG -> --k8s-kubeconfig
- FLOWSIM_K8S_NAMESPACE -> --k8s-namespace
- FLOWSIM_K8S_CONTEXT -> --k8s-context

Slurm:
- FLOWSIM_SLURM_REST_URL -> --slurm-rest-url
- FLOWSIM_SLURM_JWT_TOKEN -> --slurm-jwt-token
- FLOWSIM_SLURM_PARTITION -> --slurm-partition
- FLOWSIM_SLURM_TIME -> --slurm-time
- FLOWSIM_SLURM_API_VERSION -> --slurm-api-version

CLI flags override env vars. Env var names are shown in --help.

No more built-in defaults for cluster connection params. Users must configure before submitting:

    flowsim init                 # copies templates to ~/.flowsim/
    vim ~/.flowsim/k8s.yaml      # fill in kubeconfig, namespace, etc.
    vim ~/.flowsim/slurm.yaml    # fill in rest_url, partition, etc.
    flowsim submit ...           # works

Changes:
- Add 'flowsim init' subcommand (copies templates, --force to overwrite)
- Split config into ~/.flowsim/k8s.yaml and ~/.flowsim/slurm.yaml
- Templates have empty REQUIRED fields; submit fails if they are left unfilled
- Config loader: schedulers/config.py with per-scheduler load functions
- Priority: CLI flag > env var > config file (no silent fallbacks)
- Slurm jwt_token_cmd: execute a command to get the token at submit time
- --dry-run skips all validation (no config needed for preview)
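
The precedence rule (CLI flag > env var > config file, no silent fallbacks) can be sketched as a small resolver. resolve_setting() and MissingSettingError are hypothetical names; the PR's own helper is resolve_default() in schedulers/config.py, whose signature may differ. Treating an empty config value as unset mirrors the templates shipping with empty REQUIRED fields.

```python
# Hypothetical resolver: CLI flag > env var > config file, no built-in default.
from __future__ import annotations

import os
from typing import Any, Mapping


class MissingSettingError(ValueError):
    """Raised when a required setting is not provided anywhere."""


def resolve_setting(cli_value: Any, env_var: str, config: Mapping[str, Any],
                    key: str, *, required: bool = True,
                    environ: Mapping[str, str] = os.environ) -> Any:
    if cli_value is not None:          # 1. an explicit flag always wins
        return cli_value
    env_value = environ.get(env_var)
    if env_value:                      # 2. a non-empty environment variable
        return env_value
    cfg_value = config.get(key)
    if cfg_value not in (None, ""):    # 3. a filled-in config-file field
        return cfg_value
    if required:                       # no silent fallback
        raise MissingSettingError(
            f"{key} is not set: pass the flag, export {env_var}, "
            f"or fill it in under ~/.flowsim/ (see 'flowsim init')")
    return None
```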

- flowsim init k8s --kubeconfig ... --namespace ...
- flowsim init slurm --rest-url ... --partition ... --account ...
- Required fields enforced by argparse; --help shows everything
- --force to overwrite an existing config
- Demote --dry-run to [debug] in submit help text
- Remove the template-copy approach; use _save_yaml() directly

Docker test environments:
- kind-multi-node.yaml: 1 control-plane + 2 workers (GPU 0, GPU 1)
- slurm-compose.yaml: slurmctld + 2 slurmd (GPU 0, GPU 1) + slurmrestd
- slurm-node.dockerfile + slurm.conf: Slurm 23.11 with JWT auth

PD disaggregation:
- ProfileJobSpec: disagg_mode, disagg_transfer_backend, disagg_bootstrap_port, disagg_prefill_pp, disagg_ib_device
- as_prefill() / as_decode() helpers for creating PD pairs
- BaseScheduler: render_pd_pair() and submit_pd_pair()
- CLI: --pd flag submits a prefill + decode job pair
- --disagg-transfer-backend (mooncake/nixl), --disagg-bootstrap-port, etc.

Bugfix:
- resolve_jwt_token: catch FileNotFoundError when the jwt_token_cmd binary is missing

- dev-setup.sh: auto-installs kind/kubectl, creates the kind cluster, starts the Slurm compose cluster, and runs flowsim init, all in one command
- dev-teardown.sh: tears down both clusters cleanly
- Supports 'kind', 'slurm', or 'all' (default) targets
- Verified: kind cluster creation + K8s Job submit + PD pair submit all work

- LocalScheduler runs profiling via subprocess on this machine
- --local-gpus sets CUDA_VISIBLE_DEVICES (e.g. '0' or '0,1')
- --local-workdir for a custom working directory
- No cluster config needed; replaces manual 'python scripts/run_stage_profile.py'
- Supports --pd for local PD disaggregation testing
- Skips cluster connection validation for the local scheduler
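
A sketch of how --local-gpus and --local-workdir can shape a local run: copy the current environment, set CUDA_VISIBLE_DEVICES, and launch the command as a subprocess in the chosen directory. run_local() is a hypothetical helper, not LocalScheduler's actual API.

```python
# Sketch: run a command locally with a restricted GPU set and working directory.
from __future__ import annotations

import os
import subprocess


def run_local(cmd: list[str], gpus: str | None = None,
              workdir: str | None = None) -> subprocess.CompletedProcess:
    env = dict(os.environ)                  # inherit the caller's environment
    if gpus is not None:
        env["CUDA_VISIBLE_DEVICES"] = gpus  # e.g. "0" or "0,1"
    return subprocess.run(cmd, cwd=workdir, env=env,
                          capture_output=True, text=True, check=False)
```

Copying os.environ rather than building a fresh dict keeps PATH, CUDA library paths, and similar settings intact for the child process.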

Tests cover:
- ProfileJobSpec: job name, server opts, disagg params, as_prefill/decode
- K8sScheduler.render: YAML validity, namespace, GPU resources, PVC, hostPath, nodeSelector, serviceAccount, labels, PD pair
- SlurmScheduler.render: shebang, sbatch directives, docker/enroot/bare, modules, extra sbatch, constraint, time parsing
- LocalScheduler.render: GPU selection, workdir, env vars
- CLI init: help, required args, bad kubeconfig, save/load config, overwrite protection, --force
- CLI submit: help, dry-run for local/k8s/slurm, PD pair, nixl backend
- Config: save/load yaml, jwt_token static/cmd/bad_cmd, cfg_get

All tests run inside the FlowSim Docker container.

…8s submit without PVC
- log_dir is now derived as {output_dir}/logs/ (a single volume covers both)
- LocalScheduler.submit() tees stdout/stderr to log files in real time
- K8s submit refuses to run without --k8s-pvc or --k8s-host-output-dir (prevents data loss)
- Slurm output_dir defaults to ~/flowsim_traces (shared filesystem)
- Local output_dir defaults to {project}/stage_traces/
- Add flowsim status/logs subcommands (K8s via API, Slurm via slurmrestd, local via log files)
- Submit prints the result location + follow-up commands after every job
- Add integration tests for the local scheduler
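
One way to tee a child's stdout/stderr to both the console and {output_dir}/logs/ in real time is one reader thread per pipe; reading both pipes concurrently avoids a deadlock when one pipe's buffer fills. This is a sketch under that assumption; run_with_tee() and the <name>.{stdout,stderr}.log names are illustrative, chosen to resemble the log-file naming shown elsewhere in this PR.

```python
# Sketch: stream a child's stdout/stderr to the console and to
# {output_dir}/logs/<name>.{stdout,stderr}.log as lines arrive.
from __future__ import annotations

import os
import subprocess
import sys
import threading
from typing import IO


def _pump(src: IO[str], console: IO[str], log_path: str) -> None:
    with open(log_path, "w") as log:
        for line in src:       # yields lines as the child writes them
            console.write(line)
            log.write(line)
            log.flush()        # keep the file current for anyone tailing it


def run_with_tee(cmd: list[str], output_dir: str, name: str) -> int:
    log_dir = os.path.join(output_dir, "logs")  # log_dir derived from output_dir
    os.makedirs(log_dir, exist_ok=True)
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, text=True, bufsize=1)
    threads = [
        threading.Thread(target=_pump, args=(
            proc.stdout, sys.stdout, os.path.join(log_dir, f"{name}.stdout.log"))),
        threading.Thread(target=_pump, args=(
            proc.stderr, sys.stderr, os.path.join(log_dir, f"{name}.stderr.log"))),
    ]
    for t in threads:
        t.start()
    rc = proc.wait()
    for t in threads:
        t.join()               # drain any remaining buffered output
    proc.stdout.close()
    proc.stderr.close()
    return rc
```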

…of dumping content

- TestLocalScheduler: real TP=1 profiling; verify traces + logs + status/logs CLI
- TestK8sScheduler: dry-run YAML (PVC mount, hostPath, log paths), refuse without storage, real Job submit to the Kind cluster with status/logs verification
- TestSlurmScheduler: dry-run sbatch script (output_dir, log_dir, PD pair)

Results: 9 passed, 1 skipped (K8s real submit skipped in container, passes on host)

- Add JobResult dataclass: submit() now returns structured data (job_id, scheduler, state, output_dir, message) instead of a string
- Add flowsim cancel: K8s (delete_namespaced_job), Slurm (DELETE via slurmrestd), local (no-op for synchronous jobs)
- Add flowsim list: list FlowSim jobs with a --status filter; K8s (label_selector=app=flowsim), Slurm (slurmrestd /jobs), local (scan log files)
- Add --follow / -f to flowsim logs: shows tail -f / kubectl logs -f commands for real-time log streaming
- submit_pd_pair() now returns list[JobResult] instead of a string
- Post-submit output shows cancel/list/follow commands

Two-pass argparse: peek --scheduler with a minimal pre-parser, then add only the relevant scheduler's options before full parse. 'flowsim submit --scheduler local --help' no longer shows k8s/slurm args.
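
A sketch of the two-pass approach: a tolerant pre-parser built with add_help=False uses parse_known_args() to read only --scheduler, then the real parser registers just that backend's options. The per-backend flag table is an illustrative subset of flags named in this PR, not the full set, and build_parser() is a hypothetical name.

```python
# Sketch of a two-pass parse so `--help` only lists the chosen backend's options.
from __future__ import annotations

import argparse

# Illustrative subset of per-backend flags mentioned in this PR.
BACKEND_OPTS = {
    "local": ["--local-gpus", "--local-workdir"],
    "k8s": ["--k8s-namespace", "--k8s-kubeconfig", "--k8s-context"],
    "slurm": ["--slurm-partition", "--slurm-time"],
}


def build_parser(argv: list[str]) -> argparse.ArgumentParser:
    # Pass 1: add_help=False leaves -h/--help (and everything else) unparsed.
    pre = argparse.ArgumentParser(add_help=False)
    pre.add_argument("--scheduler", choices=sorted(BACKEND_OPTS))
    known, _unparsed = pre.parse_known_args(argv)

    # Pass 2: the full parser, with only the selected backend's options.
    parser = argparse.ArgumentParser(prog="flowsim submit")
    parser.add_argument("--scheduler", choices=sorted(BACKEND_OPTS), required=True)
    parser.add_argument("--dry-run", action="store_true")
    if known.scheduler:
        group = parser.add_argument_group(f"{known.scheduler} options")
        for flag in BACKEND_OPTS[known.scheduler]:
            group.add_argument(flag)
    return parser
```

Usage: `build_parser(sys.argv[1:]).parse_args(sys.argv[1:])`.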

Many systems (e.g. recent Ubuntu and Debian releases) don't provide a 'python' symlink by default.

Before: flowsim-perf-qwen3-8b_1773771736.stdout.log
After: flowsim-perf-qwen3-8b_20260317_184236.stdout.log

The list_jobs() regex was updated to support both the old epoch format and the new format.
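
A sketch of a filename regex that accepts both the old epoch suffix and the new YYYYMMDD_HHMMSS suffix. LOG_RE and parse_log_name() are hypothetical; the actual list_jobs() pattern may differ. Epoch values are converted with datetime.fromtimestamp(), which yields local time.

```python
# Sketch: parse log names with either an epoch or a YYYYMMDD_HHMMSS suffix.
import re
from datetime import datetime

LOG_RE = re.compile(
    r"^(?P<job>.+)_(?P<ts>\d{8}_\d{6}|\d{10})\.(?P<stream>stdout|stderr)\.log$"
)


def parse_log_name(filename: str):
    """Return (job_name, timestamp, stream), or None if the name doesn't match."""
    m = LOG_RE.match(filename)
    if m is None:
        return None
    ts = m.group("ts")
    if "_" in ts:
        when = datetime.strptime(ts, "%Y%m%d_%H%M%S")  # new format
    else:
        when = datetime.fromtimestamp(int(ts))         # legacy epoch seconds
    return m.group("job"), when, m.group("stream")
```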

- Add submit_via='cli' mode to SlurmScheduler, using sbatch/squeue/scancel subprocess calls instead of the slurmrestd REST API (which has JWT auth issues in Slurm 23.11 docker containers).
- Add a cli_prefix param for running commands via docker exec.
- Use scontrol show job for status (works without slurmdbd).
- Slurm compose: base image on flowsim-image:latest, compile Slurm 23.11 with NVML support, cgroup/v1, explicit GRES config.
- Slurm test passes in ~76s (same as the K8s test).
- K8s test uses a host mount for traces (no docker cp).
- All three backends (local, k8s, slurm) tested and working.
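
A sketch of the CLI-mode calls. Here cli_prefix is modeled as an argument list prepended to every command, for example a docker exec into a container named slurmctld (a hypothetical name). sbatch --parsable prints the job ID, optionally followed by ;cluster, and scontrol show job prints Key=Value pairs, so JobState can be extracted with a regex. Helper names are hypothetical; when a prefix is used, the script path must also be visible inside that container.

```python
# Sketch of Slurm CLI-mode helpers with an optional command prefix.
from __future__ import annotations

import re
import subprocess


def slurm_cmd(args: list[str], cli_prefix: list[str] | None = None) -> list[str]:
    # e.g. ["docker", "exec", "slurmctld", "sbatch", "--parsable", "job.sh"]
    return [*(cli_prefix or []), *args]


def submit(script_path: str, cli_prefix: list[str] | None = None) -> str:
    out = subprocess.run(
        slurm_cmd(["sbatch", "--parsable", script_path], cli_prefix),
        capture_output=True, text=True, check=True,
    ).stdout
    return out.strip().split(";")[0]  # "--parsable" prints "jobid[;cluster]"


def parse_job_state(scontrol_output: str) -> str | None:
    # `scontrol show job <id>` output contains a "JobState=<STATE>" field.
    m = re.search(r"\bJobState=(\S+)", scontrol_output)
    return m.group(1) if m else None
```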

… local/k8s)

Usage:

    flowsim submit --scheduler local --collect perf --model-path Qwen/Qwen3-8B \
        --sweep 1:2048:0 4:8192:0 16:2048:4096

Or from a file:

    flowsim submit --scheduler local --collect perf --model-path Qwen/Qwen3-8B \
        --sweep-file sweep_points.txt

Each point is a BS:INPUT_LEN:CTX tuple. One server launch profiles multiple points sequentially. Backwards compatible: without --sweep, --bs/--input-len/--existing-ctx still work as a single point.
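
A sketch of sweep-point parsing. The PR's helpers are parse_sweep_point() and load_sweep_file() in scripts/__init__.py; the functions below use different, hypothetical names because their validation rules and the file format (one point per line, '#' comments allowed) are assumptions.

```python
# Sketch of parsing BS:INPUT_LEN:CTX points; names and file format are assumptions.
from __future__ import annotations

from typing import NamedTuple


class SweepPoint(NamedTuple):
    bs: int
    input_len: int
    existing_ctx: int


def parse_point(text: str) -> SweepPoint:
    parts = text.strip().split(":")
    if len(parts) != 3:
        raise ValueError(f"expected BS:INPUT_LEN:CTX, got {text!r}")
    bs, input_len, ctx = (int(p) for p in parts)  # int() raises on non-numbers
    if bs <= 0 or input_len <= 0 or ctx < 0:
        raise ValueError(f"invalid sweep point {text!r}")
    return SweepPoint(bs, input_len, ctx)


def load_points(path: str) -> list[SweepPoint]:
    points = []
    with open(path) as f:
        for line in f:
            line = line.split("#", 1)[0].strip()  # drop comments and blank lines
            if line:
                points.append(parse_point(line))
    return points
```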

Two tests in TestLocalSweep:
- test_sweep_inline: --sweep 1:2048:0 1:4096:0 1:2048:2048
- test_sweep_file: same points read from a temp file

Also fix: use a single --sweep with multiple values (nargs=+) instead of repeated --sweep flags, for which argparse keeps only the last occurrence.
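
The argparse behavior behind that fix can be checked directly: with the default store action, a repeated --sweep replaces the earlier values, whereas one flag followed by several values keeps them all. action="extend" (Python 3.8+) is shown only as an alternative that also accepts repeated flags; this PR does not use it.

```python
# Demonstrates why repeated --sweep flags lose points with the "store" action.
import argparse


def make_parser(action: str = "store") -> argparse.ArgumentParser:
    p = argparse.ArgumentParser()
    p.add_argument("--sweep", nargs="+", action=action)
    return p


repeated = ["--sweep", "1:2048:0", "--sweep", "1:4096:0"]
single = ["--sweep", "1:2048:0", "1:4096:0"]
print(make_parser().parse_args(repeated).sweep)          # only the last flag survives
print(make_parser().parse_args(single).sweep)            # all values kept
print(make_parser("extend").parse_args(repeated).sweep)  # extend accumulates
```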

- Extract resolve_default() to config.py (was _d(), duplicated in submit/status)
- Extract parse_sweep_point()/load_sweep_file() to scripts/__init__.py
- K8s: submit() reuses _load_k8s() instead of duplicating kubeconfig logic
- K8s: remove unused kubernetes imports in status()/logs()
- Local: move inline imports (glob/re/shlex/threading) to module level
- Local: remove dead if-branch in list_jobs (always set Completed)
- Slurm: default submit_via='cli'; deprecate REST mode with DeprecationWarning
- Slurm: add TODO for _logs_cli (currently returns status info only)
- CLI: flowsim init slurm supports --submit-via/--cli-prefix; rest-url is optional
- Template: slurm.yaml updated for the CLI-first workflow
- run_stage_profile: fix _run_perf sentinel bs=0 -> Optional[int]

…faults)
- slurm.py: fix module docstring (no longer says 'posts to slurmrestd')
- local.py: remove unused stderr/stderr_size vars in list_jobs()
- k8s.py: extract _k8s_job_state() helper (was duplicated in status+list_jobs)
- README: update Slurm default to cli, mark REST as deprecated, fix init example
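
A sketch of collapsing a Job's status counters into a single state string, in the spirit of the _k8s_job_state() helper this commit extracts. The state names and their precedence are assumptions; the function reads the active/succeeded/failed counts exposed by the Kubernetes V1JobStatus object, and a production version would likely also inspect status.conditions for Complete/Failed.

```python
# Hypothetical mapping from a Job's status counters to one state string.
def job_state(status) -> str:
    # V1JobStatus.active/succeeded/failed are pod counts, or None when unset.
    if status is None:
        return "Pending"
    if (status.succeeded or 0) > 0:
        return "Succeeded"
    if (status.active or 0) > 0:
        return "Running"   # checked before failed: a pod may be retrying
    if (status.failed or 0) > 0:
        return "Failed"
    return "Pending"
```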

- Delete all slurmrestd REST methods (submit/cancel/status/logs/list)
- Remove ssl, urllib, json imports from slurm.py
- Remove REST constructor params (rest_url, jwt_token, api_version, verify_ssl, submit_via)
- Remove resolve_jwt_token() from config.py
- Remove REST CLI args from submit_profile.py, status_profile.py, cli.py
- Strip REST fields from the slurm.yaml template
- Remove JWT-related tests; update init/submit tests
- Rewrite schedulers/README.md entirely in English, with no REST references
- 56 unit tests pass, net -524 lines

…templates
- Move dev-setup.sh, dev-teardown.sh, slurm-compose.yaml, slurm-node.dockerfile, kind-multi-node.yaml, slurm.conf, cgroup.conf, gres.conf from dockerfiles/ to tests/integration/infra/
- Delete schedulers/templates/ (unused by code; flowsim init generates config directly from CLI args)
- Update all path references in README, config.py, test files, and shell script comments
- dockerfiles/ now contains only cuda12.6.dockerfile (app image)

- Remove disagg_mode, disagg_transfer_backend, disagg_bootstrap_port, disagg_prefill_pp, disagg_ib_device fields from ProfileJobSpec
- Remove as_prefill(), as_decode(), render_pd_pair(), submit_pd_pair()
- Remove --pd, --disagg-* CLI args from submit_profile.py
- Remove PD branch from main() submit/dry-run logic
- Remove 8 PD-related unit tests
- Remove PD Disaggregation section from README
- 48 unit tests pass

- flowsim init k8s → writes commented k8s.yaml template to ~/.flowsim/
- flowsim init slurm → writes commented slurm.yaml template
- Users edit the file directly (comments explain each field)
- Removed ~60 lines of argparse init code
- Kept --force overwrite logic
- Updated README examples and tests (43 pass)

- flowsim init k8s --config my.yaml → installs user file to ~/.flowsim/
- flowsim init k8s → writes annotated template (unchanged)
- Added 2 tests: config copy + missing file error

- Move annotated templates to schedulers/templates/{k8s,slurm}.yaml
- flowsim init k8s → copies bundled template to ~/.flowsim/
- flowsim init k8s --config my.yaml → copies user file instead
- Remove inline template strings from cli.py
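
A sketch of the resulting flowsim init behavior: copy the bundled template (or a --config file) to ~/.flowsim/<scheduler>.yaml and refuse to overwrite without --force. install_config() and its parameters are hypothetical; directories are passed in explicitly so the logic can be exercised without touching the real home directory.

```python
# Sketch of `flowsim init` semantics; names and paths are assumptions.
from __future__ import annotations

import shutil
from pathlib import Path


def install_config(scheduler: str, template_dir: Path, dest_dir: Path,
                   user_config: Path | None = None, force: bool = False) -> Path:
    # --config wins over the bundled template
    src = user_config if user_config is not None else template_dir / f"{scheduler}.yaml"
    if not src.is_file():
        raise FileNotFoundError(f"config file not found: {src}")
    dest = dest_dir / f"{scheduler}.yaml"
    if dest.exists() and not force:
        raise FileExistsError(f"{dest} exists; re-run with --force to overwrite")
    dest_dir.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dest)
    return dest
```

Usage (hypothetical layout): `install_config("k8s", Path("schedulers/templates"), Path.home() / ".flowsim")`.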

- Root README: replace manual docker run profile/parse with flowsim submit
- Schedulers README: remove redundant How It Works, inline YAML examples, and scattered test sections
- Unify model/params across both READMEs (Qwen3-235B-A22B, tp=1, gpus=1, --load-format dummy)
- Add Scheduler Backends section to root README linking to schedulers/README.md

… decode-tokens to 2

…ilter
- Add timestamp suffix (-MMDD-HHMMSS) to auto-generated job names for uniqueness
- Add #SBATCH --exclusive to Slurm scripts for profiling GPU isolation
- Remove flowsim- prefix filter from Slurm list_jobs (let users filter)
- Add --sweep-file to scheduler README Common Parameters table
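
A sketch of generating a unique job name with a -MMDD-HHMMSS suffix. The flowsim-<collect>-<model> prefix is an assumption inferred from the example log names above, and the sanitizing step conservatively applies Kubernetes' RFC 1123 label rules (lowercase alphanumerics and '-', at most 63 characters). make_job_name() is hypothetical; the real implementation may differ.

```python
# Sketch of an auto-generated, timestamped, RFC 1123-safe job name.
from __future__ import annotations

import re
from datetime import datetime


def make_job_name(collect: str, model_path: str, now: datetime | None = None) -> str:
    if now is None:
        now = datetime.now()
    model = model_path.rstrip("/").rsplit("/", 1)[-1]    # e.g. "Qwen3-8B"
    base = f"flowsim-{collect}-{model}".lower()
    base = re.sub(r"[^a-z0-9-]+", "-", base).strip("-")  # keep [a-z0-9-] only
    suffix = now.strftime("-%m%d-%H%M%S")                # 12 characters
    return base[: 63 - len(suffix)].rstrip("-") + suffix
```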

- cli.py → cli/__init__.py (entry point + init command)
- submit_profile.py → cli/submit.py (flowsim submit)
- status_profile.py → cli/manage.py (flowsim status/logs/list/cancel)
- Update all import paths in tests

Replace deploy.resources.reservations with runtime:nvidia + NVIDIA_VISIBLE_DEVICES to fix NVML initialization failure in slurmd-0.

… header
- Remove misleading 'local' suffix (file tests all 3 backends)
- Add test methodology (How It Works) and Pass Criteria to docstring
- Update file references in schedulers/README.md

Sync the output tree in both README.md and schedulers/README.md to reflect the actual directory layout produced by profiling jobs:
- Add logs/ with server, shape_server, and job log entries
- Add merged/ and shape_traces/ + shape_parsed/ inside point dirs
- Add brief descriptions of each subdirectory in root README

Copilot

Pull request overview

This PR introduces a new schedulers/ package (local/K8s/Slurm backends) and a unified flowsim CLI to submit and manage stage-profiling jobs across environments, including sweep (multi-point) profiling support and associated unit/integration test infrastructure.

Changes:

Added scheduler backends (local, k8s, slurm) built around a shared ProfileJobSpec + BaseScheduler API.
Added unified flowsim CLI (init, submit, status/logs/list/cancel) with YAML-based config templates under ~/.flowsim/.
Added unit tests plus integration tests and provisioning scripts for Kind and a docker-compose Slurm test cluster; enhanced stage profiling script with --sweep / --sweep-file.

Reviewed changes

Copilot reviewed 27 out of 28 changed files in this pull request and generated 15 comments.

| File | Description |
| --- | --- |
| `tests/unit/test_scheduler_cli.py` | Unit coverage for CLI parsing, config install, and backend renderers. |
| `tests/integration/test_scheduler.py` | End-to-end integration tests for local/k8s/slurm and sweep output validation. |
| `tests/integration/infra/slurm.conf` | Slurm test cluster configuration used by docker-compose infra. |
| `tests/integration/infra/slurm-node.dockerfile` | Container image for Slurm test cluster nodes (built atop flowsim image). |
| `tests/integration/infra/slurm-compose.yaml` | Docker compose topology for the local Slurm integration cluster. |
| `tests/integration/infra/kind-multi-node.yaml` | Kind cluster definition for K8s integration testing with GPU passthrough. |
| `tests/integration/infra/gres.conf` | Slurm GRES GPU definition for the test cluster. |
| `tests/integration/infra/cgroup.conf` | Slurm cgroup plugin config for containerized environments. |
| `tests/integration/infra/dev-setup.sh` | One-shot setup for Kind + Slurm test environments. |
| `tests/integration/infra/dev-teardown.sh` | Teardown script for test clusters. |
| `simulator/base_parser.py` | Small refactor in annotation parsing initialization. |
| `scripts/run_stage_profile.py` | Adds sweep support and adjusts defaults/log-dir handling for profiling runs. |
| `scripts/cli/submit.py` | Adds `flowsim submit` implementation (render/dry-run/submit) and config-based defaults. |
| `scripts/cli/manage.py` | Adds job lifecycle management commands (status/logs/list/cancel) across schedulers. |
| `scripts/cli/__init__.py` | Adds unified `flowsim` entry point and `init` config installer. |
| `scripts/__init__.py` | Adds shared sweep parsing utilities used by CLI + profiling script. |
| `schedulers/templates/slurm.yaml` | Slurm config template installed via `flowsim init slurm`. |
| `schedulers/templates/k8s.yaml` | K8s config template installed via `flowsim init k8s`. |
| `schedulers/slurm.py` | Slurm backend: sbatch script rendering + CLI-mode submission/status/logs/list/cancel. |
| `schedulers/local.py` | Local backend: `docker run` rendering + synchronous execution with log capture. |
| `schedulers/k8s.py` | K8s backend: Job manifest rendering and Python-client submission/status/logs/list/cancel. |
| `schedulers/config.py` | Loads per-scheduler YAML configs with env-var override support. |
| `schedulers/base.py` | Defines common dataclasses/interfaces and command rendering for schedulers. |
| `schedulers/__init__.py` | Public exports for scheduler package. |
| `schedulers/README.md` | User docs for scheduler usage, config, parameters, and output layout. |
| `pyproject.toml` | Packages CLI entry point and declares dependencies/extras. |
| `README.md` | Updates top-level documentation to use `flowsim` CLI and documents stage profiling/schedulers. |
| `.gitignore` | Ignores generated stage traces directory. |

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Replace absolute /home/administrator/… bind mounts with:
- ${HOST_WORKSPACE} env var for the read-only /workspace mount
- Relative path ../../../stage_traces for the writable traces mount

dev-setup.sh now exports HOST_WORKSPACE (defaults to the parent of the repo root) before invoking docker compose.

Without mounting spec.output_dir into the container, traces and logs are written to the ephemeral container filesystem and lost on exit.
- Docker mode: prepend -v output_dir:output_dir to the mount list.
- Enroot mode: append output_dir:output_dir to --container-mounts.
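
A sketch of the two mount rules above. For Docker, the output directory is prepended as -v SRC:DST; for enroot, the same SRC:DST pair is appended to the comma-separated --container-mounts value (presumably the Pyxis srun option). Function names are hypothetical. Mounting the directory at the same path inside and outside the container keeps paths printed by the job valid on the host.

```python
# Sketch of adding output_dir to each runtime's mount list; names are hypothetical.
from __future__ import annotations


def docker_mount_args(output_dir: str, mounts: list[str]) -> list[str]:
    args: list[str] = []
    for m in [f"{output_dir}:{output_dir}", *mounts]:  # output_dir first ("prepend")
        args += ["-v", m]
    return args


def enroot_mounts_flag(output_dir: str, mounts: list[str]) -> str:
    all_mounts = [*mounts, f"{output_dir}:{output_dir}"]  # output_dir last ("append")
    return "--container-mounts=" + ",".join(all_mounts)
```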

Terrence Zhang added 30 commits March 17, 2026 03:46

fix: 'flowsim submit' submits by default, --dry-run to preview

d37d8f3

fix: validate cluster connection params before submit

af48f0c

fix: flowsim logs shows all log files (stdout + stderr) with listing

ea3c27a

fix: flowsim logs shows file locations + actionable commands instead …

eb46c36

…of dumping content

fix: CLI only shows scheduler-specific args based on --scheduler

8cd62f8

fix: use python3 instead of python in profile command

84c8953

fix: use YYYYMMDD_HHMMSS timestamp in log filenames

fab6314

slurm: use YYYYMMDD_HHMMSS timestamp for output dirs (consistent with…

3edd5f4

… local/k8s)

docs: add scheduler README with CLI usage and architecture overview

9bc2d94

Terrence Zhang added 17 commits March 18, 2026 23:48

add --config flag to flowsim init

95028db

update README: reflect template-file init with --config option

da8ab00

format: fix with black

b0dfdd5

docs: add --existing-ctx and --decode-tokens to all examples, default…

9e2541a

… decode-tokens to 2

refactor: remove PyYAML fallback, make it a core dependency

880fe05

fix: reject k8s submit when no PVC or hostPath configured

236548a

docs: add missing parameters

9daee82

refactor: restructure CLI into scripts/cli/ subpackage

2a718a7

fix: use runtime:nvidia for slurm compute node GPU access

31dc15b

refactor: rename test_scheduler_local.py → test_scheduler.py, rewrite…

b7ec2cb

… header - Remove misleading 'local' suffix (file tests all 3 backends) - Add test methodology (How It Works) and Pass Criteria to docstring - Update file references in schedulers/README.md

TerrenceZhangX marked this pull request as ready for review March 19, 2026 23:57

TerrenceZhangX requested a review from Copilot March 19, 2026 23:57

TerrenceZhangX self-assigned this Mar 19, 2026

Copilot started reviewing on behalf of TerrenceZhangX March 19, 2026 23:58

Copilot AI reviewed Mar 20, 2026

TerrenceZhangX and others added 7 commits March 19, 2026 19:09

Update scripts/cli/submit.py

86dd517

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update schedulers/local.py

2dbb896

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update tests/integration/infra/kind-multi-node.yaml

f30329a

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update tests/integration/infra/slurm-compose.yaml

4718764

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update tests/integration/infra/slurm.conf

8f79054

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

TerrenceZhangX merged commit f1859ce into microsoft:main Mar 20, 2026
3 checks passed

TerrenceZhangX merged 56 commits into microsoft:main from TerrenceZhangX:zhangt/scheduler-support