diff --git a/.gitignore b/.gitignore index 706276b..b70854b 100644 --- a/.gitignore +++ b/.gitignore @@ -6,4 +6,5 @@ tests/test-artifacts/ unknown_kernels.json /artifacts /server_profile -/server_simulate \ No newline at end of file +/server_simulate +/stage_traces/ \ No newline at end of file diff --git a/README.md b/README.md index c4a674e..6fa4289 100644 --- a/README.md +++ b/README.md @@ -20,6 +20,8 @@ The project supports rapid deployment using Docker, includes scripts for environ ## Table of Contents - [Getting Started](#getting-started) +- [Stage Profiling](#stage-profiling) +- [Scheduler Backends](#scheduler-backends) - [For Developers](#for-developers) - [Risks and limitations](#risks-and-limitations) - [License](#license) @@ -49,228 +51,130 @@ make build-docker This creates a local image named `flowsim-image` with FlowSim patches already applied to sglang. -### 2. Run Profile → Parse → Simulate +### 2. Profile (Generate Traces) -Create workspace directories on your host for storing traces and results: +Use `flowsim submit` to capture stage-separated traces (EXTEND + DECODE), parse them, and run cross-rank analysis — all in one step. See [Stage Profiling](#stage-profiling) for how stages and collection modes work. 
```bash -mkdir -p /data/flowsim-profile -mkdir -p /data/flowsim-simulate -``` - -#### Step 1: Profile (Generate Traces) - -```bash -sudo docker run --gpus=all \ - -v /data/flowsim-profile:/workspace/profile \ - -v /data/flowsim-simulate:/workspace/simulate \ - -w /flowsim \ - --cap-add=SYS_ADMIN \ - --network=host \ - --shm-size 911G \ - flowsim-image \ - python scripts/run_profile.py \ - --profile-dir /workspace/profile \ - --log-dir /workspace/profile/logs \ - --bench-timeout 3600 \ - --server-opts "--model-path /flowsim/workload/models/configs/deepseek/ --load-format dummy --tp 4 --ep 4 --host 0.0.0.0 --port 30001 --attention-backend flashinfer --disable-cuda-graph" \ - --bench-opts "--backend sglang --host 0.0.0.0 --port 30001 --dataset-name defined-len --prefill-decode-lens 1024:8 --num-prompts 1 --profile" -``` - -**What this does:** -- Starts an sglang server with profiling enabled -- Runs benchmark requests against it -- Generates `*.trace.json.gz` files in `/data/flowsim-profile` (mounted as `/workspace/profile`) - -**Note:** The first run will be slow (~10 minutes) due to DeepGEMM kernel warmup and compilation. For stable performance, avoid using `--rm` flag and reuse the same container using `sudo docker exec -it bash`. Subsequent runs with similar configurations will be faster. - -**Tip:** -- Adjust `--server-opts` and `--bench-opts` to match your model, parallelism (TP/DP/EP), and workload requirements. All `sglang.launch_server` and `bench_serving.py` parameters are supported. -- Trace files can be visualized using [Perfetto UI](https://ui.perfetto.dev/) by uploading the `.trace.json.gz` files directly. -- For multi-GPU profiling (TP > 1), merge individual traces into a single file for a global view: - ```bash - python /flowsim/utils/merge_trace.py \ - --trace_dir /data/flowsim-profile \ - --output /data/flowsim-profile/merged_trace.json - ``` - Then visualize the merged trace at [Perfetto UI](https://ui.perfetto.dev/). 
- -#### Step 2: Parse (Convert Trace to CSV) - -```bash -sudo docker run --rm \ - -v /data/flowsim-profile:/workspace/profile \ - -v /data/flowsim-simulate:/workspace/simulate \ - -w /flowsim \ - flowsim-image \ - python -m scripts.run_parse \ - --trace-file /workspace/profile/your-trace-name-TP-0.trace.json.gz \ - --output-dir /workspace/simulate +pip install -e . +flowsim submit --scheduler local \ + --collect all \ + --model-path workload/models/configs/Qwen3-235B-A22B \ + --tp 1 --bs 1 --input-len 2048 --existing-ctx 0 --decode-tokens 2 --gpus 1 \ + --extra-server-opts "--load-format dummy" ``` -Replace `your-trace-name-TP-0.trace.json.gz` with the actual filename from step 1. - -**What this does:** -- Parses the trace file -- Extracts kernel-level information (operator, shapes, dtypes) -- Generates a CSV file and JSON summary in `/data/flowsim-simulate` (mounted as `/workspace/simulate`) - -**Fallback:** If you don't have a GPU or can't run profiling, use the demo trace shipped with the repo: +For K8s / Slurm clusters, see [Scheduler Backends](#scheduler-backends). +**Tip:** Trace files can be visualized at [Perfetto UI](https://ui.perfetto.dev/). For multi-GPU traces, merge them first: ```bash -sudo docker run --rm \ - -v /data/flowsim-simulate:/workspace/simulate \ - -w /flowsim \ - flowsim-image \ - python -m scripts.run_parse \ - --trace-file /flowsim/demo/deepseekv3-TP-0.trace.json.gz \ - --output-dir /workspace/simulate +python utils/merge_trace.py --trace_dir stage_traces/local/*/bs1_input2048_ctx0 --output merged.json ``` -#### Step 3: Simulate (Run Hardware Simulation) +### 3. Simulate (Run Hardware Simulation) -This step requires a running LLMCompass backend. 
First, build the backend image: +Build and start the LLMCompass backend, then submit parsed traces for kernel-level simulation: ```bash +# Build backend image sudo docker build -t llmcompass-backend -f backend/LLMCompass/Dockerfile backend/LLMCompass/ -``` -Then start the backend: - -```bash -# Terminal 1: Start LLMCompass backend +# Terminal 1: Start backend sudo docker run --rm -p 8000:8000 llmcompass-backend -``` -Then in another terminal, run the simulation: - -```bash # Terminal 2: Run simulation -sudo docker run --rm \ - --network=host \ - -v /data/flowsim-profile:/workspace/profile \ - -v /data/flowsim-simulate:/workspace/simulate \ +sudo docker run --rm --network=host \ + -v /data/flowsim:/workspace \ flowsim-image \ python -m scripts.run_simulate \ - --trace-file /workspace/profile/your-trace-name-TP-0.trace.json.gz \ + --trace-file /workspace/traces/bs1_input2048_ctx0/*-TP-0-EXTEND.trace.json.gz \ --api-url http://127.0.0.1:8000 \ --artifact-dir /workspace/simulate/llmcompass ``` -**What this does:** -- Parses the trace into kernels -- Submits each kernel to the LLMCompass backend `/tasks` API -- Polls until all tasks complete -- Writes request/response artifacts to `/workspace/simulate/llmcompass` - -### 3. Inspect Results - -All generated files are available on your host at `/data/`: +### 4. Inspect Results ```bash -ls -lh /data/flowsim-profile/ # Raw trace files -ls -lh /data/flowsim-simulate/ # Parsed CSV, summary, simulation artifacts +ls -lh /data/flowsim/traces/ # Stage-separated traces + parsed CSVs +ls -lh /data/flowsim/simulate/ # Simulation artifacts ``` --- -## Stage Profiling (`run_stage_profile.py`) +## Stage Profiling -`scripts/run_stage_profile.py` is the single entry-point for **stage-separated** profiling: it captures prefill (EXTEND) and decode traces independently, parses them, runs cross-rank kernel analysis, and optionally collects kernel input shapes. 
+FlowSim performs **stage-separated** profiling: it captures prefill (EXTEND) and decode traces independently, parses them, runs cross-rank kernel analysis, and optionally collects kernel input shapes. -### Quick reference +### How stages work Each profiling request produces **two** stage-separated traces: - **EXTEND** (prefill) — processes `input_len` new tokens (with optional `existing_ctx` tokens already in KV cache) -- **DECODE** — profiler captures `decode-tokens` decode batch steps - -The profiler captures exactly **one** EXTEND batch and **decode-tokens** DECODE batches per run. +- **DECODE** — captures `decode-tokens` decode batch steps (default 2) -| Flag | Description | Default | -|---|---|---| -| `--input-len` | Number of new prefill tokens per request (EXTEND) | 2048 | -| `--existing-ctx` | Tokens already in KV cache from a prior request (0 = cold prefill) | 0 | -| `--bs` | Batch size (concurrent requests) | 1 | -| `--decode-tokens` | Number of decode tokens to generate (= number of decode batches profiled) | 32 | +### Collection modes | Mode | What it does | |---|---| -| `--collect perf` | Profile a single (bs, input_len, existing_ctx) point → trace (EXTEND + DECODE) → parse → cross-rank analysis | -| `--collect shapes` | Re-run **without CUDA graph** to capture kernel input shapes, then merge into timing CSVs (both EXTEND and DECODE) | -| `--collect all` | Both phases back-to-back (auto-restarts the server in between). Requires `--launch-server`. | - -`--collect` is required. Use `perf`, `shapes`, or `all`. 
+| `--collect perf` | Profile a single (bs, input_len, existing_ctx) point → trace → parse → cross-rank analysis | +| `--collect shapes` | Re-run **without CUDA graph** to capture kernel input shapes, then merge into timing CSVs | +| `--collect all` | Both phases back-to-back (auto-restarts the server in between) | ### Examples -**Cold prefill** (server already running): - ```bash -python3 scripts/run_stage_profile.py \ +# Basic profiling +flowsim submit --scheduler local \ --collect perf \ - --bs 1 --input-len 2048 --decode-tokens 32 \ - --output-dir /workspace/traces \ - --host 0.0.0.0 --port 30001 -``` + --model-path workload/models/configs/Qwen3-235B-A22B \ + --tp 1 --bs 1 --input-len 2048 --existing-ctx 0 --decode-tokens 2 --gpus 1 \ + --extra-server-opts "--load-format dummy" -**With existing KV cache context:** - -```bash -python3 scripts/run_stage_profile.py \ +# With existing KV cache context +flowsim submit --scheduler local \ --collect perf \ - --bs 4 --input-len 512 --existing-ctx 4096 --decode-tokens 32 \ - --output-dir /workspace/traces \ - --launch-server \ - --server-opts "--model-path Qwen/Qwen3-235B-A22B-FP8 --tp 4 --host 0.0.0.0 --port 30001" -``` + --model-path workload/models/configs/Qwen3-235B-A22B \ + --tp 1 --bs 4 --input-len 512 --existing-ctx 4096 --decode-tokens 2 --gpus 1 \ + --extra-server-opts "--load-format dummy" -**Collect shapes only** (requires a no-CUDA-graph server): - -```bash -python3 scripts/run_stage_profile.py \ - --collect shapes \ - --output-dir /workspace/sweep_P1_tp4 \ - --launch-server \ - --server-opts "--model-path Qwen/Qwen3-235B-A22B-FP8 --tp 4 --host 0.0.0.0 --port 30001" -``` - -When `--collect shapes` is used with `--launch-server`, the server is automatically started with `--disable-cuda-graph --disable-cuda-graph-padding`. 
- -**Full pipeline** (perf → auto-restart → shapes → merge): +# Full pipeline (perf + shapes) +flowsim submit --scheduler local \ + --collect all \ + --model-path workload/models/configs/Qwen3-235B-A22B \ + --tp 1 --bs 1 --input-len 2048 --existing-ctx 0 --decode-tokens 2 --gpus 1 \ + --extra-server-opts "--load-format dummy" -```bash -python3 scripts/run_stage_profile.py \ +# Multi-point sweep +flowsim submit --scheduler local \ --collect all \ - --output-dir /workspace/sweep_P1_tp4 \ - --launch-server \ - --server-opts "--model-path Qwen/Qwen3-235B-A22B-FP8 --tp 4 --host 0.0.0.0 --port 30001" + --model-path workload/models/configs/Qwen3-235B-A22B \ + --sweep 1:2048:0 4:2048:0 8:2048:0 --decode-tokens 2 --gpus 1 \ + --extra-server-opts "--load-format dummy" ``` +For K8s / Slurm clusters, replace `--scheduler local` with `k8s` or `slurm`. See [schedulers/README.md](schedulers/README.md) for full scheduler documentation. ### Output structure ``` -sweep_P1_tp4/ -├── sweep_summary.json +stage_traces/{scheduler}/{YYYYMMDD_HHMMSS}/ ├── bs1_input2048_ctx0/ -│ ├── *-TP-*-EXTEND.trace.json.gz -│ ├── *-TP-*-DECODE.trace.json.gz -│ ├── parsed/ -│ │ ├── TP-0-EXTEND.csv -│ │ ├── TP-0-DECODE.csv -│ │ └── ... +│ ├── *.trace.json.gz +│ ├── parsed/*.csv +│ ├── merged/*_merged.trace.csv +│ ├── shape_traces/ + shape_parsed/ │ ├── analysis_extend.json │ └── analysis_decode.json -└── ... +├── logs/ +│ ├── server_*.{stdout,stderr}.log +│ ├── shape_server_*.{stdout,stderr}.log +│ └── {job_name}_*.{stdout,stderr}.log +└── sweep_summary.json ``` -After `--collect shapes`, each `parsed/TP-*-DECODE.csv` gains a `Dims` column with kernel tensor shapes. - -### Helper scripts - -| Script | Purpose | -|---|---| -| `tests/integration/test_stage_profile_configs.py` | Integration tests for `--collect {perf,shapes,all}` across parallelism configs. Run with `pytest` inside Docker. Filter with `RUN_CONFIGS=P1`. | +- `parsed/`: Per-rank timing CSVs extracted from traces. 
+- `merged/`: Timing + shape columns joined into a single CSV per rank/stage. +- `shape_traces/` / `shape_parsed/`: Raw and parsed shape-profiling traces (generated by `--collect shapes` or `--collect all`). +- `logs/`: Server, shape-server, and job stdout/stderr logs. ### Utilities (`utils/`) @@ -278,11 +182,16 @@ After `--collect shapes`, each `parsed/TP-*-DECODE.csv` gains a `Dims` column wi |---|---| | `utils/cross_rank_agg.py` | Cross-rank kernel aggregation (symmetric collectives → min, asymmetric → max, compute → mean) | | `utils/shape_merge.py` | Merge kernel shape data into timing CSVs | -| `utils/net.py` | Shared networking helpers (`wait_for_port`) | | `utils/merge_trace.py` | Merge multi-rank traces into a single Perfetto-compatible file | --- +## Scheduler Backends + +For submitting profiling jobs to **local Docker**, **Kubernetes**, or **Slurm** clusters, use the `flowsim` CLI. See [schedulers/README.md](schedulers/README.md) for full documentation including per-scheduler parameters, configuration, and environment variables. 
+ +--- + ## For Developers ### Customizing Profiling Workloads diff --git a/pyproject.toml b/pyproject.toml index 0b237ec..c91de8a 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -1,3 +1,74 @@ +[build-system] +requires = ["setuptools>=68.0"] +build-backend = "setuptools.build_meta" + +[project] +name = "flowsim" +version = "0.1.0" +description = "Workload simulation pipeline for kernel-level inference profiling" +readme = "README.md" +license = {text = "MIT"} +requires-python = ">=3.10" +dependencies = [ + "requests>=2.28", + "perfetto>=0.7", + "numpy>=1.24", + "pandas>=1.5", + "PyYAML>=6.0", +] + +[project.optional-dependencies] +# Scheduler backends ------------------------------------------------------- +k8s = [ + "kubernetes>=27.0", # K8s Python client for remote job submission +] +slurm = [] # Slurm REST API uses stdlib urllib only + +# Full simulation stack (matches Dockerfile) -------------------------------- +sim = [ + "scalesim>=2.0", + "scipy>=1.10", + "torch>=2.0", +] + +# Visualization ------------------------------------------------------------- +viz = [ + "matplotlib>=3.7", + "seaborn>=0.12", +] + +# Backend API --------------------------------------------------------------- +api = [ + "fastapi>=0.100", + "pydantic>=2.0", + "uvicorn>=0.23", +] + +# Development --------------------------------------------------------------- +dev = [ + "black>=23.0", + "pytest>=7.0", +] + +# Everything ---------------------------------------------------------------- +all = [ + "flowsim[k8s,sim,viz,api,dev]", +] + +[tool.setuptools.packages.find] +include = [ + "schedulers*", + "scripts*", + "simulator*", + "utils*", +] + +[project.scripts] +flowsim = "scripts.cli:main" + [tool.black] line-length = 80 include = '\.pyi?$' + +[tool.pytest.ini_options] +testpaths = ["tests"] diff --git a/schedulers/README.md b/schedulers/README.md new file mode 100644 index 0000000..3994cb6 --- /dev/null +++ b/schedulers/README.md @@ -0,0 +1,242 @@ +# FlowSim Schedulers + +FlowSim 
supports three scheduler backends for submitting GPU profiling jobs:
+
+| Backend | Use Case | Runs On | Dependencies |
+|---------|----------|---------|--------------|
+| **local** | Single-machine dev/testing | Host Docker container | Docker + NVIDIA GPU |
+| **k8s** | Kubernetes cluster | K8s Job Pod | `kubernetes` Python package |
+| **slurm** | HPC cluster | Slurm compute node | Slurm CLI (`sbatch`/`squeue`/`scancel`) |
+
+## Quick Start
+
+```bash
+pip install -e .
+flowsim --help
+```
+
+## Common Workflow
+
+```bash
+# Submit a job (same interface for all backends)
+flowsim submit --scheduler <local|k8s|slurm> \
+  --collect <perf|shapes|all> \
+  --model-path <model-path> \
+  --tp 1 --bs 1 --input-len 2048 --decode-tokens 2 --gpus 1
+
+# Job lifecycle
+flowsim list --scheduler <scheduler>
+flowsim status --scheduler <scheduler> --job <job-id>
+flowsim logs --scheduler <scheduler> --job <job-id>
+flowsim cancel --scheduler <scheduler> --job <job-id>
+
+# Preview without submitting
+flowsim submit --scheduler <scheduler> ... --dry-run
+
+# Multi-point sweep
+flowsim submit --scheduler <scheduler> \
+  --collect all --model-path workload/models/configs/Qwen3-235B-A22B \
+  --sweep 1:2048:0 4:2048:0 8:2048:0 --gpus 1
+```
+
+### Common Parameters
+
+| Parameter | Description | Default |
+|-----------|-------------|---------|
+| `--collect` | `perf` / `shapes` / `all` | required |
+| `--model-path` | HuggingFace model path | required |
+| `--tp` | Tensor parallelism | `1` |
+| `--dp` | Data parallelism | `1` |
+| `--bs` | Batch size | `1` |
+| `--input-len` | Input sequence length | `2048` |
+| `--existing-ctx` | Existing KV cache length | `0` |
+| `--decode-tokens` | Decode batches to profile | `2` |
+| `--gpus` | GPU count | `1` |
+| `--image` | Docker image | `flowsim-image:latest` |
+| `--output-dir` | Output directory | `stage_traces/{scheduler}/{timestamp}/` |
+| `--extra-server-opts` | Extra sglang server flags (quoted string) | `""` |
+| `--sweep` | Multi-point sweep `BS:INPUT_LEN:CTX` (repeatable) | empty |
+| `--sweep-file` | File with one `BS:INPUT_LEN:CTX` per line (mutually exclusive with 
`--sweep`) | none | +| `--job-name` | Custom job name | auto-generated | +| `--dry-run` | Print script only | `false` | + +--- + +## 1. Local Scheduler + +Runs profiling via `docker run` on the host machine. + +```bash +flowsim submit --scheduler local \ + --collect all \ + --model-path workload/models/configs/Qwen3-235B-A22B \ + --tp 1 --bs 1 --input-len 2048 --existing-ctx 0 --decode-tokens 2 --gpus 1 \ + --local-gpus 0 \ + --extra-server-opts "--load-format dummy" +``` + +| Parameter | Description | Default | +|-----------|-------------|---------| +| `--local-gpus` | `CUDA_VISIBLE_DEVICES` (e.g. `0` or `0,1`) | all GPUs | +| `--local-workdir` | Host working directory | FlowSim project root | + +--- + +## 2. Kubernetes Scheduler + +Submits profiling jobs as Kubernetes Jobs. Supports PVC and hostPath storage. + +### Setup + +```bash +flowsim init k8s # install bundled template +flowsim init k8s --config my-cluster.yaml # or use your own +# Edit ~/.flowsim/k8s.yaml +``` + +### Usage + +```bash +flowsim submit --scheduler k8s \ + --collect all \ + --model-path workload/models/configs/Qwen3-235B-A22B \ + --tp 1 --bs 1 --input-len 2048 --existing-ctx 0 --decode-tokens 2 --gpus 1 \ + --extra-server-opts "--load-format dummy" +``` + +### Parameters + +| Parameter | Description | Default | +|-----------|-------------|---------| +| `--k8s-namespace` | K8s namespace | `default` | +| `--k8s-kubeconfig` | kubeconfig path | `~/.kube/config` | +| `--k8s-context` | kubeconfig context | current context | +| `--k8s-pvc` | PVC name for traces | empty | +| `--k8s-host-output-dir` | hostPath (when no PVC) | empty | +| `--k8s-node-selector` | Node selector `KEY=VALUE` (repeatable) | empty | +| `--k8s-service-account` | ServiceAccount | empty | +| `--k8s-shm-size` | Shared memory size | `16Gi` | +| `--k8s-runtime-class` | RuntimeClass (e.g. `nvidia`) | empty | + +--- + +## 3. Slurm Scheduler + +Generates sbatch scripts and submits via `sbatch`/`squeue`/`scancel`. 
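+
+For illustration, a rendered script has roughly this shape — the directive values and paths below are assumptions (actual values come from your config file, CLI flags, and the auto-generated job name):
+
+```bash
+#!/bin/bash
+#SBATCH --job-name=flowsim-perf-qwen3-235b-a22b-bs1-il2048-0101-000000
+#SBATCH --partition=gpu
+#SBATCH --time=02:00:00
+#SBATCH --gres=gpu:1
+
+# Same stage-profiling entry point used by every backend
+python3 scripts/run_stage_profile.py --collect perf --launch-server \
+    --server-opts '--model-path workload/models/configs/Qwen3-235B-A22B --tp 1 --host 0.0.0.0 --port 30001' \
+    --decode-tokens 2 --bs 1 --input-len 2048 --existing-ctx 0 \
+    --output-dir stage_traces/slurm/20250101_000000
+```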
+ +### Setup + +```bash +flowsim init slurm # install bundled template +flowsim init slurm --config my-slurm.yaml # or use your own +# Edit ~/.flowsim/slurm.yaml +``` + +### Usage + +```bash +flowsim submit --scheduler slurm \ + --collect all \ + --model-path workload/models/configs/Qwen3-235B-A22B \ + --tp 1 --bs 1 --input-len 2048 --existing-ctx 0 --decode-tokens 2 --gpus 1 \ + --slurm-partition gpu \ + --extra-server-opts "--load-format dummy" +``` + +For remote clusters, use `--slurm-cli-prefix`: +```bash +flowsim submit --scheduler slurm ... \ + --slurm-cli-prefix "docker exec -i slurmctld" +# or: --slurm-cli-prefix "ssh login-node" +``` + +### Parameters + +| Parameter | Description | Default | +|-----------|-------------|---------| +| `--slurm-partition` | Slurm partition | empty | +| `--slurm-time` | Job time limit | `02:00:00` | +| `--slurm-account` | Billing account | empty | +| `--slurm-constraint` | Node constraint | empty | +| `--slurm-cli-prefix` | Shell prefix for remote CLI | empty | +| `--slurm-container-runtime` | `docker` / `enroot` / `none` | `none` | +| `--slurm-container-mounts` | Container mounts | empty | +| `--slurm-module` | `module load` commands (repeatable) | empty | +| `--slurm-extra-sbatch` | Extra `#SBATCH` directives (repeatable) | empty | + +--- + +## Configuration + +Config files live in `~/.flowsim/` and are installed via `flowsim init`. +Templates with comments are in `schedulers/templates/`. 
+ +``` +~/.flowsim/ +├── k8s.yaml +└── slurm.yaml +``` + +**Priority** (highest to lowest): +CLI flag → environment variable → config file → built-in default + +### Environment Variables + +| Variable | Overrides | Example | +|----------|-----------|--------| +| `KUBECONFIG` | `--k8s-kubeconfig` | `/home/user/.kube/config` | +| `FLOWSIM_K8S_NAMESPACE` | `--k8s-namespace` | `ml-team` | +| `FLOWSIM_K8S_CONTEXT` | `--k8s-context` | `kind-flowsim` | +| `FLOWSIM_K8S_CONFIG` | Config file path | `/etc/flowsim/k8s.yaml` | +| `FLOWSIM_SLURM_PARTITION` | `--slurm-partition` | `gpu-h100` | +| `FLOWSIM_SLURM_TIME` | `--slurm-time` | `04:00:00` | +| `FLOWSIM_SLURM_CONFIG` | Config file path | `/etc/flowsim/slurm.yaml` | + +--- + +## Output Structure + +``` +stage_traces/{scheduler}/{YYYYMMDD_HHMMSS}/ +├── bs1_input2048_ctx0/ +│ ├── *.trace.json.gz +│ ├── parsed/*.csv +│ ├── merged/*_merged.trace.csv +│ ├── shape_traces/ + shape_parsed/ +│ ├── analysis_extend.json +│ └── analysis_decode.json +├── logs/ +│ ├── server_*.{stdout,stderr}.log +│ ├── shape_server_*.{stdout,stderr}.log +│ └── {job_name}_*.{stdout,stderr}.log +└── sweep_summary.json +``` + +--- + +## Development + +### Test Clusters + +```bash +# Kind (K8s) — GPU passthrough via CDI +bash tests/integration/infra/dev-setup.sh kind +bash tests/integration/infra/dev-teardown.sh kind + +# Slurm — Docker Compose cluster +cd tests/integration/infra/ +docker compose -f slurm-compose.yaml up -d +docker compose -f slurm-compose.yaml down -v +``` + +### Running Tests + +```bash +# Unit tests +python -m pytest tests/unit/test_scheduler_cli.py -v + +# Integration tests +python -m pytest tests/integration/test_scheduler.py::TestK8sScheduler -v -x +python -m pytest tests/integration/test_scheduler.py::TestSlurmScheduler -v -x +``` + diff --git a/schedulers/__init__.py b/schedulers/__init__.py new file mode 100644 index 0000000..7e0df35 --- /dev/null +++ b/schedulers/__init__.py @@ -0,0 +1,15 @@ +"""Scheduler backends for submitting 
FlowSim profiling jobs.""" + +from schedulers.base import BaseScheduler, JobResult, ProfileJobSpec +from schedulers.k8s import K8sScheduler +from schedulers.local import LocalScheduler +from schedulers.slurm import SlurmScheduler + +__all__ = [ + "BaseScheduler", + "JobResult", + "K8sScheduler", + "LocalScheduler", + "ProfileJobSpec", + "SlurmScheduler", +] diff --git a/schedulers/base.py b/schedulers/base.py new file mode 100644 index 0000000..ac71548 --- /dev/null +++ b/schedulers/base.py @@ -0,0 +1,218 @@ +"""Abstract base class for FlowSim job schedulers.""" + +from __future__ import annotations + +import abc +import shlex +import time +from dataclasses import dataclass, field +from typing import Optional, Sequence + + +@dataclass +class JobResult: + """Structured return value from ``submit()``.""" + + job_id: str + scheduler: str # "local", "k8s", "slurm" + state: str # "Submitted", "Completed", "Failed" + output_dir: str = "" + message: str = "" + + +@dataclass +class ProfileJobSpec: + """All parameters needed to run a stage-profiling job. + + The scheduler backends render this into a K8s Job YAML or Slurm + sbatch script. 
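+
+    Example (values illustrative)::
+
+        spec = ProfileJobSpec(
+            collect="perf",
+            model_path="workload/models/configs/Qwen3-235B-A22B",
+            tp=1,
+            bs=1,
+            input_len=2048,
+        )
+        cmd = spec.build_shell_command()  # single quoted shell string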
+ """ + + # -- Profiling workload -- + collect: str # "perf", "shapes", or "all" + model_path: str + tp: int = 1 + dp: int = 1 + bs: int = 1 + input_len: int = 2048 + existing_ctx: int = 0 + decode_tokens: int = 32 + warmup_n: int = 5 + disable_chunked_prefill: bool = False + max_prefill_tokens: int = 131072 + + # -- Infrastructure -- + image: str = "flowsim-image:latest" + gpus: int = 1 # total GPU count (must be >= tp * dp) + host: str = "0.0.0.0" + port: int = 30001 + output_dir: str = "/flowsim/stage_traces" + job_name: str = "" + + # -- Sweep: explicit list of (bs, input_len, existing_ctx) tuples -- + sweep_points: list[tuple[int, int, int]] = field(default_factory=list) + + # -- Extra server opts (appended verbatim) -- + extra_server_opts: str = "" + + def build_server_opts(self) -> str: + """Build the ``--server-opts`` string for run_stage_profile.py.""" + parts = [ + f"--model-path {self.model_path}", + f"--tp {self.tp}", + f"--host {self.host}", + f"--port {self.port}", + ] + if self.dp > 1: + parts.append(f"--dp {self.dp}") + if self.extra_server_opts: + parts.append(self.extra_server_opts) + return " ".join(parts) + + @property + def log_dir(self) -> str: + """Server logs go under ``{output_dir}/logs/``.""" + return self.output_dir + "/logs" + + def build_profile_command(self) -> list[str]: + """Build the full ``python scripts/run_stage_profile.py ...`` command.""" + cmd = [ + "python3", + "scripts/run_stage_profile.py", + "--collect", + self.collect, + "--launch-server", + "--server-opts", + self.build_server_opts(), + "--decode-tokens", + str(self.decode_tokens), + "--warmup-n", + str(self.warmup_n), + "--host", + self.host, + "--port", + str(self.port), + "--output-dir", + self.output_dir, + "--log-dir", + self.log_dir, + ] + if self.sweep_points: + cmd.append("--sweep") + for bs, il, ctx in self.sweep_points: + cmd.append(f"{bs}:{il}:{ctx}") + else: + cmd.extend(["--bs", str(self.bs)]) + cmd.extend(["--input-len", str(self.input_len)]) + 
cmd.extend(["--existing-ctx", str(self.existing_ctx)]) + if self.disable_chunked_prefill: + cmd.append("--disable-chunked-prefill") + cmd.extend(["--max-prefill-tokens", str(self.max_prefill_tokens)]) + return cmd + + def build_shell_command(self) -> str: + """Build a single shell command string (properly quoted).""" + cmd = self.build_profile_command() + # Quote the --server-opts value since it contains spaces + quoted = [] + i = 0 + while i < len(cmd): + if cmd[i] == "--server-opts" and i + 1 < len(cmd): + quoted.append(cmd[i]) + quoted.append(shlex.quote(cmd[i + 1])) + i += 2 + else: + quoted.append(cmd[i]) + i += 1 + return " ".join(quoted) + + def default_job_name(self) -> str: + """Generate a default job name from workload params. + + Auto-generated names include a short timestamp suffix + (``-MMDD-HHMMSS``) so repeated submissions of the same + workload get distinct names. User-supplied ``--job-name`` + values are returned as-is. + """ + if self.job_name: + return self.job_name + model_short = self.model_path.split("/")[-1].lower().replace(".", "-") + ts = time.strftime("%m%d-%H%M%S") + if self.sweep_points: + name = f"flowsim-{self.collect}-{model_short}-sweep{len(self.sweep_points)}pt-{ts}" + else: + name = f"flowsim-{self.collect}-{model_short}-bs{self.bs}-il{self.input_len}-{ts}" + return name + + +class BaseScheduler(abc.ABC): + """Abstract scheduler backend.""" + + @abc.abstractmethod + def render(self, spec: ProfileJobSpec) -> str: + """Render the job manifest / script as a string.""" + + @abc.abstractmethod + def submit(self, spec: ProfileJobSpec) -> JobResult: + """Submit the job and return a structured :class:`JobResult`.""" + + def cancel(self, job_id: str) -> str: + """Cancel a running or pending job. Returns a status message.""" + raise NotImplementedError( + f"{type(self).__name__} does not support cancel" + ) + + def status(self, job_id: str) -> dict: + """Query job status. Returns dict with at least 'state' key. 
+ + Subclasses should return:: + + { + "state": "Pending" | "Running" | "Succeeded" | "Failed" | ..., + "message": "human-readable detail", + "output_hint": "where to find trace files", + } + """ + raise NotImplementedError( + f"{type(self).__name__} does not support status queries" + ) + + def logs( + self, job_id: str, *, tail: int = 100, follow: bool = False + ) -> str: + """Retrieve recent log output for a job. + + Parameters + ---------- + job_id : str + Job name (K8s) or job ID (Slurm) or log prefix (local). + tail : int + Number of lines from the end to return. + follow : bool + If True, stream logs in real time (blocking). + """ + raise NotImplementedError( + f"{type(self).__name__} does not support log retrieval" + ) + + def list_jobs(self, *, status_filter: str = "") -> list[dict]: + """List jobs managed by this scheduler. + + Parameters + ---------- + status_filter : str + If non-empty, only return jobs matching this state + (e.g., ``"Running"``, ``"Succeeded"``, ``"PENDING"``). + + Returns + ------- + list[dict] + Each dict has at least ``{"job_id": ..., "state": ..., "name": ...}``. + """ + raise NotImplementedError( + f"{type(self).__name__} does not support list" + ) + + def dry_run(self, spec: ProfileJobSpec) -> str: + """Render and return the manifest without submitting.""" + return self.render(spec) diff --git a/schedulers/config.py b/schedulers/config.py new file mode 100644 index 0000000..10c7f8d --- /dev/null +++ b/schedulers/config.py @@ -0,0 +1,88 @@ +"""Load FlowSim scheduler config from per-scheduler YAML files. + +Config file lookup (per scheduler): + +K8s: + 1. ``FLOWSIM_K8S_CONFIG`` env var + 2. ``~/.flowsim/k8s.yaml`` + +Slurm: + 1. ``FLOWSIM_SLURM_CONFIG`` env var + 2. ``~/.flowsim/slurm.yaml`` + +Priority (highest → lowest): + CLI flag > env var > config file > built-in default + +Run ``flowsim init k8s`` or ``flowsim init slurm`` to install +a config template under ``~/.flowsim/``. Templates are in +``schedulers/templates/``. 
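+
+Example (illustrative; the ``partition`` key is an assumed config entry)::
+
+    from schedulers.config import load_slurm_config, resolve_default
+
+    cfg = load_slurm_config()
+    partition = resolve_default(
+        "FLOWSIM_SLURM_PARTITION", cfg, "partition", fallback=""
+    )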
+""" + +from __future__ import annotations + +import os +from pathlib import Path + +import yaml as _yaml + + +def _load_yaml(path: Path) -> dict: + with open(path) as f: + return _yaml.safe_load(f) or {} + + +_CONFIG_DIR = Path.home() / ".flowsim" + + +def _save_yaml(path: Path, data: dict) -> None: + """Write a dict to a YAML file.""" + path.parent.mkdir(parents=True, exist_ok=True) + with open(path, "w") as f: + _yaml.safe_dump(data, f, default_flow_style=False, sort_keys=False) + + +def _resolve_path(env_var: str, filename: str) -> Path | None: + """Return the config file path, or None if it doesn't exist.""" + env = os.environ.get(env_var) + if env: + p = Path(env) + return p if p.is_file() else None + default = _CONFIG_DIR / filename + return default if default.is_file() else None + + +def load_k8s_config() -> dict: + """Load ``~/.flowsim/k8s.yaml`` (or ``FLOWSIM_K8S_CONFIG``).""" + path = _resolve_path("FLOWSIM_K8S_CONFIG", "k8s.yaml") + if path is None: + return {} + try: + return _load_yaml(path) + except Exception: + return {} + + +def load_slurm_config() -> dict: + """Load ``~/.flowsim/slurm.yaml`` (or ``FLOWSIM_SLURM_CONFIG``).""" + path = _resolve_path("FLOWSIM_SLURM_CONFIG", "slurm.yaml") + if path is None: + return {} + try: + return _load_yaml(path) + except Exception: + return {} + + +def cfg_get(cfg: dict, key: str, fallback: str = "") -> str: + """Get a value from a flat config dict, or fallback.""" + val = cfg.get(key) + if val is not None: + return str(val) + return fallback + + +def resolve_default( + env_var: str, cfg: dict, key: str, fallback: str = "" +) -> str: + """Resolve a config value: env var > config file > fallback.""" + return os.environ.get(env_var, "") or cfg_get(cfg, key, fallback) diff --git a/schedulers/k8s.py b/schedulers/k8s.py new file mode 100644 index 0000000..8c07771 --- /dev/null +++ b/schedulers/k8s.py @@ -0,0 +1,415 @@ +"""Kubernetes Job scheduler for FlowSim profiling. 
+ +Uses the ``kubernetes`` Python client for remote submission. +The ``render()`` / ``dry_run()`` path uses stdlib only (json fallback if +PyYAML is not installed — JSON is valid YAML 1.2 and ``kubectl`` accepts it). +""" + +from __future__ import annotations + +import json + +from schedulers.base import BaseScheduler, JobResult, ProfileJobSpec + + +def _k8s_job_state(status) -> str: + """Derive a human-readable state string from a K8s Job status object.""" + if status.succeeded and status.succeeded > 0: + return "Succeeded" + if status.failed and status.failed > 0: + return "Failed" + if status.active and status.active > 0: + return "Running" + return "Pending" + + +# Optional: nicer YAML output for dry-run. +try: + import yaml as _yaml # type: ignore[import-untyped] + + def _dump(obj: dict) -> str: + return _yaml.safe_dump(obj, default_flow_style=False, sort_keys=False) + +except ImportError: + _yaml = None # type: ignore[assignment] + + def _dump(obj: dict) -> str: # type: ignore[misc] + return json.dumps(obj, indent=2, ensure_ascii=False) + "\n" + + +class K8sScheduler(BaseScheduler): + """Generate and optionally submit a Kubernetes Job for profiling. + + Parameters + ---------- + namespace : str + Kubernetes namespace for the Job. + kubeconfig : str, optional + Path to a kubeconfig file. When empty, the ``kubernetes`` client + tries in-cluster config, then ``~/.kube/config``. + context : str, optional + kubeconfig context to activate. + pvc_name : str, optional + Name of a PersistentVolumeClaim to mount for trace output. + If empty, uses ``emptyDir`` (traces are lost when the pod exits). + host_output_dir : str, optional + If set (and *pvc_name* is empty), use a ``hostPath`` volume at + this path instead of a PVC. + node_selector : dict, optional + Kubernetes nodeSelector labels (e.g., ``{"gpu": "a100"}``). + service_account : str, optional + ServiceAccount name for the pod. + shm_size : str + Size of ``/dev/shm`` (shared memory). Defaults to ``"16Gi"``. 
+ runtime_class_name : str, optional + Kubernetes RuntimeClass name for the pod (e.g., ``"nvidia"`` for + CDI-based GPU injection in Kind clusters). + """ + + def __init__( + self, + *, + namespace: str = "default", + kubeconfig: str = "", + context: str = "", + pvc_name: str = "", + host_output_dir: str = "", + node_selector: dict[str, str] | None = None, + service_account: str = "", + shm_size: str = "16Gi", + runtime_class_name: str = "", + ) -> None: + self.namespace = namespace + self.kubeconfig = kubeconfig + self.context = context + self.pvc_name = pvc_name + self.host_output_dir = host_output_dir + self.node_selector = node_selector or {} + self.service_account = service_account + self.shm_size = shm_size + self.runtime_class_name = runtime_class_name + + def render(self, spec: ProfileJobSpec) -> str: + return _dump(self._build_job_dict(spec)) + + # ----------------------------------------------------------------- + # Build a plain-dict manifest (used by both render and submit) + # ----------------------------------------------------------------- + def _build_job_dict(self, spec: ProfileJobSpec) -> dict: + """Return the Job manifest as a nested Python dict.""" + job_name = spec.default_job_name()[:63] + cmd = spec.build_profile_command() + + # volumes + mounts + volume_mounts = [{"name": "dshm", "mountPath": "/dev/shm"}] + volumes: list[dict] = [ + { + "name": "dshm", + "emptyDir": {"medium": "Memory", "sizeLimit": self.shm_size}, + }, + ] + if self.pvc_name: + volume_mounts.append( + {"name": "output", "mountPath": spec.output_dir} + ) + volumes.append( + { + "name": "output", + "persistentVolumeClaim": {"claimName": self.pvc_name}, + } + ) + elif self.host_output_dir: + # Mount at base traces dir so the full directory structure + # (e.g. k8s/{timestamp}/bs1_...) is preserved on the host. 
+ volume_mounts.append( + {"name": "output", "mountPath": "/flowsim/stage_traces"} + ) + volumes.append( + { + "name": "output", + "hostPath": { + "path": self.host_output_dir, + "type": "DirectoryOrCreate", + }, + } + ) + + container = { + "name": "profiler", + "image": spec.image, + "imagePullPolicy": "IfNotPresent", + "workingDir": "/flowsim", + "command": cmd, + "env": [{"name": "SGLANG_PROFILE_KERNELS", "value": "1"}], + "resources": { + "limits": {"nvidia.com/gpu": str(spec.gpus)}, + "requests": {"nvidia.com/gpu": str(spec.gpus)}, + }, + "volumeMounts": volume_mounts, + } + + pod_spec: dict = { + "restartPolicy": "Never", + "containers": [container], + "volumes": volumes, + } + if self.runtime_class_name: + pod_spec["runtimeClassName"] = self.runtime_class_name + if self.service_account: + pod_spec["serviceAccountName"] = self.service_account + if self.node_selector: + pod_spec["nodeSelector"] = dict(self.node_selector) + + return { + "apiVersion": "batch/v1", + "kind": "Job", + "metadata": { + "name": job_name, + "namespace": self.namespace, + "labels": { + "app": "flowsim", + "component": "profiling", + "collect": spec.collect, + }, + }, + "spec": { + "backoffLimit": 0, + "ttlSecondsAfterFinished": 86400, + "template": { + "metadata": { + "labels": {"app": "flowsim", "component": "profiling"} + }, + "spec": pod_spec, + }, + }, + } + + def submit(self, spec: ProfileJobSpec) -> JobResult: + """Submit via the ``kubernetes`` Python client (``pip install kubernetes``).""" + if not self.pvc_name and not self.host_output_dir: + raise ValueError( + "No persistent storage configured. " + "Set --k8s-pvc or --k8s-host-output-dir to avoid losing traces when the pod exits." 
+ ) + batch_api, _ = self._load_k8s() + + body = self._build_job_dict(spec) + resp = batch_api.create_namespaced_job( + namespace=self.namespace, + body=body, + ) + return JobResult( + job_id=resp.metadata.name, + scheduler="k8s", + state="Submitted", + output_dir=spec.output_dir, + message=f"job.batch/{resp.metadata.name} created (namespace={resp.metadata.namespace})", + ) + + # ----------------------------------------------------------------- + # Helpers shared by status / logs + # ----------------------------------------------------------------- + + def _load_k8s(self): + """Load kubeconfig and return (BatchV1Api, CoreV1Api). + + Raises RuntimeError with actionable message on failure. + """ + try: + from kubernetes import client as k8s_client, config as k8s_config + except ImportError: + raise RuntimeError( + "The 'kubernetes' package is required. " + "Install it with: pip install kubernetes" + ) + + config_kwargs: dict = {} + if self.kubeconfig: + config_kwargs["config_file"] = self.kubeconfig + if self.context: + config_kwargs["context"] = self.context + try: + k8s_config.load_kube_config(**config_kwargs) + except k8s_config.ConfigException: + try: + k8s_config.load_incluster_config() + except k8s_config.ConfigException: + hint = ( + " Try --k8s-kubeconfig /path/to/kubeconfig." + if not self.kubeconfig + else "" + ) + raise RuntimeError( + "No valid Kubernetes configuration found. " + "Checked kubeconfig file and in-cluster environment." 
+ hint + ) + + return k8s_client.BatchV1Api(), k8s_client.CoreV1Api() + + def cancel(self, job_id: str) -> str: + """Delete a K8s Job (and its pods) by name.""" + from kubernetes import client as k8s_client + + batch_api, _ = self._load_k8s() + batch_api.delete_namespaced_job( + name=job_id, + namespace=self.namespace, + body=k8s_client.V1DeleteOptions(propagation_policy="Foreground"), + ) + return f"job.batch/{job_id} deleted (namespace={self.namespace})" + + def status(self, job_id: str) -> dict: + """Query K8s Job status by job name.""" + batch_api, core_api = self._load_k8s() + + job = batch_api.read_namespaced_job( + name=job_id, namespace=self.namespace + ) + + # Determine state + state = _k8s_job_state(job.status) + + # Pod info + pods = core_api.list_namespaced_pod( + namespace=self.namespace, + label_selector=f"job-name={job_id}", + ) + pod_statuses = [] + for pod in pods.items: + phase = pod.status.phase + node = pod.spec.node_name or "unscheduled" + pod_statuses.append(f"{pod.metadata.name} ({phase}, node={node})") + + output_hint = "" + if self.pvc_name: + output_hint = f"Traces persisted on PVC '{self.pvc_name}'" + elif self.host_output_dir: + output_hint = f"Traces at hostPath {self.host_output_dir} on the scheduled node" + else: + output_hint = "WARNING: no PVC or hostPath configured — traces will be lost when pod exits" + + msg_parts = [ + f"Job: {job_id} Namespace: {self.namespace} State: {state}" + ] + if pod_statuses: + msg_parts.append("Pods: " + ", ".join(pod_statuses)) + msg_parts.append(output_hint) + + return { + "state": state, + "message": "\n".join(msg_parts), + "output_hint": output_hint, + } + + def logs( + self, job_id: str, *, tail: int = 100, follow: bool = False + ) -> str: + """Show where logs are and how to access them for a K8s Job.""" + _, core_api = self._load_k8s() + + pods = core_api.list_namespaced_pod( + namespace=self.namespace, + label_selector=f"job-name={job_id}", + ) + if not pods.items: + return ( + f"No pods found 
for job {job_id} in namespace {self.namespace}" + ) + + if follow: + # Stream logs from the first running/succeeded pod + for pod in pods.items: + name = pod.metadata.name + if pod.status.phase in ("Running", "Succeeded"): + # Use kubectl follow since the Python client follow is blocking + return ( + f"Follow logs:\n" + f" kubectl logs -f {name} -n {self.namespace}" + ) + name = pods.items[0].metadata.name + return f"Follow logs:\n kubectl logs -f {name} -n {self.namespace}" + + parts: list[str] = [] + + # Pod info + for pod in pods.items: + name = pod.metadata.name + phase = pod.status.phase + parts.append(f"Pod: {name} ({phase})") + + parts.append("") + + # Commands to view pod stdout + parts.append("View profiling script output:") + for pod in pods.items: + name = pod.metadata.name + parts.append(f" kubectl logs {name} -n {self.namespace}") + parts.append( + f" kubectl logs {name} -n {self.namespace} --tail={tail}" + ) + + parts.append("") + + # Persistent log files + if self.pvc_name: + parts.append( + f"Server logs + traces persisted on PVC '{self.pvc_name}'." 
+ ) + parts.append("Copy to local machine:") + for pod in pods.items: + name = pod.metadata.name + if pod.status.phase in ("Running", "Succeeded"): + parts.append( + f" kubectl cp {self.namespace}/{name}:/flowsim/stage_traces ./stage_traces" + ) + break + else: + parts.append( + " (pod not running — mount the PVC in another pod to retrieve files)" + ) + elif self.host_output_dir: + parts.append(f"Server logs + traces at hostPath on the node:") + parts.append(f" {self.host_output_dir}/") + parts.append(f" {self.host_output_dir}/logs/") + # Identify node + for pod in pods.items: + if pod.spec.node_name: + parts.append(f" Node: {pod.spec.node_name}") + parts.append( + f" scp {pod.spec.node_name}:{self.host_output_dir}/ ./stage_traces/" + ) + break + + return "\n".join(parts) + + def list_jobs(self, *, status_filter: str = "") -> list[dict]: + """List FlowSim Jobs in the namespace (label: app=flowsim).""" + batch_api, _ = self._load_k8s() + + jobs = batch_api.list_namespaced_job( + namespace=self.namespace, + label_selector="app=flowsim", + ) + result: list[dict] = [] + for job in jobs.items: + state = _k8s_job_state(job.status) + + if status_filter and state.lower() != status_filter.lower(): + continue + + created = "" + if job.metadata.creation_timestamp: + created = job.metadata.creation_timestamp.strftime( + "%Y-%m-%d %H:%M:%S" + ) + + result.append( + { + "job_id": job.metadata.name, + "name": job.metadata.name, + "state": state, + "namespace": self.namespace, + "created": created, + } + ) + return result diff --git a/schedulers/local.py b/schedulers/local.py new file mode 100644 index 0000000..7015d28 --- /dev/null +++ b/schedulers/local.py @@ -0,0 +1,369 @@ +"""Local scheduler — run profiling via Docker on the local machine. + +``render()`` returns the ``docker run`` command string. +``submit()`` executes it as a subprocess, with stdout/stderr tee'd to log files. +The profiling runs inside the FlowSim Docker image with GPU access. 
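The point of `_shell_quote` is that the inner profiling command must survive embedding in `bash -c '...'` even when it contains its own quotes; `shlex.quote` and `shlex.split` round-trip exactly:

```python
import shlex

inner = "python scripts/run_profile.py --bench-opts '--num-prompts 1'"
cmd = f"bash -c {shlex.quote(inner)}"
# Splitting the outer command recovers the inner one verbatim,
# embedded single quotes included.
assert shlex.split(cmd) == ["bash", "-c", inner]
```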
+""" + +from __future__ import annotations + +import glob +import os +import re +import shlex +import subprocess +import sys +import threading +import time + +from schedulers.base import BaseScheduler, JobResult, ProfileJobSpec + + +def _shell_quote(s: str) -> str: + """Quote a string for safe embedding in a bash -c '...' invocation.""" + return shlex.quote(s) + + +class LocalScheduler(BaseScheduler): + """Run profiling jobs locally inside a Docker container. + + Parameters + ---------- + gpus : str + GPU device IDs for Docker ``--gpus`` (e.g., ``"0"`` or ``"0,1"``). + Empty string means all GPUs. + workdir : str + Host directory to use as the FlowSim project root for log scanning. + Defaults to the FlowSim project root on the host. + """ + + def __init__( + self, + *, + gpus: str = "", + workdir: str = "", + ) -> None: + self.gpus = gpus + self.workdir = workdir or self._find_project_root() + + @staticmethod + def _find_project_root() -> str: + """Walk up from this file to find the FlowSim project root.""" + d = os.path.dirname(os.path.abspath(__file__)) + # schedulers/ is one level below project root + return os.path.dirname(d) + + @staticmethod + def _check_image_exists(image: str) -> None: + """Raise if the Docker image is not available locally.""" + result = subprocess.run( + ["docker", "image", "inspect", image], + capture_output=True, + timeout=10, + ) + if result.returncode != 0: + raise SystemExit( + f"[local] Docker image '{image}' not found.\n" + f"Build it first, e.g.:\n" + f" docker build -t {image} -f dockerfiles/cuda12.6.dockerfile ." + ) + + def _docker_gpu_flag(self) -> str: + """Build the ``--gpus`` flag for ``docker run``.""" + if not self.gpus: + return "--gpus all" + return f"--gpus '\"device={self.gpus}\"'" + + def _host_output_dir(self, spec_output_dir: str) -> str: + """Host directory that gets bind-mounted into the container. + + Mirrors the container path structure under the host workdir. + e.g. 
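The host/container path mirroring described above boils down to stripping the `/flowsim/` prefix and re-rooting under the host workdir; a standalone restatement of `_host_output_dir`:

```python
import os

def host_output_dir(workdir: str, spec_output_dir: str) -> str:
    # Strip the /flowsim/ container prefix so the same relative layout
    # (e.g. stage_traces/local/{ts}) appears under the host workdir.
    rel = spec_output_dir
    if rel.startswith("/flowsim/"):
        rel = rel[len("/flowsim/"):]
    return os.path.join(workdir, rel)

assert (host_output_dir("/home/me/flowsim", "/flowsim/stage_traces/local/20260317_211318")
        == "/home/me/flowsim/stage_traces/local/20260317_211318")
# Absolute paths outside /flowsim/ are left as-is relative to joining rules.
assert host_output_dir("/home/me/flowsim", "stage_traces/x") == "/home/me/flowsim/stage_traces/x"
```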
container ``/flowsim/stage_traces/local/20260317_211318`` + → host ``{workdir}/stage_traces/local/20260317_211318`` + """ + # spec_output_dir is like /flowsim/stage_traces/local/{ts} + # Strip the /flowsim/ prefix to get the relative path + rel = spec_output_dir + if rel.startswith("/flowsim/"): + rel = rel[len("/flowsim/") :] + return os.path.join(self.workdir, rel) + + def _build_docker_cmd(self, spec: ProfileJobSpec) -> str: + """Build the full ``docker run`` command. + + Paths in *spec* (model_path, output_dir, log_dir) are expected to be + relative to the project root or absolute container paths (``/flowsim/…``). + The container workdir is ``/flowsim``, so relative paths resolve + correctly without any string replacement. + """ + job_name = spec.default_job_name()[:63] + host_output = self._host_output_dir(spec.output_dir) + container_output = ( + spec.output_dir + ) # e.g. /flowsim/stage_traces/local/{ts} + + inner_cmd = spec.build_shell_command() + + parts = [ + "docker run --rm", + f"--name {job_name}", + self._docker_gpu_flag(), + "--ipc=host --shm-size=16g", + "--network=host", + f"-e SGLANG_PROFILE_KERNELS=1", + f"-v {host_output}:{container_output}", + f"-v {self.workdir}/simulator:/flowsim/simulator", + f"-v {self.workdir}/scripts:/flowsim/scripts", + f"-w /flowsim", + spec.image, + f"bash -c {_shell_quote(inner_cmd)}", + ] + return " \\\n ".join(parts) + + def render(self, spec: ProfileJobSpec) -> str: + return self._build_docker_cmd(spec) + + def submit(self, spec: ProfileJobSpec) -> JobResult: + """Launch a Docker container for profiling. + + stdout and stderr are streamed to the terminal *and* saved to + log files under ``spec.output_dir/logs/`` on the host. 
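The tee pattern used by `submit` — one daemon thread per stream, writing each line to both the live terminal and a log file — can be demonstrated with an in-memory sink standing in for the log file:

```python
import io
import subprocess
import sys
import threading

def tee(src, dest_file, dest_stream):
    # Copy each line from the child process to both destinations.
    for line in src:
        text = line.decode("utf-8", errors="replace")
        dest_stream.write(text)
        dest_file.write(text)

proc = subprocess.Popen(
    [sys.executable, "-c", "print('hello from child')"],
    stdout=subprocess.PIPE,
)
log = io.StringIO()
t = threading.Thread(target=tee, args=(proc.stdout, log, sys.stdout), daemon=True)
t.start()
proc.wait()
t.join()
assert log.getvalue().strip() == "hello from child"
```

Joining the threads after `proc.wait()` matters: the pipes only hit EOF once the child exits, and the join guarantees the log file has every line before the return code is inspected.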
+ """ + self._check_image_exists(spec.image) + + # Ensure host output dir exists before mounting + host_output = self._host_output_dir(spec.output_dir) + log_dir = os.path.join(host_output, "logs") + os.makedirs(log_dir, exist_ok=True) + + docker_cmd = self._build_docker_cmd(spec) + job_name = spec.default_job_name() + ts = time.strftime("%Y%m%d_%H%M%S") + + # Remove stale container with the same name (e.g. from a killed run) + subprocess.run( + ["docker", "rm", "-f", job_name[:63]], + capture_output=True, + timeout=10, + ) + stdout_path = os.path.join(log_dir, f"{job_name}_{ts}.stdout.log") + stderr_path = os.path.join(log_dir, f"{job_name}_{ts}.stderr.log") + + print(f"[local] Running {job_name} in Docker...") + print(f"[local] image: {spec.image}") + print(f"[local] gpus: {self.gpus or 'all'}") + print(f"[local] host output: {host_output}") + print(f"[local] logs: {stdout_path}") + print(f"[local] {stderr_path}") + print(f"[local] cmd:\n {docker_cmd}") + print() + + with open(stdout_path, "w") as fout, open(stderr_path, "w") as ferr: + proc = subprocess.Popen( + docker_cmd, + shell=True, + cwd=self.workdir, + stdout=subprocess.PIPE, + stderr=subprocess.PIPE, + ) + + def _tee(src, dest_file, dest_stream): + for line in src: + dest_stream.buffer.write(line) + dest_stream.buffer.flush() + dest_file.write(line.decode("utf-8", errors="replace")) + dest_file.flush() + + t_out = threading.Thread( + target=_tee, + args=(proc.stdout, fout, sys.stdout), + daemon=True, + ) + t_err = threading.Thread( + target=_tee, + args=(proc.stderr, ferr, sys.stderr), + daemon=True, + ) + t_out.start() + t_err.start() + proc.wait() + t_out.join() + t_err.join() + + if proc.returncode != 0: + return JobResult( + job_id=job_name, + scheduler="local", + state="Failed", + output_dir=host_output, + message=( + f"{job_name} FAILED (exit code {proc.returncode})\n" + f"stdout log: {stdout_path}\n" + f"stderr log: {stderr_path}" + ), + ) + return JobResult( + job_id=job_name, + 
scheduler="local", + state="Completed", + output_dir=host_output, + message=( + f"{job_name} completed successfully\n" + f"stdout log: {stdout_path}\n" + f"stderr log: {stderr_path}" + ), + ) + + def cancel(self, job_id: str) -> str: + """Stop the Docker container for a local job. + + The Docker container name is truncated to 63 characters when created. + To ensure we stop the correct container even if a longer job id is + provided (for example, the full job name), apply the same truncation + here before calling ``docker stop``. + """ + container_name = job_id[:63] + proc = subprocess.run( + ["docker", "stop", container_name], + capture_output=True, + text=True, + timeout=30, + ) + if proc.returncode == 0: + return f"Stopped container {container_name}" + return f"Could not stop container {container_name}: {proc.stderr.strip()}" + + def _find_log_dirs(self) -> list[str]: + """Find all log directories under stage_traces/{scheduler}/*/logs/.""" + base = os.path.join(self.workdir, "stage_traces", "local") + # New layout: stage_traces/local/{ts}/logs/ + dirs = sorted(glob.glob(os.path.join(base, "*/logs"))) + # Also check legacy flat layout: stage_traces/logs/ + legacy = os.path.join(self.workdir, "stage_traces", "logs") + if os.path.isdir(legacy): + dirs.append(legacy) + return dirs + + def status(self, job_id: str) -> dict: + """Check local job status by looking for log files. + + ``job_id`` is the job name prefix used in log filenames. 
+ """ + matches = [] + for log_dir in self._find_log_dirs(): + matches.extend( + sorted( + glob.glob(os.path.join(log_dir, f"{job_id}_*.stdout.log")) + ) + ) + + if not matches: + return { + "state": "NotFound", + "message": f"No logs found for job '{job_id}'", + "output_hint": "", + } + + latest = matches[-1] + stderr_log = latest.replace(".stdout.log", ".stderr.log") + # trace_dir is the parent of logs/ + trace_dir = os.path.dirname(os.path.dirname(latest)) + + return { + "state": "Completed", + "message": ( + f"Latest log: {latest}\n" + f"Stderr log: {stderr_log}\n" + f"Traces dir: {trace_dir}" + ), + "output_hint": trace_dir, + } + + def logs( + self, job_id: str, *, tail: int = 100, follow: bool = False + ) -> str: + """List log files for a local job and print access commands.""" + matches = [] + for log_dir in self._find_log_dirs(): + matches.extend( + sorted(glob.glob(os.path.join(log_dir, f"{job_id}_*"))) + ) + + if not matches: + for log_dir in self._find_log_dirs(): + matches.extend( + sorted(glob.glob(os.path.join(log_dir, f"*{job_id}*"))) + ) + + if not matches: + return f"No logs found matching '{job_id}'" + + if follow: + stdout_files = sorted( + f for f in matches if f.endswith(".stdout.log") + ) + if stdout_files: + return f"Follow logs with:\n tail -f {stdout_files[-1]}" + return f"No stdout log found to follow for '{job_id}'" + + log_dir = os.path.dirname(matches[-1]) + parts = [f"Log directory: {log_dir}", ""] + parts.append(f"Files ({len(matches)}):") + for p in matches: + size = os.path.getsize(p) + parts.append(f" {os.path.basename(p)} ({size:,} bytes)") + + # Provide commands + parts.append("") + parts.append("View logs:") + stdout_files = sorted(f for f in matches if f.endswith(".stdout.log")) + stderr_files = sorted(f for f in matches if f.endswith(".stderr.log")) + if stdout_files: + parts.append(f" less {stdout_files[-1]}") + if stderr_files: + parts.append(f" less {stderr_files[-1]}") + if stdout_files: + parts.append("") + 
parts.append("Follow logs:") + parts.append(f" tail -f {stdout_files[-1]}") + + trace_dir = os.path.dirname(log_dir) # parent of logs/ + parts.append("") + parts.append(f"Trace files: {trace_dir}") + parts.append(f" ls {trace_dir}") + + return "\n".join(parts) + + def list_jobs(self, *, status_filter: str = "") -> list[dict]: + """List local jobs by scanning log files.""" + matches = [] + for log_dir in self._find_log_dirs(): + matches.extend( + sorted(glob.glob(os.path.join(log_dir, "*.stdout.log"))) + ) + + jobs: list[dict] = [] + for path in matches: + basename = os.path.basename(path) + # Parse: {job_name}_{YYYYMMDD_HHMMSS}.stdout.log + # Also support old epoch format {job_name}_{digits}.stdout.log + m = re.match(r"^(.+)_(\d{8}_\d{6}|\d{10,})\.stdout\.log$", basename) + if not m: + continue + name = m.group(1) + ts = m.group(2) + state = "Completed" + jobs.append( + { + "job_id": name, + "name": name, + "state": state, + "timestamp": ts, + } + ) + + if status_filter: + filt = status_filter.lower() + jobs = [j for j in jobs if j["state"].lower() == filt] + + return jobs diff --git a/schedulers/slurm.py b/schedulers/slurm.py new file mode 100644 index 0000000..543b22f --- /dev/null +++ b/schedulers/slurm.py @@ -0,0 +1,332 @@ +"""Slurm sbatch scheduler for FlowSim profiling. + +``render()`` / ``dry_run()`` produce a standalone bash script (zero deps). +``submit()`` pipes the script to ``sbatch`` via subprocess (CLI mode). + +Requires ``sbatch``/``squeue``/``scancel`` on PATH (or reachable +via ``cli_prefix``, e.g. ``"docker exec slurmctld"``). +""" + +from __future__ import annotations + +import shlex +import subprocess + +from schedulers.base import BaseScheduler, JobResult, ProfileJobSpec + + +class SlurmScheduler(BaseScheduler): + """Generate and optionally submit an sbatch script for profiling. + + Parameters + ---------- + partition : str + Slurm partition to submit to. + time_limit : str + Wall-clock time limit (e.g., ``"01:00:00"``). 
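The filename pattern that `list_jobs` parses accepts both the current `{job_name}_{YYYYMMDD_HHMMSS}` form and the legacy epoch-seconds suffix; the greedy group backtracks correctly past underscores inside the job name:

```python
import re

# Same pattern as list_jobs: {job_name}_{YYYYMMDD_HHMMSS}.stdout.log,
# plus the legacy {job_name}_{epoch-seconds} form.
pattern = re.compile(r"^(.+)_(\d{8}_\d{6}|\d{10,})\.stdout\.log$")

m = pattern.match("bs1_in2048_ctx0_20260317_211318.stdout.log")
assert m is not None
assert m.group(1) == "bs1_in2048_ctx0"   # underscores in the name survive
assert m.group(2) == "20260317_211318"
assert pattern.match("myjob_1710000000.stdout.log") is not None  # legacy epoch
assert pattern.match("myjob.stdout.log") is None                 # no timestamp
```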
+ account : str, optional + ``--account`` for which allocation to charge. + constraint : str, optional + ``--constraint`` node feature (e.g., ``"gpu80g"``). + container_runtime : str + How to run the container inside the allocation. + ``"docker"`` -> ``docker run`` + ``"enroot"`` -> ``srun --container-image`` + ``"none"`` -> run bare-metal (no container) + container_mounts : str + Bind-mount string passed to the container runtime + (e.g., ``"/data:/data"``). + modules : list[str] + ``module load`` commands to run before the job + (relevant for ``"none"`` runtime). + extra_sbatch : list[str] + Additional ``#SBATCH`` lines, each *without* the ``#SBATCH`` prefix. + cli_prefix : str + Shell prefix for CLI commands (e.g. ``"docker exec -i slurmctld"``). + """ + + def __init__( + self, + *, + partition: str = "gpu", + time_limit: str = "02:00:00", + account: str = "", + constraint: str = "", + container_runtime: str = "none", + container_mounts: str = "", + modules: list[str] | None = None, + extra_sbatch: list[str] | None = None, + cli_prefix: str = "", + ) -> None: + self.partition = partition + self.time_limit = time_limit + self.account = account + self.constraint = constraint + self.container_runtime = container_runtime + self.container_mounts = container_mounts + self.modules = modules or [] + self.extra_sbatch = extra_sbatch or [] + self.cli_prefix = cli_prefix + + def render(self, spec: ProfileJobSpec) -> str: + job_name = spec.default_job_name() + cmd = spec.build_shell_command() + + lines = [ + "#!/bin/bash", + f"#SBATCH --job-name={job_name}", + f"#SBATCH --partition={self.partition}", + f"#SBATCH --gpus-per-node={spec.gpus}", + f"#SBATCH --ntasks=1", + f"#SBATCH --exclusive", + f"#SBATCH --time={self.time_limit}", + f"#SBATCH --output={spec.output_dir}/{job_name}_%j.log", + ] + + if self.account: + lines.append(f"#SBATCH --account={self.account}") + if self.constraint: + lines.append(f"#SBATCH --constraint={self.constraint}") + for extra in self.extra_sbatch: 
+ lines.append(f"#SBATCH {extra}") + + lines.append("") + lines.append("set -euo pipefail") + lines.append("") + + # Ensure output dir exists (needed for #SBATCH --output) + lines.append(f"mkdir -p {spec.output_dir}") + lines.append("") + + if self.modules: + for mod in self.modules: + lines.append(f"module load {mod}") + lines.append("") + + lines.append("export SGLANG_PROFILE_KERNELS=1") + lines.append("") + + if self.container_runtime == "docker": + # Always mount output_dir so traces/logs persist on the host. + mounts = f" -v {spec.output_dir}:{spec.output_dir}" + if self.container_mounts: + mounts += f" -v {self.container_mounts}" + lines.append( + f"docker run --gpus all --ipc=host --shm-size=16g" + f"{mounts} -w /flowsim {spec.image} \\" + ) + lines.append(f" {cmd}") + elif self.container_runtime == "enroot": + # Always mount output_dir so traces/logs persist on the host. + out_mount = f"{spec.output_dir}:{spec.output_dir}" + if self.container_mounts: + all_mounts = f"{self.container_mounts},{out_mount}" + else: + all_mounts = out_mount + lines.append( + f"srun --container-image={spec.image}" + f" --container-workdir=/flowsim" + f" --container-mounts={all_mounts} \\" + ) + lines.append(f" {cmd}") + elif self.container_runtime == "none": + lines.append(f"cd /flowsim") + lines.append(cmd) + else: + raise ValueError( + f"Unknown container_runtime: {self.container_runtime!r}. 
" + "Choose from: docker, enroot, none" + ) + + lines.append("") + return "\n".join(lines) + + def submit(self, spec: ProfileJobSpec) -> JobResult: + """Submit the job via ``sbatch``.""" + return self._submit_cli(spec) + + # ------------------------------------------------------------------ + # CLI helpers + # ------------------------------------------------------------------ + + def _cli_cmd(self, *args: str) -> list[str]: + """Build a command list, prepending ``cli_prefix`` if set.""" + prefix = shlex.split(self.cli_prefix) if self.cli_prefix else [] + return prefix + list(args) + + def _cli_run( + self, + *args: str, + input_data: str | None = None, + timeout: int = 60, + ) -> subprocess.CompletedProcess: + """Run a Slurm CLI command and return the CompletedProcess.""" + cmd = self._cli_cmd(*args) + return subprocess.run( + cmd, + capture_output=True, + text=True, + input=input_data, + timeout=timeout, + ) + + def _submit_cli(self, spec: ProfileJobSpec) -> JobResult: + """Submit via ``sbatch`` (piping the script on stdin).""" + script = self.render(spec) + job_name = spec.default_job_name() + + r = self._cli_run("sbatch", "--parsable", input_data=script, timeout=30) + if r.returncode != 0: + raise RuntimeError( + f"sbatch failed (exit {r.returncode}):\n{r.stderr}" + ) + + job_id = r.stdout.strip().split(";")[ + 0 + ] # parsable: "jobid" or "jobid;cluster" + return JobResult( + job_id=job_id, + scheduler="slurm", + state="Submitted", + output_dir=spec.output_dir, + message=f"Submitted batch job {job_id}", + ) + + def cancel(self, job_id: str) -> str: + """Cancel a Slurm job.""" + return self._cancel_cli(job_id) + + def status(self, job_id: str) -> dict: + """Query Slurm job status.""" + return self._status_cli(job_id) + + def logs( + self, job_id: str, *, tail: int = 100, follow: bool = False + ) -> str: + """Show Slurm job log information.""" + return self._logs_cli(job_id, tail=tail, follow=follow) + + def list_jobs(self, *, status_filter: str = "") -> 
list[dict]: + """List Slurm jobs.""" + return self._list_jobs_cli(status_filter=status_filter) + + # ------------------------------------------------------------------ + # CLI implementations + # ------------------------------------------------------------------ + + def _cancel_cli(self, job_id: str) -> str: + r = self._cli_run("scancel", job_id) + if r.returncode != 0: + raise RuntimeError(f"scancel failed: {r.stderr}") + return f"Cancelled Slurm job {job_id}" + + def _status_cli(self, job_id: str) -> dict: + # Use scontrol show job — works for both running and completed jobs + # (completed jobs stay in memory for MinJobAge seconds, default 300s) + r = self._cli_run("scontrol", "show", "job", job_id) + if r.returncode != 0 or not r.stdout.strip(): + return { + "state": "Unknown", + "message": f"No job found with ID {job_id}", + "output_hint": "", + } + + # Parse key=value output + fields: dict[str, str] = {} + for token in r.stdout.replace("\n", " ").split(): + if "=" in token: + k, _, v = token.partition("=") + fields[k] = v + + state = fields.get("JobState", "UNKNOWN") + name = fields.get("JobName", "") + nodes = fields.get("NodeList", "") + output_file = fields.get("StdOut", "") + + # Normalize Slurm uppercase states to capitalized format + _STATE_MAP = { + "PENDING": "Pending", + "RUNNING": "Running", + "SUSPENDED": "Suspended", + "COMPLETED": "Completed", + "CANCELLED": "Cancelled", + "FAILED": "Failed", + "TIMEOUT": "Timeout", + "NODE_FAIL": "Failed", + "PREEMPTED": "Preempted", + "OUT_OF_MEMORY": "Failed", + } + state = _STATE_MAP.get(state, state) + + msg_parts = [ + f"Job ID: {job_id} Name: {name} State: {state}", + f"Nodes: {nodes}" if nodes else "Nodes: (not yet assigned)", + ] + if output_file: + msg_parts.append(f"Output log: {output_file}") + + return { + "state": state, + "message": "\n".join(msg_parts), + "output_hint": output_file, + } + + def _logs_cli( + self, job_id: str, *, tail: int = 100, follow: bool = False + ) -> str: + info = 
self._status_cli(job_id) + output_file = info.get("output_hint", "") + + if not output_file: + return info["message"] + "\n(no log file path found)" + + # Try to read the log file via CLI prefix (handles remote Slurm) + if follow: + return ( + f"{info['message']}\n\n" + f"Follow logs:\n" + f" tail -f {output_file}" + ) + + r = self._cli_run("tail", f"-{tail}", output_file, timeout=15) + if r.returncode == 0 and r.stdout.strip(): + return r.stdout + + # Fallback: file may not exist yet or be on a remote node + return ( + f"{info['message']}\n\n" + f"Log file: {output_file}\n" + f"View on login node:\n" + f" tail -{tail} {output_file}\n" + f"Follow:\n" + f" tail -f {output_file}" + ) + + def _list_jobs_cli(self, *, status_filter: str = "") -> list[dict]: + r = self._cli_run( + "squeue", + "-o", + "%i|%j|%T|%P|%N", + "-h", + ) + if r.returncode != 0: + raise RuntimeError(f"squeue failed: {r.stderr}") + result: list[dict] = [] + for line in r.stdout.strip().splitlines(): + if not line.strip(): + continue + parts = line.split("|", 4) + name = parts[1] if len(parts) > 1 else "" + state = parts[2] if len(parts) > 2 else "UNKNOWN" + if status_filter and state.upper() != status_filter.upper(): + continue + result.append( + { + "job_id": parts[0] if parts else "", + "name": name, + "state": state, + "partition": parts[3] if len(parts) > 3 else "", + "nodes": parts[4] if len(parts) > 4 else "", + } + ) + return result diff --git a/schedulers/templates/k8s.yaml b/schedulers/templates/k8s.yaml new file mode 100644 index 0000000..8f548de --- /dev/null +++ b/schedulers/templates/k8s.yaml @@ -0,0 +1,27 @@ +# FlowSim Kubernetes scheduler config +# Copy to ~/.flowsim/k8s.yaml and edit: +# flowsim init k8s --config schedulers/templates/k8s.yaml + +# Path to kubeconfig file (required) +kubeconfig: ~/.kube/config + +# Kubeconfig context (empty = current-context) +context: "" + +# Kubernetes namespace (required) +namespace: default + +# Persistent storage for trace output (set one): +# 
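The pipe-delimited format requested from `squeue` keeps parsing trivial; a sketch of the per-row logic used by `_list_jobs_cli`:

```python
def parse_squeue_row(line: str) -> dict:
    # squeue -h -o "%i|%j|%T|%P|%N" emits one pipe-delimited row per job:
    # job id | name | state | partition | node list
    parts = line.split("|", 4)
    return {
        "job_id": parts[0] if parts else "",
        "name": parts[1] if len(parts) > 1 else "",
        "state": parts[2] if len(parts) > 2 else "UNKNOWN",
        "partition": parts[3] if len(parts) > 3 else "",
        "nodes": parts[4] if len(parts) > 4 else "",
    }

row = parse_squeue_row("12345|bs1_in2048_ctx0|RUNNING|gpu|node001")
assert row["job_id"] == "12345"
assert row["state"] == "RUNNING"
assert row["nodes"] == "node001"
```

Capping the split at 4 means a node list containing pipes (unlikely, but cheap to guard) lands whole in the final field.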
pvc: my-traces-pvc +# host_output_dir: /data/flowsim-traces +pvc: "" +host_output_dir: "" + +# Service account for the job pod (empty = default) +service_account: "" + +# Shared memory size (for /dev/shm in the pod) +shm_size: "16Gi" + +# RuntimeClass (e.g. "nvidia" for CDI GPU passthrough) +runtime_class_name: "" diff --git a/schedulers/templates/slurm.yaml b/schedulers/templates/slurm.yaml new file mode 100644 index 0000000..5f27328 --- /dev/null +++ b/schedulers/templates/slurm.yaml @@ -0,0 +1,27 @@ +# FlowSim Slurm scheduler config +# Copy to ~/.flowsim/slurm.yaml and edit: +# flowsim init slurm --config schedulers/templates/slurm.yaml + +# Slurm partition (required) +partition: gpu + +# Billing account (empty = default) +account: "" + +# Job time limit +time: "02:00:00" + +# Node constraint (e.g. "h100") +constraint: "" + +# CLI prefix for remote sbatch/squeue/scancel +# Examples: +# "docker exec -i slurmctld" (via Docker container) +# "ssh login-node" (via SSH) +cli_prefix: "" + +# Container runtime: docker | enroot | none +container_runtime: none + +# Container mount spec (for enroot/docker) +container_mounts: "" diff --git a/scripts/__init__.py b/scripts/__init__.py new file mode 100644 index 0000000..e785b75 --- /dev/null +++ b/scripts/__init__.py @@ -0,0 +1,36 @@ +"""Shared utilities for FlowSim CLI scripts.""" + + +def parse_sweep_point(s: str) -> tuple[int, int, int]: + """Parse a ``BS:INPUT_LEN:CTX`` string into an int 3-tuple. + + Raises :class:`ValueError` on bad input. + """ + parts = s.strip().split(":") + if len(parts) != 3: + raise ValueError( + f"Bad sweep point {s!r}: expected BS:INPUT_LEN:CTX " + f"(e.g. 1:2048:0)" + ) + try: + return int(parts[0]), int(parts[1]), int(parts[2]) + except ValueError: + raise ValueError( + f"Bad sweep point {s!r}: all three values must be integers" + ) + + +def load_sweep_file(path: str) -> list[tuple[int, int, int]]: + """Read sweep points from a file (one ``BS:INPUT_LEN:CTX`` per line). 
+ + Blank lines and ``#`` comments are skipped. + Raises :class:`ValueError` on bad entries. + """ + points: list[tuple[int, int, int]] = [] + with open(path) as f: + for line in f: + line = line.strip() + if not line or line.startswith("#"): + continue + points.append(parse_sweep_point(line)) + return points diff --git a/scripts/cli/__init__.py b/scripts/cli/__init__.py new file mode 100644 index 0000000..9d4755e --- /dev/null +++ b/scripts/cli/__init__.py @@ -0,0 +1,167 @@ +"""FlowSim CLI — unified entry point. + +Usage:: + + flowsim init k8s # create ~/.flowsim/k8s.yaml template + flowsim init slurm # create ~/.flowsim/slurm.yaml template + flowsim submit --scheduler k8s --collect perf --model-path ... + flowsim submit ... --dry-run # debug: preview manifest +""" + +from __future__ import annotations + +import argparse +import sys +from pathlib import Path + +_CONFIG_DIR = Path.home() / ".flowsim" +_TEMPLATES_DIR = ( + Path(__file__).resolve().parent.parent.parent / "schedulers" / "templates" +) + + +def _cmd_init(argv: list[str]) -> int: + """Install a scheduler config to ~/.flowsim/. + + Without --config: copies the bundled template from schedulers/templates/. + With --config: copies the specified file. 
+ """ + parser = argparse.ArgumentParser( + prog="flowsim init", + description=( + "Install scheduler config under ~/.flowsim/.\n\n" + "Examples:\n" + " flowsim init k8s # install bundled template\n" + " flowsim init k8s --config my.yaml # install your own file\n" + " flowsim init slurm --force # overwrite existing" + ), + formatter_class=argparse.RawDescriptionHelpFormatter, + ) + parser.add_argument( + "scheduler", + choices=["k8s", "slurm"], + help="Scheduler type", + ) + parser.add_argument( + "--config", + "-c", + default="", + help="Path to a config YAML to install (default: bundled template)", + ) + parser.add_argument( + "--force", + action="store_true", + help="Overwrite existing config file", + ) + args = parser.parse_args(argv) + + dst = _CONFIG_DIR / f"{args.scheduler}.yaml" + + if dst.exists() and not args.force: + print( + f"Error: {dst} already exists (use --force to overwrite)", + file=sys.stderr, + ) + return 1 + + if args.config: + src = Path(args.config).expanduser() + else: + src = _TEMPLATES_DIR / f"{args.scheduler}.yaml" + + if not src.is_file(): + print(f"Error: config file not found: {src}", file=sys.stderr) + return 1 + + import shutil + + _CONFIG_DIR.mkdir(parents=True, exist_ok=True) + shutil.copy2(src, dst) + print(f"Installed {src} → {dst}") + print( + f"Edit {dst}, then run: flowsim submit --scheduler " + f"{args.scheduler} ..." 
+ ) + return 0 + + +def main(argv: list[str] | None = None) -> int: + parser = argparse.ArgumentParser( + prog="flowsim", + description="FlowSim: workload simulation pipeline CLI", + ) + sub = parser.add_subparsers(dest="command") + sub.required = True + + sub.add_parser( + "init", + help="Configure a scheduler (k8s/slurm) and save to ~/.flowsim/", + add_help=False, + ) + sub.add_parser( + "submit", + help="Submit a profiling job to K8s or Slurm", + add_help=False, + ) + sub.add_parser( + "status", + help="Query job status (local/k8s/slurm)", + add_help=False, + ) + sub.add_parser( + "logs", + help="Retrieve job logs (local/k8s/slurm)", + add_help=False, + ) + sub.add_parser( + "list", + help="List FlowSim jobs (local/k8s/slurm)", + add_help=False, + ) + sub.add_parser( + "cancel", + help="Cancel a running job (k8s/slurm)", + add_help=False, + ) + + args, remaining = parser.parse_known_args(argv) + + if args.command == "init": + return _cmd_init(remaining) + + if args.command == "submit": + from scripts.cli.submit import main as submit_main + + submit_main(remaining) + return 0 + + if args.command == "status": + from scripts.cli.manage import main_status + + main_status(remaining) + return 0 + + if args.command == "logs": + from scripts.cli.manage import main_logs + + main_logs(remaining) + return 0 + + if args.command == "list": + from scripts.cli.manage import main_list + + main_list(remaining) + return 0 + + if args.command == "cancel": + from scripts.cli.manage import main_cancel + + main_cancel(remaining) + return 0 + + parser.print_help() + return 1 + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/scripts/cli/manage.py b/scripts/cli/manage.py new file mode 100644 index 0000000..bf389ab --- /dev/null +++ b/scripts/cli/manage.py @@ -0,0 +1,211 @@ +#!/usr/bin/env python3 +"""Query FlowSim profiling job status, logs, list, and cancel. 
+ +Usage examples +-------------- + +Check K8s job status:: + + flowsim status --scheduler k8s --job flowsim-perf-qwen3-8b-bs1-il2048 + +Get K8s job logs:: + + flowsim logs --scheduler k8s --job flowsim-perf-qwen3-8b-bs1-il2048 + +Follow K8s job logs:: + + flowsim logs --scheduler k8s --job flowsim-perf-qwen3-8b-bs1-il2048 --follow + +List all FlowSim jobs:: + + flowsim list --scheduler k8s + flowsim list --scheduler k8s --status Running + +Cancel a job:: + + flowsim cancel --scheduler k8s --job flowsim-perf-qwen3-8b-bs1-il2048 +""" + +from __future__ import annotations + +import argparse +import sys + +from schedulers.config import ( + cfg_get, + load_k8s_config, + load_slurm_config, + resolve_default, +) +from schedulers.k8s import K8sScheduler +from schedulers.local import LocalScheduler +from schedulers.slurm import SlurmScheduler + +_d = resolve_default + + +def _add_scheduler_args(p: argparse.ArgumentParser) -> None: + """Add common scheduler choice arg (first pass only).""" + p.add_argument( + "--scheduler", + choices=["local", "k8s", "slurm"], + required=True, + ) + + +def _add_scheduler_specific_args( + p: argparse.ArgumentParser, scheduler: str +) -> None: + """Add only the args relevant to the chosen scheduler (second pass).""" + k8s_cfg = load_k8s_config() + slurm_cfg = load_slurm_config() + + if scheduler == "local": + p.add_argument("--local-workdir", default="") + + elif scheduler == "k8s": + p.add_argument( + "--k8s-namespace", + default=_d( + "FLOWSIM_K8S_NAMESPACE", k8s_cfg, "namespace", "default" + ), + ) + p.add_argument( + "--k8s-kubeconfig", + default=_d("KUBECONFIG", k8s_cfg, "kubeconfig", ""), + ) + p.add_argument( + "--k8s-context", + default=_d("FLOWSIM_K8S_CONTEXT", k8s_cfg, "context", ""), + ) + p.add_argument( + "--k8s-pvc", + default=cfg_get(k8s_cfg, "pvc", ""), + ) + p.add_argument( + "--k8s-host-output-dir", + default=cfg_get(k8s_cfg, "host_output_dir", ""), + ) + + elif scheduler == "slurm": + p.add_argument( + "--slurm-cli-prefix", 
+ default=cfg_get(slurm_cfg, "cli_prefix", ""), + ) + + +def _build_scheduler(args: argparse.Namespace): + if args.scheduler == "local": + return LocalScheduler(workdir=getattr(args, "local_workdir", "")) + elif args.scheduler == "k8s": + return K8sScheduler( + namespace=args.k8s_namespace, + kubeconfig=args.k8s_kubeconfig, + context=args.k8s_context, + pvc_name=getattr(args, "k8s_pvc", "") or "", + host_output_dir=getattr(args, "k8s_host_output_dir", "") or "", + ) + else: + return SlurmScheduler( + cli_prefix=args.slurm_cli_prefix, + ) + + +def _parse_two_pass( + p: argparse.ArgumentParser, argv: list[str] | None = None +) -> argparse.Namespace: + """Two-pass parse: peek --scheduler, add scheduler-specific args, full parse.""" + _pre = argparse.ArgumentParser(add_help=False) + _pre.add_argument("--scheduler", choices=["local", "k8s", "slurm"]) + pre, _ = _pre.parse_known_args(argv) + _add_scheduler_specific_args(p, pre.scheduler) + return p.parse_args(argv) + + +def main_status(argv: list[str] | None = None) -> None: + p = argparse.ArgumentParser(description="Query FlowSim job status.") + _add_scheduler_args(p) + p.add_argument("--job", required=True, help="Job name or ID") + args = _parse_two_pass(p, argv) + + scheduler = _build_scheduler(args) + try: + info = scheduler.status(args.job) + print(f"State: {info['state']}") + print(info["message"]) + except Exception as exc: + print(f"Error: {exc}", file=sys.stderr) + sys.exit(1) + + +def main_logs(argv: list[str] | None = None) -> None: + p = argparse.ArgumentParser(description="Retrieve FlowSim job logs.") + _add_scheduler_args(p) + p.add_argument("--job", required=True, help="Job name or ID") + p.add_argument( + "--tail", + type=int, + default=100, + help="Number of log lines (default: 100)", + ) + p.add_argument( + "--follow", "-f", action="store_true", help="Follow log output" + ) + args = _parse_two_pass(p, argv) + + scheduler = _build_scheduler(args) + try: + text = scheduler.logs(args.job, tail=args.tail, 
follow=args.follow) + print(text) + except Exception as exc: + print(f"Error: {exc}", file=sys.stderr) + sys.exit(1) + + +def main_list(argv: list[str] | None = None) -> None: + p = argparse.ArgumentParser(description="List FlowSim jobs.") + _add_scheduler_args(p) + p.add_argument( + "--status", + default="", + help="Filter by job state (e.g. Running, Succeeded, PENDING)", + ) + args = _parse_two_pass(p, argv) + + scheduler = _build_scheduler(args) + try: + jobs = scheduler.list_jobs(status_filter=args.status) + if not jobs: + print("No jobs found.") + return + # Print table header + headers = list(jobs[0].keys()) + widths = { + h: max(len(h), max(len(str(j.get(h, ""))) for j in jobs)) + for h in headers + } + header_line = " ".join(h.upper().ljust(widths[h]) for h in headers) + print(header_line) + print("-" * len(header_line)) + for job in jobs: + print( + " ".join(str(job.get(h, "")).ljust(widths[h]) for h in headers) + ) + except Exception as exc: + print(f"Error: {exc}", file=sys.stderr) + sys.exit(1) + + +def main_cancel(argv: list[str] | None = None) -> None: + p = argparse.ArgumentParser(description="Cancel a FlowSim job.") + _add_scheduler_args(p) + p.add_argument("--job", required=True, help="Job name or ID to cancel") + args = _parse_two_pass(p, argv) + + scheduler = _build_scheduler(args) + try: + msg = scheduler.cancel(args.job) + print(msg) + except Exception as exc: + print(f"Error: {exc}", file=sys.stderr) + sys.exit(1) diff --git a/scripts/cli/submit.py b/scripts/cli/submit.py new file mode 100644 index 0000000..e59d697 --- /dev/null +++ b/scripts/cli/submit.py @@ -0,0 +1,490 @@ +#!/usr/bin/env python3 +"""Submit FlowSim profiling jobs locally, to Kubernetes, or to Slurm. 
+ +Usage examples +-------------- + +Run locally (no cluster needed): + + flowsim submit \\ + --scheduler local \\ + --collect perf \\ + --model-path Qwen/Qwen3-8B \\ + --tp 1 --local-gpus 0 + +Dry-run (print Kubernetes Job YAML to stdout): + + flowsim submit \\ + --scheduler k8s \\ + --collect perf \\ + --model-path Qwen/Qwen3-235B-A22B-FP8 \\ + --tp 4 --gpus 4 \\ + --bs 1 --input-len 2048 --decode-tokens 2 \\ + --image flowsim-image:latest \\ + --k8s-namespace default \\ + --k8s-pvc flowsim-traces \\ + --dry-run + +Dry-run (print Slurm sbatch script to stdout): + + flowsim submit \\ + --scheduler slurm \\ + --collect perf \\ + --model-path Qwen/Qwen3-235B-A22B-FP8 \\ + --tp 4 --gpus 4 \\ + --slurm-partition gpu-a100 \\ + --slurm-time 02:00:00 \\ + --dry-run + +Submit directly to cluster (omit --dry-run): + + flowsim submit \\ + --scheduler k8s \\ + ... +""" +from __future__ import annotations + +import argparse +import os +import sys + +from schedulers.base import ProfileJobSpec +from schedulers.config import ( + cfg_get, + load_k8s_config, + load_slurm_config, + resolve_default, +) +from schedulers.k8s import K8sScheduler +from schedulers.local import LocalScheduler +from schedulers.slurm import SlurmScheduler +from scripts import load_sweep_file, parse_sweep_point + +# Short alias for argparse default= expressions +_d = resolve_default + + +def _parse_args(argv: list[str] | None = None) -> argparse.Namespace: + # Load per-scheduler config files for defaults + k8s_cfg = load_k8s_config() + slurm_cfg = load_slurm_config() + + p = argparse.ArgumentParser( + description="Submit FlowSim profiling jobs to local, K8s, or Slurm.", + formatter_class=argparse.RawDescriptionHelpFormatter, + epilog=__doc__, + ) + + # -- Scheduler choice -- + p.add_argument( + "--scheduler", + choices=["local", "k8s", "slurm"], + required=True, + help="Scheduler backend.", + ) + + # -- Profiling workload (mirrors run_stage_profile.py) -- + wl = p.add_argument_group("workload") + 
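Both `manage.py` and this `_parse_args` rely on the same trick: peek at `--scheduler` with `parse_known_args`, then register only that backend's flags before the full parse. A minimal standalone sketch of the pattern; the `build_parser` name and demo flags are illustrative, not this module's API:

```python
import argparse


def build_parser(argv: list[str]) -> argparse.ArgumentParser:
    """Two-pass argparse: peek at --scheduler, then add backend flags."""
    # Pass 1: a tolerant pre-parser; parse_known_args ignores anything
    # it does not recognize, so backend-specific flags don't error here.
    pre = argparse.ArgumentParser(add_help=False)
    pre.add_argument("--scheduler", choices=["local", "k8s", "slurm"])
    peek, _unknown = pre.parse_known_args(argv)

    # Pass 2: the real parser carries only the chosen backend's options,
    # keeping --help output and error messages scoped to one scheduler.
    p = argparse.ArgumentParser(prog="demo")
    p.add_argument(
        "--scheduler", choices=["local", "k8s", "slurm"], required=True
    )
    if peek.scheduler == "k8s":
        p.add_argument("--k8s-namespace", default="default")
    elif peek.scheduler == "slurm":
        p.add_argument("--slurm-partition", default="gpu")
    return p


argv = ["--scheduler", "k8s", "--k8s-namespace", "prod"]
args = build_parser(argv).parse_args(argv)
print(args.k8s_namespace)  # prod
```

The payoff is that a Slurm user never sees (or collides with) K8s-only flags, at the cost of parsing the command line twice.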
wl.add_argument( + "--collect", + choices=["perf", "shapes", "all"], + required=True, + ) + wl.add_argument("--model-path", required=True, help="HF model path") + wl.add_argument("--tp", type=int, default=1) + wl.add_argument("--dp", type=int, default=1) + wl.add_argument("--bs", type=int, default=1, help="Batch size") + wl.add_argument("--input-len", type=int, default=2048) + wl.add_argument("--existing-ctx", type=int, default=0) + wl.add_argument("--decode-tokens", type=int, default=2) + wl.add_argument("--warmup-n", type=int, default=5) + wl.add_argument( + "--disable-chunked-prefill", + action="store_true", + ) + wl.add_argument("--max-prefill-tokens", type=int, default=131072) + wl.add_argument( + "--extra-server-opts", + default="", + help="Extra server options appended verbatim", + ) + wl.add_argument( + "--sweep", + type=str, + nargs="+", + default=[], + metavar="BS:INPUT_LEN:CTX", + help=( + "Profile multiple (bs, input_len, existing_ctx) points in one job. " + "Each value is a colon-separated tuple, e.g. --sweep 1:2048:0 4:8192:0. " + "Overrides --bs, --input-len, --existing-ctx." + ), + ) + wl.add_argument( + "--sweep-file", + type=str, + default="", + metavar="FILE", + help=( + "Read sweep points from a file (one BS:INPUT_LEN:CTX per line, " + "# comments allowed). Overrides --bs, --input-len, --existing-ctx." 
+ ), + ) + + # -- Infrastructure -- + infra = p.add_argument_group("infrastructure") + infra.add_argument("--image", default="flowsim-image:latest") + infra.add_argument( + "--gpus", + type=int, + default=1, + help="Total GPU count", + ) + infra.add_argument("--host", default="0.0.0.0") + infra.add_argument("--port", type=int, default=30001) + infra.add_argument("--output-dir", default="") + infra.add_argument("--job-name", default="") + + # -- Action -- + p.add_argument( + "--dry-run", + action="store_true", + help="[debug] Print rendered manifest without submitting", + ) + + # ---- Two-pass: peek at --scheduler, then add only relevant args ---- + # Use a minimal pre-parser to avoid required-arg errors during peek. + _pre = argparse.ArgumentParser(add_help=False) + _pre.add_argument("--scheduler", choices=["local", "k8s", "slurm"]) + pre, _ = _pre.parse_known_args(argv) + + if pre.scheduler == "local": + loc = p.add_argument_group("local options") + loc.add_argument( + "--local-gpus", + default="", + help="CUDA_VISIBLE_DEVICES for local execution (e.g. 
'0' or '0,1')", + ) + loc.add_argument( + "--local-workdir", + default="", + help="Working directory for local execution (default: FlowSim project root)", + ) + + elif pre.scheduler == "k8s": + k8s = p.add_argument_group( + "kubernetes options (config: ~/.flowsim/k8s.yaml)" + ) + k8s.add_argument( + "--k8s-namespace", + default=_d( + "FLOWSIM_K8S_NAMESPACE", k8s_cfg, "namespace", "default" + ), + help="K8s namespace (env: FLOWSIM_K8S_NAMESPACE)", + ) + k8s.add_argument( + "--k8s-kubeconfig", + default=_d("KUBECONFIG", k8s_cfg, "kubeconfig", ""), + help="Path to kubeconfig file (env: KUBECONFIG)", + ) + k8s.add_argument( + "--k8s-context", + default=_d("FLOWSIM_K8S_CONTEXT", k8s_cfg, "context", ""), + help="kubeconfig context (env: FLOWSIM_K8S_CONTEXT)", + ) + k8s.add_argument( + "--k8s-pvc", + default=cfg_get(k8s_cfg, "pvc", ""), + help="PVC name for output volume (omit for emptyDir)", + ) + k8s.add_argument( + "--k8s-host-output-dir", + default=cfg_get(k8s_cfg, "host_output_dir", ""), + help="hostPath for output (used when --k8s-pvc is empty)", + ) + k8s.add_argument( + "--k8s-node-selector", + action="append", + default=[], + metavar="KEY=VALUE", + help="Node selector labels (repeatable)", + ) + k8s.add_argument( + "--k8s-service-account", + default=cfg_get(k8s_cfg, "service_account", ""), + ) + k8s.add_argument( + "--k8s-shm-size", + default=cfg_get(k8s_cfg, "shm_size", "16Gi"), + ) + k8s.add_argument( + "--k8s-runtime-class", + default=cfg_get(k8s_cfg, "runtime_class_name", ""), + help="RuntimeClass for pod (e.g. 
'nvidia' for CDI mode)", + ) + + elif pre.scheduler == "slurm": + slurm = p.add_argument_group( + "slurm options (config: ~/.flowsim/slurm.yaml)" + ) + slurm.add_argument( + "--slurm-partition", + default=_d("FLOWSIM_SLURM_PARTITION", slurm_cfg, "partition", ""), + help="Slurm partition (env: FLOWSIM_SLURM_PARTITION)", + ) + slurm.add_argument( + "--slurm-time", + default=_d("FLOWSIM_SLURM_TIME", slurm_cfg, "time", "02:00:00"), + help="Wall time limit (env: FLOWSIM_SLURM_TIME)", + ) + slurm.add_argument( + "--slurm-account", + default=cfg_get(slurm_cfg, "account", ""), + ) + slurm.add_argument( + "--slurm-constraint", + default=cfg_get(slurm_cfg, "constraint", ""), + ) + slurm.add_argument( + "--slurm-container-runtime", + choices=["docker", "enroot", "none"], + default=cfg_get(slurm_cfg, "container_runtime", "none"), + ) + slurm.add_argument( + "--slurm-container-mounts", + default=cfg_get(slurm_cfg, "container_mounts", ""), + ) + # Modules from config (list) + CLI (append) + cfg_modules = ( + slurm_cfg.get("modules") + if isinstance(slurm_cfg.get("modules"), list) + else [] + ) + slurm.add_argument( + "--slurm-module", + action="append", + default=[str(m) for m in cfg_modules], + help="Modules to load (repeatable, merged with config)", + ) + slurm.add_argument( + "--slurm-extra-sbatch", + action="append", + default=[], + metavar="DIRECTIVE", + help="Extra #SBATCH directives (repeatable, without prefix)", + ) + slurm.add_argument( + "--slurm-cli-prefix", + default=cfg_get(slurm_cfg, "cli_prefix", ""), + help='Shell prefix for CLI mode (e.g. 
"docker exec -i slurmctld")', + ) + + return p.parse_args(argv) + + +def _parse_sweep_points(args) -> list[tuple[int, int, int]]: + """Resolve sweep points from --sweep / --sweep-file args.""" + if args.sweep and args.sweep_file: + sys.exit("Error: --sweep and --sweep-file are mutually exclusive") + try: + if args.sweep: + return [parse_sweep_point(s) for s in args.sweep] + if args.sweep_file: + return load_sweep_file(args.sweep_file) + except ValueError as e: + sys.exit(str(e)) + return [] + + +def _build_spec(args: argparse.Namespace) -> ProfileJobSpec: + sweep_points = _parse_sweep_points(args) + return ProfileJobSpec( + collect=args.collect, + model_path=args.model_path, + tp=args.tp, + dp=args.dp, + bs=args.bs, + input_len=args.input_len, + existing_ctx=args.existing_ctx, + decode_tokens=args.decode_tokens, + warmup_n=args.warmup_n, + disable_chunked_prefill=args.disable_chunked_prefill, + max_prefill_tokens=args.max_prefill_tokens, + image=args.image, + gpus=args.gpus, + host=args.host, + port=args.port, + output_dir=args.output_dir, + job_name=args.job_name, + extra_server_opts=args.extra_server_opts, + sweep_points=sweep_points, + ) + + +def _build_scheduler(args: argparse.Namespace): + if args.scheduler == "local": + return LocalScheduler( + gpus=args.local_gpus, + workdir=args.local_workdir, + ) + elif args.scheduler == "k8s": + node_sel = {} + for item in args.k8s_node_selector: + k, _, v = item.partition("=") + if not v: + sys.exit( + f"Bad --k8s-node-selector format: {item!r} (use KEY=VALUE)" + ) + node_sel[k] = v + return K8sScheduler( + namespace=args.k8s_namespace, + kubeconfig=args.k8s_kubeconfig, + context=args.k8s_context, + pvc_name=args.k8s_pvc, + host_output_dir=args.k8s_host_output_dir, + node_selector=node_sel, + service_account=args.k8s_service_account, + shm_size=args.k8s_shm_size, + runtime_class_name=args.k8s_runtime_class, + ) + else: + return SlurmScheduler( + partition=args.slurm_partition, + time_limit=args.slurm_time, + 
account=args.slurm_account,
+            constraint=args.slurm_constraint,
+            container_runtime=args.slurm_container_runtime,
+            container_mounts=args.slurm_container_mounts,
+            modules=args.slurm_module,
+            extra_sbatch=args.slurm_extra_sbatch,
+            cli_prefix=args.slurm_cli_prefix,
+        )
+
+
+def main(argv: list[str] | None = None) -> None:
+    args = _parse_args(argv)
+
+    # Smart default for output_dir, shared by all schedulers.
+    # Layout: stage_traces/{scheduler}/{timestamp}/
+    import time as _time
+
+    _ts = _time.strftime("%Y%m%d_%H%M%S")
+    if not args.output_dir:
+        args.output_dir = f"/flowsim/stage_traces/{args.scheduler}/{_ts}"
+
+    # Validate required connection params before submit
+    if not args.dry_run and args.scheduler != "local":
+        _validate_connection(args)
+
+    # For local scheduler, convert absolute host model_path to relative
+    # so it resolves correctly inside the container (workdir=/flowsim).
+    if args.scheduler == "local" and os.path.isabs(args.model_path):
+        # submit.py lives in scripts/cli/, so the project root is three
+        # directory levels up from this file, not two.
+        project_root = os.path.dirname(
+            os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
+        )
+        if args.model_path.startswith(project_root):
+            args.model_path = os.path.relpath(args.model_path, project_root)
+
+    spec = _build_spec(args)
+    scheduler = _build_scheduler(args)
+
+    if args.dry_run:
+        print(scheduler.dry_run(spec))
+    else:
+        result = scheduler.submit(spec)
+        print(result.message)
+
+        # Tell user where to find results
+        print()
+        print(f"Traces: {result.output_dir}")
+        print(f"Logs: {result.output_dir}/logs/")
+        job_id = result.job_id
+        sched = args.scheduler
+
+        if sched == "k8s":
+            if args.k8s_pvc:
+                print(f"  (persisted on PVC '{args.k8s_pvc}')")
+            else:
+                print(
+                    f"  (persisted at hostPath '{args.k8s_host_output_dir}' on the node)"
+                )
+            print(
+                f"\nTo check status: flowsim status --scheduler k8s --job {job_id}"
+            )
+            print(
+                f"To view logs: flowsim logs --scheduler k8s --job {job_id}"
+            )
+            print(
+                f"To follow logs: flowsim logs --scheduler k8s --job {job_id} --follow"
+            )
+            print(
+                f"To cancel: flowsim cancel --scheduler k8s --job {job_id}"
+            )
+        elif sched == "slurm":
+            print("  (on cluster shared filesystem)")
+            print(
+                f"\nTo check status: flowsim status --scheduler slurm --job {job_id}"
+            )
+            print(
+                f"To view logs: flowsim logs --scheduler slurm --job {job_id}"
+            )
+            print(
+                f"To cancel: flowsim cancel --scheduler slurm --job {job_id}"
+            )
+        else:
+            print(
+                f"\nTo view logs: flowsim logs --scheduler local --job {job_id}"
+            )
+        print(f"To list all jobs: flowsim list --scheduler {sched}")
+
+
+_INIT_HINT = "Run 'flowsim init' to create config files."
+
+
+def _validate_connection(args: argparse.Namespace) -> None:
+    """Fail fast if required cluster connection params are missing."""
+    if args.scheduler == "k8s":
+        if not args.k8s_namespace:
+            sys.exit(
+                "Error: K8s namespace not set.\n"
+                "Set it in ~/.flowsim/k8s.yaml, FLOWSIM_K8S_NAMESPACE env var,\n"
+                f"or --k8s-namespace flag. 
{_INIT_HINT}" + ) + # Traces + logs must survive pod termination + if not args.k8s_pvc and not args.k8s_host_output_dir: + sys.exit( + "Error: no persistent storage configured for K8s job output.\n" + "Traces and logs are written to output_dir inside the pod —\n" + "without a volume mount they are lost when the pod exits.\n\n" + "Set one of:\n" + " --k8s-pvc (PersistentVolumeClaim)\n" + " --k8s-host-output-dir (hostPath on the node)\n\n" + "Or configure in ~/.flowsim/k8s.yaml:\n" + " pvc: my-traces-pvc\n" + " # or\n" + " host_output_dir: /data/flowsim-traces" + ) + # kubeconfig is optional (in-cluster auto-discovery), but warn + if not args.k8s_kubeconfig and not args.k8s_context: + print( + "Note: no kubeconfig or context specified. " + "Will try ~/.kube/config and in-cluster auto-discovery.", + file=sys.stderr, + ) + elif args.scheduler == "slurm": + if not args.slurm_partition: + sys.exit( + "Error: missing required Slurm config:\n" + " - partition (--slurm-partition)\n\n" + f"Set it in ~/.flowsim/slurm.yaml or via CLI flag.\n" + + _INIT_HINT + ) + + +if __name__ == "__main__": + main() diff --git a/scripts/run_stage_profile.py b/scripts/run_stage_profile.py index 8346e3b..36505ec 100644 --- a/scripts/run_stage_profile.py +++ b/scripts/run_stage_profile.py @@ -61,14 +61,14 @@ python scripts/run_stage_profile.py \\ --collect perf \\ --host 0.0.0.0 --port 30001 \\ - --bs 1 --input-len 2048 --decode-tokens 32 \\ + --bs 1 --input-len 2048 --decode-tokens 2 \\ --output-dir /flowsim/stage_traces Example — with existing KV cache context python scripts/run_stage_profile.py \\ --collect perf \\ --host 0.0.0.0 --port 30001 \\ - --bs 4 --input-len 512 --existing-ctx 4096 --decode-tokens 32 \\ + --bs 4 --input-len 512 --existing-ctx 4096 --decode-tokens 2 \\ --output-dir /flowsim/stage_traces Example — launch server + full pipeline (perf → shapes) @@ -107,12 +107,13 @@ ) from utils.net import wait_for_port from utils.shape_merge import merge_shapes_dir +from scripts import 
load_sweep_file, parse_sweep_point # --------------------------------------------------------------------------- # Defaults # --------------------------------------------------------------------------- DEFAULT_WARMUP_N = 5 -DEFAULT_DECODE_TOKENS = 32 +DEFAULT_DECODE_TOKENS = 2 DEFAULT_MAX_PREFILL_TOKENS = 131072 @@ -700,6 +701,31 @@ def parse_args(argv: Optional[list] = None) -> argparse.Namespace: default="/flowsim/stage_traces", help="Root directory for trace output", ) + + sweep = p.add_argument_group("sweep (multi-point profiling)") + sweep.add_argument( + "--sweep", + type=str, + nargs="+", + default=[], + metavar="BS:INPUT_LEN:CTX", + help=( + "Profile multiple (bs, input_len, existing_ctx) points in one job. " + "Each value is a colon-separated tuple, e.g. --sweep 1:2048:0 4:8192:0 16:2048:4096. " + "Overrides --bs, --input-len, --existing-ctx." + ), + ) + sweep.add_argument( + "--sweep-file", + type=str, + default="", + metavar="FILE", + help=( + "Read sweep points from a file (one BS:INPUT_LEN:CTX per line, " + "# comments allowed). Overrides --bs, --input-len, --existing-ctx." 
+ ), + ) + srv = p.add_argument_group("server launch (optional)") srv.add_argument( "--launch-server", @@ -714,13 +740,27 @@ def parse_args(argv: Optional[list] = None) -> argparse.Namespace: ) srv.add_argument( "--log-dir", - default="/flowsim/tests/test-artifacts", - help="Directory for server logs", + default="", + help="Directory for server logs (default: {output-dir}/logs/)", ) return p.parse_args(argv) +def _load_sweep_points(args) -> list[tuple[int, int, int]]: + """Resolve sweep points from --sweep, --sweep-file, or single-point args.""" + if args.sweep and args.sweep_file: + print("[ERROR] --sweep and --sweep-file are mutually exclusive") + raise SystemExit(1) + + if args.sweep: + return [parse_sweep_point(s) for s in args.sweep] + if args.sweep_file: + return load_sweep_file(args.sweep_file) + # Single-point from --bs / --input-len / --existing-ctx + return [(args.bs, args.input_len, args.existing_ctx)] + + # --------------------------------------------------------------------------- # Phase runners # --------------------------------------------------------------------------- @@ -759,11 +799,20 @@ def _start_server( return proc -def _run_perf(args, summary: list[dict]) -> int: +def _run_perf( + args, + summary: list[dict], + *, + bs: Optional[int] = None, + input_len: Optional[int] = None, + existing_ctx: Optional[int] = None, +) -> int: """Collect traces for a single (bs, input_len, existing_ctx, decode_tokens) point.""" - bs = args.bs - input_len = args.input_len - existing_ctx = args.existing_ctx + bs = bs if bs is not None else args.bs + input_len = input_len if input_len is not None else args.input_len + existing_ctx = ( + existing_ctx if existing_ctx is not None else args.existing_ctx + ) tag = f"bs{bs}_input{input_len}_ctx{existing_ctx}" sub_dir = os.path.join(args.output_dir, tag) @@ -873,6 +922,10 @@ def _write_summary(args, summary: list[dict]) -> None: def main(argv: Optional[list] = None) -> int: args = parse_args(argv) + # Default log_dir to 
{output_dir}/logs/ if not specified + if not args.log_dir: + args.log_dir = os.path.join(args.output_dir, "logs") + if args.decode_tokens < 2: print( "[ERROR] --decode-tokens must be >= 2. " @@ -883,6 +936,14 @@ def main(argv: Optional[list] = None) -> int: server_proc = None summary: list[dict] = [] + sweep_points = _load_sweep_points(args) + is_sweep = len(sweep_points) > 1 + + if is_sweep: + print(f"\n[sweep] {len(sweep_points)} points to profile:") + for i, (bs, il, ctx) in enumerate(sweep_points): + print(f" [{i+1}] bs={bs} input_len={il} existing_ctx={ctx}") + print() try: # ================================================================== @@ -904,7 +965,10 @@ def main(argv: Optional[list] = None) -> int: print(" PHASE 1 / 2 : PERF COLLECTION") print("=" * 60 + "\n") server_proc = _start_server(args, disable_cuda_graph=False) - _run_perf(args, summary) + for idx, (bs, il, ctx) in enumerate(sweep_points): + if is_sweep: + print(f"\n[sweep] Point {idx+1}/{len(sweep_points)}") + _run_perf(args, summary, bs=bs, input_len=il, existing_ctx=ctx) _write_summary(args, summary) print("\n[server] Shutting down for shape pass …") kill_server(server_proc) @@ -925,7 +989,10 @@ def main(argv: Optional[list] = None) -> int: if args.collect == "perf": if args.launch_server: server_proc = _start_server(args, disable_cuda_graph=False) - _run_perf(args, summary) + for idx, (bs, il, ctx) in enumerate(sweep_points): + if is_sweep: + print(f"\n[sweep] Point {idx+1}/{len(sweep_points)}") + _run_perf(args, summary, bs=bs, input_len=il, existing_ctx=ctx) _write_summary(args, summary) return 0 diff --git a/simulator/base_parser.py b/simulator/base_parser.py index ca9cadb..2b77967 100644 --- a/simulator/base_parser.py +++ b/simulator/base_parser.py @@ -319,12 +319,12 @@ def _parse_events(self) -> list[tuple]: else: # Case 2: If no ext_id, we need to find the shape from user annotations # Key Identification Methodology: Annotation is overlapped with kernel + dims_anno = "N/A" + 
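This `base_parser.py` hunk hoists the `dims_anno` / `input_type_anno` / `desc_anno` initialization out of the annotation loop: initialized per iteration, a match found while scanning one annotation was reset before later iterations finished. A toy sketch of the corrected loop-carried state, using invented event dicts rather than the parser's real trace schema:

```python
# Loop-carried "best match" fields are initialized once, before the
# scan, so a hit in one iteration is not wiped out by the next.
kernel = {"ts": 100, "dur": 50}  # kernel occupies [100, 150)
annotation_events = [
    {"name": "ProfilerStep#1", "ts": 90, "dur": 200},
    {"name": "gemm", "ts": 110, "dur": 20, "dims": "[4096,4096]"},
    {"name": "unrelated", "ts": 400, "dur": 10},
]

dims_anno = "N/A"  # hoisted out of the loop, as in the hunk
for anno in annotation_events:
    if "ProfilerStep" in anno.get("name", ""):
        continue  # skip framework step markers, as the parser does
    start = anno.get("ts", 0)
    end = start + anno.get("dur", 0)
    # Keep the annotation whose time window overlaps the kernel's
    if start < kernel["ts"] + kernel["dur"] and end > kernel["ts"]:
        dims_anno = anno.get("dims", "N/A")

print(dims_anno)  # [4096,4096]
```

With the initialization inside the loop, the non-overlapping `"unrelated"` iteration would have started from `"N/A"` again, and any match consumed on a later pass would be lost.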
input_type_anno = "N/A" + desc_anno = "" for anno_idx, anno in enumerate(annotation_events): if anno_idx in used_annotations: continue - dims_anno = "N/A" - input_type_anno = "N/A" - desc_anno = "" if "ProfilerStep" in anno.get("name", ""): continue anno_start = anno.get("ts", 0) diff --git a/tests/integration/infra/cgroup.conf b/tests/integration/infra/cgroup.conf new file mode 100644 index 0000000..68de2cc --- /dev/null +++ b/tests/integration/infra/cgroup.conf @@ -0,0 +1,3 @@ +# cgroup.conf — use cgroup v1 (only v1 plugin available; v2 host is compatible +# via the unified/hybrid hierarchy mount) +CgroupPlugin=cgroup/v1 diff --git a/tests/integration/infra/dev-setup.sh b/tests/integration/infra/dev-setup.sh new file mode 100755 index 0000000..02e447f --- /dev/null +++ b/tests/integration/infra/dev-setup.sh @@ -0,0 +1,363 @@ +#!/usr/bin/env bash +# dev-setup.sh — one-shot setup for FlowSim test clusters (kind + Slurm) +# +# Usage: +# ./tests/integration/infra/dev-setup.sh # setup both kind + slurm +# ./tests/integration/infra/dev-setup.sh kind # kind only +# ./tests/integration/infra/dev-setup.sh slurm # slurm only +# +# Teardown: +# ./tests/integration/infra/dev-teardown.sh + +set -euo pipefail + +SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)" +KIND_VERSION="v0.27.0" +KIND_CLUSTER_NAME="flowsim" +KIND_WORKERS=("${KIND_CLUSTER_NAME}-worker") +KUBECTL_STABLE_URL="https://dl.k8s.io/release/stable.txt" +HELM_INSTALL_URL="https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3" +NVIDIA_CTK_KEYRING="/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg" + +log() { printf "\033[1;32m[setup]\033[0m %s\n" "$*"; } +warn() { printf "\033[1;33m[setup]\033[0m %s\n" "$*"; } +err() { printf "\033[1;31m[setup]\033[0m %s\n" "$*" >&2; exit 1; } + +# ---------------------------------------------------------------- +# Dependency checks & auto-install +# ---------------------------------------------------------------- +ensure_docker() { + command -v docker >/dev/null || 
err "Docker is required but not installed."
+    docker info >/dev/null 2>&1 || err "Docker daemon not running."
+    log "Docker: $(docker --version)"
+}
+
+ensure_kind() {
+    if command -v kind >/dev/null; then
+        log "kind already installed: $(kind version)"
+        return
+    fi
+    log "Installing kind ${KIND_VERSION}..."
+    curl -fsSLo /tmp/kind "https://kind.sigs.k8s.io/dl/${KIND_VERSION}/kind-linux-amd64"
+    chmod +x /tmp/kind
+    sudo mv /tmp/kind /usr/local/bin/kind
+    log "kind installed: $(kind version)"
+}
+
+ensure_kubectl() {
+    if command -v kubectl >/dev/null; then
+        log "kubectl already installed"
+        return
+    fi
+    log "Installing kubectl..."
+    local ver
+    ver="$(curl -fsSL "${KUBECTL_STABLE_URL}")"
+    curl -fsSLo /tmp/kubectl "https://dl.k8s.io/release/${ver}/bin/linux/amd64/kubectl"
+    chmod +x /tmp/kubectl
+    sudo mv /tmp/kubectl /usr/local/bin/kubectl
+    # NB: `--short` was removed in kubectl 1.28, and this script installs
+    # the latest stable release, so plain `version --client` is used.
+    log "kubectl installed: $(kubectl version --client 2>/dev/null | head -1 || true)"
+}
+
+# ----------------------------------------------------------------
+# Kind cluster with NVIDIA GPU via CDI
+# (Official approach from NVIDIA k8s-device-plugin demo)
+# https://github.com/NVIDIA/k8s-device-plugin/tree/main/demo/clusters/kind
+# ----------------------------------------------------------------
+ensure_nvidia_runtime() {
+    # Docker must use nvidia as default runtime so Kind node containers get GPU access
+    command -v nvidia-ctk >/dev/null || err "nvidia-container-toolkit is required (nvidia-ctk not found)."
+    command -v nvidia-smi >/dev/null || err "NVIDIA driver not found (nvidia-smi missing)."
+    log "nvidia-ctk: $(nvidia-ctk --version 2>&1 | head -1)"
+
+    if ! docker info 2>/dev/null | grep -q "Default Runtime: nvidia"; then
+        log "Setting nvidia as default Docker runtime..."
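The default-runtime check above shells out to `docker info` and greps. The same detection can be done in Python if the FlowSim CLI ever needs it; this is a sketch against a canned sample of `docker info` text, not an API this patch adds:

```python
def default_runtime(docker_info: str) -> str:
    """Extract 'Default Runtime: X' from `docker info` text output.

    Mirrors the grep in dev-setup.sh; returns "" when the field is
    absent. The field is indented under the Server section, so each
    line is stripped before matching.
    """
    for line in docker_info.splitlines():
        line = line.strip()
        if line.startswith("Default Runtime:"):
            return line.split(":", 1)[1].strip()
    return ""


# Representative sample, not captured from a real daemon.
sample = """\
Server:
 Runtimes: io.containerd.runc.v2 nvidia runc
 Default Runtime: nvidia
 Init Binary: docker-init
"""
print(default_runtime(sample))  # nvidia
```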
+ sudo nvidia-ctk runtime configure --runtime=docker --set-as-default + sudo systemctl restart docker + log "Docker restarted with nvidia runtime as default" + else + log "Docker already using nvidia as default runtime" + fi + + # Required: accept-nvidia-visible-devices-as-volume-mounts must be true + # for Kind GPU passthrough via /var/run/nvidia-container-devices/all + local cfg="/etc/nvidia-container-runtime/config.toml" + if grep -qE '^\s*accept-nvidia-visible-devices-as-volume-mounts\s*=\s*true' "$cfg" 2>/dev/null; then + log "accept-nvidia-visible-devices-as-volume-mounts already enabled" + else + log "Enabling accept-nvidia-visible-devices-as-volume-mounts in $cfg..." + if grep -qE '#?\s*accept-nvidia-visible-devices-as-volume-mounts' "$cfg" 2>/dev/null; then + sudo sed -i 's/#*\s*accept-nvidia-visible-devices-as-volume-mounts.*/accept-nvidia-visible-devices-as-volume-mounts = true/' "$cfg" + else + echo 'accept-nvidia-visible-devices-as-volume-mounts = true' | sudo tee -a "$cfg" >/dev/null + fi + sudo systemctl restart docker + log "Host nvidia-container-runtime config updated and Docker restarted" + fi +} + +ensure_helm() { + if command -v helm >/dev/null; then + log "helm already installed: $(helm version --short 2>/dev/null)" + return + fi + log "Installing helm..." + curl -fsSL "${HELM_INSTALL_URL}" | bash + log "helm installed: $(helm version --short)" +} + +setup_kind() { + ensure_docker + ensure_nvidia_runtime + ensure_kind + ensure_kubectl + ensure_helm + + if kind get clusters 2>/dev/null | grep -q "^${KIND_CLUSTER_NAME}$"; then + warn "kind cluster '${KIND_CLUSTER_NAME}' already exists, skipping creation" + else + log "Creating kind cluster '${KIND_CLUSTER_NAME}' (1 control-plane + 1 GPU worker)..." 
+ kind create cluster --name "${KIND_CLUSTER_NAME}" \ + --config "${SCRIPT_DIR}/kind-multi-node.yaml" + fi + + # ── Post-creation: configure GPU support inside each worker node ── + for worker in "${KIND_WORKERS[@]}"; do + log "=== Configuring ${worker} ===" + + # Step 1: Unmount masked /proc/driver/nvidia + log "Unmounting /proc/driver/nvidia in ${worker}..." + docker exec "${worker}" umount -R /proc/driver/nvidia 2>/dev/null || true + + # Step 2: Install nvidia-container-toolkit inside the worker node + log "Installing nvidia-container-toolkit inside ${worker}..." + docker exec "${worker}" bash -c "apt-get update && apt-get install -y gpg" + docker exec "${worker}" bash -c "\ + curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \ + | gpg --dearmor -o ${NVIDIA_CTK_KEYRING} \ + && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \ + | sed 's#deb https://#deb [signed-by=${NVIDIA_CTK_KEYRING}] https://#g' \ + | tee /etc/apt/sources.list.d/nvidia-container-toolkit.list \ + && apt-get update \ + && apt-get install -y nvidia-container-toolkit" + + # Step 3: Configure CDI mode in containerd inside worker + log "Configuring CDI mode for containerd in ${worker}..." + docker exec "${worker}" bash -c "\ + nvidia-ctk config --set nvidia-container-runtime.modes.cdi.annotation-prefixes=nvidia.cdi.k8s.io/ \ + && nvidia-ctk runtime configure --runtime=containerd --cdi.enabled --config-source=command \ + && systemctl restart containerd" + + # Step 4: Label worker node for GPU presence + kubectl --context "kind-${KIND_CLUSTER_NAME}" label node "${worker}" \ + --overwrite nvidia.com/gpu.present=true + done + + # Step 5: Create nvidia RuntimeClass + log "Creating nvidia RuntimeClass..." 
+  kubectl --context "kind-${KIND_CLUSTER_NAME}" apply -f - <<'RTEOF'
+apiVersion: node.k8s.io/v1
+kind: RuntimeClass
+metadata:
+  name: nvidia
+handler: nvidia
+RTEOF
+
+  # Step 6: Deploy per-node NVIDIA device plugin DaemonSets
+  # Each worker gets its own DaemonSet with a specific NVIDIA_VISIBLE_DEVICES
+  # so the device plugin only discovers/advertises that worker's assigned GPU.
+  # (Helm's single DaemonSet can't set different env per node.)
+  log "Deploying NVIDIA device plugin (per-node GPU assignment)..."
+  local CTX="kind-${KIND_CLUSTER_NAME}"
+  local PLUGIN_IMAGE="nvcr.io/nvidia/k8s-device-plugin:v0.17.1"
+  local gpu_idx=0
+  for worker in "${KIND_WORKERS[@]}"; do
+    local ds_name="nvidia-device-plugin-${worker##*-}"  # e.g. nvidia-device-plugin-worker
+    kubectl --context "$CTX" apply -f - <<DPEOF
+apiVersion: v1
+kind: Namespace
+metadata:
+  name: nvidia-device-plugin
+---
+apiVersion: apps/v1
+kind: DaemonSet
+metadata:
+  name: ${ds_name}
+  namespace: nvidia-device-plugin
+spec:
+  selector:
+    matchLabels:
+      name: ${ds_name}
+  template:
+    metadata:
+      labels:
+        name: ${ds_name}
+    spec:
+      nodeSelector:
+        kubernetes.io/hostname: ${worker}
+      containers:
+        - name: nvidia-device-plugin-ctr
+          image: ${PLUGIN_IMAGE}
+          env:
+            - name: NVIDIA_VISIBLE_DEVICES
+              value: "${gpu_idx}"
+          securityContext:
+            privileged: true
+          volumeMounts:
+            - name: device-plugin
+              mountPath: /var/lib/kubelet/device-plugins
+      volumes:
+        - name: device-plugin
+          hostPath:
+            path: /var/lib/kubelet/device-plugins
+DPEOF
+    gpu_idx=$((gpu_idx + 1))
+  done
+
+  # Step 8: Preload flowsim image into each worker's containerd
+  # (kind nodes cannot pull images from the host Docker daemon)
+  local FLOWSIM_IMAGE="flowsim-image:latest"
+  if docker image inspect "${FLOWSIM_IMAGE}" >/dev/null 2>&1; then
+    for worker in "${KIND_WORKERS[@]}"; do
+      if docker exec "${worker}" crictl images 2>/dev/null | grep -q "flowsim-image.*latest"; then
+        log "${FLOWSIM_IMAGE} already loaded in ${worker}, skipping"
+      else
+        log "Loading ${FLOWSIM_IMAGE} into ${worker} (~34GB, may take several minutes)..."
+        if command -v pv >/dev/null; then
+          docker save "${FLOWSIM_IMAGE}" | pv -f -a -b | \
+            docker exec -i "${worker}" ctr -n k8s.io images import -
+        else
+          docker save "${FLOWSIM_IMAGE}" | \
+            docker exec -i "${worker}" ctr -n k8s.io images import -
+        fi
+        log "${FLOWSIM_IMAGE} loaded into ${worker}"
+      fi
+    done
+  else
+    warn "${FLOWSIM_IMAGE} not found on host, skipping image load (build it first)"
+  fi
+
+  # Step 9: Wait for GPU resources
+  log "Waiting for nvidia.com/gpu resources to appear (up to 180s)..."
+  local gpu_retries=36
+  while true; do
+    gpu_count=$(kubectl --context "kind-${KIND_CLUSTER_NAME}" get nodes \
+      -o jsonpath='{range .items[*]}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}' 2>/dev/null \
+      | grep -cE '^[1-9]' || true)
+    if [ "${gpu_count}" -ge 1 ]; then
+      log "GPUs registered on ${gpu_count} node(s)"
+      break
+    fi
+    gpu_retries=$((gpu_retries - 1))
+    if [ "${gpu_retries}" -le 0 ]; then
+      warn "GPUs not registered after 180s — debugging info:"
+      kubectl --context "kind-${KIND_CLUSTER_NAME}" get pods -n nvidia-device-plugin -o wide 2>/dev/null || true
+      kubectl --context "kind-${KIND_CLUSTER_NAME}" describe nodes 2>/dev/null | grep -A5 "Allocatable" || true
+      break
+    fi
+    sleep 5
+  done
+
+  # Step 10: Init FlowSim K8s config
+  log "Initializing FlowSim K8s config..."
+  flowsim init k8s \
+    --kubeconfig "${HOME}/.kube/config" \
+    --context "kind-${KIND_CLUSTER_NAME}" \
+    --namespace default \
+    --host-output-dir /tmp/flowsim-traces \
+    --runtime-class-name nvidia \
+    --force
+
+  log "Cluster nodes:"
+  kubectl --context "kind-${KIND_CLUSTER_NAME}" get nodes -o wide
+  echo
+
+  log "GPU resources:"
+  kubectl --context "kind-${KIND_CLUSTER_NAME}" get nodes \
+    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}' 2>/dev/null || true
+  echo
+
+  log "Kind cluster with GPU (CDI mode) ready."
+}
+
+# ----------------------------------------------------------------
+# Slurm cluster (docker compose)
+# ----------------------------------------------------------------
+setup_slurm() {
+  ensure_docker
+
+  if ! docker compose version >/dev/null 2>&1; then
+    err "docker compose v2 is required but not available."
+  fi
+
+  # HOST_WORKSPACE is used by slurm-compose.yaml for the read-only /workspace mount.
+  REPO_ROOT="$(cd "${SCRIPT_DIR}/../../.." && pwd)"
+  export HOST_WORKSPACE="${HOST_WORKSPACE:-$(dirname "${REPO_ROOT}")}"
+
+  log "Building and starting Slurm cluster (slurmctld + 1 slurmd)..."
+  log "  HOST_WORKSPACE=${HOST_WORKSPACE}"
+  docker compose -f "${SCRIPT_DIR}/slurm-compose.yaml" up -d --build
+
+  log "Waiting for slurmctld to become ready..."
+  local retries=30
+  while ! docker exec slurmctld sinfo >/dev/null 2>&1; do
+    retries=$((retries - 1))
+    if [ "${retries}" -le 0 ]; then
+      err "slurmctld did not become ready in time"
+    fi
+    sleep 2
+  done
+
+  log "Slurm cluster status:"
+  docker exec slurmctld sinfo
+  echo
+
+  log "Initializing FlowSim Slurm config..."
+  flowsim init slurm \
+    --rest-url "http://localhost:6820" \
+    --partition normal \
+    --account default \
+    --jwt-token-cmd "docker exec slurmctld scontrol token lifespan=3600" \
+    --force
+  echo
+  log "Slurm cluster ready. Test with:"
+  log "  flowsim submit --scheduler slurm --collect perf --model-path <model> --dry-run"
+}
+
+# ----------------------------------------------------------------
+# Main
+# ----------------------------------------------------------------
+target="${1:-all}"
+
+case "${target}" in
+  kind)
+    setup_kind
+    ;;
+  slurm)
+    setup_slurm
+    ;;
+  all)
+    setup_kind
+    echo
+    setup_slurm
+    ;;
+  *)
+    echo "Usage: $0 [kind|slurm|all]"
+    exit 1
+    ;;
+esac
+
+echo
+log "All done. 
Teardown with: ./tests/integration/infra/dev-teardown.sh" diff --git a/tests/integration/infra/dev-teardown.sh b/tests/integration/infra/dev-teardown.sh new file mode 100755 index 0000000..c5e74ee --- /dev/null +++ b/tests/integration/infra/dev-teardown.sh @@ -0,0 +1,48 @@ +#!/usr/bin/env bash +# dev-teardown.sh — tear down FlowSim test clusters +# +# Usage: +# ./tests/integration/infra/dev-teardown.sh # teardown both +# ./tests/integration/infra/dev-teardown.sh kind # kind only +# ./tests/integration/infra/dev-teardown.sh slurm # slurm only + +set -euo pipefail + +SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)" +KIND_CLUSTER_NAME="flowsim" + +log() { printf "\033[1;32m[teardown]\033[0m %s\n" "$*"; } +warn() { printf "\033[1;33m[teardown]\033[0m %s\n" "$*"; } + +teardown_kind() { + # Delete device plugin namespace (contains per-node DaemonSets) + if command -v kubectl >/dev/null; then + kubectl delete namespace nvidia-device-plugin --ignore-not-found 2>/dev/null || true + fi + if command -v kind >/dev/null && kind get clusters 2>/dev/null | grep -q "^${KIND_CLUSTER_NAME}$"; then + log "Deleting kind cluster '${KIND_CLUSTER_NAME}'..." + kind delete cluster --name "${KIND_CLUSTER_NAME}" + else + warn "kind cluster '${KIND_CLUSTER_NAME}' not found, skipping" + fi +} + +teardown_slurm() { + if docker compose -f "${SCRIPT_DIR}/slurm-compose.yaml" ps --quiet 2>/dev/null | head -1 | grep -q .; then + log "Stopping Slurm containers..." + docker compose -f "${SCRIPT_DIR}/slurm-compose.yaml" down -v + else + warn "Slurm containers not running, skipping" + fi +} + +target="${1:-all}" + +case "${target}" in + kind) teardown_kind ;; + slurm) teardown_slurm ;; + all) teardown_kind; teardown_slurm ;; + *) echo "Usage: $0 [kind|slurm|all]"; exit 1 ;; +esac + +log "Done." 
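Both setup scripts above rely on the same retry-with-deadline pattern: poll a readiness command (`sinfo` in the controller container, the munge socket check, the `nvidia.com/gpu` allocatable query) until it succeeds or a timeout expires. A minimal Python sketch of that pattern — the `wait_until` helper is hypothetical, not part of the repo:

```python
import subprocess
import time


def wait_until(cmd, timeout_s=60.0, interval_s=2.0):
    """Poll ``cmd`` (an argv list) until it exits 0 or the deadline passes.

    Returns True on success, False on timeout — the Python analogue of the
    ``while ! docker exec slurmctld sinfo; do sleep 2; done`` loops in
    dev-setup.sh.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        # capture_output=True keeps the probe's stdout/stderr off the console
        result = subprocess.run(cmd, capture_output=True)
        if result.returncode == 0:
            return True
        time.sleep(interval_s)
    return False


if __name__ == "__main__":
    # "true" exits 0 immediately; "false" never succeeds within the deadline.
    print(wait_until(["true"], timeout_s=5))                    # -> True
    print(wait_until(["false"], timeout_s=1, interval_s=0.2))   # -> False
```

Bounding every wait by a deadline (rather than looping forever) is what lets the setup scripts fail fast with a diagnostic instead of hanging CI when a daemon never comes up.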
diff --git a/tests/integration/infra/gres.conf b/tests/integration/infra/gres.conf new file mode 100644 index 0000000..745eeac --- /dev/null +++ b/tests/integration/infra/gres.conf @@ -0,0 +1,3 @@ +# Slurm GRES config — explicit GPU definition (AutoDetect=nvml requires +# cgroup v2 which is not available; define GPU manually) +Name=gpu Type=nvidia File=/dev/nvidia0 Count=1 diff --git a/tests/integration/infra/kind-multi-node.yaml b/tests/integration/infra/kind-multi-node.yaml new file mode 100644 index 0000000..7ec1b68 --- /dev/null +++ b/tests/integration/infra/kind-multi-node.yaml @@ -0,0 +1,37 @@ +# Kind cluster config — 1 control-plane + 1 GPU worker node +# +# GPU support via CDI mode (NVIDIA k8s-device-plugin official approach). +# See: https://github.com/NVIDIA/k8s-device-plugin/tree/main/demo/clusters/kind +# +# The single worker binds GPU 0. Change the containerPath index to +# assign a different GPU. +# +# Pre-requisites (host): +# - Docker with nvidia as default runtime +# - accept-nvidia-visible-devices-as-volume-mounts = true +# in /etc/nvidia-container-runtime/config.toml +# - kind, kubectl, helm +# +# Usage: +# ./tests/integration/infra/dev-setup.sh kind +# +# Teardown: +# ./tests/integration/infra/dev-teardown.sh kind + +kind: Cluster +apiVersion: kind.x-k8s.io/v1alpha4 + +nodes: + - role: control-plane + + # Worker — GPU 0 only + - role: worker + extraMounts: + - hostPath: /dev/null + containerPath: /var/run/nvidia-container-devices/0 + - hostPath: /path/to/host/workspace + containerPath: /workspace + readOnly: true + # Writable mount so K8s pods can write traces directly to host + - hostPath: /path/to/host/stage_traces + containerPath: /host-stage-traces diff --git a/tests/integration/infra/slurm-compose.yaml b/tests/integration/infra/slurm-compose.yaml new file mode 100644 index 0000000..b9ba09a --- /dev/null +++ b/tests/integration/infra/slurm-compose.yaml @@ -0,0 +1,141 @@ +# Slurm test cluster — slurmctld + 1 compute node (GPU 0) +# +# 
Requires HOST_WORKSPACE env var pointing to the directory containing +# model weights (mounted read-only into containers as /workspace). +# +# Usage: +# export HOST_WORKSPACE=/path/to/workspace +# cd tests/integration/infra/ +# docker compose -f slurm-compose.yaml up -d +# +# # Wait for cluster to be ready (~30s) +# docker exec slurmctld sinfo +# +# # Get JWT token for REST API +# docker exec slurmctld scontrol token lifespan=3600 +# +# # Init FlowSim +# flowsim init slurm --rest-url http://localhost:6820 \ +# --partition normal --account default \ +# --jwt-token-cmd "docker exec slurmctld scontrol token lifespan=3600" \ +# --force +# +# # Submit a job +# flowsim submit --scheduler slurm --collect perf \ +# --model-path /models/Qwen-7B --gpus 1 +# +# # Teardown +# docker compose -f slurm-compose.yaml down -v +# # Or from project root: +# docker compose -f tests/integration/infra/slurm-compose.yaml down -v + +x-slurm-base: &slurm-base + build: + context: . + dockerfile: slurm-node.dockerfile + volumes: + - slurm-etc:/etc/slurm + - munge-socket:/run/munge + # Share workspace for model weights / traces + - ${HOST_WORKSPACE:?set HOST_WORKSPACE to the directory containing model weights}:/workspace:ro + networks: + - slurm-net + +services: + # ---- Munge (shared auth daemon) ---- + munge: + <<: *slurm-base + container_name: munge + hostname: munge + command: > + bash -c " + if [ ! 
-f /etc/munge/munge.key ]; then + mungekey --create --force + fi + chown munge:munge /etc/munge/munge.key + chmod 400 /etc/munge/munge.key + mkdir -p /run/munge + chown munge:munge /run/munge + chmod 755 /run/munge + gosu munge munged --foreground + " + volumes: + - munge-key:/etc/munge + - munge-socket:/run/munge + + # ---- Controller ---- + slurmctld: + <<: *slurm-base + container_name: slurmctld + hostname: slurmctld + command: > + bash -c " + mkdir -p /run/munge && chown munge:munge /run/munge + until [ -S /run/munge/munge.socket.2 ]; do sleep 0.5; done + slurmctld -D -vvv + " + depends_on: + - munge + volumes: + - slurm-etc:/etc/slurm + - munge-key:/etc/munge:ro + - munge-socket:/run/munge + - slurm-state:/var/spool/slurmctld + + # ---- Compute node 0 (GPU 0) ---- + slurmd-0: + <<: *slurm-base + container_name: slurmd-0 + hostname: slurmd-0 + runtime: nvidia + environment: + NVIDIA_VISIBLE_DEVICES: "0" + command: > + bash -c " + mkdir -p /run/munge && chown munge:munge /run/munge + until [ -S /run/munge/munge.socket.2 ]; do sleep 0.5; done + slurmd -D -vvv + " + depends_on: + - slurmctld + volumes: + - slurm-etc:/etc/slurm:ro + - munge-key:/etc/munge:ro + - munge-socket:/run/munge + - ${HOST_WORKSPACE:?set HOST_WORKSPACE}:/workspace:ro + # Writable mount so traces appear on host + - ../../../stage_traces:/flowsim/stage_traces + # Cgroup needed by slurmd + - /sys/fs/cgroup:/sys/fs/cgroup:rw + + # ---- REST API (optional, for REST mode) ---- + # slurmrestd: + # <<: *slurm-base + # container_name: slurmrestd + # hostname: slurmrestd + # command: > + # bash -c " + # mkdir -p /run/munge && chown munge:munge /run/munge + # until [ -S /run/munge/munge.socket.2 ]; do sleep 0.5; done + # gosu slurm slurmrestd -a rest_auth/jwt 0.0.0.0:6820 -vvv -s slurmctld + # " + # depends_on: + # - slurmctld + # ports: + # - "6820:6820" + # cap_add: + # - SYS_ADMIN + # volumes: + # - slurm-etc:/etc/slurm:ro + # - munge-key:/etc/munge:ro + # - munge-socket:/run/munge + +volumes: + 
slurm-etc: + slurm-state: + munge-key: + munge-socket: + +networks: + slurm-net: + driver: bridge diff --git a/tests/integration/infra/slurm-node.dockerfile b/tests/integration/infra/slurm-node.dockerfile new file mode 100644 index 0000000..8b79db0 --- /dev/null +++ b/tests/integration/infra/slurm-node.dockerfile @@ -0,0 +1,55 @@ +# Slurm node image — controller, compute, and REST API +# +# Based on flowsim-image so compute nodes have the full Python/sglang +# environment. Slurm 23.11 is compiled on top with JWT + NVML GRES. +# Used by slurm-compose.yaml. + +FROM flowsim-image:latest + +ENV DEBIAN_FRONTEND=noninteractive + +# Slurm build dependencies + munge +RUN apt-get update && apt-get install -y --no-install-recommends \ + gosu \ + libhttp-parser-dev \ + libjson-c-dev \ + libjwt-dev \ + libmunge-dev \ + munge \ + && rm -rf /var/lib/apt/lists/* + +# Install Slurm 23.11 from source (slurmrestd + JWT auth + NVML GRES) +ARG SLURM_VERSION=23.11.10 +RUN cd /tmp && \ + wget -q https://download.schedmd.com/slurm/slurm-${SLURM_VERSION}.tar.bz2 && \ + tar xjf slurm-${SLURM_VERSION}.tar.bz2 && \ + cd slurm-${SLURM_VERSION} && \ + ./configure \ + --prefix=/usr \ + --sysconfdir=/etc/slurm \ + --with-jwt \ + --with-http-parser \ + --with-json \ + --with-nvml \ + --enable-slurmrestd && \ + make -j"$(nproc)" && \ + make install && \ + rm -rf /tmp/slurm-* + +# Create required directories and users +RUN useradd -r -s /sbin/nologin slurm 2>/dev/null || true && \ + mkdir -p /etc/slurm /var/spool/slurmctld /var/spool/slurmd /var/log/slurm && \ + chown slurm:slurm /var/spool/slurmctld /var/spool/slurmd /var/log/slurm + +# Slurm config +COPY slurm.conf /etc/slurm/slurm.conf +COPY gres.conf /etc/slurm/gres.conf +COPY cgroup.conf /etc/slurm/cgroup.conf + +# JWT key for REST API auth +RUN dd if=/dev/urandom bs=32 count=1 2>/dev/null | base64 > /etc/slurm/jwt_hs256.key && \ + chown slurm:slurm /etc/slurm/jwt_hs256.key && \ + chmod 0600 /etc/slurm/jwt_hs256.key + +WORKDIR /flowsim +CMD 
["bash"]
diff --git a/tests/integration/infra/slurm.conf b/tests/integration/infra/slurm.conf
new file mode 100644
index 0000000..ea7611b
--- /dev/null
+++ b/tests/integration/infra/slurm.conf
@@ -0,0 +1,51 @@
+# slurm.conf — minimal single-node cluster for FlowSim testing
+#
+# Controller: slurmctld
+# Compute:    slurmd-0 (1 GPU)
+# REST API:   not provisioned in this test configuration
+
+ClusterName=flowsim
+SlurmctldHost=slurmctld
+
+# Auth
+AuthType=auth/munge
+AuthAltTypes=auth/jwt
+AuthAltParameters=jwt_key=/etc/slurm/jwt_hs256.key
+
+# Paths
+SlurmctldPidFile=/var/run/slurmctld.pid
+SlurmdPidFile=/var/run/slurmd.pid
+StateSaveLocation=/var/spool/slurmctld
+SlurmdSpoolDir=/var/spool/slurmd
+SlurmctldLogFile=/var/log/slurm/slurmctld.log
+SlurmdLogFile=/var/log/slurm/slurmd.log
+
+# Scheduling
+SchedulerType=sched/backfill
+SelectType=select/cons_tres
+SelectTypeParameters=CR_Core_Memory
+
+# Accounting (disabled — no slurmdbd in test cluster)
+JobAcctGatherType=jobacct_gather/none
+
+# Task management — disable cgroups (not available in containers)
+TaskPlugin=task/none
+ProctrackType=proctrack/linuxproc
+JobContainerType=job_container/none
+
+# Timeouts
+SlurmctldTimeout=30
+SlurmdTimeout=30
+InactiveLimit=0
+MinJobAge=300
+KillWait=30
+Waittime=0
+
+# GRES (GPU): defined explicitly in gres.conf (no NVML autodetect)
+GresTypes=gpu
+
+# Partitions — single compute node for testing
+PartitionName=normal Nodes=slurmd-0 Default=YES MaxTime=INFINITE State=UP
+
+# Node definition — 1 GPU (CPUs/memory match hardware)
+NodeName=slurmd-0 CPUs=112 RealMemory=128000 Gres=gpu:1 State=UNKNOWN
diff --git a/tests/integration/test_scheduler.py b/tests/integration/test_scheduler.py
new file mode 100644
index 0000000..7ecaf9b
--- /dev/null
+++ b/tests/integration/test_scheduler.py
@@ -0,0 +1,830 @@
+"""Integration tests for all FlowSim scheduler backends.
+
+How It Works
+------------
+Each test class exercises one scheduler backend end-to-end through the
+``flowsim`` CLI (the same commands a user would run). 
The flow is: + +1. ``flowsim submit`` — submit a ``--collect all`` profiling job. +2. ``flowsim list`` — verify the job appears in the listing. +3. ``flowsim status`` — poll until Completed / Succeeded (up to 20 min). +4. Validate outputs on the host file system. + +Infrastructure is auto-provisioned by session-scoped fixtures: + +* **Local** — uses Docker on the host directly (no extra infra). +* **K8s** — spins up a Kind cluster via ``dev-setup.sh kind``. +* **Slurm** — spins up a docker-compose Slurm cluster via + ``dev-setup.sh slurm`` (slurmctld + slurmd-0 with GPU 0). + +Pass Criteria +------------- +* Job reaches Completed/Succeeded within the timeout. +* Stage-separated trace files exist (EXTEND + DECODE ``.trace.json.gz``). +* Parsed CSVs exist under ``parsed/`` with non-zero rows. +* GEMM kernels: EXTEND ``dim0 == bs * input_len``, DECODE ``dim0 == bs``. +* FlashAttn kernels: EXTEND dims contain ``[bs, input_len + existing_ctx]`` (±1). +* ``analysis_extend.json`` and ``analysis_decode.json`` are valid JSON. +* After ``--collect shapes``, ``Dims`` column is present in merged CSVs. +* Sweep jobs produce per-point subdirs + ``sweep_summary.json``. +* Log files (stdout/stderr) exist under ``logs/``. + +Requirements +------------ +* Docker with ``flowsim-image:latest`` built. +* GPU-equipped host machine. +* ``tests/integration/infra/dev-setup.sh`` available. + +Environment Variables +--------------------- +``MODEL`` + Model path (default: ``workload/models/configs/Qwen3-235B-A22B``). +``LOAD_FORMAT`` + Load format (default: ``dummy``). 
+ +Usage +----- + # All scheduler tests: + python -m pytest tests/integration/test_scheduler.py -v -x + + # Single backend: + python -m pytest tests/integration/test_scheduler.py -v -x -k "local" + python -m pytest tests/integration/test_scheduler.py -v -x -k "k8s" + python -m pytest tests/integration/test_scheduler.py -v -x -k "slurm" +""" + +import ast +import csv +import glob +import json +import os +import subprocess +import sys +import tempfile +import time + +import pytest + +from schedulers.base import JobResult, ProfileJobSpec +from schedulers.local import LocalScheduler + +_PROJECT_ROOT = os.path.abspath( + os.path.join(os.path.dirname(__file__), "..", "..") +) +_DEV_SETUP = os.path.join( + _PROJECT_ROOT, "tests", "integration", "infra", "dev-setup.sh" +) +_DEV_TEARDOWN = os.path.join( + _PROJECT_ROOT, "tests", "integration", "infra", "dev-teardown.sh" +) + +MODEL = os.environ.get("MODEL", "workload/models/configs/Qwen3-235B-A22B") +LOAD_FORMAT = os.environ.get("LOAD_FORMAT", "dummy") + +# --------------------------------------------------------------------------- +# Helpers +# --------------------------------------------------------------------------- + + +def _flowsim_cli( + *args: str, timeout: int = 1200 +) -> subprocess.CompletedProcess: + """Run a ``flowsim`` subcommand via Python entry point.""" + cmd = [ + sys.executable, + "-u", + "-c", + "from scripts.cli import main; main()", + *args, + ] + env = os.environ.copy() + env["PYTHONPATH"] = _PROJECT_ROOT + (":" + env.get("PYTHONPATH", "")) + env["PYTHONUNBUFFERED"] = "1" + return subprocess.run( + cmd, + capture_output=True, + text=True, + cwd=_PROJECT_ROOT, + env=env, + timeout=timeout, + ) + + +def _assert_traces(output_dir: str) -> None: + """Assert EXTEND + DECODE traces and parsed CSVs exist.""" + traces = glob.glob( + os.path.join(output_dir, "**/*.trace.json.gz"), recursive=True + ) + assert len(traces) > 0, f"No trace files under {output_dir}" + extend = [t for t in traces if "EXTEND" in 
os.path.basename(t)] + decode = [t for t in traces if "DECODE" in os.path.basename(t)] + assert len(extend) > 0, "No EXTEND traces" + assert len(decode) > 0, "No DECODE traces" + + csvs = glob.glob( + os.path.join(output_dir, "**/parsed/*.csv"), recursive=True + ) + assert len(csvs) > 0, f"No parsed CSVs under {output_dir}" + # At least EXTEND should be parsed; DECODE CSV may be absent for short sequences + extend_csvs = [c for c in csvs if "EXTEND" in os.path.basename(c)] + assert len(extend_csvs) > 0, "No EXTEND parsed CSVs" + + +def _assert_logs(output_dir: str) -> None: + """Assert server log files exist under {output_dir}/logs/.""" + log_dir = os.path.join(output_dir, "logs") + assert os.path.isdir(log_dir), f"Log directory not found: {log_dir}" + log_files = os.listdir(log_dir) + assert len(log_files) > 0, f"No log files in {log_dir}" + stdout_logs = [f for f in log_files if f.endswith(".stdout.log")] + stderr_logs = [f for f in log_files if f.endswith(".stderr.log")] + assert len(stdout_logs) > 0, f"No stdout logs in {log_dir}" + assert len(stderr_logs) > 0, f"No stderr logs in {log_dir}" + # At least one log should be non-empty + sizes = [os.path.getsize(os.path.join(log_dir, f)) for f in stdout_logs] + assert max(sizes) > 0, "All stdout logs are empty" + + +# --------------------------------------------------------------------------- +# Shape validation helpers (same logic as test_stage_profile_configs.py) +# --------------------------------------------------------------------------- +def _read_csv(path): + with open(path, newline="") as f: + return list(csv.DictReader(f)) + + +_GEMM_NAME_PATTERNS = ("nvjet", "cublasLt", "cublas_", "cutlass_gemm") + + +def _first_matmul_dim0(rows): + """Return dim0 of the first GEMM kernel (the M dimension).""" + for row in rows: + if row.get("op", "") == "matmul": + dims = ast.literal_eval(row["Dims"]) + return dims[0][0] + for row in rows: + name = row["Name"] + dims_str = row.get("Dims", "N/A") + if dims_str == "N/A" or 
not dims_str:
+            continue
+        if any(pat in name for pat in _GEMM_NAME_PATTERNS):
+            dims = ast.literal_eval(dims_str)
+            if len(dims) >= 2 and len(dims[0]) == 2 and len(dims[1]) == 2:
+                return dims[0][0]
+    return None
+
+
+def _attention_seqlen_pair(rows, bs, seq_len):
+    """Check that [bs, seq_len] (or +1) appears in FlashAttn dims."""
+    for row in rows:
+        name = row["Name"]
+        if "FlashAttn" not in name:
+            continue
+        if "Combine" in name or "prepare" in name:
+            continue
+        dims = ast.literal_eval(row["Dims"])
+        for d in dims:
+            if (
+                isinstance(d, list)
+                and len(d) == 2
+                and d[0] == bs
+                and d[1] in (seq_len, seq_len + 1)
+            ):
+                return d
+    return None
+
+
+def _validate_shapes(output_dir, bs, input_len, existing_ctx):
+    """Validate GEMM dim0 and FlashAttn seqlen in merged/shape_parsed CSVs."""
+    tag = f"bs{bs}_input{input_len}_ctx{existing_ctx}"
+    for csv_subdir in ("merged", "shape_parsed"):
+        extend_csvs = sorted(
+            glob.glob(
+                os.path.join(output_dir, tag, csv_subdir, "*TP-0*EXTEND*.csv")
+            )
+        )
+        decode_csvs = sorted(
+            glob.glob(
+                os.path.join(output_dir, tag, csv_subdir, "*TP-0*DECODE*.csv")
+            )
+        )
+        if extend_csvs and decode_csvs:
+            break
+    else:
+        pytest.fail(
+            f"No EXTEND+DECODE CSVs for TP-0 in {output_dir}/{tag}/{{merged,shape_parsed}}/"
+        )
+
+    extend_rows = _read_csv(extend_csvs[0])
+    decode_rows = _read_csv(decode_csvs[0])
+
+    # EXTEND first GEMM dim0 == bs * input_len
+    ext_gemm_dim0 = _first_matmul_dim0(extend_rows)
+    assert ext_gemm_dim0 is not None, "No matmul kernel found in EXTEND CSV"
+    expected_ext = bs * input_len
+    assert (
+        ext_gemm_dim0 == expected_ext
+    ), f"EXTEND first GEMM dim0={ext_gemm_dim0}, expected bs*input_len={expected_ext}"
+
+    # EXTEND FlashAttn dims contain [bs, seq_len]
+    seq_len = input_len + existing_ctx
+    attn_pair = _attention_seqlen_pair(extend_rows, bs, seq_len)
+    assert (
+        attn_pair is not None
+    ), f"No FlashAttention dim matching [bs={bs}, seqlen={seq_len}(+1)] in EXTEND CSV"
+
+    # DECODE first 
GEMM dim0 == bs + dec_gemm_dim0 = _first_matmul_dim0(decode_rows) + assert dec_gemm_dim0 is not None, "No matmul kernel found in DECODE CSV" + assert ( + dec_gemm_dim0 == bs + ), f"DECODE first GEMM dim0={dec_gemm_dim0}, expected bs={bs}" + + +# ===================================================================== +# LOCAL SCHEDULER — real profiling (4-step flow) +# ===================================================================== +class TestLocalScheduler: + """Run real profiling via ``flowsim`` CLI on the local Docker scheduler. + + Flow per test point: + 1. ``flowsim submit`` — submit the job (collect all) + 2. ``flowsim list`` — verify the job appears + 3. ``flowsim status`` — poll until Completed + 4. Validate trace CSVs — GEMM dim0, FlashAttn seqlen for EXTEND & DECODE + """ + + _TP1_POINTS = [ + {"bs": 1, "input_len": 2048, "existing_ctx": 0, "decode_tokens": 2}, + {"bs": 1, "input_len": 2048, "existing_ctx": 2048, "decode_tokens": 2}, + ] + + @pytest.mark.parametrize( + "point", + _TP1_POINTS, + ids=[ + f"bs{p['bs']}_il{p['input_len']}_ctx{p['existing_ctx']}" + for p in _TP1_POINTS + ], + ) + def test_local_tp1_all(self, point): + bs = point["bs"] + input_len = point["input_len"] + existing_ctx = point["existing_ctx"] + decode_tokens = point["decode_tokens"] + + # ── Step 1: submit ── + r = _flowsim_cli( + "submit", + "--scheduler", + "local", + "--collect", + "all", + "--model-path", + MODEL, + "--tp", + "1", + "--bs", + str(bs), + "--input-len", + str(input_len), + "--existing-ctx", + str(existing_ctx), + "--decode-tokens", + str(decode_tokens), + "--warmup-n", + "2", + "--gpus", + "1", + "--local-gpus", + "0", + "--extra-server-opts", + f"--load-format {LOAD_FORMAT}", + ) + if r.returncode != 0: + print("STDOUT:", r.stdout[-3000:]) + print("STDERR:", r.stderr[-3000:]) + assert r.returncode == 0, f"flowsim submit failed (exit {r.returncode})" + + # Extract job_id from output (line like "flowsim-all-... 
completed successfully") + combined = r.stdout + r.stderr + job_id = None + for line in combined.splitlines(): + if "flowsim-all-" in line: + for word in line.split(): + if word.startswith("flowsim-all-"): + job_id = word.rstrip(".,;:") + break + if job_id: + break + assert ( + job_id + ), f"Could not find job_id in submit output:\n{combined[-1000:]}" + + # ── Step 2: list — verify job appears ── + r_list = _flowsim_cli("list", "--scheduler", "local") + assert r_list.returncode == 0, "flowsim list failed" + assert ( + job_id in r_list.stdout + ), f"Job {job_id} not found in list output:\n{r_list.stdout}" + + # ── Step 3: status — should be Completed (submit is synchronous) ── + r_status = _flowsim_cli( + "status", "--scheduler", "local", "--job", job_id + ) + assert r_status.returncode == 0, "flowsim status failed" + status_out = r_status.stdout.lower() + assert ( + "completed" in status_out + ), f"Job {job_id} not completed:\n{r_status.stdout}" + + # ── Step 4: validate trace CSVs ── + # Extract output_dir from status output (Traces dir: ...) 
+        output_dir = None
+        for line in r_status.stdout.splitlines():
+            if "Traces dir:" in line:
+                output_dir = line.split("Traces dir:", 1)[1].strip()
+                break
+        assert output_dir and os.path.isdir(
+            output_dir
+        ), f"Could not find traces dir in status output:\n{r_status.stdout}"
+        _assert_traces(output_dir)
+        _assert_logs(output_dir)
+        _validate_shapes(
+            output_dir, bs=bs, input_len=input_len, existing_ctx=existing_ctx
+        )
+
+
+# =====================================================================
+# Cluster setup helpers & fixtures
+# =====================================================================
+
+
+def _run_dev_setup(target: str) -> None:
+    """Run ``tests/integration/infra/dev-setup.sh <target>`` and assert success."""
+    r = subprocess.run(
+        ["bash", _DEV_SETUP, target],
+        capture_output=True,
+        text=True,
+        cwd=_PROJECT_ROOT,
+        timeout=300,
+    )
+    if r.returncode != 0:
+        raise RuntimeError(
+            f"dev-setup.sh {target} failed (exit {r.returncode}):\n"
+            f"stdout: {r.stdout[-2000:]}\nstderr: {r.stderr[-2000:]}"
+        )
+
+
+def _run_dev_teardown(target: str) -> None:
+    """Run ``tests/integration/infra/dev-teardown.sh <target>``."""
+    subprocess.run(
+        ["bash", _DEV_TEARDOWN, target],
+        capture_output=True,
+        text=True,
+        cwd=_PROJECT_ROOT,
+        timeout=120,
+    )
+
+
+def _kind_cluster_running() -> bool:
+    """Check if the Kind cluster named 'flowsim' is reachable."""
+    try:
+        r = subprocess.run(
+            ["kubectl", "--context", "kind-flowsim", "get", "nodes"],
+            capture_output=True,
+            text=True,
+            timeout=15,
+        )
+        return r.returncode == 0 and "Ready" in r.stdout
+    except Exception:
+        return False
+
+
+@pytest.fixture(scope="session")
+def kind_cluster():
+    """Ensure Kind cluster is running; auto-setup if needed.
+
+    The cluster is kept alive after the test session to avoid
+    re-loading the 34 GB image every time. Use ``dev-teardown.sh kind``
+    to clean up manually. 
+ """ + if not _kind_cluster_running(): + _run_dev_setup("kind") + assert _kind_cluster_running(), "Kind cluster not reachable after setup" + yield + + +@pytest.fixture(scope="session") +def slurm_cluster(): + """Ensure Slurm cluster is running; auto-setup if needed. + + Cluster is kept alive after tests. Use ``dev-teardown.sh slurm`` + to clean up manually. + """ + if not _slurm_cluster_running(): + _run_dev_setup("slurm") + assert _slurm_cluster_running(), "Slurm cluster not reachable after setup" + yield + + +# ===================================================================== +# K8S SCHEDULER +# ===================================================================== +class TestK8sScheduler: + """K8s scheduler: real submit to Kind cluster. + + Automatically sets up the Kind cluster via ``dev-setup.sh`` if not + already running. + """ + + def test_k8s_real_submit_to_kind(self, kind_cluster): + """Submit a real Job to Kind cluster: submit → list → status → retrieve → validate.""" + import shutil + import tempfile + + job_name = f"test-integ-{int(time.time()) % 100000}" + local_traces = tempfile.mkdtemp(prefix="flowsim-k8s-traces-") + + try: + # ── Step 0: clean stale test traces on host ── + host_traces = os.path.join(_PROJECT_ROOT, "stage_traces") + os.makedirs(host_traces, exist_ok=True) + + # ── Step 1: submit (host mount for trace retrieval) ── + r = _flowsim_cli( + "submit", + "--scheduler", + "k8s", + "--collect", + "all", + "--model-path", + MODEL, + "--tp", + "1", + "--bs", + "1", + "--input-len", + "2048", + "--existing-ctx", + "0", + "--decode-tokens", + "2", + "--warmup-n", + "2", + "--gpus", + "1", + "--k8s-namespace", + "default", + "--k8s-host-output-dir", + "/host-stage-traces", + "--job-name", + job_name, + "--extra-server-opts", + f"--load-format {LOAD_FORMAT}", + ) + combined = r.stdout + r.stderr + if r.returncode != 0: + print("Submit output:", combined[-3000:]) + assert r.returncode == 0, f"K8s submit failed: {combined[-1000:]}" + + # ── Step 
2: list — verify job appears ── + r_list = _flowsim_cli("list", "--scheduler", "k8s") + assert r_list.returncode == 0 + assert ( + job_name in r_list.stdout + ), f"Job {job_name} not in list:\n{r_list.stdout}" + + # ── Step 3: status — poll until Completed/Succeeded (max 20 min) ── + deadline = time.time() + 1200 + state = "" + while time.time() < deadline: + r_status = _flowsim_cli( + "status", "--scheduler", "k8s", "--job", job_name + ) + assert r_status.returncode == 0 + state = r_status.stdout.lower() + if "completed" in state or "succeeded" in state: + break + if "failed" in state: + pytest.fail(f"K8s job failed:\n{r_status.stdout}") + time.sleep(15) + assert ( + "completed" in state or "succeeded" in state + ), f"K8s job did not complete in time:\n{r_status.stdout}" + + # ── Step 4: traces are on host via Kind mount ── + # output_dir inside container: /flowsim/stage_traces/k8s/{ts} + # host_output_dir on worker: /host-stage-traces + # → host: {project}/stage_traces/k8s/{ts}/ + k8s_traces = os.path.join(host_traces, "k8s") + assert os.path.isdir( + k8s_traces + ), f"No k8s traces dir at {k8s_traces}" + # Find the latest timestamped subdir + ts_dirs = sorted(os.listdir(k8s_traces)) + assert ts_dirs, f"No timestamp dirs in {k8s_traces}" + local_traces = os.path.join(k8s_traces, ts_dirs[-1]) + + # ── Step 5: validate trace CSVs ── + _assert_traces(local_traces) + _assert_logs(local_traces) + _validate_shapes(local_traces, bs=1, input_len=2048, existing_ctx=0) + + finally: + # Cleanup: cancel job (traces stay on host for inspection) + _flowsim_cli("cancel", "--scheduler", "k8s", "--job", job_name) + + +# ===================================================================== +# SLURM SCHEDULER +# ===================================================================== + + +def _slurm_cluster_running() -> bool: + """Check if local Slurm test cluster (docker compose) is running.""" + try: + r = subprocess.run( + ["docker", "exec", "slurmctld", "sinfo", "-h"], + 
capture_output=True, + text=True, + timeout=10, + ) + return r.returncode == 0 and r.stdout.strip() != "" + except Exception: + return False + + +# CLI prefix for running Slurm commands inside the slurmctld container. +# Uses -i so sbatch can read scripts from stdin. +_SLURM_CLI_PREFIX = "docker exec -i slurmctld" + + +class TestSlurmScheduler: + """Slurm scheduler: real submit to local docker-compose cluster. + + Uses CLI mode (sbatch/squeue/scancel) — no slurmrestd needed. + Automatically sets up the Slurm cluster via ``dev-setup.sh slurm`` + if not already running. + """ + + def test_slurm_real_submit(self, slurm_cluster): + """Submit to local Slurm cluster: submit → list → status → retrieve → validate.""" + + # Compute node has /flowsim/stage_traces mounted writable to host. + # output_dir inside the container maps directly to the host. + host_traces = os.path.join(_PROJECT_ROOT, "stage_traces") + os.makedirs(host_traces, exist_ok=True) + ts = time.strftime("%Y%m%d_%H%M%S") + output_dir = f"/flowsim/stage_traces/slurm/{ts}" + + job_id = None + try: + # ── Step 1: submit (CLI mode, container_runtime=none) ── + r = _flowsim_cli( + "submit", + "--scheduler", + "slurm", + "--collect", + "all", + "--model-path", + MODEL, + "--tp", + "1", + "--bs", + "1", + "--input-len", + "2048", + "--existing-ctx", + "0", + "--decode-tokens", + "2", + "--warmup-n", + "2", + "--gpus", + "1", + "--slurm-partition", + "normal", + "--slurm-cli-prefix", + _SLURM_CLI_PREFIX, + "--slurm-container-runtime", + "none", + "--output-dir", + output_dir, + "--extra-server-opts", + f"--load-format {LOAD_FORMAT}", + ) + combined = r.stdout + r.stderr + if r.returncode != 0: + print("Submit output:", combined[-3000:]) + assert r.returncode == 0, f"Slurm submit failed: {combined[-1000:]}" + + # Extract job_id from output (line like "Submitted batch job 123") + for line in combined.splitlines(): + if "submitted" in line.lower(): + for word in line.split(): + if word.isdigit(): + job_id = word + 
break + if job_id: + break + assert ( + job_id + ), f"Could not find job_id in submit output:\n{combined[-1000:]}" + + # ── Step 2: status — poll until Completed (max 20 min) ── + deadline = time.time() + 1200 + state = "" + while time.time() < deadline: + r_status = _flowsim_cli( + "status", + "--scheduler", + "slurm", + "--job", + job_id, + "--slurm-cli-prefix", + _SLURM_CLI_PREFIX, + ) + assert r_status.returncode == 0 + state = r_status.stdout.lower() + if "completed" in state or "succeeded" in state: + break + if "failed" in state: + pytest.fail(f"Slurm job failed:\n{r_status.stdout}") + time.sleep(15) + assert ( + "completed" in state or "succeeded" in state + ), f"Slurm job did not complete in time:\n{r_status.stdout}" + + # ── Step 3: traces are on host via mount ── + slurm_traces = os.path.join(host_traces, "slurm") + assert os.path.isdir( + slurm_traces + ), f"No slurm traces dir at {slurm_traces}" + ts_dirs = sorted(os.listdir(slurm_traces)) + assert ts_dirs, f"No test dirs in {slurm_traces}" + local_traces = os.path.join(slurm_traces, ts_dirs[-1]) + + # ── Step 4: validate trace CSVs ── + _assert_traces(local_traces) + _assert_logs(local_traces) + _validate_shapes(local_traces, bs=1, input_len=2048, existing_ctx=0) + + finally: + # Cleanup: cancel job (traces stay on host for inspection) + if job_id: + _flowsim_cli( + "cancel", + "--scheduler", + "slurm", + "--job", + job_id, + "--slurm-cli-prefix", + _SLURM_CLI_PREFIX, + ) + + +# ===================================================================== +# SWEEP — multi-point profiling in a single job +# ===================================================================== + +# Three lightweight points: different (bs, input_len, existing_ctx) +_SWEEP_POINTS = [ + (1, 2048, 0), + (1, 4096, 0), + (1, 2048, 2048), +] + + +def _assert_sweep_output( + host_output_dir: str, points: list[tuple[int, int, int]] +) -> None: + """Validate that every sweep point produced traces and parsed CSVs.""" + for bs, il, ctx in 
points: + tag = f"bs{bs}_input{il}_ctx{ctx}" + point_dir = os.path.join(host_output_dir, tag) + assert os.path.isdir(point_dir), f"Missing sweep point dir: {point_dir}" + _assert_traces(point_dir) + + # sweep_summary.json should exist at the root + summary_path = os.path.join(host_output_dir, "sweep_summary.json") + assert os.path.isfile(summary_path), f"Missing {summary_path}" + with open(summary_path) as f: + summary = json.load(f) + assert len(summary) == len( + points + ), f"Expected {len(points)} entries in sweep_summary.json, got {len(summary)}" + for entry in summary: + assert entry["traces"] > 0, f"Point {entry} has 0 traces" + + +class TestLocalSweep: + """Multi-point sweep via ``--sweep`` and ``--sweep-file`` on local scheduler. + + Validates that one job profiles all requested points and produces + correct directory structure, traces, and sweep_summary.json. + """ + + def test_sweep_inline(self): + """Submit a 3-point sweep using inline --sweep tuples.""" + sweep_args = [f"{bs}:{il}:{ctx}" for bs, il, ctx in _SWEEP_POINTS] + + r = _flowsim_cli( + "submit", + "--scheduler", + "local", + "--collect", + "perf", + "--model-path", + MODEL, + "--tp", + "1", + "--decode-tokens", + "2", + "--warmup-n", + "2", + "--gpus", + "1", + "--local-gpus", + "0", + "--extra-server-opts", + f"--load-format {LOAD_FORMAT}", + "--sweep", + *sweep_args, + ) + combined = r.stdout + r.stderr + if r.returncode != 0: + print("STDOUT:", r.stdout[-3000:]) + print("STDERR:", r.stderr[-3000:]) + assert r.returncode == 0, f"sweep submit failed (exit {r.returncode})" + + # Find host output dir from submit output + output_dir = None + for line in combined.splitlines(): + if "Traces:" in line: + output_dir = line.split("Traces:", 1)[1].strip() + break + assert output_dir and os.path.isdir( + output_dir + ), f"Could not find traces dir in output:\n{combined[-1000:]}" + + _assert_sweep_output(output_dir, _SWEEP_POINTS) + _assert_logs(output_dir) + + def test_sweep_file(self): + """Submit a 
3-point sweep reading points from a file.""" + with tempfile.NamedTemporaryFile( + mode="w", suffix=".txt", delete=False, prefix="sweep_" + ) as f: + f.write("# bs:input_len:existing_ctx\n") + for bs, il, ctx in _SWEEP_POINTS: + f.write(f"{bs}:{il}:{ctx}\n") + sweep_file = f.name + + try: + r = _flowsim_cli( + "submit", + "--scheduler", + "local", + "--collect", + "perf", + "--model-path", + MODEL, + "--tp", + "1", + "--decode-tokens", + "2", + "--warmup-n", + "2", + "--gpus", + "1", + "--local-gpus", + "0", + "--extra-server-opts", + f"--load-format {LOAD_FORMAT}", + "--sweep-file", + sweep_file, + ) + combined = r.stdout + r.stderr + if r.returncode != 0: + print("STDOUT:", r.stdout[-3000:]) + print("STDERR:", r.stderr[-3000:]) + assert ( + r.returncode == 0 + ), f"sweep-file submit failed (exit {r.returncode})" + + # Find host output dir from submit output + output_dir = None + for line in combined.splitlines(): + if "Traces:" in line: + output_dir = line.split("Traces:", 1)[1].strip() + break + assert output_dir and os.path.isdir( + output_dir + ), f"Could not find traces dir in output:\n{combined[-1000:]}" + + _assert_sweep_output(output_dir, _SWEEP_POINTS) + _assert_logs(output_dir) + finally: + os.unlink(sweep_file) diff --git a/tests/unit/test_scheduler_cli.py b/tests/unit/test_scheduler_cli.py new file mode 100644 index 0000000..9f9c5ab --- /dev/null +++ b/tests/unit/test_scheduler_cli.py @@ -0,0 +1,513 @@ +"""Unit tests for the scheduler CLI (flowsim init / submit) and backends.""" + +from __future__ import annotations + +import os +import tempfile +from pathlib import Path +from unittest import mock + +import pytest +import yaml + +from schedulers.base import ProfileJobSpec +from schedulers.k8s import K8sScheduler +from schedulers.local import LocalScheduler +from schedulers.slurm import SlurmScheduler + +# ========================================================================= +# ProfileJobSpec +# 
========================================================================= + + +class TestProfileJobSpec: + """Tests for ProfileJobSpec dataclass methods.""" + + @pytest.fixture() + def spec(self) -> ProfileJobSpec: + return ProfileJobSpec( + collect="perf", + model_path="Qwen/Qwen3-8B", + tp=2, + bs=4, + input_len=1024, + ) + + def test_default_job_name(self, spec: ProfileJobSpec): + name = spec.default_job_name() + assert name.startswith("flowsim-perf-qwen3-8b-bs4-il1024-") + + def test_custom_job_name(self, spec: ProfileJobSpec): + spec.job_name = "my-job" + assert spec.default_job_name() == "my-job" + + def test_build_server_opts_basic(self, spec: ProfileJobSpec): + opts = spec.build_server_opts() + assert "--model-path Qwen/Qwen3-8B" in opts + assert "--tp 2" in opts + + def test_build_server_opts_dp(self, spec: ProfileJobSpec): + spec.dp = 4 + assert "--dp 4" in spec.build_server_opts() + + def test_build_server_opts_extra(self, spec: ProfileJobSpec): + spec.extra_server_opts = "--some-flag" + assert "--some-flag" in spec.build_server_opts() + + def test_build_profile_command(self, spec: ProfileJobSpec): + cmd = spec.build_profile_command() + assert cmd[0] == "python3" + assert "scripts/run_stage_profile.py" in cmd[1] + assert "--collect" in cmd + assert "perf" in cmd + assert "--bs" in cmd + assert "4" in cmd + + def test_build_shell_command_quotes_server_opts(self, spec: ProfileJobSpec): + shell = spec.build_shell_command() + # server-opts contains spaces, must be quoted + assert "--server-opts '" in shell or '--server-opts "' in shell + + +# ========================================================================= +# K8sScheduler.render +# ========================================================================= + + +class TestK8sScheduler: + """Tests for K8s Job manifest generation.""" + + @pytest.fixture() + def scheduler(self) -> K8sScheduler: + return K8sScheduler( + namespace="ml-team", + kubeconfig="/fake/kubeconfig", + context="prod", + 
shm_size="32Gi", + ) + + @pytest.fixture() + def spec(self) -> ProfileJobSpec: + return ProfileJobSpec( + collect="perf", + model_path="Qwen/Qwen3-8B", + gpus=2, + ) + + def test_render_valid_yaml(self, scheduler, spec): + rendered = scheduler.render(spec) + doc = yaml.safe_load(rendered) + assert doc["apiVersion"] == "batch/v1" + assert doc["kind"] == "Job" + + def test_render_namespace(self, scheduler, spec): + doc = yaml.safe_load(scheduler.render(spec)) + assert doc["metadata"]["namespace"] == "ml-team" + + def test_render_gpu_resources(self, scheduler, spec): + doc = yaml.safe_load(scheduler.render(spec)) + container = doc["spec"]["template"]["spec"]["containers"][0] + assert container["resources"]["limits"]["nvidia.com/gpu"] == "2" + + def test_render_shm_size(self, scheduler, spec): + doc = yaml.safe_load(scheduler.render(spec)) + volumes = doc["spec"]["template"]["spec"]["volumes"] + dshm = [v for v in volumes if v["name"] == "dshm"][0] + assert dshm["emptyDir"]["sizeLimit"] == "32Gi" + + def test_render_pvc_volume(self, spec): + sched = K8sScheduler(namespace="default", pvc_name="my-pvc") + doc = yaml.safe_load(sched.render(spec)) + volumes = doc["spec"]["template"]["spec"]["volumes"] + pvc_vol = [v for v in volumes if v["name"] == "output"] + assert len(pvc_vol) == 1 + assert pvc_vol[0]["persistentVolumeClaim"]["claimName"] == "my-pvc" + + def test_render_host_output_dir(self, spec): + sched = K8sScheduler(namespace="default", host_output_dir="/data/out") + doc = yaml.safe_load(sched.render(spec)) + volumes = doc["spec"]["template"]["spec"]["volumes"] + host_vol = [v for v in volumes if v["name"] == "output"] + assert len(host_vol) == 1 + assert host_vol[0]["hostPath"]["path"] == "/data/out" + + def test_render_node_selector(self, spec): + sched = K8sScheduler(namespace="default", node_selector={"gpu": "h100"}) + doc = yaml.safe_load(sched.render(spec)) + pod_spec = doc["spec"]["template"]["spec"] + assert pod_spec["nodeSelector"]["gpu"] == "h100" + + def 
test_render_service_account(self, spec): + sched = K8sScheduler(namespace="default", service_account="runner") + doc = yaml.safe_load(sched.render(spec)) + pod_spec = doc["spec"]["template"]["spec"] + assert pod_spec["serviceAccountName"] == "runner" + + def test_render_labels(self, scheduler, spec): + doc = yaml.safe_load(scheduler.render(spec)) + labels = doc["metadata"]["labels"] + assert labels["app"] == "flowsim" + assert labels["collect"] == "perf" + + +# ========================================================================= +# SlurmScheduler.render +# ========================================================================= + + +class TestSlurmScheduler: + """Tests for Slurm sbatch script generation.""" + + @pytest.fixture() + def scheduler(self) -> SlurmScheduler: + return SlurmScheduler( + partition="gpu-h100", + time_limit="01:00:00", + account="my-proj", + ) + + @pytest.fixture() + def spec(self) -> ProfileJobSpec: + return ProfileJobSpec( + collect="perf", + model_path="Qwen/Qwen3-8B", + gpus=4, + ) + + def test_render_shebang(self, scheduler, spec): + script = scheduler.render(spec) + assert script.startswith("#!/bin/bash\n") + + def test_render_sbatch_directives(self, scheduler, spec): + script = scheduler.render(spec) + assert "#SBATCH --partition=gpu-h100" in script + assert "#SBATCH --gpus-per-node=4" in script + assert "#SBATCH --exclusive" in script + assert "#SBATCH --time=01:00:00" in script + assert "#SBATCH --account=my-proj" in script + + def test_render_env_vars(self, scheduler, spec): + script = scheduler.render(spec) + assert "SGLANG_PROFILE_KERNELS=1" in script + + def test_render_command(self, scheduler, spec): + script = scheduler.render(spec) + assert "scripts/run_stage_profile.py" in script + assert "--collect perf" in script + + def test_render_docker_runtime(self, spec): + sched = SlurmScheduler( + partition="gpu", + container_runtime="docker", + container_mounts="/data:/data", + ) + script = sched.render(spec) + assert "docker 
run" in script + assert "-v /data:/data" in script + # output_dir is always auto-mounted + assert f"-v {spec.output_dir}:{spec.output_dir}" in script + + def test_render_enroot_runtime(self, spec): + sched = SlurmScheduler( + partition="gpu", + container_runtime="enroot", + ) + script = sched.render(spec) + assert "srun --container-image" in script + # output_dir is always auto-mounted + assert f"{spec.output_dir}:{spec.output_dir}" in script + + def test_render_modules(self, spec): + sched = SlurmScheduler( + partition="gpu", + modules=["cuda/12.6", "anaconda3"], + ) + script = sched.render(spec) + assert "module load cuda/12.6" in script + assert "module load anaconda3" in script + + def test_render_extra_sbatch(self, spec): + sched = SlurmScheduler( + partition="gpu", + extra_sbatch=["--mem=64G", "--exclusive"], + ) + script = sched.render(spec) + assert "#SBATCH --mem=64G" in script + assert "#SBATCH --exclusive" in script + + def test_render_constraint(self, spec): + sched = SlurmScheduler(partition="gpu", constraint="gpu80g") + script = sched.render(spec) + assert "#SBATCH --constraint=gpu80g" in script + + +# ========================================================================= +# LocalScheduler.render +# ========================================================================= + + +class TestLocalScheduler: + """Tests for local execution backend.""" + + @pytest.fixture(autouse=True) + def _skip_image_check(self): + with mock.patch.object(LocalScheduler, "_check_image_exists"): + yield + + @pytest.fixture() + def spec(self) -> ProfileJobSpec: + return ProfileJobSpec( + collect="perf", + model_path="Qwen/Qwen3-8B", + ) + + def test_render_with_gpus(self, spec): + sched = LocalScheduler(gpus="0,1") + output = sched.render(spec) + assert "device=0,1" in output + assert "docker run" in output + + def test_render_without_gpus(self, spec): + sched = LocalScheduler(gpus="") + output = sched.render(spec) + assert "CUDA_VISIBLE_DEVICES" not in output + + def 
test_render_has_command(self, spec): + sched = LocalScheduler() + output = sched.render(spec) + assert "scripts/run_stage_profile.py" in output + assert "SGLANG_PROFILE_KERNELS=1" in output + + def test_render_workdir(self, spec): + sched = LocalScheduler(workdir="/my/project") + output = sched.render(spec) + # Docker mode: workdir is used for log scanning, not in the docker command + assert "docker run" in output + assert "scripts/run_stage_profile.py" in output + + def test_dry_run_equals_render(self, spec): + sched = LocalScheduler(gpus="0") + assert sched.dry_run(spec) == sched.render(spec) + + +# ========================================================================= +# CLI: flowsim init +# ========================================================================= + + +class TestCLIInit: + """Tests for `flowsim init` subcommand.""" + + def test_init_no_args_shows_help(self, capsys): + from scripts.cli import _cmd_init + + with pytest.raises(SystemExit) as exc_info: + _cmd_init([]) + assert exc_info.value.code != 0 + + def test_init_k8s_creates_template(self, tmp_path: Path): + config_dir = tmp_path / "flowsim" + with mock.patch("scripts.cli._CONFIG_DIR", config_dir): + from scripts.cli import _cmd_init + + rc = _cmd_init(["k8s"]) + assert rc == 0 + cfg_file = config_dir / "k8s.yaml" + assert cfg_file.exists() + content = cfg_file.read_text() + assert "kubeconfig:" in content + assert "namespace:" in content + # Template should have comments + assert content.startswith("#") + # Should be valid YAML + cfg = yaml.safe_load(content) + assert "kubeconfig" in cfg + assert "namespace" in cfg + + def test_init_slurm_creates_template(self, tmp_path: Path): + config_dir = tmp_path / "flowsim" + with mock.patch("scripts.cli._CONFIG_DIR", config_dir): + from scripts.cli import _cmd_init + + rc = _cmd_init(["slurm"]) + assert rc == 0 + cfg_file = config_dir / "slurm.yaml" + assert cfg_file.exists() + content = cfg_file.read_text() + assert "partition:" in content + assert 
"cli_prefix:" in content + # Template should have comments + assert content.startswith("#") + cfg = yaml.safe_load(content) + assert "partition" in cfg + + def test_init_refuses_overwrite(self, tmp_path: Path): + config_dir = tmp_path / "flowsim" + config_dir.mkdir() + (config_dir / "slurm.yaml").write_text("existing: true\n") + + with mock.patch("scripts.cli._CONFIG_DIR", config_dir): + from scripts.cli import _cmd_init + + rc = _cmd_init(["slurm"]) + assert rc != 0 # should refuse + + def test_init_force_overwrite(self, tmp_path: Path): + config_dir = tmp_path / "flowsim" + config_dir.mkdir() + (config_dir / "slurm.yaml").write_text("existing: true\n") + + with mock.patch("scripts.cli._CONFIG_DIR", config_dir): + from scripts.cli import _cmd_init + + rc = _cmd_init(["slurm", "--force"]) + assert rc == 0 + content = (config_dir / "slurm.yaml").read_text() + assert "partition:" in content + assert "existing" not in content + + def test_init_config_copies_file(self, tmp_path: Path): + # User has an existing config + user_cfg = tmp_path / "my-k8s.yaml" + user_cfg.write_text("namespace: prod\nkubeconfig: /etc/kube\n") + + config_dir = tmp_path / "flowsim" + with mock.patch("scripts.cli._CONFIG_DIR", config_dir): + from scripts.cli import _cmd_init + + rc = _cmd_init(["k8s", "--config", str(user_cfg)]) + assert rc == 0 + installed = config_dir / "k8s.yaml" + assert installed.exists() + cfg = yaml.safe_load(installed.read_text()) + assert cfg["namespace"] == "prod" + + def test_init_config_missing_file(self): + from scripts.cli import _cmd_init + + rc = _cmd_init(["k8s", "--config", "/nonexistent/path.yaml"]) + assert rc != 0 + + +# ========================================================================= +# CLI: flowsim submit (parse/dry-run only, no actual submission) +# ========================================================================= + + +class TestCLISubmit: + """Tests for `flowsim submit` argument parsing and dry-run.""" + + @pytest.fixture(autouse=True) + 
def _skip_image_check(self): + with mock.patch.object(LocalScheduler, "_check_image_exists"): + yield + + def _run(self, *args: str, expect_ok: bool = True) -> str: + """Run submit via the Python function, capture stdout.""" + from scripts.cli.submit import main as submit_main + import io + from contextlib import redirect_stdout + + buf = io.StringIO() + with redirect_stdout(buf): + submit_main(list(args)) + return buf.getvalue() + + def test_submit_help(self, capsys): + from scripts.cli.submit import main as submit_main + + with pytest.raises(SystemExit) as exc_info: + submit_main(["--help"]) + assert exc_info.value.code == 0 + out = capsys.readouterr().out + assert "--scheduler" in out + assert "local" in out + + def test_submit_missing_required(self): + from scripts.cli.submit import main as submit_main + + with pytest.raises(SystemExit): + submit_main([]) + + def test_submit_local_dry_run(self): + out = self._run( + "--scheduler", + "local", + "--collect", + "perf", + "--model-path", + "Qwen/Qwen3-8B", + "--dry-run", + ) + assert "scripts/run_stage_profile.py" in out + assert "SGLANG_PROFILE_KERNELS=1" in out + + def test_submit_local_dry_run_with_gpus(self): + out = self._run( + "--scheduler", + "local", + "--collect", + "perf", + "--model-path", + "Qwen/Qwen3-8B", + "--local-gpus", + "0,1", + "--dry-run", + ) + assert "device=0,1" in out + + def test_submit_k8s_dry_run(self): + out = self._run( + "--scheduler", + "k8s", + "--collect", + "perf", + "--model-path", + "Qwen/Qwen3-8B", + "--k8s-namespace", + "default", + "--dry-run", + ) + assert "apiVersion: batch/v1" in out + assert "kind: Job" in out + + def test_submit_slurm_dry_run(self): + out = self._run( + "--scheduler", + "slurm", + "--collect", + "perf", + "--model-path", + "Qwen/Qwen3-8B", + "--slurm-partition", + "gpu", + "--dry-run", + ) + assert "#!/bin/bash" in out + assert "#SBATCH --partition=gpu" in out + + +# ========================================================================= +# Config 
loading +# ========================================================================= + + +class TestConfig: + """Tests for config file loading and saving.""" + + def test_save_and_load_yaml(self, tmp_path: Path): + from schedulers.config import _save_yaml, _load_yaml + + data = {"partition": "gpu", "account": "proj"} + path = tmp_path / "test.yaml" + _save_yaml(path, data) + loaded = _load_yaml(path) + assert loaded == data + + def test_cfg_get(self): + from schedulers.config import cfg_get + + cfg = {"key": "value", "empty": ""} + assert cfg_get(cfg, "key", "default") == "value" + assert cfg_get(cfg, "empty", "default") == "" + assert cfg_get(cfg, "missing", "default") == "default"
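
The `cfg_get` assertions above pin down a subtle contract: a key that is present but falsy (like the empty string) wins over the default, and only a genuinely missing key falls back. A minimal sketch consistent with those assertions — the real `schedulers.config` helper may differ in details:

```python
from typing import Any


def cfg_get(cfg: dict, key: str, default: Any = None) -> Any:
    """Return cfg[key] when the key is present (even if falsy), else default.

    Sketch of the contract exercised by TestConfig.test_cfg_get; the
    actual schedulers.config implementation may carry extra behavior.
    """
    # Membership test, not truthiness: "" and 0 are valid config values
    # and must not be silently replaced by the default.
    return cfg[key] if key in cfg else default
```

The membership test is the point: the tempting one-liner `cfg.get(key) or default` would wrongly turn an empty string into the default, failing the `cfg_get(cfg, "empty", "default") == ""` assertion. (Python's built-in `dict.get(key, default)` already has the correct semantics.)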