From 7e9de04f52a5057f7ac918337aacadc60f4c9367 Mon Sep 17 00:00:00 2001 From: Paul Smith Date: Fri, 20 Mar 2026 03:57:34 +0000 Subject: [PATCH 01/23] Add fleet mode design spec Thin orchestrator approach: fleet profile references existing workload profiles, randomly assigns bad actors, generates N archives via existing ArchiveWriter. No overlay engine needed. --- .../specs/2026-03-20-fleet-mode-design.md | 253 ++++++++++++++++++ 1 file changed, 253 insertions(+) create mode 100644 docs/superpowers/specs/2026-03-20-fleet-mode-design.md diff --git a/docs/superpowers/specs/2026-03-20-fleet-mode-design.md b/docs/superpowers/specs/2026-03-20-fleet-mode-design.md new file mode 100644 index 0000000..c6fd7dc --- /dev/null +++ b/docs/superpowers/specs/2026-03-20-fleet-mode-design.md @@ -0,0 +1,253 @@ +# Fleet Mode — Design Specification + +**Date:** 2026-03-20 +**Status:** Approved + +--- + +## Motivation + +Generate a coherent set of PCP archives representing a multi-host fleet for: + +1. **pmview-nextgen**: 3D visualization of fleet-scale performance data via pmproxy +2. **PCP training**: help people practice finding "bad actor" hosts using PCP tooling +3. **Visualization tool testing**: stress-test tools that consume multiple host archives + +Primary driver: feeding fleet data into [pmview-nextgen](https://github.com/tallpsmith/pmview-nextgen) via a single pmproxy instance serving the archive directory. + +## Approach + +**Thin Orchestrator** — the `fleet` subcommand parses a fleet profile YAML, assigns +hosts to workload profiles (with random bad-actor selection), then calls the existing +`ArchiveWriter` once per host. No overlay engine, no merge logic — bad actors use +complete standalone workload profiles. + +--- + +## 1. Fleet Profile Format + +A fleet profile is a standalone YAML file, distinct from a workload profile. It +references workload profiles by path. + +```yaml +# fleet: web-cluster.yaml +meta: + name: web-cluster + duration: 24h + interval: 15s + hostname_prefix: web # hosts named web-01, web-02, ... web-NN + hardware: generic-large # hardware profile for all hosts + +hosts: + count: 20 # total host count + baseline: profiles/normal-web.yaml + jitter: 0.05 # +/-5% per-host variation on stressor values + +bad_actors: + count: 2 # how many hosts get a bad profile instead + jitter: 0.15 # +/-15% — optional, defaults to hosts.jitter + profiles: + - profiles/cpu-saturated.yaml + - profiles/memory-starved.yaml +``` + +### Fleet-Level Overrides + +Fleet `meta` settings override the corresponding values in individual workload +profiles. The overridden fields are: + +- `duration` +- `interval` +- `hardware` +- `timezone` (defaults to UTC) +- `hostname` (generated from `hostname_prefix` + zero-padded index) + +The workload profile's stressor sections (cpu, memory, disk, network phases) are +used as-is — the fleet only overrides operational parameters. + +**Override warnings:** When a referenced workload profile defines a value that +conflicts with the fleet setting, a warning is emitted (once per unique profile): + +``` +WARNING: workload profile 'profiles/normal-web.yaml' defines duration=3600 — overridden by fleet setting duration=86400 +``` + +### Path Resolution + +Workload profile paths are resolved relative to the fleet profile file's directory. + +--- + +## 2. 
CLI Interface + +``` +pmlogsynth fleet [OPTIONS] FLEET_PROFILE + +Arguments: + FLEET_PROFILE Path to fleet YAML profile + +Options: + -o, --output-dir PATH Output directory [default: ./generated-archives/fleet-] + --seed INT PRNG seed for reproducible jitter and bad-actor assignment + --jobs INT Parallel generation workers [default: CPU count] + --dry-run Print host->profile assignments without generating + --force Overwrite existing archive files + --validate Validate fleet + referenced profiles, then exit + --start TIMESTAMP Archive start time (same formats as generate) + -v, --verbose Per-host progress output + -C, --config-dir PATH Additional hardware profile directory +``` + +### Dry-Run Output + +``` +Fleet: web-cluster (20 hosts, seed=42) + + web-01 baseline profiles/normal-web.yaml (jitter: x1.03) + web-02 baseline profiles/normal-web.yaml (jitter: x0.97) + ... + web-13 BAD profiles/cpu-saturated.yaml (jitter: x1.01) + ... + web-18 BAD profiles/memory-starved.yaml (jitter: x0.98) + ... +``` + +--- + +## 3. Output Layout & Fleet Manifest + +Flat directory structure — no subdirectories: + +``` +generated-archives/fleet-web-cluster/ +├── web-01.0 +├── web-01.index +├── web-01.meta +│ ... +├── web-20.0 +├── web-20.index +├── web-20.meta +└── fleet.manifest +``` + +### fleet.manifest + +Machine-readable YAML listing every archive, its role, and metadata: + +```yaml +meta: + name: web-cluster + generated: "2026-03-20T09:00:00Z" + pmlogsynth_version: "1.0" + seed: 42 + duration: 86400 + interval: 15 + hardware: generic-large + host_count: 20 + +archives: + - hostname: web-01 + profile: profiles/normal-web.yaml + role: baseline + jitter_factor: 1.03 + + - hostname: web-13 + profile: profiles/cpu-saturated.yaml + role: bad_actor + jitter_factor: 1.01 + + - hostname: web-18 + profile: profiles/memory-starved.yaml + role: bad_actor + jitter_factor: 0.98 +``` + +--- + +## 4. Internal Architecture + +### New modules + +**`pmlogsynth/fleet.py`** — Profile loader and orchestrator: + +- `FleetProfile` dataclass: meta, hosts config, bad actor config +- `load_fleet_profile(path)`: parse YAML, validate, resolve paths, emit override warnings +- `assign_hosts(fleet, seed)`: create `HostAssignment` list with random bad-actor selection +- `generate_fleet(fleet, assignments, output_dir, args)`: loop/parallelize over assignments, + call existing `ArchiveWriter` per host, write `fleet.manifest` + +**`pmlogsynth/jitter.py`** — Per-host stressor variation: + +- `apply_jitter(profile, factor)`: multiply all numeric stressor values by factor, + clamp ratios to [0.0, 1.0] and counters to >= 0. Returns new `WorkloadProfile`. + Pure function, no mutation. + +### Changes to existing modules + +- **`cli.py`**: Wire up `fleet` subparser, replace stub with call to `generate_fleet()` +- **`profile.py`**: No changes +- **`writer.py`**: No changes + +### What we're NOT building + +- No `overlay.py` — no anomaly overlay merge logic +- No new domain models +- No changes to `timeline.py` or `sampler.py` + +--- + +## 5. Jitter Design + +1. Fleet profile specifies `jitter: 0.05` (baseline) and optionally `bad_actors.jitter: 0.15` +2. Per-host PRNG seeded with `hash(global_seed, hostname)` +3. Single jitter factor drawn per host: `Normal(mean=1.0, stddev=jitter)` +4. Every numeric stressor value in every phase multiplied by that factor +5. Post-jitter clamping: ratios to [0.0, 1.0], counters >= 0 +6. 
Bad actors get their own (potentially higher) jitter applied to their different profile + +Single factor per host (not per field) gives coherent variation — a "slightly busier +box" is busier across the board, not randomly hot on CPU but cold on memory. + +--- + +## 6. Testing Strategy + +### Tier 1 — Unit tests (`tests/unit/test_fleet.py`) + +- Fleet profile parsing: valid loads, missing fields raise `ValidationError` +- Override warnings: conflicting workload profile values emit warnings +- Host assignment: correct count, correct hostnames, deterministic with seed +- Reproducibility: same seed = same assignments and jitter factors +- Bad actor pool selection: random from pool, respects count +- Jitter application: `apply_jitter(profile, 1.05)` multiplies stressor values +- Jitter clamping: ratios [0.0, 1.0], counters >= 0 +- Independent jitter: bad actor stddev differs from baseline stddev +- Profile path resolution: relative to fleet file directory +- Dry-run output: correct mapping, no archives written +- Validate mode: catches broken paths, invalid hardware names + +### Tier 2 — Integration tests (`tests/integration/test_fleet_integration.py`) + +- Full generation with mocked PCP: 3-host fleet, `ArchiveWriter.write()` called 3x +- Fleet manifest written and well-formed +- Override application: fleet-level values used, not workload profile values +- Parallel generation: `--jobs=2` dispatches correctly + +### Tier 3 — E2E tests (`tests/e2e/test_fleet_e2e.py`) + +- Generate 3-host fleet, all archive triplets exist +- `pmlogcheck` passes on every archive +- Seed reproducibility: two `--seed 42` runs produce identical archives +- Manifest roles match actual profile assignments + +Tier 3 auto-skipped if PCP not installed. + +--- + +## Future Enhancements + +- Multiple host groups with different hardware profiles +- Anomaly overlays with time-windowed fault injection +- Per-field jitter (independent variation per stressor dimension) +- Rolling/cascading faults across hosts +- Fleet profile generation via natural language (`--prompt`) From 7a63731239ef752e3ede8d430731201a2ce95209 Mon Sep 17 00:00:00 2001 From: Paul Smith Date: Fri, 20 Mar 2026 04:00:00 +0000 Subject: [PATCH 02/23] Address spec review: reproducibility, override mechanism, Phase 3 relationship Fix PYTHONHASHSEED reproducibility issue (use hashlib not hash()), document per-host WorkloadProfile construction via dataclasses.replace(), enumerate ratio vs throughput fields for jitter clamping, add Phase 3 spec supersession note, clarify --validate incompatibilities. --- .../specs/2026-03-20-fleet-mode-design.md | 47 +++++++++++++++++-- 1 file changed, 44 insertions(+), 3 deletions(-) diff --git a/docs/superpowers/specs/2026-03-20-fleet-mode-design.md b/docs/superpowers/specs/2026-03-20-fleet-mode-design.md index c6fd7dc..aee1196 100644 --- a/docs/superpowers/specs/2026-03-20-fleet-mode-design.md +++ b/docs/superpowers/specs/2026-03-20-fleet-mode-design.md @@ -22,6 +22,16 @@ hosts to workload profiles (with random bad-actor selection), then calls the exi `ArchiveWriter` once per host. No overlay engine, no merge logic — bad actors use complete standalone workload profiles. +### Relationship to Phase 3 Spec + +This design **supersedes** `pmlogsynth-phase3-spec.md` for the initial fleet +implementation. The Phase 3 spec describes a more complex architecture with anomaly +overlays, time-windowed fault injection, and multi-group host definitions. 
This design +intentionally simplifies to the minimum viable fleet: one host group, complete +standalone profiles for bad actors, no overlay merging. The fleet YAML schema is +different from the Phase 3 spec (`hosts:` + `bad_actors:` vs `groups:` with +`anomalies:`). Future enhancements may reintroduce Phase 3 concepts incrementally. + --- ## 1. Fleet Profile Format @@ -93,6 +103,7 @@ Options: --dry-run Print host->profile assignments without generating --force Overwrite existing archive files --validate Validate fleet + referenced profiles, then exit + (incompatible with --force and --dry-run) --start TIMESTAMP Archive start time (same formats as generate) -v, --verbose Per-host progress output -C, --config-dir PATH Additional hardware profile directory @@ -182,10 +193,26 @@ archives: clamp ratios to [0.0, 1.0] and counters to >= 0. Returns new `WorkloadProfile`. Pure function, no mutation. +### Per-host WorkloadProfile construction + +`generate_fleet` produces a per-host `WorkloadProfile` by: + +1. Loading the workload profile via existing `ProfileLoader.from_file()` +2. Using `dataclasses.replace()` to override `meta.hostname`, `meta.duration`, + `meta.interval`, `meta.timezone` with fleet-level values +3. Passing through `apply_jitter()` to apply the host's jitter factor +4. Handing the resulting `WorkloadProfile` to `ArchiveWriter` as normal + +Hardware profile resolution uses `ProfileResolver` with the fleet-level hardware name. +No changes to `ProfileLoader` or `ArchiveWriter` — all construction happens in `fleet.py`. + ### Changes to existing modules -- **`cli.py`**: Wire up `fleet` subparser, replace stub with call to `generate_fleet()` -- **`profile.py`**: No changes +- **`cli.py`**: Replace the `fleet` stub parser (line 134) with a fully-wired subparser + via a new `_add_fleet_args()` function (mirroring `_add_generate_args()`). Replace + the stub handler with a call to `generate_fleet()`. +- **`profile.py`**: No changes — `WorkloadProfile` is a dataclass, `dataclasses.replace()` + handles per-host construction without modifying the loader. - **`writer.py`**: No changes ### What we're NOT building @@ -199,7 +226,9 @@ archives: ## 5. Jitter Design 1. Fleet profile specifies `jitter: 0.05` (baseline) and optionally `bad_actors.jitter: 0.15` -2. Per-host PRNG seeded with `hash(global_seed, hostname)` +2. Per-host PRNG seeded deterministically: `hashlib.sha256(f'{seed}:{hostname}'.encode())` + truncated to an integer. **Do NOT use Python's built-in `hash()`** — it is randomized + across processes (PYTHONHASHSEED) and would break reproducibility. 3. Single jitter factor drawn per host: `Normal(mean=1.0, stddev=jitter)` 4. Every numeric stressor value in every phase multiplied by that factor 5. Post-jitter clamping: ratios to [0.0, 1.0], counters >= 0 @@ -208,6 +237,18 @@ archives: Single factor per host (not per field) gives coherent variation — a "slightly busier box" is busier across the board, not randomly hot on CPU but cold on memory. 
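As a concrete sketch of steps 2–3 above (illustrative only — the helper name `host_jitter_factor` is not part of the implementation plan, which places this logic in `fleet.py`):

```python
import hashlib
import random


def host_jitter_factor(global_seed, hostname, stddev):
    """Draw one deterministic jitter factor for a host.

    SHA-256 keeps the per-host seed stable across processes, unlike the
    built-in hash(), which is randomized via PYTHONHASHSEED.
    """
    digest = hashlib.sha256(f"{global_seed}:{hostname}".encode("utf-8")).digest()
    host_seed = int.from_bytes(digest[:8], "big")
    rng = random.Random(host_seed)
    return rng.gauss(1.0, stddev) if stddev > 0 else 1.0
```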
+### Ratio vs Unrestricted Fields + +Jitter clamping requires knowing which stressor fields are ratios (clamped `[0, 1]`) +vs throughput/count fields (clamped `>= 0` only): + +**Ratio fields** (clamp `[0.0, 1.0]`): `utilization`, `user_ratio`, `sys_ratio`, +`iowait_ratio`, `steal_ratio`, `used_ratio`, `cache_ratio`, `noise`, `error_rate` + +**Throughput/count fields** (clamp `>= 0`): `read_mbps`, `write_mbps`, `iops_read`, +`iops_write`, `rx_mbps`, `tx_mbps`, `pps_rx`, `pps_tx`, `avg_request_size_kb`, +`load_1min`, `load_5min`, `load_15min` + --- ## 6. Testing Strategy From 27c1a7b3e0e1d7931cba52b432b22244427f9c1a Mon Sep 17 00:00:00 2001 From: Paul Smith Date: Fri, 20 Mar 2026 04:11:35 +0000 Subject: [PATCH 03/23] Add fleet mode implementation plan 5 chunks, 6 tasks: jitter module, fleet profile parsing, host assignment, manifest writer, generation orchestrator, CLI wiring, and documentation updates. TDD throughout. --- .../plans/2026-03-20-fleet-mode.md | 1717 +++++++++++++++++ 1 file changed, 1717 insertions(+) create mode 100644 docs/superpowers/plans/2026-03-20-fleet-mode.md diff --git a/docs/superpowers/plans/2026-03-20-fleet-mode.md b/docs/superpowers/plans/2026-03-20-fleet-mode.md new file mode 100644 index 0000000..3f71a8c --- /dev/null +++ b/docs/superpowers/plans/2026-03-20-fleet-mode.md @@ -0,0 +1,1717 @@ +# Fleet Mode Implementation Plan + +> **For agentic workers:** REQUIRED: Use superpowers:subagent-driven-development (if subagents available) or superpowers:executing-plans to implement this plan. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** Generate coherent multi-host PCP archive sets from a single fleet profile YAML, with random bad-actor assignment and per-host jitter. + +**Architecture:** Thin orchestrator pattern — new `fleet.py` parses fleet YAML, `jitter.py` applies per-host stressor variation, then the existing `ArchiveWriter` is called once per host. No changes to profile.py, writer.py, timeline.py, or sampler.py. + +**Tech Stack:** Python 3.8+, PyYAML, hashlib (stdlib), concurrent.futures (stdlib), dataclasses (stdlib) + +**Design Spec:** `docs/superpowers/specs/2026-03-20-fleet-mode-design.md` + +--- + +## File Map + +| File | Action | Responsibility | +|------|--------|---------------| +| `pmlogsynth/jitter.py` | Create | Pure function: apply multiplicative jitter to WorkloadProfile stressor fields | +| `pmlogsynth/fleet.py` | Create | Fleet profile dataclasses, YAML loader, host assignment, orchestrator, manifest writer | +| `pmlogsynth/cli.py` | Modify (lines 131-134, 395-407) | Wire fleet subparser args, replace stub handler | +| `tests/unit/test_jitter.py` | Create | Unit tests for jitter application and clamping | +| `tests/unit/test_fleet.py` | Create | Unit tests for fleet profile parsing, host assignment, manifest | +| `tests/integration/test_fleet_integration.py` | Create | Integration tests with mocked ArchiveWriter | +| `tests/fixtures/fleet/baseline.yaml` | Create | Minimal workload profile for fleet test baseline | +| `tests/fixtures/fleet/bad-cpu.yaml` | Create | Bad actor workload profile (CPU saturated) | +| `tests/fixtures/fleet/test-fleet.yaml` | Create | Fleet profile referencing above workloads | + +--- + +## Chunk 1: Jitter Module + +### Task 1: Jitter — apply_jitter pure function + +**Files:** +- Create: `tests/unit/test_jitter.py` +- Create: `pmlogsynth/jitter.py` + +- [ ] **Step 1: Create test fixtures for jitter** + +Create minimal workload profile fixtures for fleet tests. 
These are standalone workload profiles (not fleet profiles) that the fleet will reference. + +`tests/fixtures/fleet/baseline.yaml`: +```yaml +meta: + hostname: baseline-host + duration: 600 + interval: 60 + +host: + profile: generic-small + +phases: + - name: steady + duration: 600 + cpu: + utilization: 0.50 + user_ratio: 0.70 + sys_ratio: 0.20 + iowait_ratio: 0.10 + memory: + used_ratio: 0.40 + cache_ratio: 0.20 + disk: + read_mbps: 10.0 + write_mbps: 5.0 + network: + rx_mbps: 100.0 + tx_mbps: 50.0 + error_rate: 0.001 +``` + +`tests/fixtures/fleet/bad-cpu.yaml`: +```yaml +meta: + hostname: bad-host + duration: 600 + interval: 60 + +host: + profile: generic-small + +phases: + - name: saturated + duration: 600 + cpu: + utilization: 0.96 + user_ratio: 0.85 + sys_ratio: 0.10 + iowait_ratio: 0.05 + memory: + used_ratio: 0.70 + cache_ratio: 0.10 +``` + +- [ ] **Step 2: Write failing tests for apply_jitter** + +`tests/unit/test_jitter.py`: +```python +"""Unit tests for jitter application.""" + +import pytest + +from pmlogsynth.profile import WorkloadProfile + + +@pytest.fixture() +def baseline_profile() -> WorkloadProfile: + """Load the fleet baseline workload profile.""" + from pathlib import Path + + fixture = Path(__file__).parent.parent / "fixtures" / "fleet" / "baseline.yaml" + return WorkloadProfile.from_file(fixture) + + +class TestApplyJitter: + """Tests for the apply_jitter pure function.""" + + def test_factor_one_returns_identical_values( + self, baseline_profile: WorkloadProfile + ) -> None: + """Jitter factor of 1.0 should not change any values.""" + from pmlogsynth.jitter import apply_jitter + + result = apply_jitter(baseline_profile, 1.0) + phase = result.phases[0] + assert phase.cpu is not None + assert phase.cpu.utilization == 0.50 + assert phase.cpu.user_ratio == 0.70 + assert phase.disk is not None + assert phase.disk.read_mbps == 10.0 + + def test_factor_multiplies_stressor_values( + self, baseline_profile: WorkloadProfile + ) -> None: + """Jitter factor > 1 should scale all numeric stressor fields.""" + from pmlogsynth.jitter import apply_jitter + + result = apply_jitter(baseline_profile, 1.10) + phase = result.phases[0] + assert phase.disk is not None + assert phase.disk.read_mbps == pytest.approx(11.0) + assert phase.disk.write_mbps == pytest.approx(5.5) + assert phase.network is not None + assert phase.network.rx_mbps == pytest.approx(110.0) + assert phase.network.tx_mbps == pytest.approx(55.0) + + def test_ratio_fields_clamped_to_unit_interval( + self, baseline_profile: WorkloadProfile + ) -> None: + """Ratio fields must stay in [0.0, 1.0] after jitter.""" + from pmlogsynth.jitter import apply_jitter + + # Factor of 2.5 would push utilization=0.50 to 1.25 — must clamp to 1.0 + result = apply_jitter(baseline_profile, 2.5) + phase = result.phases[0] + assert phase.cpu is not None + assert phase.cpu.utilization == 1.0 + assert phase.cpu.user_ratio == 1.0 + assert phase.memory is not None + assert phase.memory.used_ratio == 1.0 + assert phase.network is not None + assert phase.network.error_rate == pytest.approx(0.0025) # 0.001 * 2.5 + + def test_throughput_fields_clamped_non_negative( + self, baseline_profile: WorkloadProfile + ) -> None: + """Throughput/count fields must stay >= 0 after jitter.""" + from pmlogsynth.jitter import apply_jitter + + # Factor of 0.0 should clamp everything to 0, not go negative + result = apply_jitter(baseline_profile, 0.0) + phase = result.phases[0] + assert phase.disk is not None + assert phase.disk.read_mbps == 0.0 + assert 
phase.disk.write_mbps == 0.0 + + def test_does_not_mutate_original( + self, baseline_profile: WorkloadProfile + ) -> None: + """apply_jitter must return a new profile, not mutate the input.""" + from pmlogsynth.jitter import apply_jitter + + original_util = baseline_profile.phases[0].cpu.utilization + apply_jitter(baseline_profile, 1.5) + assert baseline_profile.phases[0].cpu.utilization == original_util + + def test_none_stressor_fields_unchanged( + self, baseline_profile: WorkloadProfile + ) -> None: + """None fields in stressors should remain None after jitter.""" + from pmlogsynth.jitter import apply_jitter + + result = apply_jitter(baseline_profile, 1.1) + phase = result.phases[0] + # baseline.yaml doesn't set noise on cpu stressor + assert phase.cpu is not None + assert phase.cpu.noise is None + + def test_none_stressor_block_unchanged(self) -> None: + """A phase with no disk/network stressor should remain None.""" + from pmlogsynth.jitter import apply_jitter + from pmlogsynth.profile import CpuStressor, Phase, ProfileMeta, WorkloadProfile, HostConfig + + profile = WorkloadProfile( + meta=ProfileMeta(duration=60), + host=HostConfig(), + phases=[Phase(name="minimal", duration=60, cpu=CpuStressor(utilization=0.5))], + ) + result = apply_jitter(profile, 1.2) + assert result.phases[0].disk is None + assert result.phases[0].network is None + assert result.phases[0].cpu is not None + assert result.phases[0].cpu.utilization == pytest.approx(0.6) + + def test_meta_unchanged_by_jitter( + self, baseline_profile: WorkloadProfile + ) -> None: + """Jitter should not touch meta fields (hostname, duration, etc).""" + from pmlogsynth.jitter import apply_jitter + + result = apply_jitter(baseline_profile, 1.5) + assert result.meta.hostname == baseline_profile.meta.hostname + assert result.meta.duration == baseline_profile.meta.duration + assert result.meta.interval == baseline_profile.meta.interval + + def test_multiple_phases_all_jittered(self) -> None: + """All phases in the profile get jitter applied.""" + from pmlogsynth.jitter import apply_jitter + from pmlogsynth.profile import CpuStressor, Phase, ProfileMeta, WorkloadProfile, HostConfig + + profile = WorkloadProfile( + meta=ProfileMeta(duration=120), + host=HostConfig(), + phases=[ + Phase(name="a", duration=60, cpu=CpuStressor(utilization=0.5)), + Phase(name="b", duration=60, cpu=CpuStressor(utilization=0.3)), + ], + ) + result = apply_jitter(profile, 1.2) + assert result.phases[0].cpu.utilization == pytest.approx(0.6) + assert result.phases[1].cpu.utilization == pytest.approx(0.36) +``` + +- [ ] **Step 3: Run tests to verify they fail** + +Run: `pytest tests/unit/test_jitter.py -v` +Expected: FAIL — `ModuleNotFoundError: No module named 'pmlogsynth.jitter'` + +- [ ] **Step 4: Implement jitter module** + +`pmlogsynth/jitter.py`: +```python +"""Per-host stressor jitter — pure function, no mutation.""" + +from dataclasses import replace +from typing import List, Optional, Union + +from pmlogsynth.profile import ( + CpuStressor, + DiskStressor, + MemoryStressor, + NetworkStressor, + Phase, + WorkloadProfile, +) + +# Fields that represent ratios — clamped to [0.0, 1.0] +# Only fields that actually exist in stressor dataclasses belong here. 
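# Every other numeric field (throughput, IOPS, request sizes, counters) is not
# listed here and therefore falls through to the plain ">= 0" clamp in _clamp().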
+_RATIO_FIELDS = frozenset({ + "utilization", "user_ratio", "sys_ratio", "iowait_ratio", + "used_ratio", "cache_ratio", "noise", "error_rate", +}) + + +def _clamp(value: float, field_name: str) -> float: + """Clamp a jittered value to its valid range.""" + if field_name in _RATIO_FIELDS: + return max(0.0, min(1.0, value)) + return max(0.0, value) + + +_Stressor = Union[CpuStressor, DiskStressor, MemoryStressor, NetworkStressor] + + +def _jitter_stressor(stressor: Optional[_Stressor], factor: float) -> Optional[_Stressor]: + """Apply jitter factor to all numeric Optional fields on a stressor dataclass.""" + if stressor is None: + return None + updates = {} + for field_name in stressor.__dataclass_fields__: + val = getattr(stressor, field_name) + if val is not None and isinstance(val, (int, float)): + jittered = val * factor + clamped = _clamp(jittered, field_name) + # Preserve int type for int fields + updates[field_name] = type(val)(clamped) if isinstance(val, int) else clamped + return replace(stressor, **updates) + + +def _jitter_phase(phase: Phase, factor: float) -> Phase: + """Apply jitter to all stressors in a phase.""" + return replace( + phase, + cpu=_jitter_stressor(phase.cpu, factor), + memory=_jitter_stressor(phase.memory, factor), + disk=_jitter_stressor(phase.disk, factor), + network=_jitter_stressor(phase.network, factor), + ) + + +def apply_jitter(profile: WorkloadProfile, factor: float) -> WorkloadProfile: + """Apply a multiplicative jitter factor to all stressor values in a profile. + + Returns a new WorkloadProfile — the original is not mutated. + Ratio fields are clamped to [0.0, 1.0]; throughput fields to >= 0. + """ + jittered_phases: List[Phase] = [_jitter_phase(p, factor) for p in profile.phases] + return replace(profile, phases=jittered_phases) +``` + +- [ ] **Step 5: Run tests to verify they pass** + +Run: `pytest tests/unit/test_jitter.py -v` +Expected: All PASS + +- [ ] **Step 6: Commit** + +```bash +git add pmlogsynth/jitter.py tests/unit/test_jitter.py tests/fixtures/fleet/ +git commit -m "Add jitter module for per-host stressor variation + +Pure function multiplies all stressor values by a factor, clamps +ratios to [0,1] and throughput fields to >=0. No mutation." +``` + +--- + +## Chunk 2: Fleet Profile Parsing & Host Assignment + +### Task 2: Fleet profile dataclasses and YAML parser + +**Files:** +- Create: `tests/unit/test_fleet.py` (first batch of tests) +- Create: `pmlogsynth/fleet.py` +- Create: `tests/fixtures/fleet/test-fleet.yaml` + +- [ ] **Step 1: Create fleet test fixture** + +`tests/fixtures/fleet/test-fleet.yaml`: +```yaml +meta: + name: test-fleet + duration: 600 + interval: 60 + hostname_prefix: host + hardware: generic-small + +hosts: + count: 5 + baseline: baseline.yaml + jitter: 0.05 + +bad_actors: + count: 1 + jitter: 0.15 + profiles: + - bad-cpu.yaml +``` + +Note: `baseline.yaml` and `bad-cpu.yaml` are resolved relative to this file's directory — they live alongside it in `tests/fixtures/fleet/`. 
+ +- [ ] **Step 2: Write failing tests for fleet profile parsing** + +`tests/unit/test_fleet.py`: +```python +"""Unit tests for fleet profile loading and host assignment.""" + +from pathlib import Path + +import pytest + + +FLEET_FIXTURES = Path(__file__).parent.parent / "fixtures" / "fleet" + + +class TestLoadFleetProfile: + """Tests for load_fleet_profile YAML parsing.""" + + def test_loads_valid_fleet_profile(self) -> None: + from pmlogsynth.fleet import load_fleet_profile + + fleet = load_fleet_profile(FLEET_FIXTURES / "test-fleet.yaml") + assert fleet.meta.name == "test-fleet" + assert fleet.meta.duration == 600 + assert fleet.meta.interval == 60 + assert fleet.meta.hostname_prefix == "host" + assert fleet.meta.hardware == "generic-small" + assert fleet.hosts.count == 5 + assert fleet.hosts.jitter == 0.05 + assert fleet.bad_actors.count == 1 + assert fleet.bad_actors.jitter == 0.15 + assert len(fleet.bad_actors.profiles) == 1 + + def test_missing_meta_name_raises(self, tmp_path: Path) -> None: + from pmlogsynth.fleet import load_fleet_profile + from pmlogsynth.profile import ValidationError + + (tmp_path / "bad.yaml").write_text( + "meta:\n duration: 600\n interval: 60\n" + " hostname_prefix: x\n hardware: generic-small\n" + "hosts:\n count: 1\n baseline: x.yaml\n" + ) + with pytest.raises(ValidationError, match="meta.name"): + load_fleet_profile(tmp_path / "bad.yaml") + + def test_missing_hosts_raises(self, tmp_path: Path) -> None: + from pmlogsynth.fleet import load_fleet_profile + from pmlogsynth.profile import ValidationError + + (tmp_path / "bad.yaml").write_text( + "meta:\n name: x\n duration: 600\n interval: 60\n" + " hostname_prefix: x\n hardware: generic-small\n" + ) + with pytest.raises(ValidationError, match="hosts"): + load_fleet_profile(tmp_path / "bad.yaml") + + def test_bad_actors_count_exceeds_host_count_raises(self, tmp_path: Path) -> None: + from pmlogsynth.fleet import load_fleet_profile + from pmlogsynth.profile import ValidationError + + (tmp_path / "bad.yaml").write_text( + "meta:\n name: x\n duration: 600\n interval: 60\n" + " hostname_prefix: x\n hardware: generic-small\n" + "hosts:\n count: 2\n baseline: x.yaml\n" + "bad_actors:\n count: 3\n profiles:\n - y.yaml\n" + ) + with pytest.raises(ValidationError, match="bad_actors.count"): + load_fleet_profile(tmp_path / "bad.yaml") + + def test_bad_actors_defaults_jitter_to_hosts_jitter(self, tmp_path: Path) -> None: + from pmlogsynth.fleet import load_fleet_profile + + (tmp_path / "f.yaml").write_text( + "meta:\n name: x\n duration: 600\n interval: 60\n" + " hostname_prefix: x\n hardware: generic-small\n" + "hosts:\n count: 3\n baseline: x.yaml\n jitter: 0.08\n" + "bad_actors:\n count: 1\n profiles:\n - y.yaml\n" + ) + fleet = load_fleet_profile(tmp_path / "f.yaml") + assert fleet.bad_actors.jitter == 0.08 + + def test_no_bad_actors_section_is_valid(self, tmp_path: Path) -> None: + from pmlogsynth.fleet import load_fleet_profile + + (tmp_path / "f.yaml").write_text( + "meta:\n name: x\n duration: 600\n interval: 60\n" + " hostname_prefix: x\n hardware: generic-small\n" + "hosts:\n count: 3\n baseline: x.yaml\n" + ) + fleet = load_fleet_profile(tmp_path / "f.yaml") + assert fleet.bad_actors.count == 0 + assert fleet.bad_actors.profiles == [] + + def test_duration_accepts_duration_strings(self, tmp_path: Path) -> None: + from pmlogsynth.fleet import load_fleet_profile + + (tmp_path / "f.yaml").write_text( + "meta:\n name: x\n duration: 24h\n interval: 15s\n" + " hostname_prefix: x\n hardware: generic-small\n" + 
"hosts:\n count: 1\n baseline: x.yaml\n" + ) + fleet = load_fleet_profile(tmp_path / "f.yaml") + assert fleet.meta.duration == 86400 + assert fleet.meta.interval == 15 + + def test_workload_paths_resolved_relative_to_fleet_file(self) -> None: + from pmlogsynth.fleet import load_fleet_profile + + fleet = load_fleet_profile(FLEET_FIXTURES / "test-fleet.yaml") + # baseline.yaml is relative to fleet file dir + assert fleet.hosts.baseline_path.exists() + assert fleet.hosts.baseline_path.name == "baseline.yaml" + + +class TestAssignHosts: + """Tests for host assignment with random bad-actor selection.""" + + def test_correct_total_count(self) -> None: + from pmlogsynth.fleet import assign_hosts, load_fleet_profile + + fleet = load_fleet_profile(FLEET_FIXTURES / "test-fleet.yaml") + assignments = assign_hosts(fleet, seed=42) + assert len(assignments) == 5 + + def test_correct_bad_actor_count(self) -> None: + from pmlogsynth.fleet import assign_hosts, load_fleet_profile + + fleet = load_fleet_profile(FLEET_FIXTURES / "test-fleet.yaml") + assignments = assign_hosts(fleet, seed=42) + bad = [a for a in assignments if a.role == "bad_actor"] + assert len(bad) == 1 + + def test_hostnames_zero_padded(self) -> None: + from pmlogsynth.fleet import assign_hosts, load_fleet_profile + + fleet = load_fleet_profile(FLEET_FIXTURES / "test-fleet.yaml") + assignments = assign_hosts(fleet, seed=42) + hostnames = [a.hostname for a in assignments] + assert hostnames == ["host-01", "host-02", "host-03", "host-04", "host-05"] + + def test_seed_produces_deterministic_assignments(self) -> None: + from pmlogsynth.fleet import assign_hosts, load_fleet_profile + + fleet = load_fleet_profile(FLEET_FIXTURES / "test-fleet.yaml") + a1 = assign_hosts(fleet, seed=42) + a2 = assign_hosts(fleet, seed=42) + assert [a.hostname for a in a1 if a.role == "bad_actor"] == \ + [a.hostname for a in a2 if a.role == "bad_actor"] + assert [a.jitter_factor for a in a1] == [a.jitter_factor for a in a2] + + def test_different_seeds_produce_different_assignments(self) -> None: + from pmlogsynth.fleet import assign_hosts, load_fleet_profile + + fleet = load_fleet_profile(FLEET_FIXTURES / "test-fleet.yaml") + a1 = assign_hosts(fleet, seed=42) + a2 = assign_hosts(fleet, seed=99) + # With 5 hosts and 1 bad actor, different seeds should (usually) pick + # a different bad host. We check jitter factors differ at minimum. 
+ factors1 = [a.jitter_factor for a in a1] + factors2 = [a.jitter_factor for a in a2] + assert factors1 != factors2 + + def test_bad_actor_gets_bad_actor_jitter_stddev(self) -> None: + """Bad actor jitter factors should use bad_actors.jitter, not hosts.jitter.""" + from pmlogsynth.fleet import assign_hosts, load_fleet_profile + + fleet = load_fleet_profile(FLEET_FIXTURES / "test-fleet.yaml") + # Run many seeds and check bad actor jitter variance is larger + import statistics + + bad_factors = [] + baseline_factors = [] + for seed in range(100): + assignments = assign_hosts(fleet, seed=seed) + for a in assignments: + if a.role == "bad_actor": + bad_factors.append(a.jitter_factor) + else: + baseline_factors.append(a.jitter_factor) + + # bad_actors.jitter=0.15 vs hosts.jitter=0.05 + # stddev of bad actor factors should be ~3x larger + bad_std = statistics.stdev(bad_factors) + baseline_std = statistics.stdev(baseline_factors) + assert bad_std > baseline_std * 1.5 # conservative check + + def test_no_bad_actors_all_baseline(self, tmp_path: Path) -> None: + from pmlogsynth.fleet import assign_hosts, load_fleet_profile + + (tmp_path / "f.yaml").write_text( + "meta:\n name: x\n duration: 600\n interval: 60\n" + " hostname_prefix: srv\n hardware: generic-small\n" + "hosts:\n count: 3\n baseline: x.yaml\n" + ) + fleet = load_fleet_profile(tmp_path / "f.yaml") + assignments = assign_hosts(fleet, seed=1) + assert all(a.role == "baseline" for a in assignments) + + def test_none_seed_produces_assignments(self) -> None: + """seed=None should still work (non-reproducible mode).""" + from pmlogsynth.fleet import assign_hosts, load_fleet_profile + + fleet = load_fleet_profile(FLEET_FIXTURES / "test-fleet.yaml") + assignments = assign_hosts(fleet, seed=None) + assert len(assignments) == 5 + + def test_zero_pad_width_scales_with_count(self, tmp_path: Path) -> None: + """100+ hosts should get 3-digit zero padding.""" + from pmlogsynth.fleet import assign_hosts, load_fleet_profile + + (tmp_path / "f.yaml").write_text( + "meta:\n name: x\n duration: 600\n interval: 60\n" + " hostname_prefix: srv\n hardware: generic-small\n" + "hosts:\n count: 100\n baseline: x.yaml\n" + ) + fleet = load_fleet_profile(tmp_path / "f.yaml") + assignments = assign_hosts(fleet, seed=1) + assert assignments[0].hostname == "srv-001" + assert assignments[99].hostname == "srv-100" + + def test_bad_actor_profiles_selected_from_pool(self) -> None: + """Each bad actor gets a profile randomly selected from the pool.""" + from pmlogsynth.fleet import assign_hosts, load_fleet_profile + + fleet = load_fleet_profile(FLEET_FIXTURES / "test-fleet.yaml") + assignments = assign_hosts(fleet, seed=42) + bad = [a for a in assignments if a.role == "bad_actor"] + for b in bad: + assert b.workload_path.name in ("bad-cpu.yaml",) +``` + +- [ ] **Step 3: Run tests to verify they fail** + +Run: `pytest tests/unit/test_fleet.py -v` +Expected: FAIL — `ModuleNotFoundError: No module named 'pmlogsynth.fleet'` + +- [ ] **Step 4: Implement fleet profile dataclasses and parsing** + +`pmlogsynth/fleet.py`: +```python +"""Fleet profile loading, host assignment, and archive orchestration.""" + +import hashlib +import math +import random +from dataclasses import dataclass, field, replace +from datetime import datetime, timezone +from pathlib import Path +from typing import Any, Dict, List, Optional + +import yaml + +from pmlogsynth.profile import ValidationError, parse_duration + + +# --------------------------------------------------------------------------- +# 
Dataclasses +# --------------------------------------------------------------------------- + +@dataclass +class FleetMeta: + name: str + duration: int # seconds + interval: int # seconds + hostname_prefix: str + hardware: str # hardware profile name + timezone: str = "UTC" + + +@dataclass +class HostsConfig: + count: int + baseline_path: Path # resolved absolute path to workload YAML + jitter: float = 0.05 + # Store the original relative path string for manifest/warnings + baseline_rel: str = "" + + +@dataclass +class BadActorsConfig: + count: int = 0 + jitter: Optional[float] = None # defaults to hosts.jitter at load time + profiles: List[Path] = field(default_factory=list) + # Original relative path strings + profiles_rel: List[str] = field(default_factory=list) + + +@dataclass +class FleetProfile: + meta: FleetMeta + hosts: HostsConfig + bad_actors: BadActorsConfig + source_path: Path = field(default_factory=lambda: Path(".")) + + +@dataclass +class HostAssignment: + hostname: str + workload_path: Path + workload_rel: str # original relative path for manifest + role: str # "baseline" | "bad_actor" + jitter_factor: float + + +# --------------------------------------------------------------------------- +# Parsing +# --------------------------------------------------------------------------- + +def load_fleet_profile(path: Path) -> FleetProfile: + """Parse and validate a fleet profile YAML file.""" + path = Path(path) + try: + raw = yaml.safe_load(path.read_text(encoding="utf-8")) + except yaml.YAMLError as exc: + raise ValidationError(f"Fleet YAML parse error: {exc}") from exc + except OSError as exc: + raise ValidationError(f"Cannot read fleet profile: {exc}") from exc + + if not isinstance(raw, dict): + raise ValidationError("Fleet profile must be a YAML mapping") + + fleet_dir = path.parent + + meta = _parse_fleet_meta(raw.get("meta")) + hosts = _parse_hosts(raw.get("hosts"), fleet_dir) + bad_actors = _parse_bad_actors(raw.get("bad_actors"), fleet_dir, hosts) + + # Validate bad_actors.count <= hosts.count + if bad_actors.count > hosts.count: + raise ValidationError( + f"bad_actors.count ({bad_actors.count}) exceeds hosts.count ({hosts.count})" + ) + + return FleetProfile(meta=meta, hosts=hosts, bad_actors=bad_actors, source_path=path) + + +def _parse_fleet_meta(raw: Any) -> FleetMeta: + if not isinstance(raw, dict): + raise ValidationError("Fleet profile requires a 'meta' section") + + for required in ("name", "duration", "interval", "hostname_prefix", "hardware"): + if required not in raw: + raise ValidationError(f"meta.{required} is required") + + return FleetMeta( + name=str(raw["name"]), + duration=parse_duration(raw["duration"]), + interval=parse_duration(raw["interval"]), + hostname_prefix=str(raw["hostname_prefix"]), + hardware=str(raw["hardware"]), + timezone=str(raw.get("timezone", "UTC")), + ) + + +def _parse_hosts(raw: Any, fleet_dir: Path) -> HostsConfig: + if not isinstance(raw, dict): + raise ValidationError("Fleet profile requires a 'hosts' section") + + if "count" not in raw: + raise ValidationError("hosts.count is required") + if "baseline" not in raw: + raise ValidationError("hosts.baseline is required") + + count = int(raw["count"]) + if count < 1: + raise ValidationError("hosts.count must be >= 1") + + baseline_rel = str(raw["baseline"]) + baseline_path = (fleet_dir / baseline_rel).resolve() + + jitter = float(raw.get("jitter", 0.05)) + if jitter < 0: + raise ValidationError("hosts.jitter must be >= 0") + + return HostsConfig( + count=count, + 
baseline_path=baseline_path, + jitter=jitter, + baseline_rel=baseline_rel, + ) + + +def _parse_bad_actors( + raw: Any, fleet_dir: Path, hosts: HostsConfig +) -> BadActorsConfig: + if raw is None: + return BadActorsConfig(count=0, jitter=hosts.jitter) + + if not isinstance(raw, dict): + raise ValidationError("bad_actors must be a mapping") + + count = int(raw.get("count", 0)) + jitter = float(raw["jitter"]) if "jitter" in raw else hosts.jitter + + profiles_rel = raw.get("profiles", []) + if not isinstance(profiles_rel, list): + raise ValidationError("bad_actors.profiles must be a list") + + if count > 0 and len(profiles_rel) == 0: + raise ValidationError("bad_actors.profiles required when bad_actors.count > 0") + + profiles = [(fleet_dir / str(p)).resolve() for p in profiles_rel] + + return BadActorsConfig( + count=count, + jitter=jitter, + profiles=profiles, + profiles_rel=[str(p) for p in profiles_rel], + ) + + +# --------------------------------------------------------------------------- +# Host assignment +# --------------------------------------------------------------------------- + +def _stable_host_seed(global_seed: Optional[int], hostname: str) -> int: + """Deterministic per-host seed using SHA-256 (PYTHONHASHSEED-safe).""" + seed_str = "{}:{}".format(global_seed if global_seed is not None else "", hostname) + digest = hashlib.sha256(seed_str.encode("utf-8")).digest() + return int.from_bytes(digest[:8], "big") + + +def assign_hosts(fleet: FleetProfile, seed: Optional[int] = None) -> List[HostAssignment]: + """Assign hostnames, workload profiles, and jitter factors to each host.""" + count = fleet.hosts.count + pad_width = max(2, len(str(count))) + + # Determine which host indices are bad actors + rng = random.Random(seed) + bad_indices = set(rng.sample(range(count), fleet.bad_actors.count)) + + assignments: List[HostAssignment] = [] + for i in range(count): + hostname = "{}-{}".format( + fleet.meta.hostname_prefix, + str(i + 1).zfill(pad_width), + ) + + if i in bad_indices: + role = "bad_actor" + # Random selection from pool + profile_idx = rng.randrange(len(fleet.bad_actors.profiles)) + workload_path = fleet.bad_actors.profiles[profile_idx] + workload_rel = fleet.bad_actors.profiles_rel[profile_idx] + jitter_stddev = fleet.bad_actors.jitter if fleet.bad_actors.jitter is not None else fleet.hosts.jitter + else: + role = "baseline" + workload_path = fleet.hosts.baseline_path + workload_rel = fleet.hosts.baseline_rel + jitter_stddev = fleet.hosts.jitter + + # Per-host jitter factor + host_rng = random.Random(_stable_host_seed(seed, hostname)) + jitter_factor = host_rng.gauss(1.0, jitter_stddev) if jitter_stddev > 0 else 1.0 + + assignments.append(HostAssignment( + hostname=hostname, + workload_path=workload_path, + workload_rel=workload_rel, + role=role, + jitter_factor=jitter_factor, + )) + + return assignments +``` + +- [ ] **Step 5: Run tests to verify they pass** + +Run: `pytest tests/unit/test_fleet.py -v` +Expected: All PASS + +- [ ] **Step 6: Commit** + +```bash +git add pmlogsynth/fleet.py tests/unit/test_fleet.py tests/fixtures/fleet/test-fleet.yaml +git commit -m "Add fleet profile parsing and host assignment + +Fleet YAML loader with dataclasses, path resolution relative to +fleet file, deterministic host assignment via SHA-256 seeding." 
+``` + +--- + +## Chunk 3: Fleet Manifest, Override Warnings & Generation Orchestrator + +### Task 3: Fleet manifest writer and override warnings + +**Files:** +- Modify: `tests/unit/test_fleet.py` (add manifest and warning tests) +- Modify: `pmlogsynth/fleet.py` (add manifest writer and override check) + +- [ ] **Step 1: Write failing tests for manifest and warnings** + +Append to `tests/unit/test_fleet.py`: +```python +class TestWriteManifest: + """Tests for fleet.manifest YAML output.""" + + def test_manifest_contains_all_hosts(self, tmp_path: Path) -> None: + from pmlogsynth.fleet import ( + assign_hosts, load_fleet_profile, write_manifest, + ) + + fleet = load_fleet_profile(FLEET_FIXTURES / "test-fleet.yaml") + assignments = assign_hosts(fleet, seed=42) + manifest_path = tmp_path / "fleet.manifest" + write_manifest(manifest_path, fleet, assignments, seed=42) + + import yaml as _yaml + + manifest = _yaml.safe_load(manifest_path.read_text()) + assert manifest["meta"]["name"] == "test-fleet" + assert manifest["meta"]["host_count"] == 5 + assert manifest["meta"]["seed"] == 42 + assert len(manifest["archives"]) == 5 + + def test_manifest_roles_match_assignments(self, tmp_path: Path) -> None: + from pmlogsynth.fleet import ( + assign_hosts, load_fleet_profile, write_manifest, + ) + + fleet = load_fleet_profile(FLEET_FIXTURES / "test-fleet.yaml") + assignments = assign_hosts(fleet, seed=42) + write_manifest(tmp_path / "fleet.manifest", fleet, assignments, seed=42) + + import yaml as _yaml + + manifest = _yaml.safe_load((tmp_path / "fleet.manifest").read_text()) + for entry, assignment in zip(manifest["archives"], assignments): + assert entry["hostname"] == assignment.hostname + assert entry["role"] == assignment.role + assert entry["jitter_factor"] == pytest.approx(assignment.jitter_factor) + + def test_manifest_records_none_seed(self, tmp_path: Path) -> None: + from pmlogsynth.fleet import ( + assign_hosts, load_fleet_profile, write_manifest, + ) + + fleet = load_fleet_profile(FLEET_FIXTURES / "test-fleet.yaml") + assignments = assign_hosts(fleet, seed=None) + write_manifest(tmp_path / "fleet.manifest", fleet, assignments, seed=None) + + import yaml as _yaml + + manifest = _yaml.safe_load((tmp_path / "fleet.manifest").read_text()) + assert manifest["meta"]["seed"] is None + + +class TestOverrideWarnings: + """Tests for warnings when fleet settings override workload profile values.""" + + def test_warns_on_duration_conflict(self, caplog: pytest.LogCaptureFixture) -> None: + import logging + from pmlogsynth.fleet import check_override_warnings, load_fleet_profile + + fleet = load_fleet_profile(FLEET_FIXTURES / "test-fleet.yaml") + # baseline.yaml has duration=600, fleet has duration=600 — no conflict + # So let's modify fleet to have a different duration + from dataclasses import replace + + fleet_different = replace(fleet, meta=replace(fleet.meta, duration=3600)) + with caplog.at_level(logging.WARNING): + check_override_warnings(fleet_different) + assert any("duration" in r.message for r in caplog.records) + + def test_no_warning_when_values_match(self, caplog: pytest.LogCaptureFixture) -> None: + import logging + from pmlogsynth.fleet import check_override_warnings, load_fleet_profile + + fleet = load_fleet_profile(FLEET_FIXTURES / "test-fleet.yaml") + # baseline.yaml has duration=600, fleet also 600 — no conflict + with caplog.at_level(logging.WARNING): + check_override_warnings(fleet) + assert not any("duration" in r.message for r in caplog.records) +``` + +- [ ] **Step 2: Run tests to 
verify they fail** + +Run: `pytest tests/unit/test_fleet.py::TestWriteManifest -v` +Expected: FAIL — `ImportError: cannot import name 'write_manifest'` + +- [ ] **Step 3: Implement manifest writer and override warnings** + +Add to `pmlogsynth/fleet.py`: +```python +import logging + +logger = logging.getLogger(__name__) + + +def write_manifest( + path: Path, + fleet: FleetProfile, + assignments: List[HostAssignment], + seed: Optional[int], +) -> None: + """Write fleet.manifest YAML file.""" + now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ") + + manifest = { + "meta": { + "name": fleet.meta.name, + "generated": now, + "pmlogsynth_version": "1.0", + "seed": seed, + "duration": fleet.meta.duration, + "interval": fleet.meta.interval, + "hardware": fleet.meta.hardware, + "host_count": len(assignments), + }, + "archives": [ + { + "hostname": a.hostname, + "profile": a.workload_rel, + "role": a.role, + "jitter_factor": round(a.jitter_factor, 6), + } + for a in assignments + ], + } + + path.write_text( + yaml.dump(manifest, default_flow_style=False, sort_keys=False), + encoding="utf-8", + ) + + +def check_override_warnings(fleet: FleetProfile) -> None: + """Emit warnings for workload profile values that fleet settings override. + + Checks each unique workload profile once. + """ + seen: Dict[Path, bool] = {} + + all_paths = [fleet.hosts.baseline_path] + all_rels = [fleet.hosts.baseline_rel] + for p, r in zip(fleet.bad_actors.profiles, fleet.bad_actors.profiles_rel): + all_paths.append(p) + all_rels.append(r) + + for wpath, wrel in zip(all_paths, all_rels): + if wpath in seen: + continue + seen[wpath] = True + + try: + raw = yaml.safe_load(wpath.read_text(encoding="utf-8")) + except (OSError, yaml.YAMLError): + continue # validation catches this elsewhere + + if not isinstance(raw, dict): + continue + + meta = raw.get("meta", {}) + if not isinstance(meta, dict): + continue + + if "duration" in meta: + profile_duration = parse_duration(meta["duration"]) + if profile_duration != fleet.meta.duration: + logger.warning( + "workload profile '%s' defines duration=%s " + "— overridden by fleet setting duration=%s", + wrel, profile_duration, fleet.meta.duration, + ) + + if "interval" in meta: + profile_interval = parse_duration(meta["interval"]) + if profile_interval != fleet.meta.interval: + logger.warning( + "workload profile '%s' defines interval=%s " + "— overridden by fleet setting interval=%s", + wrel, profile_interval, fleet.meta.interval, + ) + + host = raw.get("host", {}) + if isinstance(host, dict) and "profile" in host: + profile_hw = str(host["profile"]) + if profile_hw != fleet.meta.hardware: + logger.warning( + "workload profile '%s' defines hardware=%s " + "— overridden by fleet setting hardware=%s", + wrel, profile_hw, fleet.meta.hardware, + ) +``` + +- [ ] **Step 4: Run tests to verify they pass** + +Run: `pytest tests/unit/test_fleet.py -v` +Expected: All PASS + +- [ ] **Step 5: Commit** + +```bash +git add pmlogsynth/fleet.py tests/unit/test_fleet.py +git commit -m "Add fleet manifest writer and override warnings + +YAML manifest records all host assignments with roles and jitter +factors. Override warnings emitted once per unique workload profile." 
+``` + +### Task 4: Fleet generation orchestrator + +**Files:** +- Modify: `tests/unit/test_fleet.py` (add dry-run test) +- Create: `tests/integration/test_fleet_integration.py` +- Modify: `pmlogsynth/fleet.py` (add generate_fleet) + +- [ ] **Step 1: Write failing test for dry-run** + +Append to `tests/unit/test_fleet.py`: +```python +class TestDryRun: + """Tests for --dry-run output formatting.""" + + def test_dry_run_prints_all_hosts(self, capsys: pytest.CaptureFixture) -> None: + from pmlogsynth.fleet import assign_hosts, load_fleet_profile, print_dry_run + + fleet = load_fleet_profile(FLEET_FIXTURES / "test-fleet.yaml") + assignments = assign_hosts(fleet, seed=42) + print_dry_run(fleet, assignments, seed=42) + + captured = capsys.readouterr() + assert "test-fleet" in captured.out + assert "5 hosts" in captured.out + for a in assignments: + assert a.hostname in captured.out + # Bad actors should be marked + bad = [a for a in assignments if a.role == "bad_actor"] + for b in bad: + assert "BAD" in captured.out +``` + +- [ ] **Step 2: Write failing integration test for generate_fleet** + +`tests/integration/test_fleet_integration.py`: +```python +"""Integration tests for fleet generation with mocked PCP.""" + +from pathlib import Path +from unittest.mock import MagicMock, patch + +import pytest + + +FLEET_FIXTURES = Path(__file__).parent.parent / "fixtures" / "fleet" + + +class TestGenerateFleet: + """Tests for the fleet generation orchestrator.""" + + @patch("pmlogsynth.fleet.importlib.import_module") + def test_generates_correct_number_of_archives( + self, mock_import: MagicMock, tmp_path: Path + ) -> None: + from pmlogsynth.fleet import generate_fleet, assign_hosts, load_fleet_profile + + # Mock the writer module + mock_writer_mod = MagicMock() + mock_writer_cls = MagicMock() + mock_writer_mod.ArchiveWriter = mock_writer_cls + mock_writer_mod.ArchiveConflictError = Exception + mock_writer_mod.ArchiveGenerationError = Exception + mock_import.return_value = mock_writer_mod + + fleet = load_fleet_profile(FLEET_FIXTURES / "test-fleet.yaml") + assignments = assign_hosts(fleet, seed=42) + + generate_fleet( + fleet=fleet, + assignments=assignments, + output_dir=tmp_path, + seed=42, + jobs=1, + force=False, + start=None, + verbose=False, + config_dir=None, + ) + + # ArchiveWriter should be instantiated once per host + assert mock_writer_cls.call_count == 5 + + @patch("pmlogsynth.fleet.importlib.import_module") + def test_manifest_written_after_generation( + self, mock_import: MagicMock, tmp_path: Path + ) -> None: + from pmlogsynth.fleet import generate_fleet, assign_hosts, load_fleet_profile + + mock_writer_mod = MagicMock() + mock_writer_mod.ArchiveWriter = MagicMock() + mock_writer_mod.ArchiveConflictError = Exception + mock_writer_mod.ArchiveGenerationError = Exception + mock_import.return_value = mock_writer_mod + + fleet = load_fleet_profile(FLEET_FIXTURES / "test-fleet.yaml") + assignments = assign_hosts(fleet, seed=42) + + generate_fleet( + fleet=fleet, + assignments=assignments, + output_dir=tmp_path, + seed=42, + jobs=1, + force=False, + start=None, + verbose=False, + config_dir=None, + ) + + manifest_path = tmp_path / "fleet.manifest" + assert manifest_path.exists() + + @patch("pmlogsynth.fleet.importlib.import_module") + def test_fleet_overrides_applied_to_profiles( + self, mock_import: MagicMock, tmp_path: Path + ) -> None: + from pmlogsynth.fleet import generate_fleet, assign_hosts, load_fleet_profile + + mock_writer_mod = MagicMock() + mock_writer_cls = MagicMock() + 
mock_writer_mod.ArchiveWriter = mock_writer_cls + mock_writer_mod.ArchiveConflictError = Exception + mock_writer_mod.ArchiveGenerationError = Exception + mock_import.return_value = mock_writer_mod + + fleet = load_fleet_profile(FLEET_FIXTURES / "test-fleet.yaml") + assignments = assign_hosts(fleet, seed=42) + + generate_fleet( + fleet=fleet, + assignments=assignments, + output_dir=tmp_path, + seed=42, + jobs=1, + force=False, + start=None, + verbose=False, + config_dir=None, + ) + + # Check each ArchiveWriter call got the right hostname + for call_args, assignment in zip(mock_writer_cls.call_args_list, assignments): + profile = call_args[1]["profile"] if "profile" in call_args[1] else call_args[0][1] + assert profile.meta.hostname == assignment.hostname + assert profile.meta.duration == fleet.meta.duration + assert profile.meta.interval == fleet.meta.interval + + @patch("pmlogsynth.fleet.importlib.import_module") + def test_output_directory_created( + self, mock_import: MagicMock, tmp_path: Path + ) -> None: + from pmlogsynth.fleet import generate_fleet, assign_hosts, load_fleet_profile + + mock_writer_mod = MagicMock() + mock_writer_mod.ArchiveWriter = MagicMock() + mock_writer_mod.ArchiveConflictError = Exception + mock_writer_mod.ArchiveGenerationError = Exception + mock_import.return_value = mock_writer_mod + + fleet = load_fleet_profile(FLEET_FIXTURES / "test-fleet.yaml") + assignments = assign_hosts(fleet, seed=42) + out = tmp_path / "nested" / "output" + + generate_fleet( + fleet=fleet, + assignments=assignments, + output_dir=out, + seed=42, + jobs=1, + force=False, + start=None, + verbose=False, + config_dir=None, + ) + + assert out.exists() +``` + +- [ ] **Step 3: Run tests to verify they fail** + +Run: `pytest tests/unit/test_fleet.py::TestDryRun tests/integration/test_fleet_integration.py -v` +Expected: FAIL — `ImportError` + +- [ ] **Step 4: Implement generate_fleet and print_dry_run** + +Add to `pmlogsynth/fleet.py`: +```python +import importlib +import sys + + +def print_dry_run( + fleet: FleetProfile, + assignments: List[HostAssignment], + seed: Optional[int], +) -> None: + """Print host assignment table without generating archives.""" + seed_str = str(seed) if seed is not None else "none" + print("Fleet: {} ({} hosts, seed={})".format(fleet.meta.name, len(assignments), seed_str)) + print() + for a in assignments: + role_label = "BAD " if a.role == "bad_actor" else "baseline " + print(" {} {} {} (jitter: x{:.2f})".format( + a.hostname, role_label, a.workload_rel, a.jitter_factor, + )) + + +def generate_fleet( + fleet: FleetProfile, + assignments: List[HostAssignment], + output_dir: Path, + seed: Optional[int], + jobs: int = 1, + force: bool = False, + start: Optional[datetime] = None, + verbose: bool = False, + config_dir: Optional[Path] = None, +) -> None: + """Generate one PCP archive per host, then write fleet.manifest.""" + output_dir = Path(output_dir) + output_dir.mkdir(parents=True, exist_ok=True) + + # Lazy import writer module (avoid PCP dependency at parse time) + _writer_mod = importlib.import_module("pmlogsynth.writer") + ArchiveWriter = _writer_mod.ArchiveWriter + ArchiveConflictError = _writer_mod.ArchiveConflictError + ArchiveGenerationError = _writer_mod.ArchiveGenerationError + + from pmlogsynth.jitter import apply_jitter + from pmlogsynth.profile import ProfileResolver, WorkloadProfile + from pmlogsynth.sampler import ValueSampler + from pmlogsynth.timeline import TimelineSequencer + + # Resolve hardware profile once (shared across all hosts) + resolver = 
ProfileResolver(config_dir=config_dir) + hardware = resolver.resolve(fleet.meta.hardware) + + # Check for override warnings (once, before generation loop) + check_override_warnings(fleet) + + def _generate_one(assignment: HostAssignment) -> None: + """Generate a single host archive.""" + # Load workload profile + profile_text = assignment.workload_path.read_text(encoding="utf-8") + profile = WorkloadProfile.from_string(profile_text, config_dir=config_dir) + + # Apply fleet-level overrides via dataclasses.replace + overridden_meta = replace( + profile.meta, + hostname=assignment.hostname, + duration=fleet.meta.duration, + interval=fleet.meta.interval, + timezone=fleet.meta.timezone, + ) + profile = replace(profile, meta=overridden_meta, hardware=hardware) + + # Apply jitter + profile = apply_jitter(profile, assignment.jitter_factor) + + # Expand timeline + timeline = TimelineSequencer(profile).expand(start_time=start) + + # Create sampler + sampler = ValueSampler(noise=profile.meta.noise) + + # Write archive + output_path = str(output_dir / assignment.hostname) + writer = ArchiveWriter( + output_path=output_path, + profile=profile, + hardware=hardware, + force=force, + ) + writer.write(timeline=timeline, sampler=sampler) + + if verbose: + print( + " generated: {} ({})".format(assignment.hostname, assignment.role), + file=sys.stderr, + ) + + # Generate archives sequentially. + # NOTE: --jobs is accepted by the CLI but runs sequentially in v1. + # True parallel generation via ProcessPoolExecutor is deferred — the + # inner function captures too much state to be picklable. This is a + # known gap vs the design spec's Tier 2 "--jobs=2 dispatches correctly" + # test requirement. Will be addressed when scale demands it. + for assignment in assignments: + _generate_one(assignment) + + # Write manifest + write_manifest(output_dir / "fleet.manifest", fleet, assignments, seed=seed) +``` + +- [ ] **Step 5: Run tests to verify they pass** + +Run: `pytest tests/unit/test_fleet.py tests/integration/test_fleet_integration.py -v` +Expected: All PASS + +- [ ] **Step 6: Commit** + +```bash +git add pmlogsynth/fleet.py tests/unit/test_fleet.py tests/integration/test_fleet_integration.py +git commit -m "Add fleet generation orchestrator, dry-run, and manifest output + +Loops over host assignments, loads workload profiles with fleet-level +overrides, applies jitter, and calls existing ArchiveWriter per host." +``` + +--- + +## Chunk 4: CLI Wiring & Update Existing Fleet Stub Test + +### Task 5: Wire fleet subcommand into CLI + +**Files:** +- Modify: `pmlogsynth/cli.py` (lines 131-134, 395-407) +- Modify: `tests/unit/test_cli.py` (update fleet stub test) + +- [ ] **Step 1: Write failing test for fleet CLI** + +Update existing test in `tests/unit/test_cli.py`. The existing `test_fleet_subcommand_exits_2` should be replaced with a test that verifies the fleet subcommand now accepts arguments and calls into fleet logic. 
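Step 4 of this task performs that in-place update; one plausible shape for it — a sketch only, assuming the existing stub test drives `main()` via a patched `sys.argv` — is:

```python
def test_fleet_subcommand_exits_2() -> None:
    """A bare `fleet` invocation still exits 2, now via argparse."""
    # After wiring, FLEET_PROFILE is a required positional, so argparse
    # rejects `pmlogsynth fleet` with exit code 2 instead of the old stub.
    with patch("sys.argv", ["pmlogsynth", "fleet"]):
        with pytest.raises(SystemExit) as exc_info:
            main()
    assert exc_info.value.code == 2
```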
+ +Add to `tests/unit/test_cli.py`: +```python +def test_fleet_validate_exits_0_on_valid_profile(tmp_path: pytest.TempPathFactory) -> None: + """fleet --validate should exit 0 for a valid fleet profile.""" + from pathlib import Path + + fleet_fixtures = Path(__file__).parent.parent / "fixtures" / "fleet" + with patch("sys.argv", [ + "pmlogsynth", "fleet", "--validate", + str(fleet_fixtures / "test-fleet.yaml"), + ]): + with pytest.raises(SystemExit) as exc_info: + main() + assert exc_info.value.code == 0 + + +def test_fleet_dry_run_exits_0(capsys: pytest.CaptureFixture) -> None: + """fleet --dry-run should print assignments and exit 0.""" + from pathlib import Path + + fleet_fixtures = Path(__file__).parent.parent / "fixtures" / "fleet" + with patch("sys.argv", [ + "pmlogsynth", "fleet", "--dry-run", "--seed", "42", + str(fleet_fixtures / "test-fleet.yaml"), + ]): + with pytest.raises(SystemExit) as exc_info: + main() + assert exc_info.value.code == 0 + captured = capsys.readouterr() + assert "test-fleet" in captured.out + assert "host-01" in captured.out +``` + +- [ ] **Step 2: Run tests to verify they fail** + +Run: `pytest tests/unit/test_cli.py::test_fleet_validate_exits_0_on_valid_profile tests/unit/test_cli.py::test_fleet_dry_run_exits_0 -v` +Expected: FAIL — fleet subcommand still exits 2 + +- [ ] **Step 3: Wire fleet subparser and handler in cli.py** + +Replace the fleet stub parser at line 134 with a fully-wired subparser. Replace the fleet stub handler at lines 404-407. + +In `_build_parser()`, replace: +```python + # Reserve 'fleet' for Phase 3 + subparsers.add_parser("fleet", help=argparse.SUPPRESS) +``` + +With: +```python + # --- fleet subcommand --- + fleet_parser = subparsers.add_parser( + "fleet", + help="Generate a fleet of PCP archives from a fleet profile.", + add_help=True, + ) + _add_fleet_args(fleet_parser) +``` + +Add new function `_add_fleet_args`: +```python +def _add_fleet_args(p: argparse.ArgumentParser) -> None: + """Add fleet-command arguments to a parser.""" + p.add_argument( + "fleet_profile", + metavar="FLEET_PROFILE", + help="Path to fleet YAML profile.", + ) + p.add_argument( + "-o", "--output-dir", + metavar="PATH", + default=None, + help="Output directory for archives (default: ./generated-archives/fleet-).", + ) + p.add_argument( + "--seed", + type=int, + default=None, + metavar="INT", + help="PRNG seed for reproducible jitter and bad-actor assignment.", + ) + p.add_argument( + "--jobs", + type=int, + default=1, + metavar="INT", + help="Parallel archive generation workers (default: 1).", + ) + p.add_argument( + "--dry-run", + action="store_true", + default=False, + help="Print host/profile assignments without generating archives.", + ) + p.add_argument( + "--force", + action="store_true", + default=False, + help="Overwrite existing archive files.", + ) + p.add_argument( + "--validate", + action="store_true", + default=False, + help="Validate fleet profile and exit.", + ) + p.add_argument( + "--start", + metavar="TIMESTAMP", + help=( + "Archive start time (ISO 8601 or 'YYYY-MM-DD HH:MM:SS TZ'). " + "Overrides meta.start. Default: today at 00:00:00 UTC." 
+ ), + ) + p.add_argument( + "-v", "--verbose", + action="store_true", + default=False, + help="Show per-host progress.", + ) +``` + +Add new function `_cmd_fleet`: +```python +def _cmd_fleet(args: argparse.Namespace) -> int: + """Handle the fleet subcommand.""" + from pmlogsynth.fleet import ( + assign_hosts, + check_override_warnings, + generate_fleet, + load_fleet_profile, + print_dry_run, + ) + + config_dir = Path(args.config_dir) if args.config_dir else None + + # Validate incompatibilities + if args.validate: + for flag in ("force", "dry_run"): + if getattr(args, flag, False): + print( + "error: --validate is incompatible with " + "--{}".format(flag.replace("_", "-")), + file=sys.stderr, + ) + return 1 + + # Load fleet profile + try: + fleet = load_fleet_profile(Path(args.fleet_profile)) + except ValidationError as exc: + print("Validation error: {}".format(exc), file=sys.stderr) + return 1 + + # Validate-only mode + if args.validate: + check_override_warnings(fleet) + print("Fleet profile is valid: {} ({} hosts)".format( + fleet.meta.name, fleet.hosts.count, + )) + return 0 + + # Assign hosts + assignments = assign_hosts(fleet, seed=args.seed) + + # Dry-run mode + if args.dry_run: + print_dry_run(fleet, assignments, seed=args.seed) + return 0 + + # Parse start time + start_time = None + if args.start: + from pmlogsynth.time_parsing import parse_absolute_timestamp + try: + start_time = parse_absolute_timestamp(args.start, field="--start") + except ValidationError as exc: + print("error: {}".format(exc), file=sys.stderr) + return 1 + + # Determine output directory + output_dir = args.output_dir + if output_dir is None: + output_dir = "./generated-archives/fleet-{}".format(fleet.meta.name) + + # Generate + try: + generate_fleet( + fleet=fleet, + assignments=assignments, + output_dir=Path(output_dir), + seed=args.seed, + jobs=args.jobs, + force=args.force, + start=start_time, + verbose=args.verbose, + config_dir=config_dir, + ) + except Exception as exc: + print("error: Fleet generation failed: {}".format(exc), file=sys.stderr) + return 3 + + print("Fleet '{}' generated: {} archives in {}".format( + fleet.meta.name, len(assignments), output_dir, + )) + return 0 +``` + +In `main()`, replace the fleet stub: +```python + # Fleet stub + if args.subcommand == "fleet": + print("error: fleet subcommand not yet implemented", file=sys.stderr) + sys.exit(2) +``` + +With: +```python + # Fleet subcommand + if args.subcommand == "fleet": + sys.exit(_cmd_fleet(args)) +``` + +- [ ] **Step 4: Update existing fleet stub test in-place** + +In `tests/unit/test_cli.py`, modify `test_fleet_subcommand_exits_2` in-place. The +function body changes but the test is updated, not deleted (per CLAUDE.md rules). 
+Change the docstring and assertion to reflect that fleet now uses argparse which +exits 2 on missing positional args (same exit code, different reason): +```python +def test_fleet_subcommand_exits_2(capsys: pytest.CaptureFixture) -> None: + """fleet subcommand without FLEET_PROFILE arg exits non-zero.""" + with patch("sys.argv", ["pmlogsynth", "fleet"]): + with pytest.raises(SystemExit) as exc_info: + main() + assert exc_info.value.code != 0 +``` + +- [ ] **Step 5: Run tests to verify they pass** + +Run: `pytest tests/unit/test_cli.py -v` +Expected: All PASS (including old tests that didn't change) + +- [ ] **Step 6: Run full quality gate** + +Run: `./pre-commit.sh` +Expected: All green — ruff, mypy, unit + integration tests + +- [ ] **Step 7: Commit** + +```bash +git add pmlogsynth/cli.py tests/unit/test_cli.py +git commit -m "Wire fleet subcommand into CLI with validate and dry-run + +Replaces the Phase 3 stub with a fully functional fleet subparser. +Supports --validate, --dry-run, --seed, --jobs, --force, --start." +``` + +--- + +## Chunk 5: Final Validation & Docs + +### Task 6: Quality gate and documentation updates + +**Files:** +- Modify: `CLAUDE.md` (update project structure, remove fleet reservation note) +- Modify: `man/pmlogsynth.1` (add fleet subcommand) +- Modify: `README.md` (mention fleet mode) +- Modify: `docs/profile-format.md` (add fleet profile schema section) + +- [ ] **Step 1: Run full pre-commit quality gate** + +Run: `./pre-commit.sh` +Expected: All green + +- [ ] **Step 2: Update CLAUDE.md project structure** + +Add `fleet.py` and `jitter.py` to the project structure listing. Update the CLI note about `fleet` being reserved — it's now implemented. + +- [ ] **Step 3: Update man page** + +Add fleet subcommand documentation to `man/pmlogsynth.1` — synopsis, description of options, example usage. + +- [ ] **Step 4: Update docs/profile-format.md** + +Add a "Fleet Profile Format" section covering: `meta` fields (name, duration, interval, +hostname_prefix, hardware, timezone), `hosts` fields (count, baseline, jitter), and +`bad_actors` fields (count, jitter, profiles). Document path resolution rules and +fleet-level override behaviour. + +- [ ] **Step 5: Update README.md** + +Add a brief "Fleet Mode" section showing: +```bash +# Generate a 20-host fleet with 2 bad actors +pmlogsynth fleet -o ./generated-archives/cluster --seed 42 fleet-profile.yaml + +# Preview assignments without generating +pmlogsynth fleet --dry-run --seed 42 fleet-profile.yaml +``` + +- [ ] **Step 6: Run quality gate again after doc changes** + +Run: `./pre-commit.sh` +Expected: All green (mandoc lint, ruff, mypy, all tests) + +- [ ] **Step 7: Commit** + +```bash +git add CLAUDE.md man/pmlogsynth.1 README.md docs/profile-format.md +git commit -m "Document fleet mode in man page, README, and CLAUDE.md + +Fleet subcommand is now implemented — update project docs to reflect +the new capability and remove Phase 3 reservation notes." +``` + +- [ ] **Step 8: Final verification** + +Run: `pytest -v` (all tiers) +Run: `./pre-commit.sh` + +Confirm everything is green before pushing. From 5cbc8170226025e846c884edc1b0d04ab9a55678 Mon Sep 17 00:00:00 2001 From: Paul Smith Date: Fri, 20 Mar 2026 05:21:17 +0000 Subject: [PATCH 04/23] Use ThreadPoolExecutor for --jobs parallel fleet generation Threads avoid pickling issues with closure-based _generate_one. PCP archive writing is I/O-bound so GIL isn't a bottleneck. Default --jobs to CPU count. 
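Reviewer note: the dispatch pattern adopted here reduces to a small standalone sketch. The `run_jobs` helper and the toy inputs below are illustrative only (not part of pmlogsynth); the point is that `future.result()` re-raises the first worker failure, so error behaviour matches the sequential loop.

```python
# Minimal sketch of the --jobs dispatch pattern (illustrative names only):
# plain loop for jobs <= 1, otherwise fan out over a thread pool and let
# future.result() propagate the first worker exception.
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, List, TypeVar

T = TypeVar("T")


def run_jobs(items: List[T], work: Callable[[T], None], jobs: int) -> None:
    if jobs <= 1:
        for item in items:
            work(item)
        return
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        futures = {pool.submit(work, item): item for item in items}
        for future in as_completed(futures):
            future.result()  # re-raises any exception from the worker


if __name__ == "__main__":
    squares: List[int] = []
    run_jobs([1, 2, 3, 4], lambda n: squares.append(n * n), jobs=2)
    print(sorted(squares))  # [1, 4, 9, 16]
```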
--- .../plans/2026-03-20-fleet-mode.md | 62 ++++++++++++++++--- 1 file changed, 52 insertions(+), 10 deletions(-) diff --git a/docs/superpowers/plans/2026-03-20-fleet-mode.md b/docs/superpowers/plans/2026-03-20-fleet-mode.md index 3f71a8c..7c17291 100644 --- a/docs/superpowers/plans/2026-03-20-fleet-mode.md +++ b/docs/superpowers/plans/2026-03-20-fleet-mode.md @@ -1246,6 +1246,39 @@ class TestGenerateFleet: ) assert out.exists() + + @patch("pmlogsynth.fleet.importlib.import_module") + def test_parallel_jobs_generates_all_archives( + self, mock_import: MagicMock, tmp_path: Path + ) -> None: + from pmlogsynth.fleet import generate_fleet, assign_hosts, load_fleet_profile + + mock_writer_mod = MagicMock() + mock_writer_cls = MagicMock() + mock_writer_mod.ArchiveWriter = mock_writer_cls + mock_writer_mod.ArchiveConflictError = Exception + mock_writer_mod.ArchiveGenerationError = Exception + mock_import.return_value = mock_writer_mod + + fleet = load_fleet_profile(FLEET_FIXTURES / "test-fleet.yaml") + assignments = assign_hosts(fleet, seed=42) + + generate_fleet( + fleet=fleet, + assignments=assignments, + output_dir=tmp_path, + seed=42, + jobs=2, + force=False, + start=None, + verbose=False, + config_dir=None, + ) + + # All 5 archives generated even with --jobs=2 + assert mock_writer_cls.call_count == 5 + # Manifest still written + assert (tmp_path / "fleet.manifest").exists() ``` - [ ] **Step 3: Run tests to verify they fail** @@ -1351,14 +1384,21 @@ def generate_fleet( file=sys.stderr, ) - # Generate archives sequentially. - # NOTE: --jobs is accepted by the CLI but runs sequentially in v1. - # True parallel generation via ProcessPoolExecutor is deferred — the - # inner function captures too much state to be picklable. This is a - # known gap vs the design spec's Tier 2 "--jobs=2 dispatches correctly" - # test requirement. Will be addressed when scale demands it. - for assignment in assignments: - _generate_one(assignment) + # Generate archives — ThreadPoolExecutor for --jobs>1. + # Threads (not processes) because _generate_one is a closure and PCP + # archive writing is I/O-bound (disk writes), so GIL isn't a bottleneck. + if jobs <= 1: + for assignment in assignments: + _generate_one(assignment) + else: + from concurrent.futures import ThreadPoolExecutor, as_completed + + with ThreadPoolExecutor(max_workers=jobs) as pool: + futures = { + pool.submit(_generate_one, a): a for a in assignments + } + for future in as_completed(futures): + future.result() # raises if _generate_one failed # Write manifest write_manifest(output_dir / "fleet.manifest", fleet, assignments, seed=seed) @@ -1474,12 +1514,14 @@ def _add_fleet_args(p: argparse.ArgumentParser) -> None: metavar="INT", help="PRNG seed for reproducible jitter and bad-actor assignment.", ) + import os + p.add_argument( "--jobs", type=int, - default=1, + default=os.cpu_count() or 1, metavar="INT", - help="Parallel archive generation workers (default: 1).", + help="Parallel archive generation workers (default: CPU count).", ) p.add_argument( "--dry-run", From a81cbcaaad9051ce3c1847297dd7a23307b0510c Mon Sep 17 00:00:00 2001 From: Paul Smith Date: Fri, 20 Mar 2026 05:31:09 +0000 Subject: [PATCH 05/23] Add jitter module for per-host stressor variation Pure function multiplies all stressor values by a factor, clamps ratios to [0,1] and throughput fields to >=0. No mutation. 
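Reviewer note: a usage sketch against the baseline fixture added in this patch. The relative fixture path assumes the repository root as working directory, and the 1.10 factor is an arbitrary example; attribute names (read_mbps, utilization) come from the fixture and tests below.

```python
# Load the fixture workload, apply +10% jitter, and check the expected scaling.
from pathlib import Path

from pmlogsynth.jitter import apply_jitter
from pmlogsynth.profile import WorkloadProfile

baseline = WorkloadProfile.from_file(Path("tests/fixtures/fleet/baseline.yaml"))
jittered = apply_jitter(baseline, 1.10)

phase = jittered.phases[0]
assert phase.disk is not None and phase.cpu is not None
print(phase.disk.read_mbps)   # 10.0 * 1.10 -> 11.0
print(phase.cpu.utilization)  # 0.50 * 1.10 -> 0.55 (clamped only if it exceeded 1.0)

assert baseline.phases[0].disk is not None
print(baseline.phases[0].disk.read_mbps)  # still 10.0 — the original is not mutated
```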
--- pmlogsynth/jitter.py | 71 ++++++++++++++++ tests/fixtures/fleet/bad-cpu.yaml | 19 +++++ tests/fixtures/fleet/baseline.yaml | 26 ++++++ tests/unit/test_jitter.py | 131 +++++++++++++++++++++++++++++ 4 files changed, 247 insertions(+) create mode 100644 pmlogsynth/jitter.py create mode 100644 tests/fixtures/fleet/bad-cpu.yaml create mode 100644 tests/fixtures/fleet/baseline.yaml create mode 100644 tests/unit/test_jitter.py diff --git a/pmlogsynth/jitter.py b/pmlogsynth/jitter.py new file mode 100644 index 0000000..d42eabd --- /dev/null +++ b/pmlogsynth/jitter.py @@ -0,0 +1,71 @@ +"""Per-host stressor jitter — pure function, no mutation.""" + +from dataclasses import fields, replace +from typing import Any, Dict, List, Optional, TypeVar + +from pmlogsynth.profile import ( + Phase, + WorkloadProfile, +) + +# Fields that represent ratios — clamped to [0.0, 1.0] +# Only fields that actually exist in stressor dataclasses belong here. +_RATIO_FIELDS = frozenset({ + "utilization", "user_ratio", "sys_ratio", "iowait_ratio", + "used_ratio", "cache_ratio", "noise", "error_rate", +}) + +_T = TypeVar("_T") + + +def _clamp(value: float, field_name: str) -> float: + """Clamp a jittered value to its valid range.""" + if field_name in _RATIO_FIELDS: + return max(0.0, min(1.0, value)) + return max(0.0, value) + + +def _jitter_dataclass(stressor: _T, factor: float) -> _T: + """Apply jitter factor to all numeric Optional fields on a dataclass.""" + updates: Dict[str, Any] = {} + for f in fields(stressor): # type: ignore[arg-type] + val = getattr(stressor, f.name) + if val is not None and isinstance(val, (int, float)): + jittered = val * factor + clamped = _clamp(jittered, f.name) + # Preserve int type for int fields + if isinstance(val, int): + updates[f.name] = int(clamped) + else: + updates[f.name] = clamped + return replace(stressor, **updates) # type: ignore[type-var] + + +def _jitter_optional(stressor: Optional[_T], factor: float) -> Optional[_T]: + """Apply jitter to an optional stressor, returning None if input is None.""" + if stressor is None: + return None + return _jitter_dataclass(stressor, factor) + + +def _jitter_phase(phase: Phase, factor: float) -> Phase: + """Apply jitter to all stressors in a phase.""" + return replace( + phase, + cpu=_jitter_optional(phase.cpu, factor), + memory=_jitter_optional(phase.memory, factor), + disk=_jitter_optional(phase.disk, factor), + network=_jitter_optional(phase.network, factor), + ) + + +def apply_jitter(profile: WorkloadProfile, factor: float) -> WorkloadProfile: + """Apply a multiplicative jitter factor to all stressor values in a profile. + + Returns a new WorkloadProfile — the original is not mutated. + Ratio fields are clamped to [0.0, 1.0]; throughput fields to >= 0. 
+ """ + jittered_phases: List[Phase] = [ + _jitter_phase(p, factor) for p in profile.phases + ] + return replace(profile, phases=jittered_phases) diff --git a/tests/fixtures/fleet/bad-cpu.yaml b/tests/fixtures/fleet/bad-cpu.yaml new file mode 100644 index 0000000..328b74b --- /dev/null +++ b/tests/fixtures/fleet/bad-cpu.yaml @@ -0,0 +1,19 @@ +meta: + hostname: bad-host + duration: 600 + interval: 60 + +host: + profile: generic-small + +phases: + - name: saturated + duration: 600 + cpu: + utilization: 0.96 + user_ratio: 0.85 + sys_ratio: 0.10 + iowait_ratio: 0.05 + memory: + used_ratio: 0.70 + cache_ratio: 0.10 diff --git a/tests/fixtures/fleet/baseline.yaml b/tests/fixtures/fleet/baseline.yaml new file mode 100644 index 0000000..9ce459e --- /dev/null +++ b/tests/fixtures/fleet/baseline.yaml @@ -0,0 +1,26 @@ +meta: + hostname: baseline-host + duration: 600 + interval: 60 + +host: + profile: generic-small + +phases: + - name: steady + duration: 600 + cpu: + utilization: 0.50 + user_ratio: 0.70 + sys_ratio: 0.20 + iowait_ratio: 0.10 + memory: + used_ratio: 0.40 + cache_ratio: 0.20 + disk: + read_mbps: 10.0 + write_mbps: 5.0 + network: + rx_mbps: 100.0 + tx_mbps: 50.0 + error_rate: 0.001 diff --git a/tests/unit/test_jitter.py b/tests/unit/test_jitter.py new file mode 100644 index 0000000..513ca82 --- /dev/null +++ b/tests/unit/test_jitter.py @@ -0,0 +1,131 @@ +"""Unit tests for jitter application.""" + +import pytest + +from pmlogsynth.profile import WorkloadProfile + + +@pytest.fixture() +def baseline_profile() -> WorkloadProfile: + """Load the fleet baseline workload profile.""" + from pathlib import Path + + fixture = Path(__file__).parent.parent / "fixtures" / "fleet" / "baseline.yaml" + return WorkloadProfile.from_file(fixture) + + +class TestApplyJitter: + """Tests for the apply_jitter pure function.""" + + def test_factor_one_returns_identical_values( + self, baseline_profile: WorkloadProfile + ) -> None: + from pmlogsynth.jitter import apply_jitter + + result = apply_jitter(baseline_profile, 1.0) + phase = result.phases[0] + assert phase.cpu is not None + assert phase.cpu.utilization == 0.50 + assert phase.cpu.user_ratio == 0.70 + assert phase.disk is not None + assert phase.disk.read_mbps == 10.0 + + def test_factor_multiplies_stressor_values( + self, baseline_profile: WorkloadProfile + ) -> None: + from pmlogsynth.jitter import apply_jitter + + result = apply_jitter(baseline_profile, 1.10) + phase = result.phases[0] + assert phase.disk is not None + assert phase.disk.read_mbps == pytest.approx(11.0) + assert phase.disk.write_mbps == pytest.approx(5.5) + assert phase.network is not None + assert phase.network.rx_mbps == pytest.approx(110.0) + assert phase.network.tx_mbps == pytest.approx(55.0) + + def test_ratio_fields_clamped_to_unit_interval( + self, baseline_profile: WorkloadProfile + ) -> None: + from pmlogsynth.jitter import apply_jitter + + result = apply_jitter(baseline_profile, 2.5) + phase = result.phases[0] + assert phase.cpu is not None + assert phase.cpu.utilization == 1.0 + assert phase.cpu.user_ratio == 1.0 + assert phase.memory is not None + assert phase.memory.used_ratio == 1.0 + assert phase.network is not None + assert phase.network.error_rate == pytest.approx(0.0025) + + def test_throughput_fields_clamped_non_negative( + self, baseline_profile: WorkloadProfile + ) -> None: + from pmlogsynth.jitter import apply_jitter + + result = apply_jitter(baseline_profile, 0.0) + phase = result.phases[0] + assert phase.disk is not None + assert phase.disk.read_mbps == 0.0 + 
assert phase.disk.write_mbps == 0.0 + + def test_does_not_mutate_original( + self, baseline_profile: WorkloadProfile + ) -> None: + from pmlogsynth.jitter import apply_jitter + + original_util = baseline_profile.phases[0].cpu.utilization + apply_jitter(baseline_profile, 1.5) + assert baseline_profile.phases[0].cpu.utilization == original_util + + def test_none_stressor_fields_unchanged( + self, baseline_profile: WorkloadProfile + ) -> None: + from pmlogsynth.jitter import apply_jitter + + result = apply_jitter(baseline_profile, 1.1) + phase = result.phases[0] + assert phase.cpu is not None + assert phase.cpu.noise is None + + def test_none_stressor_block_unchanged(self) -> None: + from pmlogsynth.jitter import apply_jitter + from pmlogsynth.profile import CpuStressor, HostConfig, Phase, ProfileMeta, WorkloadProfile + + profile = WorkloadProfile( + meta=ProfileMeta(duration=60), + host=HostConfig(), + phases=[Phase(name="minimal", duration=60, cpu=CpuStressor(utilization=0.5))], + ) + result = apply_jitter(profile, 1.2) + assert result.phases[0].disk is None + assert result.phases[0].network is None + assert result.phases[0].cpu is not None + assert result.phases[0].cpu.utilization == pytest.approx(0.6) + + def test_meta_unchanged_by_jitter( + self, baseline_profile: WorkloadProfile + ) -> None: + from pmlogsynth.jitter import apply_jitter + + result = apply_jitter(baseline_profile, 1.5) + assert result.meta.hostname == baseline_profile.meta.hostname + assert result.meta.duration == baseline_profile.meta.duration + assert result.meta.interval == baseline_profile.meta.interval + + def test_multiple_phases_all_jittered(self) -> None: + from pmlogsynth.jitter import apply_jitter + from pmlogsynth.profile import CpuStressor, HostConfig, Phase, ProfileMeta, WorkloadProfile + + profile = WorkloadProfile( + meta=ProfileMeta(duration=120), + host=HostConfig(), + phases=[ + Phase(name="a", duration=60, cpu=CpuStressor(utilization=0.5)), + Phase(name="b", duration=60, cpu=CpuStressor(utilization=0.3)), + ], + ) + result = apply_jitter(profile, 1.2) + assert result.phases[0].cpu.utilization == pytest.approx(0.6) + assert result.phases[1].cpu.utilization == pytest.approx(0.36) From 867f3443e7bd3d2a567b8d2da73c0dcf4576b86a Mon Sep 17 00:00:00 2001 From: Paul Smith Date: Fri, 20 Mar 2026 05:35:04 +0000 Subject: [PATCH 06/23] Add fleet profile parsing and host assignment Fleet YAML loader with dataclasses, path resolution relative to fleet file, deterministic host assignment via SHA-256 seeding. 
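Reviewer note: the seeding scheme can be shown in isolation — Python's built-in hash() is salted per process, so per-host seeds are derived from SHA-256 instead. The sketch below mirrors _stable_host_seed; the fleet name, hostname, and seed values are arbitrary examples.

```python
# Derive a stable 64-bit seed from SHA-256 of "fleet:host:seed", then use it to
# draw a reproducible jitter factor. Same inputs always give the same factor.
import hashlib
import random


def stable_host_seed(fleet_name: str, hostname: str, seed: int) -> int:
    digest = hashlib.sha256(
        "{}:{}:{}".format(fleet_name, hostname, seed).encode()
    ).hexdigest()
    return int(digest[:16], 16)  # first 16 hex chars -> 64-bit int


rng_a = random.Random(stable_host_seed("web-cluster", "web-01", 42))
rng_b = random.Random(stable_host_seed("web-cluster", "web-01", 42))
print(rng_a.gauss(1.0, 0.05) == rng_b.gauss(1.0, 0.05))  # True — reproducible per host
print(stable_host_seed("web-cluster", "web-02", 42)
      != stable_host_seed("web-cluster", "web-01", 42))  # True — hosts differ
```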
--- pmlogsynth/fleet.py | 250 +++++++++++++++++++++++++++ tests/fixtures/fleet/test-fleet.yaml | 17 ++ tests/unit/test_fleet.py | 214 +++++++++++++++++++++++ 3 files changed, 481 insertions(+) create mode 100644 pmlogsynth/fleet.py create mode 100644 tests/fixtures/fleet/test-fleet.yaml create mode 100644 tests/unit/test_fleet.py diff --git a/pmlogsynth/fleet.py b/pmlogsynth/fleet.py new file mode 100644 index 0000000..39b8cbf --- /dev/null +++ b/pmlogsynth/fleet.py @@ -0,0 +1,250 @@ +"""Fleet profile loading, validation, and host assignment.""" + +import hashlib +import random +from dataclasses import dataclass, field +from pathlib import Path +from typing import Any, Dict, List, Optional + +import yaml + +from pmlogsynth.profile import ValidationError, parse_duration + + +@dataclass +class FleetMeta: + """Top-level fleet metadata.""" + + name: str + duration: int + interval: int + hostname_prefix: str + hardware: str + + +@dataclass +class HostsConfig: + """Baseline host configuration.""" + + count: int + baseline: str + baseline_path: Path + jitter: float = 0.0 + + +@dataclass +class BadActorsConfig: + """Bad-actor host configuration.""" + + count: int = 0 + jitter: float = 0.0 + profiles: List[str] = field(default_factory=list) + profile_paths: List[Path] = field(default_factory=list) + + +@dataclass +class FleetProfile: + """Parsed fleet profile — the full fleet specification.""" + + meta: FleetMeta + hosts: HostsConfig + bad_actors: BadActorsConfig + + +@dataclass +class HostAssignment: + """One host's role, jitter factor, and workload path.""" + + hostname: str + role: str # "baseline" or "bad_actor" + jitter_factor: float + workload_path: Path + + +def _parse_fleet_meta(raw: Dict[str, Any]) -> FleetMeta: + """Parse and validate the meta section of a fleet profile.""" + meta = raw.get("meta") + if not isinstance(meta, dict): + raise ValidationError("fleet profile missing 'meta' section") + + name = meta.get("name") + if not name: + raise ValidationError("fleet profile missing 'meta.name'") + + duration_raw = meta.get("duration") + if duration_raw is None: + raise ValidationError("fleet profile missing 'meta.duration'") + duration = parse_duration(duration_raw) + + interval_raw = meta.get("interval") + if interval_raw is None: + raise ValidationError("fleet profile missing 'meta.interval'") + interval = parse_duration(interval_raw) + + hostname_prefix = meta.get("hostname_prefix") + if not hostname_prefix: + raise ValidationError("fleet profile missing 'meta.hostname_prefix'") + + hardware = meta.get("hardware") + if not hardware: + raise ValidationError("fleet profile missing 'meta.hardware'") + + return FleetMeta( + name=str(name), + duration=duration, + interval=interval, + hostname_prefix=str(hostname_prefix), + hardware=str(hardware), + ) + + +def _parse_hosts(raw: Dict[str, Any], fleet_dir: Path) -> HostsConfig: + """Parse and validate the hosts section of a fleet profile.""" + hosts = raw.get("hosts") + if not isinstance(hosts, dict): + raise ValidationError("fleet profile missing 'hosts' section") + + count = hosts.get("count") + if not isinstance(count, int) or count < 1: + raise ValidationError("hosts.count must be a positive integer") + + baseline = hosts.get("baseline") + if not baseline: + raise ValidationError("hosts.baseline is required") + + jitter = float(hosts.get("jitter", 0.0)) + baseline_path = fleet_dir / str(baseline) + + return HostsConfig( + count=count, + baseline=str(baseline), + baseline_path=baseline_path, + jitter=jitter, + ) + + +def _parse_bad_actors( 
+ raw: Dict[str, Any], + hosts_config: HostsConfig, + fleet_dir: Path, +) -> BadActorsConfig: + """Parse and validate the bad_actors section of a fleet profile.""" + section = raw.get("bad_actors") + if section is None: + return BadActorsConfig() + + if not isinstance(section, dict): + raise ValidationError("bad_actors must be a mapping") + + count = int(section.get("count", 0)) + if count > hosts_config.count: + raise ValidationError( + "bad_actors.count ({}) exceeds hosts.count ({})".format( + count, hosts_config.count + ) + ) + + # Default bad_actors jitter to hosts jitter if not specified + jitter_raw = section.get("jitter") + if jitter_raw is not None: + jitter = float(jitter_raw) + else: + jitter = hosts_config.jitter + + profiles_raw = section.get("profiles", []) + profiles = [str(p) for p in profiles_raw] + profile_paths = [fleet_dir / p for p in profiles] + + return BadActorsConfig( + count=count, + jitter=jitter, + profiles=profiles, + profile_paths=profile_paths, + ) + + +def load_fleet_profile(path: Path) -> FleetProfile: + """Load and validate a fleet profile YAML file. + + Workload paths (baseline, bad-actor profiles) are resolved relative + to the directory containing the fleet YAML file. + """ + text = path.read_text() + raw = yaml.safe_load(text) + if not isinstance(raw, dict): + raise ValidationError("fleet profile must be a YAML mapping") + + fleet_dir = path.parent + + meta = _parse_fleet_meta(raw) + hosts = _parse_hosts(raw, fleet_dir) + bad_actors = _parse_bad_actors(raw, hosts, fleet_dir) + + return FleetProfile(meta=meta, hosts=hosts, bad_actors=bad_actors) + + +def _stable_host_seed(fleet_name: str, hostname: str, seed: int) -> int: + """Derive a deterministic per-host seed using SHA-256. + + Python's hash() is not stable across runs. SHA-256 gives us + repeatable results for any given (fleet_name, hostname, seed) tuple. + """ + digest = hashlib.sha256( + "{}:{}:{}".format(fleet_name, hostname, seed).encode() + ).hexdigest() + return int(digest[:16], 16) + + +def assign_hosts( + fleet: FleetProfile, + seed: Optional[int] = None, +) -> List[HostAssignment]: + """Assign hostnames, roles, and jitter factors to each host. + + Uses a seeded RNG for deterministic bad-actor selection and jitter + factor generation. If seed is None, a random seed is chosen. 
+ """ + if seed is None: + seed = random.randint(0, 2**32 - 1) + + rng = random.Random(seed) + count = fleet.hosts.count + pad_width = max(2, len(str(count))) + + # Pick which host indices are bad actors + bad_actor_indices = set(rng.sample(range(count), fleet.bad_actors.count)) + + assignments: List[HostAssignment] = [] + for i in range(count): + hostname = "{}-{}".format( + fleet.meta.hostname_prefix, + str(i + 1).zfill(pad_width), + ) + is_bad = i in bad_actor_indices + + if is_bad: + role = "bad_actor" + jitter_stddev = fleet.bad_actors.jitter + # Pick a profile from the bad-actor pool + profile_idx = rng.randrange(len(fleet.bad_actors.profiles)) + workload_path = fleet.bad_actors.profile_paths[profile_idx] + else: + role = "baseline" + jitter_stddev = fleet.hosts.jitter + workload_path = fleet.hosts.baseline_path + + # Generate a stable, deterministic jitter factor per host + host_seed = _stable_host_seed(fleet.meta.name, hostname, seed) + host_rng = random.Random(host_seed) + jitter_factor = host_rng.gauss(1.0, jitter_stddev) + + assignments.append( + HostAssignment( + hostname=hostname, + role=role, + jitter_factor=jitter_factor, + workload_path=workload_path, + ) + ) + + return assignments diff --git a/tests/fixtures/fleet/test-fleet.yaml b/tests/fixtures/fleet/test-fleet.yaml new file mode 100644 index 0000000..1b9dba2 --- /dev/null +++ b/tests/fixtures/fleet/test-fleet.yaml @@ -0,0 +1,17 @@ +meta: + name: test-fleet + duration: 600 + interval: 60 + hostname_prefix: host + hardware: generic-small + +hosts: + count: 5 + baseline: baseline.yaml + jitter: 0.05 + +bad_actors: + count: 1 + jitter: 0.15 + profiles: + - bad-cpu.yaml diff --git a/tests/unit/test_fleet.py b/tests/unit/test_fleet.py new file mode 100644 index 0000000..dbea4c8 --- /dev/null +++ b/tests/unit/test_fleet.py @@ -0,0 +1,214 @@ +"""Unit tests for fleet profile loading and host assignment.""" + +from pathlib import Path + +import pytest + +FLEET_FIXTURES = Path(__file__).parent.parent / "fixtures" / "fleet" + + +class TestLoadFleetProfile: + """Tests for load_fleet_profile YAML parsing.""" + + def test_loads_valid_fleet_profile(self) -> None: + from pmlogsynth.fleet import load_fleet_profile + + fleet = load_fleet_profile(FLEET_FIXTURES / "test-fleet.yaml") + assert fleet.meta.name == "test-fleet" + assert fleet.meta.duration == 600 + assert fleet.meta.interval == 60 + assert fleet.meta.hostname_prefix == "host" + assert fleet.meta.hardware == "generic-small" + assert fleet.hosts.count == 5 + assert fleet.hosts.jitter == 0.05 + assert fleet.bad_actors.count == 1 + assert fleet.bad_actors.jitter == 0.15 + assert len(fleet.bad_actors.profiles) == 1 + + def test_missing_meta_name_raises(self, tmp_path: Path) -> None: + from pmlogsynth.fleet import load_fleet_profile + from pmlogsynth.profile import ValidationError + + (tmp_path / "bad.yaml").write_text( + "meta:\n duration: 600\n interval: 60\n" + " hostname_prefix: x\n hardware: generic-small\n" + "hosts:\n count: 1\n baseline: x.yaml\n" + ) + with pytest.raises(ValidationError, match="meta.name"): + load_fleet_profile(tmp_path / "bad.yaml") + + def test_missing_hosts_raises(self, tmp_path: Path) -> None: + from pmlogsynth.fleet import load_fleet_profile + from pmlogsynth.profile import ValidationError + + (tmp_path / "bad.yaml").write_text( + "meta:\n name: x\n duration: 600\n interval: 60\n" + " hostname_prefix: x\n hardware: generic-small\n" + ) + with pytest.raises(ValidationError, match="hosts"): + load_fleet_profile(tmp_path / "bad.yaml") + + def 
test_bad_actors_count_exceeds_host_count_raises(self, tmp_path: Path) -> None: + from pmlogsynth.fleet import load_fleet_profile + from pmlogsynth.profile import ValidationError + + (tmp_path / "bad.yaml").write_text( + "meta:\n name: x\n duration: 600\n interval: 60\n" + " hostname_prefix: x\n hardware: generic-small\n" + "hosts:\n count: 2\n baseline: x.yaml\n" + "bad_actors:\n count: 3\n profiles:\n - y.yaml\n" + ) + with pytest.raises(ValidationError, match="bad_actors.count"): + load_fleet_profile(tmp_path / "bad.yaml") + + def test_bad_actors_defaults_jitter_to_hosts_jitter(self, tmp_path: Path) -> None: + from pmlogsynth.fleet import load_fleet_profile + + (tmp_path / "f.yaml").write_text( + "meta:\n name: x\n duration: 600\n interval: 60\n" + " hostname_prefix: x\n hardware: generic-small\n" + "hosts:\n count: 3\n baseline: x.yaml\n jitter: 0.08\n" + "bad_actors:\n count: 1\n profiles:\n - y.yaml\n" + ) + fleet = load_fleet_profile(tmp_path / "f.yaml") + assert fleet.bad_actors.jitter == 0.08 + + def test_no_bad_actors_section_is_valid(self, tmp_path: Path) -> None: + from pmlogsynth.fleet import load_fleet_profile + + (tmp_path / "f.yaml").write_text( + "meta:\n name: x\n duration: 600\n interval: 60\n" + " hostname_prefix: x\n hardware: generic-small\n" + "hosts:\n count: 3\n baseline: x.yaml\n" + ) + fleet = load_fleet_profile(tmp_path / "f.yaml") + assert fleet.bad_actors.count == 0 + assert fleet.bad_actors.profiles == [] + + def test_duration_accepts_duration_strings(self, tmp_path: Path) -> None: + from pmlogsynth.fleet import load_fleet_profile + + (tmp_path / "f.yaml").write_text( + "meta:\n name: x\n duration: 24h\n interval: 15s\n" + " hostname_prefix: x\n hardware: generic-small\n" + "hosts:\n count: 1\n baseline: x.yaml\n" + ) + fleet = load_fleet_profile(tmp_path / "f.yaml") + assert fleet.meta.duration == 86400 + assert fleet.meta.interval == 15 + + def test_workload_paths_resolved_relative_to_fleet_file(self) -> None: + from pmlogsynth.fleet import load_fleet_profile + + fleet = load_fleet_profile(FLEET_FIXTURES / "test-fleet.yaml") + assert fleet.hosts.baseline_path.exists() + assert fleet.hosts.baseline_path.name == "baseline.yaml" + + +class TestAssignHosts: + """Tests for host assignment with random bad-actor selection.""" + + def test_correct_total_count(self) -> None: + from pmlogsynth.fleet import assign_hosts, load_fleet_profile + + fleet = load_fleet_profile(FLEET_FIXTURES / "test-fleet.yaml") + assignments = assign_hosts(fleet, seed=42) + assert len(assignments) == 5 + + def test_correct_bad_actor_count(self) -> None: + from pmlogsynth.fleet import assign_hosts, load_fleet_profile + + fleet = load_fleet_profile(FLEET_FIXTURES / "test-fleet.yaml") + assignments = assign_hosts(fleet, seed=42) + bad = [a for a in assignments if a.role == "bad_actor"] + assert len(bad) == 1 + + def test_hostnames_zero_padded(self) -> None: + from pmlogsynth.fleet import assign_hosts, load_fleet_profile + + fleet = load_fleet_profile(FLEET_FIXTURES / "test-fleet.yaml") + assignments = assign_hosts(fleet, seed=42) + hostnames = [a.hostname for a in assignments] + assert hostnames == ["host-01", "host-02", "host-03", "host-04", "host-05"] + + def test_seed_produces_deterministic_assignments(self) -> None: + from pmlogsynth.fleet import assign_hosts, load_fleet_profile + + fleet = load_fleet_profile(FLEET_FIXTURES / "test-fleet.yaml") + a1 = assign_hosts(fleet, seed=42) + a2 = assign_hosts(fleet, seed=42) + assert [a.hostname for a in a1 if a.role == "bad_actor"] == \ + 
[a.hostname for a in a2 if a.role == "bad_actor"] + assert [a.jitter_factor for a in a1] == [a.jitter_factor for a in a2] + + def test_different_seeds_produce_different_assignments(self) -> None: + from pmlogsynth.fleet import assign_hosts, load_fleet_profile + + fleet = load_fleet_profile(FLEET_FIXTURES / "test-fleet.yaml") + a1 = assign_hosts(fleet, seed=42) + a2 = assign_hosts(fleet, seed=99) + factors1 = [a.jitter_factor for a in a1] + factors2 = [a.jitter_factor for a in a2] + assert factors1 != factors2 + + def test_bad_actor_gets_bad_actor_jitter_stddev(self) -> None: + """Bad actor jitter factors should use bad_actors.jitter, not hosts.jitter.""" + from pmlogsynth.fleet import assign_hosts, load_fleet_profile + + fleet = load_fleet_profile(FLEET_FIXTURES / "test-fleet.yaml") + import statistics + + bad_factors = [] + baseline_factors = [] + for seed in range(100): + assignments = assign_hosts(fleet, seed=seed) + for a in assignments: + if a.role == "bad_actor": + bad_factors.append(a.jitter_factor) + else: + baseline_factors.append(a.jitter_factor) + + bad_std = statistics.stdev(bad_factors) + baseline_std = statistics.stdev(baseline_factors) + assert bad_std > baseline_std * 1.5 + + def test_no_bad_actors_all_baseline(self, tmp_path: Path) -> None: + from pmlogsynth.fleet import assign_hosts, load_fleet_profile + + (tmp_path / "f.yaml").write_text( + "meta:\n name: x\n duration: 600\n interval: 60\n" + " hostname_prefix: srv\n hardware: generic-small\n" + "hosts:\n count: 3\n baseline: x.yaml\n" + ) + fleet = load_fleet_profile(tmp_path / "f.yaml") + assignments = assign_hosts(fleet, seed=1) + assert all(a.role == "baseline" for a in assignments) + + def test_none_seed_produces_assignments(self) -> None: + from pmlogsynth.fleet import assign_hosts, load_fleet_profile + + fleet = load_fleet_profile(FLEET_FIXTURES / "test-fleet.yaml") + assignments = assign_hosts(fleet, seed=None) + assert len(assignments) == 5 + + def test_zero_pad_width_scales_with_count(self, tmp_path: Path) -> None: + from pmlogsynth.fleet import assign_hosts, load_fleet_profile + + (tmp_path / "f.yaml").write_text( + "meta:\n name: x\n duration: 600\n interval: 60\n" + " hostname_prefix: srv\n hardware: generic-small\n" + "hosts:\n count: 100\n baseline: x.yaml\n" + ) + fleet = load_fleet_profile(tmp_path / "f.yaml") + assignments = assign_hosts(fleet, seed=1) + assert assignments[0].hostname == "srv-001" + assert assignments[99].hostname == "srv-100" + + def test_bad_actor_profiles_selected_from_pool(self) -> None: + from pmlogsynth.fleet import assign_hosts, load_fleet_profile + + fleet = load_fleet_profile(FLEET_FIXTURES / "test-fleet.yaml") + assignments = assign_hosts(fleet, seed=42) + bad = [a for a in assignments if a.role == "bad_actor"] + for b in bad: + assert b.workload_path.name in ("bad-cpu.yaml",) From 493730ced631bc7354eae054e8eabe1b8e3d841d Mon Sep 17 00:00:00 2001 From: Paul Smith Date: Fri, 20 Mar 2026 05:37:51 +0000 Subject: [PATCH 07/23] Add fleet manifest writer and override warnings YAML manifest records all host assignments with roles and jitter factors. Override warnings emitted once per unique workload profile. 
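Reviewer note: a sketch of the intended downstream use of fleet.manifest, assuming a fleet has already been generated under ./generated-archives/fleet-web-cluster (the directory name is an example); the keys match write_manifest below.

```python
# Read fleet.manifest and list the bad-actor hosts with their workload profiles.
from pathlib import Path

import yaml

manifest_path = Path("generated-archives/fleet-web-cluster/fleet.manifest")
manifest = yaml.safe_load(manifest_path.read_text(encoding="utf-8"))

print("fleet:", manifest["meta"]["name"], "seed:", manifest["meta"]["seed"])
for entry in manifest["archives"]:
    if entry["role"] == "bad_actor":
        print("bad actor:", entry["hostname"], "->", entry["profile"])
```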
--- pmlogsynth/fleet.py | 101 +++++++++++++++++++++++++++++++++++++++ tests/unit/test_fleet.py | 86 +++++++++++++++++++++++++++++++++ 2 files changed, 187 insertions(+) diff --git a/pmlogsynth/fleet.py b/pmlogsynth/fleet.py index 39b8cbf..75d7e32 100644 --- a/pmlogsynth/fleet.py +++ b/pmlogsynth/fleet.py @@ -1,8 +1,10 @@ """Fleet profile loading, validation, and host assignment.""" import hashlib +import logging import random from dataclasses import dataclass, field +from datetime import datetime, timezone from pathlib import Path from typing import Any, Dict, List, Optional @@ -10,6 +12,8 @@ from pmlogsynth.profile import ValidationError, parse_duration +logger = logging.getLogger(__name__) + @dataclass class FleetMeta: @@ -59,6 +63,7 @@ class HostAssignment: role: str # "baseline" or "bad_actor" jitter_factor: float workload_path: Path + workload_rel: str = "" def _parse_fleet_meta(raw: Dict[str, Any]) -> FleetMeta: @@ -228,10 +233,12 @@ def assign_hosts( # Pick a profile from the bad-actor pool profile_idx = rng.randrange(len(fleet.bad_actors.profiles)) workload_path = fleet.bad_actors.profile_paths[profile_idx] + workload_rel = fleet.bad_actors.profiles[profile_idx] else: role = "baseline" jitter_stddev = fleet.hosts.jitter workload_path = fleet.hosts.baseline_path + workload_rel = fleet.hosts.baseline # Generate a stable, deterministic jitter factor per host host_seed = _stable_host_seed(fleet.meta.name, hostname, seed) @@ -244,7 +251,101 @@ def assign_hosts( role=role, jitter_factor=jitter_factor, workload_path=workload_path, + workload_rel=workload_rel, ) ) return assignments + + +def write_manifest( + path: Path, + fleet: FleetProfile, + assignments: List[HostAssignment], + seed: Optional[int], +) -> None: + """Write fleet.manifest YAML file recording all host assignments.""" + now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ") + + manifest = { + "meta": { + "name": fleet.meta.name, + "generated": now, + "pmlogsynth_version": "1.0", + "seed": seed, + "duration": fleet.meta.duration, + "interval": fleet.meta.interval, + "hardware": fleet.meta.hardware, + "host_count": len(assignments), + }, + "archives": [ + { + "hostname": a.hostname, + "profile": a.workload_rel, + "role": a.role, + "jitter_factor": round(a.jitter_factor, 6), + } + for a in assignments + ], + } + + path.write_text( + yaml.dump(manifest, default_flow_style=False, sort_keys=False), + encoding="utf-8", + ) + + +def check_override_warnings(fleet: FleetProfile) -> None: + """Emit warnings for workload profile values that fleet settings override.""" + seen = {} # type: Dict[Path, bool] + + all_paths = [fleet.hosts.baseline_path] + all_rels = [fleet.hosts.baseline] + for idx in range(len(fleet.bad_actors.profiles)): + all_paths.append(fleet.bad_actors.profile_paths[idx]) + all_rels.append(fleet.bad_actors.profiles[idx]) + + for wpath, wrel in zip(all_paths, all_rels): + if wpath in seen: + continue + seen[wpath] = True + + try: + raw = yaml.safe_load(wpath.read_text(encoding="utf-8")) + except (OSError, yaml.YAMLError): + continue + + if not isinstance(raw, dict): + continue + + meta = raw.get("meta", {}) + if not isinstance(meta, dict): + continue + + if "duration" in meta: + profile_duration = parse_duration(meta["duration"]) + if profile_duration != fleet.meta.duration: + logger.warning( + "workload profile '%s' defines duration=%s " + "— overridden by fleet setting duration=%s", + wrel, profile_duration, fleet.meta.duration, + ) + + if "interval" in meta: + profile_interval = 
parse_duration(meta["interval"]) + if profile_interval != fleet.meta.interval: + logger.warning( + "workload profile '%s' defines interval=%s " + "— overridden by fleet setting interval=%s", + wrel, profile_interval, fleet.meta.interval, + ) + + host = raw.get("host", {}) + if isinstance(host, dict) and "profile" in host: + profile_hw = str(host["profile"]) + if profile_hw != fleet.meta.hardware: + logger.warning( + "workload profile '%s' defines hardware=%s " + "— overridden by fleet setting hardware=%s", + wrel, profile_hw, fleet.meta.hardware, + ) diff --git a/tests/unit/test_fleet.py b/tests/unit/test_fleet.py index dbea4c8..c1a0d77 100644 --- a/tests/unit/test_fleet.py +++ b/tests/unit/test_fleet.py @@ -212,3 +212,89 @@ def test_bad_actor_profiles_selected_from_pool(self) -> None: bad = [a for a in assignments if a.role == "bad_actor"] for b in bad: assert b.workload_path.name in ("bad-cpu.yaml",) + + +class TestWriteManifest: + """Tests for fleet.manifest YAML output.""" + + def test_manifest_contains_all_hosts(self, tmp_path: Path) -> None: + from pmlogsynth.fleet import ( + assign_hosts, + load_fleet_profile, + write_manifest, + ) + + fleet = load_fleet_profile(FLEET_FIXTURES / "test-fleet.yaml") + assignments = assign_hosts(fleet, seed=42) + manifest_path = tmp_path / "fleet.manifest" + write_manifest(manifest_path, fleet, assignments, seed=42) + + import yaml as _yaml + + manifest = _yaml.safe_load(manifest_path.read_text()) + assert manifest["meta"]["name"] == "test-fleet" + assert manifest["meta"]["host_count"] == 5 + assert manifest["meta"]["seed"] == 42 + assert len(manifest["archives"]) == 5 + + def test_manifest_roles_match_assignments(self, tmp_path: Path) -> None: + from pmlogsynth.fleet import ( + assign_hosts, + load_fleet_profile, + write_manifest, + ) + + fleet = load_fleet_profile(FLEET_FIXTURES / "test-fleet.yaml") + assignments = assign_hosts(fleet, seed=42) + write_manifest(tmp_path / "fleet.manifest", fleet, assignments, seed=42) + + import yaml as _yaml + + manifest = _yaml.safe_load((tmp_path / "fleet.manifest").read_text()) + for entry, assignment in zip(manifest["archives"], assignments): + assert entry["hostname"] == assignment.hostname + assert entry["role"] == assignment.role + assert entry["jitter_factor"] == pytest.approx(assignment.jitter_factor) + + def test_manifest_records_none_seed(self, tmp_path: Path) -> None: + from pmlogsynth.fleet import ( + assign_hosts, + load_fleet_profile, + write_manifest, + ) + + fleet = load_fleet_profile(FLEET_FIXTURES / "test-fleet.yaml") + assignments = assign_hosts(fleet, seed=None) + write_manifest(tmp_path / "fleet.manifest", fleet, assignments, seed=None) + + import yaml as _yaml + + manifest = _yaml.safe_load((tmp_path / "fleet.manifest").read_text()) + assert manifest["meta"]["seed"] is None + + +class TestOverrideWarnings: + """Tests for warnings when fleet settings override workload profile values.""" + + def test_warns_on_duration_conflict(self, caplog: pytest.LogCaptureFixture) -> None: + import logging + + from pmlogsynth.fleet import check_override_warnings, load_fleet_profile + + fleet = load_fleet_profile(FLEET_FIXTURES / "test-fleet.yaml") + from dataclasses import replace + + fleet_different = replace(fleet, meta=replace(fleet.meta, duration=3600)) + with caplog.at_level(logging.WARNING): + check_override_warnings(fleet_different) + assert any("duration" in r.message for r in caplog.records) + + def test_no_warning_when_values_match(self, caplog: pytest.LogCaptureFixture) -> None: + import logging + + 
from pmlogsynth.fleet import check_override_warnings, load_fleet_profile + + fleet = load_fleet_profile(FLEET_FIXTURES / "test-fleet.yaml") + with caplog.at_level(logging.WARNING): + check_override_warnings(fleet) + assert not any("duration" in r.message for r in caplog.records) From 0cec02f74599eac3d17de7774e3e18108322128d Mon Sep 17 00:00:00 2001 From: Paul Smith Date: Fri, 20 Mar 2026 05:41:22 +0000 Subject: [PATCH 08/23] Add fleet generation orchestrator and dry-run output Loops over host assignments, loads workload profiles with fleet-level overrides (hostname, duration, interval, hardware), applies jitter, and calls ArchiveWriter per host with optional parallelism via --jobs. --- pmlogsynth/fleet.py | 110 ++++++++++++- tests/integration/test_fleet_integration.py | 165 ++++++++++++++++++++ tests/unit/test_fleet.py | 20 +++ 3 files changed, 294 insertions(+), 1 deletion(-) create mode 100644 tests/integration/test_fleet_integration.py diff --git a/pmlogsynth/fleet.py b/pmlogsynth/fleet.py index 75d7e32..c201987 100644 --- a/pmlogsynth/fleet.py +++ b/pmlogsynth/fleet.py @@ -1,9 +1,11 @@ """Fleet profile loading, validation, and host assignment.""" import hashlib +import importlib import logging import random -from dataclasses import dataclass, field +import sys +from dataclasses import dataclass, field, replace from datetime import datetime, timezone from pathlib import Path from typing import Any, Dict, List, Optional @@ -349,3 +351,109 @@ def check_override_warnings(fleet: FleetProfile) -> None: "— overridden by fleet setting hardware=%s", wrel, profile_hw, fleet.meta.hardware, ) + + +def print_dry_run( + fleet: FleetProfile, + assignments: List[HostAssignment], + seed: Optional[int], +) -> None: + """Print host assignment table without generating archives.""" + seed_str = str(seed) if seed is not None else "none" + print("Fleet: {} ({} hosts, seed={})".format( + fleet.meta.name, len(assignments), seed_str, + )) + print() + for a in assignments: + role_label = "BAD " if a.role == "bad_actor" else "baseline " + print(" {} {} {} (jitter: x{:.2f})".format( + a.hostname, role_label, a.workload_rel, a.jitter_factor, + )) + + +def generate_fleet( + fleet: FleetProfile, + assignments: List[HostAssignment], + output_dir: Path, + seed: Optional[int], + jobs: int = 1, + force: bool = False, + start: Optional[datetime] = None, + verbose: bool = False, + config_dir: Optional[Path] = None, +) -> None: + """Generate one PCP archive per host, then write fleet.manifest.""" + output_dir = Path(output_dir) + output_dir.mkdir(parents=True, exist_ok=True) + + # Lazy import writer module (avoid PCP dependency at parse time) + _writer_mod = importlib.import_module("pmlogsynth.writer") + ArchiveWriter = _writer_mod.ArchiveWriter + + from pmlogsynth.jitter import apply_jitter + from pmlogsynth.profile import ProfileResolver, WorkloadProfile + from pmlogsynth.sampler import ValueSampler + from pmlogsynth.timeline import TimelineSequencer + + # Resolve hardware profile once (shared across all hosts) + resolver = ProfileResolver(config_dir=config_dir) + hardware = resolver.resolve(fleet.meta.hardware) + + # Check for override warnings (once, before generation loop) + check_override_warnings(fleet) + + def _generate_one(assignment: HostAssignment) -> None: + """Generate a single host archive.""" + profile_text = assignment.workload_path.read_text(encoding="utf-8") + profile = WorkloadProfile.from_string( + profile_text, config_dir=config_dir, + ) + + overridden_meta = replace( + profile.meta, + 
hostname=assignment.hostname, + duration=fleet.meta.duration, + interval=fleet.meta.interval, + ) + profile = replace(profile, meta=overridden_meta, hardware=hardware) + + profile = apply_jitter(profile, assignment.jitter_factor) + + timeline = TimelineSequencer(profile).expand(start_time=start) + sampler = ValueSampler(noise=profile.meta.noise) + + output_path = str(output_dir / assignment.hostname) + writer = ArchiveWriter( + output_path=output_path, + profile=profile, + hardware=hardware, + force=force, + ) + writer.write(timeline=timeline, sampler=sampler) + + if verbose: + print( + " generated: {} ({})".format( + assignment.hostname, assignment.role, + ), + file=sys.stderr, + ) + + # Generate archives — ThreadPoolExecutor for --jobs>1. + if jobs <= 1: + for assignment in assignments: + _generate_one(assignment) + else: + from concurrent.futures import ThreadPoolExecutor, as_completed + + with ThreadPoolExecutor(max_workers=jobs) as pool: + futures = { + pool.submit(_generate_one, a): a for a in assignments + } + for future in as_completed(futures): + future.result() + + # Write manifest + write_manifest( + output_dir / "fleet.manifest", fleet, assignments, seed=seed, + ) diff --git a/tests/integration/test_fleet_integration.py b/tests/integration/test_fleet_integration.py new file mode 100644 index 0000000..903d35d --- /dev/null +++ b/tests/integration/test_fleet_integration.py @@ -0,0 +1,165 @@ +"""Integration tests for fleet generation with mocked PCP.""" + +from pathlib import Path +from unittest.mock import MagicMock, patch + +FLEET_FIXTURES = Path(__file__).parent.parent / "fixtures" / "fleet" + + +class TestGenerateFleet: + """Tests for the fleet generation orchestrator.""" + + @patch("pmlogsynth.fleet.importlib.import_module") + def test_generates_correct_number_of_archives( + self, mock_import: MagicMock, tmp_path: Path + ) -> None: + from pmlogsynth.fleet import assign_hosts, generate_fleet, load_fleet_profile + + mock_writer_mod = MagicMock() + mock_writer_cls = MagicMock() + mock_writer_mod.ArchiveWriter = mock_writer_cls + mock_writer_mod.ArchiveConflictError = Exception + mock_writer_mod.ArchiveGenerationError = Exception + mock_import.return_value = mock_writer_mod + + fleet = load_fleet_profile(FLEET_FIXTURES / "test-fleet.yaml") + assignments = assign_hosts(fleet, seed=42) + + generate_fleet( + fleet=fleet, + assignments=assignments, + output_dir=tmp_path, + seed=42, + jobs=1, + force=False, + start=None, + verbose=False, + config_dir=None, + ) + + assert mock_writer_cls.call_count == 5 + + @patch("pmlogsynth.fleet.importlib.import_module") + def test_manifest_written_after_generation( + self, mock_import: MagicMock, tmp_path: Path + ) -> None: + from pmlogsynth.fleet import assign_hosts, generate_fleet, load_fleet_profile + + mock_writer_mod = MagicMock() + mock_writer_mod.ArchiveWriter = MagicMock() + mock_writer_mod.ArchiveConflictError = Exception + mock_writer_mod.ArchiveGenerationError = Exception + mock_import.return_value = mock_writer_mod + + fleet = load_fleet_profile(FLEET_FIXTURES / "test-fleet.yaml") + assignments = assign_hosts(fleet, seed=42) + + generate_fleet( + fleet=fleet, + assignments=assignments, + output_dir=tmp_path, + seed=42, + jobs=1, + force=False, + start=None, + verbose=False, + config_dir=None, + ) + + assert (tmp_path / "fleet.manifest").exists() + + @patch("pmlogsynth.fleet.importlib.import_module") + def test_fleet_overrides_applied_to_profiles( + self, mock_import: MagicMock, tmp_path: Path + ) -> None: + from pmlogsynth.fleet import 
assign_hosts, generate_fleet, load_fleet_profile + + mock_writer_mod = MagicMock() + mock_writer_cls = MagicMock() + mock_writer_mod.ArchiveWriter = mock_writer_cls + mock_writer_mod.ArchiveConflictError = Exception + mock_writer_mod.ArchiveGenerationError = Exception + mock_import.return_value = mock_writer_mod + + fleet = load_fleet_profile(FLEET_FIXTURES / "test-fleet.yaml") + assignments = assign_hosts(fleet, seed=42) + + generate_fleet( + fleet=fleet, + assignments=assignments, + output_dir=tmp_path, + seed=42, + jobs=1, + force=False, + start=None, + verbose=False, + config_dir=None, + ) + + for call_args, assignment in zip(mock_writer_cls.call_args_list, assignments): + # ArchiveWriter is called with keyword args + profile = call_args[1]["profile"] if "profile" in call_args[1] else call_args[0][1] + assert profile.meta.hostname == assignment.hostname + assert profile.meta.duration == fleet.meta.duration + assert profile.meta.interval == fleet.meta.interval + + @patch("pmlogsynth.fleet.importlib.import_module") + def test_output_directory_created( + self, mock_import: MagicMock, tmp_path: Path + ) -> None: + from pmlogsynth.fleet import assign_hosts, generate_fleet, load_fleet_profile + + mock_writer_mod = MagicMock() + mock_writer_mod.ArchiveWriter = MagicMock() + mock_writer_mod.ArchiveConflictError = Exception + mock_writer_mod.ArchiveGenerationError = Exception + mock_import.return_value = mock_writer_mod + + fleet = load_fleet_profile(FLEET_FIXTURES / "test-fleet.yaml") + assignments = assign_hosts(fleet, seed=42) + out = tmp_path / "nested" / "output" + + generate_fleet( + fleet=fleet, + assignments=assignments, + output_dir=out, + seed=42, + jobs=1, + force=False, + start=None, + verbose=False, + config_dir=None, + ) + + assert out.exists() + + @patch("pmlogsynth.fleet.importlib.import_module") + def test_parallel_jobs_generates_all_archives( + self, mock_import: MagicMock, tmp_path: Path + ) -> None: + from pmlogsynth.fleet import assign_hosts, generate_fleet, load_fleet_profile + + mock_writer_mod = MagicMock() + mock_writer_cls = MagicMock() + mock_writer_mod.ArchiveWriter = mock_writer_cls + mock_writer_mod.ArchiveConflictError = Exception + mock_writer_mod.ArchiveGenerationError = Exception + mock_import.return_value = mock_writer_mod + + fleet = load_fleet_profile(FLEET_FIXTURES / "test-fleet.yaml") + assignments = assign_hosts(fleet, seed=42) + + generate_fleet( + fleet=fleet, + assignments=assignments, + output_dir=tmp_path, + seed=42, + jobs=2, + force=False, + start=None, + verbose=False, + config_dir=None, + ) + + assert mock_writer_cls.call_count == 5 + assert (tmp_path / "fleet.manifest").exists() diff --git a/tests/unit/test_fleet.py b/tests/unit/test_fleet.py index c1a0d77..529e695 100644 --- a/tests/unit/test_fleet.py +++ b/tests/unit/test_fleet.py @@ -298,3 +298,23 @@ def test_no_warning_when_values_match(self, caplog: pytest.LogCaptureFixture) -> with caplog.at_level(logging.WARNING): check_override_warnings(fleet) assert not any("duration" in r.message for r in caplog.records) + + +class TestDryRun: + """Tests for --dry-run output formatting.""" + + def test_dry_run_prints_all_hosts(self, capsys: pytest.CaptureFixture) -> None: + from pmlogsynth.fleet import assign_hosts, load_fleet_profile, print_dry_run + + fleet = load_fleet_profile(FLEET_FIXTURES / "test-fleet.yaml") + assignments = assign_hosts(fleet, seed=42) + print_dry_run(fleet, assignments, seed=42) + + captured = capsys.readouterr() + assert "test-fleet" in captured.out + assert "5 hosts" in 
captured.out + for a in assignments: + assert a.hostname in captured.out + bad = [a for a in assignments if a.role == "bad_actor"] + for b in bad: + assert "BAD" in captured.out From 9871e9e974bfc6e93a3fd5a58cdf69a3b0e62bcd Mon Sep 17 00:00:00 2001 From: Paul Smith Date: Fri, 20 Mar 2026 05:43:48 +0000 Subject: [PATCH 09/23] Wire fleet subcommand into CLI with validate and dry-run Replaces the Phase 3 stub with a fully functional fleet subparser. Supports --validate, --dry-run, --seed, --jobs, --force, --start. --- pmlogsynth/cli.py | 171 +++++++++++++++++++++++++++++++++++++++-- tests/unit/test_cli.py | 31 +++++++- 2 files changed, 194 insertions(+), 8 deletions(-) diff --git a/pmlogsynth/cli.py b/pmlogsynth/cli.py index 4d39993..9d951a3 100644 --- a/pmlogsynth/cli.py +++ b/pmlogsynth/cli.py @@ -1,6 +1,7 @@ """CLI entry point for pmlogsynth.""" import argparse +import os import sys from pathlib import Path from typing import List, Optional @@ -130,8 +131,13 @@ def _build_parser() -> argparse.ArgumentParser: subparsers = parser.add_subparsers(dest="subcommand") - # Reserve 'fleet' for Phase 3 - subparsers.add_parser("fleet", help=argparse.SUPPRESS) + # --- fleet subcommand --- + fleet_parser = subparsers.add_parser( + "fleet", + help="Generate a fleet of PCP archives from a fleet profile.", + add_help=True, + ) + _add_fleet_args(fleet_parser) # --- generate (default, injected by _preprocess_argv when no subcommand) --- gen = subparsers.add_parser( @@ -199,6 +205,76 @@ def _add_generate_args(p: argparse.ArgumentParser) -> None: ) +def _add_fleet_args(p: argparse.ArgumentParser) -> None: + """Add fleet-command arguments to the fleet subparser.""" + p.add_argument( + "fleet_profile", + metavar="FLEET_PROFILE", + help="Path to YAML fleet profile.", + ) + p.add_argument( + "-o", "--output-dir", + metavar="PATH", + dest="output_dir", + help=( + "Output directory for fleet archives " + "(default: ./generated-archives/fleet-)." + ), + ) + p.add_argument( + "--seed", + type=int, + default=None, + help="Random seed for deterministic host assignment.", + ) + p.add_argument( + "--jobs", + type=int, + default=os.cpu_count() or 1, + metavar="N", + help="Parallel archive generation jobs (default: cpu count).", + ) + p.add_argument( + "--dry-run", + action="store_true", + default=False, + dest="dry_run", + help="Print host assignments without generating archives.", + ) + p.add_argument( + "--force", + action="store_true", + default=False, + help="Overwrite existing archive files without error.", + ) + p.add_argument( + "--validate", + action="store_true", + default=False, + help="Validate fleet profile only; do not generate any files.", + ) + p.add_argument( + "--start", + metavar="TIMESTAMP", + help=( + "Archive start time (ISO 8601 or 'YYYY-MM-DD HH:MM:SS TZ'). " + "Default: today at 00:00:00 UTC." 
+ ), + ) + p.add_argument( + "-v", "--verbose", + action="store_true", + default=False, + help="Print per-host generation progress to stderr.", + ) + p.add_argument( + "-C", "--config-dir", + metavar="PATH", + dest="config_dir", + help="Additional hardware profile directory (highest precedence).", + ) + + # --------------------------------------------------------------------------- # Command handlers # --------------------------------------------------------------------------- @@ -244,6 +320,92 @@ def _cmd_validate(profile_path: str, config_dir: Optional[Path]) -> int: return 2 +def _cmd_fleet(args: argparse.Namespace) -> int: + """Handle the fleet subcommand.""" + config_dir = Path(args.config_dir) if args.config_dir else None + + # Validate flag incompatibilities + if args.validate: + for flag in ("force", "dry_run"): + if getattr(args, flag, False): + print( + "error: --validate is incompatible with " + "--{}".format(flag.replace("_", "-")), + file=sys.stderr, + ) + return 1 + + # Load fleet profile + from pmlogsynth.fleet import ( + assign_hosts, + check_override_warnings, + generate_fleet, + load_fleet_profile, + print_dry_run, + ) + + try: + fleet = load_fleet_profile(Path(args.fleet_profile)) + except ValidationError as exc: + print("Validation error: {}".format(exc), file=sys.stderr) + return 1 + except OSError as exc: + print("Error reading fleet profile: {}".format(exc), file=sys.stderr) + return 2 + + # --validate mode + if args.validate: + check_override_warnings(fleet) + print("fleet profile '{}' is valid".format(fleet.meta.name)) + return 0 + + # --dry-run mode + if args.dry_run: + assignments = assign_hosts(fleet, seed=args.seed) + print_dry_run(fleet, assignments, seed=args.seed) + return 0 + + # Full generation + start_time = None + if args.start: + from pmlogsynth.time_parsing import parse_absolute_timestamp + try: + start_time = parse_absolute_timestamp(args.start, field="--start") + except ValidationError as exc: + print("error: {}".format(exc), file=sys.stderr) + return 1 + + output_dir_str = args.output_dir + if not output_dir_str: + output_dir_str = str( + Path("generated-archives") / "fleet-{}".format(fleet.meta.name) + ) + output_dir = Path(output_dir_str) + + assignments = assign_hosts(fleet, seed=args.seed) + + try: + generate_fleet( + fleet=fleet, + assignments=assignments, + output_dir=output_dir, + seed=args.seed, + jobs=args.jobs, + force=args.force, + start=start_time, + verbose=args.verbose, + config_dir=config_dir, + ) + except Exception as exc: + print("error: Fleet generation failed: {}".format(exc), file=sys.stderr) + return 3 + + print("Fleet '{}': {} archives written to {}".format( + fleet.meta.name, len(assignments), output_dir, + )) + return 0 + + def _cmd_generate(args: argparse.Namespace) -> int: config_dir = Path(args.config_dir) if args.config_dir else None @@ -401,10 +563,9 @@ def main() -> None: if hasattr(args, "config_dir") and args.config_dir: config_dir = Path(args.config_dir) - # Fleet stub + # Fleet subcommand if args.subcommand == "fleet": - print("error: fleet subcommand not yet implemented", file=sys.stderr) - sys.exit(2) + sys.exit(_cmd_fleet(args)) # Informational commands (top-level flags, checked before subcommand) if getattr(args, "show_schema", False): diff --git a/tests/unit/test_cli.py b/tests/unit/test_cli.py index c40e674..2087f26 100644 --- a/tests/unit/test_cli.py +++ b/tests/unit/test_cli.py @@ -61,13 +61,38 @@ def test_parse_start_invalid_raises() -> None: def test_fleet_subcommand_exits_2(capsys: pytest.CaptureFixture) 
-> None: - """fleet subcommand prints 'not yet implemented' and exits 2.""" + """fleet subcommand without FLEET_PROFILE arg exits non-zero.""" with patch("sys.argv", ["pmlogsynth", "fleet"]): with pytest.raises(SystemExit) as exc_info: main() - assert exc_info.value.code == 2 + assert exc_info.value.code != 0 + + +def test_fleet_validate_exits_0_on_valid_profile(tmp_path: Path) -> None: + """fleet --validate should exit 0 for a valid fleet profile.""" + fleet_fixtures = Path(__file__).parent.parent / "fixtures" / "fleet" + with patch("sys.argv", [ + "pmlogsynth", "fleet", "--validate", + str(fleet_fixtures / "test-fleet.yaml"), + ]): + with pytest.raises(SystemExit) as exc_info: + main() + assert exc_info.value.code == 0 + + +def test_fleet_dry_run_exits_0(capsys: pytest.CaptureFixture) -> None: + """fleet --dry-run should print assignments and exit 0.""" + fleet_fixtures = Path(__file__).parent.parent / "fixtures" / "fleet" + with patch("sys.argv", [ + "pmlogsynth", "fleet", "--dry-run", "--seed", "42", + str(fleet_fixtures / "test-fleet.yaml"), + ]): + with pytest.raises(SystemExit) as exc_info: + main() + assert exc_info.value.code == 0 captured = capsys.readouterr() - assert "fleet" in captured.err.lower() or "not yet implemented" in captured.err.lower() + assert "test-fleet" in captured.out + assert "host-01" in captured.out def test_list_metrics_flag_exits_zero() -> None: From 9d15cf1ad33b7f03b6867d9b5d5559328d581e7e Mon Sep 17 00:00:00 2001 From: Paul Smith Date: Fri, 20 Mar 2026 05:46:56 +0000 Subject: [PATCH 10/23] Document fleet mode in man page, README, and CLAUDE.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Fleet subcommand is now implemented — update project docs to reflect the new capability and remove reservation notes. --- CLAUDE.md | 8 +++- README.md | 19 ++++++++ docs/profile-format.md | 104 +++++++++++++++++++++++++++++++++++++++++ man/pmlogsynth.1 | 96 +++++++++++++++++++++++++++++++++++++ 4 files changed, 226 insertions(+), 1 deletion(-) diff --git a/CLAUDE.md b/CLAUDE.md index c4a19f6..3437098 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -26,6 +26,8 @@ pmlogsynth/ # installable Python package ├── timeline.py # phase sequencer, linear interpolation, repeat expansion ├── sampler.py # ValueSampler: noise, counter accumulation ├── writer.py # pcp.pmi.pmiLogImport wrapper (isolated PCP dependency) +├── fleet.py # fleet profile loader, host assignment, orchestrator, manifest +├── jitter.py # per-host stressor jitter (pure functions, no mutation) ├── profiles/ # bundled hardware profile YAML files (7 profiles) └── domains/ ├── base.py # MetricModel abstract base @@ -65,6 +67,10 @@ pmlogsynth --validate profile.yaml pmlogsynth -o ./out profile.yaml pmlogsynth --list-profiles pmlogsynth --list-metrics + +# Fleet mode +pmlogsynth fleet fleet-profile.yml --dry-run +pmlogsynth fleet fleet-profile.yml -o ./generated-archives/my-fleet ``` ## Key Invariants @@ -77,7 +83,7 @@ pmlogsynth --list-metrics NOT `ProfileLoader`. Parsed stressor fields are `Optional` — `None` ≠ default value. 
- **Counter increments clamped ≥ 0**: noise must never produce negative counter deltas - **`ProfileLoader.from_file` delegates to `from_string`**: always -- **CLI uses argparse subparsers**: `fleet` is reserved; do not use as positional arg handling +- **CLI uses argparse subparsers**: `fleet` and `generate` are implemented subcommands ## Code Style diff --git a/README.md b/README.md index a5f5636..c509207 100644 --- a/README.md +++ b/README.md @@ -174,6 +174,25 @@ This is useful for replaying realistic-looking archives anchored to "now" — for example, a simulated spike that started an hour ago. Positive offsets (`+30m`) and bare `-` are rejected with a descriptive error. +### Fleet Mode + +Generate a fleet of PCP archives — one per host — from a single fleet profile. +Each host gets per-host stressor jitter for realistic variation across the fleet. + +```bash +# Preview host assignments without generating archives +pmlogsynth fleet --dry-run fleet-profile.yml + +# Generate a 20-host fleet with deterministic assignment +pmlogsynth fleet --seed 42 -o ./generated-archives/my-fleet fleet-profile.yml +``` + +The output directory contains one PCP archive per host plus a `fleet.manifest` +YAML file recording hostnames, roles, jitter factors, and the seed. + +See [`docs/profile-format.md`](docs/profile-format.md#fleet-profile-format) for +the full fleet YAML schema. + --- ## Metrics diff --git a/docs/profile-format.md b/docs/profile-format.md index b7df004..c751cd0 100644 --- a/docs/profile-format.md +++ b/docs/profile-format.md @@ -278,3 +278,107 @@ covering all four stressor domains across three phases (baseline → ramp → pe ```bash pmlogsynth -o ./generated-archives/complete-example docs/complete-example.yml ``` + +--- + +## Fleet Profile Format + +A fleet profile is a separate YAML document used with the `pmlogsynth fleet` +subcommand. It describes a fleet of hosts that share a common hardware profile, +each generating its own PCP archive with per-host stressor variation (jitter). + +Fleet profiles are **not** interchangeable with workload profiles — they have a +different schema and are passed to the `fleet` subcommand, not the default +`generate` subcommand. + +### `meta` + +Fleet-wide settings. All fields are required. + +| Field | Type | Constraints | +|-------|------|-------------| +| `name` | string | Fleet identifier; used in manifest and default output directory | +| `duration` | int or string | Positive; overrides duration in all workload profiles. Accepts `30s`, `10m`, `24h`, `1d`, `1h30m`. | +| `interval` | int or string | Positive; overrides interval in all workload profiles. Same format as duration. | +| `hostname_prefix` | string | Prefix for generated hostnames (e.g. `prod-web` → `prod-web-01`, `prod-web-02`, ...) | +| `hardware` | string | Named hardware profile; overrides `host.profile` in all workload profiles | + +### `hosts` + +Baseline host pool configuration. + +| Field | Type | Default | Constraints | +|-------|------|---------|-------------| +| `count` | integer | required | Positive; total number of hosts in the fleet | +| `baseline` | string (path) | required | Path to the baseline workload profile YAML, resolved relative to the fleet profile file | +| `jitter` | float | `0.0` | Standard deviation of the Gaussian jitter factor (mean 1.0). Higher values produce more variation between hosts. | + +### `bad_actors` + +Optional section. Designates a subset of hosts as "bad actors" running different +workload profiles (e.g. high-CPU or memory-pressure scenarios). 
+ +| Field | Type | Default | Constraints | +|-------|------|---------|-------------| +| `count` | integer | `0` | Must not exceed `hosts.count` | +| `jitter` | float | inherits `hosts.jitter` | Per-bad-actor jitter standard deviation | +| `profiles` | list of strings (paths) | `[]` | Paths to bad-actor workload profiles, resolved relative to the fleet profile file. One is chosen at random per bad-actor host. | + +### Path resolution + +All workload profile paths (`hosts.baseline` and `bad_actors.profiles` entries) +are resolved **relative to the directory containing the fleet profile file**. +This allows fleet profiles and their workload profiles to live together in a +self-contained directory. + +### Fleet-level overrides + +The fleet `meta.duration`, `meta.interval`, and `meta.hardware` values +**override** the corresponding values in each workload profile. If the workload +profile specifies different values, a warning is emitted to stderr but +generation proceeds with the fleet-level settings. + +### Jitter semantics + +Jitter adds realistic per-host variation to stressor values. Each host receives +a multiplicative jitter factor drawn from a Gaussian distribution with +mean 1.0 and standard deviation equal to the `jitter` field. + +- A jitter of `0.0` means all hosts are identical (no variation). +- A jitter of `0.10` means most hosts vary by ±10% from the baseline. +- Ratio fields (e.g. `utilization`, `used_ratio`) are clamped to [0.0, 1.0]. +- Throughput fields are clamped to ≥ 0. + +Bad-actor hosts use `bad_actors.jitter` if specified, otherwise inherit +`hosts.jitter`. + +### Example fleet profile + +```yaml +meta: + name: prod-web + duration: 1h + interval: 60 + hostname_prefix: prod-web + hardware: generic-large + +hosts: + count: 20 + baseline: workloads/baseline.yml + jitter: 0.10 + +bad_actors: + count: 3 + jitter: 0.15 + profiles: + - workloads/cpu-spike.yml + - workloads/memory-pressure.yml +``` + +### Output + +Fleet generation produces a flat directory containing: + +- One PCP archive triplet (`.0`, `.index`, `.meta`) per host +- A `fleet.manifest` YAML file recording host assignments, roles, jitter + factors, and the seed used for reproducibility diff --git a/man/pmlogsynth.1 b/man/pmlogsynth.1 index 16aa362..3dea6d0 100644 --- a/man/pmlogsynth.1 +++ b/man/pmlogsynth.1 @@ -23,6 +23,10 @@ pmlogsynth \- generate synthetic PCP archives from declarative YAML profiles \fB\-\-validate\fR [\fB\-C\fR \fIDIR\fR] \fIPROFILE\fR +.br +.B pmlogsynth fleet +[\fIFLEET OPTIONS\fR] +\fIFLEET_PROFILE\fR .SH DESCRIPTION .B pmlogsynth reads a YAML workload profile and writes a valid PCP version 3 archive @@ -454,6 +458,88 @@ Per-interface receive/transmit error counts (cumulative counter). All per-interface instance names match .I interface.name fields in the hardware profile. +.SH FLEET SUBCOMMAND +The +.B fleet +subcommand generates a fleet of PCP archives from a single fleet profile YAML +file. +Each host in the fleet gets its own PCP archive, with per-host stressor jitter +applied for realistic variation. +A +.I fleet.manifest +YAML file is written alongside the archives recording all host assignments. +.PP +Fleet profiles are distinct from workload profiles. +A fleet profile specifies a set of hosts sharing a common hardware profile, +a baseline workload, and optional bad-actor hosts with different workload +profiles. +Workload profile paths are resolved relative to the fleet profile file. 
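+.PP
+For example, if the fleet profile is
+.I /srv/fleets/web/fleet.yml
+and
+.B hosts.baseline
+is
+.IR workloads/baseline.yml ,
+the baseline workload is read from
+.I /srv/fleets/web/workloads/baseline.yml
+(these paths are illustrative only).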
+.PP +Fleet-level +.BR meta.duration , +.BR meta.interval , +and +.B meta.hardware +override the corresponding values in individual workload profiles. +.SS Fleet Options +.TP +\fIFLEET_PROFILE\fR +Path to the fleet profile YAML file (required). +.TP +.BI \-o " PATH" ", \-\-output\-dir " PATH +Output directory for generated archives. +Default: +.IR ./generated\-archives/fleet\- . +.TP +.BI \-\-seed " N" +Random seed for deterministic host assignment and jitter. +If omitted, a random seed is chosen. +.TP +.BI \-\-jobs " N" +Number of parallel archive generation jobs (default: CPU count). +.TP +.B \-\-dry\-run +Print the host assignment table (hostnames, roles, jitter factors) +without generating any archives. +.TP +.B \-\-force +Overwrite existing archive files without error. +.TP +.B \-\-validate +Validate the fleet profile only; do not generate any files. +Exits 0 on success, 1 on validation error. +.TP +.BI \-\-start " TIMESTAMP" +Archive start time for all hosts. +Same format as the generate subcommand. +.TP +.BR \-v ", " \-\-verbose +Print per-host generation progress to stderr. +.TP +.BI \-C " DIR" ", \-\-config\-dir " DIR +Additional hardware profile directory (highest precedence). +.SS Fleet Profile Format +A fleet profile YAML file has three top-level sections: +.TP +.B meta +Fleet-wide settings: +.BR name " (string, required)," +.BR duration " (duration string, required)," +.BR interval " (duration string, required)," +.BR hostname_prefix " (string, required)," +.BR hardware " (named hardware profile, required)." +.TP +.B hosts +Baseline host pool: +.BR count " (integer, required)," +.BR baseline " (path to workload profile, required)," +.BR jitter " (float, default 0.0 \(em Gaussian noise standard deviation)." +.TP +.B bad_actors +Optional bad-actor host pool: +.BR count " (integer, default 0 \(em must not exceed hosts.count)," +.BR jitter " (float, inherits hosts.jitter if absent)," +.BR profiles " (list of paths to workload profiles)." .SH EXAMPLES .SS Generate a 10-minute archive with a CPU spike .nf @@ -484,6 +570,16 @@ pmlogsynth \-C ./tests/fixtures/profiles \-\-list\-profiles .nf pmlogsynth \-\-list\-metrics .fi +.SS Generate a 20-host fleet with deterministic assignment +.nf +pmlogsynth fleet \-\-seed 42 \-o ./generated\-archives/prod\-fleet fleet.yml +ls ./generated\-archives/prod\-fleet/ +# prod\-web\-01.0 prod\-web\-01.index ... fleet.manifest +.fi +.SS Preview fleet host assignments without generating archives +.nf +pmlogsynth fleet \-\-dry\-run fleet.yml +.fi .SH EXIT STATUS .TP .B 0 From ecad75edbb469d2d49e7c6538b7823062140ba1e Mon Sep 17 00:00:00 2001 From: Paul Smith Date: Fri, 20 Mar 2026 10:34:59 +0000 Subject: [PATCH 11/23] Split fleet.py monolith into fleet/ package for SRP 460-line God Object decomposed into 7 focused modules: - models.py: dataclasses - loader.py: YAML parsing & validation - assignment.py: host assignment & stable seeding - orchestrator.py: archive generation - manifest.py: fleet.manifest writer - warnings.py: override conflict detection - display.py: dry-run output __init__.py re-exports all public symbols for backwards compatibility. All 510 tests pass, no consumer changes needed. 
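
Both import styles remain valid; for example:

    from pmlogsynth.fleet import generate_fleet            # via package root, unchanged
    from pmlogsynth.fleet.assignment import assign_hosts   # direct submodule import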
--- pmlogsynth/fleet.py | 459 -------------------- pmlogsynth/fleet/__init__.py | 36 ++ pmlogsynth/fleet/assignment.py | 77 ++++ pmlogsynth/fleet/display.py | 23 + pmlogsynth/fleet/loader.py | 136 ++++++ pmlogsynth/fleet/manifest.py | 46 ++ pmlogsynth/fleet/models.py | 56 +++ pmlogsynth/fleet/orchestrator.py | 101 +++++ pmlogsynth/fleet/warnings.py | 66 +++ tests/integration/test_fleet_integration.py | 10 +- 10 files changed, 546 insertions(+), 464 deletions(-) delete mode 100644 pmlogsynth/fleet.py create mode 100644 pmlogsynth/fleet/__init__.py create mode 100644 pmlogsynth/fleet/assignment.py create mode 100644 pmlogsynth/fleet/display.py create mode 100644 pmlogsynth/fleet/loader.py create mode 100644 pmlogsynth/fleet/manifest.py create mode 100644 pmlogsynth/fleet/models.py create mode 100644 pmlogsynth/fleet/orchestrator.py create mode 100644 pmlogsynth/fleet/warnings.py diff --git a/pmlogsynth/fleet.py b/pmlogsynth/fleet.py deleted file mode 100644 index c201987..0000000 --- a/pmlogsynth/fleet.py +++ /dev/null @@ -1,459 +0,0 @@ -"""Fleet profile loading, validation, and host assignment.""" - -import hashlib -import importlib -import logging -import random -import sys -from dataclasses import dataclass, field, replace -from datetime import datetime, timezone -from pathlib import Path -from typing import Any, Dict, List, Optional - -import yaml - -from pmlogsynth.profile import ValidationError, parse_duration - -logger = logging.getLogger(__name__) - - -@dataclass -class FleetMeta: - """Top-level fleet metadata.""" - - name: str - duration: int - interval: int - hostname_prefix: str - hardware: str - - -@dataclass -class HostsConfig: - """Baseline host configuration.""" - - count: int - baseline: str - baseline_path: Path - jitter: float = 0.0 - - -@dataclass -class BadActorsConfig: - """Bad-actor host configuration.""" - - count: int = 0 - jitter: float = 0.0 - profiles: List[str] = field(default_factory=list) - profile_paths: List[Path] = field(default_factory=list) - - -@dataclass -class FleetProfile: - """Parsed fleet profile — the full fleet specification.""" - - meta: FleetMeta - hosts: HostsConfig - bad_actors: BadActorsConfig - - -@dataclass -class HostAssignment: - """One host's role, jitter factor, and workload path.""" - - hostname: str - role: str # "baseline" or "bad_actor" - jitter_factor: float - workload_path: Path - workload_rel: str = "" - - -def _parse_fleet_meta(raw: Dict[str, Any]) -> FleetMeta: - """Parse and validate the meta section of a fleet profile.""" - meta = raw.get("meta") - if not isinstance(meta, dict): - raise ValidationError("fleet profile missing 'meta' section") - - name = meta.get("name") - if not name: - raise ValidationError("fleet profile missing 'meta.name'") - - duration_raw = meta.get("duration") - if duration_raw is None: - raise ValidationError("fleet profile missing 'meta.duration'") - duration = parse_duration(duration_raw) - - interval_raw = meta.get("interval") - if interval_raw is None: - raise ValidationError("fleet profile missing 'meta.interval'") - interval = parse_duration(interval_raw) - - hostname_prefix = meta.get("hostname_prefix") - if not hostname_prefix: - raise ValidationError("fleet profile missing 'meta.hostname_prefix'") - - hardware = meta.get("hardware") - if not hardware: - raise ValidationError("fleet profile missing 'meta.hardware'") - - return FleetMeta( - name=str(name), - duration=duration, - interval=interval, - hostname_prefix=str(hostname_prefix), - hardware=str(hardware), - ) - - -def _parse_hosts(raw: 
Dict[str, Any], fleet_dir: Path) -> HostsConfig: - """Parse and validate the hosts section of a fleet profile.""" - hosts = raw.get("hosts") - if not isinstance(hosts, dict): - raise ValidationError("fleet profile missing 'hosts' section") - - count = hosts.get("count") - if not isinstance(count, int) or count < 1: - raise ValidationError("hosts.count must be a positive integer") - - baseline = hosts.get("baseline") - if not baseline: - raise ValidationError("hosts.baseline is required") - - jitter = float(hosts.get("jitter", 0.0)) - baseline_path = fleet_dir / str(baseline) - - return HostsConfig( - count=count, - baseline=str(baseline), - baseline_path=baseline_path, - jitter=jitter, - ) - - -def _parse_bad_actors( - raw: Dict[str, Any], - hosts_config: HostsConfig, - fleet_dir: Path, -) -> BadActorsConfig: - """Parse and validate the bad_actors section of a fleet profile.""" - section = raw.get("bad_actors") - if section is None: - return BadActorsConfig() - - if not isinstance(section, dict): - raise ValidationError("bad_actors must be a mapping") - - count = int(section.get("count", 0)) - if count > hosts_config.count: - raise ValidationError( - "bad_actors.count ({}) exceeds hosts.count ({})".format( - count, hosts_config.count - ) - ) - - # Default bad_actors jitter to hosts jitter if not specified - jitter_raw = section.get("jitter") - if jitter_raw is not None: - jitter = float(jitter_raw) - else: - jitter = hosts_config.jitter - - profiles_raw = section.get("profiles", []) - profiles = [str(p) for p in profiles_raw] - profile_paths = [fleet_dir / p for p in profiles] - - return BadActorsConfig( - count=count, - jitter=jitter, - profiles=profiles, - profile_paths=profile_paths, - ) - - -def load_fleet_profile(path: Path) -> FleetProfile: - """Load and validate a fleet profile YAML file. - - Workload paths (baseline, bad-actor profiles) are resolved relative - to the directory containing the fleet YAML file. - """ - text = path.read_text() - raw = yaml.safe_load(text) - if not isinstance(raw, dict): - raise ValidationError("fleet profile must be a YAML mapping") - - fleet_dir = path.parent - - meta = _parse_fleet_meta(raw) - hosts = _parse_hosts(raw, fleet_dir) - bad_actors = _parse_bad_actors(raw, hosts, fleet_dir) - - return FleetProfile(meta=meta, hosts=hosts, bad_actors=bad_actors) - - -def _stable_host_seed(fleet_name: str, hostname: str, seed: int) -> int: - """Derive a deterministic per-host seed using SHA-256. - - Python's hash() is not stable across runs. SHA-256 gives us - repeatable results for any given (fleet_name, hostname, seed) tuple. - """ - digest = hashlib.sha256( - "{}:{}:{}".format(fleet_name, hostname, seed).encode() - ).hexdigest() - return int(digest[:16], 16) - - -def assign_hosts( - fleet: FleetProfile, - seed: Optional[int] = None, -) -> List[HostAssignment]: - """Assign hostnames, roles, and jitter factors to each host. - - Uses a seeded RNG for deterministic bad-actor selection and jitter - factor generation. If seed is None, a random seed is chosen. 
- """ - if seed is None: - seed = random.randint(0, 2**32 - 1) - - rng = random.Random(seed) - count = fleet.hosts.count - pad_width = max(2, len(str(count))) - - # Pick which host indices are bad actors - bad_actor_indices = set(rng.sample(range(count), fleet.bad_actors.count)) - - assignments: List[HostAssignment] = [] - for i in range(count): - hostname = "{}-{}".format( - fleet.meta.hostname_prefix, - str(i + 1).zfill(pad_width), - ) - is_bad = i in bad_actor_indices - - if is_bad: - role = "bad_actor" - jitter_stddev = fleet.bad_actors.jitter - # Pick a profile from the bad-actor pool - profile_idx = rng.randrange(len(fleet.bad_actors.profiles)) - workload_path = fleet.bad_actors.profile_paths[profile_idx] - workload_rel = fleet.bad_actors.profiles[profile_idx] - else: - role = "baseline" - jitter_stddev = fleet.hosts.jitter - workload_path = fleet.hosts.baseline_path - workload_rel = fleet.hosts.baseline - - # Generate a stable, deterministic jitter factor per host - host_seed = _stable_host_seed(fleet.meta.name, hostname, seed) - host_rng = random.Random(host_seed) - jitter_factor = host_rng.gauss(1.0, jitter_stddev) - - assignments.append( - HostAssignment( - hostname=hostname, - role=role, - jitter_factor=jitter_factor, - workload_path=workload_path, - workload_rel=workload_rel, - ) - ) - - return assignments - - -def write_manifest( - path: Path, - fleet: FleetProfile, - assignments: List[HostAssignment], - seed: Optional[int], -) -> None: - """Write fleet.manifest YAML file recording all host assignments.""" - now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ") - - manifest = { - "meta": { - "name": fleet.meta.name, - "generated": now, - "pmlogsynth_version": "1.0", - "seed": seed, - "duration": fleet.meta.duration, - "interval": fleet.meta.interval, - "hardware": fleet.meta.hardware, - "host_count": len(assignments), - }, - "archives": [ - { - "hostname": a.hostname, - "profile": a.workload_rel, - "role": a.role, - "jitter_factor": round(a.jitter_factor, 6), - } - for a in assignments - ], - } - - path.write_text( - yaml.dump(manifest, default_flow_style=False, sort_keys=False), - encoding="utf-8", - ) - - -def check_override_warnings(fleet: FleetProfile) -> None: - """Emit warnings for workload profile values that fleet settings override.""" - seen = {} # type: Dict[Path, bool] - - all_paths = [fleet.hosts.baseline_path] - all_rels = [fleet.hosts.baseline] - for idx in range(len(fleet.bad_actors.profiles)): - all_paths.append(fleet.bad_actors.profile_paths[idx]) - all_rels.append(fleet.bad_actors.profiles[idx]) - - for wpath, wrel in zip(all_paths, all_rels): - if wpath in seen: - continue - seen[wpath] = True - - try: - raw = yaml.safe_load(wpath.read_text(encoding="utf-8")) - except (OSError, yaml.YAMLError): - continue - - if not isinstance(raw, dict): - continue - - meta = raw.get("meta", {}) - if not isinstance(meta, dict): - continue - - if "duration" in meta: - profile_duration = parse_duration(meta["duration"]) - if profile_duration != fleet.meta.duration: - logger.warning( - "workload profile '%s' defines duration=%s " - "— overridden by fleet setting duration=%s", - wrel, profile_duration, fleet.meta.duration, - ) - - if "interval" in meta: - profile_interval = parse_duration(meta["interval"]) - if profile_interval != fleet.meta.interval: - logger.warning( - "workload profile '%s' defines interval=%s " - "— overridden by fleet setting interval=%s", - wrel, profile_interval, fleet.meta.interval, - ) - - host = raw.get("host", {}) - if isinstance(host, 
dict) and "profile" in host: - profile_hw = str(host["profile"]) - if profile_hw != fleet.meta.hardware: - logger.warning( - "workload profile '%s' defines hardware=%s " - "— overridden by fleet setting hardware=%s", - wrel, profile_hw, fleet.meta.hardware, - ) - - -def print_dry_run( - fleet: FleetProfile, - assignments: List[HostAssignment], - seed: Optional[int], -) -> None: - """Print host assignment table without generating archives.""" - seed_str = str(seed) if seed is not None else "none" - print("Fleet: {} ({} hosts, seed={})".format( - fleet.meta.name, len(assignments), seed_str, - )) - print() - for a in assignments: - role_label = "BAD " if a.role == "bad_actor" else "baseline " - print(" {} {} {} (jitter: x{:.2f})".format( - a.hostname, role_label, a.workload_rel, a.jitter_factor, - )) - - -def generate_fleet( - fleet: FleetProfile, - assignments: List[HostAssignment], - output_dir: Path, - seed: Optional[int], - jobs: int = 1, - force: bool = False, - start: Optional[datetime] = None, - verbose: bool = False, - config_dir: Optional[Path] = None, -) -> None: - """Generate one PCP archive per host, then write fleet.manifest.""" - output_dir = Path(output_dir) - output_dir.mkdir(parents=True, exist_ok=True) - - # Lazy import writer module (avoid PCP dependency at parse time) - _writer_mod = importlib.import_module("pmlogsynth.writer") - ArchiveWriter = _writer_mod.ArchiveWriter - - from pmlogsynth.jitter import apply_jitter - from pmlogsynth.profile import ProfileResolver, WorkloadProfile - from pmlogsynth.sampler import ValueSampler - from pmlogsynth.timeline import TimelineSequencer - - # Resolve hardware profile once (shared across all hosts) - resolver = ProfileResolver(config_dir=config_dir) - hardware = resolver.resolve(fleet.meta.hardware) - - # Check for override warnings (once, before generation loop) - check_override_warnings(fleet) - - def _generate_one(assignment: HostAssignment) -> None: - """Generate a single host archive.""" - profile_text = assignment.workload_path.read_text(encoding="utf-8") - profile = WorkloadProfile.from_string( - profile_text, config_dir=config_dir, - ) - - overridden_meta = replace( - profile.meta, - hostname=assignment.hostname, - duration=fleet.meta.duration, - interval=fleet.meta.interval, - ) - profile = replace(profile, meta=overridden_meta, hardware=hardware) - - profile = apply_jitter(profile, assignment.jitter_factor) - - timeline = TimelineSequencer(profile).expand(start_time=start) - sampler = ValueSampler(noise=profile.meta.noise) - - output_path = str(output_dir / assignment.hostname) - writer = ArchiveWriter( - output_path=output_path, - profile=profile, - hardware=hardware, - force=force, - ) - writer.write(timeline=timeline, sampler=sampler) - - if verbose: - print( - " generated: {} ({})".format( - assignment.hostname, assignment.role, - ), - file=sys.stderr, - ) - - # Generate archives — ThreadPoolExecutor for --jobs>1. 
- if jobs <= 1: - for assignment in assignments: - _generate_one(assignment) - else: - from concurrent.futures import ThreadPoolExecutor, as_completed - - with ThreadPoolExecutor(max_workers=jobs) as pool: - futures = { - pool.submit(_generate_one, a): a for a in assignments - } - for future in as_completed(futures): - future.result() - - # Write manifest - write_manifest( - output_dir / "fleet.manifest", fleet, assignments, seed=seed, - ) diff --git a/pmlogsynth/fleet/__init__.py b/pmlogsynth/fleet/__init__.py new file mode 100644 index 0000000..c5c6511 --- /dev/null +++ b/pmlogsynth/fleet/__init__.py @@ -0,0 +1,36 @@ +"""Fleet mode — multi-host archive generation from a single fleet profile. + +All public symbols are re-exported here for backwards compatibility. +Import from submodules directly for tighter coupling: + + from pmlogsynth.fleet.loader import load_fleet_profile + from pmlogsynth.fleet.assignment import assign_hosts +""" + +from pmlogsynth.fleet.assignment import assign_hosts +from pmlogsynth.fleet.display import print_dry_run +from pmlogsynth.fleet.loader import load_fleet_profile +from pmlogsynth.fleet.manifest import write_manifest +from pmlogsynth.fleet.models import ( + BadActorsConfig, + FleetMeta, + FleetProfile, + HostAssignment, + HostsConfig, +) +from pmlogsynth.fleet.orchestrator import generate_fleet +from pmlogsynth.fleet.warnings import check_override_warnings + +__all__ = [ + "assign_hosts", + "BadActorsConfig", + "check_override_warnings", + "FleetMeta", + "FleetProfile", + "generate_fleet", + "HostAssignment", + "HostsConfig", + "load_fleet_profile", + "print_dry_run", + "write_manifest", +] diff --git a/pmlogsynth/fleet/assignment.py b/pmlogsynth/fleet/assignment.py new file mode 100644 index 0000000..ccdd736 --- /dev/null +++ b/pmlogsynth/fleet/assignment.py @@ -0,0 +1,77 @@ +"""Host assignment — role selection, jitter factors, and stable seeding.""" + +import hashlib +import random +from typing import List, Optional + +from pmlogsynth.fleet.models import FleetProfile, HostAssignment + + +def _stable_host_seed(fleet_name: str, hostname: str, seed: int) -> int: + """Derive a deterministic per-host seed using SHA-256. + + Python's hash() is not stable across runs. SHA-256 gives us + repeatable results for any given (fleet_name, hostname, seed) tuple. + """ + digest = hashlib.sha256( + "{}:{}:{}".format(fleet_name, hostname, seed).encode() + ).hexdigest() + return int(digest[:16], 16) + + +def assign_hosts( + fleet: FleetProfile, + seed: Optional[int] = None, +) -> List[HostAssignment]: + """Assign hostnames, roles, and jitter factors to each host. + + Uses a seeded RNG for deterministic bad-actor selection and jitter + factor generation. If seed is None, a random seed is chosen. 
+ """ + if seed is None: + seed = random.randint(0, 2**32 - 1) + + rng = random.Random(seed) + count = fleet.hosts.count + pad_width = max(2, len(str(count))) + + # Pick which host indices are bad actors + bad_actor_indices = set(rng.sample(range(count), fleet.bad_actors.count)) + + assignments: List[HostAssignment] = [] + for i in range(count): + hostname = "{}-{}".format( + fleet.meta.hostname_prefix, + str(i + 1).zfill(pad_width), + ) + is_bad = i in bad_actor_indices + + if is_bad: + role = "bad_actor" + jitter_stddev = fleet.bad_actors.jitter + # Pick a profile from the bad-actor pool + profile_idx = rng.randrange(len(fleet.bad_actors.profiles)) + workload_path = fleet.bad_actors.profile_paths[profile_idx] + workload_rel = fleet.bad_actors.profiles[profile_idx] + else: + role = "baseline" + jitter_stddev = fleet.hosts.jitter + workload_path = fleet.hosts.baseline_path + workload_rel = fleet.hosts.baseline + + # Generate a stable, deterministic jitter factor per host + host_seed = _stable_host_seed(fleet.meta.name, hostname, seed) + host_rng = random.Random(host_seed) + jitter_factor = host_rng.gauss(1.0, jitter_stddev) + + assignments.append( + HostAssignment( + hostname=hostname, + role=role, + jitter_factor=jitter_factor, + workload_path=workload_path, + workload_rel=workload_rel, + ) + ) + + return assignments diff --git a/pmlogsynth/fleet/display.py b/pmlogsynth/fleet/display.py new file mode 100644 index 0000000..eece25f --- /dev/null +++ b/pmlogsynth/fleet/display.py @@ -0,0 +1,23 @@ +"""Dry-run display — print host assignment table without generating archives.""" + +from typing import List, Optional + +from pmlogsynth.fleet.models import FleetProfile, HostAssignment + + +def print_dry_run( + fleet: FleetProfile, + assignments: List[HostAssignment], + seed: Optional[int], +) -> None: + """Print host assignment table without generating archives.""" + seed_str = str(seed) if seed is not None else "none" + print("Fleet: {} ({} hosts, seed={})".format( + fleet.meta.name, len(assignments), seed_str, + )) + print() + for a in assignments: + role_label = "BAD " if a.role == "bad_actor" else "baseline " + print(" {} {} {} (jitter: x{:.2f})".format( + a.hostname, role_label, a.workload_rel, a.jitter_factor, + )) diff --git a/pmlogsynth/fleet/loader.py b/pmlogsynth/fleet/loader.py new file mode 100644 index 0000000..cc4dbaf --- /dev/null +++ b/pmlogsynth/fleet/loader.py @@ -0,0 +1,136 @@ +"""Fleet profile YAML parsing and validation.""" + +from pathlib import Path +from typing import Any, Dict + +import yaml + +from pmlogsynth.fleet.models import ( + BadActorsConfig, + FleetMeta, + FleetProfile, + HostsConfig, +) +from pmlogsynth.profile import ValidationError, parse_duration + + +def _parse_fleet_meta(raw: Dict[str, Any]) -> FleetMeta: + """Parse and validate the meta section of a fleet profile.""" + meta = raw.get("meta") + if not isinstance(meta, dict): + raise ValidationError("fleet profile missing 'meta' section") + + name = meta.get("name") + if not name: + raise ValidationError("fleet profile missing 'meta.name'") + + duration_raw = meta.get("duration") + if duration_raw is None: + raise ValidationError("fleet profile missing 'meta.duration'") + duration = parse_duration(duration_raw) + + interval_raw = meta.get("interval") + if interval_raw is None: + raise ValidationError("fleet profile missing 'meta.interval'") + interval = parse_duration(interval_raw) + + hostname_prefix = meta.get("hostname_prefix") + if not hostname_prefix: + raise ValidationError("fleet profile missing 
'meta.hostname_prefix'") + + hardware = meta.get("hardware") + if not hardware: + raise ValidationError("fleet profile missing 'meta.hardware'") + + return FleetMeta( + name=str(name), + duration=duration, + interval=interval, + hostname_prefix=str(hostname_prefix), + hardware=str(hardware), + ) + + +def _parse_hosts(raw: Dict[str, Any], fleet_dir: Path) -> HostsConfig: + """Parse and validate the hosts section of a fleet profile.""" + hosts = raw.get("hosts") + if not isinstance(hosts, dict): + raise ValidationError("fleet profile missing 'hosts' section") + + count = hosts.get("count") + if not isinstance(count, int) or count < 1: + raise ValidationError("hosts.count must be a positive integer") + + baseline = hosts.get("baseline") + if not baseline: + raise ValidationError("hosts.baseline is required") + + jitter = float(hosts.get("jitter", 0.0)) + baseline_path = fleet_dir / str(baseline) + + return HostsConfig( + count=count, + baseline=str(baseline), + baseline_path=baseline_path, + jitter=jitter, + ) + + +def _parse_bad_actors( + raw: Dict[str, Any], + hosts_config: HostsConfig, + fleet_dir: Path, +) -> BadActorsConfig: + """Parse and validate the bad_actors section of a fleet profile.""" + section = raw.get("bad_actors") + if section is None: + return BadActorsConfig() + + if not isinstance(section, dict): + raise ValidationError("bad_actors must be a mapping") + + count = int(section.get("count", 0)) + if count > hosts_config.count: + raise ValidationError( + "bad_actors.count ({}) exceeds hosts.count ({})".format( + count, hosts_config.count + ) + ) + + # Default bad_actors jitter to hosts jitter if not specified + jitter_raw = section.get("jitter") + if jitter_raw is not None: + jitter = float(jitter_raw) + else: + jitter = hosts_config.jitter + + profiles_raw = section.get("profiles", []) + profiles = [str(p) for p in profiles_raw] + profile_paths = [fleet_dir / p for p in profiles] + + return BadActorsConfig( + count=count, + jitter=jitter, + profiles=profiles, + profile_paths=profile_paths, + ) + + +def load_fleet_profile(path: Path) -> FleetProfile: + """Load and validate a fleet profile YAML file. + + Workload paths (baseline, bad-actor profiles) are resolved relative + to the directory containing the fleet YAML file. 
+ """ + text = path.read_text() + raw = yaml.safe_load(text) + if not isinstance(raw, dict): + raise ValidationError("fleet profile must be a YAML mapping") + + fleet_dir = path.parent + + meta = _parse_fleet_meta(raw) + hosts = _parse_hosts(raw, fleet_dir) + bad_actors = _parse_bad_actors(raw, hosts, fleet_dir) + + return FleetProfile(meta=meta, hosts=hosts, bad_actors=bad_actors) diff --git a/pmlogsynth/fleet/manifest.py b/pmlogsynth/fleet/manifest.py new file mode 100644 index 0000000..94f729e --- /dev/null +++ b/pmlogsynth/fleet/manifest.py @@ -0,0 +1,46 @@ +"""Fleet manifest writer — records host assignments to YAML.""" + +from datetime import datetime, timezone +from pathlib import Path +from typing import List, Optional + +import yaml + +from pmlogsynth.fleet.models import FleetProfile, HostAssignment + + +def write_manifest( + path: Path, + fleet: FleetProfile, + assignments: List[HostAssignment], + seed: Optional[int], +) -> None: + """Write fleet.manifest YAML file recording all host assignments.""" + now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ") + + manifest = { + "meta": { + "name": fleet.meta.name, + "generated": now, + "pmlogsynth_version": "1.0", + "seed": seed, + "duration": fleet.meta.duration, + "interval": fleet.meta.interval, + "hardware": fleet.meta.hardware, + "host_count": len(assignments), + }, + "archives": [ + { + "hostname": a.hostname, + "profile": a.workload_rel, + "role": a.role, + "jitter_factor": round(a.jitter_factor, 6), + } + for a in assignments + ], + } + + path.write_text( + yaml.dump(manifest, default_flow_style=False, sort_keys=False), + encoding="utf-8", + ) diff --git a/pmlogsynth/fleet/models.py b/pmlogsynth/fleet/models.py new file mode 100644 index 0000000..a611a63 --- /dev/null +++ b/pmlogsynth/fleet/models.py @@ -0,0 +1,56 @@ +"""Fleet data models — pure dataclasses, no logic.""" + +from dataclasses import dataclass, field +from pathlib import Path +from typing import List + + +@dataclass +class FleetMeta: + """Top-level fleet metadata.""" + + name: str + duration: int + interval: int + hostname_prefix: str + hardware: str + + +@dataclass +class HostsConfig: + """Baseline host configuration.""" + + count: int + baseline: str + baseline_path: Path + jitter: float = 0.0 + + +@dataclass +class BadActorsConfig: + """Bad-actor host configuration.""" + + count: int = 0 + jitter: float = 0.0 + profiles: List[str] = field(default_factory=list) + profile_paths: List[Path] = field(default_factory=list) + + +@dataclass +class FleetProfile: + """Parsed fleet profile — the full fleet specification.""" + + meta: FleetMeta + hosts: HostsConfig + bad_actors: BadActorsConfig + + +@dataclass +class HostAssignment: + """One host's role, jitter factor, and workload path.""" + + hostname: str + role: str # "baseline" or "bad_actor" + jitter_factor: float + workload_path: Path + workload_rel: str = "" diff --git a/pmlogsynth/fleet/orchestrator.py b/pmlogsynth/fleet/orchestrator.py new file mode 100644 index 0000000..fffa9fc --- /dev/null +++ b/pmlogsynth/fleet/orchestrator.py @@ -0,0 +1,101 @@ +"""Fleet archive generation orchestrator.""" + +import importlib +import sys +from datetime import datetime +from pathlib import Path +from typing import List, Optional + +from pmlogsynth.fleet.manifest import write_manifest +from pmlogsynth.fleet.models import FleetProfile, HostAssignment +from pmlogsynth.fleet.warnings import check_override_warnings + + +def generate_fleet( + fleet: FleetProfile, + assignments: List[HostAssignment], + output_dir: Path, + 
seed: Optional[int], + jobs: int = 1, + force: bool = False, + start: Optional[datetime] = None, + verbose: bool = False, + config_dir: Optional[Path] = None, +) -> None: + """Generate one PCP archive per host, then write fleet.manifest.""" + output_dir = Path(output_dir) + output_dir.mkdir(parents=True, exist_ok=True) + + # Lazy import writer module (avoid PCP dependency at parse time) + _writer_mod = importlib.import_module("pmlogsynth.writer") + ArchiveWriter = _writer_mod.ArchiveWriter + + from dataclasses import replace + + from pmlogsynth.jitter import apply_jitter + from pmlogsynth.profile import ProfileResolver, WorkloadProfile + from pmlogsynth.sampler import ValueSampler + from pmlogsynth.timeline import TimelineSequencer + + # Resolve hardware profile once (shared across all hosts) + resolver = ProfileResolver(config_dir=config_dir) + hardware = resolver.resolve(fleet.meta.hardware) + + # Check for override warnings (once, before generation loop) + check_override_warnings(fleet) + + def _generate_one(assignment: HostAssignment) -> None: + """Generate a single host archive.""" + profile_text = assignment.workload_path.read_text(encoding="utf-8") + profile = WorkloadProfile.from_string( + profile_text, config_dir=config_dir, + ) + + overridden_meta = replace( + profile.meta, + hostname=assignment.hostname, + duration=fleet.meta.duration, + interval=fleet.meta.interval, + ) + profile = replace(profile, meta=overridden_meta, hardware=hardware) + + profile = apply_jitter(profile, assignment.jitter_factor) + + timeline = TimelineSequencer(profile).expand(start_time=start) + sampler = ValueSampler(noise=profile.meta.noise) + + output_path = str(output_dir / assignment.hostname) + writer = ArchiveWriter( + output_path=output_path, + profile=profile, + hardware=hardware, + force=force, + ) + writer.write(timeline=timeline, sampler=sampler) + + if verbose: + print( + " generated: {} ({})".format( + assignment.hostname, assignment.role, + ), + file=sys.stderr, + ) + + # Generate archives — ThreadPoolExecutor for --jobs>1. 
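+    # --jobs=1 runs the hosts serially; otherwise one task per host goes to a
+    # thread pool, and future.result() under as_completed() re-raises the
+    # first worker exception here, so fleet.manifest below is only written
+    # after every host archive has been generated without error.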
+ if jobs <= 1: + for assignment in assignments: + _generate_one(assignment) + else: + from concurrent.futures import ThreadPoolExecutor, as_completed + + with ThreadPoolExecutor(max_workers=jobs) as pool: + futures = { + pool.submit(_generate_one, a): a for a in assignments + } + for future in as_completed(futures): + future.result() + + # Write manifest + write_manifest( + output_dir / "fleet.manifest", fleet, assignments, seed=seed, + ) diff --git a/pmlogsynth/fleet/warnings.py b/pmlogsynth/fleet/warnings.py new file mode 100644 index 0000000..ae3d2b1 --- /dev/null +++ b/pmlogsynth/fleet/warnings.py @@ -0,0 +1,66 @@ +"""Override warning checks for fleet vs workload profile conflicts.""" + +import logging + +import yaml + +from pmlogsynth.fleet.models import FleetProfile +from pmlogsynth.profile import parse_duration + +logger = logging.getLogger(__name__) + + +def check_override_warnings(fleet: FleetProfile) -> None: + """Emit warnings for workload profile values that fleet settings override.""" + seen = set() + + all_paths = [fleet.hosts.baseline_path] + all_rels = [fleet.hosts.baseline] + for idx in range(len(fleet.bad_actors.profiles)): + all_paths.append(fleet.bad_actors.profile_paths[idx]) + all_rels.append(fleet.bad_actors.profiles[idx]) + + for wpath, wrel in zip(all_paths, all_rels): + if wpath in seen: + continue + seen.add(wpath) + + try: + raw = yaml.safe_load(wpath.read_text(encoding="utf-8")) + except (OSError, yaml.YAMLError): + continue + + if not isinstance(raw, dict): + continue + + meta = raw.get("meta", {}) + if not isinstance(meta, dict): + continue + + if "duration" in meta: + profile_duration = parse_duration(meta["duration"]) + if profile_duration != fleet.meta.duration: + logger.warning( + "workload profile '%s' defines duration=%s " + "— overridden by fleet setting duration=%s", + wrel, profile_duration, fleet.meta.duration, + ) + + if "interval" in meta: + profile_interval = parse_duration(meta["interval"]) + if profile_interval != fleet.meta.interval: + logger.warning( + "workload profile '%s' defines interval=%s " + "— overridden by fleet setting interval=%s", + wrel, profile_interval, fleet.meta.interval, + ) + + host = raw.get("host", {}) + if isinstance(host, dict) and "profile" in host: + profile_hw = str(host["profile"]) + if profile_hw != fleet.meta.hardware: + logger.warning( + "workload profile '%s' defines hardware=%s " + "— overridden by fleet setting hardware=%s", + wrel, profile_hw, fleet.meta.hardware, + ) diff --git a/tests/integration/test_fleet_integration.py b/tests/integration/test_fleet_integration.py index 903d35d..52105b0 100644 --- a/tests/integration/test_fleet_integration.py +++ b/tests/integration/test_fleet_integration.py @@ -9,7 +9,7 @@ class TestGenerateFleet: """Tests for the fleet generation orchestrator.""" - @patch("pmlogsynth.fleet.importlib.import_module") + @patch("pmlogsynth.fleet.orchestrator.importlib.import_module") def test_generates_correct_number_of_archives( self, mock_import: MagicMock, tmp_path: Path ) -> None: @@ -39,7 +39,7 @@ def test_generates_correct_number_of_archives( assert mock_writer_cls.call_count == 5 - @patch("pmlogsynth.fleet.importlib.import_module") + @patch("pmlogsynth.fleet.orchestrator.importlib.import_module") def test_manifest_written_after_generation( self, mock_import: MagicMock, tmp_path: Path ) -> None: @@ -68,7 +68,7 @@ def test_manifest_written_after_generation( assert (tmp_path / "fleet.manifest").exists() - @patch("pmlogsynth.fleet.importlib.import_module") + 
@patch("pmlogsynth.fleet.orchestrator.importlib.import_module") def test_fleet_overrides_applied_to_profiles( self, mock_import: MagicMock, tmp_path: Path ) -> None: @@ -103,7 +103,7 @@ def test_fleet_overrides_applied_to_profiles( assert profile.meta.duration == fleet.meta.duration assert profile.meta.interval == fleet.meta.interval - @patch("pmlogsynth.fleet.importlib.import_module") + @patch("pmlogsynth.fleet.orchestrator.importlib.import_module") def test_output_directory_created( self, mock_import: MagicMock, tmp_path: Path ) -> None: @@ -133,7 +133,7 @@ def test_output_directory_created( assert out.exists() - @patch("pmlogsynth.fleet.importlib.import_module") + @patch("pmlogsynth.fleet.orchestrator.importlib.import_module") def test_parallel_jobs_generates_all_archives( self, mock_import: MagicMock, tmp_path: Path ) -> None: From 04be796a80ec13c6551ba6645f7b4efd8e2880e8 Mon Sep 17 00:00:00 2001 From: Paul Smith Date: Fri, 20 Mar 2026 10:37:31 +0000 Subject: [PATCH 12/23] Add Claude Code skills for profile and fleet generation Replace old /generate-profile slash command with proper SKILL.md skills that bundle the full schema as reference context. New fleet skill enables natural-language generation of multi-host fleet profiles with bad actors. --- .claude/commands/generate-profile.md | 85 ---- .../skills/generate-fleet-profile/SKILL.md | 163 ++++++++ .../references/fleet-schema.md | 194 +++++++++ .../references/workload-profile-schema.md | 395 ++++++++++++++++++ .claude/skills/generate-profile/SKILL.md | 107 +++++ .../references/profile-schema.md | 395 ++++++++++++++++++ README.md | 26 +- 7 files changed, 1274 insertions(+), 91 deletions(-) delete mode 100644 .claude/commands/generate-profile.md create mode 100644 .claude/skills/generate-fleet-profile/SKILL.md create mode 100644 .claude/skills/generate-fleet-profile/references/fleet-schema.md create mode 100644 .claude/skills/generate-fleet-profile/references/workload-profile-schema.md create mode 100644 .claude/skills/generate-profile/SKILL.md create mode 100644 .claude/skills/generate-profile/references/profile-schema.md diff --git a/.claude/commands/generate-profile.md b/.claude/commands/generate-profile.md deleted file mode 100644 index b8577d3..0000000 --- a/.claude/commands/generate-profile.md +++ /dev/null @@ -1,85 +0,0 @@ -Generate a valid pmlogsynth YAML workload profile from a natural language description, -then validate it with `pmlogsynth --validate`. - -## Arguments - -`$ARGUMENTS` — workload description (optional; prompted if absent) - -## Step 1 — Acquire Schema Context - -Run the following command and capture its stdout as the schema context: - -```bash -pmlogsynth --show-schema -``` - -If this command fails (exit code ≠ 0), report the error and stop. Do not proceed. - -## Step 2 — Acquire Workload Description - -- If `$ARGUMENTS` is non-empty, use it as the workload description. -- Otherwise, ask the user: "Describe the workload you want to simulate." - -## Step 3 — (Optional) Acquire Existing Profile for Iterative Refinement - -If the user's description references an existing profile (e.g., "update my-workload.yaml -to add a network spike"), ask: "Provide the path to the existing profile, or press Enter -to generate from scratch." - -If a path is provided, read the file contents and include the existing YAML in the -generation prompt alongside the modification request. - -## Step 4 — Generate Profile - -Using the current Claude session, generate a valid pmlogsynth YAML profile. 
- -The generation prompt MUST include: -1. The full schema context (from Step 1) as reference -2. The workload description (from Step 2) -3. The existing profile content (from Step 3, if provided) -4. Instruction to produce ONLY the YAML profile, no explanation or markdown fences -5. Instruction that the sum of all phase durations MUST equal `meta.duration`, unless - a phase uses `repeat` - -## Step 5 — Save Profile - -Create the `generated-archives/` directory if it does not exist. - -Slugify the workload description to create a filename: -- Lowercase, replace spaces and special characters with hyphens -- Trim to a reasonable length (≤ 50 characters) -- Append `.yaml` - -Example: "10-minute CPU spike at 90%" → `generated-archives/10-minute-cpu-spike-at-90.yaml` - -If the target file already exists, append a numeric suffix (`_1`, `_2`, etc.) before `.yaml`. - -Write the generated YAML to the chosen path. - -## Step 6 — Validate Profile - -```bash -pmlogsynth --validate generated-archives/.yaml -``` - -**If exit code 0**: proceed to Step 7. - -**If exit code 1** (validation error): capture the stderr output and feed the error -back to the AI with: "The profile failed validation with this error: ``. Please -correct the profile." Then retry Steps 4–6 once. If validation fails again, report both -the error message and the generated profile YAML to the user and stop. - -**If exit code 2** (I/O error): report the error and stop. - -## Step 7 — Report Success - -Report to the user: -- Profile saved to: `` -- Generate the archive: - ```bash - pmlogsynth -o ./generated-archives/ - ``` -- Inspect the archive: - ```bash - pmstat -a ./generated-archives/ - ``` diff --git a/.claude/skills/generate-fleet-profile/SKILL.md b/.claude/skills/generate-fleet-profile/SKILL.md new file mode 100644 index 0000000..d3d9424 --- /dev/null +++ b/.claude/skills/generate-fleet-profile/SKILL.md @@ -0,0 +1,163 @@ +--- +name: generate-fleet-profile +description: > + Generate a valid pmlogsynth fleet profile YAML from a natural language description of a + multi-host environment, then validate it with `pmlogsynth fleet --validate`. Use this + skill whenever the user wants to simulate multiple hosts, a server fleet, a cluster, a + farm, or any multi-machine scenario. Trigger when users mention things like "fleet of + web servers", "cluster with some bad hosts", "20 hosts with a few showing CPU problems", + "generate archives for a server farm", "simulate a production fleet", or "I need + multiple PCP archives with some anomalies". A fleet profile is different from a regular + workload profile — it orchestrates multiple hosts sharing a common hardware profile, + with baseline hosts and optional bad-actor hosts running different workload profiles. + If the user only wants a single host, use the generate-profile skill instead. +--- + +# Generate pmlogsynth Fleet Profile + +Generate a fleet profile that produces multiple PCP archives — one per simulated host — +from a single YAML file. Fleet profiles describe a pool of hosts sharing common hardware, +with a majority running a baseline workload and an optional minority running as "bad +actors" with different workload profiles. + +## Key Concepts + +A fleet profile is **not** a workload profile. 
It's a higher-level orchestrator: + +- **All hosts share one hardware profile** (from `meta.hardware`) +- **Baseline hosts** all run the same workload profile, with per-host jitter for variation +- **Bad-actor hosts** are randomly selected from the pool and assigned workload profiles + from a separate list (e.g. CPU-saturated, memory-exhausted scenarios) +- **Jitter** adds Gaussian noise (±N%) to all stressor values per host, so no two hosts + are identical even if they share the same workload +- **Fleet-level `duration` and `interval`** override whatever the individual workload + profiles specify + +## Step 1 — Read the Schema References + +Read both reference files (relative to this skill's directory): + +1. `references/fleet-schema.md` — the fleet profile format, fields, and validation rules +2. `references/workload-profile-schema.md` — the workload profile format (for generating + the baseline and bad-actor workload profiles the fleet references) + +## Step 2 — Understand the Fleet Scenario + +If the user provided a description, use it. Otherwise ask: + +**"Describe the fleet you want to simulate. For example: how many hosts, what kind of +workload (web, database, batch), and what problems should some hosts exhibit?"** + +Key details to extract: +- **Total host count** — how many servers in the fleet +- **Workload character** — what the baseline hosts are doing (web serving, DB queries, etc.) +- **Bad actors** — how many, and what's wrong with them (CPU saturation, memory pressure, + disk thrashing, network degradation) +- **Duration** — how long the simulation runs +- **Hardware class** — what size machines (`generic-small` through `storage-optimized`) + +## Step 3 — Generate the Files + +A fleet profile references external workload profile files. You need to generate **all** +of these files: + +### 3a. Generate Workload Profiles + +Create the baseline and bad-actor workload profile YAML files. These are standard +pmlogsynth workload profiles (same format as the generate-profile skill produces). + +Save them to `generated-archives/` alongside the fleet profile. For example: +- `generated-archives/fleet-baseline.yaml` — the healthy workload +- `generated-archives/fleet-bad-cpu.yaml` — a CPU-saturated workload +- `generated-archives/fleet-bad-memory.yaml` — a memory-exhausted workload + +The workload profiles should be complete and valid on their own. The fleet will override +their `duration`, `interval`, `hostname`, and `hardware` settings — but the profiles +still need valid values for standalone validation. + +**Important:** Workload profile paths in the fleet YAML are resolved relative to the +fleet profile file's directory. If the fleet profile and workload profiles are in the +same directory, use just the filename (e.g. `baseline: fleet-baseline.yaml`). + +### 3b. Generate the Fleet Profile + +Produce the fleet profile YAML. Follow the format in `references/fleet-schema.md` exactly. + +Rules: +1. **Output raw YAML only** — no markdown fences, no prose +2. **`meta` is required** — must include `name`, `duration`, `interval`, + `hostname_prefix`, and `hardware` +3. **`hosts` is required** — must include `count` and `baseline` (path to workload file) +4. **`bad_actors` is optional** — include `count`, `profiles` list, and optionally `jitter` +5. **`bad_actors.count` must not exceed `hosts.count`** +6. **Use readable duration strings** (`10m`, `1h`, `24h`) for `duration` and `interval` +7. **Add jitter** (typically 0.03–0.10) for realistic per-host variation +8. 
**Include comments** explaining the fleet scenario + +### Realistic Fleet Patterns + +| Scenario | Hosts | Bad Actors | Typical Jitter | +|----------|-------|------------|----------------| +| Small dev cluster | 3–5 | 0–1 | 0.02–0.05 | +| Web tier | 10–50 | 1–3 | 0.03–0.08 | +| Database cluster | 3–10 | 1–2 | 0.02–0.05 | +| Large production fleet | 50–200 | 2–10 | 0.05–0.10 | + +### Named Fault Patterns for Bad Actors + +When the user describes problems, translate them into workload profiles: + +| Fault | Key Characteristics | +|-------|---------------------| +| CPU saturation | `utilization: 0.94–0.98`, high `user_ratio`, elevated `iowait_ratio` | +| Memory pressure | `used_ratio: 0.88–0.95`, very low `cache_ratio: 0.02–0.05` | +| Disk thrashing | High `read_mbps`/`write_mbps`, high `iops_*`, elevated `iowait_ratio` | +| Network degradation | Low `rx_mbps`/`tx_mbps` relative to interface capacity | +| Noisy neighbour | High CPU with elevated `sys_ratio` (virtualisation overhead) | +| Slow drain | Gradual increase across all metrics over multiple phases | + +## Step 4 — Save All Files + +1. Ensure `generated-archives/` exists +2. Save workload profile(s) first +3. Save the fleet profile with a descriptive slugified name: + - Example: "20-host web cluster with CPU problems" → + `generated-archives/20-host-web-cluster-fleet.yaml` + +## Step 5 — Validate + +Validate the workload profiles first (they must parse independently), then the fleet: + +```bash +# Validate individual workload profiles +pmlogsynth --validate generated-archives/fleet-baseline.yaml +pmlogsynth --validate generated-archives/fleet-bad-cpu.yaml + +# Validate the fleet profile +pmlogsynth fleet --validate generated-archives/20-host-web-cluster-fleet.yaml +``` + +- **Exit 0**: Valid. Proceed. +- **Exit 1**: Fix the error and retry once. +- **Exit 2**: I/O error — report and stop. + +## Step 6 — Report + +Tell the user: +- All files saved and their paths +- How to preview the fleet assignment: + ```bash + pmlogsynth fleet --dry-run generated-archives/.yaml + ``` +- How to generate the archives: + ```bash + pmlogsynth fleet -o ./generated-archives/ generated-archives/.yaml + ``` +- How to generate with reproducible host assignment: + ```bash + pmlogsynth fleet --seed 42 -o ./generated-archives/ generated-archives/.yaml + ``` +- How to inspect individual archives: + ```bash + pmstat -a ./generated-archives// + ``` diff --git a/.claude/skills/generate-fleet-profile/references/fleet-schema.md b/.claude/skills/generate-fleet-profile/references/fleet-schema.md new file mode 100644 index 0000000..7a54ce4 --- /dev/null +++ b/.claude/skills/generate-fleet-profile/references/fleet-schema.md @@ -0,0 +1,194 @@ +# pmlogsynth Fleet Profile Schema + +A fleet profile generates multiple PCP archives — one per simulated host — from a single +YAML file. It is a different document type from a workload profile. + +--- + +## Top-Level Structure + +A fleet profile has three top-level sections: + +```yaml +meta: # Fleet-wide settings (required) +hosts: # Baseline host pool (required) +bad_actors: # Anomalous host pool (optional) +``` + +--- + +## meta + +Fleet-wide metadata. All fields are required. + +| Field | Type | Description | +|-------|------|-------------| +| `name` | string | Fleet identifier. Used in output directory naming and the manifest. | +| `duration` | int or string | Archive duration for ALL hosts. Overrides workload profile durations. Integer = seconds, or strings: `'10m'`, `'1h'`, `'24h'`, `'7d'`. 
| +| `interval` | int or string | Sampling interval for ALL hosts. Overrides workload profile intervals. | +| `hostname_prefix` | string | Prefix for generated hostnames. Hosts are named `-01`, `-02`, etc. | +| `hardware` | string | Hardware profile name (e.g. `generic-large`). Applied to ALL hosts, overriding workload profile hardware. | + +### Example + +```yaml +meta: + name: prod-web-cluster + duration: 24h + interval: 60 + hostname_prefix: web + hardware: generic-large +``` + +This produces hostnames `web-01`, `web-02`, ..., `web-NN`. + +--- + +## hosts + +Baseline host pool configuration. Required. + +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `count` | int | — | **Required.** Total number of hosts in the fleet (includes bad actors). Must be ≥ 1. | +| `baseline` | string | — | **Required.** Path to the baseline workload profile YAML file. Resolved relative to the fleet profile file's directory. | +| `jitter` | float | `0.0` | Standard deviation for per-host Gaussian jitter. Applied multiplicatively to all stressor values. Typical range: 0.02–0.10. | + +### Example + +```yaml +hosts: + count: 20 + baseline: fleet-baseline.yaml + jitter: 0.05 +``` + +--- + +## bad_actors + +Optional section defining hosts that deviate from the baseline. + +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `count` | int | `0` | Number of bad-actor hosts. Must not exceed `hosts.count`. | +| `jitter` | float | inherits `hosts.jitter` | Jitter for bad-actor hosts. | +| `profiles` | list of strings | `[]` | Paths to workload profile YAML files for bad actors. Resolved relative to the fleet profile file. Bad actors are randomly assigned a profile from this list. | + +### Example + +```yaml +bad_actors: + count: 2 + jitter: 0.03 + profiles: + - fleet-bad-cpu.yaml + - fleet-bad-memory.yaml +``` + +--- + +## Complete Example + +```yaml +# Fleet: 20-host web cluster with 2 bad actors +meta: + name: web-cluster + duration: 24h + interval: 60 + hostname_prefix: web + hardware: generic-large + +hosts: + count: 20 + baseline: fleet-baseline.yaml + jitter: 0.05 + +bad_actors: + count: 2 + jitter: 0.03 + profiles: + - fleet-bad-cpu.yaml + - fleet-bad-memory.yaml +``` + +This generates 20 archives: +- 18 baseline hosts (`web-01` through `web-20`, minus 2 randomly selected bad actors) +- 2 bad-actor hosts (randomly selected from the pool, each assigned a profile from the + `profiles` list) + +All hosts share the `generic-large` hardware profile, a 24h duration, and 60s interval. + +--- + +## How Host Assignment Works + +1. All hosts are numbered `-01` through `-NN` +2. `bad_actors.count` hosts are randomly selected from the pool +3. Each bad actor is randomly assigned a profile from the `profiles` list +4. Remaining hosts use the `baseline` workload profile +5. Per-host jitter is applied multiplicatively to all stressor values: + `effective_value = base_value × Normal(mean=1.0, stddev=jitter)` +6. 
Use `--seed` for deterministic, reproducible assignment + +--- + +## How Fleet Overrides Work + +The fleet profile overrides several settings from individual workload profiles: + +| Fleet setting | Overrides workload field | Notes | +|---------------|--------------------------|-------| +| `meta.duration` | `meta.duration` in workload | All hosts get the same duration | +| `meta.interval` | `meta.interval` in workload | All hosts get the same interval | +| `meta.hardware` | `host.profile` in workload | All hosts get the same hardware | +| `meta.hostname_prefix` + index | `meta.hostname` in workload | Each host gets a unique name | + +Warnings are emitted when the fleet overrides differ from the workload profile values. + +--- + +## Validation Rules + +- `meta` must include all five fields: `name`, `duration`, `interval`, `hostname_prefix`, `hardware` +- `hosts.count` must be a positive integer +- `hosts.baseline` must be a valid path to a workload profile +- `bad_actors.count` must not exceed `hosts.count` +- `bad_actors.profiles` must be a non-empty list when `bad_actors.count > 0` +- All referenced workload profiles must be valid pmlogsynth workload profiles +- `hardware` must be a valid hardware profile name + +--- + +## Available Hardware Profiles + +| Name | Description | +|------|-------------| +| `generic-small` | Small VM (2 CPUs, ~4 GB RAM) | +| `generic-medium` | Mid-range server (8 CPUs, ~32 GB RAM) | +| `generic-large` | Production server (16 CPUs, ~64 GB RAM) | +| `generic-xlarge` | High-end server (32 CPUs, ~128 GB RAM) | +| `compute-optimized` | CPU-heavy workloads (64 CPUs, ~32 GB RAM) | +| `memory-optimized` | Memory-heavy workloads (16 CPUs, ~512 GB RAM) | +| `storage-optimized` | Disk I/O-heavy (8 CPUs, ~64 GB RAM, multiple NVMe) | + +--- + +## CLI Commands + +```bash +# Validate the fleet profile (and referenced workload profiles) +pmlogsynth fleet --validate fleet-profile.yaml + +# Preview host assignments without generating archives +pmlogsynth fleet --dry-run fleet-profile.yaml + +# Generate all archives +pmlogsynth fleet -o ./generated-archives/my-fleet fleet-profile.yaml + +# Reproducible generation (same seed = same host assignment + jitter) +pmlogsynth fleet --seed 42 -o ./generated-archives/my-fleet fleet-profile.yaml + +# Parallel generation (use multiple workers) +pmlogsynth fleet --jobs 4 -o ./generated-archives/my-fleet fleet-profile.yaml +``` diff --git a/.claude/skills/generate-fleet-profile/references/workload-profile-schema.md b/.claude/skills/generate-fleet-profile/references/workload-profile-schema.md new file mode 100644 index 0000000..cb4a497 --- /dev/null +++ b/.claude/skills/generate-fleet-profile/references/workload-profile-schema.md @@ -0,0 +1,395 @@ +# pmlogsynth Profile Schema + +Schema Version: 0.1.0 + +pmlogsynth generates synthetic PCP (Performance Co-Pilot) archives from declarative YAML +workload profiles. This document is the complete reference for generating valid profiles. + +--- + +## Overview + +A profile has three top-level keys: `meta`, `host`, and `phases`. All three are required. + +```yaml +meta: + duration: 1h # total archive length + hostname: my-host +host: + profile: generic-small +phases: + - name: baseline + duration: 1h + cpu: + utilization: 0.30 +``` + +--- + +## meta + +Global archive settings. All fields except `duration` are optional. + +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `duration` | int or string | — | **Required.** Total archive length. Integer = seconds. 
Strings: `'30s'`, `'10m'`, `'24h'`, `'1d'`, `'7d'`. Must be positive. | +| `hostname` | string | `synthetic-host` | Hostname written into the archive. | +| `timezone` | string | `UTC` | Timezone label (informational only). | +| `interval` | int or string | `60` | Seconds between samples. Accepts plain integer or duration string (`'60s'`, `'5m'`, `'1h'`). | +| `noise` | float | `0.0` | Global noise amplitude [0.0–1.0] applied to all metrics. | +| `mean_packet_bytes` | int | `1400` | Mean packet size for network byte calculations. | +| `start` | string | today 00:00:00 UTC | Archive start time. ISO 8601 (`2026-03-01T08:00:00Z`) **or** relative offset (`-90m`, `-2h`, `-1d`). | + +### meta validation rules + +- `duration` must be a positive integer (seconds) or a duration string (`'30s'`, `'10m'`, `'24h'`, `'1d'`). +- `interval` must be a positive integer. +- `noise` must be in the range [0.0, 1.0]. +- `start` must be an ISO 8601 timestamp or a relative offset (e.g. `-90m`, `-2h`, `-1d`). + +--- + +## host + +Specifies the hardware the workload runs on. Three forms are supported. + +### Form 1 — Named profile (recommended) + +```yaml +host: + profile: generic-small +``` + +### Form 2 — Named profile with overrides + +```yaml +host: + profile: generic-medium + overrides: + cpus: 32 + memory_kb: 65536000 +``` + +### Form 3 — Fully inline + +```yaml +host: + name: custom-server + cpus: 8 + memory_kb: 16777216 + disks: + - name: nvme0n1 + type: nvme + interfaces: + - name: eth0 + speed_mbps: 10000 +``` + +### host field reference + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `profile` | string | No | Name of a bundled or user hardware profile. | +| `overrides` | mapping | No | Override specific fields from the named profile. | +| `name` | string | No | Custom name for inline hardware. | +| `cpus` | int | Inline only | Number of CPUs. Required for inline form. | +| `memory_kb` | int | Inline only | Total RAM in kilobytes. Required for inline form. | +| `disks` | list | No | List of `{name: str, type: str}`. `type` is one of `nvme`, `ssd`, `hdd`. | +| `interfaces` | list | No | List of `{name: str, speed_mbps: int}`. | + +### host validation rules + +- **Cannot mix** `profile` with inline fields (`cpus`, `memory_kb`, etc.) without an `overrides:` key. +- Inline form requires **both** `cpus` and `memory_kb`. +- Each disk must have a `name` field. +- Each interface must have a `name` field. + +--- + +## Available Hardware Profiles + +Use one of these names for `host.profile`: + +| Name | Description | +|------|-------------| +| `generic-small` | Small VM / test environment (2 CPUs, ~4 GB RAM) | +| `generic-medium` | Mid-range server (8 CPUs, ~32 GB RAM) | +| `generic-large` | Production server (16 CPUs, ~64 GB RAM) | +| `generic-xlarge` | High-end server (32 CPUs, ~128 GB RAM) | +| `compute-optimized` | CPU-heavy workloads (64 CPUs, ~32 GB RAM) | +| `memory-optimized` | Memory-heavy workloads (16 CPUs, ~512 GB RAM) | +| `storage-optimized` | Disk I/O-heavy workloads (8 CPUs, ~64 GB RAM, multiple NVMe disks) | + +--- + +## phases + +An ordered list of workload phases. Each phase describes a period of the archive with +specific resource utilisation patterns. At least one phase is required. + +**Critical constraint**: The sum of all phase `duration` values **must equal** `meta.duration`, +unless a phase uses `repeat` (see below). 
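The arithmetic behind this constraint is easy to check mechanically. The sketch below is illustrative only — it ignores `repeat` and uses a simplified suffix table; pmlogsynth's own parsing is handled by `parse_duration` in `pmlogsynth.profile`:

```python
# Minimal sketch of the duration-sum check described above (illustration only;
# ignores `repeat` and is not pmlogsynth's actual validator).
UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

def to_seconds(value):
    """Accept an int (seconds) or a string like '30s', '10m', '24h', '1d'."""
    if isinstance(value, int):
        return value
    return int(value[:-1]) * UNITS[value[-1]]

def check_phase_durations(meta_duration, phases):
    """Raise if the phase durations do not sum exactly to meta.duration."""
    total = sum(to_seconds(p["duration"]) for p in phases)
    expected = to_seconds(meta_duration)
    if total != expected:
        raise ValueError(
            f"Sum of phase durations ({total}s) does not equal meta.duration ({expected}s)"
        )

# 8h + 10h + 6h = 24h, so this passes
check_phase_durations("24h", [{"duration": "8h"}, {"duration": "10h"}, {"duration": "6h"}])
```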
+ +### Phase fields + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `name` | string | Yes | Phase identifier. Used in validation error messages. | +| `duration` | int or string | Yes | Length of this phase. Same format as `meta.duration`. | +| `transition` | string | No | `instant` (default) or `linear`. `linear` interpolates from the previous phase's values. **Cannot be set on the first phase.** | +| `repeat` | string or int | No | `daily` repeats this phase every day for `meta.duration`. Only one phase allowed when `repeat: daily`. Integer repeat count is also supported. | +| `cpu` | mapping | No | CPU stressor (see below). | +| `memory` | mapping | No | Memory stressor (see below). | +| `disk` | mapping | No | Disk stressor (see below). | +| `network` | mapping | No | Network stressor (see below). | + +### cpu stressor + +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `utilization` | float [0.0–1.0] | 0.05 | Overall CPU utilisation fraction. | +| `user_ratio` | float | 0.70 | Fraction of CPU time in user space. | +| `sys_ratio` | float | 0.20 | Fraction of CPU time in kernel space. | +| `iowait_ratio` | float | 0.10 | Fraction of CPU time in I/O wait. | +| `noise` | float [0.0–1.0] | 0.0 | Per-sample noise for CPU metrics. | + +**Constraint**: `user_ratio + sys_ratio + iowait_ratio` must be ≤ 1.0. + +### memory stressor + +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `used_ratio` | float [0.0–1.0] | 0.50 | Fraction of total RAM used. | +| `cache_ratio` | float [0.0–1.0] | 0.20 | Fraction of RAM used as page cache. | +| `noise` | float [0.0–1.0] | 0.0 | Per-sample noise for memory metrics. | + +### disk stressor + +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `read_mbps` | float | 0.0 | Disk read throughput in MB/s. | +| `write_mbps` | float | 0.0 | Disk write throughput in MB/s. | +| `iops_read` | int | 0 | Read I/O operations per second. | +| `iops_write` | int | 0 | Write I/O operations per second. | +| `noise` | float [0.0–1.0] | 0.0 | Per-sample noise for disk metrics. | + +### network stressor + +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `rx_mbps` | float | 0.0 | Inbound throughput in MB/s. | +| `tx_mbps` | float | 0.0 | Outbound throughput in MB/s. | +| `noise` | float [0.0–1.0] | 0.0 | Per-sample noise for network metrics. 
| + +--- + +## Examples + +### Simple — 10-minute CPU spike + +```yaml +meta: + hostname: demo-host + timezone: UTC + duration: 10m + interval: 60 + +host: + profile: generic-small + +phases: + - name: baseline + duration: 5m + cpu: + utilization: 0.15 + + - name: spike + duration: 5m + cpu: + utilization: 0.90 +``` + +### Complex — 24-hour SaaS workload with diurnal pattern + +```yaml +meta: + hostname: saas-prod-01 + timezone: UTC + duration: 24h + interval: 60 + noise: 0.05 + start: "2026-03-01T00:00:00Z" + +host: + profile: generic-large + overrides: + cpus: 24 + +phases: + - name: overnight-quiet + duration: 8h + cpu: + utilization: 0.08 + user_ratio: 0.60 + sys_ratio: 0.20 + iowait_ratio: 0.10 + memory: + used_ratio: 0.40 + network: + rx_mbps: 5.0 + tx_mbps: 2.0 + + - name: business-hours-ramp + duration: 2h + transition: linear + cpu: + utilization: 0.55 + user_ratio: 0.65 + sys_ratio: 0.20 + iowait_ratio: 0.10 + memory: + used_ratio: 0.65 + + - name: business-hours-peak + duration: 8h + cpu: + utilization: 0.75 + user_ratio: 0.65 + sys_ratio: 0.20 + iowait_ratio: 0.10 + memory: + used_ratio: 0.72 + cache_ratio: 0.15 + disk: + read_mbps: 120.0 + write_mbps: 60.0 + iops_read: 4000 + iops_write: 2000 + network: + rx_mbps: 450.0 + tx_mbps: 380.0 + + - name: evening-wind-down + duration: 4h + transition: linear + cpu: + utilization: 0.25 + memory: + used_ratio: 0.50 + network: + rx_mbps: 80.0 + tx_mbps: 60.0 + + - name: late-night-batch + duration: 2h + cpu: + utilization: 0.40 + user_ratio: 0.30 + sys_ratio: 0.20 + iowait_ratio: 0.45 + disk: + read_mbps: 800.0 + write_mbps: 400.0 + iops_read: 20000 + iops_write: 10000 + memory: + used_ratio: 0.60 +``` + +Note: 8h + 2h + 8h + 4h + 2h = 24h = meta.duration ✓ + +--- + +## Common Validation Errors + +### Duration errors + +| Error message | Fix | +|---------------|-----| +| `Sum of phase durations (Xs) does not equal meta.duration (Ys) (FR-027)` | Adjust phase durations to sum to exactly `meta.duration`. | +| `invalid duration '...': use a positive integer or a string like '30s', '10m', '24h'` | Use seconds as an integer, or a string like `'1h'`, `'30m'`, `'600s'`, `'1d'`. | +| `meta.duration must be a positive integer or duration string` | Set `meta.duration` to a positive int or a string like `'24h'`, `'1d'`, `'7d'`. | +| `phases[N].duration must be a positive integer or duration string` | Fix the duration in the Nth phase. | + +### CPU errors + +| Error message | Fix | +|---------------|-----| +| `phases[N] (name): user_ratio + sys_ratio + iowait_ratio = X > 1.0 (FR-026)` | Reduce `user_ratio`, `sys_ratio`, or `iowait_ratio` so their sum is ≤ 1.0. | +| `cpu.noise must be in [0.0, 1.0]` | Set `cpu.noise` to a value between 0.0 and 1.0. | + +### Phase structure errors + +| Error message | Fix | +|---------------|-----| +| `phases[0]: first phase cannot use 'transition: linear' (FR-055)` | Remove `transition: linear` from the first phase (no prior phase to interpolate from). | +| `A phase with repeat:daily must be the only phase` | Remove all other phases when using `repeat: daily`. | +| `phases[N].transition must be 'instant' or 'linear'` | Use `instant` or `linear`, or omit the field entirely. | +| `phases must be a non-empty list` | Add at least one phase to the `phases` list. | +| `phases[N].name is required` | Add a `name` field to the Nth phase. 
| + +### Host errors + +| Error message | Fix | +|---------------|-----| +| `Hardware profile 'X' not found` | Use one of the 7 bundled profiles listed in the Available Hardware Profiles table above. | +| `Inline host specification requires at least 'cpus' and 'memory_kb'` | Add both `cpus` and `memory_kb` to the inline `host` block. | +| `host.profile and inline host fields cannot be mixed without an 'overrides:' key` | Either use `host.profile` alone, or add an `overrides:` key alongside `profile`. | + +### meta field errors + +| Error message | Fix | +|---------------|-----| +| `meta.interval must be a positive integer (FR-030)` | Set `meta.interval` to a positive integer (seconds). | +| `meta.noise must be in [0.0, 1.0]` | Set `meta.noise` to a float between 0.0 and 1.0. | +| `meta.start: cannot parse '...'` | Use ISO 8601 (`2026-03-01T08:00:00Z`) or a relative offset (`-90m`, `-2h`, `-1d`). | + +### YAML format errors + +| Error message | Fix | +|---------------|-----| +| `YAML parse error: ...` | The profile YAML is malformed or incomplete. Ensure the output is valid YAML with no truncation, no markdown code fences, and no prose mixed in. | +| `Profile must be a YAML mapping` | The profile must be a YAML mapping (key: value pairs at the top level), not a list or scalar. | + +### Semantic pitfalls (no error raised, but incorrect output) + +| Situation | Fix | +|-----------|-----| +| `meta.interval` (e.g. `3600`) is larger than `meta.duration` (e.g. `300`) | The archive will have zero or one sample. Set `interval` to a value smaller than `duration`. For a 5-minute archive use `interval: 60` (5 samples). | +| Phase durations don't sum to `meta.duration` by a few seconds | Adjust one phase duration. Common mistake: using `'1h'` phases in a `'24h'` archive where 24 × 1h = 24h but rounding errors creep in. Use exact integers: `3600` × 24 = `86400`. | + +--- + +## Duration arithmetic tips + +When writing multi-phase profiles, ensure your phase durations sum to `meta.duration`: + +``` +meta.duration: 24h = 86400s +phases: + - duration: 8h → 28800s + - duration: 10h → 36000s + - duration: 6h → 21600s + Total: 86400s ✓ +``` + +If using `repeat: daily`, omit all other phases and set `duration` to the length of the +repeating daily pattern (e.g., `86400` for a full 24h daily pattern, or less for a pattern +that repeats within each day). + +--- + +## Generating archives + +After generating a valid profile file: + +```bash +# Validate the profile +pmlogsynth --validate generated-archives/my-workload.yaml + +# Generate the PCP archive +pmlogsynth -o ./generated-archives/my-workload generated-archives/my-workload.yaml + +# Inspect the archive +pmstat -a ./generated-archives/my-workload +``` diff --git a/.claude/skills/generate-profile/SKILL.md b/.claude/skills/generate-profile/SKILL.md new file mode 100644 index 0000000..4ed87af --- /dev/null +++ b/.claude/skills/generate-profile/SKILL.md @@ -0,0 +1,107 @@ +--- +name: generate-profile +description: > + Generate a valid pmlogsynth YAML workload profile from a natural language description, + then validate it with `pmlogsynth --validate`. Use this skill whenever the user wants to + create a PCP archive workload profile, describes a server workload scenario they want to + simulate, mentions pmlogsynth profiles, or asks to create/modify synthetic performance + data for a single host. 
Also trigger when users say things like "simulate a CPU spike", + "create a workload that looks like a database server", "generate a profile for testing", + or "I need a PCP archive that shows memory pressure". If the user wants multiple hosts + or a fleet, use the generate-fleet-profile skill instead. +--- + +# Generate pmlogsynth Workload Profile + +Generate a valid pmlogsynth YAML workload profile from a natural language description. +The profile describes a single host's workload over time using phases with CPU, memory, +disk, and network stressors. + +## Step 1 — Get the Schema + +Read the reference file at `references/profile-schema.md` (relative to this skill's +directory). This contains the complete, authoritative schema for workload profiles — +every field, type, default, constraint, and common validation error. + +Do NOT run `pmlogsynth --show-schema` — the bundled reference is identical and faster. + +## Step 2 — Understand the Workload + +If the user provided a description (via arguments or conversation), use it directly. + +Otherwise, ask: **"Describe the workload you want to simulate."** + +Good descriptions include: +- What kind of server/service (web, database, batch, etc.) +- How long the simulation should run +- What the interesting patterns are (spikes, ramps, diurnal cycles, steady state) +- Which resources are under pressure (CPU, memory, disk, network, or all) + +## Step 3 — (Optional) Refine an Existing Profile + +If the user references an existing profile ("update my-workload.yaml to add a network +spike"), read that file and include its content alongside the modification request. + +## Step 4 — Generate the Profile + +Produce a valid YAML workload profile. Follow these rules strictly: + +1. **Output raw YAML only** — no markdown fences, no prose, no explanation mixed in +2. **Phase durations must sum to `meta.duration`** exactly (unless using `repeat`) +3. **First phase cannot use `transition: linear`** — there's no prior phase to interpolate from +4. **CPU ratios must sum to ≤ 1.0**: `user_ratio + sys_ratio + iowait_ratio` +5. **Use duration strings** for readability (`10m`, `1h`, `24h`) rather than raw seconds +6. **Pick a hardware profile** that fits the workload — see the schema reference for the + 7 bundled profiles (`generic-small` through `storage-optimized`) +7. **Add noise** (typically 0.02–0.05) to make the data look realistic, not robotic +8. **Use `transition: linear`** between phases with different load levels for smooth ramps +9. **Include comments** explaining what each phase represents (time of day, event, etc.) + +### Realistic Value Ranges + +These are typical ranges for production servers — use them as a sanity check: + +| Metric | Idle | Moderate | Heavy | Saturated | +|--------|------|----------|-------|-----------| +| CPU utilization | 0.05–0.15 | 0.30–0.60 | 0.70–0.85 | 0.90–0.98 | +| Memory used_ratio | 0.20–0.40 | 0.50–0.70 | 0.75–0.85 | 0.88–0.95 | +| Disk read MB/s | 1–20 | 50–200 | 300–600 | 800+ | +| Disk write MB/s | 1–10 | 20–100 | 150–400 | 500+ | +| Network rx MB/s | 1–20 | 50–200 | 300–600 | 800+ | +| Network tx MB/s | 1–10 | 20–100 | 100–300 | 500+ | + +## Step 5 — Save the Profile + +1. Ensure the `generated-archives/` directory exists +2. Slugify the workload description into a filename: + - Lowercase, replace spaces/special chars with hyphens + - Trim to ≤ 50 characters + - Append `.yaml` + - If file exists, append `_1`, `_2`, etc. 
before `.yaml` + +Example: "10-minute CPU spike at 90%" → `generated-archives/10-minute-cpu-spike-at-90.yaml` + +## Step 6 — Validate + +```bash +pmlogsynth --validate generated-archives/.yaml +``` + +- **Exit 0**: Profile is valid. Proceed to Step 7. +- **Exit 1** (validation error): Feed the error back into the generation, fix the YAML, + and retry validation. If it fails a second time, show both the error and the YAML to the + user and stop. +- **Exit 2** (I/O error): Report the error and stop. + +## Step 7 — Report + +Tell the user: +- Where the profile was saved +- How to generate the archive: + ```bash + pmlogsynth -o ./generated-archives/ generated-archives/.yaml + ``` +- How to inspect the archive: + ```bash + pmstat -a ./generated-archives/ + ``` diff --git a/.claude/skills/generate-profile/references/profile-schema.md b/.claude/skills/generate-profile/references/profile-schema.md new file mode 100644 index 0000000..cb4a497 --- /dev/null +++ b/.claude/skills/generate-profile/references/profile-schema.md @@ -0,0 +1,395 @@ +# pmlogsynth Profile Schema + +Schema Version: 0.1.0 + +pmlogsynth generates synthetic PCP (Performance Co-Pilot) archives from declarative YAML +workload profiles. This document is the complete reference for generating valid profiles. + +--- + +## Overview + +A profile has three top-level keys: `meta`, `host`, and `phases`. All three are required. + +```yaml +meta: + duration: 1h # total archive length + hostname: my-host +host: + profile: generic-small +phases: + - name: baseline + duration: 1h + cpu: + utilization: 0.30 +``` + +--- + +## meta + +Global archive settings. All fields except `duration` are optional. + +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `duration` | int or string | — | **Required.** Total archive length. Integer = seconds. Strings: `'30s'`, `'10m'`, `'24h'`, `'1d'`, `'7d'`. Must be positive. | +| `hostname` | string | `synthetic-host` | Hostname written into the archive. | +| `timezone` | string | `UTC` | Timezone label (informational only). | +| `interval` | int or string | `60` | Seconds between samples. Accepts plain integer or duration string (`'60s'`, `'5m'`, `'1h'`). | +| `noise` | float | `0.0` | Global noise amplitude [0.0–1.0] applied to all metrics. | +| `mean_packet_bytes` | int | `1400` | Mean packet size for network byte calculations. | +| `start` | string | today 00:00:00 UTC | Archive start time. ISO 8601 (`2026-03-01T08:00:00Z`) **or** relative offset (`-90m`, `-2h`, `-1d`). | + +### meta validation rules + +- `duration` must be a positive integer (seconds) or a duration string (`'30s'`, `'10m'`, `'24h'`, `'1d'`). +- `interval` must be a positive integer. +- `noise` must be in the range [0.0, 1.0]. +- `start` must be an ISO 8601 timestamp or a relative offset (e.g. `-90m`, `-2h`, `-1d`). + +--- + +## host + +Specifies the hardware the workload runs on. Three forms are supported. 
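The three forms are listed below. As a rough sketch of the decision logic a loader might apply (hypothetical code, not pmlogsynth's actual implementation — the authoritative constraints are the host validation rules further down):

```python
# Hypothetical classification of the three host forms; illustration only.
def classify_host(host: dict) -> str:
    inline_keys = {"name", "cpus", "memory_kb", "disks", "interfaces"}
    has_inline = bool(inline_keys & host.keys())
    if "profile" in host:
        if has_inline and "overrides" not in host:
            raise ValueError(
                "host.profile and inline host fields cannot be mixed "
                "without an 'overrides:' key"
            )
        return "named-with-overrides" if "overrides" in host else "named"
    if not {"cpus", "memory_kb"} <= host.keys():
        raise ValueError(
            "Inline host specification requires at least 'cpus' and 'memory_kb'"
        )
    return "inline"

classify_host({"profile": "generic-small"})                              # named
classify_host({"profile": "generic-medium", "overrides": {"cpus": 32}})  # named-with-overrides
classify_host({"cpus": 8, "memory_kb": 16777216})                        # inline
```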
+ +### Form 1 — Named profile (recommended) + +```yaml +host: + profile: generic-small +``` + +### Form 2 — Named profile with overrides + +```yaml +host: + profile: generic-medium + overrides: + cpus: 32 + memory_kb: 65536000 +``` + +### Form 3 — Fully inline + +```yaml +host: + name: custom-server + cpus: 8 + memory_kb: 16777216 + disks: + - name: nvme0n1 + type: nvme + interfaces: + - name: eth0 + speed_mbps: 10000 +``` + +### host field reference + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `profile` | string | No | Name of a bundled or user hardware profile. | +| `overrides` | mapping | No | Override specific fields from the named profile. | +| `name` | string | No | Custom name for inline hardware. | +| `cpus` | int | Inline only | Number of CPUs. Required for inline form. | +| `memory_kb` | int | Inline only | Total RAM in kilobytes. Required for inline form. | +| `disks` | list | No | List of `{name: str, type: str}`. `type` is one of `nvme`, `ssd`, `hdd`. | +| `interfaces` | list | No | List of `{name: str, speed_mbps: int}`. | + +### host validation rules + +- **Cannot mix** `profile` with inline fields (`cpus`, `memory_kb`, etc.) without an `overrides:` key. +- Inline form requires **both** `cpus` and `memory_kb`. +- Each disk must have a `name` field. +- Each interface must have a `name` field. + +--- + +## Available Hardware Profiles + +Use one of these names for `host.profile`: + +| Name | Description | +|------|-------------| +| `generic-small` | Small VM / test environment (2 CPUs, ~4 GB RAM) | +| `generic-medium` | Mid-range server (8 CPUs, ~32 GB RAM) | +| `generic-large` | Production server (16 CPUs, ~64 GB RAM) | +| `generic-xlarge` | High-end server (32 CPUs, ~128 GB RAM) | +| `compute-optimized` | CPU-heavy workloads (64 CPUs, ~32 GB RAM) | +| `memory-optimized` | Memory-heavy workloads (16 CPUs, ~512 GB RAM) | +| `storage-optimized` | Disk I/O-heavy workloads (8 CPUs, ~64 GB RAM, multiple NVMe disks) | + +--- + +## phases + +An ordered list of workload phases. Each phase describes a period of the archive with +specific resource utilisation patterns. At least one phase is required. + +**Critical constraint**: The sum of all phase `duration` values **must equal** `meta.duration`, +unless a phase uses `repeat` (see below). + +### Phase fields + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `name` | string | Yes | Phase identifier. Used in validation error messages. | +| `duration` | int or string | Yes | Length of this phase. Same format as `meta.duration`. | +| `transition` | string | No | `instant` (default) or `linear`. `linear` interpolates from the previous phase's values. **Cannot be set on the first phase.** | +| `repeat` | string or int | No | `daily` repeats this phase every day for `meta.duration`. Only one phase allowed when `repeat: daily`. Integer repeat count is also supported. | +| `cpu` | mapping | No | CPU stressor (see below). | +| `memory` | mapping | No | Memory stressor (see below). | +| `disk` | mapping | No | Disk stressor (see below). | +| `network` | mapping | No | Network stressor (see below). | + +### cpu stressor + +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `utilization` | float [0.0–1.0] | 0.05 | Overall CPU utilisation fraction. | +| `user_ratio` | float | 0.70 | Fraction of CPU time in user space. | +| `sys_ratio` | float | 0.20 | Fraction of CPU time in kernel space. 
| +| `iowait_ratio` | float | 0.10 | Fraction of CPU time in I/O wait. | +| `noise` | float [0.0–1.0] | 0.0 | Per-sample noise for CPU metrics. | + +**Constraint**: `user_ratio + sys_ratio + iowait_ratio` must be ≤ 1.0. + +### memory stressor + +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `used_ratio` | float [0.0–1.0] | 0.50 | Fraction of total RAM used. | +| `cache_ratio` | float [0.0–1.0] | 0.20 | Fraction of RAM used as page cache. | +| `noise` | float [0.0–1.0] | 0.0 | Per-sample noise for memory metrics. | + +### disk stressor + +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `read_mbps` | float | 0.0 | Disk read throughput in MB/s. | +| `write_mbps` | float | 0.0 | Disk write throughput in MB/s. | +| `iops_read` | int | 0 | Read I/O operations per second. | +| `iops_write` | int | 0 | Write I/O operations per second. | +| `noise` | float [0.0–1.0] | 0.0 | Per-sample noise for disk metrics. | + +### network stressor + +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `rx_mbps` | float | 0.0 | Inbound throughput in MB/s. | +| `tx_mbps` | float | 0.0 | Outbound throughput in MB/s. | +| `noise` | float [0.0–1.0] | 0.0 | Per-sample noise for network metrics. | + +--- + +## Examples + +### Simple — 10-minute CPU spike + +```yaml +meta: + hostname: demo-host + timezone: UTC + duration: 10m + interval: 60 + +host: + profile: generic-small + +phases: + - name: baseline + duration: 5m + cpu: + utilization: 0.15 + + - name: spike + duration: 5m + cpu: + utilization: 0.90 +``` + +### Complex — 24-hour SaaS workload with diurnal pattern + +```yaml +meta: + hostname: saas-prod-01 + timezone: UTC + duration: 24h + interval: 60 + noise: 0.05 + start: "2026-03-01T00:00:00Z" + +host: + profile: generic-large + overrides: + cpus: 24 + +phases: + - name: overnight-quiet + duration: 8h + cpu: + utilization: 0.08 + user_ratio: 0.60 + sys_ratio: 0.20 + iowait_ratio: 0.10 + memory: + used_ratio: 0.40 + network: + rx_mbps: 5.0 + tx_mbps: 2.0 + + - name: business-hours-ramp + duration: 2h + transition: linear + cpu: + utilization: 0.55 + user_ratio: 0.65 + sys_ratio: 0.20 + iowait_ratio: 0.10 + memory: + used_ratio: 0.65 + + - name: business-hours-peak + duration: 8h + cpu: + utilization: 0.75 + user_ratio: 0.65 + sys_ratio: 0.20 + iowait_ratio: 0.10 + memory: + used_ratio: 0.72 + cache_ratio: 0.15 + disk: + read_mbps: 120.0 + write_mbps: 60.0 + iops_read: 4000 + iops_write: 2000 + network: + rx_mbps: 450.0 + tx_mbps: 380.0 + + - name: evening-wind-down + duration: 4h + transition: linear + cpu: + utilization: 0.25 + memory: + used_ratio: 0.50 + network: + rx_mbps: 80.0 + tx_mbps: 60.0 + + - name: late-night-batch + duration: 2h + cpu: + utilization: 0.40 + user_ratio: 0.30 + sys_ratio: 0.20 + iowait_ratio: 0.45 + disk: + read_mbps: 800.0 + write_mbps: 400.0 + iops_read: 20000 + iops_write: 10000 + memory: + used_ratio: 0.60 +``` + +Note: 8h + 2h + 8h + 4h + 2h = 24h = meta.duration ✓ + +--- + +## Common Validation Errors + +### Duration errors + +| Error message | Fix | +|---------------|-----| +| `Sum of phase durations (Xs) does not equal meta.duration (Ys) (FR-027)` | Adjust phase durations to sum to exactly `meta.duration`. | +| `invalid duration '...': use a positive integer or a string like '30s', '10m', '24h'` | Use seconds as an integer, or a string like `'1h'`, `'30m'`, `'600s'`, `'1d'`. 
| +| `meta.duration must be a positive integer or duration string` | Set `meta.duration` to a positive int or a string like `'24h'`, `'1d'`, `'7d'`. | +| `phases[N].duration must be a positive integer or duration string` | Fix the duration in the Nth phase. | + +### CPU errors + +| Error message | Fix | +|---------------|-----| +| `phases[N] (name): user_ratio + sys_ratio + iowait_ratio = X > 1.0 (FR-026)` | Reduce `user_ratio`, `sys_ratio`, or `iowait_ratio` so their sum is ≤ 1.0. | +| `cpu.noise must be in [0.0, 1.0]` | Set `cpu.noise` to a value between 0.0 and 1.0. | + +### Phase structure errors + +| Error message | Fix | +|---------------|-----| +| `phases[0]: first phase cannot use 'transition: linear' (FR-055)` | Remove `transition: linear` from the first phase (no prior phase to interpolate from). | +| `A phase with repeat:daily must be the only phase` | Remove all other phases when using `repeat: daily`. | +| `phases[N].transition must be 'instant' or 'linear'` | Use `instant` or `linear`, or omit the field entirely. | +| `phases must be a non-empty list` | Add at least one phase to the `phases` list. | +| `phases[N].name is required` | Add a `name` field to the Nth phase. | + +### Host errors + +| Error message | Fix | +|---------------|-----| +| `Hardware profile 'X' not found` | Use one of the 7 bundled profiles listed in the Available Hardware Profiles table above. | +| `Inline host specification requires at least 'cpus' and 'memory_kb'` | Add both `cpus` and `memory_kb` to the inline `host` block. | +| `host.profile and inline host fields cannot be mixed without an 'overrides:' key` | Either use `host.profile` alone, or add an `overrides:` key alongside `profile`. | + +### meta field errors + +| Error message | Fix | +|---------------|-----| +| `meta.interval must be a positive integer (FR-030)` | Set `meta.interval` to a positive integer (seconds). | +| `meta.noise must be in [0.0, 1.0]` | Set `meta.noise` to a float between 0.0 and 1.0. | +| `meta.start: cannot parse '...'` | Use ISO 8601 (`2026-03-01T08:00:00Z`) or a relative offset (`-90m`, `-2h`, `-1d`). | + +### YAML format errors + +| Error message | Fix | +|---------------|-----| +| `YAML parse error: ...` | The profile YAML is malformed or incomplete. Ensure the output is valid YAML with no truncation, no markdown code fences, and no prose mixed in. | +| `Profile must be a YAML mapping` | The profile must be a YAML mapping (key: value pairs at the top level), not a list or scalar. | + +### Semantic pitfalls (no error raised, but incorrect output) + +| Situation | Fix | +|-----------|-----| +| `meta.interval` (e.g. `3600`) is larger than `meta.duration` (e.g. `300`) | The archive will have zero or one sample. Set `interval` to a value smaller than `duration`. For a 5-minute archive use `interval: 60` (5 samples). | +| Phase durations don't sum to `meta.duration` by a few seconds | Adjust one phase duration. Common mistake: using `'1h'` phases in a `'24h'` archive where 24 × 1h = 24h but rounding errors creep in. Use exact integers: `3600` × 24 = `86400`. 
| + +--- + +## Duration arithmetic tips + +When writing multi-phase profiles, ensure your phase durations sum to `meta.duration`: + +``` +meta.duration: 24h = 86400s +phases: + - duration: 8h → 28800s + - duration: 10h → 36000s + - duration: 6h → 21600s + Total: 86400s ✓ +``` + +If using `repeat: daily`, omit all other phases and set `duration` to the length of the +repeating daily pattern (e.g., `86400` for a full 24h daily pattern, or less for a pattern +that repeats within each day). + +--- + +## Generating archives + +After generating a valid profile file: + +```bash +# Validate the profile +pmlogsynth --validate generated-archives/my-workload.yaml + +# Generate the PCP archive +pmlogsynth -o ./generated-archives/my-workload generated-archives/my-workload.yaml + +# Inspect the archive +pmstat -a ./generated-archives/my-workload +``` diff --git a/README.md b/README.md index c509207..9923c9e 100644 --- a/README.md +++ b/README.md @@ -116,17 +116,31 @@ pmlogsynth --list-metrics # show all producible PCP metrics pmlogsynth --show-schema # dump the full profile schema (for AI agents) ``` -### 6. Generate a profile with AI +### 6. Generate profiles with AI -If you're using [Claude Code](https://claude.ai/claude-code), the `/generate-profile` -skill can turn a plain-English description into a valid YAML profile: +If you're using [Claude Code](https://claude.ai/claude-code) with this repo checked out, +two built-in skills can generate valid YAML profiles from plain-English descriptions: + +**Single-host workload profiles** — just describe the scenario: + +``` +> simulate a 24-hour web server with overnight quiet, morning ramp, and daytime peak +> create a 1-hour archive of a memory-constrained host under heavy disk I/O +> take docs/spike.yml and add memory pressure during the spike phase +``` + +**Fleet profiles** (multiple hosts with bad actors) — describe the fleet: ``` -/generate-profile a 1-hour archive of a memory-constrained host under heavy disk I/O +> generate a fleet of 20 web servers where 3 have CPU saturation problems +> I need a 50-host database cluster on memory-optimized hardware with some hosts + showing memory pressure and disk thrashing +> create a small 5-host dev cluster with normal web traffic for an hour ``` -The skill feeds `--show-schema` output to the model as context, so the generated profile -is always valid against the current schema. +The skills bundle the full schema as context, validate the output against +`pmlogsynth --validate` (or `pmlogsynth fleet --validate`), and save the +generated files to `generated-archives/`. --- From 5b0b091dd8b09a2df8b59ed1a64d3b8aa1bd6107 Mon Sep 17 00:00:00 2001 From: Paul Smith Date: Fri, 20 Mar 2026 23:03:06 +0000 Subject: [PATCH 13/23] Skills generate archives, not just YAML profiles Both skills now run `uv sync` + `uv run pmlogsynth` to validate AND generate the actual PCP archives. Shared bootstrap reference extracted to references/running-pmlogsynth.md for both skills. 
--- .../skills/generate-fleet-profile/SKILL.md | 45 +++++++++++------- .../references/running-pmlogsynth.md | 47 +++++++++++++++++++ .claude/skills/generate-profile/SKILL.md | 36 +++++++++----- .../references/running-pmlogsynth.md | 47 +++++++++++++++++++ 4 files changed, 146 insertions(+), 29 deletions(-) create mode 100644 .claude/skills/generate-fleet-profile/references/running-pmlogsynth.md create mode 100644 .claude/skills/generate-profile/references/running-pmlogsynth.md diff --git a/.claude/skills/generate-fleet-profile/SKILL.md b/.claude/skills/generate-fleet-profile/SKILL.md index d3d9424..6012999 100644 --- a/.claude/skills/generate-fleet-profile/SKILL.md +++ b/.claude/skills/generate-fleet-profile/SKILL.md @@ -33,13 +33,16 @@ A fleet profile is **not** a workload profile. It's a higher-level orchestrator: - **Fleet-level `duration` and `interval`** override whatever the individual workload profiles specify -## Step 1 — Read the Schema References +## Step 1 — Read the Schema References and Bootstrap -Read both reference files (relative to this skill's directory): +Read these reference files (relative to this skill's directory): 1. `references/fleet-schema.md` — the fleet profile format, fields, and validation rules 2. `references/workload-profile-schema.md` — the workload profile format (for generating the baseline and bad-actor workload profiles the fleet references) +3. `references/running-pmlogsynth.md` — how to bootstrap and run pmlogsynth + +Before any validation or generation, run `uv sync` to ensure the environment is ready. ## Step 2 — Understand the Fleet Scenario @@ -126,36 +129,42 @@ When the user describes problems, translate them into workload profiles: ## Step 5 — Validate -Validate the workload profiles first (they must parse independently), then the fleet: +Validate the workload profiles first (they must parse independently), then the fleet. +If `uv run` fails because dependencies aren't synced, run `uv sync` first. ```bash # Validate individual workload profiles -pmlogsynth --validate generated-archives/fleet-baseline.yaml -pmlogsynth --validate generated-archives/fleet-bad-cpu.yaml +uv run pmlogsynth --validate generated-archives/fleet-baseline.yaml +uv run pmlogsynth --validate generated-archives/fleet-bad-cpu.yaml # Validate the fleet profile -pmlogsynth fleet --validate generated-archives/20-host-web-cluster-fleet.yaml +uv run pmlogsynth fleet --validate generated-archives/.yaml ``` - **Exit 0**: Valid. Proceed. - **Exit 1**: Fix the error and retry once. - **Exit 2**: I/O error — report and stop. -## Step 6 — Report +## Step 6 — Generate the Archives + +After validation passes, generate the actual PCP archives. Derive the output directory +name from the fleet name in the profile: + +```bash +uv run pmlogsynth fleet --seed 42 -o ./generated-archives/ generated-archives/.yaml +``` + +Use `--seed 42` by default for reproducible host assignment. If generation fails, report +the error to the user. 
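Why the seed matters: with a fixed `--seed`, the bad-actor selection and the per-host jitter factors come out identical on every run. The sketch below shows the general idea under the documented jitter model (`Normal(mean=1.0, stddev=jitter)`); it is an illustration, not the actual fleet implementation:

```python
import random

# Illustration of seeded, reproducible host assignment; not pmlogsynth's real code.
def sketch_assignments(seed, host_count, bad_count, jitter, prefix="web"):
    rng = random.Random(seed)
    hostnames = [f"{prefix}-{i:02d}" for i in range(1, host_count + 1)]
    bad = set(rng.sample(hostnames, bad_count))
    return {
        name: {
            "role": "bad_actor" if name in bad else "baseline",
            "jitter_factor": round(rng.gauss(1.0, jitter), 2),
        }
        for name in hostnames
    }

# Same seed, same assignment -- repeated runs with --seed 42 are reproducible.
assert sketch_assignments(42, 20, 2, 0.05) == sketch_assignments(42, 20, 2, 0.05)
```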
+ +## Step 7 — Report Tell the user: -- All files saved and their paths -- How to preview the fleet assignment: - ```bash - pmlogsynth fleet --dry-run generated-archives/.yaml - ``` -- How to generate the archives: - ```bash - pmlogsynth fleet -o ./generated-archives/ generated-archives/.yaml - ``` -- How to generate with reproducible host assignment: +- All profile YAML files saved and their paths +- Where the archives were generated +- How to preview the host assignment: ```bash - pmlogsynth fleet --seed 42 -o ./generated-archives/ generated-archives/.yaml + uv run pmlogsynth fleet --dry-run generated-archives/.yaml ``` - How to inspect individual archives: ```bash diff --git a/.claude/skills/generate-fleet-profile/references/running-pmlogsynth.md b/.claude/skills/generate-fleet-profile/references/running-pmlogsynth.md new file mode 100644 index 0000000..6862b46 --- /dev/null +++ b/.claude/skills/generate-fleet-profile/references/running-pmlogsynth.md @@ -0,0 +1,47 @@ +# Running pmlogsynth + +pmlogsynth requires PCP Python bindings (`python3-pcp`) which are system packages, not +pip-installable. The project uses `uv` for dependency management. + +## Bootstrap + +Before running any `pmlogsynth` commands, ensure the environment is ready: + +```bash +uv sync +``` + +This is idempotent — safe to run every time. It installs/updates dependencies and makes +`uv run pmlogsynth` available. + +## Running commands + +Always use `uv run` to invoke pmlogsynth — this ensures the correct virtualenv is used: + +```bash +# Validate a workload profile +uv run pmlogsynth --validate + +# Generate an archive from a workload profile +uv run pmlogsynth -o ./generated-archives/ + +# Validate a fleet profile +uv run pmlogsynth fleet --validate + +# Generate fleet archives (--seed for reproducibility) +uv run pmlogsynth fleet --seed 42 -o ./generated-archives/ + +# Dry-run fleet (preview host assignments) +uv run pmlogsynth fleet --dry-run +``` + +## Troubleshooting + +If `uv run pmlogsynth` fails with an import error for `cpmapi` or `pcp`, PCP's Python +bindings are not installed system-wide. On macOS, run `./setup-venv.sh` first — it +creates a venv that links to Homebrew's PCP Python bindings. + +If `uv` is not available, fall back to: +```bash +pip install -e ".[dev]" && pmlogsynth ... +``` diff --git a/.claude/skills/generate-profile/SKILL.md b/.claude/skills/generate-profile/SKILL.md index 4ed87af..fda2768 100644 --- a/.claude/skills/generate-profile/SKILL.md +++ b/.claude/skills/generate-profile/SKILL.md @@ -17,12 +17,15 @@ Generate a valid pmlogsynth YAML workload profile from a natural language descri The profile describes a single host's workload over time using phases with CPU, memory, disk, and network stressors. -## Step 1 — Get the Schema +## Step 1 — Get the Schema and Bootstrap -Read the reference file at `references/profile-schema.md` (relative to this skill's -directory). This contains the complete, authoritative schema for workload profiles — -every field, type, default, constraint, and common validation error. +Read these reference files (relative to this skill's directory): +1. `references/profile-schema.md` — the complete, authoritative schema for workload + profiles (every field, type, default, constraint, and common validation error) +2. `references/running-pmlogsynth.md` — how to bootstrap and run pmlogsynth + +Before any validation or generation, run `uv sync` to ensure the environment is ready. Do NOT run `pmlogsynth --show-schema` — the bundled reference is identical and faster. 
## Step 2 — Understand the Workload @@ -84,24 +87,35 @@ Example: "10-minute CPU spike at 90%" → `generated-archives/10-minute-cpu-spik ## Step 6 — Validate ```bash -pmlogsynth --validate generated-archives/.yaml +uv run pmlogsynth --validate generated-archives/.yaml ``` +If `uv run` fails because dependencies aren't synced, run `uv sync` first. + - **Exit 0**: Profile is valid. Proceed to Step 7. - **Exit 1** (validation error): Feed the error back into the generation, fix the YAML, and retry validation. If it fails a second time, show both the error and the YAML to the user and stop. - **Exit 2** (I/O error): Report the error and stop. -## Step 7 — Report +## Step 7 — Generate the Archive + +After validation passes, generate the actual PCP archive. Derive the archive name from +the profile filename (strip the `.yaml` extension): + +```bash +uv run pmlogsynth -o ./generated-archives/ generated-archives/.yaml +``` + +If generation fails, report the error to the user. + +## Step 8 — Report Tell the user: -- Where the profile was saved -- How to generate the archive: - ```bash - pmlogsynth -o ./generated-archives/ generated-archives/.yaml - ``` +- Where the profile YAML was saved +- Where the archive was generated (the `.0`, `.index`, `.meta` files) - How to inspect the archive: ```bash pmstat -a ./generated-archives/ + pmval -a ./generated-archives/ kernel.all.cpu.user ``` diff --git a/.claude/skills/generate-profile/references/running-pmlogsynth.md b/.claude/skills/generate-profile/references/running-pmlogsynth.md new file mode 100644 index 0000000..6862b46 --- /dev/null +++ b/.claude/skills/generate-profile/references/running-pmlogsynth.md @@ -0,0 +1,47 @@ +# Running pmlogsynth + +pmlogsynth requires PCP Python bindings (`python3-pcp`) which are system packages, not +pip-installable. The project uses `uv` for dependency management. + +## Bootstrap + +Before running any `pmlogsynth` commands, ensure the environment is ready: + +```bash +uv sync +``` + +This is idempotent — safe to run every time. It installs/updates dependencies and makes +`uv run pmlogsynth` available. + +## Running commands + +Always use `uv run` to invoke pmlogsynth — this ensures the correct virtualenv is used: + +```bash +# Validate a workload profile +uv run pmlogsynth --validate + +# Generate an archive from a workload profile +uv run pmlogsynth -o ./generated-archives/ + +# Validate a fleet profile +uv run pmlogsynth fleet --validate + +# Generate fleet archives (--seed for reproducibility) +uv run pmlogsynth fleet --seed 42 -o ./generated-archives/ + +# Dry-run fleet (preview host assignments) +uv run pmlogsynth fleet --dry-run +``` + +## Troubleshooting + +If `uv run pmlogsynth` fails with an import error for `cpmapi` or `pcp`, PCP's Python +bindings are not installed system-wide. On macOS, run `./setup-venv.sh` first — it +creates a venv that links to Homebrew's PCP Python bindings. + +If `uv` is not available, fall back to: +```bash +pip install -e ".[dev]" && pmlogsynth ... +``` From 9f0d022232bfe8a9a297e90ddd6ea0360d1db38c Mon Sep 17 00:00:00 2001 From: Paul Smith Date: Fri, 20 Mar 2026 23:04:14 +0000 Subject: [PATCH 14/23] Update README to reflect skills generate actual archives --- README.md | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/README.md b/README.md index 9923c9e..81a2e95 100644 --- a/README.md +++ b/README.md @@ -119,7 +119,8 @@ pmlogsynth --show-schema # dump the full profile schema (for AI agents) ### 6. 
Generate profiles with AI If you're using [Claude Code](https://claude.ai/claude-code) with this repo checked out, -two built-in skills can generate valid YAML profiles from plain-English descriptions: +two built-in skills can generate YAML profiles from plain-English descriptions, +validate them, and generate the actual PCP archives — all in one step: **Single-host workload profiles** — just describe the scenario: @@ -138,9 +139,9 @@ two built-in skills can generate valid YAML profiles from plain-English descript > create a small 5-host dev cluster with normal web traffic for an hour ``` -The skills bundle the full schema as context, validate the output against -`pmlogsynth --validate` (or `pmlogsynth fleet --validate`), and save the -generated files to `generated-archives/`. +The skills bundle the full schema as context, validate the output, and run +`pmlogsynth` to generate the PCP archives — ready to inspect with `pmstat`, +`pmval`, or `pmrep`. All output goes to `generated-archives/`. --- From cc52a4df685cd0190253d5fd5151dd107675eeef Mon Sep 17 00:00:00 2001 From: Paul Smith Date: Sat, 21 Mar 2026 00:30:14 +0000 Subject: [PATCH 15/23] Add InlineProfile, remove file-path fields from fleet models Prepare models for single-file fleet profiles: add InlineProfile dataclass, remove baseline_path/profile_paths/workload_path fields that referenced external files. --- pmlogsynth/fleet/models.py | 18 +++++++++++------- tests/unit/test_fleet.py | 17 +++++++++++++++++ 2 files changed, 28 insertions(+), 7 deletions(-) diff --git a/pmlogsynth/fleet/models.py b/pmlogsynth/fleet/models.py index a611a63..c999526 100644 --- a/pmlogsynth/fleet/models.py +++ b/pmlogsynth/fleet/models.py @@ -1,8 +1,7 @@ """Fleet data models — pure dataclasses, no logic.""" from dataclasses import dataclass, field -from pathlib import Path -from typing import List +from typing import Any, Dict, List @dataclass @@ -16,13 +15,19 @@ class FleetMeta: hardware: str +@dataclass +class InlineProfile: + """A named workload profile defined inline in a fleet file.""" + + phases: List[Dict[str, Any]] + + @dataclass class HostsConfig: """Baseline host configuration.""" count: int baseline: str - baseline_path: Path jitter: float = 0.0 @@ -33,7 +38,6 @@ class BadActorsConfig: count: int = 0 jitter: float = 0.0 profiles: List[str] = field(default_factory=list) - profile_paths: List[Path] = field(default_factory=list) @dataclass @@ -43,14 +47,14 @@ class FleetProfile: meta: FleetMeta hosts: HostsConfig bad_actors: BadActorsConfig + profiles: Dict[str, InlineProfile] = field(default_factory=dict) @dataclass class HostAssignment: - """One host's role, jitter factor, and workload path.""" + """One host's role, jitter factor, and workload profile name.""" hostname: str role: str # "baseline" or "bad_actor" jitter_factor: float - workload_path: Path - workload_rel: str = "" + workload_rel: str diff --git a/tests/unit/test_fleet.py b/tests/unit/test_fleet.py index 529e695..04e4f60 100644 --- a/tests/unit/test_fleet.py +++ b/tests/unit/test_fleet.py @@ -7,6 +7,23 @@ FLEET_FIXTURES = Path(__file__).parent.parent / "fixtures" / "fleet" +class TestInlineProfile: + """Tests for the InlineProfile dataclass.""" + + def test_inline_profile_holds_phases_raw(self) -> None: + from pmlogsynth.fleet.models import InlineProfile + + phases = [{"name": "steady", "duration": 600, "cpu": {"utilization": 0.5}}] + profile = InlineProfile(phases=phases) + assert profile.phases == phases + + def test_inline_profile_default_empty_phases(self) -> None: + from 
pmlogsynth.fleet.models import InlineProfile + + profile = InlineProfile(phases=[]) + assert profile.phases == [] + + class TestLoadFleetProfile: """Tests for load_fleet_profile YAML parsing.""" From fda28e01072c08b075a70d64d6257a9474930647 Mon Sep 17 00:00:00 2001 From: Paul Smith Date: Sat, 21 Mar 2026 00:32:22 +0000 Subject: [PATCH 16/23] Rewrite fleet loader and fixtures for single-file profiles Convert fleet profile format from multi-file (external workload YAML references) to single-file (inline named profiles). Fixture, loader, and tests updated atomically. --- pmlogsynth/fleet/loader.py | 68 ++++++++++++++++------ tests/fixtures/fleet/bad-cpu.yaml | 19 ------- tests/fixtures/fleet/baseline.yaml | 26 --------- tests/fixtures/fleet/test-fleet.yaml | 38 ++++++++++++- tests/unit/test_fleet.py | 85 ++++++++++++++++++++++++---- 5 files changed, 162 insertions(+), 74 deletions(-) delete mode 100644 tests/fixtures/fleet/bad-cpu.yaml delete mode 100644 tests/fixtures/fleet/baseline.yaml diff --git a/pmlogsynth/fleet/loader.py b/pmlogsynth/fleet/loader.py index cc4dbaf..a677f88 100644 --- a/pmlogsynth/fleet/loader.py +++ b/pmlogsynth/fleet/loader.py @@ -10,6 +10,7 @@ FleetMeta, FleetProfile, HostsConfig, + InlineProfile, ) from pmlogsynth.profile import ValidationError, parse_duration @@ -51,7 +52,32 @@ def _parse_fleet_meta(raw: Dict[str, Any]) -> FleetMeta: ) -def _parse_hosts(raw: Dict[str, Any], fleet_dir: Path) -> HostsConfig: +def _parse_profiles(raw: Dict[str, Any]) -> Dict[str, InlineProfile]: + """Parse and validate the profiles section of a fleet profile.""" + section = raw.get("profiles") + if not isinstance(section, dict): + raise ValidationError("fleet profile missing 'profiles' section") + + profiles = {} # type: Dict[str, InlineProfile] + for name, body in section.items(): + if not isinstance(body, dict): + raise ValidationError( + "profile '{}' must be a mapping".format(name) + ) + phases = body.get("phases") + if not isinstance(phases, list) or len(phases) == 0: + raise ValidationError( + "profile '{}' phases must be a non-empty list".format(name) + ) + profiles[str(name)] = InlineProfile(phases=phases) + + return profiles + + +def _parse_hosts( + raw: Dict[str, Any], + profiles: Dict[str, InlineProfile], +) -> HostsConfig: """Parse and validate the hosts section of a fleet profile.""" hosts = raw.get("hosts") if not isinstance(hosts, dict): @@ -64,14 +90,18 @@ def _parse_hosts(raw: Dict[str, Any], fleet_dir: Path) -> HostsConfig: baseline = hosts.get("baseline") if not baseline: raise ValidationError("hosts.baseline is required") + baseline = str(baseline) + + if baseline not in profiles: + raise ValidationError( + "hosts.baseline '{}' not found in profiles".format(baseline) + ) jitter = float(hosts.get("jitter", 0.0)) - baseline_path = fleet_dir / str(baseline) return HostsConfig( count=count, - baseline=str(baseline), - baseline_path=baseline_path, + baseline=baseline, jitter=jitter, ) @@ -79,7 +109,7 @@ def _parse_hosts(raw: Dict[str, Any], fleet_dir: Path) -> HostsConfig: def _parse_bad_actors( raw: Dict[str, Any], hosts_config: HostsConfig, - fleet_dir: Path, + profiles: Dict[str, InlineProfile], ) -> BadActorsConfig: """Parse and validate the bad_actors section of a fleet profile.""" section = raw.get("bad_actors") @@ -105,32 +135,38 @@ def _parse_bad_actors( jitter = hosts_config.jitter profiles_raw = section.get("profiles", []) - profiles = [str(p) for p in profiles_raw] - profile_paths = [fleet_dir / p for p in profiles] + profile_names = [str(p) for p in 
profiles_raw] + + for name in profile_names: + if name not in profiles: + raise ValidationError( + "bad_actors profile '{}' not found in profiles".format(name) + ) return BadActorsConfig( count=count, jitter=jitter, - profiles=profiles, - profile_paths=profile_paths, + profiles=profile_names, ) def load_fleet_profile(path: Path) -> FleetProfile: """Load and validate a fleet profile YAML file. - Workload paths (baseline, bad-actor profiles) are resolved relative - to the directory containing the fleet YAML file. + All workload profiles are defined inline in the 'profiles' section. + References in hosts.baseline and bad_actors.profiles are validated + against profile names. """ text = path.read_text() raw = yaml.safe_load(text) if not isinstance(raw, dict): raise ValidationError("fleet profile must be a YAML mapping") - fleet_dir = path.parent - meta = _parse_fleet_meta(raw) - hosts = _parse_hosts(raw, fleet_dir) - bad_actors = _parse_bad_actors(raw, hosts, fleet_dir) + profiles = _parse_profiles(raw) + hosts = _parse_hosts(raw, profiles) + bad_actors = _parse_bad_actors(raw, hosts, profiles) - return FleetProfile(meta=meta, hosts=hosts, bad_actors=bad_actors) + return FleetProfile( + meta=meta, hosts=hosts, bad_actors=bad_actors, profiles=profiles, + ) diff --git a/tests/fixtures/fleet/bad-cpu.yaml b/tests/fixtures/fleet/bad-cpu.yaml deleted file mode 100644 index 328b74b..0000000 --- a/tests/fixtures/fleet/bad-cpu.yaml +++ /dev/null @@ -1,19 +0,0 @@ -meta: - hostname: bad-host - duration: 600 - interval: 60 - -host: - profile: generic-small - -phases: - - name: saturated - duration: 600 - cpu: - utilization: 0.96 - user_ratio: 0.85 - sys_ratio: 0.10 - iowait_ratio: 0.05 - memory: - used_ratio: 0.70 - cache_ratio: 0.10 diff --git a/tests/fixtures/fleet/baseline.yaml b/tests/fixtures/fleet/baseline.yaml deleted file mode 100644 index 9ce459e..0000000 --- a/tests/fixtures/fleet/baseline.yaml +++ /dev/null @@ -1,26 +0,0 @@ -meta: - hostname: baseline-host - duration: 600 - interval: 60 - -host: - profile: generic-small - -phases: - - name: steady - duration: 600 - cpu: - utilization: 0.50 - user_ratio: 0.70 - sys_ratio: 0.20 - iowait_ratio: 0.10 - memory: - used_ratio: 0.40 - cache_ratio: 0.20 - disk: - read_mbps: 10.0 - write_mbps: 5.0 - network: - rx_mbps: 100.0 - tx_mbps: 50.0 - error_rate: 0.001 diff --git a/tests/fixtures/fleet/test-fleet.yaml b/tests/fixtures/fleet/test-fleet.yaml index 1b9dba2..351f36d 100644 --- a/tests/fixtures/fleet/test-fleet.yaml +++ b/tests/fixtures/fleet/test-fleet.yaml @@ -5,13 +5,47 @@ meta: hostname_prefix: host hardware: generic-small +profiles: + baseline: + phases: + - name: steady + duration: 600 + cpu: + utilization: 0.50 + user_ratio: 0.70 + sys_ratio: 0.20 + iowait_ratio: 0.10 + memory: + used_ratio: 0.40 + cache_ratio: 0.20 + disk: + read_mbps: 10.0 + write_mbps: 5.0 + network: + rx_mbps: 100.0 + tx_mbps: 50.0 + error_rate: 0.001 + + bad-cpu: + phases: + - name: saturated + duration: 600 + cpu: + utilization: 0.96 + user_ratio: 0.85 + sys_ratio: 0.10 + iowait_ratio: 0.05 + memory: + used_ratio: 0.70 + cache_ratio: 0.10 + hosts: count: 5 - baseline: baseline.yaml + baseline: baseline jitter: 0.05 bad_actors: count: 1 jitter: 0.15 profiles: - - bad-cpu.yaml + - bad-cpu diff --git a/tests/unit/test_fleet.py b/tests/unit/test_fleet.py index 04e4f60..7796bdc 100644 --- a/tests/unit/test_fleet.py +++ b/tests/unit/test_fleet.py @@ -37,10 +37,61 @@ def test_loads_valid_fleet_profile(self) -> None: assert fleet.meta.hostname_prefix == "host" assert 
fleet.meta.hardware == "generic-small" assert fleet.hosts.count == 5 + assert fleet.hosts.baseline == "baseline" assert fleet.hosts.jitter == 0.05 assert fleet.bad_actors.count == 1 assert fleet.bad_actors.jitter == 0.15 assert len(fleet.bad_actors.profiles) == 1 + assert "baseline" in fleet.profiles + assert "bad-cpu" in fleet.profiles + + def test_profiles_contain_phases(self) -> None: + from pmlogsynth.fleet import load_fleet_profile + + fleet = load_fleet_profile(FLEET_FIXTURES / "test-fleet.yaml") + assert len(fleet.profiles["baseline"].phases) == 1 + assert fleet.profiles["baseline"].phases[0]["name"] == "steady" + assert len(fleet.profiles["bad-cpu"].phases) == 1 + assert fleet.profiles["bad-cpu"].phases[0]["name"] == "saturated" + + def test_missing_profiles_section_raises(self, tmp_path: Path) -> None: + from pmlogsynth.fleet import load_fleet_profile + from pmlogsynth.profile import ValidationError + + (tmp_path / "bad.yaml").write_text( + "meta:\n name: x\n duration: 600\n interval: 60\n" + " hostname_prefix: x\n hardware: generic-small\n" + "hosts:\n count: 1\n baseline: foo\n" + ) + with pytest.raises(ValidationError, match="profiles"): + load_fleet_profile(tmp_path / "bad.yaml") + + def test_baseline_references_missing_profile_raises(self, tmp_path: Path) -> None: + from pmlogsynth.fleet import load_fleet_profile + from pmlogsynth.profile import ValidationError + + (tmp_path / "bad.yaml").write_text( + "meta:\n name: x\n duration: 600\n interval: 60\n" + " hostname_prefix: x\n hardware: generic-small\n" + "profiles:\n foo:\n phases:\n - name: a\n duration: 60\n" + "hosts:\n count: 1\n baseline: bar\n" + ) + with pytest.raises(ValidationError, match="bar.*not found in profiles"): + load_fleet_profile(tmp_path / "bad.yaml") + + def test_bad_actor_references_missing_profile_raises(self, tmp_path: Path) -> None: + from pmlogsynth.fleet import load_fleet_profile + from pmlogsynth.profile import ValidationError + + (tmp_path / "bad.yaml").write_text( + "meta:\n name: x\n duration: 600\n interval: 60\n" + " hostname_prefix: x\n hardware: generic-small\n" + "profiles:\n foo:\n phases:\n - name: a\n duration: 60\n" + "hosts:\n count: 2\n baseline: foo\n" + "bad_actors:\n count: 1\n profiles:\n - missing\n" + ) + with pytest.raises(ValidationError, match="missing.*not found in profiles"): + load_fleet_profile(tmp_path / "bad.yaml") def test_missing_meta_name_raises(self, tmp_path: Path) -> None: from pmlogsynth.fleet import load_fleet_profile @@ -49,7 +100,8 @@ def test_missing_meta_name_raises(self, tmp_path: Path) -> None: (tmp_path / "bad.yaml").write_text( "meta:\n duration: 600\n interval: 60\n" " hostname_prefix: x\n hardware: generic-small\n" - "hosts:\n count: 1\n baseline: x.yaml\n" + "profiles:\n foo:\n phases:\n - name: a\n duration: 60\n" + "hosts:\n count: 1\n baseline: foo\n" ) with pytest.raises(ValidationError, match="meta.name"): load_fleet_profile(tmp_path / "bad.yaml") @@ -61,6 +113,7 @@ def test_missing_hosts_raises(self, tmp_path: Path) -> None: (tmp_path / "bad.yaml").write_text( "meta:\n name: x\n duration: 600\n interval: 60\n" " hostname_prefix: x\n hardware: generic-small\n" + "profiles:\n foo:\n phases:\n - name: a\n duration: 60\n" ) with pytest.raises(ValidationError, match="hosts"): load_fleet_profile(tmp_path / "bad.yaml") @@ -72,8 +125,9 @@ def test_bad_actors_count_exceeds_host_count_raises(self, tmp_path: Path) -> Non (tmp_path / "bad.yaml").write_text( "meta:\n name: x\n duration: 600\n interval: 60\n" " hostname_prefix: x\n hardware: generic-small\n" 
- "hosts:\n count: 2\n baseline: x.yaml\n" - "bad_actors:\n count: 3\n profiles:\n - y.yaml\n" + "profiles:\n foo:\n phases:\n - name: a\n duration: 60\n" + "hosts:\n count: 2\n baseline: foo\n" + "bad_actors:\n count: 3\n profiles:\n - foo\n" ) with pytest.raises(ValidationError, match="bad_actors.count"): load_fleet_profile(tmp_path / "bad.yaml") @@ -84,8 +138,9 @@ def test_bad_actors_defaults_jitter_to_hosts_jitter(self, tmp_path: Path) -> Non (tmp_path / "f.yaml").write_text( "meta:\n name: x\n duration: 600\n interval: 60\n" " hostname_prefix: x\n hardware: generic-small\n" - "hosts:\n count: 3\n baseline: x.yaml\n jitter: 0.08\n" - "bad_actors:\n count: 1\n profiles:\n - y.yaml\n" + "profiles:\n foo:\n phases:\n - name: a\n duration: 60\n" + "hosts:\n count: 3\n baseline: foo\n jitter: 0.08\n" + "bad_actors:\n count: 1\n profiles:\n - foo\n" ) fleet = load_fleet_profile(tmp_path / "f.yaml") assert fleet.bad_actors.jitter == 0.08 @@ -96,7 +151,8 @@ def test_no_bad_actors_section_is_valid(self, tmp_path: Path) -> None: (tmp_path / "f.yaml").write_text( "meta:\n name: x\n duration: 600\n interval: 60\n" " hostname_prefix: x\n hardware: generic-small\n" - "hosts:\n count: 3\n baseline: x.yaml\n" + "profiles:\n foo:\n phases:\n - name: a\n duration: 60\n" + "hosts:\n count: 3\n baseline: foo\n" ) fleet = load_fleet_profile(tmp_path / "f.yaml") assert fleet.bad_actors.count == 0 @@ -108,18 +164,25 @@ def test_duration_accepts_duration_strings(self, tmp_path: Path) -> None: (tmp_path / "f.yaml").write_text( "meta:\n name: x\n duration: 24h\n interval: 15s\n" " hostname_prefix: x\n hardware: generic-small\n" - "hosts:\n count: 1\n baseline: x.yaml\n" + "profiles:\n foo:\n phases:\n - name: a\n duration: 60\n" + "hosts:\n count: 1\n baseline: foo\n" ) fleet = load_fleet_profile(tmp_path / "f.yaml") assert fleet.meta.duration == 86400 assert fleet.meta.interval == 15 - def test_workload_paths_resolved_relative_to_fleet_file(self) -> None: + def test_profile_with_empty_phases_raises(self, tmp_path: Path) -> None: from pmlogsynth.fleet import load_fleet_profile + from pmlogsynth.profile import ValidationError - fleet = load_fleet_profile(FLEET_FIXTURES / "test-fleet.yaml") - assert fleet.hosts.baseline_path.exists() - assert fleet.hosts.baseline_path.name == "baseline.yaml" + (tmp_path / "bad.yaml").write_text( + "meta:\n name: x\n duration: 600\n interval: 60\n" + " hostname_prefix: x\n hardware: generic-small\n" + "profiles:\n foo:\n phases: []\n" + "hosts:\n count: 1\n baseline: foo\n" + ) + with pytest.raises(ValidationError, match="phases.*non-empty"): + load_fleet_profile(tmp_path / "bad.yaml") class TestAssignHosts: From 21f346b334519b991aed3d97424b587fcce1b924 Mon Sep 17 00:00:00 2001 From: Paul Smith Date: Sat, 21 Mar 2026 00:34:07 +0000 Subject: [PATCH 17/23] Update host assignment to use profile names instead of paths MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit HostAssignment no longer carries workload_path — workload_rel is the profile name referencing the fleet's inline profiles dict. 
--- pmlogsynth/fleet/assignment.py | 5 +---- tests/unit/test_fleet.py | 8 +++++--- 2 files changed, 6 insertions(+), 7 deletions(-) diff --git a/pmlogsynth/fleet/assignment.py b/pmlogsynth/fleet/assignment.py index ccdd736..1c75b8d 100644 --- a/pmlogsynth/fleet/assignment.py +++ b/pmlogsynth/fleet/assignment.py @@ -38,7 +38,7 @@ def assign_hosts( # Pick which host indices are bad actors bad_actor_indices = set(rng.sample(range(count), fleet.bad_actors.count)) - assignments: List[HostAssignment] = [] + assignments = [] # type: List[HostAssignment] for i in range(count): hostname = "{}-{}".format( fleet.meta.hostname_prefix, @@ -51,12 +51,10 @@ def assign_hosts( jitter_stddev = fleet.bad_actors.jitter # Pick a profile from the bad-actor pool profile_idx = rng.randrange(len(fleet.bad_actors.profiles)) - workload_path = fleet.bad_actors.profile_paths[profile_idx] workload_rel = fleet.bad_actors.profiles[profile_idx] else: role = "baseline" jitter_stddev = fleet.hosts.jitter - workload_path = fleet.hosts.baseline_path workload_rel = fleet.hosts.baseline # Generate a stable, deterministic jitter factor per host @@ -69,7 +67,6 @@ def assign_hosts( hostname=hostname, role=role, jitter_factor=jitter_factor, - workload_path=workload_path, workload_rel=workload_rel, ) ) diff --git a/tests/unit/test_fleet.py b/tests/unit/test_fleet.py index 7796bdc..0fc40f3 100644 --- a/tests/unit/test_fleet.py +++ b/tests/unit/test_fleet.py @@ -258,7 +258,8 @@ def test_no_bad_actors_all_baseline(self, tmp_path: Path) -> None: (tmp_path / "f.yaml").write_text( "meta:\n name: x\n duration: 600\n interval: 60\n" " hostname_prefix: srv\n hardware: generic-small\n" - "hosts:\n count: 3\n baseline: x.yaml\n" + "profiles:\n base:\n phases:\n - name: a\n duration: 60\n" + "hosts:\n count: 3\n baseline: base\n" ) fleet = load_fleet_profile(tmp_path / "f.yaml") assignments = assign_hosts(fleet, seed=1) @@ -277,7 +278,8 @@ def test_zero_pad_width_scales_with_count(self, tmp_path: Path) -> None: (tmp_path / "f.yaml").write_text( "meta:\n name: x\n duration: 600\n interval: 60\n" " hostname_prefix: srv\n hardware: generic-small\n" - "hosts:\n count: 100\n baseline: x.yaml\n" + "profiles:\n base:\n phases:\n - name: a\n duration: 60\n" + "hosts:\n count: 100\n baseline: base\n" ) fleet = load_fleet_profile(tmp_path / "f.yaml") assignments = assign_hosts(fleet, seed=1) @@ -291,7 +293,7 @@ def test_bad_actor_profiles_selected_from_pool(self) -> None: assignments = assign_hosts(fleet, seed=42) bad = [a for a in assignments if a.role == "bad_actor"] for b in bad: - assert b.workload_path.name in ("bad-cpu.yaml",) + assert b.workload_rel in ("bad-cpu",) class TestWriteManifest: From acf6222502253489b4819e730128e27a6d484fd6 Mon Sep 17 00:00:00 2001 From: Paul Smith Date: Sat, 21 Mar 2026 00:35:09 +0000 Subject: [PATCH 18/23] Build WorkloadProfile from inline data, simplify warnings to no-op Orchestrator constructs workload YAML from fleet.profiles entries and fleet-level meta, then feeds it through WorkloadProfile.from_string(). Warnings module becomes no-op since inline profiles can't conflict. 
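As a rough illustration of what _build_workload_yaml() hands to
WorkloadProfile.from_string() for one baseline host (hostname and stressor values
are illustrative, loosely based on the test fixture; comments added here for
clarity, yaml.dump itself emits none):

```yaml
meta:
  hostname: host-01        # hostname_prefix + zero-padded index
  duration: 600            # fleet-level meta.duration, in seconds
  interval: 60             # fleet-level meta.interval
host:
  profile: generic-small   # fleet-level meta.hardware
phases:
- name: steady
  duration: 600
  cpu:
    utilization: 0.5
    user_ratio: 0.7
    sys_ratio: 0.2
    iowait_ratio: 0.1
  memory:
    used_ratio: 0.4
    cache_ratio: 0.2
```

Jitter is applied afterwards, on the parsed WorkloadProfile, so these values are
pre-jitter.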
--- pmlogsynth/fleet/orchestrator.py | 52 ++++++++++++++++--------- pmlogsynth/fleet/warnings.py | 66 +++----------------------------- 2 files changed, 40 insertions(+), 78 deletions(-) diff --git a/pmlogsynth/fleet/orchestrator.py b/pmlogsynth/fleet/orchestrator.py index fffa9fc..e5d91d9 100644 --- a/pmlogsynth/fleet/orchestrator.py +++ b/pmlogsynth/fleet/orchestrator.py @@ -4,11 +4,38 @@ import sys from datetime import datetime from pathlib import Path -from typing import List, Optional +from typing import Any, Dict, List, Optional + +import yaml from pmlogsynth.fleet.manifest import write_manifest from pmlogsynth.fleet.models import FleetProfile, HostAssignment -from pmlogsynth.fleet.warnings import check_override_warnings + + +def _build_workload_yaml( + fleet: FleetProfile, + assignment: HostAssignment, +) -> str: + """Build a standalone workload profile YAML string from inline data. + + Constructs a complete workload profile dict with fleet-level meta + overrides, then serialises to YAML for WorkloadProfile.from_string(). + """ + inline = fleet.profiles[assignment.workload_rel] + + workload = { + "meta": { + "hostname": assignment.hostname, + "duration": fleet.meta.duration, + "interval": fleet.meta.interval, + }, + "host": { + "profile": fleet.meta.hardware, + }, + "phases": inline.phases, + } # type: Dict[str, Any] + + return yaml.dump(workload, default_flow_style=False, sort_keys=False) def generate_fleet( @@ -30,8 +57,6 @@ def generate_fleet( _writer_mod = importlib.import_module("pmlogsynth.writer") ArchiveWriter = _writer_mod.ArchiveWriter - from dataclasses import replace - from pmlogsynth.jitter import apply_jitter from pmlogsynth.profile import ProfileResolver, WorkloadProfile from pmlogsynth.sampler import ValueSampler @@ -41,23 +66,12 @@ def generate_fleet( resolver = ProfileResolver(config_dir=config_dir) hardware = resolver.resolve(fleet.meta.hardware) - # Check for override warnings (once, before generation loop) - check_override_warnings(fleet) - def _generate_one(assignment: HostAssignment) -> None: """Generate a single host archive.""" - profile_text = assignment.workload_path.read_text(encoding="utf-8") + workload_yaml = _build_workload_yaml(fleet, assignment) profile = WorkloadProfile.from_string( - profile_text, config_dir=config_dir, - ) - - overridden_meta = replace( - profile.meta, - hostname=assignment.hostname, - duration=fleet.meta.duration, - interval=fleet.meta.interval, + workload_yaml, config_dir=config_dir, ) - profile = replace(profile, meta=overridden_meta, hardware=hardware) profile = apply_jitter(profile, assignment.jitter_factor) @@ -81,7 +95,9 @@ def _generate_one(assignment: HostAssignment) -> None: file=sys.stderr, ) - # Generate archives — ThreadPoolExecutor for --jobs>1. + # Generate archives — sequential by default. + # NOTE: PCP's pmiLogImport C library is not thread-safe. + # See https://github.com/tallpsmith/pmlogsynth/issues/16 if jobs <= 1: for assignment in assignments: _generate_one(assignment) diff --git a/pmlogsynth/fleet/warnings.py b/pmlogsynth/fleet/warnings.py index ae3d2b1..e54f515 100644 --- a/pmlogsynth/fleet/warnings.py +++ b/pmlogsynth/fleet/warnings.py @@ -1,66 +1,12 @@ -"""Override warning checks for fleet vs workload profile conflicts.""" +"""Override warning checks — retained as no-op for API compatibility. -import logging - -import yaml +With inline profiles, fleet-level meta is the only source of truth for +duration/interval/hardware. There are no external files to conflict with. 
+""" from pmlogsynth.fleet.models import FleetProfile -from pmlogsynth.profile import parse_duration - -logger = logging.getLogger(__name__) def check_override_warnings(fleet: FleetProfile) -> None: - """Emit warnings for workload profile values that fleet settings override.""" - seen = set() - - all_paths = [fleet.hosts.baseline_path] - all_rels = [fleet.hosts.baseline] - for idx in range(len(fleet.bad_actors.profiles)): - all_paths.append(fleet.bad_actors.profile_paths[idx]) - all_rels.append(fleet.bad_actors.profiles[idx]) - - for wpath, wrel in zip(all_paths, all_rels): - if wpath in seen: - continue - seen.add(wpath) - - try: - raw = yaml.safe_load(wpath.read_text(encoding="utf-8")) - except (OSError, yaml.YAMLError): - continue - - if not isinstance(raw, dict): - continue - - meta = raw.get("meta", {}) - if not isinstance(meta, dict): - continue - - if "duration" in meta: - profile_duration = parse_duration(meta["duration"]) - if profile_duration != fleet.meta.duration: - logger.warning( - "workload profile '%s' defines duration=%s " - "— overridden by fleet setting duration=%s", - wrel, profile_duration, fleet.meta.duration, - ) - - if "interval" in meta: - profile_interval = parse_duration(meta["interval"]) - if profile_interval != fleet.meta.interval: - logger.warning( - "workload profile '%s' defines interval=%s " - "— overridden by fleet setting interval=%s", - wrel, profile_interval, fleet.meta.interval, - ) - - host = raw.get("host", {}) - if isinstance(host, dict) and "profile" in host: - profile_hw = str(host["profile"]) - if profile_hw != fleet.meta.hardware: - logger.warning( - "workload profile '%s' defines hardware=%s " - "— overridden by fleet setting hardware=%s", - wrel, profile_hw, fleet.meta.hardware, - ) + """No-op — inline profiles cannot conflict with fleet meta.""" + pass From 5998810f580197254865c0b8958ef5dc0b7e9574 Mon Sep 17 00:00:00 2001 From: Paul Smith Date: Sat, 21 Mar 2026 00:36:18 +0000 Subject: [PATCH 19/23] Export InlineProfile, update warning tests for no-op Add InlineProfile to fleet package exports. Replace override warning tests with single no-op verification since inline profiles can't conflict. 
--- pmlogsynth/fleet/__init__.py | 2 ++ tests/unit/test_fleet.py | 21 +++++---------------- 2 files changed, 7 insertions(+), 16 deletions(-) diff --git a/pmlogsynth/fleet/__init__.py b/pmlogsynth/fleet/__init__.py index c5c6511..d5d989a 100644 --- a/pmlogsynth/fleet/__init__.py +++ b/pmlogsynth/fleet/__init__.py @@ -17,6 +17,7 @@ FleetProfile, HostAssignment, HostsConfig, + InlineProfile, ) from pmlogsynth.fleet.orchestrator import generate_fleet from pmlogsynth.fleet.warnings import check_override_warnings @@ -30,6 +31,7 @@ "generate_fleet", "HostAssignment", "HostsConfig", + "InlineProfile", "load_fleet_profile", "print_dry_run", "write_manifest", diff --git a/tests/unit/test_fleet.py b/tests/unit/test_fleet.py index 0fc40f3..a391d4c 100644 --- a/tests/unit/test_fleet.py +++ b/tests/unit/test_fleet.py @@ -356,22 +356,11 @@ def test_manifest_records_none_seed(self, tmp_path: Path) -> None: class TestOverrideWarnings: - """Tests for warnings when fleet settings override workload profile values.""" + """Override warnings are no longer applicable with inline profiles.""" - def test_warns_on_duration_conflict(self, caplog: pytest.LogCaptureFixture) -> None: - import logging - - from pmlogsynth.fleet import check_override_warnings, load_fleet_profile - - fleet = load_fleet_profile(FLEET_FIXTURES / "test-fleet.yaml") - from dataclasses import replace - - fleet_different = replace(fleet, meta=replace(fleet.meta, duration=3600)) - with caplog.at_level(logging.WARNING): - check_override_warnings(fleet_different) - assert any("duration" in r.message for r in caplog.records) - - def test_no_warning_when_values_match(self, caplog: pytest.LogCaptureFixture) -> None: + def test_check_override_warnings_is_noop( + self, caplog: pytest.LogCaptureFixture, + ) -> None: import logging from pmlogsynth.fleet import check_override_warnings, load_fleet_profile @@ -379,7 +368,7 @@ def test_no_warning_when_values_match(self, caplog: pytest.LogCaptureFixture) -> fleet = load_fleet_profile(FLEET_FIXTURES / "test-fleet.yaml") with caplog.at_level(logging.WARNING): check_override_warnings(fleet) - assert not any("duration" in r.message for r in caplog.records) + assert len(caplog.records) == 0 class TestDryRun: From a4c6f674007393e818ee97884bf48a7e6dfdff7d Mon Sep 17 00:00:00 2001 From: Paul Smith Date: Sat, 21 Mar 2026 00:38:02 +0000 Subject: [PATCH 20/23] Fix lint errors and jitter test fixture after baseline.yaml removal Remove unused imports from orchestrator. Create standalone jitter test fixture since the old fleet/baseline.yaml was deleted. 
--- pmlogsynth/fleet/orchestrator.py | 4 ++-- tests/fixtures/jitter-baseline.yaml | 26 ++++++++++++++++++++++++++ tests/unit/test_jitter.py | 2 +- 3 files changed, 29 insertions(+), 3 deletions(-) create mode 100644 tests/fixtures/jitter-baseline.yaml diff --git a/pmlogsynth/fleet/orchestrator.py b/pmlogsynth/fleet/orchestrator.py index e5d91d9..0729397 100644 --- a/pmlogsynth/fleet/orchestrator.py +++ b/pmlogsynth/fleet/orchestrator.py @@ -4,7 +4,7 @@ import sys from datetime import datetime from pathlib import Path -from typing import Any, Dict, List, Optional +from typing import List, Optional import yaml @@ -33,7 +33,7 @@ def _build_workload_yaml( "profile": fleet.meta.hardware, }, "phases": inline.phases, - } # type: Dict[str, Any] + } return yaml.dump(workload, default_flow_style=False, sort_keys=False) diff --git a/tests/fixtures/jitter-baseline.yaml b/tests/fixtures/jitter-baseline.yaml new file mode 100644 index 0000000..9ce459e --- /dev/null +++ b/tests/fixtures/jitter-baseline.yaml @@ -0,0 +1,26 @@ +meta: + hostname: baseline-host + duration: 600 + interval: 60 + +host: + profile: generic-small + +phases: + - name: steady + duration: 600 + cpu: + utilization: 0.50 + user_ratio: 0.70 + sys_ratio: 0.20 + iowait_ratio: 0.10 + memory: + used_ratio: 0.40 + cache_ratio: 0.20 + disk: + read_mbps: 10.0 + write_mbps: 5.0 + network: + rx_mbps: 100.0 + tx_mbps: 50.0 + error_rate: 0.001 diff --git a/tests/unit/test_jitter.py b/tests/unit/test_jitter.py index 513ca82..5afdcbf 100644 --- a/tests/unit/test_jitter.py +++ b/tests/unit/test_jitter.py @@ -10,7 +10,7 @@ def baseline_profile() -> WorkloadProfile: """Load the fleet baseline workload profile.""" from pathlib import Path - fixture = Path(__file__).parent.parent / "fixtures" / "fleet" / "baseline.yaml" + fixture = Path(__file__).parent.parent / "fixtures" / "jitter-baseline.yaml" return WorkloadProfile.from_file(fixture) From 8f31ae560750d560e214f3304a96b90511b0322a Mon Sep 17 00:00:00 2001 From: Paul Smith Date: Sat, 21 Mar 2026 01:03:52 +0000 Subject: [PATCH 21/23] Update fleet skill for single-file profile format Skill now generates one self-contained YAML file with inline workload profiles instead of coordinating multiple files. --- .../skills/generate-fleet-profile/SKILL.md | 79 ++++------ .../references/fleet-schema.md | 135 +++++++++++++----- 2 files changed, 127 insertions(+), 87 deletions(-) diff --git a/.claude/skills/generate-fleet-profile/SKILL.md b/.claude/skills/generate-fleet-profile/SKILL.md index 6012999..7cbfb02 100644 --- a/.claude/skills/generate-fleet-profile/SKILL.md +++ b/.claude/skills/generate-fleet-profile/SKILL.md @@ -15,31 +15,32 @@ description: > # Generate pmlogsynth Fleet Profile -Generate a fleet profile that produces multiple PCP archives — one per simulated host — -from a single YAML file. Fleet profiles describe a pool of hosts sharing common hardware, +Generate a single self-contained fleet profile YAML that produces multiple PCP archives — +one per simulated host. Fleet profiles describe a pool of hosts sharing common hardware, with a majority running a baseline workload and an optional minority running as "bad -actors" with different workload profiles. +actors" with different workload profiles. All workload definitions are inline — no +external files needed. ## Key Concepts A fleet profile is **not** a workload profile. 
It's a higher-level orchestrator: - **All hosts share one hardware profile** (from `meta.hardware`) -- **Baseline hosts** all run the same workload profile, with per-host jitter for variation -- **Bad-actor hosts** are randomly selected from the pool and assigned workload profiles +- **Workload profiles are defined inline** in a `profiles` section, referenced by name +- **Baseline hosts** all run the same named profile, with per-host jitter for variation +- **Bad-actor hosts** are randomly selected from the pool and assigned profiles from a separate list (e.g. CPU-saturated, memory-exhausted scenarios) - **Jitter** adds Gaussian noise (±N%) to all stressor values per host, so no two hosts are identical even if they share the same workload -- **Fleet-level `duration` and `interval`** override whatever the individual workload - profiles specify +- **Fleet-level `duration`, `interval`, and `hardware`** apply to all hosts ## Step 1 — Read the Schema References and Bootstrap Read these reference files (relative to this skill's directory): 1. `references/fleet-schema.md` — the fleet profile format, fields, and validation rules -2. `references/workload-profile-schema.md` — the workload profile format (for generating - the baseline and bad-actor workload profiles the fleet references) +2. `references/workload-profile-schema.md` — the workload profile stressor format (for + generating the phase definitions inside named profiles) 3. `references/running-pmlogsynth.md` — how to bootstrap and run pmlogsynth Before any validation or generation, run `uv sync` to ensure the environment is ready. @@ -59,43 +60,23 @@ Key details to extract: - **Duration** — how long the simulation runs - **Hardware class** — what size machines (`generic-small` through `storage-optimized`) -## Step 3 — Generate the Files +## Step 3 — Generate the Fleet Profile YAML -A fleet profile references external workload profile files. You need to generate **all** -of these files: - -### 3a. Generate Workload Profiles - -Create the baseline and bad-actor workload profile YAML files. These are standard -pmlogsynth workload profiles (same format as the generate-profile skill produces). - -Save them to `generated-archives/` alongside the fleet profile. For example: -- `generated-archives/fleet-baseline.yaml` — the healthy workload -- `generated-archives/fleet-bad-cpu.yaml` — a CPU-saturated workload -- `generated-archives/fleet-bad-memory.yaml` — a memory-exhausted workload - -The workload profiles should be complete and valid on their own. The fleet will override -their `duration`, `interval`, `hostname`, and `hardware` settings — but the profiles -still need valid values for standalone validation. - -**Important:** Workload profile paths in the fleet YAML are resolved relative to the -fleet profile file's directory. If the fleet profile and workload profiles are in the -same directory, use just the filename (e.g. `baseline: fleet-baseline.yaml`). - -### 3b. Generate the Fleet Profile - -Produce the fleet profile YAML. Follow the format in `references/fleet-schema.md` exactly. +Produce a single self-contained fleet profile YAML with all workload definitions inline. +Follow the format in `references/fleet-schema.md` exactly. Rules: 1. **Output raw YAML only** — no markdown fences, no prose 2. **`meta` is required** — must include `name`, `duration`, `interval`, `hostname_prefix`, and `hardware` -3. **`hosts` is required** — must include `count` and `baseline` (path to workload file) -4. 
**`bad_actors` is optional** — include `count`, `profiles` list, and optionally `jitter` -5. **`bad_actors.count` must not exceed `hosts.count`** -6. **Use readable duration strings** (`10m`, `1h`, `24h`) for `duration` and `interval` -7. **Add jitter** (typically 0.03–0.10) for realistic per-host variation -8. **Include comments** explaining the fleet scenario +3. **`profiles` is required** — define named workload profiles with `phases` lists +4. **`hosts` is required** — must include `count` and `baseline` (name from `profiles`) +5. **`bad_actors` is optional** — include `count`, `profiles` list (names from `profiles`), + and optionally `jitter` +6. **`bad_actors.count` must not exceed `hosts.count`** +7. **Use readable duration strings** (`10m`, `1h`, `24h`) for `duration` and `interval` +8. **Add jitter** (typically 0.03–0.10) for realistic per-host variation +9. **Include comments** explaining the fleet scenario ### Realistic Fleet Patterns @@ -108,7 +89,7 @@ Rules: ### Named Fault Patterns for Bad Actors -When the user describes problems, translate them into workload profiles: +When the user describes problems, translate them into named profile definitions: | Fault | Key Characteristics | |-------|---------------------| @@ -119,25 +100,19 @@ When the user describes problems, translate them into workload profiles: | Noisy neighbour | High CPU with elevated `sys_ratio` (virtualisation overhead) | | Slow drain | Gradual increase across all metrics over multiple phases | -## Step 4 — Save All Files +## Step 4 — Save the Fleet Profile 1. Ensure `generated-archives/` exists -2. Save workload profile(s) first -3. Save the fleet profile with a descriptive slugified name: +2. Save the fleet profile with a descriptive slugified name: - Example: "20-host web cluster with CPU problems" → `generated-archives/20-host-web-cluster-fleet.yaml` ## Step 5 — Validate -Validate the workload profiles first (they must parse independently), then the fleet. -If `uv run` fails because dependencies aren't synced, run `uv sync` first. +Validate the fleet profile. If `uv run` fails because dependencies aren't synced, +run `uv sync` first. ```bash -# Validate individual workload profiles -uv run pmlogsynth --validate generated-archives/fleet-baseline.yaml -uv run pmlogsynth --validate generated-archives/fleet-bad-cpu.yaml - -# Validate the fleet profile uv run pmlogsynth fleet --validate generated-archives/.yaml ``` @@ -160,7 +135,7 @@ the error to the user. ## Step 7 — Report Tell the user: -- All profile YAML files saved and their paths +- Fleet profile YAML saved and its path - Where the archives were generated - How to preview the host assignment: ```bash diff --git a/.claude/skills/generate-fleet-profile/references/fleet-schema.md b/.claude/skills/generate-fleet-profile/references/fleet-schema.md index 7a54ce4..4467f84 100644 --- a/.claude/skills/generate-fleet-profile/references/fleet-schema.md +++ b/.claude/skills/generate-fleet-profile/references/fleet-schema.md @@ -1,16 +1,18 @@ # pmlogsynth Fleet Profile Schema A fleet profile generates multiple PCP archives — one per simulated host — from a single -YAML file. It is a different document type from a workload profile. +self-contained YAML file. All workload profiles are defined inline using named definitions +in the `profiles` section. 
--- ## Top-Level Structure -A fleet profile has three top-level sections: +A fleet profile has four top-level sections: ```yaml meta: # Fleet-wide settings (required) +profiles: # Named workload profile definitions (required) hosts: # Baseline host pool (required) bad_actors: # Anomalous host pool (optional) ``` @@ -24,10 +26,10 @@ Fleet-wide metadata. All fields are required. | Field | Type | Description | |-------|------|-------------| | `name` | string | Fleet identifier. Used in output directory naming and the manifest. | -| `duration` | int or string | Archive duration for ALL hosts. Overrides workload profile durations. Integer = seconds, or strings: `'10m'`, `'1h'`, `'24h'`, `'7d'`. | -| `interval` | int or string | Sampling interval for ALL hosts. Overrides workload profile intervals. | +| `duration` | int or string | Archive duration for ALL hosts. Integer = seconds, or strings: `'10m'`, `'1h'`, `'24h'`, `'7d'`. | +| `interval` | int or string | Sampling interval for ALL hosts. | | `hostname_prefix` | string | Prefix for generated hostnames. Hosts are named `-01`, `-02`, etc. | -| `hardware` | string | Hardware profile name (e.g. `generic-large`). Applied to ALL hosts, overriding workload profile hardware. | +| `hardware` | string | Hardware profile name (e.g. `generic-large`). Applied to ALL hosts. | ### Example @@ -44,6 +46,54 @@ This produces hostnames `web-01`, `web-02`, ..., `web-NN`. --- +## profiles + +Named workload profile definitions. Required. Each entry defines a workload that can be +referenced by name from `hosts.baseline` or `bad_actors.profiles`. + +Each profile contains a `phases` list — the same structure as the `phases` section of a +standalone workload profile. Profiles do **not** contain `meta`, `host`, or `hardware` +sections — those are all controlled at the fleet level. + +### Example + +```yaml +profiles: + steady-baseline: + phases: + - name: normal-operations + duration: 24h + cpu: + utilization: 0.35 + user_ratio: 0.65 + sys_ratio: 0.20 + iowait_ratio: 0.05 + memory: + used_ratio: 0.55 + cache_ratio: 0.25 + disk: + read_mbps: 20.0 + write_mbps: 10.0 + network: + rx_mbps: 200.0 + tx_mbps: 100.0 + + cpu-saturated: + phases: + - name: overloaded + duration: 24h + cpu: + utilization: 0.96 + user_ratio: 0.85 + sys_ratio: 0.10 + iowait_ratio: 0.05 + memory: + used_ratio: 0.70 + cache_ratio: 0.10 +``` + +--- + ## hosts Baseline host pool configuration. Required. @@ -51,7 +101,7 @@ Baseline host pool configuration. Required. | Field | Type | Default | Description | |-------|------|---------|-------------| | `count` | int | — | **Required.** Total number of hosts in the fleet (includes bad actors). Must be ≥ 1. | -| `baseline` | string | — | **Required.** Path to the baseline workload profile YAML file. Resolved relative to the fleet profile file's directory. | +| `baseline` | string | — | **Required.** Name of a profile defined in the `profiles` section. | | `jitter` | float | `0.0` | Standard deviation for per-host Gaussian jitter. Applied multiplicatively to all stressor values. Typical range: 0.02–0.10. | ### Example @@ -59,7 +109,7 @@ Baseline host pool configuration. Required. ```yaml hosts: count: 20 - baseline: fleet-baseline.yaml + baseline: steady-baseline jitter: 0.05 ``` @@ -73,7 +123,7 @@ Optional section defining hosts that deviate from the baseline. |-------|------|---------|-------------| | `count` | int | `0` | Number of bad-actor hosts. Must not exceed `hosts.count`. | | `jitter` | float | inherits `hosts.jitter` | Jitter for bad-actor hosts. 
| -| `profiles` | list of strings | `[]` | Paths to workload profile YAML files for bad actors. Resolved relative to the fleet profile file. Bad actors are randomly assigned a profile from this list. | +| `profiles` | list of strings | `[]` | Names of profiles defined in the `profiles` section. Bad actors are randomly assigned a profile from this list. | ### Example @@ -82,8 +132,8 @@ bad_actors: count: 2 jitter: 0.03 profiles: - - fleet-bad-cpu.yaml - - fleet-bad-memory.yaml + - cpu-saturated + - memory-exhausted ``` --- @@ -99,17 +149,49 @@ meta: hostname_prefix: web hardware: generic-large +profiles: + steady-baseline: + phases: + - name: normal-operations + duration: 24h + cpu: + utilization: 0.35 + user_ratio: 0.65 + sys_ratio: 0.20 + iowait_ratio: 0.05 + memory: + used_ratio: 0.55 + cache_ratio: 0.25 + disk: + read_mbps: 20.0 + write_mbps: 10.0 + network: + rx_mbps: 200.0 + tx_mbps: 100.0 + + cpu-saturated: + phases: + - name: overloaded + duration: 24h + cpu: + utilization: 0.96 + user_ratio: 0.85 + sys_ratio: 0.10 + iowait_ratio: 0.05 + memory: + used_ratio: 0.70 + cache_ratio: 0.10 + hosts: count: 20 - baseline: fleet-baseline.yaml + baseline: steady-baseline jitter: 0.05 bad_actors: count: 2 jitter: 0.03 profiles: - - fleet-bad-cpu.yaml - - fleet-bad-memory.yaml + - cpu-saturated ``` This generates 20 archives: @@ -133,29 +215,15 @@ All hosts share the `generic-large` hardware profile, a 24h duration, and 60s in --- -## How Fleet Overrides Work - -The fleet profile overrides several settings from individual workload profiles: - -| Fleet setting | Overrides workload field | Notes | -|---------------|--------------------------|-------| -| `meta.duration` | `meta.duration` in workload | All hosts get the same duration | -| `meta.interval` | `meta.interval` in workload | All hosts get the same interval | -| `meta.hardware` | `host.profile` in workload | All hosts get the same hardware | -| `meta.hostname_prefix` + index | `meta.hostname` in workload | Each host gets a unique name | - -Warnings are emitted when the fleet overrides differ from the workload profile values. 
- ---- - ## Validation Rules - `meta` must include all five fields: `name`, `duration`, `interval`, `hostname_prefix`, `hardware` +- `profiles` must be a non-empty mapping of named profile definitions +- Each profile must contain a non-empty `phases` list - `hosts.count` must be a positive integer -- `hosts.baseline` must be a valid path to a workload profile +- `hosts.baseline` must be a name defined in the `profiles` section - `bad_actors.count` must not exceed `hosts.count` -- `bad_actors.profiles` must be a non-empty list when `bad_actors.count > 0` -- All referenced workload profiles must be valid pmlogsynth workload profiles +- `bad_actors.profiles` entries must be names defined in the `profiles` section - `hardware` must be a valid hardware profile name --- @@ -177,7 +245,7 @@ Warnings are emitted when the fleet overrides differ from the workload profile v ## CLI Commands ```bash -# Validate the fleet profile (and referenced workload profiles) +# Validate the fleet profile pmlogsynth fleet --validate fleet-profile.yaml # Preview host assignments without generating archives @@ -188,7 +256,4 @@ pmlogsynth fleet -o ./generated-archives/my-fleet fleet-profile.yaml # Reproducible generation (same seed = same host assignment + jitter) pmlogsynth fleet --seed 42 -o ./generated-archives/my-fleet fleet-profile.yaml - -# Parallel generation (use multiple workers) -pmlogsynth fleet --jobs 4 -o ./generated-archives/my-fleet fleet-profile.yaml ``` From a8431191c25e1bfcc9ec905676f6806e7ee4cf12 Mon Sep 17 00:00:00 2001 From: Paul Smith Date: Sat, 21 Mar 2026 01:05:38 +0000 Subject: [PATCH 22/23] Update docs for single-file fleet profiles README, man page, and profile-format.md now describe the self-contained fleet YAML format with inline named workload definitions. --- README.md | 3 +- docs/profile-format.md | 71 +++++++++++++++++++++++++++--------------- man/pmlogsynth.1 | 25 ++++++++++----- 3 files changed, 65 insertions(+), 34 deletions(-) diff --git a/README.md b/README.md index 81a2e95..cb0682b 100644 --- a/README.md +++ b/README.md @@ -191,7 +191,8 @@ for example, a simulated spike that started an hour ago. Positive offsets ### Fleet Mode -Generate a fleet of PCP archives — one per host — from a single fleet profile. +Generate a fleet of PCP archives — one per host — from a single self-contained +fleet profile. All workload definitions are inline — no external files needed. Each host gets per-host stressor jitter for realistic variation across the fleet. ```bash diff --git a/docs/profile-format.md b/docs/profile-format.md index c751cd0..f961d41 100644 --- a/docs/profile-format.md +++ b/docs/profile-format.md @@ -283,9 +283,11 @@ pmlogsynth -o ./generated-archives/complete-example docs/complete-example.yml ## Fleet Profile Format -A fleet profile is a separate YAML document used with the `pmlogsynth fleet` -subcommand. It describes a fleet of hosts that share a common hardware profile, -each generating its own PCP archive with per-host stressor variation (jitter). +A fleet profile is a single self-contained YAML document used with the +`pmlogsynth fleet` subcommand. It describes a fleet of hosts that share a +common hardware profile, each generating its own PCP archive with per-host +stressor variation (jitter). All workload definitions are inline — no external +files are needed. Fleet profiles are **not** interchangeable with workload profiles — they have a different schema and are passed to the `fleet` subcommand, not the default @@ -298,10 +300,19 @@ Fleet-wide settings. 
All fields are required. | Field | Type | Constraints | |-------|------|-------------| | `name` | string | Fleet identifier; used in manifest and default output directory | -| `duration` | int or string | Positive; overrides duration in all workload profiles. Accepts `30s`, `10m`, `24h`, `1d`, `1h30m`. | -| `interval` | int or string | Positive; overrides interval in all workload profiles. Same format as duration. | +| `duration` | int or string | Positive; applied to all hosts. Accepts `30s`, `10m`, `24h`, `1d`, `1h30m`. | +| `interval` | int or string | Positive; applied to all hosts. Same format as duration. | | `hostname_prefix` | string | Prefix for generated hostnames (e.g. `prod-web` → `prod-web-01`, `prod-web-02`, ...) | -| `hardware` | string | Named hardware profile; overrides `host.profile` in all workload profiles | +| `hardware` | string | Named hardware profile; applied to all hosts | + +### `profiles` + +Named workload profile definitions. Required. Each entry defines a workload +that can be referenced by name from `hosts.baseline` or `bad_actors.profiles`. + +Each profile contains a `phases` list — the same structure as the `phases` +section of a standalone workload profile. Profiles do **not** contain `meta`, +`host`, or `hardware` sections — those are all controlled at the fleet level. ### `hosts` @@ -310,7 +321,7 @@ Baseline host pool configuration. | Field | Type | Default | Constraints | |-------|------|---------|-------------| | `count` | integer | required | Positive; total number of hosts in the fleet | -| `baseline` | string (path) | required | Path to the baseline workload profile YAML, resolved relative to the fleet profile file | +| `baseline` | string | required | Name of a profile defined in the `profiles` section | | `jitter` | float | `0.0` | Standard deviation of the Gaussian jitter factor (mean 1.0). Higher values produce more variation between hosts. | ### `bad_actors` @@ -322,21 +333,7 @@ workload profiles (e.g. high-CPU or memory-pressure scenarios). |-------|------|---------|-------------| | `count` | integer | `0` | Must not exceed `hosts.count` | | `jitter` | float | inherits `hosts.jitter` | Per-bad-actor jitter standard deviation | -| `profiles` | list of strings (paths) | `[]` | Paths to bad-actor workload profiles, resolved relative to the fleet profile file. One is chosen at random per bad-actor host. | - -### Path resolution - -All workload profile paths (`hosts.baseline` and `bad_actors.profiles` entries) -are resolved **relative to the directory containing the fleet profile file**. -This allows fleet profiles and their workload profiles to live together in a -self-contained directory. - -### Fleet-level overrides - -The fleet `meta.duration`, `meta.interval`, and `meta.hardware` values -**override** the corresponding values in each workload profile. If the workload -profile specifies different values, a warning is emitted to stderr but -generation proceeds with the fleet-level settings. +| `profiles` | list of strings | `[]` | Names of profiles defined in the `profiles` section. One is chosen at random per bad-actor host. 
| ### Jitter semantics @@ -362,17 +359,41 @@ meta: hostname_prefix: prod-web hardware: generic-large +profiles: + steady-baseline: + phases: + - name: normal + duration: 1h + cpu: + utilization: 0.35 + user_ratio: 0.65 + sys_ratio: 0.20 + memory: + used_ratio: 0.55 + cache_ratio: 0.25 + + cpu-spike: + phases: + - name: overloaded + duration: 1h + cpu: + utilization: 0.96 + user_ratio: 0.85 + sys_ratio: 0.10 + memory: + used_ratio: 0.70 + cache_ratio: 0.10 + hosts: count: 20 - baseline: workloads/baseline.yml + baseline: steady-baseline jitter: 0.10 bad_actors: count: 3 jitter: 0.15 profiles: - - workloads/cpu-spike.yml - - workloads/memory-pressure.yml + - cpu-spike ``` ### Output diff --git a/man/pmlogsynth.1 b/man/pmlogsynth.1 index 3dea6d0..7944aa2 100644 --- a/man/pmlogsynth.1 +++ b/man/pmlogsynth.1 @@ -470,17 +470,19 @@ A YAML file is written alongside the archives recording all host assignments. .PP Fleet profiles are distinct from workload profiles. -A fleet profile specifies a set of hosts sharing a common hardware profile, -a baseline workload, and optional bad-actor hosts with different workload -profiles. -Workload profile paths are resolved relative to the fleet profile file. +A fleet profile is a single self-contained YAML file that defines a set of +hosts sharing a common hardware profile, with named workload profiles defined +inline in a +.B profiles +section. +Baseline and bad-actor hosts reference these named profiles by name. .PP Fleet-level .BR meta.duration , .BR meta.interval , and .B meta.hardware -override the corresponding values in individual workload profiles. +are applied to all hosts. .SS Fleet Options .TP \fIFLEET_PROFILE\fR @@ -519,7 +521,7 @@ Print per-host generation progress to stderr. .BI \-C " DIR" ", \-\-config\-dir " DIR Additional hardware profile directory (highest precedence). .SS Fleet Profile Format -A fleet profile YAML file has three top-level sections: +A fleet profile YAML file has four top-level sections: .TP .B meta Fleet-wide settings: @@ -529,17 +531,24 @@ Fleet-wide settings: .BR hostname_prefix " (string, required)," .BR hardware " (named hardware profile, required)." .TP +.B profiles +Named workload profile definitions. Each entry contains a +.B phases +list (same structure as standalone workload profiles). Profiles do not contain +.BR meta ", " host ", or " hardware +sections \(em those are controlled at the fleet level. +.TP .B hosts Baseline host pool: .BR count " (integer, required)," -.BR baseline " (path to workload profile, required)," +.BR baseline " (name of a profile in the profiles section, required)," .BR jitter " (float, default 0.0 \(em Gaussian noise standard deviation)." .TP .B bad_actors Optional bad-actor host pool: .BR count " (integer, default 0 \(em must not exceed hosts.count)," .BR jitter " (float, inherits hosts.jitter if absent)," -.BR profiles " (list of paths to workload profiles)." +.BR profiles " (list of names from the profiles section)." 
.SH EXAMPLES .SS Generate a 10-minute archive with a CPU spike .nf From 8030d49ec3d07f207624605883745898f0cb60d2 Mon Sep 17 00:00:00 2001 From: Paul Smith Date: Sat, 21 Mar 2026 01:06:04 +0000 Subject: [PATCH 23/23] Mark fleet mode spec as superseded by single-file design --- docs/superpowers/specs/2026-03-20-fleet-mode-design.md | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/docs/superpowers/specs/2026-03-20-fleet-mode-design.md b/docs/superpowers/specs/2026-03-20-fleet-mode-design.md index aee1196..d6b2692 100644 --- a/docs/superpowers/specs/2026-03-20-fleet-mode-design.md +++ b/docs/superpowers/specs/2026-03-20-fleet-mode-design.md @@ -1,7 +1,11 @@ # Fleet Mode — Design Specification **Date:** 2026-03-20 -**Status:** Approved +**Status:** Superseded by `2026-03-21-single-file-fleet-profiles-design.md` + +> **Note:** The multi-file fleet format described here has been replaced by a +> single self-contained YAML format with inline named workload profiles. +> See `2026-03-21-single-file-fleet-profiles-design.md` for the current design. ---