56 commits
deeae5d
docs: record observability baseline
bma-d May 21, 2026
c84a577
feat: add diagnostics contract
bma-d May 21, 2026
a5197d9
feat: add state validation diagnostics
bma-d May 21, 2026
65ffc9e
feat: add parser contract diagnostics
bma-d May 21, 2026
be91235
feat: validate agent plan payloads
bma-d May 21, 2026
b787fd9
feat: add session state diagnostics
bma-d May 21, 2026
56a2aff
test: add observability diagnostics e2e coverage
bma-d May 21, 2026
563868f
fix: resolve observability review findings
bma-d May 22, 2026
10826ee
fix: address observability review findings
bma-d May 22, 2026
8fb0235
fix: normalize agent config rendering
bma-d May 22, 2026
8cdecc7
fix: validate complexity override config
bma-d May 22, 2026
98e3633
fix: validate nested complexity overrides
bma-d May 22, 2026
40927f1
fix: validate frontmatter complexity overrides
bma-d May 22, 2026
a8f7761
fix: handle complexity frontmatter edge cases
bma-d May 22, 2026
87948b3
fix: reject empty complexity override fields
bma-d May 22, 2026
17d5c1d
fix: reject list complexity overrides
bma-d May 22, 2026
0d179cc
fix: reject malformed complexity indentation
bma-d May 22, 2026
0304fd1
fix: harden state agent config parsing
bma-d May 22, 2026
01e350a
fix: reject misparsed agent config sections
bma-d May 22, 2026
ac305b6
fix: reject scalar agent config headers
bma-d May 22, 2026
bfce687
fix: reject tabbed agent config frontmatter
bma-d May 22, 2026
0552274
fix: accept inline empty agent config maps
bma-d May 22, 2026
5a771f9
fix: validate complexity override value types
bma-d May 22, 2026
cfe403f
fix: reject unindented agent config sections
bma-d May 22, 2026
3ce0b0f
fix: validate complexity override keys
bma-d May 22, 2026
56b120d
fix: address coderabbit diagnostics
bma-d May 22, 2026
7a10d7a
refactor: simplify agent config boundaries
bma-d May 22, 2026
181b42b
fix: preserve agent plan compatibility
bma-d May 22, 2026
c605c44
fix: complete observability validation remediation
bma-d May 22, 2026
56c2ab0
docs: add observability phase 08 follow-up plan
bma-d May 22, 2026
bccbeb5
fix: address bot diagnostics review items
bma-d May 22, 2026
1c43cdf
fix: complete diagnostics redaction follow-ups
bma-d May 23, 2026
0cc71ca
fix: close observability review gaps
bma-d May 23, 2026
94d74cc
fix: address review validation gaps
bma-d May 23, 2026
5d3bd6a
fix: harden diagnostic command error paths
bma-d May 25, 2026
f4792a9
fix: address coderabbit diagnostics findings
bma-d May 25, 2026
86c2dd8
refactor: split orchestrator state update
bma-d May 25, 2026
1b4998d
fix: harden agent complexity build path
bma-d May 25, 2026
29c07fd
fix: address augment validation findings
bma-d May 25, 2026
f71139f
fix: preserve tmux session compatibility
bma-d May 25, 2026
5e591eb
fix: preserve legacy project session listing
bma-d May 25, 2026
7d09ad4
fix: address pr review redaction and tmux filtering
bma-d May 26, 2026
7bcbff7
fix: address augment diagnostics findings
bma-d May 26, 2026
3418d8a
fix: address review loop edge cases
bma-d May 26, 2026
bfe4401
test: align epic completion with non-numeric epics
bma-d Jun 2, 2026
5523e9a
fix: address augment review findings
bma-d Jun 2, 2026
bd14253
fix: defer monitor session diagnostics
bma-d Jun 3, 2026
9b5ab03
fix: address PR review feedback
bma-d Jun 4, 2026
79838ba
fix: address PR review feedback
bma-d Jun 4, 2026
ddf8ad3
fix: address bot review feedback
bma-d Jun 4, 2026
7615bd5
fix: address Augment review feedback
bma-d Jun 5, 2026
cd5c9e0
fix: address PR review feedback
bma-d Jun 12, 2026
fb4154d
fix: address diagnostics review comments
bma-d Jun 12, 2026
e07591f
docs: include all structured validation issues
bma-d Jun 12, 2026
b634aea
fix: address PR review feedback
bma-d Jun 17, 2026
087e281
fix: address PR comment edge cases
bma-d Jun 18, 2026
8 changes: 8 additions & 0 deletions docs/agents-and-monitoring.md
@@ -30,6 +30,10 @@ flowchart TD

The generated agents file is a runtime artifact, not just display text.

Agent-plan boundaries validate generated JSON before use. Malformed complexity
or agents-plan payloads return `structuredIssues` with field paths such as
`stories[0].complexity.level` or `stories[0].tasks.dev`.
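
As a rough illustration of how such field paths could be produced, the sketch below checks only the complexity level of each story. The function name, the issue dictionary keys, and the case-insensitive normalization are assumptions for illustration; they are not taken from the helper's actual implementation.

```python
# Hypothetical sketch: report complexity levels that do not normalize to an allowed value.
LEVELS = ("low", "medium", "high")


def complexity_level_issues(payload: dict) -> list[dict]:
    """Return one issue per story whose complexity.level is missing or invalid."""
    stories = payload.get("stories")
    if not isinstance(stories, list):
        return [{"field": "stories", "expected": "array", "actual": type(stories).__name__}]

    issues = []
    for index, story in enumerate(stories):
        complexity = story.get("complexity") if isinstance(story, dict) else None
        level = complexity.get("level") if isinstance(complexity, dict) else None
        # Assumed normalization: trim whitespace and lowercase string values.
        normalized = level.strip().lower() if isinstance(level, str) else None
        if normalized not in LEVELS:
            issues.append(
                {
                    "field": f"stories[{index}].complexity.level",
                    "expected": "low, medium, or high",
                    "actual": level,
                }
            )
    return issues
```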

## Child-Session Command Build

The helper CLI generates step-specific commands with `tmux-wrapper build-cmd`.
@@ -116,6 +120,10 @@ Important distinctions:
- `stuck` means no valid progress signal within the allowed window
- `incomplete` is a review-specific result, not a generic session state

`monitor-session --json` may include `structuredIssues` when malformed persisted
runner state affects the result. CSV status helpers keep the documented columns
unchanged.

## Review Verification

Review sessions add extra verification:
14 changes: 14 additions & 0 deletions docs/cli-reference.md
@@ -38,6 +38,18 @@ Use these during preflight to keep story selection and complexity scoring determ

Use these to create, inspect, and validate orchestration state.

`validate-state` preserves the legacy response fields:

- `ok`
- `structure`
- `issues`

It also adds `structuredIssues` and `issueCount` for field-specific diagnostics. Consumers should prefer `structuredIssues` when present and keep `issues` as the legacy fallback.
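
A consumer following that preference might look like the minimal sketch below. The function name is hypothetical, and it assumes each structured issue is an object carrying `field` and `message` keys, as the diagnostics plan in this PR describes.

```python
# Hypothetical consumer: prefer structuredIssues, fall back to legacy issues.
def issue_messages(response: dict) -> list[str]:
    """Return readable messages from a validate-state style response."""
    structured = response.get("structuredIssues")
    if structured:
        messages = []
        for item in structured:
            if not isinstance(item, dict):
                continue
            field = item.get("field", "")
            message = item.get("message", "")
            messages.append(f"{field}: {message}" if field else message)
        return messages
    # Legacy fallback: plain string issues.
    return [str(item) for item in response.get("issues", [])]
```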

## Diagnostic Events

Command stdout stays backward-compatible. Set `STORY_AUTOMATOR_DIAGNOSTICS_FILE=/path/to/events.jsonl` to opt in to structured diagnostic events. The helper appends one redacted JSON object per line for orchestration-stage parse results, state transitions, monitor-session lifecycle results, and policy load failures.
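
Reading the events file back can be sketched as follows. Only the environment variable name and the one-JSON-object-per-line layout come from the text above; the function names are hypothetical and no particular event fields are assumed.

```python
import json
import os
from pathlib import Path


def parse_event_lines(lines) -> list[dict]:
    """Parse JSONL lines, skipping blanks and lines that are not JSON objects."""
    events = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # Tolerate a partially written trailing line.
        if isinstance(event, dict):
            events.append(event)
    return events


def load_diagnostic_events() -> list[dict]:
    """Load events from the file named by STORY_AUTOMATOR_DIAGNOSTICS_FILE, if set."""
    path = os.environ.get("STORY_AUTOMATOR_DIAGNOSTICS_FILE", "")
    if not path or not Path(path).exists():
        return []
    return parse_event_lines(Path(path).read_text(encoding="utf-8").splitlines())
```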

## tmux Commands

- `tmux-wrapper spawn`
@@ -71,6 +83,8 @@ Critical rule:

These commands are the orchestration control plane.

`orchestrator-helper state-update <file> --set status=<value>` validates status transitions before writing. Invalid transitions return `ok:false`, `error:"invalid_status_transition"`, `currentStatus`, `attemptedStatus`, `allowedTransitions`, legacy `issues`, and `structuredIssues`. Non-status updates keep the existing `ok` and `updated` response shape.
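
A caller could branch on that response roughly as sketched here. The function name is hypothetical, and the sketch assumes `allowedTransitions` is a list of status strings.

```python
# Hypothetical caller-side handling of a state-update response.
def describe_state_update(response: dict) -> str:
    """Summarize a state-update response in one line."""
    if response.get("ok"):
        return "updated"
    if response.get("error") == "invalid_status_transition":
        # Assumption: allowedTransitions is a list of status names.
        allowed = ", ".join(response.get("allowedTransitions", []))
        return (
            f"cannot move from {response.get('currentStatus')} "
            f"to {response.get('attemptedStatus')}; allowed: {allowed}"
        )
    return f"failed: {response.get('error', 'unknown')}"
```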

## Agent Config Commands

- `agent-config list`
4 changes: 4 additions & 0 deletions docs/how-it-works.md
@@ -107,6 +107,10 @@ sequenceDiagram

The helper CLI exists so the skill does not need to do everything through raw shell parsing or manual markdown edits.

For observability, helper failures preserve their legacy result fields and add
`structuredIssues` where a field-specific diagnostic is available. Parse failure
payloads keep `status` and `reason`; successful parse payloads stay unchanged.

## Why The State Document Matters

The state document is the control plane for the run.
@@ -0,0 +1,56 @@
# Phase 00 - Baseline And Plan Reconciliation

## Clean Context Start

Before doing this phase, read [README.md](./README.md), [TODO.md](./TODO.md), [implementation-notes.md](./implementation-notes.md), [handoff-log.md](./handoff-log.md), and relevant prior handoff entries. Treat the handoff log as next-agent continuity context. Treat implementation notes as the user-facing record of decisions and tradeoffs.

## Goal

Establish a reproducible baseline and confirm the Oracle feedback has been incorporated. This phase is not a blocking external-review phase; Oracle feedback is already available and applied to this packet.

## Inputs

- GitHub issue `bmad-code-org/bmad-automator#5`
- Current branch `bma-d/e2e-tests`
- Oracle feedback recorded in [implementation-notes.md](./implementation-notes.md)
- Critical source paths listed in [README.md](./README.md)

## Implementation Steps

1. Confirm working tree, branch, and HEAD:
   ```bash
   git status --short --branch
   git rev-parse --short HEAD
   ```
2. Run baseline Python tests:
   ```bash
   PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests
   ```
3. Verify CLI import/help baseline:
   ```bash
   PYTHONPATH=skills/bmad-story-automator/src python3 -m story_automator --help
   ```
4. Optionally run `npm run verify` if baseline time is acceptable. Otherwise defer it to Phase 06.
5. Record baseline results and any blockers in [handoff-log.md](./handoff-log.md).

## Verification

```bash
PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests
PYTHONPATH=skills/bmad-story-automator/src python3 -m story_automator --help
```

## Exit Criteria

- Baseline status is recorded.
- Revised phase order is confirmed.
- Any blocked command has an exact error and next action.
- Phase 01 can start without waiting for Oracle.

## Implementation Notes Requirements

Keep [implementation-notes.md](./implementation-notes.md) current while implementing. Record any baseline surprises, command substitutions, or changes to phase scope.

## Handoff Requirements

Append a Phase 00 entry to [handoff-log.md](./handoff-log.md) with commands run, results, current SHA, blockers, and the next recommended command for Phase 01.
61 changes: 61 additions & 0 deletions docs/plans/observability-validation/01-diagnostics-contract.md
@@ -0,0 +1,61 @@
# Phase 01 - Diagnostics Contract

## Clean Context Start

Before doing this phase, read [README.md](./README.md), [TODO.md](./TODO.md), [implementation-notes.md](./implementation-notes.md), [handoff-log.md](./handoff-log.md), and the Phase 00 handoff. Treat the handoff log as next-agent continuity context. Treat implementation notes as the user-facing record of decisions and tradeoffs.

## Goal

Add reusable diagnostics objects and serialization helpers without changing command behavior.

## Inputs

- `skills/bmad-story-automator/src/story_automator/core/runtime_policy.py`
- `skills/bmad-story-automator/src/story_automator/core/utils.py`
- Existing tests in `tests/`
- Oracle feedback in [implementation-notes.md](./implementation-notes.md)

## Implementation Steps

1. Add `skills/bmad-story-automator/src/story_automator/core/diagnostics.py`.
2. Define `DiagnosticIssue` with first-class fields:
   - `type`
   - `field`
   - `expected`
   - `actual`
   - `message`
   - `recovery`
   - `code`
   - `severity`
   - `source`
3. Define `DiagnosticEvent` for structured observability context, but do not emit standalone event lines to stdout by default.
4. Add serialization helpers:
   - `serialize_issue(issue) -> dict`
   - `serialize_issues(issues) -> list[dict]`
   - `legacy_issue_message(issue) -> str`
   - `issues_from_exception(exc, source, field="")`
5. Add `redact_actual(value)` for long strings, absolute paths, env-like keys, nested dict/list payloads, and other oversized or sensitive values.
6. Add `tests/test_diagnostics.py`.
7. Do not touch command outputs yet.
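
One possible shape for steps 2 and 4 is sketched below. The field names and helper signatures come from the steps above; the field types, default values, the `"exception"` type label, and the legacy message format are assumptions, and `DiagnosticEvent` is left out.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DiagnosticIssue:
    """Field-specific diagnostic. Types and defaults here are assumptions."""

    type: str
    field: str
    expected: str
    actual: object
    message: str
    recovery: str = ""
    code: str = ""
    severity: str = "error"
    source: str = ""


def serialize_issue(issue: DiagnosticIssue) -> dict:
    """Convert one issue to a JSON-compatible dictionary."""
    return asdict(issue)


def serialize_issues(issues) -> list[dict]:
    return [serialize_issue(issue) for issue in issues]


def legacy_issue_message(issue: DiagnosticIssue) -> str:
    """Render the plain string kept in the legacy `issues` list (assumed format)."""
    return f"{issue.field}: {issue.message}" if issue.field else issue.message


def issues_from_exception(exc: Exception, source: str, field: str = "") -> list[DiagnosticIssue]:
    """Wrap an exception as a single issue."""
    return [
        DiagnosticIssue(
            type="exception",
            field=field,
            expected="",
            actual=exc.__class__.__name__,
            message=str(exc),
            source=source,
        )
    ]
```

Keeping `severity` and `source` as ordinary dataclass fields means every serialized issue carries them, which matches the exit criterion that both are present from day one.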

## Verification

```bash
PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_diagnostics
PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests
```

## Exit Criteria

- Diagnostics serialize to compact JSON-compatible dictionaries.
- Redaction behavior is tested.
- No CLI output shape changes.
- `severity` and `source` are present from day one.
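
For the redaction helper in step 5, a minimal sketch could look like the following. The length limit, placeholder strings, sensitive-key pattern, and nesting depth are all assumptions chosen for illustration, not the values used by the implementation.

```python
import re

MAX_LENGTH = 80  # Assumed cutoff for long strings.
MAX_ITEMS = 10  # Assumed cap on list items kept.
MAX_DEPTH = 2  # Assumed cap on nested containers.
SENSITIVE_KEY = re.compile(r"token|secret|password|api[_-]?key", re.IGNORECASE)
ABSOLUTE_PATH = re.compile(r"^(/|[A-Za-z]:[\\/])")


def redact_actual(value, depth: int = 0):
    """Return a copy of value that is safer to log or emit."""
    if isinstance(value, str):
        if ABSOLUTE_PATH.match(value):
            return "<path>"
        if len(value) > MAX_LENGTH:
            return value[:MAX_LENGTH] + "...<truncated>"
        return value
    if isinstance(value, dict):
        if depth >= MAX_DEPTH:
            return "<dict>"
        return {
            str(key): "<redacted>" if SENSITIVE_KEY.search(str(key)) else redact_actual(item, depth + 1)
            for key, item in value.items()
        }
    if isinstance(value, (list, tuple)):
        if depth >= MAX_DEPTH:
            return "<list>"
        return [redact_actual(item, depth + 1) for item in list(value)[:MAX_ITEMS]]
    return value
```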

## Implementation Notes Requirements

Keep [implementation-notes.md](./implementation-notes.md) current while implementing. Record field-name decisions, redaction tradeoffs, event-output decisions, and compatibility constraints.

## Handoff Requirements

Append a Phase 01 entry to [handoff-log.md](./handoff-log.md) with files changed, tests run, exact diagnostics shape, compatibility notes, blockers, and the next recommended command for Phase 02.
@@ -0,0 +1,79 @@
# Phase 02 - State Validation And Transitions

## Clean Context Start

Before doing this phase, read [README.md](./README.md), [TODO.md](./TODO.md), [implementation-notes.md](./implementation-notes.md), [handoff-log.md](./handoff-log.md), and prior phase handoff entries. Treat the handoff log as next-agent continuity context. Treat implementation notes as the user-facing record of decisions and tradeoffs.

## Goal

Fix the most visible docs/runtime mismatch by adding field-specific state diagnostics, and guard orchestration status updates against invalid transitions.

## Inputs

- `skills/bmad-story-automator/src/story_automator/core/diagnostics.py`
- `skills/bmad-story-automator/src/story_automator/commands/state.py`
- `skills/bmad-story-automator/src/story_automator/commands/orchestrator.py`
- `skills/bmad-story-automator/src/story_automator/core/frontmatter.py`
- `skills/bmad-story-automator/templates/state-document.md`
- `skills/bmad-story-automator/steps-v/step-v-01-check.md`
- `docs/state-and-resume.md`
- `docs/cli-reference.md`
- `tests/test_state_policy_metadata.py`
- `tests/test_replacement_unicode.py`

## Implementation Steps

1. Add `skills/bmad-story-automator/src/story_automator/core/state_validation.py`.
2. Validate state frontmatter fields with structured issues:
   - `epic`
   - `epicName`
   - `storyRange`
   - `status`
   - `lastUpdated`
   - runtime command config through `aiCommand` or usable `agentConfig`
   - policy snapshot metadata
3. Preserve `validate-state` compatibility:
   - keep `ok`
   - keep `structure`
   - keep `issues: list[str]`
   - add `structuredIssues: list[object]`
   - add `issueCount`
4. Add `ALLOWED_STATUS_TRANSITIONS`:
   ```python
   ALLOWED_STATUS_TRANSITIONS = {
       "INITIALIZING": {"INITIALIZING", "READY", "ABORTED"},
       "READY": {"READY", "IN_PROGRESS", "PAUSED", "ABORTED"},
       "IN_PROGRESS": {"IN_PROGRESS", "PAUSED", "EXECUTION_COMPLETE", "COMPLETE", "ABORTED"},
       "PAUSED": {"PAUSED", "IN_PROGRESS", "ABORTED"},
       "EXECUTION_COMPLETE": {"EXECUTION_COMPLETE", "COMPLETE", "ABORTED"},
       "COMPLETE": {"COMPLETE"},
       "ABORTED": {"ABORTED"},
   }
   ```
5. Update `orchestrator-helper state-update` so `status=<value>` changes are checked before writing.
6. Invalid transitions must return `ok: false`, `error: "invalid_status_transition"`, `currentStatus`, `attemptedStatus`, `allowedTransitions`, legacy `issues`, and `structuredIssues`.
7. Update `steps-v/step-v-01-check.md` to read `.structuredIssues[]?` first and fall back to legacy `.issues[]?` strings.
8. Update `docs/state-and-resume.md` and `docs/cli-reference.md` for additive diagnostics and transition rules.
9. Add `tests/test_state_validation.py` for focused state validation and transition coverage. Existing state tests may also be extended, but this phase must create the focused module because verification depends on it.
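
The transition guard in steps 5 and 6 can be sketched as below. The table and the rejection keys come from the steps above; the function name, the message wording, the inner shape of each structured issue, and the choice to reject unknown current statuses are assumptions.

```python
ALLOWED_STATUS_TRANSITIONS = {
    "INITIALIZING": {"INITIALIZING", "READY", "ABORTED"},
    "READY": {"READY", "IN_PROGRESS", "PAUSED", "ABORTED"},
    "IN_PROGRESS": {"IN_PROGRESS", "PAUSED", "EXECUTION_COMPLETE", "COMPLETE", "ABORTED"},
    "PAUSED": {"PAUSED", "IN_PROGRESS", "ABORTED"},
    "EXECUTION_COMPLETE": {"EXECUTION_COMPLETE", "COMPLETE", "ABORTED"},
    "COMPLETE": {"COMPLETE"},
    "ABORTED": {"ABORTED"},
}


def check_status_transition(current: str, attempted: str):
    """Return None when the transition is allowed, else a rejection payload."""
    allowed = ALLOWED_STATUS_TRANSITIONS.get(current, set())
    if attempted in allowed:
        return None
    message = f"invalid status transition: {current} -> {attempted}"
    return {
        "ok": False,
        "error": "invalid_status_transition",
        "currentStatus": current,
        "attemptedStatus": attempted,
        "allowedTransitions": sorted(allowed),
        "issues": [message],  # Legacy string form.
        "structuredIssues": [
            {
                "field": "status",
                "expected": sorted(allowed),
                "actual": attempted,
                "message": message,
            }
        ],
    }
```

Running the check before any write keeps a rejected update from touching the state file, which is what step 5 asks for.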

## Verification

```bash
PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_state_policy_metadata tests.test_replacement_unicode
PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_state_validation
```

## Exit Criteria

- `validate-state` returns field-specific diagnostics without replacing legacy string issues.
- Docs/runtime mismatch around state validation issue shape is resolved.
- `state-update` blocks invalid status regressions with actionable diagnostics.
- Legacy states remain valid where intended.

## Implementation Notes Requirements

Keep [implementation-notes.md](./implementation-notes.md) current while implementing. Record the exact compatibility choice for `issues` versus `structuredIssues`, the transition table, and any allowed compatibility compromises such as `IN_PROGRESS -> COMPLETE`.

## Handoff Requirements

Append a Phase 02 entry to [handoff-log.md](./handoff-log.md) with files changed, tests run, transition table, docs changes, blockers, and the next recommended command for Phase 03.
@@ -0,0 +1,59 @@
# Phase 03 - Parser And Contract Boundaries

## Clean Context Start

Before doing this phase, read [README.md](./README.md), [TODO.md](./TODO.md), [implementation-notes.md](./implementation-notes.md), [handoff-log.md](./handoff-log.md), and prior phase handoff entries. Treat the handoff log as next-agent continuity context. Treat implementation notes as the user-facing record of decisions and tradeoffs.

## Goal

Make LLM parse failures and verifier contract failures field-specific while keeping existing parse contracts and successful output unchanged.

## Inputs

- `skills/bmad-story-automator/src/story_automator/core/diagnostics.py`
- `skills/bmad-story-automator/src/story_automator/commands/orchestrator_parse.py`
- `skills/bmad-story-automator/src/story_automator/core/success_verifiers.py`
- `skills/bmad-story-automator/src/story_automator/core/review_verify.py`
- `skills/bmad-story-automator/src/story_automator/commands/orchestrator.py`
- `skills/bmad-story-automator/src/story_automator/commands/tmux.py`
- `skills/bmad-story-automator/src/story_automator/commands/validate_story_creation.py`
- `skills/bmad-story-automator/data/parse/*.json`
- `skills/bmad-story-automator-review/contract.json`
- `tests/test_orchestrator_parse.py`
- `tests/test_success_verifiers.py`

## Implementation Steps

1. Add `skills/bmad-story-automator/src/story_automator/core/parse_contracts.py`.
2. Move parse schema/payload validation out of command code.
3. Replace boolean schema checks with diagnostics for:
   - missing required key
   - wrong nested type
   - invalid enum
   - empty string
   - invalid `path or null`
4. Preserve parse success output exactly as-is. Do not add diagnostics or events to valid parsed payloads.
5. On parse failure, preserve `status: "error"` and legacy `reason`, and add `structuredIssues`.
6. Wrap success verifier contract failures into structured issues at command boundaries where safe.
7. Add or update tests for field paths such as `issues_found.critical`.
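
To illustrate step 3, the sketch below replaces a boolean schema check with per-field issues. The schema notation is hypothetical (a nested dict for objects, a list for enums, a Python type for scalars) and is not the format of the files under `data/parse/`; the issue type names are also assumptions, and the `path or null` case is omitted.

```python
# Hypothetical schema notation: dict = nested object, list = enum, type = scalar type.
def _issue(kind: str, path: str, expected: str, actual) -> dict:
    return {"type": kind, "field": path, "expected": expected, "actual": actual}


def validate_payload(payload, schema: dict, prefix: str = "") -> list[dict]:
    """Return field-level issues instead of a single True/False result."""
    if not isinstance(payload, dict):
        return [_issue("wrong_type", prefix, "object", type(payload).__name__)]
    issues = []
    for key, expected in schema.items():
        path = f"{prefix}.{key}" if prefix else key
        if key not in payload:
            issues.append(_issue("missing_key", path, "present", None))
            continue
        value = payload[key]
        if isinstance(expected, dict):
            # Recurse so nested problems keep their dotted path.
            issues.extend(validate_payload(value, expected, path))
        elif isinstance(expected, list):
            if value not in expected:
                issues.append(_issue("invalid_enum", path, ", ".join(expected), value))
        elif (isinstance(value, bool) and expected is not bool) or not isinstance(value, expected):
            issues.append(_issue("wrong_type", path, expected.__name__, type(value).__name__))
        elif expected is str and not value.strip():
            issues.append(_issue("empty_string", path, "non-empty string", value))
    return issues
```

With a schema such as `{"issues_found": {"critical": int}}`, a string where the count should be yields an issue whose `field` is `issues_found.critical`.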

## Verification

```bash
PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_orchestrator_parse tests.test_success_verifiers
```

## Exit Criteria

- Parser boundary reports specific field-level diagnostics.
- Existing parse success payloads are unchanged.
- Legacy failure `reason` values remain available.
- Verifier contract failures expose structured diagnostics where command outputs already carry errors.

## Implementation Notes Requirements

Keep [implementation-notes.md](./implementation-notes.md) current while implementing. Record any compatibility choice around legacy `reason` values, whether events are returned in failure JSON, and parse schema expressiveness limits.

## Handoff Requirements

Append a Phase 03 entry to [handoff-log.md](./handoff-log.md) with files changed, tests run, schema issue examples, compatibility notes, blockers, and the next recommended command for Phase 04.
@@ -0,0 +1,64 @@
# Phase 04 - Agent Complexity And Story Boundaries

## Clean Context Start

Before doing this phase, read [README.md](./README.md), [TODO.md](./TODO.md), [implementation-notes.md](./implementation-notes.md), [handoff-log.md](./handoff-log.md), and prior phase handoff entries. Treat the handoff log as next-agent continuity context. Treat implementation notes as the user-facing record of decisions and tradeoffs.

## Goal

Stop raw agent-plan and complexity JSON from failing late inside command handlers, and strengthen story/epic parse seams without touching tmux/session runtime behavior.

## Inputs

- `skills/bmad-story-automator/src/story_automator/core/diagnostics.py`
- `skills/bmad-story-automator/src/story_automator/commands/orchestrator_epic_agents.py`
- `skills/bmad-story-automator/src/story_automator/core/agent_config.py`
- `skills/bmad-story-automator/src/story_automator/core/epic_parser.py`
- `skills/bmad-story-automator/src/story_automator/core/story_keys.py`
- `skills/bmad-story-automator/src/story_automator/core/sprint.py`
- `tests/test_retro_agent.py`
- `tests/test_runtime_layout.py`

## Implementation Steps

1. Add `skills/bmad-story-automator/src/story_automator/core/agent_plan.py`.
2. Move duplicated agent config/plan behavior from `commands/orchestrator_epic_agents.py` toward core helpers.
3. Implement validators:
   - `validate_complexity_payload(payload) -> list[DiagnosticIssue]`
   - `validate_agents_plan_payload(payload) -> list[DiagnosticIssue]`
   - `load_complexity_payload(path) -> tuple[payload, issues]`
   - `load_agents_plan(path) -> tuple[payload, issues]`
4. Validation rules:
   - root must be an object
   - `stories` must be an array
   - each story needs a string `storyId`
   - `complexity.level` normalizes to `low`, `medium`, or `high`
   - task selections cover `create`, `dev`, `auto`, and `review`
   - each task selection has a string `primary`
   - `fallback` may be `false` or a string and must normalize the same way the current code does
   - unknown fields are allowed unless harmful
5. Keep `StoryKey` and `SprintStatus` mostly unchanged; they are already useful typed seams.
6. Optionally add small dataclasses/helpers in `epic_parser.py` if they preserve current returned JSON shape.
7. Add `tests/test_agent_plan.py` for focused complexity and agents-plan payload coverage. Existing agent config tests may also be extended, but this phase must create the focused module because verification depends on it.
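
The task-selection rules in step 4 might be checked along these lines. This sketch returns plain dictionaries rather than `DiagnosticIssue` objects to stay self-contained, assumes task selections live under a `tasks` key per story (as the `stories[0].tasks.dev` path in the docs suggests), and treats a missing `fallback` as `false`, which is an assumption.

```python
TASK_NAMES = ("create", "dev", "auto", "review")


def _issue(field: str, expected: str, actual) -> dict:
    return {"field": field, "expected": expected, "actual": type(actual).__name__}


def validate_agents_plan_payload(payload) -> list[dict]:
    """Check root, stories, storyId, and each task selection; ignore unknown fields."""
    if not isinstance(payload, dict):
        return [_issue("", "object", payload)]
    stories = payload.get("stories")
    if not isinstance(stories, list):
        return [_issue("stories", "array", stories)]

    issues = []
    for index, story in enumerate(stories):
        base = f"stories[{index}]"
        if not isinstance(story, dict):
            issues.append(_issue(base, "object", story))
            continue
        if not isinstance(story.get("storyId"), str):
            issues.append(_issue(f"{base}.storyId", "string", story.get("storyId")))
        tasks = story.get("tasks")
        if not isinstance(tasks, dict):
            issues.append(_issue(f"{base}.tasks", "object", tasks))
            continue
        for task in TASK_NAMES:
            path = f"{base}.tasks.{task}"
            selection = tasks.get(task)
            if not isinstance(selection, dict):
                issues.append(_issue(path, "object", selection))
                continue
            if not isinstance(selection.get("primary"), str):
                issues.append(_issue(f"{path}.primary", "string", selection.get("primary")))
            fallback = selection.get("fallback", False)  # Assumed default.
            if fallback is not False and not isinstance(fallback, str):
                issues.append(_issue(f"{path}.fallback", "false or string", fallback))
    return issues
```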

## Verification

```bash
PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_retro_agent tests.test_runtime_layout
PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_agent_plan
```

## Exit Criteria

- Agent plan and complexity file boundaries fail with field-specific diagnostics.
- Existing fallback normalization and retro override behavior remain unchanged.
- Story/epic parse improvements preserve current CLI JSON shape.
- Tmux/session runtime work is left for Phase 05.

## Implementation Notes Requirements

Keep [implementation-notes.md](./implementation-notes.md) current while implementing. Record module-boundary decisions, any accepted unknown fields, and remaining loose payloads.

## Handoff Requirements

Append a Phase 04 entry to [handoff-log.md](./handoff-log.md) with files changed, tests run, remaining loose payloads, compatibility risks, blockers, and the next recommended command for Phase 05.