diff --git a/CHANGELOG.md b/CHANGELOG.md
index aad03fc..cc1f1c2 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,3 +1,96 @@
+## [0.15.0] — 2026-05-28
+
+### Breaking Changes
+
+- ``eval-manifest.yaml`` is no longer auto-discovered by ``raki run``. Only
+  ``raki.yaml`` is recognized. Rename any existing ``eval-manifest.yaml`` files
+  to ``raki.yaml``.
+
+### Features
+
+- Add incremental evaluation mode (``raki run --incremental``).
+
+  ``raki run --incremental`` (short: ``-i``) now skips sessions that were
+  already evaluated in a prior run, based on the ``session_ids`` field written
+  to ``history.jsonl`` after each run. Exit code 2 is returned when there are
+  no new sessions to evaluate.
+
+  ``raki run --rerun-all`` evaluates all sessions regardless of history and
+  suppresses the new deprecation warning that fires when prior session history
+  exists and neither flag is provided.
+
+  Implementation details:
+
+  - ``HistoryEntry`` gains a ``session_ids: list[str]`` field (schema-backwards-compatible).
+  - ``append_history_entry()`` now populates ``session_ids`` from ``report.sample_results``.
+  - New ``load_seen_session_ids(path, *, manifest=None)`` helper in ``history.py``.
+  - New ``raki.report.incremental`` module exposes ``filter_new_samples(dataset, seen_ids)``.
+
+  (#293)
+- Split ``first_pass_success_rate`` into review-rework vs corrective-patch dimensions.
+
+  New ``patch_cycles`` field on :class:`SessionMeta` (default ``0``) tracks the
+  number of verify/CI-triggered corrective iterations — a subset of ``rework_cycles``.
+
+  New ``ReviewReworkRate`` metric (``review_rework_rate``) measures the fraction
+  of sessions that avoided *review-triggered* rework, ignoring CI/verify
+  corrective patches. Unlike ``FirstPassSuccessRate`` (which counts any rework),
+  this metric focuses on the human-review feedback loop.
+
+  - :class:`SessionMeta` gains ``patch_cycles: int = 0`` (ticket #295).
+  - :class:`SessionSchemaAdapter` reads ``patch_cycles`` from ``meta.json``.
+  - :class:`AlcovePipelineAdapter` counts ``verify.Failed`` / ``await-ci.Failed``
+    triggered corrective steps as ``patch_cycles`` (review-triggered corrective steps are excluded).
+  - ``ReviewReworkRate`` is registered in ``ALL_OPERATIONAL``, ``METRIC_METADATA``,
+    and ``OPERATIONAL_METRICS`` so it appears in HTML and CLI reports.
+
+  (#295)
+- History entries now use the manifest ``name:`` field as the match key for sparkline trends and incremental filtering, instead of the manifest filename. Projects that set ``name: my-project`` in their manifest YAML get a stable, rename-proof identity across runs. Projects without a ``name:`` field continue to use the filename (backward-compatible). (#320)
+
+### Bug Fixes
+
+- Extract ``phase_dot_class()`` from Jinja2 template into a Python function so
+  that dot-color logic is unit-tested directly rather than through full-HTML
+  string matching, which previously matched CSS class definitions in the
+  ``