decko · decko · May 28, 2026 · May 28, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,3 +1,96 @@
+## [0.15.0] — 2026-05-28
+
+### Breaking Changes
+
+- ``eval-manifest.yaml`` is no longer auto-discovered by ``raki run``. Only
+  ``raki.yaml`` is recognized. Rename any existing ``eval-manifest.yaml`` files
+  to ``raki.yaml``.
+
+### Features
+
+- Add incremental evaluation mode (``raki run --incremental``).
+
+  ``raki run --incremental`` (short: ``-i``) now skips sessions that were
+  already evaluated in a prior run, based on the ``session_ids`` field written
+  to ``history.jsonl`` after each run.  Exit code 2 is returned when there are
+  no new sessions to evaluate.
+
+  ``raki run --rerun-all`` evaluates all sessions regardless of history and
+  suppresses the new deprecation warning that fires when prior session history
+  exists and neither flag is provided.
+
+  Implementation details:
+
+  - ``HistoryEntry`` gains a ``session_ids: list[str]`` field (schema-backwards-compatible).
+  - ``append_history_entry()`` now populates ``session_ids`` from ``report.sample_results``.
+  - New ``load_seen_session_ids(path, *, manifest=None)`` helper in ``history.py``.
+  - New ``raki.report.incremental`` module exposes ``filter_new_samples(dataset, seen_ids)``.
+
+  (#293)
+- Split ``first_pass_success_rate`` into review-rework vs corrective-patch dimensions.
+
+  New ``patch_cycles`` field on :class:`SessionMeta` (default ``0``) tracks the
+  number of verify/CI-triggered corrective iterations — a subset of ``rework_cycles``.
+
+  New ``ReviewReworkRate`` metric (``review_rework_rate``) measures the fraction
+  of sessions that avoided *review-triggered* rework, ignoring CI/verify
+  corrective patches.  Unlike ``FirstPassSuccessRate`` (which counts any rework),
+  this metric focuses on the human-review feedback loop.
+
+  - :class:`SessionMeta` gains ``patch_cycles: int = 0`` (ticket #295).
+  - :class:`SessionSchemaAdapter` reads ``patch_cycles`` from ``meta.json``.
+  - :class:`AlcovePipelineAdapter` counts ``verify.Failed`` / ``await-ci.Failed``
+    triggered corrective steps as ``patch_cycles`` (review-triggered corrective steps are excluded).
+  - ``ReviewReworkRate`` is registered in ``ALL_OPERATIONAL``, ``METRIC_METADATA``,
+    and ``OPERATIONAL_METRICS`` so it appears in HTML and CLI reports.
+
+  (#295)
+- History entries now use the manifest ``name:`` field as the match key for sparkline trends and incremental filtering, instead of the manifest filename. Projects that set ``name: my-project`` in their manifest YAML get a stable, rename-proof identity across runs. Projects without a ``name:`` field continue to use the filename (backward-compatible). (#320)
+
+### Bug Fixes
+
+- Extract ``phase_dot_class()`` from the Jinja2 template into a Python function so
+  that dot-color logic is unit-tested directly rather than through full-HTML
+  string matching, which previously also matched the CSS class definitions in the
+  ``<style>`` block and therefore always passed.  The vacuous ``test_superseded_phase_css_rule_defined``
+  assertion (``".phase-status-superseded" in content``) is replaced with a
+  line-level check that confirms the CSS rule body includes ``opacity``. (#271)
+- Align CLI and HTML score color thresholds to eliminate the green/yellow inconsistency at the 0.80–0.85 boundary. ``color_for_score()`` in ``cli_summary.py`` now reads from the shared ``ZONE_THRESHOLDS`` constant (green ≥ 0.85) instead of a hard-coded 0.80 cutoff, matching the HTML report's coloring exactly. (#300)
+- Fix inverted ``SparklineData`` direction semantics for lower-is-better metrics in the ``_make_report_with_sparklines`` test helper. (#305)
+- Replace inline JSON serialization lambdas in the cohort command with a named ``_json_default`` helper that raises ``TypeError`` for unexpected types instead of silently passing them through. (#313)
+- Fix ``--fail-on-regression`` notice never being shown when ``--group-by`` produces more than 2 cohorts; the dead ``group_count == 2`` guard has been removed so users always see the "only supported with 2 cohorts" warning. (#315)
+- Fix ``--until`` being silently ignored when combined with ``--group-by`` in ``raki cohort``.  The mutual-exclusivity check is now performed before session loading so that an empty sessions directory correctly produces exit code 2 (usage error) rather than exit code 1 (no sessions found). (#318)
+- HTML report now correctly displays timed-out (superseded) phases. Sessions where a phase
+  was interrupted by a timeout and restarted at a higher generation (timeout-resume pattern)
+  now show a synthesised ``superseded`` phase entry in the timeline with a distinct status
+  dot. The phase timeline is also sorted correctly: post-superseded gen-1 phases (verify,
+  review, submit) appear after the replacement generation rather than before it. (#319)
+- Phase status dots in the HTML report now reflect the structured verdict for verify and review phases. A verify phase with verdict ``FAIL`` shows a red dot even when its execution status is ``completed``; a review phase with verdict ``REWORK`` shows a yellow dot; and ``approve``/``pass``/``pass-with-follow-ups`` show a green dot. Hard execution failures (``failed``, ``skipped``, ``superseded``) still take priority over any verdict. (#325)
+- Add ``jinja2`` to the ``dev`` extra so ``ty check`` can resolve the deferred
+  import in ``html_report.py`` without relying on transitive dependencies.
+- Pin ``langchain-community>=0.4,<0.4.2`` in the ragas extra to work around
+  a broken ``ChatVertexAI`` import in ragas 0.4.3
+  (upstream: `ragas#2745 <https://github.com/explodinggradients/ragas/issues/2745>`_).
+- Review phase detail in HTML reports now shows findings inline when
+  ``output_structured.findings`` is stripped, falling back to the session-level
+  findings list with severity badges and file locations.
+- ``--rerun-all`` now bypasses the duplicate-run detection warning from ``--force``.
+  Previously it only silenced the incremental deprecation warning.
+
+### Documentation
+
+- Fixed incorrect y-axis description for ``lower_is_better`` metrics in the comparing-runs doc; higher values always map to higher dot positions. (#307)
+- Removed duplicate "Filtering the compare cohort with ``--until``" section from ``comparing-runs.md``. (#316)
+
+### Internal Changes
+
+- Add skipped-phase dot coloring coverage to ``TestPhaseTimelineDotColoring``: ``test_skipped_phase_has_skipped_dot`` and ``test_verify_pass_verdict_but_skipped_status_gives_skipped_dot`` verify that skipped phases render with the muted ``phase-status-skipped`` CSS class. (#272)
+- Strengthen weak assertion in ``test_tool_call_count_shown_when_present`` to verify that class and count appear together in the correct HTML element. (#274)
+- Remove dead no-op Jinja2 block in ``report.html.j2`` that used a broken ``selectattr`` filter on ``session_id``; the correct ``namespace(value=false)`` loop was already in place. (#275)
+- Correct changelog entry for #260: replace ``--before DATE`` with ``--since DATE`` in the cohort command description. (#311)
+- Replace bare ``list`` annotation with ``list[RegressionResult]`` in ``gate_check`` command; adds ``RegressionResult`` to the ``TYPE_CHECKING`` import block for full generic annotation. (#317)
+
+
 ## [0.14.0] — 2026-05-24
 
 ### Breaking Changes

diff --git a/changes/271.fix b/changes/271.fix
diff --git a/changes/272.misc b/changes/272.misc
diff --git a/changes/274.misc b/changes/274.misc
diff --git a/changes/275.misc b/changes/275.misc
diff --git a/changes/293.feature b/changes/293.feature
diff --git a/changes/295.feature b/changes/295.feature
diff --git a/changes/300.fix b/changes/300.fix
diff --git a/changes/305.fix b/changes/305.fix
diff --git a/changes/307.doc b/changes/307.doc
diff --git a/changes/311.misc b/changes/311.misc
diff --git a/changes/313.fix b/changes/313.fix
diff --git a/changes/315.fix b/changes/315.fix
diff --git a/changes/316.doc b/changes/316.doc
diff --git a/changes/317.misc b/changes/317.misc
diff --git a/changes/318.fix b/changes/318.fix
diff --git a/changes/319.fix b/changes/319.fix
diff --git a/changes/320.feature b/changes/320.feature
diff --git a/changes/325.fix b/changes/325.fix
diff --git a/pyproject.toml b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 
 [project]
 name = "raki"
-version = "0.14.0"
+version = "0.15.0"
 description = "Retrieval Assessment for Knowledge Impact — evaluate agentic RAG quality"
 requires-python = ">=3.12"
 license = "Apache-2.0"

diff --git a/src/raki/__init__.py b/src/raki/__init__.py
@@ -1 +1 @@
-__version__ = "0.14.0"
+__version__ = "0.15.0"