Releases · decko/raki

28 May 13:08

github-actions

v0.15.0

b219ece

v0.15.0 — Second Glass (Latest)

What's Changed

feat(run): incremental evaluation — skip sessions seen in prior runs by @decko in #328
feat(metrics): split first_pass_success_rate into review-rework vs patch dims by @decko in #333
fix(deps): pin langchain-community<0.4.2 for ragas ChatVertexAI import by @decko in #337
fix(cli): JSON serialization default raises TypeError for unexpected types by @decko in #338
fix(cli): show --fail-on-regression warning for N>2 groups by @decko in #339
fix(cli): raise UsageError for --until + --group-by before session loading by @decko in #340
test(report): strengthen assertion in test_tool_call_count_shown_when_present by @decko in #343
docs(comparing-runs): fix incorrect y-axis description for lower_is_better by @decko in #344
misc(changelog): correct --before → --since in fragment for #260 by @decko in #345
docs(comparing-runs): remove duplicate --until section by @decko in #347
chore(cli): replace bare list annotation with list[RegressionResult] by @decko in #348
feat(history): match sparkline trends by manifest name field instead of filename by @decko in #349
fix(report): align CLI and HTML score color thresholds at 0.85 by @decko in #350
test(report): fix inverted SparklineData direction for lower-is-better metrics by @decko in #353
fix(report): superseded phases missing and gen-N sorts after submit by @decko in #356
fix(report): phase status dot color reflects verdict not execution status by @decko in #359
chore(report): remove dead no-op Jinja2 block in report.html.j2 by @decko in #360
fix(report): extract phase_dot_class() to eliminate vacuous test assertions by @decko in #361
test(272): add skipped phase dot coloring coverage to TestPhaseTimelineDotColoring by @decko in #363
release: v0.15.0 by @decko in #366
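
The JSON serialization fix in #338 concerns a `default` hook that raises TypeError for unexpected types. A minimal sketch of that pattern with the standard library follows. The function name `json_default` and the choice of handled types (datetime and Path) are assumptions for illustration, not taken from raki's source.

```python
import json
from datetime import datetime
from pathlib import Path


def json_default(value: object) -> str:
    """Serialize a few known non-JSON types; raise TypeError for anything else.

    Hypothetical helper: the handled types are illustrative assumptions.
    """
    if isinstance(value, datetime):
        return value.isoformat()
    if isinstance(value, Path):
        return str(value)
    # Raising here, instead of silently falling back to str(), surfaces
    # unexpected types at serialization time.
    raise TypeError(f"Object of type {type(value).__name__} is not JSON serializable")


report = {"created": datetime(2025, 5, 28, 13, 8), "source": Path("sessions")}
print(json.dumps(report, default=json_default))
```

Passing the hook through `json.dumps(..., default=json_default)` keeps the known conversions in one place, and an unsupported value such as a set fails loudly rather than producing a misleading string.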

Full Changelog: v0.14.0...v0.15.0

Contributors

decko


24 May 23:57

github-actions

v0.14.0

c2fc675

v0.14.0

What's Changed

docs: add v0.13.0 gotchas and agent pitfalls to AGENTS.md by @decko in #279
feat(metrics): persist per-sample metric scores in JSON report by @decko in #290
fix(report): use exit_code over passed for verify ✓/✗ in HTML by @decko in #292
feat(cli): detect duplicate raki run and warn before re-evaluation by @decko in #294
fix(report): hide empty 'Worst N Sessions' and 'Recurring Failures' sections by @decko in #296
fix(report): severity distribution chart rounding and zone coloring by @decko in #297
feat(report): show metric context in score cards — thresholds, breakdowns, tooltips by @decko in #301
fix(report): truncate long transcripts to reduce HTML report file size by @decko in #302
feat(report): add navigation TOC and sort controls to HTML report by @decko in #303
feat(report): inline sparklines and delta indicators in HTML score cards by @decko in #304
feat(trends): HTML trend report with SVG dot charts by @decko in #306
feat(cohort): date-based session split and diff within a single report by @decko in #310
feat(cohort): manifest cohort tags and --group-by for multi-cohort comparison by @decko in #314
fix(report): nav bar alignment, trend indicators, sticky context by @decko in #322
docs: inline trend indicators and sticky nav context (Tech Preview) by @decko in #324
release: v0.14.0 by @decko in #326

Full Changelog: v0.13.0...v0.14.0

Contributors

decko


16 May 00:29

decko

v0.13.0

bedfd14

v0.13.0 — Louche

The louche effect — what was clear turns opaque and reveals hidden structure.

This release is about revealing the signal that was always dissolved in raw output. Session drill-downs now render structured HTML instead of raw JSON dumps. Two new triage metrics measure whether the agent knows what it's doing before it starts writing code.

Features

triage_calibration metric — predicted complexity vs actual cost (#251)
file_prediction_accuracy metric — triage files vs actual changes, mean F1 score (#252)
Structured drill-down sections in HTML report — phases, findings, and metrics in collapsible blocks (#250)
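
The file_prediction_accuracy entry above describes a mean F1 score between the files named at triage and the files actually changed. A rough sketch of that arithmetic follows, assuming exact path matching on sets. The names `file_f1` and `mean_file_f1`, and the handling of empty inputs, are hypothetical and may differ from what raki does.

```python
def file_f1(predicted: set[str], actual: set[str]) -> float:
    """F1 between the files triage predicted and the files actually changed."""
    if not predicted and not actual:
        # Assumption: nothing predicted and nothing changed is a perfect match.
        return 1.0
    overlap = len(predicted & actual)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)  # share of predictions that were right
    recall = overlap / len(actual)        # share of real changes that were predicted
    return 2 * precision * recall / (precision + recall)


def mean_file_f1(sessions: list[tuple[set[str], set[str]]]) -> float:
    """Average per-session F1; an empty list yields 0.0 (assumption)."""
    if not sessions:
        return 0.0
    return sum(file_f1(pred, act) for pred, act in sessions) / len(sessions)


sessions = [
    ({"a.py", "b.py"}, {"b.py", "c.py"}),  # one of two files matched
    ({"a.py"}, {"a.py"}),                  # exact match
]
print(mean_file_f1(sessions))
```

Averaging per session, rather than pooling all files, keeps one large session from dominating the score.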

Bug Fixes

Phase timeline ordering — canonical pipeline order instead of alphabetical (#249)
Rework phases highlighted with amber dots instead of green (#249)
output_structured rendered as formatted HTML: triage approach/risks, plan task list, files changed with M/A/D prefixes, verify verdict with command results, review findings with severity badges (#277)
Chronological phase sorting with rework interleaving (implement→verify→implement→verify instead of grouped) (#277)

Documentation

New Comparing Runs guide for raki report --diff workflow (#258)

Stats

1581 tests passing
SODA pipeline cost for this milestone: ~$23
16 metrics total (9 operational, 2 knowledge, 4 analytical, 1 experimental)


08 May 18:56

decko

v0.12.0

0eca1df

v0.12.0 — Clear Signals

Make evaluation output transparent and judge configuration effortless. Enrich report headers with project identity and context, persist judge config in the manifest, fix provider bugs, and add Alcove pipeline export support.

Highlights

Manifest judge config — persist judge.provider and judge.model in your manifest YAML with 4-tier priority resolution (CLI > manifest > env vars > defaults)
Report header enrichment — project name, session formats, docs path, and judge annotation now appear in HTML reports
Alcove pipeline adapter — new alcove-pipeline adapter loads multi-step Alcove pipeline exports (run.json + steps/)
Google provider fixes — async client mismatch (#233), VERTEXAI_PROJECT env var fallback (#231), embeddings SDK path (#245)
SDK migration — dropped deprecated vertexai._model_garden dependency, removed 28 transitive packages
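
The 4-tier priority resolution for judge config (CLI > manifest > env vars > defaults) can be pictured as a first-match lookup. The sketch below is illustrative only: the function `resolve_judge_setting`, the environment variable names, and the default values are all hypothetical placeholders, not raki's actual identifiers.

```python
import os
from typing import Mapping, Optional

# Hypothetical env var names and defaults, for illustration only.
ENV_VARS = {"provider": "RAKI_JUDGE_PROVIDER", "model": "RAKI_JUDGE_MODEL"}
DEFAULTS = {"provider": "example-provider", "model": "example-model"}


def resolve_judge_setting(
    key: str,
    cli_value: Optional[str],
    manifest_judge: Mapping[str, str],
    environ: Mapping[str, str] = os.environ,
) -> str:
    """Return the first value found: CLI, then manifest, then env var, then default."""
    if cli_value is not None:
        return cli_value
    if key in manifest_judge:
        return manifest_judge[key]
    env_value = environ.get(ENV_VARS[key])
    if env_value:
        return env_value
    return DEFAULTS[key]


# The manifest value wins here because no CLI value was given.
print(resolve_judge_setting("model", None, {"model": "manifest-model"}, {}))
```

Passing the environment in as a mapping keeps the lookup easy to test without touching the real process environment.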

Full changelog

See CHANGELOG.md for the complete list of changes.

Install / Upgrade

pip install raki==0.12.0


29 Apr 19:57

decko

v0.11.0

957d945

v0.11.0 - Full Recall

What's Changed

docs: add gotchas #25-#27 and gitignore soda temp files by @decko in #217
chore: remove stale ty: ignore comments in llm_setup.py by @decko in #224
fix: increase Alcove adapter DETECT_READ_SIZE to 32KB by @decko in #225
chore: add SODA session test fixture for adapter integration tests by @decko in #226
fix: derive rework_cycles from SODA phase generation metadata by @decko in #227
feat: extend session-schema adapter for full SODA phase coverage by @decko in #228
feat: add raki import-history command for backfilling evaluation history by @decko in #229
chore: release v0.11.0 by @decko in #232

Full Changelog: v0.10.0...v0.11.0

Contributors

decko


27 Apr 00:40

github-actions

v0.10.0

852ed69

v0.10.0

What's Changed

feat: show judge model name in HTML and CLI report headers (#207) by @decko in #209
feat: add LiteLLM provider adapter for judge metrics (#103) by @decko in #210
fix: knowledge_miss_rate on SODA sessions — skip synthesized context (#183) by @decko in #211
feat: show phase output/transcript in HTML session drill-down (#194) by @decko in #212
feat: synthesize review findings from Alcove transcripts (#186) by @decko in #213
feat: wire LiteLLM provider into CLI and embeddings (#208) by @decko in #214
feat: RAKI as pipeline quality gate (#184) by @decko in #215
chore: bump version to 0.10.0 by @decko in #216

Full Changelog: v0.9.1...v0.10.0

Contributors

decko


25 Apr 01:54

github-actions

v0.9.1

28fe03d

v0.9.1

What's Changed

docs: update AGENTS.md with v0.9.0 learnings by @decko in #196
chore: remove deprecated --no-llm flag (#195) by @decko in #198
chore: rename skip_llm to skip_judge in report config (#178) by @decko in #199
feat: add pipeline/orchestrator metadata to session adapter (#175) by @decko in #200
refactor: extract shared scoring loop from Ragas metrics (#182) by @decko in #201
fix: Alcove detect() accepts sessions with 'id' instead of 'session_id' (#197) by @decko in #202
feat: distinguish agent model from judge model in reports (#179) by @decko in #203
feat: track judge cost per report (#174) by @decko in #204
feat: metric health checks — detect degenerate and dead metrics (#162) by @decko in #205
chore: bump version to 0.9.1 by @decko in #206

Full Changelog: v0.9.0...v0.9.1

Contributors

decko


24 Apr 15:23

github-actions

v0.9.0

aebf67b

v0.9.0

What's Changed

docs: add CI workflow guide with test ticket pattern by @decko in #167
docs: add diff use cases to CI workflow guide by @decko in #168
chore: commit soda pipeline config to main by @decko in #177
fix(ragas): detect and skip instructor#1658 silent-zero scores from Google provider by @decko in #180
docs: update AGENTS.md with v0.8.0 learnings by @decko in #181
chore(soda): improve pipeline prompts and commit SODA config by @decko in #185
feat(report): serialize judge config fields into report JSON (#173) by @decko in #188
feat(report): warn when judge configs differ in --diff comparison (#187) by @decko in #189
feat(history): JSONL history log for cross-run tracking (#170) by @decko in #190
fix(adapters): alcove adapter rework cycle and phase detection (#176) by @decko in #191
feat(trends): add raki trends command for metric trajectories (#171) by @decko in #192
chore: bump version to 0.9.0 by @decko in #193
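
The JSONL history log from #190 stores one JSON object per line, so each run can be appended without rewriting the file. A minimal sketch of that format follows. The helper names `append_history` and `read_history` and the entry fields are hypothetical, not raki's actual schema.

```python
import json
from pathlib import Path


def append_history(path: Path, entry: dict) -> None:
    """Append one run as a single JSON line (hypothetical helper)."""
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry, sort_keys=True) + "\n")


def read_history(path: Path) -> list[dict]:
    """Load all runs in the order they were appended, skipping blank lines."""
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Because each line is independent, a reader can stream the file and a truncated final line affects only that one entry.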

Full Changelog: v0.8.0...v0.9.0

Contributors

decko


23 Apr 22:35

github-actions

v0.8.0

1d3b5b2

v0.8.0

What's Changed

docs: add release gating rule — agents must not tag releases by @decko in #128
chore: cherry-pick v0.7.1 fixes to main by @decko in #154
fix(gates): round actual values to 4dp in --gate output by @decko in #155
feat(adapters): support bridge/alcove session format by @decko in #163
fix(cli): validate --gate metric names early, exit 2 for unknown metrics by @decko in #156
fix(cli): include knowledge metrics in raki metrics output by @decko in #157
fix(cli): three-tier section headers and progression nudges by @decko in #158
fix(cli): use CWD as project root for --docs-path guard by @decko in #159
fix(cli): add --gate and --require-metric flags to report subcommand by @decko in #160
fix(metrics): rename first_pass_verify_rate to first_pass_success_rate by @decko in #161
fix(knowledge): replace loose word overlap with path+word hybrid matcher by @decko in #164
docs(metrics): rationale & interpretation guide for all metrics by @decko in #165
chore: bump version to 0.8.0 by @decko in #166

Full Changelog: v0.7.0...v0.8.0

Contributors

decko


22 Apr 16:01

github-actions

v0.7.1

a21e776

v0.7.1

What's Changed

fix(deps): pin instructor>=1.0 in ragas extra by @decko in #142
fix(gates): handle missing metrics in --require-metric gracefully by @decko in #143
fix(metrics): store N/A metrics as null in JSON instead of 0.0 by @decko in #144
fix(metrics): use per-domain matching for knowledge miss/gap rates by @decko in #145
fix(metrics): wire doc chunks as reference_contexts for precision/recall by @decko in #146
fix(ragas): truncate synthesized contexts and handle max_tokens errors by @decko in #147
fix(ragas): comprehensive truncation for Ragas text inputs by @decko in #148
fix(ragas): set max_tokens=4096 on llm_factory calls by @decko in #149
chore: bump version to 0.7.1 by @decko in #153
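
The fix in #144 stores N/A metrics as null instead of 0.0. The sketch below shows why the distinction matters when scores are later aggregated. The helper names `metric_to_json` and `mean_of_scored` are hypothetical and only illustrate the idea.

```python
import json
from typing import Optional


def metric_to_json(name: str, score: Optional[float]) -> str:
    """Serialize a metric; a score of None (N/A) becomes JSON null, not 0.0."""
    return json.dumps({"name": name, "score": score})


def mean_of_scored(scores: list[Optional[float]]) -> Optional[float]:
    """Average only the metrics that produced a score; None if none did."""
    scored = [s for s in scores if s is not None]
    return sum(scored) / len(scored) if scored else None


print(metric_to_json("example_metric", None))
print(mean_of_scored([1.0, None]))   # N/A is skipped
print(mean_of_scored([1.0, 0.0]))    # a stored 0.0 drags the mean down
```

With null, a metric that could not be computed stays out of the average. With 0.0, it would be indistinguishable from a genuine worst-case score.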

Full Changelog: v0.7.0...v0.7.1

Contributors

decko

