…, JSON export

- Add pure-Python ROUGE-L implementation (LCS-based, zero deps, 14 tests); a sketch follows this list
- Add EvalMode enum (llm_judge | heuristic | none) tracked in every result
- Make LlamaIndex optional: core needs only pydantic/structlog/duckdb
- Per-metric error isolation in LLM-as-Judge (partial results > nothing); see the second sketch below
- JSON export from ResultStore (export_json, list_runs)
- More query filters (faithfulness, rouge_l, eval_mode)
- Bump to v0.2.0; 126 tests passing, ruff + mypy strict clean
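A minimal sketch of the LCS-based approach, assuming whitespace tokenization; the function names and the `beta` default below are illustrative, not necessarily rageval's actual API:

```python
from __future__ import annotations


def _lcs_length(a: list[str], b: list[str]) -> int:
    """Longest-common-subsequence length via the classic DP, one row at a time."""
    prev = [0] * (len(b) + 1)
    for token_a in a:
        curr = [0]
        for j, token_b in enumerate(b, start=1):
            if token_a == token_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str, beta: float = 1.2) -> float:
    """ROUGE-L F-score: F-beta combination of LCS precision and recall."""
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0
    lcs = _lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)
```

Keeping only the previous DP row bounds memory at O(len(reference)) instead of the full O(n·m) table, which matters for long generated answers.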
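And the per-metric isolation idea, sketched: each judge metric runs inside its own try/except, so one failed LLM call degrades that metric to `None` instead of discarding the whole result. The `judge_fns` mapping and return shape are assumptions for illustration:

```python
from collections.abc import Callable

import structlog

log = structlog.get_logger()


def judge_with_isolation(
    answer: str,
    context: str,
    judge_fns: dict[str, Callable[[str, str], float]],
) -> dict[str, float | None]:
    """Run each LLM-as-Judge metric independently; failures stay per-metric."""
    scores: dict[str, float | None] = {}
    for name, judge in judge_fns.items():
        try:
            scores[name] = judge(answer, context)
        except Exception as exc:  # noqa: BLE001 -- isolate any single judge failure
            log.warning("judge_metric_failed", metric=name, error=str(exc))
            scores[name] = None  # partial results > nothing
    return scores
```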
Summary
- `EvalMode` enum (`llm_judge` | `heuristic` | `none`) tracked in every `GenerationMetrics` result
- LlamaIndex made optional (`pip install rageval[llm]`) — core needs only pydantic/structlog/duckdb
- JSON export from `ResultStore` (`export_json`, `list_runs`)
- New query filters: `min_faithfulness`, `min_rouge_l`, `eval_mode`

Test plan
New `test_rouge.py` (14 tests); extended `test_generation_metrics.py`, `test_models.py`, `test_storage.py`.
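A hypothetical usage sketch of the new storage surface: `export_json`, `list_runs`, and the filter names come from the summary above, while the import path, constructor argument, and `query` method name are assumptions:

```python
from rageval.storage import ResultStore  # import path is an assumption

store = ResultStore("results.duckdb")  # constructor argument is an assumption

# Enumerate stored evaluation runs.
for run in store.list_runs():
    print(run)

# Query with the new filters: keep LLM-judged rows above both thresholds.
rows = store.query(
    min_faithfulness=0.8,
    min_rouge_l=0.5,
    eval_mode="llm_judge",
)

# Export results to JSON for downstream analysis.
store.export_json("results.json")
```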