Related work: Primary development in this problem space has converged on evalharness, which covers prompt, agent, and RAG-pipeline red-teaming, regression, and CI testing. This repo remains available, but check that canonical repo first for the latest tooling.
# EvalBench

LLM evaluation toolkit — BLEU, ROUGE, semantic similarity, and custom metrics for benchmarking AI outputs.

EvalBench exists to make this workflow practical. It favours a small, inspectable surface over sprawling configuration.
- CLI command: `evalbench`
- `TestCase`, `EvalResult`, `EvalReport` — exported from `src/evalbench/core.py`
- Included test suite
- Dedicated documentation folder
- Runtime: Python
- Frameworks: Typer
- Tooling: Rich, Pydantic
The codebase is organised into `docs/`, `src/`, and `tests/`. The primary entry points are `src/evalbench/core.py` and `src/evalbench/__init__.py`; `src/evalbench/core.py` exposes `TestCase`, `EvalResult`, and `EvalReport` — the core types that drive the behaviour.
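To give a feel for how those three types fit together, here is a minimal sketch using stdlib dataclasses rather than the Pydantic models the package actually ships; every field and method name below is an assumption for illustration, not the real `core.py` API.

```python
from dataclasses import dataclass, field

@dataclass
class TestCase:
    # Hypothetical fields; the real TestCase in src/evalbench/core.py may differ.
    prompt: str
    expected: str

@dataclass
class EvalResult:
    # One model output scored against one test case.
    case: TestCase
    output: str
    scores: dict  # metric name -> score, e.g. {"bleu": 0.42}

@dataclass
class EvalReport:
    # Collects results and aggregates a metric across them (assumed helper).
    results: list = field(default_factory=list)

    def mean(self, metric: str) -> float:
        vals = [r.scores[metric] for r in self.results if metric in r.scores]
        return sum(vals) / len(vals) if vals else 0.0
```

In this sketch a report is just a list of scored results plus an aggregation helper, which mirrors the test-case / result / report split the exports suggest.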
```bash
pip install -e .
evalbench --help
```

Project layout:

```
EvalBench/
├── .env.example
├── CONTRIBUTING.md
├── Makefile
├── README.md
├── docs/
├── pyproject.toml
├── src/
├── tests/
```
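To illustrate the kind of custom metric the toolkit targets, here is a standalone sketch of a token-level F1 overlap score; `token_f1` is a hypothetical example metric, not part of the EvalBench API.

```python
from collections import Counter

def token_f1(candidate: str, reference: str) -> float:
    """Token-level F1 overlap between a candidate and a reference string."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

A metric like this could populate the `scores` mapping of an evaluation result, alongside BLEU or ROUGE scores from a dedicated library.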