# Contributing to evalkit

Thanks for your interest in contributing to evalkit! This guide will help you get started.
## Development setup

```bash
# Clone the repo
git clone https://github.com/cortexark/evalkit.git
cd evalkit

# Create a virtual environment
python -m venv .venv
source .venv/bin/activate

# Install in development mode with all dependencies
pip install -e ".[dev]"
```

## Running tests

```bash
# Run all tests
make test

# Run with coverage
make coverage

# Run specific test file
pytest tests/test_judges.py -v
```

## Linting and formatting

```bash
# Lint
make lint

# Format
make format

# Type check
make typecheck
```

## Contribution workflow

- Fork the repo and create a feature branch from `main`
- Write tests for any new functionality
- Run the full test suite before submitting
- Follow existing code style — we use ruff for linting and formatting
- Write clear commit messages describing the change
- Submit a PR with a description of your changes
## Architecture

See `docs/architecture.md` for an overview of the codebase structure. Key decisions are documented in ADRs (architecture decision records).
## Adding a new judge

- Create a new class in `src/evalkit/judges/` extending `BaseJudge`
- Implement the `evaluate()` method
- Add tests in `tests/test_judges.py`
- Update the judges `__init__.py` exports
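The steps above can be sketched roughly as follows. `BaseJudge`'s actual interface isn't documented in this guide, so the abstract stand-in, the `JudgeResult` container, and the `evaluate()` signature below are assumptions for illustration only — check the real class in `src/evalkit/judges/` before writing yours:

```python
# Illustrative sketch only: this stand-in BaseJudge and JudgeResult
# approximate the expected shape; the real definitions live in
# src/evalkit/judges/ and may differ.
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class JudgeResult:  # hypothetical result container
    score: float
    rationale: str


class BaseJudge(ABC):  # stand-in for evalkit's BaseJudge
    @abstractmethod
    def evaluate(self, output: str, reference: str) -> JudgeResult: ...


class LengthRatioJudge(BaseJudge):
    """Toy judge: scores how close the output length is to the reference length."""

    def evaluate(self, output: str, reference: str) -> JudgeResult:
        # Ratio in [0, 1]; the max(..., 1) guards against empty strings.
        ratio = min(len(output), len(reference)) / max(len(output), len(reference), 1)
        return JudgeResult(score=ratio, rationale=f"length ratio {ratio:.2f}")
```

A judge like this would then be exported from the judges `__init__.py` and exercised in `tests/test_judges.py`.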
## Adding a new rubric

- Add your rubric definition in `src/evalkit/judges/rubrics.py`
- Include scoring criteria and examples
- Add tests validating the rubric structure
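A rubric entry might look like the sketch below. The actual schema in `rubrics.py` isn't shown in this guide, so the `Rubric` dataclass, its field names, and the validation helper are hypothetical:

```python
# Hypothetical rubric shape; mirror the real schema in
# src/evalkit/judges/rubrics.py rather than copying this verbatim.
from dataclasses import dataclass, field


@dataclass
class Rubric:
    name: str
    criteria: dict  # score label -> description of that level
    examples: list = field(default_factory=list)  # (sample output, expected score)


CONCISENESS = Rubric(
    name="conciseness",
    criteria={
        "1": "Rambling, heavily redundant",
        "3": "Some redundancy but mostly on point",
        "5": "Every sentence carries information",
    },
    examples=[("The answer, which is the answer, is 4.", 2.0)],
)


def validate_rubric(rubric: Rubric) -> None:
    """Structural checks of the kind the rubric tests should assert."""
    assert rubric.name, "rubric needs a name"
    assert rubric.criteria, "rubric needs at least one scoring criterion"
```

Tests validating the structure can then simply call `validate_rubric` on each defined rubric.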
## Reporting issues

Use GitHub Issues with the provided templates for bugs and feature requests.
## Code of conduct

Be respectful, constructive, and collaborative. We're building tools to make LLM evaluation better for everyone.