# Contributing to evalkit

Thanks for your interest in contributing to evalkit! This guide will help you get started.
## Development setup

```bash
# Clone the repo
git clone https://github.com/cortexark/evalkit.git
cd evalkit

# Create a virtual environment
python -m venv .venv
source .venv/bin/activate

# Install in development mode with all dependencies
pip install -e ".[dev]"
```

## Running tests

```bash
# Run all tests
make test

# Run with coverage
make coverage

# Run specific test file
pytest tests/test_judges.py -v
```

## Linting and formatting

```bash
# Lint
make lint

# Format
make format

# Type check
make typecheck
```

## Contribution workflow

- Fork the repo and create a feature branch from `main`
- Write tests for any new functionality
- Run the full test suite before submitting
- Follow existing code style — we use ruff for linting and formatting
- Write clear commit messages describing the change
- Submit a PR with a description of your changes
## Architecture

See `docs/architecture.md` for an overview of the codebase structure. Key decisions are documented in ADRs (architecture decision records).
## Adding a new judge

- Create a new class in `src/evalkit/judges/` extending `BaseJudge`
- Implement the `evaluate()` method
- Add tests in `tests/test_judges.py`
- Update the judges `__init__.py` exports
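The steps above can be sketched roughly as follows. `BaseJudge`'s actual interface isn't documented in this guide, so the abstract stand-in, the `JudgeResult` container, and the `evaluate()` signature below are assumptions for illustration only — check the real class in `src/evalkit/judges/` before writing yours:

```python
# Illustrative sketch only: this stand-in BaseJudge and JudgeResult
# approximate the expected shape; the real definitions live in
# src/evalkit/judges/ and may differ.
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class JudgeResult:  # hypothetical result container
    score: float
    rationale: str


class BaseJudge(ABC):  # stand-in for evalkit's BaseJudge
    @abstractmethod
    def evaluate(self, output: str, reference: str) -> JudgeResult: ...


class LengthRatioJudge(BaseJudge):
    """Toy judge: scores how close the output length is to the reference length."""

    def evaluate(self, output: str, reference: str) -> JudgeResult:
        # Ratio in [0, 1]; the max(..., 1) guards against empty strings.
        ratio = min(len(output), len(reference)) / max(len(output), len(reference), 1)
        return JudgeResult(score=ratio, rationale=f"length ratio {ratio:.2f}")
```

A judge like this would then be exported from the judges `__init__.py` and exercised in `tests/test_judges.py`.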
## Adding a new rubric

- Add your rubric definition in `src/evalkit/judges/rubrics.py`
- Include scoring criteria and examples
- Add tests validating the rubric structure
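A rubric entry might look like the sketch below. The actual schema in `rubrics.py` isn't shown in this guide, so the `Rubric` dataclass, its field names, and the validation helper are hypothetical:

```python
# Hypothetical rubric shape; mirror the real schema in
# src/evalkit/judges/rubrics.py rather than copying this verbatim.
from dataclasses import dataclass, field


@dataclass
class Rubric:
    name: str
    criteria: dict  # score label -> description of that level
    examples: list = field(default_factory=list)  # (sample output, expected score)


CONCISENESS = Rubric(
    name="conciseness",
    criteria={
        "1": "Rambling, heavily redundant",
        "3": "Some redundancy but mostly on point",
        "5": "Every sentence carries information",
    },
    examples=[("The answer, which is the answer, is 4.", 2.0)],
)


def validate_rubric(rubric: Rubric) -> None:
    """Structural checks of the kind the rubric tests should assert."""
    assert rubric.name, "rubric needs a name"
    assert rubric.criteria, "rubric needs at least one scoring criterion"
```

Tests validating the structure can then simply call `validate_rubric` on each defined rubric.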
## Reporting issues

Use GitHub Issues with the provided templates for bugs and feature requests.
## Code of conduct

Be respectful, constructive, and collaborative. We're building tools to make LLM evaluation better for everyone.