# Intereval

Intereval is a command-line tool for evaluating and comparing Large Language Models (LLMs) available through Ollama. It provides both interactive and non-interactive interfaces to streamline testing prompts against different models and evaluating their responses.
## Features

- Interactive Mode: A user-friendly, guided experience for setting up evaluations.
- Non-Interactive Mode: Run evaluations using command-line arguments for easy scripting and automation.
- Configuration Files: Save and reuse evaluation setups in YAML format.
- Two Evaluation Modes:
  - One-Prompt-Many-Models: Test a single prompt against multiple LLMs.
  - Many-Prompts-One-Model: Test multiple prompts against a single LLM.
- Flexible Evaluation: Evaluate responses based on an expected "golden" response or a set of instructions (rubric).
- Rich Output: Presents evaluation results in a clean, readable table format.
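Intereval's actual judging templates live in `src/intereval/templates/`. Purely to illustrate the golden-response vs. rubric distinction, here is a hypothetical sketch of how such an evaluation prompt could be composed (this is not Intereval's real code; the helper name and format are assumptions):

```python
from typing import Optional


def build_eval_prompt(response: str, golden: Optional[str] = None,
                      instructions: Optional[str] = None) -> str:
    """Compose a judging prompt from either a 'golden' expected answer
    or a rubric of instructions (hypothetical helper, for illustration)."""
    if golden is not None:
        criterion = f"Expected answer:\n{golden}"
    elif instructions is not None:
        criterion = f"Evaluation instructions:\n{instructions}"
    else:
        raise ValueError("Provide either a golden response or instructions")
    return (f"{criterion}\n\nModel response:\n{response}\n\n"
            "Does the response satisfy the criterion? Answer yes or no.")
```

The resulting text would then be sent to the evaluation model, which acts as a judge of the candidate response.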
## Prerequisites

- Python 3.8+
- Docker (optional, for containerized execution)
- Ollama installed and running, with at least one model already downloaded
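Before running evaluations, it helps to confirm the Ollama server is reachable (by default it listens on `http://localhost:11434`). A minimal standalone check, not part of Intereval:

```python
import urllib.request
import urllib.error


def ollama_is_running(base_url: str = "http://localhost:11434") -> bool:
    """Return True if an Ollama server answers at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


if not ollama_is_running():
    print("Ollama does not appear to be running; start it with 'ollama serve'.")
```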
## Installation

- Clone the repository:

  ```bash
  git clone https://github.com/avedave/intereval.git
  cd intereval
  ```

- Create and activate a virtual environment:

  ```bash
  # On Windows, you may need to use 'python' instead of 'python3'
  python3 -m venv .venv
  source .venv/bin/activate
  ```

- Install the dependencies:

  ```bash
  pip install -r requirements.txt
  ```
## Usage

Intereval can be run in three main ways: interactive mode, non-interactive mode, and from a configuration file.

### Interactive Mode

To start an interactive session, run:

```bash
python -m src.intereval.main
```

The tool will guide you through selecting the evaluation mode, providing prompts, choosing models, and setting up the evaluation criteria.
### Non-Interactive Mode

For quick evaluations, you can use command-line arguments.

Example: one prompt against multiple models:

```bash
python -m src.intereval.main \
    --mode one-prompt-many-models \
    --prompt "What is the capital of France?" \
    --models llama3 qwen:7b \
    --instructions "Is the answer Paris?" \
    --eval-model llama3
```

### Configuration File

You can also run an evaluation from a YAML configuration file.

- Create a `config.yaml` file (or let the interactive mode generate one for you in the `config/` directory).
- Run the evaluation:

  ```bash
  python -m src.intereval.main --config /path/to/your/config.yaml
  ```
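A config file mirrors the command-line arguments. The exact schema is defined by Intereval itself; the field names below are an illustrative guess based on the CLI flags, so check a file generated by the interactive mode for the authoritative format:

```yaml
# Hypothetical config sketch — field names assumed from the CLI flags
mode: one-prompt-many-models
prompt: "What is the capital of France?"
models:
  - llama3
  - qwen:7b
instructions: "Is the answer Paris?"
eval_model: llama3
```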
## Docker

You can build and run Intereval using Docker.

- Build the Docker image:

  ```bash
  docker build -t intereval .
  ```

- Run the Docker container:

  ```bash
  docker run -it --rm --network=host intereval
  ```

  Note: `--network=host` allows the container to connect to the Ollama service running on the host machine.
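Once the container can reach Ollama, you can sanity-check which models are available via Ollama's `GET /api/tags` endpoint. A small helper to pull the model names out of that endpoint's JSON body (the sample payload below is illustrative, not live output):

```python
import json


def list_model_names(tags_body: str):
    """Extract model names from the JSON returned by Ollama's /api/tags."""
    return [m["name"] for m in json.loads(tags_body).get("models", [])]


# Illustrative payload in the shape /api/tags returns
sample = '{"models": [{"name": "llama3:latest"}, {"name": "qwen:7b"}]}'
print(list_model_names(sample))
```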
## Project Structure

```
/
├── config/        # Stores generated YAML configuration files
├── eval_prompts/  # Stores evaluation prompts/instructions
├── prompts/       # Stores user-defined prompts
├── results/       # Stores evaluation results in JSON format
└── src/
    └── intereval/
        ├── main.py      # Main application logic
        └── templates/   # Templates for evaluation prompts
```
## Contributing

Contributions are welcome! Please feel free to submit a pull request or open an issue.
## License

This project is licensed under the MIT License. See the LICENSE file for details.