Intereval: The Interactive LLM Evaluation Tool

Intereval is a command-line tool for evaluating and comparing Large Language Models (LLMs) available through Ollama. It offers both an interactive, guided interface and a non-interactive one, streamlining the process of testing prompts against different models and evaluating their responses.

Features

  • Interactive Mode: A user-friendly, guided experience for setting up evaluations.
  • Non-Interactive Mode: Run evaluations using command-line arguments for easy scripting and automation.
  • Configuration Files: Save and reuse evaluation setups in YAML format.
  • Two Evaluation Modes:
    • One-Prompt-Many-Models: Test a single prompt against multiple LLMs.
    • Many-Prompts-One-Model: Test multiple prompts against a single LLM.
  • Flexible Evaluation: Evaluate responses based on an expected "golden" response or a set of instructions (rubric).
  • Rich Output: Presents evaluation results in a clean, readable table format.
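To make the evaluation modes concrete, the one-prompt-many-models flow can be pictured as a simple loop over models. The sketch below is illustrative only, not Intereval's actual implementation: `query_model` and `judge` are hypothetical stand-ins for the two Ollama calls (one to generate a response, one to evaluate it against the rubric).

```python
# Illustrative sketch only -- not Intereval's actual implementation.
# `query_model` and `judge` are hypothetical stand-ins for the calls
# made to Ollama (one to generate, one to evaluate).
def run_one_prompt_many_models(prompt, models, query_model, judge):
    """Return one result row per model: its response plus a verdict."""
    results = []
    for model in models:
        response = query_model(model, prompt)
        results.append(
            {"model": model, "response": response, "verdict": judge(response)}
        )
    return results


# Usage with stubbed functions (no Ollama required):
rows = run_one_prompt_many_models(
    "What is the capital of France?",
    ["llama3", "qwen:7b"],
    query_model=lambda model, prompt: f"{model}: Paris",
    judge=lambda response: "Paris" in response,
)
```

The many-prompts-one-model mode is the same loop with the roles of prompt and model swapped.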

Getting Started

Prerequisites

  • Python 3.8+
  • Docker (optional, for containerized execution)
  • Ollama installed and running, with at least one model already downloaded.

Installation

  1. Clone the repository:

    git clone https://github.com/avedave/intereval.git
    cd intereval
  2. Create and activate a virtual environment:

    # On Windows, you may need to use 'python' instead of 'python3'
    python3 -m venv .venv
    source .venv/bin/activate
  3. Install the dependencies:

    pip install -r requirements.txt

Usage

Intereval can be run in three main ways: interactive mode, non-interactive mode, and from a configuration file.

Interactive Mode

To start the interactive session, run the following command:

python -m src.intereval.main

The tool will guide you through selecting the evaluation mode, providing prompts, choosing models, and setting up the evaluation criteria.

Non-Interactive Mode

For quick evaluations, you can use command-line arguments.

Example: One prompt against multiple models

python -m src.intereval.main \
  --mode one-prompt-many-models \
  --prompt "What is the capital of France?" \
  --models llama3 qwen:7b \
  --instructions "Is the answer Paris?" \
  --eval-model llama3

Using a Configuration File

You can also run an evaluation from a YAML configuration file.

  1. Create a config.yaml file (or let the interactive mode generate one for you in the config/ directory).
  2. Run the evaluation:
    python -m src.intereval.main --config /path/to/your/config.yaml
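For reference, a config for the non-interactive example above might look roughly like this. The field names here are illustrative guesses, not a confirmed schema; a file generated by interactive mode is the authoritative format.

```yaml
# Illustrative only -- field names are assumed, not confirmed.
# Generate a real file via interactive mode to see the exact schema.
mode: one-prompt-many-models
prompt: "What is the capital of France?"
models:
  - llama3
  - qwen:7b
instructions: "Is the answer Paris?"
eval_model: llama3
```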

Docker

You can build and run Intereval using Docker.

  1. Build the Docker image:
    docker build -t intereval .
  2. Run the Docker container:
    docker run -it --rm --network=host intereval
    Note: --network=host lets the container reach the Ollama service running on the host machine. This works on Linux; on Docker Desktop for macOS or Windows, you may need to point the tool at http://host.docker.internal:11434 instead.

Project Structure

/
├── config/              # Stores generated YAML configuration files
├── eval_prompts/        # Stores evaluation prompts/instructions
├── prompts/             # Stores user-defined prompts
├── results/             # Stores evaluation results in JSON format
└── src/
    └── intereval/
        ├── main.py      # Main application logic
        └── templates/   # Templates for evaluation prompts

Contributing

Contributions are welcome! Please feel free to submit a pull request or open an issue.

License

This project is licensed under the MIT License. See the LICENSE file for details.

