# ModelArena 2

A lightweight Python framework for testing Large Language Model APIs. Send prompts, validate responses against customizable criteria, and generate detailed reports.

## Features

- **Multiple Validation Types**: Contains, regex, length limits, JSON validation, exact match, and custom validators
- **OpenAI-Compatible**: Works with OpenAI, Azure OpenAI, and any OpenAI-compatible API
- **Flexible Configuration**: Define tests in JSON files or programmatically
- **Detailed Reporting**: HTML reports with a visual dashboard, plus console and JSON output
- **Environment Support**: Loads configuration from `.env` files

## Installation

```shell
pip install -r requirements.txt
```

## Quick Start

1. Create a `.env` file with your API key:

   ```shell
   OPENAI_API_KEY=sk-your-key-here
   ```

2. Run the example tests:

   ```shell
   python llm_tester.py test_cases.json
   ```

## Usage

### Command Line

```shell
# Basic usage (generates report.html by default)
python llm_tester.py test_cases.json

# Specify model
python llm_tester.py test_cases.json --model gpt-4o

# Custom API endpoint
python llm_tester.py test_cases.json --base-url https://your-api.com/v1

# Custom output file
python llm_tester.py test_cases.json -o results.html

# JSON format instead of HTML
python llm_tester.py test_cases.json --format json -o report.json

# Verbose logging
python llm_tester.py test_cases.json -v
```

### Programmatic Usage

```python
from llm_tester import LLMClient, TestRunner, ReportGenerator

# Initialize with custom settings
client = LLMClient(
    api_key="sk-...",
    base_url="https://api.openai.com/v1",
    model="gpt-4o"
)

# Run tests from file
runner = TestRunner(client)
report = runner.run_from_file("test_cases.json")

# Or run tests programmatically
test_cases = [
    {
        "name": "Greeting test",
        "prompt": "Say hello",
        "criteria": {"contains": ["hello"]}
    }
]
report = runner.run_tests(test_cases)

# Generate reports
ReportGenerator.to_console(report)
ReportGenerator.to_html(report, "report.html")  # HTML dashboard
ReportGenerator.to_json(report, "report.json")  # JSON export
```

## Test Case Format

Tests are defined in JSON with the following structure:

```json
{
  "test_cases": [
    {
      "name": "Test name",
      "prompt": "The prompt to send",
      "system_prompt": "Optional system prompt",
      "criteria": {
        "contains": ["required", "words"],
        "not_contains": ["forbidden"],
        "regex": "pattern.*match",
        "min_length": 10,
        "max_length": 500,
        "equals": "exact match",
        "json_valid": true
      }
    }
  ]
}
```
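A file in this shape can be parsed with the standard library alone. This sketch only loads and inspects the documented structure; it does not call the framework:

```python
import json

# A minimal test-case document in the documented shape, inlined for illustration.
raw = """
{
  "test_cases": [
    {
      "name": "Greeting test",
      "prompt": "Say hello",
      "criteria": {"contains": ["hello"]}
    }
  ]
}
"""

data = json.loads(raw)
for case in data["test_cases"]:
    # Each case pairs a prompt with one or more validation criteria.
    print(case["name"], "->", sorted(case["criteria"].keys()))
```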

## Validation Criteria

| Criterion | Description | Example |
|---|---|---|
| `contains` | Response must contain all of the strings (case-insensitive) | `["hello", "world"]` |
| `not_contains` | Response must not contain any of the strings | `["error", "fail"]` |
| `regex` | Response must match the regex pattern | `"\\d{3}-\\d{4}"` |
| `min_length` | Minimum response length in characters | `50` |
| `max_length` | Maximum response length in characters | `1000` |
| `equals` | Exact match (case-insensitive, trimmed) | `"yes"` |
| `json_valid` | Response must be valid JSON (handles markdown code blocks) | `true` |
| `custom` | Custom validator function (programmatic only) | `lambda r: (bool, reason)` |
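Per the table, a `custom` validator is a callable that takes the response text and returns a `(passed, reason)` tuple. A standalone sketch of one such validator; the sentence-count rule itself is invented for illustration and is not part of the framework:

```python
def max_three_sentences(response: str) -> tuple[bool, str]:
    """Hypothetical custom validator: pass only terse answers."""
    count = sum(response.count(ch) for ch in ".!?")
    if count <= 3:
        return True, "ok"
    return False, f"expected at most 3 sentences, found {count}"

# Programmatically it would presumably go under criteria, e.g.:
# {"name": "...", "prompt": "...", "criteria": {"custom": max_three_sentences}}
print(max_three_sentences("Yes. It works."))  # → (True, 'ok')
```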

## Environment Variables

| Variable | Description | Default |
|---|---|---|
| `OPENAI_API_KEY` | API key for authentication | - |
| `LLM_API_KEY` | Alternative API key variable | - |
| `LLM_BASE_URL` | API base URL | `https://api.openai.com/v1` |
| `LLM_MODEL` | Model to use | `gpt-4o` |
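The fallbacks in the table can be sketched as plain dictionary lookups. Note that the precedence between `OPENAI_API_KEY` and `LLM_API_KEY` is an assumption here, not something the table specifies:

```python
def resolve_config(env: dict[str, str]) -> dict[str, str]:
    """Sketch of the documented defaults; key-variable precedence is assumed."""
    return {
        "api_key": env.get("OPENAI_API_KEY") or env.get("LLM_API_KEY", ""),
        "base_url": env.get("LLM_BASE_URL", "https://api.openai.com/v1"),
        "model": env.get("LLM_MODEL", "gpt-4o"),
    }

# With only the alternative key set, the defaults fill in the rest.
print(resolve_config({"LLM_API_KEY": "sk-test"}))
```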

## Sample Output

### HTML Report

The default HTML report includes:

- Summary cards showing total tests, passed, failed, and pass rate
- Detailed results table with prompts, responses, and failure reasons
- Color-coded status indicators
- Responsive design

### Console Output

```text
============================================================
TEST REPORT
============================================================
Start: 2025-01-25T10:30:00
End:   2025-01-25T10:30:15

Total: 8 | Passed: 7 | Failed: 1
Pass Rate: 87.5%

------------------------------------------------------------
FAILURES:
------------------------------------------------------------

[FAIL] Math calculation
  Prompt: What is 15 + 27? Reply with just the number....
  Response: The answer is 42....
  - Response doesn't match pattern: '^\d+$'

============================================================
```
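The failure shown above comes from the `regex` criterion; the check amounts to ordinary `re` matching (this is a sketch of the idea, not the framework's code):

```python
import re

pattern = r"^\d+$"            # digits only, anchored at both ends
response = "The answer is 42."

# The anchored pattern rejects anything besides a bare number.
if not re.search(pattern, response):
    print(f"Response doesn't match pattern: '{pattern}'")
```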

## License

MIT
