feat: LLM Judge by monoxgas · Pull Request #112 · dreadnode/sdk

monoxgas · 2025-07-22T22:48:59Z

LLM Judge

Key Changes:

Added a new llm_judge scorer, which uses an inference model to grade outputs against a rubric.

Added a new module llm_judge.py to implement a scoring system using a language model (LLM) to evaluate outputs against a rubric.
Introduced two new classes, JudgeInput and Judgement, to structure the input data and output results respectively.
Implemented the core function llm_judge, which evaluates the input based on the rubric and provides metrics, including score and pass/fail status.
Updated __init__.py to include the new llm_judge scorer in the module's exports, ensuring it can be easily accessed.
Updated pyproject.toml to upgrade the rigging library to version ^3.2.1 for compatibility with the new scoring functionality.
Potential impact: Scorers can now use an LLM to grade outputs against a defined rubric, which could improve evaluation accuracy for criteria that are hard to check with fixed rules.
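As a rough illustration of the pattern described in the changes above, here is a minimal, self-contained sketch of a rubric-based judge. It is not the code from this PR and does not use rigging: the class names JudgeInput and Judgement come from the summary, but their fields, the `generate` callable, the `render_prompt` helper, the JSON reply format, and the `passing_score`, `min_score`, and `max_score` parameters are all assumptions made for the example.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class JudgeInput:
    """Data shown to the judge model (field names are assumed, not from the PR)."""
    output: str
    rubric: str
    expected_output: Optional[str] = None


@dataclass
class Judgement:
    """Structured verdict parsed from the model reply (fields are assumed)."""
    reason: str
    score: float
    passed: bool


def render_prompt(item: JudgeInput) -> str:
    """Hypothetical helper: turn a JudgeInput into a grading prompt."""
    parts = [
        "Grade the output against the rubric.",
        'Reply with JSON only: {"reason": "<text>", "score": <number>}',
        f"Rubric:\n{item.rubric}",
        f"Output:\n{item.output}",
    ]
    if item.expected_output is not None:
        parts.append(f"Expected output:\n{item.expected_output}")
    return "\n\n".join(parts)


def llm_judge(
    generate: Callable[[str], str],
    rubric: str,
    *,
    passing_score: float = 7.0,
    min_score: float = 0.0,
    max_score: float = 10.0,
) -> Callable[..., Judgement]:
    """Build a scorer that asks `generate` (any prompt -> text callable) for a grade."""

    def score(output: str, expected_output: Optional[str] = None) -> Judgement:
        prompt = render_prompt(JudgeInput(output, rubric, expected_output))
        raw = json.loads(generate(prompt))
        # Clamp the score so an out-of-range reply cannot skew the metric.
        value = min(max(float(raw["score"]), min_score), max_score)
        return Judgement(
            reason=str(raw.get("reason", "")),
            score=value,
            passed=value >= passing_score,
        )

    return score


def fake_model(prompt: str) -> str:
    """Stand-in for a real inference call; always returns an out-of-range score."""
    return json.dumps({"reason": "Matches the rubric.", "score": 12})


judge = llm_judge(fake_model, rubric="The answer must name the capital of France.")
result = judge("Paris")
print(result)
```

Passing the model in as a plain callable keeps the sketch independent of any particular inference library; in practice the real scorer presumably delegates that call to rigging.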

This summary was generated with ❤️ by rigging

Add llm_judge scorer

a7c7cb1

dreadnode-renovate-bot (bot) added the area/python label (Changes to Python package configuration and dependencies) on Jul 22, 2025

monoxgas merged commit f6980c4 into main Jul 22, 2025
9 checks passed

monoxgas deleted the feat/llm-judge branch July 22, 2025 22:53