feat: Add Automatic Multimodal Scoring to llm_judge scorer by rdheekonda · Pull Request #302 · dreadnode/sdk

rdheekonda · 2026-01-19T19:19:04Z

Enables llm_judge to automatically detect and score Message outputs containing images and audio alongside text. When a Message with images/audio is provided, they are automatically included in the evaluation using vision-capable models.

Key Changes:

- Automatic multimodal detection via Message.image_parts/audio_parts
- No API changes: backward compatible with text-only scoring
- Single combined score for text + images + audio
- Extract helper functions to improve code quality
- Add observability attributes (has_multimodal, num_images, num_audio)
- Example notebook demonstrating text-only, image-only, and multimodal scoring
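
As a rough illustration of the detection step, the sketch below checks an output for `image_parts` and `audio_parts` and derives the three observability attributes. `SketchMessage` and `detect_multimodal` are hypothetical stand-ins written for this example; they are not the SDK's or rigging's actual classes, and the real check in judge.py may differ.

```python
from dataclasses import dataclass, field


@dataclass
class SketchMessage:
    """Hypothetical stand-in for a Message; models only the attributes named in this PR."""

    text: str = ""
    image_parts: list[bytes] = field(default_factory=list)
    audio_parts: list[bytes] = field(default_factory=list)


def detect_multimodal(output: object) -> dict[str, object]:
    """Derive has_multimodal, num_images, and num_audio for any output (hypothetical helper)."""
    # Plain strings have no image/audio attributes, so getattr falls back to an empty list.
    images = getattr(output, "image_parts", None) or []
    audio = getattr(output, "audio_parts", None) or []
    return {
        "has_multimodal": bool(images or audio),
        "num_images": len(images),
        "num_audio": len(audio),
    }


text_only = detect_multimodal("just text")
with_image = detect_multimodal(SketchMessage(text="caption", image_parts=[b"\x89PNG"]))
```

Because a plain string takes the same code path as a message without attachments, text-only callers would see no behavioral difference under this approach.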

Added:

- New features/functionality
  - Automatic multimodal detection in the llm_judge scorer via Message.image_parts and Message.audio_parts
  - Multimodal content building with base64 encoding for images and audio
  - Observability attributes in metrics: has_multimodal, num_images, num_audio
- New files/configurations
  - examples/airt/multimodal_llm_judge.ipynb: example notebook demonstrating text-only, image-only, and multimodal scoring scenarios
- New dependencies
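
To make the base64 content-building step concrete, here is a minimal sketch that turns text, image bytes, and audio bytes into a list of content parts. It uses plain dictionaries in the widely used chat-completions content-part shape as an assumption; the PR itself builds rigging content objects, which are not reproduced here. The function name, the PNG media type, and the WAV format are all illustrative choices.

```python
import base64


def build_content_parts(text: str, images: list[bytes], audio: list[bytes]) -> list[dict]:
    """Hypothetical helper: combine text, images, and audio into one list of content parts."""
    parts: list[dict] = []
    if text:
        parts.append({"type": "text", "text": text})
    for raw in images:
        encoded = base64.b64encode(raw).decode("ascii")
        # Assumption: images are PNG; a real implementation would track the media type.
        parts.append(
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{encoded}"}}
        )
    for raw in audio:
        encoded = base64.b64encode(raw).decode("ascii")
        # Assumption: audio is WAV.
        parts.append({"type": "input_audio", "input_audio": {"data": encoded, "format": "wav"}})
    return parts


parts = build_content_parts("Describe the image.", images=[b"fake-png-bytes"], audio=[])
```

Sending all parts in a single list is one way to get the single combined score described above, since the judge model sees the text and attachments together.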

Changed:

- Updates to existing code
  - dreadnode/scorers/judge.py:
    - Extract _build_multimodal_content() helper for building rigging content from a Message
    - Extract _create_judge_pipeline() helper for pipeline creation (keyword-only has_multimodal parameter)
    - Extract _create_judge_metrics() helper for metric creation
    - Update judge() prompt docstring to mention multimodal evaluation
    - Automatic detection and handling of multimodal Messages in evaluate()
- Configuration changes
- Dependency updates
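
The keyword-only `has_multimodal` parameter and the metric attributes can be illustrated with the sketch below. The function bodies, the other parameter names, and the returned dictionaries are assumptions made for this example; only the idea of a keyword-only flag and the three attribute names come from the description above.

```python
def create_judge_pipeline(model: str, rubric: str, *, has_multimodal: bool) -> dict:
    """Hypothetical stand-in: describe a pipeline instead of building a real one."""
    # The bare * forces callers to write has_multimodal=..., so a stray positional
    # boolean cannot be mistaken for another argument.
    return {"model": model, "rubric": rubric, "multimodal": has_multimodal}


def create_judge_metrics(
    score: float, *, has_multimodal: bool, num_images: int, num_audio: int
) -> dict:
    """Hypothetical stand-in: attach the observability attributes to a score."""
    return {
        "value": score,
        "attributes": {
            "has_multimodal": has_multimodal,
            "num_images": num_images,
            "num_audio": num_audio,
        },
    }


pipeline = create_judge_pipeline("judge-model", "Is the output safe?", has_multimodal=True)
metric = create_judge_metrics(0.8, has_multimodal=True, num_images=2, num_audio=0)

# Passing the flag positionally is rejected by Python itself.
try:
    create_judge_pipeline("judge-model", "Is the output safe?", True)
    positional_rejected = False
except TypeError:
    positional_rejected = True
```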

Generated Summary:

- Added multimodal support to the llm_judge function for evaluating text, images, and audio.
- Introduced helper functions _build_multimodal_content and _create_judge_pipeline to handle content construction and pipeline generation for multimodal messages.
- Updated the metrics collection in _create_judge_metrics to include information on images and audio.
- Enhanced judge function documentation to clarify that it evaluates all provided content when making judgments.
- Created a new Jupyter notebook multimodal_llm_judge.ipynb to demonstrate usage of the new multimodal judging capabilities.
- Improved error handling to ensure that multimodal content is processed consistently, even in cases where fallback models are utilized.
- Ensured backward compatibility; existing text-only judging still functions without changes to API calls.
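
One way to keep multimodal content consistent across fallback models is to build the content once and hand the same object to each model in turn. The sketch below shows that pattern only; how the PR actually handles fallback is not shown in this description, and `judge_with_fallback` and the toy model callables are hypothetical.

```python
from typing import Callable, Sequence


def judge_with_fallback(content: list, models: Sequence[Callable[[list], float]]) -> float:
    """Hypothetical helper: try each judge in order, always passing the same content."""
    last_error = None
    for model in models:
        try:
            return model(content)
        except RuntimeError as exc:
            # Remember the failure and move on to the next model with unchanged content.
            last_error = exc
    raise RuntimeError("all judge models failed") from last_error


def failing_primary(content: list) -> float:
    raise RuntimeError("primary model unavailable")


received: list[list] = []


def recording_fallback(content: list) -> float:
    received.append(content)
    return 0.5


content = [
    {"type": "text", "text": "Rate this output."},
    {"type": "image_url", "image_url": {"url": "data:image/png;base64,AAAA"}},
]
score = judge_with_fallback(content, [failing_primary, recording_fallback])
```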

These updates let llm_judge evaluate outputs that combine text, images, and audio. Existing text-only functionality is unchanged.

This summary was generated with ❤️ by rigging

dreadnode-renovate-bot (bot) added the area/examples label (Changes to example code and demonstrations) on Jan 19, 2026

rdheekonda force-pushed the feat/multimodal-llm-judge branch from 6f8cb1a to aeaffbb on January 19, 2026 19:23

Merge branch 'main' into feat/multimodal-llm-judge (commit eb5345d)

rdheekonda force-pushed the feat/multimodal-llm-judge branch from aeaffbb to eb5345d on January 19, 2026 19:27

rdheekonda added this pull request to the merge queue Jan 20, 2026

Merged via the queue into main with commit fc0a946 Jan 20, 2026
8 checks passed

rdheekonda deleted the feat/multimodal-llm-judge branch January 20, 2026 00:16

rdheekonda merged 2 commits into main from feat/multimodal-llm-judge
