feat: Scorer Improvements by monoxgas · Pull Request #115 · dreadnode/sdk

monoxgas · 2025-07-23T08:23:11Z

Scorer Improvements

Key Changes:

- Added more scorers and migrated TaskInput to the Lookup system
- Enhanced scorer documentation and organization

Added:

- New Lookup system for parameter resolution (dreadnode/lookup.py)
- Classification scorers for zero-shot text classification (dreadnode/scorers/classification.py)
- Format validation scorers for JSON/XML (dreadnode/scorers/format.py)
- Harm detection scorer using transformers (dreadnode/scorers/harm.py)
- Lexical analysis scorers (dreadnode/scorers/lexical.py)
- Operator scorers for combining metrics (dreadnode/scorers/operators.py)
- Comprehensive usage documentation (docs/usage/scorers.mdx)
- Scorers section in the documentation navigation

Changed:

- Migrated all scorers from TaskInput to the Lookup pattern for better parameter handling
- Renamed llm_judge.py to judge.py for consistency
- Enhanced scorer documentation with expanded examples and API details
- Updated imports and exports to reflect the new scorer organization
- Improved error handling and metadata across scorer modules

Removed:

- TaskInput system, replaced by the Lookup pattern (dreadnode/task.py deleted)
- Task documentation page (docs/sdk/task.mdx removed)

Generated Summary:

Added new scorer categories to documentation.
Introduced new scoring methods including:
- detect_refusal_with_zero_shot: Detects refusal to answer using zero-shot classification.
- detect_bias: Scores presence of potentially biased language in data.
- is_json: Validates if a string is properly formatted JSON.
- is_xml: Validates if a string is properly formatted XML.
Updated existing scorer functions:
- Changed references in character_consistency and contains to improve data handling.
- Removed parameters related to PII detection in favor of more relevant scoring.
Code refactoring for better performance and clarity, especially in error handling.
Added comprehensive descriptions and examples in the documentation for all new and altered methods.
These changes improve the functionality of scoring while enhancing the overall documentation clarity and utility.

This summary was generated with ❤️ by rigging
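For readers unfamiliar with format-validation scorers, here is a minimal sketch of what checks like is_json and is_xml could look like. It uses only the standard library. The `Metric` dataclass is a stand-in with an assumed shape (a value plus attributes), not the SDK's actual class, and the real scorers in dreadnode/scorers/format.py may differ in signature and behavior.

```python
import json
import typing as t
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Metric:
    """Stand-in for the SDK's Metric type (assumed shape, not the real class)."""

    value: float
    attributes: t.Dict[str, t.Any] = field(default_factory=dict)


def is_json(data: t.Any) -> Metric:
    """Score 1.0 if str(data) parses as JSON, else 0.0 with the parse error attached."""
    try:
        json.loads(str(data))
    except json.JSONDecodeError as exc:
        return Metric(0.0, {"error": str(exc)})
    return Metric(1.0)


def is_xml(data: t.Any) -> Metric:
    """Score 1.0 if str(data) parses as well-formed XML, else 0.0 with the parse error attached."""
    try:
        ET.fromstring(str(data))
    except ET.ParseError as exc:
        return Metric(0.0, {"error": str(exc)})
    return Metric(1.0)
```

Returning the parser's error message in the attributes keeps a failed check debuggable without raising from the scorer itself.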

Copilot

Pull Request Overview

This PR implements significant improvements to the scorer system by replacing the TaskInput pattern with a new Lookup system and adding many new scoring capabilities. The changes focus on enhancing parameter resolution, expanding scorer functionality, and improving documentation.

- Migrates all scorers from TaskInput to Lookup pattern for better parameter handling
- Adds 5 new scorer modules with comprehensive functionality (classification, format validation, harm detection, lexical analysis, operators)
- Introduces comprehensive scorer documentation with usage examples

Reviewed Changes

Copilot reviewed 21 out of 22 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| dreadnode/task.py | Removes TaskInput class and related functionality |
| dreadnode/lookup.py | Introduces new Lookup system for dynamic parameter resolution |
| dreadnode/scorers/*.py | Updates existing scorers to use Lookup pattern and adds new scorer modules |
| docs/usage/scorers.mdx | Adds comprehensive scorer documentation |
| docs/usage/metrics.mdx | Removes scorer content moved to dedicated scorers page |

Copilot · 2025-07-23T08:25:02Z

 import typing as t
 from difflib import SequenceMatcher

+import litellm


The litellm import should be moved inside the function where it's used or made conditional. This reduces startup time for modules that don't use litellm-based similarity.

Suggested change

-import litellm
+# Removed the top-level import of litellm. It will be imported inside the relevant function(s).
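The deferred-import idea in this comment can be sketched as follows. The function names and the `model` parameter are hypothetical, and the embedding call assumes an OpenAI-style response from `litellm.embedding`, so treat this as an illustration of the pattern rather than the module's actual code.

```python
import math
from difflib import SequenceMatcher


def lexical_similarity(a: str, b: str) -> float:
    """Pure-stdlib similarity; importing this module never touches litellm."""
    return SequenceMatcher(None, a, b).ratio()


def embedding_similarity(a: str, b: str, model: str) -> float:
    """Cosine similarity of embeddings; litellm is imported only when this runs."""
    try:
        import litellm  # deferred: the import cost is paid on first use only
    except ImportError as exc:
        raise RuntimeError(
            "embedding_similarity requires the optional 'litellm' package"
        ) from exc

    # Assumption: the response follows the OpenAI embedding format, where
    # each entry in `data` exposes an "embedding" vector.
    response = litellm.embedding(model=model, input=[a, b])
    vec_a, vec_b = (item["embedding"] for item in response.data)

    dot = sum(x * y for x, y in zip(vec_a, vec_b))
    norm = math.sqrt(sum(x * x for x in vec_a)) * math.sqrt(sum(y * y for y in vec_b))
    return dot / norm if norm else 0.0
```

Callers that only use the lexical path never trigger the import. The trade-off is that a missing dependency surfaces at first call instead of at module import.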

Copilot · 2025-07-23T08:25:04Z

-    if min_length < 0 or max_length < min_length:
-        raise ValueError("Invalid length bounds. Must have 0 <= min <= max.")

    def evaluate(data: t.Any) -> Metric:


The validation logic for min_length and max_length should be moved outside the evaluate function to fail fast during scorer creation rather than during evaluation.
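A minimal sketch of the fail-fast arrangement the comment describes: the bounds check runs once in the factory, before the `evaluate` closure is built. The factory name `length_in_range`, its defaults, and the `Metric` stand-in are assumptions for illustration. Only the check and its error message come from the diff above.

```python
import typing as t
from dataclasses import dataclass, field


@dataclass
class Metric:
    """Stand-in for the SDK's Metric type (assumed shape, not the real class)."""

    value: float
    attributes: t.Dict[str, t.Any] = field(default_factory=dict)


def length_in_range(
    min_length: int = 0, max_length: float = float("inf")
) -> t.Callable[[t.Any], Metric]:
    # Validate once, at scorer creation, so bad bounds fail immediately.
    if min_length < 0 or max_length < min_length:
        raise ValueError("Invalid length bounds. Must have 0 <= min <= max.")

    def evaluate(data: t.Any) -> Metric:
        text_len = len(str(data))
        in_range = min_length <= text_len <= max_length
        return Metric(1.0 if in_range else 0.0, {"length": text_len})

    return evaluate
```

With this layout, `length_in_range(5, 1)` raises `ValueError` before any data is scored.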

Copilot · 2025-07-23T08:25:04Z

    def evaluate(data: t.Any) -> Metric:
+        nonlocal target_length
+
+        target_length = int(resolve_lookup(target_length))
+        if target_length < 0:
+            raise ValueError("Target length must be non-negative.")
+
        text = str(data)
        text_len = len(text)



The validation logic for target_length should be moved outside the evaluate function to fail fast during scorer creation rather than during evaluation.
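One complication is that the diff resolves `target_length` through `resolve_lookup` inside `evaluate`, which suggests the value may not be known until evaluation. The sketch below assumes that is the case: it validates literal values at creation and defers the check only for lookups. `Lookup`, `resolve`, the `context` argument, and the scoring formula are all hypothetical stand-ins, not the API in dreadnode/lookup.py.

```python
import typing as t
from dataclasses import dataclass, field


@dataclass
class Metric:
    """Stand-in for the SDK's Metric type (assumed shape, not the real class)."""

    value: float
    attributes: t.Dict[str, t.Any] = field(default_factory=dict)


@dataclass(frozen=True)
class Lookup:
    """Hypothetical placeholder for a value resolved from context at evaluation time."""

    key: str


def resolve(value: t.Any, context: t.Mapping[str, t.Any]) -> t.Any:
    """Hypothetical resolver: look up a Lookup in context, pass literals through."""
    return context[value.key] if isinstance(value, Lookup) else value


def _checked(target: int) -> int:
    if target < 0:
        raise ValueError("Target length must be non-negative.")
    return target


def length_target(
    target_length: t.Union[int, Lookup],
) -> t.Callable[..., Metric]:
    # Literal values can be validated immediately (fail fast at creation).
    if not isinstance(target_length, Lookup):
        _checked(int(target_length))

    def evaluate(data: t.Any, context: t.Optional[t.Mapping[str, t.Any]] = None) -> Metric:
        # Resolve into a local instead of rebinding the closure variable,
        # so a Lookup is resolved afresh on every call.
        target = _checked(int(resolve(target_length, context or {})))
        text_len = len(str(data))
        # Illustrative scoring only: 1.0 at the target, falling off linearly.
        score = max(0.0, 1.0 - abs(text_len - target) / max(target, 1))
        return Metric(score, {"length": text_len, "target": target})

    return evaluate
```

Keeping the resolved value local also avoids the `nonlocal` rebinding shown in the diff, which would replace a lookup with its first resolved value.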

Copilot · 2025-07-23T08:25:05Z

+        return Metric(value=inverted_value, attributes=original_metric.attributes)
+
+    name = name or f"{scorer.name}_inverted"
+    return Scorer.from_callable(evaluate, name=name)  # type: ignore [return-value]


The type ignore comment suggests a type mismatch. Consider fixing the return type annotation or the function signature to avoid needing type ignore.

Suggested change

-    return Scorer.from_callable(evaluate, name=name)  # type: ignore [return-value]
+    return t.cast(ScorerT, Scorer.from_callable(evaluate, name=name))
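To show the suggested `t.cast` in context, here is a self-contained sketch of an inverting operator. The `Metric` and `Scorer` classes are simplified stand-ins, and the `ScorerT` definition, the `invert` signature, and the `max_value` parameter are assumptions. Only the lines that build and return the wrapped scorer mirror the diff.

```python
import typing as t
from dataclasses import dataclass, field


@dataclass
class Metric:
    """Stand-in for the SDK's Metric type (assumed shape, not the real class)."""

    value: float
    attributes: t.Dict[str, t.Any] = field(default_factory=dict)


class Scorer:
    """Simplified stand-in for the SDK's Scorer."""

    def __init__(self, func: t.Callable[[t.Any], Metric], name: str) -> None:
        self.func = func
        self.name = name

    @classmethod
    def from_callable(cls, func: t.Callable[[t.Any], Metric], name: str) -> "Scorer":
        return cls(func, name)

    def __call__(self, data: t.Any) -> Metric:
        return self.func(data)


# Assumed definition; the real ScorerT may be declared differently.
ScorerT = t.TypeVar("ScorerT", bound=Scorer)


def invert(scorer: ScorerT, max_value: float = 1.0, name: t.Optional[str] = None) -> ScorerT:
    def evaluate(data: t.Any) -> Metric:
        original_metric = scorer(data)
        inverted_value = max(0.0, max_value - original_metric.value)
        return Metric(value=inverted_value, attributes=original_metric.attributes)

    name = name or f"{scorer.name}_inverted"
    # from_callable returns a plain Scorer; the cast tells the type checker
    # to treat it as ScorerT. It performs no conversion at runtime.
    return t.cast(ScorerT, Scorer.from_callable(evaluate, name=name))
```

Note that `t.cast` does nothing at runtime: it is as unchecked as the `type: ignore` it replaces, though it does state the intended type explicitly. If `invert` can never return a subclass instance, annotating the return type as `Scorer` would remove the mismatch itself.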

Added more scorers. Cleaned TaskInput and migrated to Lookups. New docs.

3d1d70e

monoxgas requested a review from Copilot July 23, 2025 08:23

dreadnode-renovate-bot Bot added the area/docs (Changes to documentation and guides) and type/docs (Documentation updates and improvements) labels Jul 23, 2025

Copilot AI reviewed Jul 23, 2025

monoxgas added 3 commits July 23, 2025 17:41

Additional fixes from feedback

6d73275

Merge remote-tracking branch 'origin/main' into feat/scorer-improvements

b3b0962

Docs updates

696759d

dreadnode-renovate-bot Bot added the area/pre-commit (Changes made to pre-commit hooks) label Jul 24, 2025

monoxgas added 2 commits July 24, 2025 04:46

Fixing type errors

e633832

Fix type errors

62bc0cf

monoxgas merged commit 6b7ee52 into main Jul 24, 2025
8 checks passed

monoxgas deleted the feat/scorer-improvements branch July 24, 2025 10:55

monoxgas merged 6 commits into main from feat/scorer-improvements
