
feat: Scorer Improvements #115

Merged
monoxgas merged 6 commits into main from feat/scorer-improvements on Jul 24, 2025

@monoxgas commented Jul 23, 2025

Scorer Improvements

Key Changes:

  • Added more scorers and migrated from TaskInput to the Lookup system
  • Enhanced scorer documentation and organization

Added:

  • New Lookup system for parameter resolution (dreadnode/lookup.py)
  • Classification scorers for zero-shot text classification (dreadnode/scorers/classification.py)
  • Format validation scorers for JSON/XML (dreadnode/scorers/format.py)
  • Harm detection scorer using transformers (dreadnode/scorers/harm.py)
  • Lexical analysis scorers (dreadnode/scorers/lexical.py)
  • Operator scorers for combining metrics (dreadnode/scorers/operators.py)
  • Comprehensive usage documentation (docs/usage/scorers.mdx)
  • A Scorers section in the documentation navigation
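The PR describes the operator scorers only as "combining metrics". As a hedged illustration of what that can mean, here is a minimal sketch of two hypothetical operators, a weighted average and a pass/fail threshold. The names `weighted_avg` and `threshold`, and the simplification that a scorer is any callable returning a float, are assumptions; they are not taken from dreadnode/scorers/operators.py.

```python
import typing as t

# Hypothetical simplification: a "scorer" here is any callable mapping data to a float.
ScoreFn = t.Callable[[t.Any], float]


def weighted_avg(*pairs: tuple[ScoreFn, float]) -> ScoreFn:
    """Combine scorers into one score: sum(w_i * s_i(data)) / sum(w_i)."""
    if not pairs:
        raise ValueError("At least one (scorer, weight) pair is required.")
    total_weight = sum(w for _, w in pairs)
    if total_weight <= 0:
        raise ValueError("Weights must sum to a positive number.")

    def evaluate(data: t.Any) -> float:
        # Every child scorer sees the same data; results are weighted and normalized.
        return sum(fn(data) * w for fn, w in pairs) / total_weight

    return evaluate


def threshold(fn: ScoreFn, *, gte: float) -> ScoreFn:
    """Turn a continuous score into a pass/fail score of 1.0 or 0.0."""

    def evaluate(data: t.Any) -> float:
        return 1.0 if fn(data) >= gte else 0.0

    return evaluate
```

For example, combining a non-empty check (weight 3) with a "at most 5 characters" check (weight 1) scores "hello world" at 0.75 and "hi" at 1.0; wrapping the result in `threshold(..., gte=0.8)` turns it into a pass/fail score.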

Changed:

  • Migrated all scorers from TaskInput to the Lookup pattern for better parameter handling
  • Renamed llm_judge.py to judge.py for consistency
  • Enhanced scorer documentation with expanded examples and API details
  • Updated imports and exports to reflect new scorer organization
  • Improved error handling and metadata across scorer modules
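The Lookup API itself is not shown in this PR page. As a rough sketch of the idea, assuming a lookup is a named placeholder that a scorer resolves when it evaluates rather than when it is created, the pattern might look like this. `Lookup`, `current_inputs`, and `mentions_reference` are hypothetical stand-ins; `resolve_lookup` mirrors the name used in the reviewed code, but its body here is an assumption, not the contents of dreadnode/lookup.py.

```python
import contextvars
import typing as t
from dataclasses import dataclass

# Hypothetical scope holding the current task's inputs; a real Lookup system
# would read from whatever context dreadnode maintains during a run.
current_inputs: contextvars.ContextVar[dict[str, t.Any]] = contextvars.ContextVar("current_inputs")


@dataclass(frozen=True)
class Lookup:
    """Hypothetical placeholder naming an input to fetch at evaluation time."""

    name: str

    def resolve(self) -> t.Any:
        try:
            return current_inputs.get()[self.name]
        except LookupError as e:  # unset context var, or KeyError for a missing name
            raise LookupError(f"No input named {self.name!r} in the current scope") from e


def resolve_lookup(value: t.Any) -> t.Any:
    # Plain values pass through unchanged; Lookup placeholders are resolved now.
    return value.resolve() if isinstance(value, Lookup) else value


def mentions_reference(reference: t.Any) -> t.Callable[[t.Any], float]:
    """Hypothetical scorer: 1.0 if the resolved reference appears in the output."""

    def evaluate(data: t.Any) -> float:
        needle = str(resolve_lookup(reference))  # resolved on every call, not at creation
        return 1.0 if needle in str(data) else 0.0

    return evaluate
```

With this shape, one scorer object can be reused across tasks: while `current_inputs` holds `{"secret": "abc"}` it checks for "abc", and after the context changes the next evaluation checks for the new value. A value bound once at scorer creation cannot adapt that way.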

Removed:

  • TaskInput system replaced by Lookup pattern (dreadnode/task.py deleted)
  • Task documentation page (docs/sdk/task.mdx removed)

Generated Summary:

  • Added new scorer categories to documentation.
  • Introduced new scoring methods including:
    • detect_refusal_with_zero_shot: Detects refusal to answer using zero-shot classification.
    • detect_bias: Scores presence of potentially biased language in data.
    • is_json: Validates if a string is properly formatted JSON.
    • is_xml: Validates if a string is properly formatted XML.
  • Updated existing scorer functions:
    • Changed references in character_consistency and contains to improve data handling.
    • Removed parameters related to PII detection in favor of more relevant scoring.
  • Refactored code for performance and clarity, especially in error handling.
  • Added comprehensive descriptions and examples in the documentation for all new and changed methods.
  • Together, these changes extend scoring functionality and make the documentation clearer and more useful.
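The summary names `is_json` and `is_xml` but not how they decide validity. Below is a minimal stdlib sketch, assuming "properly formatted" means "parses without error" and that the scores are 1.0 or 0.0. These function bodies are illustrative only; the real scorers in dreadnode/scorers/format.py may behave differently and return richer metrics.

```python
import json
import xml.etree.ElementTree as ET


def is_json(text: str) -> float:
    """Hypothetical scorer body: 1.0 if text parses as JSON, else 0.0."""
    try:
        json.loads(text)
    except (json.JSONDecodeError, TypeError):  # malformed JSON, or a non-string input
        return 0.0
    return 1.0


def is_xml(text: str) -> float:
    """Hypothetical scorer body: 1.0 if text is well-formed XML, else 0.0."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return 0.0
    return 1.0
```

Two design notes: `json.loads` accepts bare scalars such as `42`, so this `is_json("42")` returns 1.0 (a stricter variant could also require an object or array), and the Python documentation warns that the stdlib XML parsers are not hardened against maliciously constructed data, which matters when scoring untrusted model output.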

This summary was generated with ❤️ by rigging

@monoxgas requested a review from Copilot on July 23, 2025 08:23
@dreadnode-renovate-bot (bot) added the labels area/docs (Changes to documentation and guides) and type/docs (Documentation updates and improvements) on Jul 23, 2025

Copilot AI left a comment


Pull Request Overview

This PR implements significant improvements to the scorer system by replacing the TaskInput pattern with a new Lookup system and adding many new scoring capabilities. The changes focus on enhancing parameter resolution, expanding scorer functionality, and improving documentation.

  • Migrates all scorers from TaskInput to Lookup pattern for better parameter handling
  • Adds new scorer modules (classification, format validation, harm detection, lexical analysis, operators)
  • Introduces comprehensive scorer documentation with usage examples

Reviewed Changes

Copilot reviewed 21 out of 22 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| dreadnode/task.py | Removes TaskInput class and related functionality |
| dreadnode/lookup.py | Introduces new Lookup system for dynamic parameter resolution |
| dreadnode/scorers/*.py | Updates existing scorers to use Lookup pattern and adds new scorer modules |
| docs/usage/scorers.mdx | Adds comprehensive scorer documentation |
| docs/usage/metrics.mdx | Removes scorer content moved to dedicated scorers page |

Comment thread dreadnode/scorers/similarity.py Outdated
```python
import typing as t
from difflib import SequenceMatcher

import litellm
```

Copilot AI Jul 23, 2025


The litellm import should be moved inside the function where it's used, or made conditional. This reduces import time for code that imports this module but never uses litellm-based similarity.

Suggested change
```diff
-import litellm
+# Removed the top-level import of litellm. It will be imported inside the relevant function(s).
```

Copilot uses AI. Check for mistakes.
Comment thread dreadnode/scorers/similarity.py
Comment thread dreadnode/scorers/sentiment.py Outdated
Comment thread dreadnode/scorers/lexical.py Outdated
```python
if min_length < 0 or max_length < min_length:
    raise ValueError("Invalid length bounds. Must have 0 <= min <= max.")

def evaluate(data: t.Any) -> Metric:
```

Copilot AI Jul 23, 2025


The validation logic for min_length and max_length should be moved outside the evaluate function to fail fast during scorer creation rather than during evaluation.
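A sketch of the fail-fast shape the comment describes, assuming the bounds are plain ints known when the scorer is built. `Metric` is a hypothetical stand-in for dreadnode's type, and `length_in_range` with its 1.0/0.0 scoring rule is illustrative, not the code in dreadnode/scorers/lexical.py.

```python
import typing as t
from dataclasses import dataclass, field


@dataclass
class Metric:
    # Hypothetical stand-in for dreadnode's Metric type.
    value: float
    attributes: dict[str, t.Any] = field(default_factory=dict)


def length_in_range(min_length: int = 0, max_length: int = 1_000) -> t.Callable[[t.Any], Metric]:
    # Validate once, when the scorer is built, so bad bounds fail immediately.
    if min_length < 0 or max_length < min_length:
        raise ValueError("Invalid length bounds. Must have 0 <= min <= max.")

    def evaluate(data: t.Any) -> Metric:
        # Evaluation only measures; the bounds are already known to be valid.
        text_len = len(str(data))
        in_range = min_length <= text_len <= max_length
        return Metric(value=1.0 if in_range else 0.0, attributes={"length": text_len})

    return evaluate
```

With this structure, `length_in_range(5, 2)` raises during setup instead of partway through a run.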

Comment on lines 120 to 129
```python
def evaluate(data: t.Any) -> Metric:
    nonlocal target_length

    target_length = int(resolve_lookup(target_length))
    if target_length < 0:
        raise ValueError("Target length must be non-negative.")

    text = str(data)
    text_len = len(text)
```


Copilot AI Jul 23, 2025


The validation logic for target_length should be moved outside the evaluate function to fail fast during scorer creation rather than during evaluation.
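This suggestion fully applies only to static values: if `target_length` may be a Lookup, its value is not known until evaluation. The sketch below assumes a zero-argument callable can stand in for a Lookup; it validates static values at creation and resolved values at each evaluation. It also resolves into a local variable instead of rebinding `target_length` with `nonlocal` as the quoted code does. With `nonlocal`, the first resolved int replaces the lookup in the closure, so if lookups are meant to resolve per evaluation, later calls would silently reuse the first value. `Metric`, `length_target`, and the scoring formula are hypothetical.

```python
import typing as t
from dataclasses import dataclass, field


@dataclass
class Metric:
    # Hypothetical stand-in for dreadnode's Metric type.
    value: float
    attributes: dict[str, t.Any] = field(default_factory=dict)


# Hypothetical simplification: a zero-argument callable plays the role of a Lookup.
LengthArg = t.Union[int, t.Callable[[], int]]


def _check_target(value: int) -> int:
    if value < 0:
        raise ValueError("Target length must be non-negative.")
    return value


def length_target(target_length: LengthArg) -> t.Callable[[t.Any], Metric]:
    # Static values are checked immediately (fail fast at creation).
    if not callable(target_length):
        _check_target(int(target_length))

    def evaluate(data: t.Any) -> Metric:
        # Resolve into a local on every call; the closure keeps the original
        # (possibly dynamic) argument, so it is re-read on each evaluation.
        raw = target_length() if callable(target_length) else target_length
        target = _check_target(int(raw))
        text_len = len(str(data))
        # 1.0 at the exact target, falling linearly with relative distance.
        score = 1.0 - abs(text_len - target) / max(text_len, target, 1)
        return Metric(value=score, attributes={"length": text_len, "target": target})

    return evaluate
```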

Comment thread dreadnode/scorers/readability.py
```python
    return Metric(value=inverted_value, attributes=original_metric.attributes)

name = name or f"{scorer.name}_inverted"
return Scorer.from_callable(evaluate, name=name) # type: ignore [return-value]
```

Copilot AI Jul 23, 2025


The type ignore comment suggests a type mismatch. Consider fixing the return type annotation or the function signature to avoid needing type ignore.

Suggested change
```diff
-return Scorer.from_callable(evaluate, name=name) # type: ignore [return-value]
+return t.cast(ScorerT, Scorer.from_callable(evaluate, name=name))
```

Comment thread dreadnode/scorers/__init__.py Outdated
Comment thread docs/usage/scorers.mdx Outdated
@dreadnode-renovate-bot (bot) added the area/pre-commit label (Changes made to pre-commit hooks) on Jul 24, 2025
@monoxgas merged commit 6b7ee52 into main on Jul 24, 2025
8 checks passed
@monoxgas deleted the feat/scorer-improvements branch on July 24, 2025 10:55

Labels
  • area/docs: Changes to documentation and guides
  • area/pre-commit: Changes made to pre-commit hooks
  • type/docs: Documentation updates and improvements

2 participants