fix(eval): improve rubric text normalization for judge-garbled output by tottenjordan · Pull Request #6080 · google/adk-python

tottenjordan · 2026-06-11T13:12:25Z

Summary

_normalize_text currently only does .lower().strip(), so judge-model garbling (markdown bullets, smart quotes, bold formatting, extra whitespace) causes exact rubric match failures. Rubric scores get silently dropped with only a warning log.

Changes:

Replace _normalize_text with NFKC unicode normalization, smart-quote/dash translation, and markdown artifact stripping
Add substring fallback with uniqueness guard to convert_auto_rater_response_to_score — accepts a match only when exactly one rubric candidate matches, preventing ambiguous cross-matching
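For readers without the diff open, here is a minimal sketch of what the normalization step described above might look like. It is not the code from this PR: the function name `normalize_text_sketch`, the character map, and the regexes are all assumptions chosen to reproduce the behaviors listed in the description (NFKC, smart-quote/dash translation, markdown stripping, whitespace collapsing).

```python
import re
import unicodedata

# Hypothetical mapping for characters NFKC leaves untouched; the PR's
# actual table may cover a different set.
_SMART_CHARS = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash
    "\u2022": "-",  # unicode bullet
})

# One or more list markers ("-", "*", "+") at the start, each followed by whitespace.
_LEADING_MARKER = re.compile(r"^\s*(?:[-*+]\s+)+")
# Markdown bold: **text** or __text__.
_BOLD = re.compile(r"(\*\*|__)(.+?)\1")
_WHITESPACE = re.compile(r"\s+")


def normalize_text_sketch(text: str) -> str:
    """Illustrative normalizer; an assumption, not the PR's _normalize_text."""
    # NFKC folds compatibility characters (e.g. the ellipsis becomes "...")
    # while keeping accented letters intact.
    text = unicodedata.normalize("NFKC", text)
    text = text.translate(_SMART_CHARS)
    text = _LEADING_MARKER.sub("", text)
    text = _BOLD.sub(r"\2", text)
    # Drop wrapping quotes/backticks, then collapse internal whitespace.
    text = text.strip().strip("\"'`").strip()
    text = _WHITESPACE.sub(" ", text)
    return text.lower()
```

Applying the same function to both the rubric keys and the judge output keeps the comparison symmetric, so any over-eager stripping affects both sides equally.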

Garbling patterns handled:

| Input | Normalized | Match |
| --- | --- | --- |
| `- The response correctly uses tools` | `the response correctly uses tools` | ✅ |
| `* The response correctly uses tools` | `the response correctly uses tools` | ✅ |
| `"The response correctly uses tools"` (smart quotes) | `the response correctly uses tools` | ✅ |
| `— The response correctly uses tools` (em dash) | `the response correctly uses tools` | ✅ |
| `– The response correctly uses tools` (en dash) | `the response correctly uses tools` | ✅ |
| `• The response correctly uses tools` (unicode bullet) | `the response correctly uses tools` | ✅ |
| `The response correctly uses tools` (double spaces) | `the response correctly uses tools` | ✅ |
| `The response… uses tools` (ellipsis) | `the response... uses tools` | ✅ |
| `réponse` (accented chars) | `réponse` (preserved) | ✅ |

Per @surajksharma07's suggestion in #6072: uses NFKC normalization instead of ascii-ignore (preserves non-English rubrics), and adds uniqueness guard on the substring fallback.
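The uniqueness guard can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `find_rubric_key` is a hypothetical helper, and checking the substring in both directions is a guess at how the fallback in convert_auto_rater_response_to_score behaves.

```python
from typing import Optional, Sequence


def find_rubric_key(normalized_text: str, rubric_keys: Sequence[str]) -> Optional[str]:
    """Return the matching rubric key, or None when no unambiguous match exists.

    Hypothetical helper: both arguments are assumed to be already normalized.
    """
    if not normalized_text:
        return None

    # 1. Exact match wins outright.
    if normalized_text in rubric_keys:
        return normalized_text

    # 2. Substring fallback (direction is an assumption): collect every key
    #    that contains, or is contained in, the judge's text.
    candidates = [
        key
        for key in rubric_keys
        if key and (key in normalized_text or normalized_text in key)
    ]

    # 3. Uniqueness guard: accept only when exactly one candidate matched,
    #    so a short fragment cannot be credited to the wrong rubric.
    return candidates[0] if len(candidates) == 1 else None
```

With rubrics `["the response correctly uses tools", "the response is concise"]`, the text `"property: the response is concise"` resolves to the second rubric, while the fragment `"the response"` matches both and is rejected.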

Validation

Unit tests: 46 tests pass (44 existing + 2 new) in test_rubric_based_evaluator.py
E2E pipeline: Ran full GEPA optimization pipeline (gepa-run-8fb68a8f52-20260611-115752) with 4 rubric-based criteria, gemini-2.5-pro judge — zero "not found in rubrics" warnings across all generations

Test plan

pytest tests/unittests/evaluation/test_rubric_based_evaluator.py -v — all 46 pass
Parametrized TestNormalizeText covers all garbling patterns from issue
TestSubstringFallbackUniquenessGuard verifies unique match accepted, ambiguous match rejected
All existing tests unchanged and passing

tottenjordan · 2026-06-11T13:16:22Z

@surajksharma07 PR is up per your suggestion in #6072. Includes the NFKC normalization, smart-char mapping, and uniqueness guard on the substring fallback. 46 tests pass (44 existing + 2 new).

rohityan · 2026-06-11T22:03:51Z

/adk-pr-analyze

rohityan · 2026-06-11T22:15:02Z

Hi @tottenjordan, thank you for your contribution! We appreciate you taking the time to submit this pull request. Please fix the formatting errors.

tottenjordan · 2026-06-23T23:08:08Z

@googlebot I signed it.

Replace _normalize_text's simple lower().strip() with NFKC unicode normalization, smart-quote/dash translation, and markdown artifact stripping. Add substring fallback with uniqueness guard to convert_auto_rater_response_to_score for cases where normalization alone isn't sufficient. Fixes google#6072

Address reviewer feedback on google#6072:
- Guard `if not rubric and normalized_rubric_text:` prevents empty judge Property: lines from matching every rubric via substring
- Guard `if ct and` prevents empty rubric keys from matching
- Add logger.debug when substring fallback rescues a match to track judge drift in eval logs
- Add test_empty_property_text_does_not_match test case
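The reason empty strings need explicit guards is that in Python `"" in s` is `True` for every string `s`. The sketch below contrasts a naive substring matcher with a guarded one; both function names are hypothetical and neither is taken from the PR.

```python
rubrics = ["the response is polite", "the response cites sources"]


def naive_candidates(judge_text, keys):
    # No empty-string guards: "" is a substring of every string.
    return [k for k in keys if k in judge_text or judge_text in k]


def guarded_candidates(judge_text, keys):
    # Reject empty judge text, and skip empty rubric keys.
    if not judge_text:
        return []
    return [k for k in keys if k and (k in judge_text or judge_text in k)]


# An empty judge line matches every rubric without the guard...
print(naive_candidates("", rubrics))
# ...and nothing with it.
print(guarded_candidates("", rubrics))

# An empty rubric key also "matches" any judge text without the guard.
print(naive_candidates("totally unrelated text", ["", "the response is polite"]))
print(guarded_candidates("totally unrelated text", ["", "the response is polite"]))
```

The last case matters even with a uniqueness guard in place: a single empty key would be the only candidate and would therefore be accepted as a unique match.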

surajksharma07 mentioned this pull request Jun 11, 2026

RubricBasedEvaluator _normalize_text too basic — fails on judge model markdown output #6072

Open

rohityan self-assigned this Jun 11, 2026

rohityan added the eval [Component] This issue is related to evaluation label Jun 11, 2026

rohityan added the request clarification [Status] The maintainer need clarification or more information from the author label Jun 11, 2026

zettaittenani mentioned this pull request Jun 20, 2026

[Bug] RubricBasedEvaluator silently drops all rubrics when judge paraphrases text (Japanese kanji↔hiragana etc.) — beyond #6080 normalization scope #6171

Open

tottenjordan force-pushed the fix/rubric-text-normalization branch 2 times, most recently from 67faae3 to e29c0f3 Compare June 23, 2026 23:16

tottenjordan added 3 commits June 23, 2026 23:18

style: fix import ordering for pre-commit compliance

b7c975a

tottenjordan force-pushed the fix/rubric-text-normalization branch from e29c0f3 to 72ece22 Compare June 23, 2026 23:18

tottenjordan wants to merge 3 commits into google:main from tottenjordan:fix/rubric-text-normalization
