support groupwise scoring by mayinghan · Pull Request #101 · eval-protocol/python-sdk

mayinghan · 2025-08-19T22:18:04Z

Add support for groupwise evaluation

Groupwise is equivalent to the "batch" mode in the old @reward_function use case. In groupwise mode, the multiple rollout results (one per completion_params) from the same input row are sent together into the user's evaluation function, so the user can perform a side-by-side comparison within the same group. This mode provides an easier way to do LLM-as-a-judge, as the user only needs to let the judge determine which result is better instead of giving a pointwise score.
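To make the side-by-side idea concrete, here is a minimal sketch of a groupwise scorer that turns pairwise preferences into per-rollout scores. It is plain Python and does not use the SDK's API: `Rollout`, `groupwise_scores`, and the toy judge `shorter_is_better` are hypothetical names introduced only for illustration, and a real judge would call an LLM instead of comparing lengths.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Dict, List


@dataclass
class Rollout:
    """Hypothetical stand-in for one rollout result (not the SDK's row type)."""
    completion_params: str  # label for the params/model that produced the output
    output: str


def groupwise_scores(
    group: List[Rollout],
    prefer: Callable[[str, str], bool],
) -> Dict[str, float]:
    """Score each rollout in one group by its pairwise win rate.

    `prefer(a, b)` returns True when output `a` is judged better than `b`.
    """
    wins = {r.completion_params: 0 for r in group}
    for a, b in combinations(group, 2):
        winner = a if prefer(a.output, b.output) else b
        wins[winner.completion_params] += 1
    n_opponents = max(len(group) - 1, 1)  # avoid dividing by zero for a group of one
    return {name: w / n_opponents for name, w in wins.items()}


def shorter_is_better(a: str, b: str) -> bool:
    """Toy judge (assumption): the shorter output wins."""
    return len(a) < len(b)


# All three rollouts come from the same input row, one per completion_params.
group = [
    Rollout("model-a", "<svg><circle r='4'/></svg>"),
    Rollout("model-b", "<svg><circle r='4'/><rect width='9' height='9'/></svg>"),
    Rollout("model-c", "<svg/>"),
]
scores = groupwise_scores(group, shorter_is_better)
```

The judge only ever answers "which of these two is better", which is the simplification the description refers to; the win rate is just one possible way to fold those answers back into a score per rollout.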

Also updated the mode names to "pointwise", "groupwise", and "listwise".

Test case output for svg generation:

TODO: groupwise mode needs better UI rendering.
Note: this is not the most efficient way to implement this. Ideally rollout and eval would be pipelined, but that would require a huge change to the rollout logic.
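As a rough illustration of what pipelining could look like for groupwise mode (this is not what the PR implements), the sketch below starts the evaluation of each group as soon as that group's rollouts finish, instead of waiting for every rollout in the dataset. The coroutines `rollout`, `evaluate_group`, and `run_group` are hypothetical placeholders, not SDK functions.

```python
import asyncio
from typing import Any, Dict, List


async def rollout(row_id: int, params: str) -> Dict[str, Any]:
    """Placeholder rollout (assumption): a real one would call the model."""
    await asyncio.sleep(0)
    return {"row_id": row_id, "params": params, "output": f"{params}:{row_id}"}


async def evaluate_group(group: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Placeholder groupwise eval: only reports the group size."""
    await asyncio.sleep(0)
    return {"row_id": group[0]["row_id"], "n_rollouts": len(group)}


async def run_group(row_id: int, param_sets: List[str]) -> Dict[str, Any]:
    # Wait only for this row's rollouts, then evaluate this group immediately.
    group = await asyncio.gather(*(rollout(row_id, p) for p in param_sets))
    return await evaluate_group(list(group))


async def pipelined(row_ids: List[int], param_sets: List[str]) -> List[Dict[str, Any]]:
    # Groups run concurrently, so one group's eval can overlap another's rollouts.
    return await asyncio.gather(*(run_group(r, param_sets) for r in row_ids))


results = asyncio.run(pipelined([0, 1, 2], ["params-a", "params-b"]))
```

The only barrier here is per group, which is the minimum groupwise evaluation needs, since the evaluation function must see every rollout for a row before it can compare them.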

benjibc · 2025-08-20T01:34:25Z

eval_protocol/pytest/types.py

-If your evaluation can be computed pointwise, use "pointwise" as EP can pipeline the rollouts and evals to be faster.
+"pointwise": (default) applies test function to each row (rollout result).
+"groupwise": applies test function to a group of rollout results from the same original row (for use cases such as dpo/grpo).
+"listwise": applies test function to the whole dataset.


"listwise" is confusing; probably just call it "all" or something.
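For readers skimming the diff, the three modes differ only in what the test function receives. The sketch below shows that difference under stated assumptions: `apply_by_mode` and the `row_id` key are hypothetical and not part of the SDK, and the third mode is spelled "all", following the rename made in a later commit of this PR.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def apply_by_mode(rows: List[Row], fn: Callable[[Any], Any], mode: str) -> List[Any]:
    """Call `fn` once per row, once per group, or once for the whole dataset."""
    if mode == "pointwise":
        # fn sees a single rollout result.
        return [fn(row) for row in rows]
    if mode == "groupwise":
        # fn sees every rollout result that came from the same original row.
        groups: Dict[Any, List[Row]] = defaultdict(list)
        for row in rows:
            groups[row["row_id"]].append(row)
        return [fn(group) for group in groups.values()]
    if mode == "all":
        # fn sees the full list of rollout results.
        return [fn(rows)]
    raise ValueError(f"unknown mode: {mode!r}")


rows = [
    {"row_id": 1, "output": "a"},
    {"row_id": 1, "output": "b"},
    {"row_id": 2, "output": "c"},
]
pointwise = apply_by_mode(rows, lambda row: row["output"].upper(), "pointwise")
groupwise = apply_by_mode(rows, lambda group: len(group), "groupwise")
whole = apply_by_mode(rows, lambda dataset: len(dataset), "all")
```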

benjibc · 2025-08-20T01:34:38Z

eval_protocol/pytest/default_single_turn_rollout_process.py


            _litellm = importlib.import_module("litellm")
            acompletion = getattr(_litellm, "acompletion")
+            logger.debug(f"********** request_params: {request_params} **********")


benjibc · 2025-08-20T01:34:41Z

eval_protocol/pytest/default_single_turn_rollout_process.py

            response = await acompletion(**request_params)

            assistant_content = response.choices[0].message.content or ""
+            logger.debug(f"********** assistant_content: {assistant_content} **********")


support groupwise scoring

d0cd7de

mayinghan force-pushed the groupwise-scoring-support branch from 8838ff2 to d0cd7de Compare August 19, 2025 22:19

mayinghan added 5 commits August 19, 2025 15:34

format

2431dbe

fix ut

a439e76

add tests

1b8032d

remove useless test

3406889

format

d587101

mayinghan requested review from benjibc and xzrderek August 20, 2025 00:57

benjibc approved these changes Aug 20, 2025

View reviewed changes

mayinghan added 2 commits August 19, 2025 19:00

rename listwise to all

d000f19

fix ut

9e41b2c

mayinghan merged commit aac0214 into main Aug 20, 2025
7 checks passed

mayinghan deleted the groupwise-scoring-support branch August 20, 2025 04:42

mayinghan merged 8 commits into main from groupwise-scoring-support

2 participants
