support groupwise scoring#101

Merged
mayinghan merged 8 commits into main from groupwise-scoring-support
Aug 20, 2025
@mayinghan mayinghan commented Aug 19, 2025

Add support for groupwise evaluation

Groupwise mode is equivalent to the "batch" mode in the old @reward_function use case. In groupwise mode, multiple rollout results (by completion_params) from the same input row are sent to the user's evaluation function together, so the user can compare them side by side within the same group. This makes LLM-as-a-judge easier: the judge only needs to decide which result is better, instead of giving a pointwise score.
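As a rough illustration of the side-by-side comparison described above, here is a minimal sketch that turns pairwise preferences within one group into per-candidate scores. The names `score_group` and `prefer` are hypothetical and not part of this PR; `prefer` stands in for an LLM judge call, and the win-rate scoring is just one possible choice.

```python
from itertools import combinations
from typing import Callable, List


def score_group(candidates: List[str], prefer: Callable[[str, str], bool]) -> List[float]:
    """Score each candidate in a group by its pairwise win rate.

    `prefer(a, b)` is a hypothetical judge: True means `a` is better than `b`.
    In practice it would wrap an LLM call; here it is any Python callable.
    """
    wins = [0] * len(candidates)
    # Compare every unordered pair once and credit the winner.
    for i, j in combinations(range(len(candidates)), 2):
        if prefer(candidates[i], candidates[j]):
            wins[i] += 1
        else:
            wins[j] += 1
    # Each candidate faces len(candidates) - 1 opponents; guard the 1-item case.
    n_opponents = max(len(candidates) - 1, 1)
    return [w / n_opponents for w in wins]


# Toy judge for demonstration only: prefers the longer string.
scores = score_group(["a", "bbb", "cc"], lambda a, b: len(a) > len(b))
```

The judge here never has to produce an absolute score; the ranking falls out of the comparisons alone.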

Also updated the modes to "pointwise", "groupwise", and "listwise".

Test case output for svg generation:

Screenshot 2025-08-19 at 5 22 55 PM

TODO: groupwise mode needs better UI rendering.
Note: this is not the most optimized implementation. Ideally rollout and eval would be pipelined, but that would require a large change to the rollout logic.

@mayinghan mayinghan force-pushed the groupwise-scoring-support branch from 8838ff2 to d0cd7de Compare August 19, 2025 22:19
@mayinghan mayinghan requested review from benjibc and xzrderek August 20, 2025 00:57
If your evaluation can be computed pointwise, use "pointwise" as EP can pipeline the rollouts and evals to be faster.
"pointwise": (default) applies test function to each row (rollout result).
"groupwise": applies test function to a group of rollout results from the same original row (for use cases such as dpo/grpo).
"listwise": applies test function to the whole dataset.
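To make the difference between the three modes concrete, the sketch below shows what the test function would receive in each case. This is an illustration only, not the code from this PR: `apply_mode`, the dict-based row shape, and the `row_id` key are all assumptions.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

# Hypothetical row shape: "row_id" identifies the original input row,
# "output" holds one rollout result for it.
Row = Dict[str, Any]


def apply_mode(rows: List[Row], fn: Callable[[Any], Any], mode: str = "pointwise") -> List[Any]:
    """Call `fn` on rows according to `mode` (illustrative dispatch only)."""
    if mode == "pointwise":
        # One call per rollout result.
        return [fn(row) for row in rows]
    if mode == "groupwise":
        # One call per original input row, with all of its rollout results.
        groups: Dict[Any, List[Row]] = defaultdict(list)
        for row in rows:
            groups[row["row_id"]].append(row)
        return [fn(group) for group in groups.values()]
    if mode == "listwise":
        # One call for the whole dataset.
        return [fn(rows)]
    raise ValueError(f"unknown mode: {mode!r}")


rows = [
    {"row_id": "a", "output": "x"},
    {"row_id": "a", "output": "y"},
    {"row_id": "b", "output": "z"},
]
group_sizes = apply_mode(rows, len, mode="groupwise")
```

With the sample rows, groupwise mode calls the function twice (once for "a", once for "b"), while listwise mode calls it once with all three rows.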

listwise is confusing, probably just "all" or something


sure


_litellm = importlib.import_module("litellm")
acompletion = getattr(_litellm, "acompletion")
logger.debug(f"********** request_params: {request_params} **********")

delete

response = await acompletion(**request_params)

assistant_content = response.choices[0].message.content or ""
logger.debug(f"********** assistant_content: {assistant_content} **********")

ditto

@mayinghan mayinghan merged commit aac0214 into main Aug 20, 2025
7 checks passed
@mayinghan mayinghan deleted the groupwise-scoring-support branch August 20, 2025 04:42