Fix handling and documentation for the cleanup attribute of EvalOutputConfig #1907

dagardner-nv wants to merge 5 commits.
Commits:
- Signed-off-by: David Gardner <dagardner@nvidia.com>
- …voids a vague pandas error later. Signed-off-by: David Gardner <dagardner@nvidia.com>
- Update judge LLM leaderboard to match Ragas; update GitHub URL for Ragas. Signed-off-by: David Gardner <dagardner@nvidia.com>
- Update documentation to reflect that the default value for `cleanup` is true. Signed-off-by: David Gardner <dagardner@nvidia.com>
- …nfig(dir=output_dir). Signed-off-by: David Gardner <dagardner@nvidia.com>
Walkthrough

Documentation for workflow evaluation was updated with revised examples, model names, and semantic clarifications. The output configuration validator was modified to derive an `EvalOutputConfig` from `output_dir` when `output` is undefined.

Changes: Output Configuration and Evaluation Semantics
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/source/improve-workflows/evaluate.md`:
- Around lines 278-289: The fenced code block containing the leaderboard list is missing a language tag (MD040). Update the opening ``` to include a language identifier (for example, ```text); the closing ``` remains unchanged. Locate the fenced block containing the list items (the lines from "1)- nvidia/Llama-3_3-Nemotron-Super-49B-v1" through "10)- google/gemma-2-2b-it") and add the language tag to satisfy the linter.
In `@packages/nvidia_nat_core/src/nat/data_models/evaluate_config.py`:
- Around lines 139-144: The validator that syncs `output` and `output_dir` currently only sets `values["output"]` when `output` is missing, which causes config drift when `output` is present but `output_dir` is read downstream. Update the validator so that, after reading `values.get("output")` and `values.get("output_dir")`: if `output` is present, set `values["output_dir"] = output.dir`; otherwise, if `output_dir` is present, set `values["output"] = EvalOutputConfig(dir=output_dir)`.
📒 Files selected for processing (3)
- docs/source/improve-workflows/evaluate.md
- packages/nvidia_nat_core/src/nat/data_models/evaluate_config.py
- packages/nvidia_nat_eval/src/nat/plugins/eval/runtime/evaluate.py
```python
output_config = values.get("output")
if output_config is None:
    output_dir = values.get("output_dir")
    if output_dir is not None:
        values["output"] = EvalOutputConfig(dir=output_dir)
```
`output` no longer reliably overrides `output_dir` (config drift bug).

When `output` is present, this validator no longer syncs `output_dir`. But downstream code still reads `output_dir` (e.g., packages/nvidia_nat_eval/src/nat/plugins/eval/runtime/builder.py line 133 and packages/nvidia_nat_eval/src/nat/plugins/eval/runtime/evaluate.py lines 432 and 511), so writes can go to the default path instead of `output.dir`.
💡 Proposed fix

```diff
 @model_validator(mode="before")
 @classmethod
 def override_output_dir(cls, values):
     output_config = values.get("output")
-    if output_config is None:
-        output_dir = values.get("output_dir")
-        if output_dir is not None:
-            values["output"] = EvalOutputConfig(dir=output_dir)
+    output_dir = values.get("output_dir")
+
+    if output_config is None:
+        if output_dir is not None:
+            values["output"] = EvalOutputConfig(dir=output_dir)
+    else:
+        # Keep legacy/expected precedence: output.dir overrides output_dir
+        if isinstance(output_config, dict):
+            out_dir = output_config.get("dir")
+            if out_dir is not None:
+                values["output_dir"] = out_dir
+        elif isinstance(output_config, EvalOutputConfig):
+            values["output_dir"] = output_config.dir
     return values
```
Description

- Fix documentation issue which incorrectly states the default value for `cleanup` as `false`.
- Update YAML snippets in the documentation to reflect recent changes to the YAML files in the repo.
- Update Ragas GitHub URLs and LLM list.
- In situations where the workflow did not produce any output, catch this and raise an explicit exception, preventing a vague exception from being raised later.
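For illustration, a hedged YAML sketch of the corrected `cleanup` default. The field placement under `eval.general.output` is an assumption based on this PR's discussion; see evaluate.md for the authoritative schema.

```yaml
eval:
  general:
    output:
      dir: ./.tmp/eval_output
      # cleanup defaults to true; the docs previously (incorrectly) said false
      cleanup: true
```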
Currently in `EvalGeneralConfig` we have two overlapping attributes:

- `output_dir` - a `str` with a default value
- `output` - an `EvalOutputConfig`; when not `None` it overrides `output_dir` (implied but not stated in the documentation)

Update the model validator for `EvalGeneralConfig` to construct an instance of `EvalOutputConfig(dir=output_dir)` when `output` is undefined, ensuring the output directory cleanup is handled consistently.

By submitting this PR I confirm:
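The "no output" change listed above can be sketched as a fail-fast guard. The helper name and signature here are hypothetical; the real check lives in the eval runtime.

```python
def ensure_workflow_output(records: list) -> list:
    """Raise an explicit error when the workflow produced no output,
    rather than letting pandas fail later with a vague message."""
    if not records:
        raise ValueError(
            "The workflow did not produce any output; nothing to evaluate.")
    return records
```

Validating early like this surfaces a clear, actionable message at the point of failure instead of a downstream pandas error about an empty frame.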
Summary by CodeRabbit

- Documentation
- Bug Fixes
- Improvements