
Fix handling and documentation for the cleanup attribute of EvalOutputConfig#1907

Draft
dagardner-nv wants to merge 5 commits into NVIDIA:develop from dagardner-nv:david-eval-docs-17

Conversation

@dagardner-nv
Contributor

@dagardner-nv dagardner-nv commented May 4, 2026

Description

  • Fix a documentation issue that incorrectly states the default value for `cleanup` as `false`.

  • Update YAML snippets in documentation to reflect recent changes to the YAML files in the repo.

  • Update Ragas GitHub URLs and LLM list.

  • In situations where the workflow did not produce any output, catch this and raise an explicit exception, preventing a vague exception from being raised later.

  • Currently `EvalGeneralConfig` has two overlapping attributes:

    • `output_dir` - a `str` with a default value
    • `output` - an `EvalOutputConfig` that, when not `None`, overrides `output_dir`

    Output directory cleanup only occurs when `output` is defined (implied but not stated in the documentation).

  • Update the model validator for `EvalGeneralConfig` to construct an `EvalOutputConfig(dir=output_dir)` instance when `output` is undefined, ensuring that output directory cleanup is handled consistently.
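The derived-default behavior described above can be sketched in plain Python. This is a simplified, hypothetical stand-in: the real validator is a pydantic `model_validator` on `EvalGeneralConfig`, and this `EvalOutputConfig` dataclass only mirrors the two fields discussed in this PR.

```python
from dataclasses import dataclass

# Illustrative stand-in for the real EvalOutputConfig pydantic model;
# only the fields discussed in this PR are mirrored, and the cleanup
# default of True matches the corrected documentation.
@dataclass
class EvalOutputConfig:
    dir: str
    cleanup: bool = True

def override_output_dir(values: dict) -> dict:
    """Sketch of the validator: when `output` is undefined, derive it
    from `output_dir` so cleanup is handled consistently."""
    if values.get("output") is None:
        output_dir = values.get("output_dir")
        if output_dir is not None:
            values["output"] = EvalOutputConfig(dir=output_dir)
    return values

values = override_output_dir({"output_dir": ".tmp/eval_output"})
```

With this in place, a config that only sets `output_dir` still ends up with an `EvalOutputConfig` instance, so the cleanup path no longer depends on which of the two attributes was set.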

By submitting this PR I confirm:

  • I am familiar with the Contributing Guidelines.
  • We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
    • Any contribution which contains commits that are not Signed-Off will not be accepted.
  • When the PR is ready for review, new or existing tests cover these changes.
  • When the PR is ready for review, the documentation is up to date with these changes.

Summary by CodeRabbit

  • Documentation

    • Updated evaluation documentation with installation prerequisites and refined configuration examples.
    • Clarified the timing semantics of output cleanup behavior.
  • Bug Fixes

    • Added validation to prevent profiling workflows from running with empty trajectory data.
  • Improvements

    • Enhanced output directory configuration handling.

Signed-off-by: David Gardner <dagardner@nvidia.com>
…voids a vague pandas error later

Signed-off-by: David Gardner <dagardner@nvidia.com>
Update judge LLM leaderboard to match Ragas, update GitHub URL for Ragas

Update documentation to reflect that the default value for `cleanup` is true

Signed-off-by: David Gardner <dagardner@nvidia.com>
…nfig(dir=output_dir)

Signed-off-by: David Gardner <dagardner@nvidia.com>
@dagardner-nv dagardner-nv self-assigned this May 4, 2026
@dagardner-nv dagardner-nv added bug Something isn't working non-breaking Non-breaking change labels May 4, 2026
@coderabbitai

coderabbitai Bot commented May 4, 2026

Walkthrough

Documentation for workflow evaluation was updated with revised examples, model names, and semantic clarification. The output configuration validator was modified to derive output from output_dir when missing. An empty-trajectory validation was added to the profiling runtime.

Changes

Output Configuration and Evaluation Semantics

  • Configuration Validator: packages/nvidia_nat_core/src/nat/data_models/evaluate_config.py
    EvalGeneralConfig.override_output_dir now derives output from output_dir when output is absent, rather than overwriting output_dir from output.dir.
  • Documentation and Examples: docs/source/improve-workflows/evaluate.md
    Installation step added for the simple web query example; NIM and judge models updated to nvidia/nemotron-3-nano and nvidia/nemotron-3-nano-30b-a3b, respectively; configuration examples revised to use output.dir with cleanup: true; cleanup semantics clarified to remove the output directory before evaluation; Ragas leaderboard and GitHub links updated.
  • Profiling Validation: packages/nvidia_nat_eval/src/nat/plugins/eval/runtime/evaluate.py
    profile_workflow() now validates that trajectories are non-empty before invoking ProfilerRunner, raising a ValueError when none are found.
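The empty-trajectory guard can be sketched as follows. This is a hedged, simplified sketch: the real `profile_workflow()` takes workflow context rather than a bare list, and the exact error message text here is assumed, not quoted from the repository.

```python
# Hedged sketch of the empty-trajectory guard; signature and message
# text are illustrative assumptions, not the actual code in evaluate.py.
def profile_workflow(trajectories: list) -> None:
    if not trajectories:
        # Fail fast with an explicit error instead of letting the empty
        # collection surface later as a vague pandas error.
        raise ValueError("No trajectories were produced by the workflow; "
                         "nothing to profile.")
    # ... hand the non-empty trajectories to ProfilerRunner here ...

try:
    profile_workflow([])
    raised = False
except ValueError as exc:
    raised = True
    message = str(exc)
```

Raising early here turns what would otherwise surface downstream as an opaque error into an actionable message at the point of failure.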

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: ✅ 5 passed

  • Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The title accurately describes the main changes, which focus on fixing the handling and documentation of the cleanup attribute in EvalOutputConfig.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files; docstring coverage check skipped.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.





@dagardner-nv dagardner-nv changed the title Cleanup eval documentation Fix handling and documentation for the cleanup attribute of EvalGeneralConfig May 4, 2026
@dagardner-nv dagardner-nv changed the title Fix handling and documentation for the cleanup attribute of EvalGeneralConfig Fix handling and documentation for the cleanup attribute of EvalOutputConfig May 4, 2026
Signed-off-by: David Gardner <dagardner@nvidia.com>
@dagardner-nv dagardner-nv marked this pull request as ready for review May 4, 2026 18:56
@dagardner-nv dagardner-nv requested a review from a team as a code owner May 4, 2026 18:56

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/source/improve-workflows/evaluate.md`:
- Around lines 278-289: The fenced code block containing the leaderboard list is missing a language tag (MD040). Add a language identifier to the opening fence (for example, ```text) and leave the closing fence unchanged. The block in question contains the list items from "1)- nvidia/Llama-3_3-Nemotron-Super-49B-v1" through "10)- google/gemma-2-2b-it".

In `@packages/nvidia_nat_core/src/nat/data_models/evaluate_config.py`:
- Around lines 139-144: The validator that syncs output and output_dir currently only sets values["output"] when "output" is missing, which causes config drift when "output" is present but output_dir is expected downstream. Update the validator so that, after reading values.get("output") and values.get("output_dir"): if output is present, set values["output_dir"] = output.dir; otherwise, if output_dir is present, set values["output"] = EvalOutputConfig(dir=output_dir).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 054382ca-1c80-4a5b-ae82-600b45d880ec

📥 Commits

Reviewing files that changed from the base of the PR and between 443b5c8 and 48c0fac.

📒 Files selected for processing (3)
  • docs/source/improve-workflows/evaluate.md
  • packages/nvidia_nat_core/src/nat/data_models/evaluate_config.py
  • packages/nvidia_nat_eval/src/nat/plugins/eval/runtime/evaluate.py

Comment thread packages/nvidia_nat_core/src/nat/data_models/evaluate_config.py
Comment on lines +139 to +144

```python
output_config = values.get("output")
if output_config is None:
    output_dir = values.get("output_dir")
    if output_dir is not None:
        values["output"] = EvalOutputConfig(dir=output_dir)
```


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

output no longer reliably overrides output_dir (config drift bug).

When output is present, this validator no longer syncs output_dir. But downstream code still reads output_dir (e.g., packages/nvidia_nat_eval/src/nat/plugins/eval/runtime/builder.py Line 133 and packages/nvidia_nat_eval/src/nat/plugins/eval/runtime/evaluate.py Lines 432 and 511), so writes can go to the default path instead of output.dir.

💡 Proposed fix

```diff
 @model_validator(mode="before")
 @classmethod
 def override_output_dir(cls, values):
     output_config = values.get("output")
-    if output_config is None:
-        output_dir = values.get("output_dir")
-        if output_dir is not None:
-            values["output"] = EvalOutputConfig(dir=output_dir)
+    output_dir = values.get("output_dir")
+
+    if output_config is None:
+        if output_dir is not None:
+            values["output"] = EvalOutputConfig(dir=output_dir)
+    else:
+        # Keep legacy/expected precedence: output.dir overrides output_dir
+        if isinstance(output_config, dict):
+            out_dir = output_config.get("dir")
+            if out_dir is not None:
+                values["output_dir"] = out_dir
+        elif isinstance(output_config, EvalOutputConfig):
+            values["output_dir"] = output_config.dir

     return values
```

@dagardner-nv dagardner-nv marked this pull request as draft May 5, 2026 15:07