
Fix handling and documentation for the cleanup attribute of EvalOutputConfig#1907

Draft
dagardner-nv wants to merge 5 commits into NVIDIA:develop from dagardner-nv:david-eval-docs-17

Conversation

@dagardner-nv
Contributor

@dagardner-nv dagardner-nv commented May 4, 2026

Description

  • Fix a documentation issue that incorrectly states the default value for `cleanup` as `false`.

  • Update YAML snippets in documentation to reflect recent changes to the YAML files in the repo.

  • Update Ragas GitHub URLs and LLM list.

  • In situations where the workflow did not produce any output, catch this and raise an explicit exception, preventing a vague exception from being raised later.

  • Currently `EvalGeneralConfig` has two overlapping attributes:

    • `output_dir` - a `str` with a default value
    • `output` - an `EvalOutputConfig` that, when not `None`, overrides `output_dir`

    Output directory cleanup only occurs when `output` is defined (implied but not stated in the documentation).

  • Update the model validator for `EvalGeneralConfig` to construct an `EvalOutputConfig(dir=output_dir)` instance when `output` is undefined, ensuring that output directory cleanup is handled consistently.
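The derived-default behavior described above can be sketched in plain Python. This is a simplified, hypothetical stand-in: the real validator is a pydantic `model_validator` on `EvalGeneralConfig`, and this `EvalOutputConfig` dataclass only mirrors the two fields discussed in this PR.

```python
from dataclasses import dataclass

# Illustrative stand-in for the real EvalOutputConfig pydantic model;
# only the fields discussed in this PR are mirrored, and the cleanup
# default of True matches the corrected documentation.
@dataclass
class EvalOutputConfig:
    dir: str
    cleanup: bool = True

def override_output_dir(values: dict) -> dict:
    """Sketch of the validator: when `output` is undefined, derive it
    from `output_dir` so cleanup is handled consistently."""
    if values.get("output") is None:
        output_dir = values.get("output_dir")
        if output_dir is not None:
            values["output"] = EvalOutputConfig(dir=output_dir)
    return values

values = override_output_dir({"output_dir": ".tmp/eval_output"})
```

With this in place, a config that only sets `output_dir` still ends up with an `EvalOutputConfig` instance, so the cleanup path no longer depends on which of the two attributes was set.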

By submitting this PR I confirm:

  • I am familiar with the Contributing Guidelines.
  • We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
    • Any contribution which contains commits that are not Signed-Off will not be accepted.
  • When the PR is ready for review, new or existing tests cover these changes.
  • When the PR is ready for review, the documentation is up to date with these changes.

Summary by CodeRabbit

  • Documentation

    • Updated evaluation documentation with installation prerequisites and refined configuration examples.
    • Clarified the timing semantics of output cleanup behavior.
  • Bug Fixes

    • Added validation to prevent profiling workflows from running with empty trajectory data.
  • Improvements

    • Enhanced output directory configuration handling.

Signed-off-by: David Gardner <dagardner@nvidia.com>
…voids a vague pandas error later

Signed-off-by: David Gardner <dagardner@nvidia.com>
Update judge LLM leaderboard to match Ragas, update GitHub URL for Ragas

Update documentation to reflect that the default value for `cleanup` is true

Signed-off-by: David Gardner <dagardner@nvidia.com>
…nfig(dir=output_dir)

Signed-off-by: David Gardner <dagardner@nvidia.com>
@dagardner-nv dagardner-nv self-assigned this May 4, 2026
@dagardner-nv dagardner-nv added bug Something isn't working non-breaking Non-breaking change labels May 4, 2026
@coderabbitai

coderabbitai Bot commented May 4, 2026

Walkthrough

Documentation for workflow evaluation was updated with revised examples, model names, and semantic clarification. The output configuration validator was modified to derive output from output_dir when missing. An empty-trajectory validation was added to the profiling runtime.

Changes

Output Configuration and Evaluation Semantics

  • Configuration Validator: packages/nvidia_nat_core/src/nat/data_models/evaluate_config.py
    EvalGeneralConfig.override_output_dir now derives output from output_dir when output is absent, rather than overwriting output_dir from output.dir.
  • Documentation and Examples: docs/source/improve-workflows/evaluate.md
    Installation step added for the simple web query example; NIM and judge models updated to nvidia/nemotron-3-nano and nvidia/nemotron-3-nano-30b-a3b, respectively; configuration examples revised to use output.dir with cleanup: true; cleanup semantics clarified to remove the output directory before evaluation; Ragas leaderboard and GitHub links updated.
  • Profiling Validation: packages/nvidia_nat_eval/src/nat/plugins/eval/runtime/evaluate.py
    profile_workflow() now validates that trajectories are non-empty before invoking ProfilerRunner, raising a ValueError when none are found.
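The empty-trajectory guard can be sketched as follows. This is a hedged, simplified sketch: the real `profile_workflow()` takes workflow context rather than a bare list, and the exact error message text here is assumed, not quoted from the repository.

```python
# Hedged sketch of the empty-trajectory guard; signature and message
# text are illustrative assumptions, not the actual code in evaluate.py.
def profile_workflow(trajectories: list) -> None:
    if not trajectories:
        # Fail fast with an explicit error instead of letting the empty
        # collection surface later as a vague pandas error.
        raise ValueError("No trajectories were produced by the workflow; "
                         "nothing to profile.")
    # ... hand the non-empty trajectories to ProfilerRunner here ...

try:
    profile_workflow([])
    raised = False
except ValueError as exc:
    raised = True
    message = str(exc)
```

Raising early here turns what would otherwise surface downstream as an opaque error into an actionable message at the point of failure.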

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: ✅ 5 passed

  • Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The title accurately describes the main changes, which focus on fixing the handling and documentation of the cleanup attribute in EvalOutputConfig.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files; docstring coverage check skipped.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.





@dagardner-nv dagardner-nv changed the title Cleanup eval documentation Fix handling and documentation for the cleanup attribute of EvalGeneralConfig May 4, 2026
@dagardner-nv dagardner-nv changed the title Fix handling and documentation for the cleanup attribute of EvalGeneralConfig Fix handling and documentation for the cleanup attribute of EvalOutputConfig May 4, 2026
Signed-off-by: David Gardner <dagardner@nvidia.com>
@dagardner-nv dagardner-nv marked this pull request as ready for review May 4, 2026 18:56
@dagardner-nv dagardner-nv requested a review from a team as a code owner May 4, 2026 18:56

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/source/improve-workflows/evaluate.md`:
- Around lines 278-289: The fenced code block containing the leaderboard list is missing a language tag (MD040). Add a language identifier to the opening fence (for example, ```text) and leave the closing fence unchanged. The block in question contains the list items from "1)- nvidia/Llama-3_3-Nemotron-Super-49B-v1" through "10)- google/gemma-2-2b-it".

In `@packages/nvidia_nat_core/src/nat/data_models/evaluate_config.py`:
- Around lines 139-144: The validator that syncs output and output_dir currently only sets values["output"] when "output" is missing, which causes config drift when "output" is present but output_dir is expected downstream. Update the validator so that, after reading values.get("output") and values.get("output_dir"): if output is present, set values["output_dir"] = output.dir; otherwise, if output_dir is present, set values["output"] = EvalOutputConfig(dir=output_dir).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 054382ca-1c80-4a5b-ae82-600b45d880ec

📥 Commits

Reviewing files that changed from the base of the PR and between 443b5c8 and 48c0fac.

📒 Files selected for processing (3)
  • docs/source/improve-workflows/evaluate.md
  • packages/nvidia_nat_core/src/nat/data_models/evaluate_config.py
  • packages/nvidia_nat_eval/src/nat/plugins/eval/runtime/evaluate.py

Comment thread packages/nvidia_nat_core/src/nat/data_models/evaluate_config.py
Comment on lines +139 to +144

```python
output_config = values.get("output")
if output_config is None:
    output_dir = values.get("output_dir")
    if output_dir is not None:
        values["output"] = EvalOutputConfig(dir=output_dir)
```


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

output no longer reliably overrides output_dir (config drift bug).

When output is present, this validator no longer syncs output_dir. But downstream code still reads output_dir (e.g., packages/nvidia_nat_eval/src/nat/plugins/eval/runtime/builder.py Line 133 and packages/nvidia_nat_eval/src/nat/plugins/eval/runtime/evaluate.py Lines 432 and 511), so writes can go to the default path instead of output.dir.

💡 Proposed fix

```diff
 @model_validator(mode="before")
 @classmethod
 def override_output_dir(cls, values):
     output_config = values.get("output")
-    if output_config is None:
-        output_dir = values.get("output_dir")
-        if output_dir is not None:
-            values["output"] = EvalOutputConfig(dir=output_dir)
+    output_dir = values.get("output_dir")
+
+    if output_config is None:
+        if output_dir is not None:
+            values["output"] = EvalOutputConfig(dir=output_dir)
+    else:
+        # Keep legacy/expected precedence: output.dir overrides output_dir
+        if isinstance(output_config, dict):
+            out_dir = output_config.get("dir")
+            if out_dir is not None:
+                values["output_dir"] = out_dir
+        elif isinstance(output_config, EvalOutputConfig):
+            values["output_dir"] = output_config.dir

     return values
```

@dagardner-nv dagardner-nv marked this pull request as draft May 5, 2026 15:07