Refine dynamic mode skill eval #6389
| Filename | Overview |
|---|---|
| skills/dali-dynamic-mode/evals/evals.json | Prompt and assertion improvements across all six evals: adds absolute path for file input, suppresses file-save behaviour, relaxes over-strict assertions, adds correct-import check to evals 2–4. |
| skills/dali-dynamic-mode/SKILL.md | Adds .torch(pad=True) guidance for variable-shape batches; updates code example and troubleshooting note to reference sync_cpu alongside sync_full. |
| skills/dali-dynamic-mode/scripts/requirements.txt | New file: lists the runtime deps nvidia-dali-cuda130 and torch without version constraints, which may affect reproducibility. |
| skills/dali-dynamic-mode/BENCHMARK.md | Updated with full Tier 3 agent evaluation results (claude-code and codex) dated 2026-06-08; replaces all "not available" placeholders with actual metrics. |
| skills/dali-dynamic-mode/skill-card.md | Adds evaluation agent list, task count, underlying signal descriptions, and results table; adds references to SKILL.md and BENCHMARK.md; minor wording and license-format updates. |
| skills/dali-dynamic-mode/skill.oms.sig | Signature refreshed to cover the newly added scripts/requirements.txt and all updated file digests. |
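
The reproducibility concern noted for scripts/requirements.txt can be checked mechanically. The following is a minimal sketch, not part of the PR: the helper name `unpinned_requirements` is hypothetical, and the sample text is reconstructed from the overview above (the actual file contents are an assumption). It flags requirement lines that carry no version specifier and does not handle direct URL references.

```python
import re

# Any PEP 440 style comparison operator that may follow a package name.
_VERSION_OP = re.compile(r"(===|==|~=|!=|<=|>=|<|>)")


def unpinned_requirements(text):
    """Return requirement entries that have no version specifier.

    Hypothetical helper for illustration; not part of DALI or pip.
    """
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):
            # Skip blank lines and pip options such as -r or --index-url.
            continue
        spec = line.split(";", 1)[0].strip()  # drop environment markers
        if not _VERSION_OP.search(spec):
            unpinned.append(spec)
    return unpinned


# Assumed contents, based only on the overview row above.
sample = "nvidia-dali-cuda130\ntorch\n"
print(unpinned_requirements(sample))
```

A check like this could run in CI to warn when a dependency is added without a constraint; whether the project wants exact pins or only lower bounds is a separate decision.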
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Eval harness reads evals.json] --> B{Has input files?}
B -- Yes --> C[Stage files to /workspace/input/]
B -- No --> D[Run agent with prompt only]
C --> E[Run agent with absolute path in prompt]
D --> F{Agent output}
E --> F
F --> G[Assert: correct-import]
F --> H[Assert: correct API patterns]
F --> I[Assert: .torch / pad=True for variable shapes]
F --> J[Assert: no pipeline-mode constructs]
G & H & I & J --> K{All assertions pass?}
K -- Yes --> L[PASS]
K -- No --> M[FAIL]
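
The flow in the chart above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the actual harness: the eval-case schema (`prompt`, `files`, `assertions`, `pattern`, `expect`), the function `run_eval`, the regex-based assertions, and the `fake_agent` stand-in are all hypothetical, and the module path used in the import pattern is assumed rather than taken from the PR. Files are not really staged; only the absolute paths are computed.

```python
import re


def run_eval(eval_case, agent):
    """Run one eval case; return (passed, per-assertion results).

    Hypothetical sketch of the flowchart; the real harness may differ.
    """
    prompt = eval_case["prompt"]
    files = eval_case.get("files", [])
    if files:
        # "Has input files?" branch: reference staged files by absolute path.
        staged = ["/workspace/input/" + name for name in files]
        prompt = prompt + "\nInput files: " + ", ".join(staged)
    output = agent(prompt)

    results = {}
    for assertion in eval_case["assertions"]:
        found = re.search(assertion["pattern"], output) is not None
        # expect=False means the pattern must be absent from the output.
        results[assertion["name"]] = found if assertion.get("expect", True) else not found
    return all(results.values()), results


example_case = {
    "prompt": "Decode the image using DALI dynamic mode.",
    "files": ["image.jpg"],
    "assertions": [
        # Assumed module path for the dynamic mode import check.
        {"name": "correct-import", "pattern": r"nvidia\.dali\.experimental\.dynamic"},
        {"name": "no-pipeline-mode", "pattern": r"pipeline_def", "expect": False},
    ],
}


def fake_agent(prompt):
    # Stand-in for a real agent run; returns canned text.
    return "import nvidia.dali.experimental.dynamic as ndd\n# reads /workspace/input/image.jpg"


passed, results = run_eval(example_case, fake_agent)
print(passed, results)
```

A case passes only when every assertion holds, which matches the single "All assertions pass?" decision node in the chart.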
Reviews (3): Last reviewed commit: "Attach NVSkills validation signatures"
Signed-off-by: Rostan Tabet <rtabet@nvidia.com>
ac41a26 to 5107f33
/nvskills-ci
Signed-off-by: nvskills-svc-account <svc-nvskills-signing@nvidia.com>
Category:
Other (e.g. Documentation, Tests, Configuration)
Description:
The current prompts in the dynamic mode skill evaluation are sometimes not specific enough, which leads to issues including (but not limited to) the following:
Additional information:
Affected modules and functionalities:
Dynamic mode skill
Key points relevant for the review:
Tests:
Checklist
Documentation
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: N/A