Skip to content

test(sleep): add verifier-discipline stress test (closes #67)#87

Open
Tanmay9223 wants to merge 2 commits into
microsoft:mainfrom
Tanmay9223:test-verifier-discipline
Open

test(sleep): add verifier-discipline stress test (closes #67)#87
Tanmay9223 wants to merge 2 commits into
microsoft:mainfrom
Tanmay9223:test-verifier-discipline

Conversation

@Tanmay9223

Copy link
Copy Markdown

🎯 What: Adds a verifier-discipline stress test to ensure the validation gate correctly rejects reward-hacking skill edits.
💡 Why: To prevent regressions where the optimizer proposes a rule that looks attractive on train/replay tasks (e.g., gaming the evaluator by forcing literal tokens) but degrades grounded behavior on held-out tasks. This turns the failure mode identified in related discussions into a repeatable regression test.
Verification: Verified locally with python3 -m unittest tests/test_sleep_engine.py, which now executes the new test_gate_rejects_reward_hacking_edit and confirms the candidate is rejected.
Result: SkillOpt-Sleep now has an explicit invariant test ensuring reward-hacking edits are caught and rejected by the validation gate.

Yif-Yang and others added 2 commits June 24, 2026 23:44
Add missing configuration setup in scripts/eval_only.py to properly
support the minimax_chat backend, which was entirely omitted.

Fix the following coverage gaps in eval_only.py:
- Add minimax CLI arguments
- Include the minimax config mappings in _MAP
- Update the backend parsing logic
- Call configure_minimax_chat
Add a regression test to ensure the validation gate correctly rejects
reward-hacking skill edits. It has been observed that optimizers
sometimes propose shortcuts that improve train/replay metrics but fail
to improve held-out behavior. This test codifies that the gate blocks
such artifacts.

Add TestVerifierDiscipline to the test_sleep_engine.py suite:
- Create MockRewardHackingBackend that simulates a reward-hacking rule
  which passes the train set but degrades the held-out tasks.
- Assert that the proposed edit is rejected by the gate.
@Yif-Yang

Yif-Yang commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Thanks @Tanmay9223 — the reward-hacking gate test is a genuinely useful invariant to lock in. 🙏

A few things to sort out before this can land, now that #85 has merged:

1. Rebase needed — scripts/eval_only.py now conflicts. #85 just landed backend wiring on main that touches the same import block and configure_* section this PR edits. A git rebase origin/main (or merge) will be needed. Important detail so you don't drop the wrong hunks: #85 only added the qwen_chat wiring — it did not add any MiniMax code to eval_only.py (no import, no configure_minimax_chat call, no routing), despite #85's title mentioning minimax. So on rebase:

  • The qwen hunks here will conflict and can be dropped (already on main via fix(eval-only): call configure_qwen_chat so local LLM endpoints can be used #85).
  • The MiniMax hunks — the import, the --backend choices entry, the elif backend in {"minimax", "minimax_chat"} routing branch, the model-sentinel rewrite, and the configure_minimax_chat(...) call — are still needed; please keep them. main's eval_only.py is currently missing all of those, so your PR is what closes that gap.

2. The new test class is defined after unittest.main(). TestVerifierDiscipline sits below the if __name__ == "__main__": unittest.main() block (around line 1009), so a direct python tests/test_sleep_engine.py run exits before the class is even defined and silently skips it. pytest and python -m unittest still discover it (which is why it passes in CI), but please move the class above that __main__ block so the file-run path covers it too.

3. Heads-up on the bundled skillopt/utils/json_utils.py change. This PR also adds a _looks_json_like guard to the shared JSON extractor, which isn't mentioned in the title/description. I checked it and it's not a regression — inputs like {op: "add", ...} already returned None on main — so I'm fine keeping it. But since json_utils is shared by all backends' optimizer path, a one-line note in the PR description explaining the guard would help future git blame.

4. Overlap with #86. This PR is effectively a superset of #86 — I've suggested we close #86 in favor of this one.

Once it's rebased and the test class is moved, I'm happy to merge with a merge-commit so your authorship is preserved. Thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants