test: pre-refactor behavior-lock safety net (reward / reset / backend / config) #575
Open
Michael-Jetson wants to merge 1 commit into
Characterization-first net locking current behavior before the planned architecture refactor, so structural changes can be proven behavior-preserving (it pins what the code does now; it does not assert correctness).

What it locks:
- reward golden: 10-step trajectory + exact component key-set over the 2200-line locomotion reward dispatch (Go1/Go2/Go2W joystick flat+rough), 5 fixtures
- reward dispatch unit: scale/ctrl_dt reduction, only_positive clamp, log cadence, log value is pre-ctrl_dt, zero-scale short-circuit (spy proves fn not called), off-cadence log carry-forward
- DR reset-plan golden: full qpos(spawn+orient+joints)/qvel/DR payload + the terrain-curriculum side effect (record_episode_start: Go1/Go2 record, Go2W does not) which the returned ResetPlan cannot show, 3 fixtures
- SimBackend contract baseline: abstract/optional/concrete public-method name sets
- owner-YAML contract: no orphan top-level `algorithm:` (full ppo/appo/offpolicy file scan) + CLI rejects training.sim_backend as a route override; the one known APPO go2/motrix orphan is a strict xfail (flips the suite red once fixed)

Determinism: rough-terrain golden forces terrain_curriculum.seed (default None -> np.random.default_rng(None) picks a random spawn tile per process, independent of apply_training_seed); 8x repeat = 0 flakes.

New locks validated by mutation drills (inject the regression, confirm the test goes red): 4/4. All not-slow (run inside make test-all). Fixtures + tests + generators in one commit.

Validation: make check green; make test-cov 1257 passed, 1 xfailed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
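The reward-dispatch unit locks listed in the commit message can be illustrated with a small sketch. Everything here is an assumption made for illustration: `dispatch_rewards` is a hypothetical stand-in, not UniLab's actual dispatch, and the term names are invented. It only shows how a spy can prove the zero-scale short-circuit, that the logged value is taken before the `ctrl_dt` factor, and that the `only_positive` clamp applies to the total.

```python
from unittest.mock import Mock


def dispatch_rewards(terms, scales, ctrl_dt, only_positive=False):
    """Hypothetical stand-in for a reward dispatch (not UniLab's real code).

    terms:  name -> zero-argument callable returning a raw reward value
    scales: name -> float weight
    Returns (total, log); log holds scale * raw value, before ctrl_dt.
    """
    total = 0.0
    log = {}
    for name, fn in terms.items():
        scale = scales.get(name, 0.0)
        if scale == 0.0:
            continue  # zero-scale short-circuit: fn is never evaluated
        weighted = scale * fn()
        log[name] = weighted         # logged value is pre-ctrl_dt
        total += weighted * ctrl_dt  # the total carries the ctrl_dt factor
    if only_positive:
        total = max(total, 0.0)      # clamp negative totals to zero
    return total, log


# Mocks act as spies: they record whether each reward fn was called.
tracking = Mock(return_value=2.0)
penalty = Mock(return_value=5.0)
disabled = Mock(return_value=100.0)

total, log = dispatch_rewards(
    terms={"tracking": tracking, "penalty": penalty, "disabled": disabled},
    scales={"tracking": 1.0, "penalty": -1.0, "disabled": 0.0},
    ctrl_dt=0.5,  # a power of two keeps the float arithmetic exact
    only_positive=True,
)

disabled.assert_not_called()                       # short-circuit held
assert log == {"tracking": 2.0, "penalty": -5.0}   # pre-ctrl_dt values
assert total == 0.0  # 2.0*0.5 + (-5.0)*0.5 = -1.5, clamped to 0.0
```

Asserting on the spy rather than on the output matters here: a zero scale would zero the contribution either way, so only the call record distinguishes "skipped" from "computed and multiplied by zero".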
Summary
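Background: UniLab is about to enter a staged architecture refactor (extracting rewards into pure functions, consolidating DR providers, splitting the off-policy runner, isolating the SimBackend interface, and so on). These refactors change structure and should not change behavior. However, these high-risk areas currently lack a regression net that reliably turns red when behavior silently degrades: the top-level smoke training run only proves that training completes, not whether reward components, the DR reset payload, the backend contract, or owner-YAML routing have drifted.
Why this approach: it follows characterization-first testing (Feathers, Working Effectively with Legacy Code). Before refactoring, pin down what the current code actually does as goldens and contracts (explicitly locking that behavior is unchanged, without claiming the behavior is correct). Each refactoring step can then be proven behavior-preserving; a red test means behavior changed and should be investigated, not papered over by blindly regenerating the fixture.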
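A minimal sketch of the characterization pattern described above, under stated assumptions: `legacy_step`, `write_golden`, and `diff_against_golden` are hypothetical names invented for this example, and JSON is an assumed fixture format. The PR's real generators and fixtures are not shown in this description. The golden records what the code does today, including the exact key set, and any later drift surfaces as a non-empty diff.

```python
import json
import math
import tempfile
from pathlib import Path


def legacy_step(t):
    """Hypothetical stand-in for one step of the code being characterized."""
    return {"tracking": 1.0 / (t + 1), "penalty": -0.25 * t}


def drifted_step(t):
    """The same step after a refactor that accidentally doubles one term."""
    out = legacy_step(t)
    out["penalty"] *= 2.0
    return out


def record_trajectory(step_fn, n_steps):
    return [step_fn(t) for t in range(n_steps)]


def write_golden(trajectory, path):
    """Explicit regeneration step; never called implicitly by the check."""
    path.write_text(json.dumps(trajectory, indent=2))


def diff_against_golden(trajectory, path):
    """Return a list of mismatch messages; an empty list means no drift."""
    golden = json.loads(path.read_text())
    if len(golden) != len(trajectory):
        return [f"length changed: {len(golden)} -> {len(trajectory)}"]
    errors = []
    for i, (want, got) in enumerate(zip(golden, trajectory)):
        if set(want) != set(got):  # lock the exact component key set
            errors.append(f"step {i}: keys {sorted(want)} -> {sorted(got)}")
            continue
        for key in sorted(want):
            if not math.isclose(want[key], got[key], rel_tol=1e-9, abs_tol=1e-12):
                errors.append(f"step {i}: {key} {want[key]} -> {got[key]}")
    return errors


with tempfile.TemporaryDirectory() as tmp:
    golden_path = Path(tmp) / "reward_golden.json"
    write_golden(record_trajectory(legacy_step, 10), golden_path)

    # A behavior-preserving change reproduces the golden exactly.
    unchanged = diff_against_golden(record_trajectory(legacy_step, 10), golden_path)
    # A behavior change is reported step by step.
    drifted = diff_against_golden(record_trajectory(drifted_step, 10), golden_path)

assert unchanged == []
assert len(drifted) == 9  # step 0 has penalty 0 either way; steps 1..9 differ
```

Keeping regeneration as a separate, explicit call reflects the point made above: a red comparison should prompt investigation, and the fixture is rewritten only as a deliberate decision.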
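What changed (tests only, zero src/ changes):
- [...] only_positive clamp, log cadence, logged value is pre-ctrl_dt, zero-scale short-circuit (a spy proves the fn is not called), off-cadence log carry-forward
- [...] record_episode_start: Go1/Go2 record, Go2W does not; the returned ResetPlan cannot show this side effect), 3 fixtures
- [...] algorithm:" + CLI rejects training.sim_backend as a route override; the one known APPO go2/motrix orphan is recorded as a strict xfail (once fixed: XPASS → suite red → forced cleanup)

Key engineering points:
- [...] terrain_curriculum.seed: the default None → np.random.default_rng(None) picks a random spawn tile per process (independent of apply_training_seed). With the seed pinned, 8 consecutive runs = 0 flakes.
- [...] git restore = 4/4 went red. This shows the locks really catch regressions rather than giving false confidence.
- [...] covered by tests/config/test_config_system.py::test_supported_task_composes; this PR does not duplicate that and only adds what it lacks (per-file orphan scan + CLI guard).

User-facing / training impact: none. Tests only; nothing under src/ is changed.

Linked Work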
Validation
- make check (ruff format + ruff check --fix + mypy src/ + pyright): green
- uv run pytest -m "not slow": 1257 passed, 12 skipped, 1 xfailed

Commands actually run:
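Known flaky tests (pre-existing, orthogonal to this PR):
tests/ipc/test_async_runner.py::test_start_collector_does_not_merge_runner_runtime_fields and tests/ipc/test_replay_buffer.py::test_multiprocess_add_then_sample occasionally go red under make test-cov (with coverage). Control experiment: with this PR's files stashed, baseline make test-cov = 1161 passed; with this PR included, make test-cov = 1257 passed (re-running the same command passes). This indicates a multiprocess timing flake caused by coverage slowdown, unrelated to the tests in this PR. Both tests pass reliably under make test (no coverage).

Impact
[...] src/ changes)

Artifacts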
Checklist
Follow-ups (each in its own PR):
[...] DomainRandomizationManager.reset → backend.set_state → obs/info)