test: pre-refactor behavior-lock safety net (reward / reset / backend / config) by Michael-Jetson · Pull Request #575 · unilabsim/UniLab

Michael-Jetson · 2026-06-03T09:13:37Z

Summary

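Background: UniLab is about to go through a phased architecture refactor: turning reward into pure functions, consolidating DR providers, splitting the off-policy runner, isolating the SimBackend interface, and so on. These refactors change structure and should not change behavior. However, the high-risk areas involved currently lack a regression net that actually turns red when behavior quietly degrades. The top-level smoke-training run only proves that training runs to completion. It cannot show whether reward components, DR reset payloads, backend contracts, or owner-YAML routing have drifted.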

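Why this approach: characterization-first (Feathers, Working Effectively with Legacy Code). Before refactoring, pin down what the current code actually does as goldens and contracts. These lock only that behavior does not change; they make no claim that the behavior is correct. Every refactoring step can then be shown to preserve behavior. If a test goes red, behavior has changed and must be investigated, not papered over by blindly regenerating fixtures.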
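A golden check of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's test code: the JSON layout (component name → per-step values), `compare_to_golden`, and the tolerances are hypothetical. It shows the two properties the PR relies on: the exact component key set is locked, and a mismatch is reported for investigation instead of triggering a fixture rewrite.

```python
import json
import math
from pathlib import Path


def compare_to_golden(
    actual: dict[str, list[float]],
    golden_path: Path,
    rel_tol: float = 1e-9,   # hypothetical tolerances; the PR does not state them
    abs_tol: float = 1e-12,
) -> list[str]:
    """Compare per-component reward trajectories against a stored golden file.

    Returns human-readable mismatches; an empty list means the current behavior
    matches the snapshot. The golden file is only read, never rewritten: a
    mismatch is something to investigate, not a cue to regenerate the fixture.
    """
    golden: dict[str, list[float]] = json.loads(golden_path.read_text())
    problems: list[str] = []

    # Lock the exact set of component names, not just their values.
    added, removed = set(actual) - set(golden), set(golden) - set(actual)
    if added or removed:
        problems.append(f"key set changed: added={sorted(added)} removed={sorted(removed)}")

    # Compare every step of every component present in both.
    for name in sorted(set(actual) & set(golden)):
        got, want = actual[name], golden[name]
        if len(got) != len(want):
            problems.append(f"{name}: {len(got)} steps, golden has {len(want)}")
            continue
        for step, (x, y) in enumerate(zip(got, want)):
            if not math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol):
                problems.append(f"{name}[step {step}]: {x!r} != golden {y!r}")
    return problems
```

Returning a list of mismatches (rather than asserting on the first one) lets a single red run show every drifted component and step at once.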

What changed (new tests only, zero src/ changes):

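reward golden: full 10-step trajectories plus the exact set of component names, locking the 2200-line locomotion reward dispatch (Go1/Go2/Go2W joystick, flat + rough), 5 fixtures
reward dispatch unit tests: scale × ctrl_dt, only_positive clamp, log cadence, logged values are pre-ctrl_dt, zero-scale short-circuit (a spy proves fn is not called), off-cadence log carry-forward
DR reset-plan golden: full qpos (spawn + orientation + joints) / qvel / DR payload, plus the terrain-curriculum side effect (record_episode_start: Go1/Go2 record, Go2W does not; the returned ResetPlan cannot show this side effect), 3 fixtures
SimBackend contract baseline: sets of abstract / optional / concrete public method names (a snapshot taken before the mixin split)
owner-YAML contract: a per-file scan of all ppo/appo/offpolicy configs asserting there is no top-level orphan algorithm:, plus a check that the CLI rejects training.sim_backend as a routing override; the single known APPO go2/motrix orphan is recorded as a strict xfail (once it is fixed the test XPASSes, the suite goes red, and cleanup is forced)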
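The reward-dispatch behaviors pinned by the unit tests can be modeled with a toy dispatcher. This is a hypothetical sketch, not UniLab's dispatch code: `RewardTerm`, `RewardDispatch`, and `log_every` are invented names, and two details are assumptions (the only_positive clamp is applied to the summed reward, and the logged value is scale × raw, i.e. taken before the ctrl_dt multiply). `unittest.mock.Mock` plays the role of the spy that proves a zero-scale term's fn is never called.

```python
from dataclasses import dataclass, field
from typing import Callable
from unittest import mock


@dataclass
class RewardTerm:
    """Hypothetical stand-in for one entry of the reward dispatch table."""
    fn: Callable[[], float]
    scale: float


@dataclass
class RewardDispatch:
    """Hypothetical toy dispatcher; not UniLab's reward code."""
    terms: dict[str, RewardTerm]
    ctrl_dt: float
    only_positive: bool = False
    log_every: int = 1  # log cadence, in steps
    log: dict[str, float] = field(default_factory=dict)
    _step: int = field(default=0, init=False)

    def compute(self) -> float:
        on_cadence = self._step % self.log_every == 0
        self._step += 1
        total = 0.0
        for name, term in self.terms.items():
            if term.scale == 0.0:
                continue  # zero-scale short-circuit: fn is never called
            value = term.fn() * term.scale
            if on_cadence:
                self.log[name] = value  # logged before the ctrl_dt multiply
            # off-cadence steps keep the previous log entry (carry-forward)
            total += value * self.ctrl_dt
        # assumption: the only_positive clamp applies to the summed reward
        return max(total, 0.0) if self.only_positive else total


# Usage: the spy on the zero-scale term must never be called.
spy = mock.Mock(return_value=1.0)
dispatch = RewardDispatch(
    terms={
        "alive": RewardTerm(lambda: 2.0, scale=0.5),
        "unused": RewardTerm(spy, scale=0.0),
    },
    ctrl_dt=0.02,
    log_every=2,
)
reward = dispatch.compute()  # 2.0 * 0.5 = 1.0 logged, 1.0 * 0.02 accumulated
spy.assert_not_called()
```

With ctrl_dt=0.02, the term above logs 1.0 but contributes 0.02 to the reward; that gap between logged and accumulated values is exactly what a "log is pre-ctrl_dt" lock guards.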

Key engineering points:

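Determinism: the rough-terrain golden forces terrain_curriculum.seed. The seed defaults to None, so np.random.default_rng(None) picks a random spawn tile in every process (independently of apply_training_seed). With the seed fixed, 8 consecutive runs produced 0 flakes.
Mutation drills to validate the net: each regression a new lock is meant to catch was injected into production code (log moved to post-ctrl_dt / zero-scale short-circuit deleted / log cleared every step / reset side effect deleted), the corresponding test was confirmed red, and the change was reverted with git: 4/4 went red. This shows the locks really catch regressions rather than giving false confidence.
De-duplication: the full owner-YAML compose matrix is already covered by the existing tests/config/test_config_system.py::test_supported_task_composes. This PR does not repeat it and only adds what that test lacks (the per-file orphan scan and the CLI guard).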
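The mutation drills above were done by hand (edit production code, watch the test go red, git restore). The same idea can be expressed as a small in-process harness. In this hypothetical sketch, `original_step`, `mutant_step`, and the lock function are invented stand-ins, and the mutant reproduces one of the four drills (log moved to post-ctrl_dt).

```python
from typing import Callable

# Hypothetical reward step: (raw, scale, ctrl_dt) -> (reward, logged_value).
StepFn = Callable[[float, float, float], tuple[float, float]]


def original_step(raw: float, scale: float, ctrl_dt: float) -> tuple[float, float]:
    logged = raw * scale  # log is taken before the ctrl_dt multiply
    return logged * ctrl_dt, logged


def mutant_step(raw: float, scale: float, ctrl_dt: float) -> tuple[float, float]:
    reward = raw * scale * ctrl_dt
    return reward, reward  # injected regression: log moved to post-ctrl_dt


def lock_log_is_pre_ctrl_dt(step: StepFn) -> bool:
    """The characterization check; True means the current behavior still holds."""
    reward, logged = step(2.0, 0.5, 0.02)
    return logged == 1.0 and abs(reward - 0.02) < 1e-12


def mutation_drill(lock: Callable[[StepFn], bool], original: StepFn,
                   mutants: list[StepFn]) -> tuple[int, int]:
    """Return (killed, total); a mutant is killed if the lock turns red on it."""
    if not lock(original):
        raise AssertionError("the lock must be green on unmodified code")
    killed = sum(1 for m in mutants if not lock(m))
    return killed, len(mutants)
```

Note that the mutant still returns the same reward (0.02); only the logged value changes, which is why a lock on the log value specifically is needed to catch it.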

User-facing / training impact: none. Test-only additions; no src/ changes.

Linked Work

Issue: none (pre-refactor safety-net infrastructure; each later Phase PR will link its own issue)
Milestone: Architecture refactor (prerequisite)

Validation

make check（ruff format + ruff check --fix + mypy src/ + pyright）— green
uv run pytest -m "not slow" — 1257 passed, 12 skipped, 1 xfailed
Additional task-specific validation listed below

Commands actually run:

make check          # green: mypy "no issues in 211 source files"; pyright "0 errors"
make test           # 1257 passed, 12 skipped, 1 xfailed (non-slow, no coverage)
# determinism: reward + reset goldens run 8× in a row = 0 flakes
# mutation drills: inject 4 kinds of regression → confirm the matching test goes red → git restore = 4/4
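The determinism fix rests on how NumPy's `Generator` is seeded: `np.random.default_rng(None)` pulls fresh entropy from the OS, so seeding NumPy's legacy global RNG elsewhere does not make it reproducible, while an explicit seed does. This is consistent with the PR's note that the tile choice is independent of apply_training_seed. `pick_spawn_tile` below is a hypothetical stand-in for the curriculum's tile choice, and `repeat_is_stable` mirrors the 8× repeat check within a single process.

```python
from typing import Callable, Optional

import numpy as np


def pick_spawn_tile(num_tiles: int, seed: Optional[int]) -> int:
    """Hypothetical stand-in for a terrain curriculum choosing a spawn tile.

    With seed=None, default_rng draws fresh OS entropy, so the choice can differ
    between runs even if np.random.seed(...) was called elsewhere.
    """
    rng = np.random.default_rng(seed)
    return int(rng.integers(num_tiles))  # uniform in [0, num_tiles)


def repeat_is_stable(fn: Callable[[], int], times: int = 8) -> bool:
    """Call fn `times` times and report whether every result matches the first."""
    first = fn()
    return all(fn() == first for _ in range(times - 1))


np.random.seed(0)  # seeding the legacy global RNG does not affect default_rng(None)
seeded_ok = repeat_is_stable(lambda: pick_spawn_tile(1000, seed=0))  # fixed seed: stable
```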

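Known flaky tests (pre-existing and orthogonal to this PR): tests/ipc/test_async_runner.py::test_start_collector_does_not_merge_runner_runtime_fields and tests/ipc/test_replay_buffer.py::test_multiprocess_add_then_sample occasionally fail under make test-cov (with coverage). Control experiment: with this PR's files stashed, the baseline make test-cov gives 1161 passed; with this PR, make test-cov gives 1257 passed (rerunning the same command passes). This indicates multiprocess timing flakiness caused by the coverage slowdown, unrelated to the content of this PR's tests. Under make test (no coverage) both tests pass consistently.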

Impact

Backend impact: none (test-only; covers the mujoco golden plus the ppo/appo/offpolicy owner-YAML contracts; no runtime changes)
Platform impact: both (pure pytest, no platform-specific code)
Training effect expected: no (test-only, zero src/ changes)

Artifacts

None (test-only; no W&B / benchmark / checkpoint)

Checklist

Added or updated tests where needed (this PR consists of new tests; fixtures, tests, and generators ship in the same PR)
Updated docs if behavior or workflow changed (N/A: no behavior or workflow changes)
Linked the driving issue (N/A: pre-refactor infrastructure with no single driving issue)
Noted any follow-up work explicitly (see below)

Follow-ups (each in its own PR):

Phase 1: fix the APPO go2/motrix orphan YAML (the strict xfail will then XPASS, forcing the marker to be removed)
Phase 3 / 6 / 7: add the mixin exactly-one check and the backend-leak / hot-path-asset AST ratchets
Phase 4.5: characterize the reset consumption chain (DomainRandomizationManager.reset → backend.set_state → obs/info)
The known flaky IPC multiprocess tests above can be tracked separately (relax timeouts / retry / mark them)
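The per-file orphan scan and its strict-xfail ratchet can be sketched with the standard library alone. Assumptions: an "orphan" is taken to mean a column-0 `algorithm:` key in an owner-YAML file, detected with a regex rather than a YAML parser (the real UniLab rule may be more specific); `KNOWN_ORPHANS` and the `appo/go2_motrix.yaml` path are hypothetical placeholders for the one known APPO go2/motrix orphan.

```python
import re
from pathlib import Path

# Assumption: a top-level (column-0) `algorithm:` key marks an orphan.
_TOP_LEVEL_ALGORITHM = re.compile(r"^algorithm\s*:", re.MULTILINE)

# Hypothetical allow-list standing in for the strict-xfail case.
KNOWN_ORPHANS = {"appo/go2_motrix.yaml"}


def find_orphans(root: Path) -> set[str]:
    """Return POSIX paths (relative to root) of YAML files with a top-level algorithm: key."""
    found: set[str] = set()
    for path in sorted(root.rglob("*.yaml")):
        if _TOP_LEVEL_ALGORITHM.search(path.read_text(encoding="utf-8")):
            found.add(path.relative_to(root).as_posix())
    return found


def check_orphans(root: Path) -> tuple[set[str], set[str]]:
    """Return (unexpected, fixed).

    A non-empty `unexpected` means a new orphan appeared; a non-empty `fixed`
    means a known orphan was cleaned up and the allow-list must shrink. Either
    should turn the suite red.
    """
    found = find_orphans(root)
    return found - KNOWN_ORPHANS, KNOWN_ORPHANS - found
```

In pytest, the known orphan would typically be a parametrized case marked `@pytest.mark.xfail(strict=True)`. Under strict mode an unexpected pass is reported as a failure, which is what turns the suite red once Phase 1 lands; the `fixed` set plays that role in this sketch.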

Characterization-first net locking current behavior before the planned architecture refactor, so structural changes can be proven behavior-preserving (it pins what the code does now; it does not assert correctness).

What it locks:
- reward golden: 10-step trajectory + exact component key-set over the 2200-line locomotion reward dispatch (Go1/Go2/Go2W joystick flat+rough), 5 fixtures
- reward dispatch unit: scale/ctrl_dt reduction, only_positive clamp, log cadence, log value is pre-ctrl_dt, zero-scale short-circuit (spy proves fn not called), off-cadence log carry-forward
- DR reset-plan golden: full qpos(spawn+orient+joints)/qvel/DR payload + the terrain-curriculum side effect (record_episode_start: Go1/Go2 record, Go2W does not) which the returned ResetPlan cannot show, 3 fixtures
- SimBackend contract baseline: abstract/optional/concrete public-method name sets
- owner-YAML contract: no orphan top-level `algorithm:` (full ppo/appo/offpolicy file scan) + CLI rejects training.sim_backend as a route override; the one known APPO go2/motrix orphan is a strict xfail (flips the suite red once fixed)

Determinism: rough-terrain golden forces terrain_curriculum.seed (default None -> np.random.default_rng(None) picks a random spawn tile per process, independent of apply_training_seed); 8x repeat = 0 flakes.

New locks validated by mutation drills (inject the regression, confirm the test goes red): 4/4.

All not-slow (run inside make test-all). Fixtures + tests + generators in one commit.

Validation: make check green; make test-cov 1257 passed, 1 xfailed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
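A name-set snapshot of a backend interface, like the SimBackend contract baseline, can be taken with `abc` and `inspect`. This is a hypothetical sketch: `SimBackendSketch` is not UniLab's SimBackend, and since the PR does not say how "optional" methods are identified, a marker decorator (`optional`, setting a `__optional__` attribute) is assumed.

```python
import abc
import inspect
from typing import Any, Callable


def optional(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical marker for methods a backend may leave unimplemented."""
    fn.__optional__ = True  # type: ignore[attr-defined]
    return fn


def public_method_sets(cls: type) -> dict[str, set[str]]:
    """Split a class's public methods into abstract / optional / concrete name sets."""
    names = {n for n, _ in inspect.getmembers(cls, callable) if not n.startswith("_")}
    abstract = {n for n in getattr(cls, "__abstractmethods__", ()) if not n.startswith("_")}
    marked = {n for n in names - abstract if getattr(getattr(cls, n), "__optional__", False)}
    return {"abstract": abstract, "optional": marked, "concrete": names - abstract - marked}


class SimBackendSketch(abc.ABC):
    """Hypothetical backend interface, not UniLab's SimBackend."""

    @abc.abstractmethod
    def step(self) -> None: ...

    @abc.abstractmethod
    def set_state(self, qpos: Any, qvel: Any) -> None: ...

    @optional
    def render(self) -> None:
        raise NotImplementedError

    def reset(self) -> None:
        self.step()


# Committed snapshot: a refactor that changes any of these sets turns the check red.
BASELINE = {
    "abstract": {"set_state", "step"},
    "optional": {"render"},
    "concrete": {"reset"},
}
```

Comparing `public_method_sets(...)` against a committed baseline before splitting the interface into mixins makes any accidental surface change (a method dropped, or moved from abstract to concrete) show up as a red test.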

Michael-Jetson requested a review from TATP-233 as a code owner June 3, 2026 09:13

Michael-Jetson wants to merge 1 commit into unilabsim:main from Michael-Jetson:refactor/test-safety-net

Michael-Jetson commented Jun 3, 2026
