test: pre-refactor behavior-lock safety net (reward / reset / backend / config) #575

Open
Michael-Jetson wants to merge 1 commit into unilabsim:main from Michael-Jetson:refactor/test-safety-net
@Michael-Jetson

Summary

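Background: UniLab is about to enter a phased architecture refactor (extracting reward into pure functions, converging the DR providers, splitting the off-policy runner, isolating the SimBackend interface, and so on). These refactors change structure and should not change behavior. But these high-risk areas currently lack a regression net that actually goes red when behavior silently degrades: the top-level smoke training run only proves that training runs to completion, not whether reward components, the DR reset payload, the backend contract, or owner-YAML routing have drifted.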

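Why this approach: characterization-first (Feathers, Working Effectively with Legacy Code). Before refactoring, pin down what the current code actually does as goldens/contracts (explicitly locking only that behavior is unchanged, not claiming that the behavior is correct). Each refactoring step can then be shown to be behavior-preserving; a red test means behavior changed and should be investigated, not papered over by blindly regenerating fixtures.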
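A minimal sketch of what such a golden check could look like. The helper name `assert_matches_golden`, the JSON layout (component name mapped to a per-step list of floats), and the tolerances are illustrative assumptions, not UniLab's actual test code:

```python
import json
import math
from pathlib import Path
from typing import Dict, List


def assert_matches_golden(
    actual: Dict[str, List[float]],
    golden_path: str,
    rel_tol: float = 1e-9,
) -> None:
    """Compare a recorded trajectory against a stored golden (hypothetical helper)."""
    golden = json.loads(Path(golden_path).read_text())

    # Lock the exact key set: an added or dropped component is a behavior change.
    assert set(actual) == set(golden), (
        f"component names drifted: {sorted(set(actual) ^ set(golden))}"
    )

    # Lock every step of every component within a tight tolerance.
    for name, expected in golden.items():
        got = actual[name]
        assert len(got) == len(expected), f"{name}: trajectory length changed"
        for step, (g, e) in enumerate(zip(got, expected)):
            assert math.isclose(g, e, rel_tol=rel_tol, abs_tol=1e-12), (
                f"{name}[{step}]: got {g}, golden {e}"
            )
```

The helper deliberately has no "regenerate on mismatch" path: a failure is a signal to investigate, which matches the intent described above.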

What changed (tests added only, zero src/ changes)

  • reward golden: full 10-step trajectory + exact set of component names, locking the 2200-line locomotion reward dispatch (Go1/Go2/Go2W joystick flat+rough), 5 fixtures
  • reward dispatch unit tests: scale×ctrl_dt, only_positive clamp, log cadence, log value is pre-ctrl_dt, zero-scale short-circuit (a spy proves fn is not called), off-cadence log carry-forward
  • DR reset-plan golden: full qpos (spawn+orient+joints)/qvel/DR payload + the terrain-curriculum side effect (record_episode_start: Go1/Go2 record, Go2W does not; the returned ResetPlan does not expose this side effect), 3 fixtures
  • SimBackend contract baseline: sets of abstract/optional/concrete public method names (snapshot taken before the mixin split)
  • owner-YAML contract: per-file scan of all ppo/appo/offpolicy configs for "no orphan top-level algorithm:" + the CLI rejects training.sim_backend as a routing override; the one known APPO go2/motrix orphan is recorded as a strict xfail (once fixed, XPASS → suite red → forces cleanup)
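To make the reward dispatch properties in the list above concrete, here is a toy sketch. `dispatch_rewards` and its signature are hypothetical stand-ins, and the reading that the logged value is the scaled term before multiplication by ctrl_dt is an assumption; the real dispatch is the 2200-line code the goldens lock:

```python
from typing import Callable, Dict, Tuple


def dispatch_rewards(
    terms: Dict[str, Callable[[], float]],
    scales: Dict[str, float],
    ctrl_dt: float,
    only_positive: bool = False,
) -> Tuple[float, Dict[str, float]]:
    """Toy reward dispatch (hypothetical, not UniLab's implementation)."""
    total = 0.0
    log: Dict[str, float] = {}
    for name, fn in terms.items():
        scale = scales.get(name, 0.0)
        if scale == 0.0:
            continue  # zero-scale short-circuit: fn is never called
        scaled = fn() * scale
        log[name] = scaled          # assumed: logged value is pre-ctrl_dt
        total += scaled * ctrl_dt   # contribution to the reward is scale x ctrl_dt
    if only_positive:
        total = max(total, 0.0)     # clamp the summed reward at zero
    return total, log


class Spy:
    """Callable that counts how often it was invoked."""

    def __init__(self, value: float) -> None:
        self.value = value
        self.calls = 0

    def __call__(self) -> float:
        self.calls += 1
        return self.value
```

A unit test can then pass a `Spy` as the term with scale 0 and assert that `calls` stays at 0, which is the kind of evidence the "spy proves fn is not called" item refers to.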

Key engineering points

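  • Determinism: the rough-terrain golden forces terrain_curriculum.seed. The default is None → np.random.default_rng(None), which picks a random spawn tile per process (independent of apply_training_seed). With the seed fixed, 8 consecutive runs = 0 flakes.
  • Mutation drill to validate the safety net: inject into production code the regression each new lock is meant to catch (move log to post-ctrl_dt / remove the zero-scale short-circuit / clear log every step / remove the reset side effect), confirm the test goes red, then restore with git = 4/4 red. This shows the locks really catch regressions rather than giving false confidence.
  • Deduplication: the full owner-YAML compose matrix is already covered by the existing tests/config/test_config_system.py::test_supported_task_composes; this PR does not duplicate it and adds only what it lacks (per-file orphan scan + CLI guard).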
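The determinism point can be illustrated with a small NumPy sketch. `pick_spawn_tile` is a hypothetical stand-in for the terrain curriculum's tile selection; only `np.random.default_rng` is taken from the description above:

```python
from typing import Optional

import numpy as np


def pick_spawn_tile(num_tiles: int, seed: Optional[int]) -> int:
    """Pick a spawn tile index (hypothetical stand-in for the curriculum)."""
    # seed=None makes NumPy draw fresh OS entropy, so the result differs
    # from process to process regardless of any other seeding in the program.
    rng = np.random.default_rng(seed)
    return int(rng.integers(0, num_tiles))


# With a fixed seed the choice is reproducible across calls and processes.
fixed_a = pick_spawn_tile(64, seed=7)
fixed_b = pick_spawn_tile(64, seed=7)

# With seed=None the choice is valid but not reproducible.
unseeded = pick_spawn_tile(64, seed=None)
```

This is why a golden that depends on the spawn tile has to set the seed explicitly rather than rely on a global training seed.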

User-facing / training impact: none. Tests added only; no src/ changes.

Linked Work

  • Issue: none (pre-refactor safety-net infrastructure; each later Phase PR will link its own issue)
  • Milestone: architecture refactor (prerequisite)

Validation

  • make check (ruff format + ruff check --fix + mypy src/ + pyright): green
  • uv run pytest -m "not slow": 1257 passed, 12 skipped, 1 xfailed
  • Additional task-specific validation listed below

Commands actually run:

make check          # green: mypy "no issues in 211 source files"; pyright "0 errors"
make test           # 1257 passed, 12 skipped, 1 xfailed (non-slow, no coverage)
# Determinism: reward + reset goldens repeated 8x = 0 flakes
# Mutation drill: inject 4 classes of regression → confirm the matching test goes red → git restore = 4/4

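Known flaky tests (pre-existing and orthogonal to this PR): tests/ipc/test_async_runner.py::test_start_collector_does_not_merge_runner_runtime_fields and tests/ipc/test_replay_buffer.py::test_multiprocess_add_then_sample occasionally go red under make test-cov (with coverage). Control experiment: with this PR's files stashed, baseline make test-cov = 1161 passed; with this PR, make test-cov = 1257 passed (re-running the same command passes). This indicates a multiprocess timing flake caused by the coverage slowdown, unrelated to the tests added here. Under make test (no coverage), both pass reliably.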

Impact

  • Backend impact: none (tests only; covers the mujoco goldens + ppo/appo/offpolicy owner-YAML contracts, no runtime changes)
  • Platform impact: both (pure pytest, no platform-specific code)
  • Training effect expected: no (test-only, zero src/ changes)

Artifacts

  • None (test-only, no W&B / benchmark / checkpoint)

Checklist

  • Added or updated tests where needed (this PR is itself new tests; fixtures + tests + generators in the same PR)
  • Updated docs if behavior or workflow changed (N/A: no behavior / workflow changes)
  • Linked the driving issue (N/A: pre-refactor infrastructure, no single driving issue)
  • Noted any follow-up work explicitly (see below)

Follow-ups (each in its own PR)

  • Phase 1: fix the APPO go2/motrix orphan YAML (the strict xfail then becomes an XPASS → forces removal of the marker)
  • Phase 3 / 6 / 7: add mixin exactly-one and backend-leak / hot-path-asset AST ratchets
  • Phase 4.5: reset consumption-chain characterization (DomainRandomizationManager.reset → backend.set_state → obs/info)
  • The known flaky IPC multiprocess tests above can be tracked separately (relax timeouts / retry / mark)
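For context on the Phase 1 item, the per-file orphan check could be approximated by a plain text scan. This sketch assumes that "orphan" means any `algorithm:` key at the top level of an owner YAML file, and the function names are hypothetical; it handles only simple block-style mappings and is not a YAML parser:

```python
from typing import List


def find_top_level_keys(yaml_text: str) -> List[str]:
    """Return keys of unindented `key:` lines (simple block mappings only)."""
    keys: List[str] = []
    for line in yaml_text.splitlines():
        # Skip blank lines, indented (nested) lines, comments,
        # list items, and document markers such as '---'.
        if not line or line[0] in " \t#-":
            continue
        if ":" in line:
            keys.append(line.split(":", 1)[0].strip())
    return keys


def has_orphan_top_level_algorithm(yaml_text: str) -> bool:
    """True if the file declares `algorithm:` at the top level (assumed meaning)."""
    return "algorithm" in find_top_level_keys(yaml_text)
```

In pytest, the one known offender would typically be marked with `pytest.mark.xfail(strict=True)`: while the orphan exists the test is an expected failure, and once the YAML is fixed the unexpected pass fails the suite until the marker is removed.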

Characterization-first net locking current behavior before the planned
architecture refactor, so structural changes can be proven behavior-preserving
(it pins what the code does now; it does not assert correctness).

What it locks:
- reward golden: 10-step trajectory + exact component key-set over the 2200-line
  locomotion reward dispatch (Go1/Go2/Go2W joystick flat+rough), 5 fixtures
- reward dispatch unit: scale/ctrl_dt reduction, only_positive clamp, log cadence,
  log value is pre-ctrl_dt, zero-scale short-circuit (spy proves fn not called),
  off-cadence log carry-forward
- DR reset-plan golden: full qpos(spawn+orient+joints)/qvel/DR payload + the
  terrain-curriculum side effect (record_episode_start: Go1/Go2 record, Go2W does
  not) which the returned ResetPlan cannot show, 3 fixtures
- SimBackend contract baseline: abstract/optional/concrete public-method name sets
- owner-YAML contract: no orphan top-level `algorithm:` (full ppo/appo/offpolicy
  file scan) + CLI rejects training.sim_backend as a route override; the one known
  APPO go2/motrix orphan is a strict xfail (flips the suite red once fixed)

Determinism: rough-terrain golden forces terrain_curriculum.seed (default None ->
np.random.default_rng(None) picks a random spawn tile per process, independent of
apply_training_seed); 8x repeat = 0 flakes. New locks validated by mutation drills
(inject the regression, confirm the test goes red): 4/4.

All not-slow (run inside make test-all). Fixtures + tests + generators in one commit.
Validation: make check green; make test-cov 1257 passed, 1 xfailed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@Michael-Jetson Michael-Jetson requested a review from TATP-233 as a code owner June 3, 2026 09:13