fix infinite error in eval mode #618
Merged
TATP-233
approved these changes
Jun 18, 2026
Summary
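Fixes the endless error-log spam and hang when running eval --render-mode record (MuJoCo) on a headless machine (no display): issue #605.
- _resolve_gl_backend falls back to glfw, and glfw needs an X11 display. Without DISPLAY, mujoco.Renderer() in every render worker raises mujoco.FatalError: an OpenGL platform library has not been loaded.
- With multiprocessing.Pool, when a worker's initializer raises, the pool silently restarts the worker forever, so the same traceback is printed over and over and pool.map never returns (endless logs plus a hang).
Specific changes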
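- _resolve_gl_backend: when headless (no DISPLAY) and EGL is unavailable, fall back to OSMesa (software rendering) instead of glfw. macOS behavior is unchanged (still glfw).
- multiprocessing.Pool → concurrent.futures.ProcessPoolExecutor: a worker failure now fails fast with BrokenProcessPool instead of restarting workers forever.
- render_backend_usable() + _warn_render_unavailable(): if no backend is usable (or a worker dies), print a single warning and return []. run_mujoco_playback / render_states_to_video then skip writing the video instead of crashing on empty frames.
Linked Work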
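To illustrate the changes listed under Specific changes, here is a minimal sketch of the backend fallback and the fail-fast pool. It is not the project's code: pick_gl_backend, _init_worker, _render_one and render_states are hypothetical names, the order of the checks (explicit MUJOCO_GL first, then EGL) is an assumption, RuntimeError stands in for mujoco.FatalError, and the "fork" start method is assumed so the sketch runs without a main guard.

```python
import multiprocessing
import warnings
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def pick_gl_backend(environ, platform, egl_available):
    """Hypothetical stand-in for _resolve_gl_backend (not the project's code)."""
    explicit = environ.get("MUJOCO_GL")
    if explicit:                      # assumption: an explicit setting always wins
        return explicit
    if platform == "darwin":          # macOS keeps glfw, as the PR states
        return "glfw"
    if egl_available:                 # assumption: EGL is preferred when present
        return "egl"
    if not environ.get("DISPLAY"):    # headless without EGL: software rendering
        return "osmesa"
    return "glfw"                     # a display exists, so glfw can create a context


def _init_worker(backend, has_display):
    # Stands in for constructing mujoco.Renderer() in each worker; it only
    # simulates the failure that glfw hits when no display is available.
    if backend == "glfw" and not has_display:
        raise RuntimeError("an OpenGL platform library has not been loaded")


def _render_one(state):
    return f"frame-{state}"           # placeholder for a rendered frame


def render_states(states, backend, has_display):
    """Return frames, or [] plus one warning if the workers cannot start."""
    # Assumption: the "fork" start method (Linux/macOS) keeps the sketch
    # runnable without an `if __name__ == "__main__":` guard.
    ctx = multiprocessing.get_context("fork")
    try:
        with ProcessPoolExecutor(
            max_workers=2,
            mp_context=ctx,
            initializer=_init_worker,
            initargs=(backend, has_display),
        ) as pool:
            return list(pool.map(_render_one, states))
    except BrokenProcessPool:
        # A failing initializer marks the executor as broken instead of
        # respawning workers, so the caller gets an exception promptly.
        warnings.warn("rendering unavailable; skipping video", stacklevel=2)
        return []
```

The design point is that multiprocessing.Pool replaces workers that exit, so an initializer that always raises keeps the pool respawning, whereas ProcessPoolExecutor treats a failing initializer as fatal and raises BrokenProcessPool for pending and future jobs. A caller can then treat an empty frame list as "skip the video" rather than an error.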
Validation
- make check
- uv run pytest -m "not slow" (via make test-all)
Commands actually run:
Additional verification:
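- MUJOCO_GL=glfw (no DISPLAY) faithfully reproduces the reported error word for word: mujoco.FatalError: an OpenGL platform library has not been loaded.
- multiprocessing.Pool restarts hundreds of workers within a few seconds and hangs (rc=124); the new ProcessPoolExecutor raises BrokenProcessPool in about 0.05s.
- render_states_to_video pipeline: every case degrades to "one warning + skip video + exit 0", with no traceback spam.
Impact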
mujoco
Artifacts
Checklist