[MISC] Constraint solver parallel linesearch optimization #2523
erizmr wants to merge 20 commits into Genesis-Embodied-AI:main
Force-pushed from f3b60ed to 9ad02e9.
Replace the sequential per-env linesearch (_kernel_linesearch) with a parallel linesearch pipeline using 6 specialized kernels:

- _kernel_parallel_linesearch_mv: mv = M @ search, ndrange(dof, env)
- _kernel_parallel_linesearch_jv: jv = J @ search, ndrange(constraint, env)
- _kernel_parallel_linesearch_p0: fused snorm/quad_gauss/eq_sum/p0_cost with shared memory reductions
- _kernel_parallel_linesearch_eval: K=16 log-spaced candidates evaluated in parallel with shared memory argmin
- _kernel_parallel_linesearch_apply_alpha_dofs: apply best alpha to qacc/Ma
- _kernel_parallel_linesearch_apply_alpha_constraints: apply best alpha to Jaref

Also includes decomposed update_constraint (3 kernels) in the iteration loop.

Additional changes:

- Add dofs_info to func_solve_body dispatch signature
- Add _log_scale helper function to solver.py
- Exclude requires_grad from decomposed path (parallel LS is sensitive to FP precision)
- Update test_grad.py to pass dofs_info

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
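For readers unfamiliar with candidate-based line search, the idea behind the mv, jv, and eval stages can be sketched in NumPy. This is an illustrative sketch only, not the code in this PR: the function names `candidate_alphas` and `parallel_linesearch`, the alpha range, and the cost model (a quadratic Gauss term plus a one-sided quadratic penalty on constraint residuals) are all assumptions made for the example.

```python
import numpy as np


def candidate_alphas(alpha_min=1e-4, alpha_max=1.0, k=16):
    """Hypothetical helper: K log-spaced step sizes (the range is an assumption)."""
    return np.logspace(np.log10(alpha_min), np.log10(alpha_max), k)


def parallel_linesearch(M, grad, search, J, jaref, D, alphas=None):
    """Pick, per environment, the candidate step with the lowest cost.

    Shapes: M (E, N, N), grad/search (E, N), J (E, C, N), jaref/D (E, C).
    Assumed cost change along the search direction:
        alpha * grad.search + 0.5 * alpha^2 * search.M.search
        + 0.5 * sum_c D_c * min(0, jaref_c + alpha * jv_c)^2
    """
    if alphas is None:
        alphas = candidate_alphas()

    # Stage 1: mv = M @ search and jv = J @ search, once per environment.
    mv = np.einsum("eij,ej->ei", M, search)
    jv = np.einsum("ecj,ej->ec", J, search)

    # Stage 2: scalar coefficients of the quadratic part, one pair per env.
    lin = np.einsum("ei,ei->e", grad, search)
    quad = np.einsum("ei,ei->e", search, mv)

    # Stage 3: evaluate all K candidates for all envs at once -> (E, K).
    a = alphas[None, :]
    gauss = a * lin[:, None] + 0.5 * a**2 * quad[:, None]
    resid = jaref[:, None, :] + alphas[None, :, None] * jv[:, None, :]  # (E, K, C)
    violated = np.minimum(resid, 0.0)  # only negative residuals are penalised
    penalty = 0.5 * np.sum(D[:, None, :] * violated**2, axis=2)
    cost = gauss + penalty

    # Stage 4: argmin over the candidate axis, independently per env.
    best = np.argmin(cost, axis=1)
    return alphas[best], cost[np.arange(cost.shape[0]), best]
```

Because every candidate is evaluated independently, the work maps naturally onto one thread per (env, candidate) pair with a reduction for the argmin. The trade-off is that the result is limited to the candidate grid rather than refined iteratively.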
Force-pushed from e0415a2 to 789ca2b.
🔴 Benchmark Regression Detected ➡️ Report
The perf dispatch re-benchmarking induces this regression; updated to re-run.
Ideally, I'd like to see a simple standalone reproduction of how you think the perf dispatch repeat induces high latency.
This would have a few benefits: