[MISC] Constraint solver parallel linesearch optimization #2523

Draft
erizmr wants to merge 20 commits into Genesis-Embodied-AI:main from erizmr:mingrui/260309/solver_opt_parallel_linesearch

@erizmr
Contributor

@erizmr erizmr commented Mar 9, 2026

Description

Related Issue

Resolves Genesis-Embodied-AI/Genesis#

Motivation and Context

How Has This Been / Can This Be Tested?

Screenshots (if appropriate):

Checklist:

  • I read the CONTRIBUTING document.
  • I followed the Submitting Code Changes section of CONTRIBUTING document.
  • I tagged the title correctly (including BUG FIX/FEATURE/MISC/BREAKING)
  • I updated the documentation accordingly or no change is needed.
  • I tested my changes and added instructions on how to test it for reviewers.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@erizmr erizmr marked this pull request as ready for review March 10, 2026 00:18
@erizmr erizmr force-pushed the mingrui/260309/solver_opt_parallel_linesearch branch from f3b60ed to 9ad02e9 on March 11, 2026 17:45
erizmr and others added 4 commits March 13, 2026 14:31
Replace the sequential per-env linesearch (_kernel_linesearch) with a
parallel linesearch pipeline using 6 specialized kernels:
- _kernel_parallel_linesearch_mv: mv = M @ search, ndrange(dof, env)
- _kernel_parallel_linesearch_jv: jv = J @ search, ndrange(constraint, env)
- _kernel_parallel_linesearch_p0: fused snorm/quad_gauss/eq_sum/p0_cost with shared memory reductions
- _kernel_parallel_linesearch_eval: K=16 log-spaced candidates evaluated in parallel with shared memory argmin
- _kernel_parallel_linesearch_apply_alpha_dofs: apply best alpha to qacc/Ma
- _kernel_parallel_linesearch_apply_alpha_constraints: apply best alpha to Jaref
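
As a rough illustration of the candidate-evaluation idea in the list above, here is a minimal NumPy sketch, not the PR's kernels. It assumes an unconstrained quadratic cost f(x) = 0.5 xᵀMx - bᵀx, so the cost along the search direction depends only on `mv = M @ search`; the constraint terms built from `jv = J @ search` are omitted. The function name `pick_best_alpha` and the candidate range `[alpha_min, alpha_max]` are hypothetical, chosen only for this example.

```python
import numpy as np


def pick_best_alpha(M, b, x, search, alpha_min=1e-4, alpha_max=1.0, k=16):
    """Evaluate k log-spaced step sizes at once and return the cheapest.

    Assumption: cost is the unconstrained quadratic 0.5*x^T M x - b^T x.
    """
    # k candidates spaced evenly in log space (hypothetical range).
    alphas = np.geomspace(alpha_min, alpha_max, num=k)

    # Quantities shared by every candidate, computed once.
    mv = M @ search                 # analogous to the "mv" step
    grad = M @ x - b                # gradient of the quadratic at x
    f0 = 0.5 * x @ (M @ x) - b @ x  # cost at alpha = 0

    # Cost along the line for all candidates in one vectorized expression:
    # f(x + a*s) = f0 + a * (grad . s) + 0.5 * a^2 * (s . M s)
    costs = f0 + alphas * (grad @ search) + 0.5 * alphas**2 * (search @ mv)

    # Argmin over candidates (a GPU version would use a block reduction).
    best = int(np.argmin(costs))
    return float(alphas[best]), float(costs[best])


if __name__ == "__main__":
    M = np.array([[2.0, 0.0], [0.0, 2.0]])
    b = np.array([2.0, 2.0])
    x = np.zeros(2)
    search = b - M @ x  # steepest-descent direction
    alpha, cost = pick_best_alpha(M, b, x, search)
    print(f"best alpha = {alpha:.4f}, cost = {cost:.4f}")
```

Because the candidates form a fixed grid, the chosen step is only the best of the k samples rather than the exact minimizer along the line, which is one plausible reason such a search could be more sensitive to floating-point differences than an iterative one.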

Also includes decomposed update_constraint (3 kernels) in the iteration loop.

Additional changes:
- Add dofs_info to func_solve_body dispatch signature
- Add _log_scale helper function to solver.py
- Exclude requires_grad from decomposed path (parallel LS is sensitive to FP precision)
- Update test_grad.py to pass dofs_info

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@erizmr erizmr force-pushed the mingrui/260309/solver_opt_parallel_linesearch branch from e0415a2 to 789ca2b on March 13, 2026 14:50
@github-actions

🔴 Benchmark Regression Detected ➡️ Report

@erizmr
Contributor Author

erizmr commented Mar 13, 2026

🔴 Benchmark Regression Detected ➡️ Report

The perf dispatch re-benchmarking induces this regression report; I have updated the PR to trigger a re-run.

@hughperkins
Collaborator

🔴 Benchmark Regression Detected ➡️ Report

perf dispatch re-benchmarking induces this, updated to re-run

Ideally, I'd like to see a simple standalone reproduction showing how you think perf dispatch repeat induces high latency.

@hughperkins
Collaborator

🔴 Benchmark Regression Detected ➡️ Report

perf dispatch re-benchmarking induces this, updated to re-run

Ideally, I'd like to see a simple standalone reproduction showing how you think perf dispatch repeat induces high latency.

This would have a few benefits:

  • easier to discuss and reason about
  • easier to fix (if it is a perf dispatch bug)

@github-actions

🔴 Benchmark Regression Detected ➡️ Report

@hughperkins hughperkins marked this pull request as draft March 19, 2026 17:45
@github-actions

🔴 Benchmark Regression Detected ➡️ Report

@github-actions

🔴 Benchmark Regression Detected ➡️ Report

@github-actions

🔴 Benchmark Regression Detected ➡️ Report

2 participants