[Bug]: `loss.backward()` hangs indefinitely for articulated robots with freejoint + child joints #2537

### Bug Description

`loss.backward()` hangs indefinitely for any articulated robot where a free-floating root body (`freejoint`) has child joints (revolute, prismatic, etc.). The backward pass never returns — it blocks inside the ABD backward kernel.

**Single free-floating bodies (freejoint only, no children) work correctly.** Single fixed-base joints (hinge/slide with no parent freejoint) also work. The hang is triggered exclusively by kinematic trees of depth ≥ 2 with a free root.
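To triage other models against this condition, the trigger described above can be written as a static check on the MJCF. The sketch below is a hypothetical helper (`matches_hang_pattern` is not part of Genesis) that encodes only the pattern observed in this report: a top-level body with a free joint that has at least one jointed descendant body. It parses the XML with the standard library and does not load anything into Genesis; `<default>` classes and `<include>` files are ignored.

```python
import xml.etree.ElementTree as ET


def _has_free_joint(body: ET.Element) -> bool:
    # A free root can be declared as <freejoint/> or as <joint type="free"/>
    if body.find("freejoint") is not None:
        return True
    return any(j.get("type") == "free" for j in body.findall("joint"))


def _jointed_descendants(body: ET.Element) -> int:
    # Count nested bodies (any depth) that declare at least one joint of their own;
    # Element.iter() yields `body` itself first, so it is skipped explicitly
    count = 0
    for child in body.iter("body"):
        if child is body:
            continue
        if child.find("joint") is not None or child.find("freejoint") is not None:
            count += 1
    return count


def matches_hang_pattern(mjcf_xml: str) -> bool:
    """True if some top-level body is free-floating AND has jointed children."""
    root = ET.fromstring(mjcf_xml)
    worldbody = root.find("worldbody")
    if worldbody is None:
        return False
    return any(
        _has_free_joint(top) and _jointed_descendants(top) > 0
        for top in worldbody.findall("body")
    )
```

Run on the two MJCF strings in the repro below, this returns `False` for `MJCF_FREE_BODY` and `True` for `MJCF_ARTICULATED`.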

---

### Steps to Reproduce

#### Working case (single free body — for reference)

```python
import os, tempfile
import genesis as gs
import torch

MJCF_FREE_BODY = """
<mujoco model="free_body">
  <worldbody>
    <body name="chassis" pos="0 0 0">
      <freejoint name="root"/>
      <inertial mass="1.0" pos="0 0 0" diaginertia="0.1 0.1 0.1"/>
      <geom type="box" size="0.1 0.1 0.1" contype="0" conaffinity="0"/>
    </body>
  </worldbody>
</mujoco>
"""

gs.init(backend=gs.gpu, logging_level="warning")
fd, path = tempfile.mkstemp(suffix=".xml")
with os.fdopen(fd, "w") as f:
    f.write(MJCF_FREE_BODY)

scene = gs.Scene(
    sim_options=gs.options.SimOptions(dt=0.01, gravity=(0, 0, 0), requires_grad=True),
    rigid_options=gs.options.RigidOptions(enable_collision=False),
    show_viewer=False,
)
robot = scene.add_entity(gs.morphs.MJCF(file=path))
scene.build()

# NOTE: must use gs.tensor (not torch.tensor) for gradient to flow
ctrl = gs.tensor([0.1, 0.0, 0.0, 0.0, 0.0, 0.0], requires_grad=True)
target = torch.tensor([0.05, 0.0, 0.0], device=gs.device)

scene.reset()
for _ in range(5):
    robot.set_dofs_velocity(ctrl)
    scene.step()

# NOTE: must use robot.get_state().pos (not get_pos() or get_links_pos())
# get_state() registers the state in _queried_states so backward can seed gradients
state = robot.get_state()
loss = torch.nn.functional.mse_loss(state.pos.squeeze(), target)
loss.backward()  # completes in ~11s (JIT), ctrl.grad is non-zero ✓
print(f"ctrl.grad = {ctrl.grad}")
```

#### Hanging case (freejoint + one child hinge — minimal repro)

```python
import os, tempfile
import genesis as gs
import torch

MJCF_ARTICULATED = """
<mujoco model="free_plus_hinge">
  <worldbody>
    <body name="chassis" pos="0 0 0">
      <freejoint name="root"/>
      <inertial mass="1.0" pos="0 0 0" diaginertia="0.1 0.1 0.1"/>
      <geom type="box" size="0.1 0.1 0.1" contype="0" conaffinity="0"/>
      <body name="wheel" pos="0.2 0 0">
        <joint name="hinge_y" type="hinge" axis="0 1 0"/>
        <inertial mass="0.5" pos="0 0 0" diaginertia="0.05 0.05 0.05"/>
        <geom type="cylinder" fromto="0 -0.05 0 0 0.05 0" size="0.1"
              contype="0" conaffinity="0"/>
      </body>
    </body>
  </worldbody>
</mujoco>
"""

gs.init(backend=gs.gpu, logging_level="warning")
fd, path = tempfile.mkstemp(suffix=".xml")
with os.fdopen(fd, "w") as f:
    f.write(MJCF_ARTICULATED)

scene = gs.Scene(
    sim_options=gs.options.SimOptions(dt=0.01, gravity=(0, 0, 0), requires_grad=True),
    rigid_options=gs.options.RigidOptions(enable_collision=False),
    show_viewer=False,
)
robot = scene.add_entity(gs.morphs.MJCF(file=path))
scene.build()

ctrl = gs.tensor([0.0] * 7, requires_grad=True)  # 6 free DOFs + 1 hinge
target = torch.tensor([0.05, 0.0, 0.0], device=gs.device)

scene.reset()
for _ in range(5):
    robot.set_dofs_velocity(ctrl)
    scene.step()

state = robot.get_state()
loss = torch.nn.functional.mse_loss(state.pos.squeeze(), target)
loss.backward()  # <-- hangs indefinitely
print("Never reached")
```
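The `ctrl` vector passed to `set_dofs_velocity` needs one entry per DOF: 7 in the repro above (6 for the freejoint plus 1 for the hinge). When varying the model it is easy to get this length wrong, so here is a small sketch that derives it from the MJCF using MuJoCo's per-joint-type DOF counts (free = 6, ball = 3, slide = 1, hinge = 1, with `hinge` as the default type). `count_dofs` is a hypothetical helper, and it assumes no `<default>` class overrides joint types.

```python
import xml.etree.ElementTree as ET

# DOFs contributed by each MJCF joint type; an untyped <joint> is a hinge
DOFS_PER_JOINT = {"free": 6, "ball": 3, "slide": 1, "hinge": 1}


def count_dofs(mjcf_xml: str) -> int:
    root = ET.fromstring(mjcf_xml)
    worldbody = root.find("worldbody")
    if worldbody is None:
        return 0
    total = 0
    # worldbody.iter("body") walks every body at every nesting depth
    for body in worldbody.iter("body"):
        total += 6 * len(body.findall("freejoint"))  # <freejoint/> is always 6 DOFs
        for joint in body.findall("joint"):
            total += DOFS_PER_JOINT[joint.get("type", "hinge")]
    return total
```

With that, the control tensor could be built as `gs.tensor([0.0] * count_dofs(MJCF_ARTICULATED), requires_grad=True)`.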

Same hang occurs with `slide` (prismatic) child joints and with 3+ child joints.
Replacing the hinge with a second `freejoint` (separate free body, no parent-child
relationship) does not hang.
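The variants just described can be generated from one template so each row of the comparison uses otherwise identical inertials and options. This is a sketch with a hypothetical `make_variant_mjcf` helper: `hinge`/`slide` nest `n_children` jointed bodies under the free `chassis` root, while `free` adds `n_children` separate free bodies beside it with no parent-child link. The child geometry (small spheres, collisions disabled) is an assumption and differs slightly from the cylinder wheel in the minimal repro.

```python
import xml.etree.ElementTree as ET

# Free root body, same as in the minimal repro; {children}/{siblings} are filled in below
ROOT_TEMPLATE = """<mujoco model="variant">
  <worldbody>
    <body name="chassis" pos="0 0 0">
      <freejoint name="root"/>
      <inertial mass="1.0" pos="0 0 0" diaginertia="0.1 0.1 0.1"/>
      <geom type="box" size="0.1 0.1 0.1" contype="0" conaffinity="0"/>
{children}
    </body>
{siblings}
  </worldbody>
</mujoco>
"""

# Jointed child nested inside the chassis (hinge or slide)
CHILD_TEMPLATE = """      <body name="child_{i}" pos="0.2 {y:.2f} 0">
        <joint name="joint_{i}" type="{jtype}" axis="0 1 0"/>
        <inertial mass="0.5" pos="0 0 0" diaginertia="0.05 0.05 0.05"/>
        <geom type="sphere" size="0.05" contype="0" conaffinity="0"/>
      </body>"""

# Independent free body placed beside the chassis (no parent-child relationship)
SIBLING_TEMPLATE = """    <body name="free_{i}" pos="{x:.2f} 0 0">
      <freejoint name="free_{i}"/>
      <inertial mass="0.5" pos="0 0 0" diaginertia="0.05 0.05 0.05"/>
      <geom type="sphere" size="0.05" contype="0" conaffinity="0"/>
    </body>"""


def make_variant_mjcf(child_type: str = "hinge", n_children: int = 1) -> str:
    if child_type not in ("hinge", "slide", "free"):
        raise ValueError(f"unsupported child_type: {child_type!r}")
    children, siblings = [], []
    for i in range(n_children):
        if child_type == "free":
            siblings.append(SIBLING_TEMPLATE.format(i=i, x=0.5 * (i + 1)))
        else:
            # Spread children symmetrically along y, like wheels on an axle
            y = 0.15 * (i - (n_children - 1) / 2)
            children.append(CHILD_TEMPLATE.format(i=i, y=y, jtype=child_type))
    xml = ROOT_TEMPLATE.format(children="\n".join(children), siblings="\n".join(siblings))
    ET.fromstring(xml)  # fail early if the generated MJCF is not well-formed XML
    return xml
```

Each string can be written out with the same `tempfile.mkstemp` pattern as the repro and loaded via `gs.morphs.MJCF(file=path)`; `ctrl` then needs 6 + `n_children` entries for hinge/slide, or 6 × (1 + `n_children`) for the free-sibling case.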

---

### Expected Behavior

`loss.backward()` completes and `ctrl.grad` is populated with the gradient of the
loss w.r.t. the control velocities, as it does for the single free-body case.

---

### Environment

| Component | Details |
|---|---|
| OS | Arch Linux (kernel 6.18.9) |
| GPU | NVIDIA RTX A500 Laptop GPU |
| GPU driver | 590.48.01 |
| CUDA | 12.8 |
| PyTorch | 2.9.1+cu128 |
| Python | 3.12.12 |

---

### Release versions tested

Tested on **v0.3.8**, **v0.3.9**, and **v0.4.1** — all hang for the articulated case.

---

### Additional Context

#### What works vs. what hangs

| Configuration | Backward |
|---|---|
| Single freejoint (free-floating body, no children) | ✅ completes (~11s JIT) |
| Single fixed-base hinge (no parent freejoint) | ✅ completes (~4s JIT) |
| Single fixed-base slide / prismatic | ✅ completes |
| freejoint root + one hinge child | ❌ hangs |
| freejoint root + one slide child | ❌ hangs |
| freejoint root + three hinge children (e.g. wheeled robot) | ❌ hangs |

The hang is unaffected by: `enable_collision`, `disable_constraint`, `gravity`,
number of simulation steps, or joint type of the child.
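To rebuild the table above without a stuck session, each configuration can run in its own interpreter under a wall-clock limit, so a hung backward pass is killed rather than blocking the caller. A minimal sketch, assuming each configuration is saved as a standalone script (`classify_run` is a hypothetical name); the limit must comfortably exceed the JIT compile time, which is about 11 s for the free-body case here.

```python
import subprocess
import sys


def classify_run(script: str, timeout_s: float) -> str:
    """Run a Python snippet in a fresh interpreter; return 'completes', 'fails', or 'hangs'."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", script],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills and reaps the child on timeout, so a kernel
        # that never returns cannot block the harness itself
        return "hangs"
    return "completes" if proc.returncode == 0 else "fails"
```

For example, `classify_run(open("repro_free_plus_hinge.py").read(), timeout_s=120)` (file name hypothetical) would be expected to return `"hangs"` on the affected versions. Using a separate process instead of a thread also avoids calling into the GPU backend from a non-main thread.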
