Hi @xbpeng, I have been trying simple domain randomization (DR) of Kp, Kd, COM, mass, and joint friction for DeepMimic policy training, with a 5-step robot state observation history and a 1-step action history. Training is much slower with DR: without DR it took about 8 hours for 8000 iterations, while with DR it takes 8 hours for only 800 iterations. The returns after 800 iterations are also very low (< 1). I want to test the sim2sim transfer performance of some policies for Go2 skills (I tried running a DeepMimic policy trained in IsaacGym in Newton, and it completely fails to track the reference motion). Do you have any suggestions?
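For concreteness, here is a minimal sketch of the kind of setup described above. It assumes the physics parameters are sampled uniformly once per episode and that the policy input is the concatenation of the last 5 robot states and the last action. The ranges and the names `DRRanges`, `sample_dr_params`, and `HistoryObs` are hypothetical and not part of DeepMimic, IsaacGym, or Newton. The simulator-specific calls that would apply these parameters to the robot are omitted.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass
class DRRanges:
    # Hypothetical uniform ranges; the actual values used are not stated.
    kp_scale: tuple = (0.8, 1.2)         # multiplier on nominal PD stiffness
    kd_scale: tuple = (0.8, 1.2)         # multiplier on nominal PD damping
    mass_scale: tuple = (0.9, 1.1)       # multiplier on nominal link mass
    com_offset: tuple = (-0.02, 0.02)    # per-axis CoM shift in metres
    joint_friction: tuple = (0.0, 0.05)  # joint friction coefficient


def sample_dr_params(ranges: DRRanges, rng: random.Random) -> dict:
    """Draw one set of randomized physics parameters (e.g. at episode reset)."""
    return {
        "kp_scale": rng.uniform(*ranges.kp_scale),
        "kd_scale": rng.uniform(*ranges.kd_scale),
        "mass_scale": rng.uniform(*ranges.mass_scale),
        "com_offset": [rng.uniform(*ranges.com_offset) for _ in range(3)],
        "joint_friction": rng.uniform(*ranges.joint_friction),
    }


class HistoryObs:
    """Policy input = last `state_len` robot states + last `action_len` actions."""

    def __init__(self, action_dim: int, state_len: int = 5, action_len: int = 1):
        self.action_dim = action_dim
        self.states = deque(maxlen=state_len)
        self.actions = deque(maxlen=action_len)

    def reset(self, state):
        # Pad the history with the initial state and zero actions so the
        # observation has a fixed size from the first step onward.
        self.states.extend(list(state) for _ in range(self.states.maxlen))
        self.actions.extend([0.0] * self.action_dim for _ in range(self.actions.maxlen))

    def step(self, state, action):
        # The oldest entries drop out automatically because of maxlen.
        self.states.append(list(state))
        self.actions.append(list(action))

    def observation(self) -> list:
        # Oldest-to-newest states, followed by oldest-to-newest actions.
        obs = [x for s in self.states for x in s]
        obs += [x for a in self.actions for x in a]
        return obs
```

With a 4-dimensional state and a 2-dimensional action, the observation has 5 * 4 + 2 = 22 entries. Resampling once per episode (rather than every step) keeps the dynamics fixed within a rollout, which is what lets the state history carry information about the sampled parameters.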