Thanks for open-sourcing such great work!
I have two questions about the work:
- Is the target position output by the VLM the same as the action output by the VLA, namely a trajectory composed of multiple points?
- If so, why do we need to use this trajectory to steer the VLA? Why not feed the VLM's trajectory directly to the robot for execution?
I would greatly appreciate it if you could help answer my questions!