To enhance generalization to novel instructions and environment variations, we propose Coarse-to-fine Language-Aligned manipulation Policy (CLAP), a framework that integrates three key components: 1) task decomposition, 2) VLM fine-tuning for 3D keypoint prediction, and 3) 3D-aware representation.
- Tested (Recommended) Versions: Python 3.10 and CUDA 12.1.
- Step 1 (Optional): We recommend using conda and creating a virtual environment.
conda create --name clap python=3.10
conda activate clap
- Step 2: Install PyTorch. Make sure the PyTorch version is compatible with your CUDA version. More instructions for installing PyTorch can be found here.
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
Check that CUDA is available with the installed torch before moving on to the next step.
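The CUDA check can be done with a short Python snippet; this is a generic sketch, not part of the repository:

```python
# Sanity-check the torch install: print the version and whether CUDA is visible.
# If torch is not installed yet, say so instead of crashing on the import.
import importlib.util

if importlib.util.find_spec("torch") is None:
    print("torch is not installed yet")
else:
    import torch
    print(torch.__version__, torch.cuda.is_available())
```

If this prints `False`, reinstall torch from an index URL that matches your CUDA version before continuing.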
- Step 3: Install PyTorch3D. For more instructions visit here.
pip install "git+https://github.com/facebookresearch/pytorch3d.git@stable"
- Step 4: Install CoppeliaSim. PyRep requires version 4.1 of CoppeliaSim. Download and unzip CoppeliaSim:
- Ubuntu 16.04
- Ubuntu 18.04
- Ubuntu 20.04
Once you have downloaded CoppeliaSim, add the following to your ~/.bashrc file. (NOTE: edit the 'EDIT ME' placeholder in the first line.)
export COPPELIASIM_ROOT=<EDIT ME>/PATH/TO/COPPELIASIM/INSTALL/DIR
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$COPPELIASIM_ROOT
export QT_QPA_PLATFORM_PLUGIN_PATH=$COPPELIASIM_ROOT
export DISPLAY=:1.0
For a headless server, replace the last line above with:
Xvfb :0 -screen 0 1024x768x24 +extension GLX +render -noreset & export DISPLAY=:0
Remember to source your .bashrc (source ~/.bashrc) or .zshrc (source ~/.zshrc) after this.
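Before moving on, the environment variable from your ~/.bashrc can be sanity-checked with a short Python sketch (generic, not part of the repo):

```python
# Hedged sketch: confirm COPPELIASIM_ROOT is set and points at a real directory
# before running PyRep, which relies on it to locate the simulator.
import os

root = os.environ.get("COPPELIASIM_ROOT")
if root is None:
    print("COPPELIASIM_ROOT is not set; edit your ~/.bashrc and re-source it")
elif not os.path.isdir(root):
    print(f"COPPELIASIM_ROOT={root} is not a directory")
else:
    print(f"CoppeliaSim install found at {root}")
```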
- Step 5: Clone the repository with the submodules using the following command.
git clone --recurse-submodules https://github.com/Jianshu-Hu/CLAP.git && cd CLAP && git submodule update --init
- Step 6: Install packages for fine-tuning the VLM with ms-swift.
pip install ms-swift==3.5.2
pip install transformers==4.51.3
pip install modelscope==1.27.1
pip install peft==0.15.2
pip install trl==0.18
pip install deepspeed==0.16.9
pip install vllm==0.8.5.post1
pip install qwen_vl_utils
- Step 7: Install required libraries: PyRep, RLBench, YARR, Point Renderer, and robot-colosseum.
pip install -e libs/PyRep
pip install -e libs/RLBench
pip install -e libs/YARR
pip install -e libs/point-renderer
pip install -e libs/robot-colosseum-rvt/
pip install transforms3d
pip install timm
pip install bitsandbytes
pip install openai-clip
pip install pyquaternion
- Step 8: Collect the dataset.
  - You can generate the initial demonstrations using the following command. They will be generated under `Generalizable-CLAP/data/gembench/xxx`, where `xxx` is either `train`, `test`, or `val`. Modify `DATA_DIR` in `config.py` to match the location.
  bash scripts/collect_gembench_data.sh
  - Additionally, we use the same dataloader as PerAct, which is based on YARR. It saves the replay buffer to disk (it is created only once, when you run the low-level training). You can modify `TASK_REPLAY_STORAGE_FOLDER` in `config.py` to choose where the replay buffer is saved.
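The expected dataset layout can be sketched as follows; `DATA_DIR` is shown as a plain variable with an example path, not the repo's actual `config.py` contents:

```python
# Sketch of the dataset layout: DATA_DIR in config.py should point at the
# folder holding the train/val/test splits. The path below is illustrative.
from pathlib import Path

DATA_DIR = Path("Generalizable-CLAP/data/gembench")
splits = [DATA_DIR / s for s in ("train", "val", "test")]
for p in splits:
    print(p)
```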
- Additional notes:
  - For a headless server, if you face a Qt issue such as `Could not find the Qt platform plugin "xcb"`, try:
  pip uninstall opencv-python opencv-python-headless
  pip install opencv-python-headless
  - If you face a libGL/libstdc++ issue such as `miniconda3/envs/robot-vlm/bin/../lib/libstdc++.so.6: version 'GLIBCXX_3.4.30' not found`, run the following command to see whether your system already has `GLIBCXX_3.4.30` locally:
  strings /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep GLIBCXX
  If you can see `GLIBCXX_3.4.30`, back up the original `libstdc++.so.6` and copy your local `libstdc++.so.6` into the conda env (remember to replace the directory with your own path):
  mv miniconda3/envs/robot-vlm/lib/libstdc++.so.6 miniconda3/envs/robot-vlm/lib/libstdc++.so.6.old
  cp /usr/lib/x86_64-linux-gnu/libstdc++.so.6 miniconda3/envs/robot-vlm/lib/libstdc++.so.6
  If you cannot see `GLIBCXX_3.4.30`, try updating the `libstdc++` library.
- Coarse Task Planner:
  - Step 1: Prepare the training data. See the detailed instructions for more information.
  bash scripts/prepare_gembench_pretraining_data.sh
  - Step 2: Train the high-level module for GemBench. We provide a script for multi-GPU training:
  bash scripts/sft_gembench.sh
  and Python code for single-GPU training:
  python train.py --tag coarse_task_planner --task_name gembench --num_episodes 10 --data_type lang_keypoints --cot 9 --epochs 1 --lr 0.0003 --eval_save_steps 250 --include_lang_plan gembench
- Fine-grained action predictor:
  - Step 1: Train the low-level policy for GemBench. Note that you need to set gradient_accumulation to 16/num_gpus according to the number of GPUs you use. For example, with one GPU run:
  python finegrained_policy/train.py --gradient_accumulation 16 --with_val --epochs 20 --tasks gembench --tag fine_grained_policy
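The 16/num_gpus rule keeps the effective batch size constant across GPU counts; a small helper (hypothetical, not in the repo) makes the arithmetic explicit:

```python
# Hypothetical helper: compute --gradient_accumulation so that
# num_gpus * gradient_accumulation stays at 16, as the instructions require.
def gradient_accumulation_steps(num_gpus: int, target: int = 16) -> int:
    if num_gpus <= 0 or target % num_gpus != 0:
        raise ValueError(f"target {target} must be divisible by num_gpus {num_gpus}")
    return target // num_gpus

print(gradient_accumulation_steps(1))  # 16: single GPU, as in the example command
print(gradient_accumulation_steps(4))  # 4: four GPUs
```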
- Evaluation: Evaluate on GemBench with:
bash scripts/eval_gembench.sh
Note: See detailed instructions for more information.