FRAPPE: Infusing World Modeling into Generalist Policies via Multiple Future Representation Alignment
- [2026/2/20] We released our paper on arXiv.
First, clone the RoboTwin repo and install it along with the required packages. You can follow the guidance in the RoboTwin documentation.
git clone https://github.com/RoboTwin-Platform/RoboTwin.git
cd RoboTwin
conda create -n frappe python=3.10 -y
conda activate frappe
bash script/_install.sh # Install RoboTwin basic envs and CuRobo
bash script/_download_assets.sh # Download assets (RoboTwin-OD, Texture Library and Embodiments)
Then continue setting up the environment.
# Make sure python version == 3.10
conda activate frappe
# Install pytorch
# Look up https://pytorch.org/get-started/previous-versions/ with your cuda version for a correct command
pip install torch==2.1.0 torchvision==0.16.0 --index-url https://download.pytorch.org/whl/cu121
# Install packaging
pip install packaging==24.0
pip install ninja
# Verify Ninja --> should return exit code "0"
ninja --version; echo $?
# Install flash-attn
pip install flash-attn==2.7.2.post1 --no-build-isolation
# Install other prerequisites
pip install -r requirements.txt
Then clone our repo as a policy of RoboTwin. The directory structure will be as below:
cd policy
git clone https://github.com/Jbo-Wang/frappe.git

RoboTwin
├── policy
│   ├── FRAPPE
│   └── other policies ...
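Before moving on, it can help to confirm that the installed packages match the versions pinned in the install commands above. The following is a minimal sketch, not part of the FRAPPE repo: the helper names `installed_version` and `check_pins` are hypothetical, and it assumes CUDA wheels report a local suffix such as `+cu121` that should be ignored when comparing.

```python
from importlib import metadata

# Pins copied from the install commands above.
PINS = {
    "torch": "2.1.0",
    "torchvision": "0.16.0",
    "packaging": "24.0",
    "flash-attn": "2.7.2.post1",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins, get_version=installed_version):
    """Return {package: (expected, found)} for every pin that is not met."""
    problems = {}
    for name, expected in pins.items():
        found = get_version(name)
        # Drop a local build suffix such as "+cu121" before comparing.
        base = found.split("+")[0] if found else None
        if base != expected:
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINS).items():
        print(f"{name}: expected {expected}, found {found}")
```

Running the script inside the `frappe` conda environment prints one line per mismatched or missing package and prints nothing when every pin is satisfied.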
Download the pretrained checkpoint and the encoders used in the training stage.
# In the RoboTwin ROOT directory
cd policy
mkdir weights
cd weights
mkdir RDT && cd RDT
# Download the models
huggingface-cli download google/t5-v1_1-xxl --local-dir t5-v1_1-xxl
huggingface-cli download google/siglip-so400m-patch14-384 --local-dir siglip-so400m-patch14-384
huggingface-cli download robotics-diffusion-transformer/rdt-1b --local-dir rdt-1b
# Teacher encoders
#theia
huggingface-cli download theaiinstitute/theia-base-patch16-224-cdiv --local-dir theia-base-patch16-224-cdiv
#clip
huggingface-cli download laion/CLIP-ViT-H-14-laion2B-s32B-b79K --local-dir CLIP-ViT-H-14-laion2B-s32B-b79K
#vit
huggingface-cli download google/vit-huge-patch14-224-in21k --local-dir vit-huge-patch14-224-in21k
#dinov2
git clone https://github.com/facebookresearch/dinov2.git
cd dinov2
mkdir checkpoints && cd checkpoints
huggingface-cli download facebook/dinov2-base
Then update the paths of the teacher encoders (Theia, CLIP, ViT, DINOv2) in utils.py to match your local setup.
You can download the checkpoints from Hugging Face.
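Since the teacher encoder paths have to be filled in by hand, a quick check that the downloads landed where expected can save a failed training launch. The sketch below is an illustration only: `missing_weight_dirs` is a hypothetical helper, and it covers just the six `--local-dir` targets from the commands above (the DINOv2 clone is not checked).

```python
from pathlib import Path

# Folder names taken from the --local-dir arguments above.
EXPECTED_DIRS = [
    "t5-v1_1-xxl",
    "siglip-so400m-patch14-384",
    "rdt-1b",
    "theia-base-patch16-224-cdiv",
    "CLIP-ViT-H-14-laion2B-s32B-b79K",
    "vit-huge-patch14-224-in21k",
]


def missing_weight_dirs(root):
    """Return the expected folders that do not exist under root."""
    root = Path(root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Assumes the script is run from the RoboTwin root directory.
    for name in missing_weight_dirs("policy/weights/RDT"):
        print(f"missing: {name}")
```

An empty result only shows that the folders exist; it does not verify that each download completed.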
The directory structure will be as below:
flappe
├── checkpoints
│   ├── flappe_taskxxx
│   │   └── checkpoint-xxx
│   └── ...
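Given the layout above, the candidate values for `model_name` can be listed programmatically. This is a rough sketch under an assumption: it treats each folder directly under `checkpoints` that contains at least one `checkpoint-*` subfolder as a usable name, which may not match how `eval.sh` resolves it. `list_model_names` is a hypothetical helper, not part of the repo.

```python
from pathlib import Path


def list_model_names(checkpoints_root="./checkpoints"):
    """Return run folders that contain at least one checkpoint-* subfolder."""
    root = Path(checkpoints_root)
    if not root.is_dir():
        return []
    return sorted(
        run.name
        for run in root.iterdir()
        if run.is_dir() and any(p.is_dir() for p in run.glob("checkpoint-*"))
    )


if __name__ == "__main__":
    for name in list_model_names():
        print(name)
```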
We offer an inference example for our method (eval.sh).
model_name should be the name of the checkpoint under the ./checkpoints folder.
conda activate frappe
bash eval.sh
We offer a training example for our method. Training has two stages:
- mid-training (finetune_mid.sh & model_config/mid_train.yml)
  For mid_train.yml, update the path of the pretrained ckpts and the pretrained_model_name_or_path.
- post-training (finetune_post.sh & model_config/post_train.yml)
  For post_train.yml, update the path of the mid-train ckpts, the pretrained_model_name_or_path, and the teacher encoder paths.
conda activate frappe
bash finetune_mid.sh # or bash finetune_post.sh
The default configurations match the experimental setup in our paper.
✅ Training and inference code on RoboTwin2.0
For further discussion and collaboration, please feel free to contact us via Email and WeChat:
| Author | Email | WeChat |
|---|---|---|
| Han Zhao | zhaohan34@westlake.edu.cn | este_zh |
| Jingbo Wang | guangtouchangkaishen@outlook.com | guangtouchangkaishen |
| Wenxuan Song | songwenxuan0115@gmail.com | swx0757 |
We thank these great works and open-source codebases: RDT & Theia
If you find this work useful, please cite:
@article{zhao2026frappe,
title={FRAPPE: Infusing World Modeling into Generalist Policies via Multiple Future Representation Alignment},
author={Han Zhao and Jingbo Wang and Wenxuan Song and Shuai Chen and Yang Liu and Yan Wang and Haoang Li and Donglin Wang},
journal = {arXiv preprint arXiv:2602.17259},
year={2026},
} 