HiF-VLA: Hindsight, Insight and Foresight through Motion Representation for Vision-Language-Action Models

HiF-VLA (Hindsight, Insight, and Foresight for VLAs) is a unified framework that leverages motion for bidirectional temporal reasoning. HiF-VLA encodes past dynamics through hindsight priors, anticipates future motion via foresight reasoning, and integrates both through a hindsight-modulated joint expert to enable “think-while-acting” control.

🛠️ Installation

1. Clone Repo and Environment Setup

# Create environment
conda create -n hif-vla python=3.10 -y
conda activate hif-vla

# Install PyTorch
# Use a command specific to your machine: https://pytorch.org/get-started/locally/
pip3 install torch torchvision torchaudio

# Clone hif-vla repo and pip install to download dependencies
git clone https://github.com/minnie-lin/HiF-VLA.git
cd HiF-VLA
pip install -e .

# Install Flash Attention 2 for training (https://github.com/Dao-AILab/flash-attention)
#   =>> If you run into difficulty, try `pip cache remove flash_attn` first
pip install packaging ninja
ninja --version; echo $?  # Verify Ninja --> should return exit code "0"
pip install "flash-attn==2.5.5" --no-build-isolation
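
Once the steps above finish, a quick import check can catch a broken install before training starts. The helper below is a minimal sketch and is not part of the HiF-VLA repo; the function name `check_python_module` is hypothetical. It assumes `python` resolves to the interpreter of the active `hif-vla` environment and that the packages import as `torch` and `flash_attn`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the HiF-VLA repo): report whether a Python
# module can be imported by the `python` found on the current PATH.
check_python_module() {
    if python -c "import $1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        # Also reached when `python` itself is not available.
        echo "missing: $1"
        return 1
    fi
}
```

For example, `check_python_module torch && check_python_module flash_attn` prints one line per module and returns a non-zero status at the first module that fails to import.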

2. Dataset Preparation

LIBERO

We use a modified version of the LIBERO dataset from LIBERO, to which we added trajectory IDs as an additional annotation.

git clone https://github.com/Lifelong-Robot-Learning/LIBERO.git
pip install -e LIBERO
pip install -r experiments/robot/libero/libero_requirements.txt  # From the HiF-VLA base dir

CALVIN

For the CALVIN ABC→D dataset, we use the RLDS version from calvin_abc_rlds.

git clone --recurse-submodules https://github.com/mees/calvin.git
export CALVIN_ROOT=$(pwd)/calvin
cd $CALVIN_ROOT
sh install.sh

Get Motion Vectors

HiF-VLA uses this tool to extract motion vectors. You need to install FFmpeg to support re-encoding video with the MPEG-4 Part 2 codec.
```
# extract motion vectors
python get_save_motion.py \
  --data_root_dir xxx \
  --dataset_name libero_10_no_noops
```
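
Because the extraction step relies on re-encoding with MPEG-4 Part 2, it may be worth confirming that the local FFmpeg build ships the native `mpeg4` encoder before running the script. The sketch below is an assumption-laden helper, not part of the repo: the function name `has_mpeg4_encoder` is hypothetical, and it assumes the usual `ffmpeg -encoders` listing format, in which each row starts with a flags column beginning with `V` for video encoders followed by the encoder name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the HiF-VLA repo).
# Reads an `ffmpeg -encoders` listing on stdin and succeeds only if a video
# encoder whose name is exactly `mpeg4` (FFmpeg's native MPEG-4 Part 2
# encoder) appears in it.
has_mpeg4_encoder() {
    # Row layout assumed: " V.S..D mpeg4    MPEG-4 part 2"
    grep -Eq '^ *V[^ ]* +mpeg4( |$)'
}
```

A possible invocation is `ffmpeg -hide_banner -encoders 2>/dev/null | has_mpeg4_encoder && echo "mpeg4 encoder available"`. Reading from stdin keeps the check independent of how FFmpeg was installed.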

🚀 Inference

Below are the four independently trained HiF-VLA checkpoints for LIBERO and CALVIN ABC→D:

First, download these checkpoints and place them in the ./ckpts/ folder. The directory structure is as below:

HiF-VLA
    ├── ckpts
    ·   ├── hifvla-libero-spatial
        ·   ├── lora_adapter (folder)
            ├── action_head.pt
            ├── model-00001-of-00003.safetensors
            └── ...
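
Before launching an evaluation, it can help to confirm that a downloaded checkpoint folder matches the layout above. The following is a minimal sketch, not part of the repo: the function name `check_ckpt_dir` is hypothetical, and it checks only the entries visible in the tree (`lora_adapter`, `action_head.pt`, and at least one `model-*.safetensors` shard), since the remaining files are elided there.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the HiF-VLA repo): verify that a checkpoint
# folder contains the entries shown in the directory tree above.
check_ckpt_dir() {
    ckpt_dir="$1"
    status=0
    if [ ! -d "${ckpt_dir}/lora_adapter" ]; then
        echo "missing directory: ${ckpt_dir}/lora_adapter"
        status=1
    fi
    if [ ! -f "${ckpt_dir}/action_head.pt" ]; then
        echo "missing file: ${ckpt_dir}/action_head.pt"
        status=1
    fi
    # At least one sharded weight file should be present.
    if ! ls "${ckpt_dir}"/model-*.safetensors >/dev/null 2>&1; then
        echo "missing weight shards: ${ckpt_dir}/model-*.safetensors"
        status=1
    fi
    return "$status"
}
```

For instance, `check_ckpt_dir ./ckpts/hifvla-libero-spatial` prints one line per missing entry and returns a non-zero status if anything is absent.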

Then, run the commands below to start evaluations with the independently trained checkpoints:

# Launch LIBERO evals
bash eval_libero.sh

# Launch ABC→D evals
bash eval_calvin.sh

🚀 Training

First, be sure you have downloaded the LIBERO datasets, as described in the Dataset Preparation section.

Then, download the OpenVLA foundation models.

Next, launch the fine-tuning script below.

bash train.sh

Please be sure to test your policy with the same device/GPU used to train it! Otherwise, performance may drop substantially. You may be able to avoid the performance drop if you merge the LoRA weights into the base model on the downstream device used for testing (e.g., if you train on H100 and then merge on A100 before testing on A100).

See our script vla-scripts/merge_lora_weights_and_save.py for merging the LoRA adapter into the base model offline. It's okay if you already merged the LoRA weights into the base OpenVLA model during fine-tuning: as long as you still have the LoRA adapter, you can always redownload the base model and merge again (merge_lora_weights_and_save.py will handle this for you).

Acknowledgments

We thank these great works and open-source codebases: OpenVLA, OpenVLA-OFT, Video-LaViT.
