Training speed mismatch #35

@dragonlzm

Description

Hello, thanks for the great work! I am trying to use the following command (almost identical to the one provided in the repo) to reproduce the training on LIBERO with 4×H100 GPUs.

It turns out training will take about 28 hours, which does not match the ~5 hours reported in the repo.

[screenshot: training progress / ETA]

Also, the GPUs are not fully utilized. Is this normal?

[screenshot: GPU utilization]

Do you have any thoughts on why the training time is so much longer? Thanks, and looking forward to hearing from you!

data_name=libero_spatial_no_noops

CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --standalone --nnodes 1 --nproc-per-node 4 vla-scripts/finetune.py \
--vlm_path pretrained_models/prism-qwen25-extra-dinosiglip-224px-0_5b \
--config_file_path pretrained_models/configs \
--data_root_dir data/libero \
--dataset_name $data_name \
--run_root_dir outputs \
--use_film False \
--num_images_in_input 2 \
--use_proprio True \
--use_lora True \
--use_fz False \
--use_minivlm True \
--image_aug True \
--num_steps_before_decay 150000 \
--max_steps 150005 \
--save_freq 5000 \
--save_latest_checkpoint_only False \
--merge_lora_during_training True \
--batch_size 16 \
--grad_accumulation_steps 1 \
--learning_rate 2e-4 \
--lora_rank 64 \
--use_pro_version True \
--wandb_project "$data_name" \
--run_id_note VLA-Adapter--spatial--testing
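For reference, a back-of-the-envelope check of the implied per-step throughput, using the numbers from this issue (150005 total steps from `--max_steps`, ~28 h observed vs ~5 h reported). The gap works out to roughly a 5.6× slowdown per optimizer step, which, combined with the low GPU utilization, often suggests an input-pipeline or I/O bottleneck rather than compute.

```python
# Implied average time per optimizer step, given a total wall-clock
# estimate and a step count. Numbers below come from this issue report.
def seconds_per_step(total_hours: float, total_steps: int) -> float:
    """Convert a total-run wall-clock estimate into seconds per step."""
    return total_hours * 3600 / total_steps

observed = seconds_per_step(28, 150005)  # ETA shown in the screenshot
reported = seconds_per_step(5, 150005)   # time claimed in the repo
print(f"observed: {observed:.2f} s/step, "
      f"reported: {reported:.2f} s/step, "
      f"slowdown: {observed / reported:.1f}x")
```

This is only a rough sketch; it assumes the ETA covers all 150005 steps at a constant rate.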
