MiniCPM-V 4.6 as a VLM backbone in StarVLA for LIBERO robot manipulation #1109

@shaohua-pan

Hi MiniCPM-V team,

I wanted to share a downstream integration / community use case of MiniCPM-V 4.6 in embodied AI.

We recently integrated MiniCPM-V 4.6 as a new vision-language backbone for StarVLA, a Vision-Language-Action framework for robot manipulation.

StarVLA PR: starVLA/starVLA#354

The integration adds:

  • A MiniCPM-V 4.6 VLM wrapper for StarVLA
  • PI / GR00T-style VLA framework support
  • LIBERO training and evaluation scripts
  • Example configs for 8-GPU training

With MiniCPM-V 4.6 as the backbone, we trained and evaluated on the LIBERO benchmark and obtained the following success rates:

| Benchmark      | Success Rate |
|----------------|--------------|
| LIBERO-Spatial | 94.0%        |
| LIBERO-Object  | 98.0%        |
| LIBERO-Goal    | 98.0%        |
| LIBERO-10      | 92.4%        |
| Overall        | 95.6%        |
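
For readers checking the numbers: the Overall figure matches an unweighted mean of the four suite results. The short sketch below reproduces that arithmetic. That the overall score was computed as a simple mean (rather than, say, weighted by episode count) is an assumption; the function name is illustrative only.

```python
# Per-suite success rates (percent), copied from the results reported above.
suite_success = {
    "LIBERO-Spatial": 94.0,
    "LIBERO-Object": 98.0,
    "LIBERO-Goal": 98.0,
    "LIBERO-10": 92.4,
}


def overall_success(rates):
    """Unweighted mean of per-suite success rates (assumed aggregation)."""
    return sum(rates.values()) / len(rates)


overall = round(overall_success(suite_success), 1)
print(f"Overall: {overall}%")
```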

Training setup:

  • Backbone: openbmb/MiniCPM-V-4.6
  • Effective batch size: 128
  • Max training steps: 80k
  • Attention implementation: flash_attention_2
  • No modules are frozen by default (FREEZE_MODULES=""), so MiniCPM-V is trainable unless overridden.
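
To make two of these settings concrete, here is a minimal sketch of how an effective batch size of 128 can decompose across 8 GPUs, and how an empty `FREEZE_MODULES` string leaves every parameter trainable. The per-GPU batch size, the comma-separated prefix format, the parameter names, and all helper functions are assumptions for illustration; they are not taken from the StarVLA code.

```python
def effective_batch_size(num_gpus, per_device_batch, grad_accum_steps=1):
    """Samples contributing to one optimizer step across all devices."""
    return num_gpus * per_device_batch * grad_accum_steps


def parse_freeze_modules(spec):
    """Split a comma-separated module list; an empty string yields []."""
    return [name.strip() for name in spec.split(",") if name.strip()]


def is_trainable(param_name, freeze_spec):
    """A parameter is trainable unless it lives under a frozen module prefix."""
    for prefix in parse_freeze_modules(freeze_spec):
        if param_name == prefix or param_name.startswith(prefix + "."):
            return False
    return True


# Hypothetical split: 8 GPUs x 16 samples per GPU, no gradient accumulation.
print(effective_batch_size(num_gpus=8, per_device_batch=16))

# Hypothetical parameter names, used only to show the default behaviour.
params = ["vlm.vision_tower.weight", "vlm.llm.layers.0.weight", "action_head.fc.weight"]
print([is_trainable(p, "") for p in params])     # FREEZE_MODULES="" (default)
print([is_trainable(p, "vlm") for p in params])  # hypothetical override
```

With the default empty string, nothing matches and the whole backbone receives gradients; freezing would only happen if a non-empty value were supplied.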

MiniCPM-V 4.6 looks like a promising lightweight VLM backbone for embodied AI / VLA-style robot learning. The initial LIBERO results are encouraging, especially given the compact model size compared with many larger VLM backbones.

As a next step, we are also considering testing this MiniCPM-V-based StarVLA model on a real SO-101 robot setup, to further evaluate its sim-to-real potential beyond LIBERO simulation benchmarks.

Thanks for releasing MiniCPM-V!
