MVInverse enables feed-forward, multi-view consistent inverse rendering without per-scene optimization
- [April 1, 2026] ✨ Training code released; please see the Training Guide.
- [December 24, 2025] 🚀 Inference code released.
We introduce MVInverse, a feed-forward framework for reconstructing scene geometry and materials from multiple images without per-scene optimization. Existing methods often produce view-inconsistent results or incur high computational costs; MVInverse instead leverages alternating attention mechanisms to directly and coherently predict holistic scene properties from an image sequence, achieving state-of-the-art multi-view consistency and material and normal estimation quality.
First, clone the repository and install the required packages.
```shell
git clone https://github.com/Maddog241/mvinverse.git
cd mvinverse
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu118
pip install opencv-python huggingface_hub==0.35.0
```

You can run inference directly using the provided script. It processes a directory of images and generates corresponding material and geometry maps for each input frame.
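Before running inference, you can verify that the required packages are importable. This is a minimal sketch, not part of the repository; the package names simply mirror the install commands above (note that `opencv-python` is imported as `cv2`).

```python
import importlib.util


def check_environment(packages=("torch", "torchvision", "cv2", "huggingface_hub")):
    """Return a mapping of package name -> whether it is importable."""
    return {pkg: importlib.util.find_spec(pkg) is not None for pkg in packages}


if __name__ == "__main__":
    for pkg, ok in check_environment().items():
        print(f"{pkg}: {'ok' if ok else 'MISSING'}")
```

If any package is reported missing, re-run the corresponding `pip install` command before proceeding.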
```shell
# Run on the bundled example scene
python inference.py --data_path examples/Courtroom --save_path <your/output/dir>

# Run on your own images
python inference.py --data_path <path/to/your/images_dir> --save_path <your/output/dir>
```
Arguments:
- `data_path`: Path to the input image directory.
- `ckpt`: Path to the model checkpoint file.
- `save_path`: Directory where the output images will be saved.
- `num_frames`: Number of frames to process. Set to -1 to process all images in the directory.
- `device`: Device to run inference on (`cuda` or `cpu`).
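To process several scene directories in one go, you can wrap the script in a small batch driver. This is a hedged sketch, not part of the repository: the flag names match the argument list above, while the `examples/` and `outputs/` paths are placeholders for your own layout.

```python
import subprocess
from pathlib import Path


def run_inference(data_path, save_path, num_frames=-1, device="cuda"):
    """Build the inference.py command line for one scene directory."""
    return [
        "python", "inference.py",
        "--data_path", str(data_path),
        "--save_path", str(save_path),
        "--num_frames", str(num_frames),
        "--device", device,
    ]


if __name__ == "__main__":
    # Run every scene under examples/, writing to a matching outputs/ subdir.
    for scene in sorted(Path("examples").glob("*/")):
        subprocess.run(run_inference(scene, Path("outputs") / scene.name), check=True)
```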
Please see the Training Guide.
Our work is built upon these fantastic open-source projects: