Zhizheng Liu, Joe Lin, Wayne Wu, Bolei Zhou
University of California, Los Angeles

git clone --recursive git@github.com:genforce/JOSH.git
cd JOSH
conda create -n josh python=3.10 -y # must use python 3.10 for chumpy compatibility
conda activate josh
# assume CUDA 12.8, install pytorch and packages
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt
pip install --no-build-isolation git+https://github.com/mattloper/chumpy
pip install -e .
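Chumpy only works with Python 3.10, so it is worth verifying the interpreter before installing the rest of the stack. A small hypothetical check (not part of the repo):

```python
import sys

def is_supported_python(version_info=sys.version_info):
    """JOSH's environment pins Python 3.10 for chumpy compatibility."""
    return (version_info.major, version_info.minor) == (3, 10)

if __name__ == "__main__":
    print("Python OK" if is_supported_python() else "Please use Python 3.10")
```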
- Download the SMPL body models (`SMPL_MALE.pkl`, `SMPL_FEMALE.pkl`, `SMPL_NEUTRAL.pkl`) from the official webpage and place them under the `data/smpl` folder.
- Download the VIMO checkpoint (`vimo_checkpoint.pth.tar`) for HMR and place it under `data/checkpoints`.
- Download the DECO checkpoint (`deco_best.pth`) for contact estimation and place it under `data/checkpoints`.
- Move the function `parse_chunks` from `third_party/tram/lib/pipeline/tools.py` to `third_party/tram/lib/models/hmr_vimo.py` so we don't install extra dependencies.
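Before running the demo, a quick sanity check that every model file and checkpoint is in place can save a failed run. A minimal sketch (hypothetical helper, not part of the repo; the paths follow the layout described above):

```python
from pathlib import Path

# Files expected by the setup steps above (paths relative to the repo root)
REQUIRED_FILES = [
    "data/smpl/SMPL_MALE.pkl",
    "data/smpl/SMPL_FEMALE.pkl",
    "data/smpl/SMPL_NEUTRAL.pkl",
    "data/checkpoints/vimo_checkpoint.pth.tar",
    "data/checkpoints/deco_best.pth",
]

def missing_files(root=".", required=REQUIRED_FILES):
    """Return the subset of required files that do not exist under root."""
    root_path = Path(root)
    return [f for f in required if not (root_path / f).is_file()]

if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing files:", *missing, sep="\n  ")
    else:
        print("All model files and checkpoints found.")
```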
Assuming the demo video is located at `$input_folder/XXXX.mp4`, run the following:
rerun --serve-grpc # in another terminal, for visualization
bash josh_demo.sh $input_folder
For example, running `bash josh_demo.sh assets/demo1` stores all the intermediate outputs as well as the final result under `$input_folder`.
Compared to the original paper, we now support using the local point cloud from the state-of-the-art method Pi3 as initialization, which can lead to better reconstruction quality.
Note that since JOSH is an optimization-based method, you may want to tune the hyper-parameters for the optimal performance (see josh/config.py). With the default hyperparameters, you should get the following results after running the demos:
Demo 1 Sample Output
Demo 2 Sample Output
Long Demo Sample Output
For long videos (>=200 frames), we apply chunk processing and then aggregate the chunk results by simply concatenating them (see josh/aggregate_results.py). We will leave global bundle adjustment to future work.
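The chunking scheme described above can be sketched as follows. This is a simplified illustration rather than the repo's actual `josh/aggregate_results.py`; the 200-frame chunk size matches the threshold above, but the exact splitting and aggregation logic is an assumption:

```python
def split_into_chunks(num_frames, chunk_size=200):
    """Split frame indices [0, num_frames) into consecutive chunks of at most chunk_size."""
    return [list(range(start, min(start + chunk_size, num_frames)))
            for start in range(0, num_frames, chunk_size)]

def aggregate_chunks(chunk_results):
    """Aggregate per-chunk outputs by simple concatenation (no global bundle adjustment)."""
    aggregated = []
    for result in chunk_results:
        aggregated.extend(result)
    return aggregated
```

With the default size, a 450-frame video would be processed as chunks of 200, 200, and 50 frames, and their results concatenated in order.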
Download the JOSH3R checkpoint from this link to `$CKPT_PATH`, use the same `$input_folder` from the JOSH demo, and run the following:
python josh/inference_josh3r.py --input_folder "$input_folder" --ckpt_path $CKPT_PATH --visualize
Note that the scene reconstruction quality of JOSH3R may not be great due to the end-to-end inference of the base model MASt3R without optimization, but the global human trajectory prediction should look more plausible.
pip install evo # for camera pose evaluation
We provide evaluation scripts under `josh/eval` for all the datasets, with basic instructions. Please refer to the original dataset repos for data downloading and processing. The scripts are not thoroughly tested; feel free to open an issue if you encounter any problems or bugs.
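The `evo` package reports camera pose metrics such as Absolute Trajectory Error (ATE). As a rough illustration of what the RMSE number means, here is a pure-Python sketch, assuming the trajectories are already aligned (evo additionally performs Umeyama alignment and handles timestamps; this helper is hypothetical):

```python
import math

def ate_rmse(gt_positions, est_positions):
    """Root-mean-square translation error between two aligned camera trajectories.

    Each trajectory is a list of (x, y, z) camera positions of equal length.
    """
    assert len(gt_positions) == len(est_positions), "trajectories must match in length"
    squared_errors = [
        sum((g - e) ** 2 for g, e in zip(gt, est))  # squared distance per frame
        for gt, est in zip(gt_positions, est_positions)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```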
We would like to thank the following projects for inspiring our work and open-sourcing their implementations:
Human Mesh Recovery: WHAM, TRAM, HMR2.0
Human Detection and Segmentation: SAM3
Scene Reconstruction: DUSt3R, MASt3R, Pi3
Human Contact Estimation: BSTRO, DECO
Evaluation Datasets: EMDB, SLOPER4D, RICH
For any questions or discussions, please contact Zhizheng Liu.
If our work is helpful to your research, please cite the following:
@inproceedings{liu2026joint,
  title={Joint Optimization for 4D Human-Scene Reconstruction in the Wild},
  author={Liu, Zhizheng and Lin, Joe and Wu, Wayne and Zhou, Bolei},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026}
}
