
M3DDM+: An Improved Video Outpainting by a Modified Masking Strategy

Takuya Murakawa, Takumi Fukuzawa, Ning Ding, Toru Tamaki
Nagoya Institute of Technology

IWAIT 2026

arXiv Project Page Hugging Face

Comparison video: comparison.mp4

Environment Setup

  1. Create and activate a virtual environment using Python 3.12 (or 3.10+):
python3.12 -m venv .venv
source .venv/bin/activate
  2. Install all dependencies from the requirements.txt file:
pip install -r requirements.txt

Note: Make sure you have Python 3.10 or later installed. Our testing environment uses Python 3.12.3 with PyTorch 2.8.0+cu128 and CUDA 13.1.

Troubleshooting

If you encounter the following error during setup:

ImportError: cannot import name 'cached_download' from 'huggingface_hub'

Run the following command to fix it:

pip install huggingface-hub==0.25.2

Reference: Stack Overflow - ImportError: cannot import name 'cached_download' from 'huggingface_hub'

Download Models

Before you can run the project, you need to download the following:

  1. Pre-trained Stable Diffusion Model Weights:

    We use the VAE encoder and decoder from the Stable Diffusion model. Download the pre-trained Stable Diffusion v1.5 weights from the following link:
    https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5

    huggingface-cli download stable-diffusion-v1-5/stable-diffusion-v1-5 \
      --local-dir ./stable-diffusion-v1-5
  2. M3DDM+ Model Checkpoints:

    Download the pre-trained M3DDM+ model weights from the Hugging Face repository:
    https://huggingface.co/MurakawaTakuya/M3DDM-Plus

    huggingface-cli download MurakawaTakuya/M3DDM-Plus \
      --local-dir ./M3DDM-Plus

Directory Structure

After downloading the models, your directory should look like this:

M3DDM-Plus/
├── src/                        # Source code
│   ├── inference.py
│   ├── evaluate.py
│   ├── train.py
│   ├── model/
│   └── pipelines/
├── stable-diffusion-v1-5/      # SD v1.5 weights (VAE + scheduler)
│   ├── scheduler/
│   │   └── scheduler_config.json
│   └── vae/
│       ├── config.json
│       └── diffusion_pytorch_model.bin
├── M3DDM-Plus/                 # M3DDM+ model weights
│   ├── config.json
│   └── diffusion_pytorch_model.bin
└── sample/                     # Sample videos (optional)
    ├── bear.mp4
    └── ...
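After downloading, you can sanity-check the layout with a short script. This helper is illustrative and not part of the repository; the file list simply mirrors the tree above (sample/ is optional and not checked).

```python
from pathlib import Path

# Expected model files, mirroring the directory tree above.
EXPECTED = [
    "stable-diffusion-v1-5/scheduler/scheduler_config.json",
    "stable-diffusion-v1-5/vae/config.json",
    "stable-diffusion-v1-5/vae/diffusion_pytorch_model.bin",
    "M3DDM-Plus/config.json",
    "M3DDM-Plus/diffusion_pytorch_model.bin",
]

def missing_files(root: str, expected=EXPECTED) -> list[str]:
    """Return the expected model files that are missing under `root`."""
    return [p for p in expected if not (Path(root) / p).exists()]

if __name__ == "__main__":
    missing = missing_files(".")
    print("OK" if not missing else f"Missing: {missing}")
```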

Code Dependency

flowchart LR
    train.py -->|instantiates| evaluate.py
    evaluate.py -->|instantiates| inference.py

  • inference.py runs video outpainting on a single input video.
  • evaluate.py crops each video in a dataset by a specified ratio, runs outpainting to reconstruct the cropped region, and computes metrics (MSE, PSNR, SSIM, LPIPS, BMSE) against the original.
  • train.py trains the model. At the end of each epoch, it optionally calls evaluate.py to run outpainting on a real dataset — separate from the loss-based validation step — so you can visually and quantitatively track generation quality during training.

Inference

Takes a single input video and expands it to a specified aspect ratio using outpainting.

Try with Samples

The sample videos in the sample/ directory are taken from the DAVIS dataset. You can quickly test the model using these clips.

CUDA_VISIBLE_DEVICES=0 python src/inference.py \
  --input_video_path "sample/bear.mp4" \
  --pretrained_sd_dir "stable-diffusion-v1-5" \
  --video_outpainting_model_dir "M3DDM-Plus" \
  --output_dir "sample/output/bear" \
  --target_ratio_list "1:1" \
  --output_size 256

For your own videos, run the inference code with the following command:

CUDA_VISIBLE_DEVICES=0 python src/inference.py \
  --input_video_path "path/to/input_video.mp4" \
  --pretrained_sd_dir "stable-diffusion-v1-5" \
  --video_outpainting_model_dir "M3DDM-Plus" \
  --output_dir "path/to/output_directory" \
  --target_ratio_list "1:1" \
  --output_size 256

Parameters

  • video_outpainting_model_dir: The directory where the video-outpainting model weights are stored.
  • target_ratio_list: The aspect ratio for the output video. You can input a single value such as "1:1", "16:9", or "9:16", or you can input a list like "16:9,9:16". For better results, we recommend inputting a single value.
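For reference, a ratio string such as "16:9,9:16" splits into (width, height) pairs. The sketch below shows that parsing as illustrative arithmetic only; the actual CLI parsing lives in src/inference.py and may differ.

```python
def parse_ratio_list(spec: str) -> list[tuple[int, int]]:
    """Parse a ratio string such as "16:9,9:16" into (width, height) pairs."""
    ratios = []
    for item in spec.split(","):
        w, h = item.strip().split(":")  # each entry is "W:H"
        ratios.append((int(w), int(h)))
    return ratios

print(parse_ratio_list("16:9,9:16"))  # [(16, 9), (9, 16)]
```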

GPU Memory

Inference requires approximately 13GB of VRAM at 256x256 resolution on a single NVIDIA RTX 8000. Increasing the number of frames does not increase GPU memory usage.
To save GPU memory, you can use PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True. Also, using --enable_attention_slicing will reduce memory consumption at the cost of inference speed.

Training

Training fine-tunes the M3DDM+ model weights. The VAE (from Stable Diffusion v1.5) is frozen throughout; only the 3D UNet loaded from video_outpainting_model_dir is updated.

You can run the training code with the following command:

CUDA_VISIBLE_DEVICES=1 python src/train.py \
  --data_dir "path/to/dataset/directory" \
  --size 128 \
  --epochs 5 \
  --lr 1e-5 \
  --pretrained_sd_dir "stable-diffusion-v1-5" \
  --video_model_dir "M3DDM-Plus" \
  --gpus 1 \
  --output_dir "output" \
  --max_samples 10000 \
  --eval_video_dir "path/to/evaluation_video_directory" \
  --eval_crop_ratio 0.25 \
  --eval_crop_axis "horizontal" \
  --eval_target_ratio_list "16:9" \
  --limit_val_batches 1000

Parameters

  • data_dir: The directory where the training data is stored. The directory should contain /train and /val directories.
  • video_model_dir: The directory where the video-outpainting model weights are stored.
  • output_dir: The directory where the training results will be saved.
  • max_samples: The maximum number of samples to use for training.
  • eval_video_dir: The directory where the evaluation data is stored.
  • eval_crop_ratio: The fraction of each evaluation video that is cropped away and then reconstructed by outpainting.
  • eval_crop_axis: The axis along which the evaluation videos are cropped ("horizontal" or "vertical").
  • eval_target_ratio_list: The aspect ratio for the output video. You can input a single value such as "1:1", "16:9", or "9:16", or you can input a list like "16:9,9:16". For better results, we recommend inputting a single value.
  • limit_val_batches: The number of videos to use for validation.

Use --disable_validation to disable validation.

GPU Memory

Training requires approximately 28GB of VRAM at 128x128 resolution on a single NVIDIA RTX 8000.
To reduce GPU memory usage, you can enable --enable_unet_gradient_checkpointing, which will reduce memory consumption at the cost of training speed.

Evaluation

Takes a folder of videos, crops each by a specified ratio to simulate a narrower input, runs outpainting to reconstruct the cropped region, and computes metrics (MSE, PSNR, SSIM, LPIPS, BMSE) by comparing the generated output against the original.
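Of the listed metrics, PSNR follows directly from MSE. A minimal sketch of that relationship, assuming pixel values normalized to [0, 1] (the repository's own metric code may compute it differently):

```python
import math

def psnr(mse: float, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB, derived from mean squared error."""
    if mse == 0:
        return float("inf")  # identical videos
    return 10.0 * math.log10((max_val ** 2) / mse)

print(round(psnr(0.01), 6))  # 20.0 dB for MSE = 0.01
```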

CUDA_VISIBLE_DEVICES=0 python src/evaluate.py \
  --video_dir "path/to/data/directory" \
  --pretrained_sd_dir "stable-diffusion-v1-5" \
  --video_outpainting_model_dir "M3DDM-Plus" \
  --target_ratio_list "16:9" \
  --crop_ratio 0.25 \
  --crop_axis "horizontal" \
  --output_size 256 \
  --limit_outpainting_frames -1

Parameters

  • video_dir: The directory where the evaluation data is stored.
  • pretrained_sd_dir: The directory where the pre-trained stable diffusion model weights are stored.
  • video_outpainting_model_dir: The directory where the video-outpainting model weights are stored.
  • target_ratio_list: The aspect ratio for the output video. You can input a single value such as "1:1", "16:9", or "9:16", or you can input a list like "16:9,9:16". For better results, we recommend inputting a single value.
  • crop_ratio: The fraction of each video that is cropped away and then reconstructed by outpainting (e.g. 0.25 removes 25% of the frame).
  • crop_axis: The axis along which the videos are cropped ("horizontal" or "vertical").
  • output_size: The size of the output video.
  • limit_outpainting_frames: The number of frames to use for outpainting. Use -1 to use all frames.
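As an illustration of how crop_ratio and crop_axis interact, the arithmetic below computes the simulated narrow-input size. This is a sketch of the idea only; it assumes "horizontal" crops along the width, so check src/evaluate.py for the exact behavior.

```python
def cropped_size(width: int, height: int, crop_ratio: float, crop_axis: str) -> tuple[int, int]:
    """Frame size after cropping away `crop_ratio` of the frame along `crop_axis`.

    Assumption: "horizontal" removes a fraction of the width,
    "vertical" removes a fraction of the height.
    """
    if crop_axis == "horizontal":
        return int(width * (1 - crop_ratio)), height
    if crop_axis == "vertical":
        return width, int(height * (1 - crop_ratio))
    raise ValueError(f"unknown crop_axis: {crop_axis!r}")

print(cropped_size(256, 256, 0.25, "horizontal"))  # (192, 256)
```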

GPU Memory

Evaluation requires the same amount of VRAM and time as inference, multiplied by the number of evaluation videos. To save GPU memory, you can use PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True. Also, using --enable_attention_slicing will reduce memory consumption at the cost of inference speed.

Logging

This project uses Comet ML for experiment tracking and logging.

Add --disable_comet (or -dc) to disable logging to Comet.

Comet Configuration

Comet reads two configuration files:

  • ./.comet.config in this project directory
  • ~/.comet.config in your home directory

For more details, refer to the Comet configuration documentation.

Important: Do not write API keys directly in code.

Global Configuration (Home Directory)

Create ~/.comet.config with settings common to all your projects as follows:

[comet]
api_key=XXXXXHereIsYourAPIKeyXXXXXXXX
workspace=your_workspace_name

[comet_logging]
hide_api_key=True
  • Set your Comet API key and default workspace
  • Set hide_api_key=True to prevent API keys from appearing in logs

Project Configuration

Copy the example configuration file .comet.config.example and name it .comet.config:

cp .comet.config.example .comet.config

Then edit ./.comet.config with your data:

[comet]
workspace=your_workspace_name # Change to your workspace name (comet user name)
project_name=M3DDM-Plus

[comet_logging]
file=comet_logs/comet_{project}_{datetime}.log # Change the path to your desired location (optional)
  • Settings here override those in ~/.comet.config

Citation

If you find our work helpful, please ⭐ the repo.

Please consider citing our paper if you found our work interesting and useful.

@inproceedings{murakawa_IWAIT2026_M3DDMPlus,
  title={M3DDM+: An improved video outpainting by a modified masking strategy},
  author={Murakawa, Takuya and Fukuzawa, Takumi and Ding, Ning and Tamaki, Toru},
  booktitle={Proceedings of the International Workshop on Advanced Imaging Technology (IWAIT)},
  year={2026}
}

Contact us

Please feel free to reach out to us:

Acknowledgement

The inference and pipeline code is based on the published code of M3DDM-Video-Outpainting. Because the original training and evaluation code is not published, we reproduced it from the M3DDM paper and modified it for our proposed method.

