This repository contains a Cog wrapper for the MiniMax-Remover video object removal model.
Model API: ayushunleashed/minimax-remover
MiniMax-Remover is a fast and effective video object remover based on minimax optimization. This Cog wrapper provides a convenient API for running the model on Replicate with video and mask inputs.
object-remover/
├── cog.yaml # Cog configuration
├── predict.py # Cog prediction interface
├── download_weights.py # Weight downloader script
├── minimax_remover/ # Git submodule
│ ├── README.md
│ ├── requirements.txt
│ ├── pipeline_minimax_remover.py
│ ├── transformer_minimax_remover.py
│ └── ...
├── sample_data/ # Sample videos for testing
│ ├── racoon_video.mp4 # Input video with a raccoon
│ └── racoon_mask.mp4 # Mask video (white areas to remove)
└── README.md # This file
This wrapper is built for the MiniMax-Remover project. The original repository contains the core implementation, which is included here as a Git submodule.
# Clone the repository
git clone https://github.com/AyushUnleashed/object-remover.git
cd object-remover
# Initialize and update the submodule
git submodule update --init --recursive
Follow the official Cog installation guide:
# On macOS
brew install replicate/tap/cog
# On Linux/Windows WSL
sudo curl -o /usr/local/bin/cog -L "https://github.com/replicate/cog/releases/latest/download/cog_$(uname -s)_$(uname -m)"
sudo chmod +x /usr/local/bin/cog
# Manually download weights first (optional)
python download_weights.py
# Build the Docker image
cog build
# Test with sample videos
cog predict -i video=@sample_data/racoon_video.mp4 -i mask=@sample_data/racoon_mask.mp4
- Video: MP4 format recommended, max 81 frames
- Mask: Video file where white areas indicate objects to remove
- Both video and mask should have the same frame count
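The input constraints above can be checked before a job is submitted. The following is a minimal sketch, assuming the frame counts have already been read by some other means (for example with a video library). `check_frame_counts` and `MAX_FRAMES` are hypothetical names for illustration and are not part of this wrapper.

```python
MAX_FRAMES = 81  # limit quoted in the input requirements above


def check_frame_counts(video_frames: int, mask_frames: int,
                       max_frames: int = MAX_FRAMES) -> list[str]:
    """Return a list of human-readable problems; an empty list means the inputs look fine."""
    problems = []
    if video_frames <= 0 or mask_frames <= 0:
        problems.append("video and mask must each contain at least one frame")
    if video_frames != mask_frames:
        problems.append(
            f"frame count mismatch: video has {video_frames}, mask has {mask_frames}"
        )
    if video_frames > max_frames:
        problems.append(
            f"video has {video_frames} frames, more than the maximum of {max_frames}"
        )
    return problems
```

Returning a list rather than raising on the first problem lets a caller report every issue at once.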
If you don't have a mask video yet, you can create one using the following workflow:
- Original video you want to remove objects from
- Binary masked video of the object(s) you want to remove
- Extract first frame: Use Frame Extractor to get the first frame of your original video
- Generate mask: Use SAM-2 to create multiple masked images from the first frame
- Select target mask: Choose the mask that covers the subject(s) you want to remove
- Create mask video: Use X-MEM with your selected masked image and original video to generate the complete masked video
This workflow ensures you have a properly formatted mask video that tracks your target object(s) throughout all frames.
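Masks produced by tracking tools are not always strictly black and white. As a rough illustration of what "binary" means here, the sketch below thresholds a single grayscale frame represented as nested lists of 0-255 integers. The helper names and the threshold of 128 are assumptions for illustration; a real pipeline would apply the same idea to decoded video frames.

```python
def binarize_frame(frame, threshold=128):
    """Map a grayscale frame (rows of 0-255 ints) to pure black (0) or white (255)."""
    return [[255 if px >= threshold else 0 for px in row] for row in frame]


def removal_fraction(binary_frame):
    """Fraction of pixels that are white, i.e. marked for removal."""
    total = sum(len(row) for row in binary_frame)
    white = sum(px == 255 for row in binary_frame for px in row)
    return white / total if total else 0.0
```

A removal fraction close to 0 or 1 on most frames is a quick hint that the mask may be empty or inverted.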
- video (required): Input video file
- mask (required): Mask video file
- num_frames: Number of frames to process
- height: Output video height
- width: Output video width
- num_inference_steps: Denoising steps
- iterations: Mask dilation iterations
- seed: Random seed (optional)
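To give a feel for what the iterations parameter does, here is a small pure-Python sketch of binary mask dilation. It assumes a 3x3 neighbourhood, so each iteration grows the masked region by one pixel in every direction. The kernel the wrapper actually uses is not documented here, so treat this as an illustration of the concept rather than the wrapper's implementation; `dilate` is a hypothetical helper.

```python
def dilate(mask, iterations=1):
    """Grow the marked (1) region of a binary mask by one pixel per iteration.

    `mask` is a list of rows of 0/1 values. The 3x3 neighbourhood is an
    assumption; the input mask is not modified.
    """
    h = len(mask)
    w = len(mask[0]) if h else 0
    for _ in range(iterations):
        grown = [row[:] for row in mask]  # copy so reads see the previous pass
        for y in range(h):
            for x in range(w):
                if mask[y][x]:
                    # Mark every in-bounds neighbour of a marked pixel.
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if 0 <= ny < h and 0 <= nx < w:
                                grown[ny][nx] = 1
        mask = grown
    return mask
```

More iterations give the mask a wider safety margin around the object, at the cost of repainting more of the background.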
# Basic usage with sample data
cog predict \
-i video=@sample_data/racoon_video.mp4 \
-i mask=@sample_data/racoon_mask.mp4
# With custom parameters
cog predict \
-i video=@sample_data/racoon_video.mp4 \
-i mask=@sample_data/racoon_mask.mp4 \

import replicate
output = replicate.run(
"ayushunleashed/minimax-remover",
input={
"video": open("your_video.mp4", "rb"),
"mask": open("your_mask.mp4", "rb"),
"num_frames": 25,
"height": 480,
"width": 832,
"num_inference_steps": 12,
"iterations": 6
}
)
print(f"Output video: {output}")
- Architecture: Simplified DiT (Diffusion Transformer) with minimax optimization
- Inference Steps: 6-12 steps (much faster than traditional diffusion models)
- Memory Requirements: ~8GB GPU memory for typical usage
- Model Weights: Downloaded automatically from Hugging Face during first setup
- Frame Count: Fewer frames = faster processing
- Resolution: Lower resolution = faster processing
- Inference Steps: 6-12 steps provide good quality/speed balance
- Mask Quality: Clean masks with clear boundaries work best
- Out of Memory: Reduce num_frames, height, or width
- Slow Performance: Reduce num_inference_steps to 6-8
- Poor Quality: Increase num_inference_steps or improve mask quality
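The out-of-memory advice above can be turned into a simple fallback: shrink the settings and try again. The sketch below is one possible way to do that. `reduce_settings`, the 0.75 scale factor, and rounding height and width down to a multiple of 16 are all assumptions for illustration, not documented requirements of the model.

```python
def reduce_settings(settings, scale=0.75, multiple=16, min_frames=1):
    """Return a copy of `settings` with num_frames, height and width scaled down.

    Height and width are rounded down to a multiple of `multiple` (an
    assumed alignment). Other keys are copied unchanged.
    """
    reduced = dict(settings)
    reduced["num_frames"] = max(min_frames, int(settings["num_frames"] * scale))
    for key in ("height", "width"):
        scaled = int(settings[key] * scale)
        reduced[key] = max(multiple, scaled - scaled % multiple)
    return reduced
```

For example, calling it on the settings from the Python example (25 frames at 480 by 832) yields 18 frames at 352 by 624, which can then be passed as the input for a retry.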
This Cog wrapper follows the same license as the original MiniMax-Remover project. See the original repository for license details.
If you use this model, please cite the original MiniMax-Remover paper:
@article{minimax2024,
title={MiniMax-Remover: Taming Bad Noise Helps Video Object Removal},
author={Bojia Zi and Weixuan Peng and Xianbiao Qi and Jianan Wang and Shihao Zhao and Rong Xiao and Kam-Fai Wong},
year={2024}
}