A high-performance, real-time segmentation and depth estimation pipeline designed for high-end NVIDIA GPUs. This project integrates SAM 3 (Segment Anything Model 3) and Depth Anything 3 to produce instance-aware depth maps.
- Real-time Processing: Optimized for high-throughput inference.
- Instance-Aware Depth: Combines semantic masks from SAM 3 with metric depth from Depth Anything 3.
- Automated Setup: One-click environment configuration for complex dependencies (PyTorch + Local Packages).
- Screen Capture Integration: Built-in support for real-time screen inference.
- Cross-Platform: Supports both Windows 11 and Ubuntu/Linux.
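
As a rough illustration of what "instance-aware depth" can mean, the sketch below takes one boolean mask per instance and a depth map of the same size, then reports the median depth under each mask. It is not the project's implementation: the function name `per_instance_depth`, the nested-list inputs, and the choice of the median are assumptions made for this example, and a real pipeline would more likely operate on NumPy arrays or tensors.

```python
from statistics import median


def per_instance_depth(masks, depth):
    """Return the median depth under each boolean instance mask.

    masks: list of HxW nested lists of bools, one per instance (hypothetical format)
    depth: HxW nested list of floats, same size as each mask
    """
    results = []
    for mask in masks:
        # Collect the depth values of all pixels that belong to this instance.
        values = [
            d
            for mask_row, depth_row in zip(mask, depth)
            for m, d in zip(mask_row, depth_row)
            if m
        ]
        # An empty mask has no depth; report None rather than guessing.
        results.append(median(values) if values else None)
    return results


depth = [
    [1.0, 1.0, 4.0],
    [1.0, 2.0, 4.0],
]
masks = [
    [[True, True, False], [True, True, False]],    # instance 0: left region
    [[False, False, True], [False, False, True]],  # instance 1: right column
]
print(per_instance_depth(masks, depth))
```

With the toy inputs above this prints `[1.0, 4.0]`: one depth value per instance instead of one per pixel.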
Requirements: An NVIDIA GPU with CUDA 12.x support, running Windows 11 or Ubuntu 20.04+.
Ensure you clone with submodules to get the core model architectures:

    git clone --recursive https://github.com/Lloyd-lei/SegDepthFusion.git
    cd SegDepthFusion

On Linux, run the setup script in your terminal:

    bash auto_setup.sh

This script will:
- Create a Conda environment named `seg_depth_auto` (Python 3.11).
- Install project dependencies (xformers, triton, etc.).
- Install the local `sam3` and `Depth-Anything-3` modules in editable mode.

Note: The script assumes PyTorch and CUDA are already installed on your system. If not, install them first:

    conda activate seg_depth_auto
    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
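
Once PyTorch is installed, a short check along the following lines can confirm that it is importable and can see a CUDA device. This is a minimal sketch and not part of the repository; the helper name `describe_torch_cuda` is made up for illustration, and it only relies on standard PyTorch calls (`torch.cuda.is_available`, `torch.version.cuda`, `torch.cuda.get_device_name`).

```python
import importlib.util


def describe_torch_cuda():
    """Summarize whether PyTorch is importable and whether it sees a CUDA device."""
    # find_spec avoids an ImportError when torch is not installed at all.
    if importlib.util.find_spec("torch") is None:
        return "torch: not installed"

    import torch

    if not torch.cuda.is_available():
        return f"torch {torch.__version__}: CUDA not available"

    # torch.version.cuda is the CUDA version this PyTorch build was compiled against.
    return (
        f"torch {torch.__version__}: CUDA {torch.version.cuda} "
        f"on {torch.cuda.get_device_name(0)}"
    )


print(describe_torch_cuda())
```

Run it inside the `seg_depth_auto` environment; if it reports that CUDA is not available, the installed wheel may not match your driver.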
On Windows, run this command in PowerShell:

    powershell -ExecutionPolicy Bypass -File auto_setup.ps1

This script will:
- Create a Conda environment named `seg_depth_auto` (Python 3.11).
- Install PyTorch, TorchVision, and xformers compatible with your GPU.
- Install general dependencies (`numpy`, `cv2`, etc.).
- Compile and install the local `sam3` and `Depth-Anything-3` modules.

Activate the environment:

    conda activate seg_depth_auto

Verify your installation and model loading with the included test script:

    python quick_test.py

Process a folder of images to generate segmentation + depth visualizations:

    python test_pipline.py --folder orange_photos

Results will be saved to the outputs/ directory.
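
To picture the batch step, the sketch below pairs each image in an input folder with a result path under `outputs/`. It is only an assumption about how such a script might be organized: the helper `plan_outputs`, the accepted extensions, and the `_seg_depth.png` suffix are hypothetical and do not describe the actual naming used by `test_pipline.py`.

```python
from pathlib import Path

# Assumed set of accepted image extensions (not taken from the project).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def plan_outputs(folder, out_dir="outputs"):
    """Pair each image in `folder` with a hypothetical result path in `out_dir`."""
    folder, out_dir = Path(folder), Path(out_dir)
    images = sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    # The "_seg_depth.png" suffix is illustrative, not the script's real naming.
    return [(p, out_dir / f"{p.stem}_seg_depth.png") for p in images]


# Only list the plan if the sample folder exists in the current directory.
if Path("orange_photos").is_dir():
    for src, dst in plan_outputs("orange_photos"):
        print(f"{src} -> {dst}")
```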
To run the main real-time processing loop (configurable via config.yaml):

    python main.py

Project structure:

seg_depth/
├── auto_setup.sh # Linux installation script
├── auto_setup.ps1 # Windows installation script
├── requirements.txt # Python dependencies
├── config.yaml # Pipeline configuration
├── main.py # Real-time application entry point
├── test_pipline.py # Batch image processing script
├── quick_test.py # Installation verification
├── sam3_model.py # SAM 3 Model Wrapper
├── da3_model.py # Depth Anything 3 Model Wrapper
├── seg_depth_pipeline.py # Core logic combining Seg + Depth
├── orange_photos/ # Test images directory
├── Depth-Anything-3/ # [Submodule] Depth Anything V3 source
└── sam3/ # [Submodule] SAM 3 source

- `gsplat` Warning: You may see a warning about `gsplat` missing. This is optional for 3D rendering and does not affect the core pipeline.
- `triton` Warning: On Windows, `xformers` may warn about missing `triton`. This is normal (Linux-only feature) and safe to ignore. On Linux, `triton` is installed automatically for better performance.
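
To see which of these optional packages are actually present in the active environment, a check like the one below can help. It is a small sketch built on the standard library's `importlib.util.find_spec`; the helper name `optional_package_status` is hypothetical and not part of the project.

```python
import importlib.util


def optional_package_status(names=("xformers", "triton", "gsplat")):
    """Map each top-level package name to True if it can be found, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, found in optional_package_status().items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

A "missing" result for `gsplat`, or for `triton` on Windows, matches the warnings described above and does not by itself indicate a broken installation.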
This project is based on SAM 3 and Depth Anything. Please refer to their respective repositories for license details.
