
YOLO Image Segmenter

A powerful Python tool for object detection, cropping, background removal, and classification using YOLO and Segment Anything Model (SAM).

Features

  • Object Detection: Detect objects in images using YOLOv11
  • Smart Cropping: Automatically crop detected objects with intelligent padding based on confidence
  • Background Removal: Remove backgrounds using SAM segmentation
  • Mask Refinement: Clean segmentation masks and assign unallocated pixels
  • Classification: Optionally classify cropped objects
  • Batch Processing: Process entire directories of images

Requirements

  • Python 3.12 or higher
  • Dependencies listed in pyproject.toml

Installation

This package uses uv for dependency management. If you don't have uv installed, you can install it following the instructions at https://github.com/astral-sh/uv.

# Clone the repository
git clone https://github.com/trenchproject/trench_image_processor.git
cd trench_image_processor

# Install dependencies using uv and activate the virtual environment
uv sync
source .venv/bin/activate

# Run unit and integration tests
pytest
pytest -m "slow"

Usage

The program provides a command-line interface with various options:

uv run src/main.py -i INPUT_DIR -o OUTPUT_DIR -m YOLO_MODEL [options]

Basic Arguments

  • -i, --input_dir: Directory containing images to process
  • -o, --output_dir: Directory for saving cropped images
  • -m, --model: Path to YOLOv11 model file (.pt)
  • -c, --confidence: Confidence threshold (default: 0.8)
  • --iou: Intersection-over-union (IoU) threshold (default: 0.2)
  • -r, --resolution: Set output resolution width in pixels

Padding Options

  • -bp, --base-padding: Base padding factor at highest confidence (default: 0.05)
  • -mp, --max-padding: Maximum padding factor at threshold confidence (default: 0.25)

Background Removal Options

  • --sam-model: Path to SAM segmentation model file (.pt)
  • --bg-color: Background replacement color in hex format (default: #00a000 - green)
  • --clean-masks: Clean segmentation mask by removing small holes and islands
  • --complete-masks: Assign all foreground pixels in source image to an output segment
  • --visualize-leftovers: Visualize leftover pixels before and after mask completion
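
The --bg-color flag takes a hex string such as "#00a000". As a minimal sketch of what background replacement with such a color amounts to (the function names here are illustrative, not the project's actual API; the real code may also use BGR channel order if it goes through OpenCV):

```python
import numpy as np

def parse_hex_color(hex_str):
    """Convert a '#RRGGBB' string to an (R, G, B) tuple of ints."""
    s = hex_str.lstrip("#")
    return tuple(int(s[i:i + 2], 16) for i in (0, 2, 4))

def replace_background(image, mask, hex_color="#00a000"):
    """Set every pixel outside the boolean foreground mask to the given color."""
    out = image.copy()
    out[~mask] = parse_hex_color(hex_color)
    return out
```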

Classification Options

  • --cls-model: Path to YOLO classification model file (.pt)
  • --cls-confidence: Confidence threshold for classification (default: 0.75)
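
For orientation, a flag set like the one above could be wired with a standard argparse parser. This skeleton covers only a subset of the options and is illustrative, not the project's actual parser in src/main.py:

```python
import argparse

def build_parser():
    """Build an argparse parser for a subset of the documented flags."""
    p = argparse.ArgumentParser(description="YOLO Image Segmenter (illustrative subset)")
    p.add_argument("-i", "--input_dir", required=True, help="directory of images to process")
    p.add_argument("-o", "--output_dir", required=True, help="directory for cropped images")
    p.add_argument("-m", "--model", required=True, help="path to YOLOv11 model file (.pt)")
    p.add_argument("-c", "--confidence", type=float, default=0.8, help="detection confidence threshold")
    p.add_argument("--iou", type=float, default=0.2, help="IoU threshold")
    p.add_argument("--bg-color", dest="bg_color", default="#00a000", help="hex background color")
    return p
```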

Examples

Basic Object Detection and Cropping

uv run src/main.py -i ./images -o ./output -m yolov11n.pt

With Background Removal

uv run src/main.py -i ./images -o ./output -m yolov11n.pt --sam-model sam_l.pt --bg-color "#00a000"

With Classification

uv run src/main.py -i ./images -o ./output -m yolov11n.pt --cls-model yolov11-cls.pt

Complete Pipeline with Custom Settings

uv run src/main.py -i ./images -o ./output -m yolov11n.pt -c 0.7 --iou 0.3 \
  -bp 0.1 -mp 0.3 -r 800 \
  --sam-model sam_l.pt --bg-color "#0000FF" --clean-masks --complete-masks \
  --cls-model yolov11-cls.pt --cls-confidence 0.8

Output

The program saves cropped images to the output directory with filenames that include:

  • Original image name
  • Class number and name
  • Instance number (if multiple objects of the same class)
  • Classification result (if classification is enabled)

Example: image1--0_person-(1)-[5_adult].jpg
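
A sketch of how that filename pattern could be assembled, with the separators inferred from the example above (the actual formatting code may differ):

```python
def crop_filename(stem, class_id, class_name, instance=None, cls_result=None, ext=".jpg"):
    """Build an output filename like 'image1--0_person-(1)-[5_adult].jpg'.

    instance and cls_result are optional, matching the README: the instance
    number appears only when several objects share a class, and the bracketed
    classification result only when classification is enabled.
    """
    name = f"{stem}--{class_id}_{class_name}"
    if instance is not None:
        name += f"-({instance})"
    if cls_result is not None:
        name += f"-[{cls_result}]"
    return name + ext
```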

Advanced Features

Smart Padding

The tool applies variable padding to bounding boxes based on detection confidence:

  • Higher confidence detections receive minimal padding (base-padding)
  • Lower confidence detections receive more padding (up to max-padding)
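
One way to realize this behavior is a linear interpolation between --max-padding at the confidence threshold and --base-padding at confidence 1.0. The linear curve is an assumption for illustration; check src/main.py for the actual formula:

```python
def padding_factor(confidence, threshold=0.8, base_padding=0.05, max_padding=0.25):
    """Interpolate the padding factor from detection confidence.

    At confidence == threshold the factor is max_padding; at confidence == 1.0
    it is base_padding; in between it varies linearly. Out-of-range
    confidences are clamped.
    """
    t = (confidence - threshold) / (1.0 - threshold)
    t = min(max(t, 0.0), 1.0)
    return max_padding + t * (base_padding - max_padding)
```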

Mask Refinement

When using background removal with --clean-masks and --complete-masks:

  • Small isolated regions and holes are removed from masks
  • Unallocated foreground pixels are assigned to the closest appropriate segment
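
As a rough sketch of the "small islands" half of this cleanup, here is a dependency-free connected-component filter; the project's own implementation (and its hole removal, which amounts to running the same idea on the inverted mask) may use a library routine instead:

```python
from collections import deque
import numpy as np

def remove_small_regions(mask, min_size):
    """Remove 4-connected foreground regions smaller than min_size pixels."""
    mask = mask.astype(bool).copy()
    visited = np.zeros_like(mask)
    h, w = mask.shape
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and not visited[sy, sx]:
                # Breadth-first search to collect one connected component
                component = [(sy, sx)]
                visited[sy, sx] = True
                queue = deque([(sy, sx)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not visited[ny, nx]:
                            visited[ny, nx] = True
                            component.append((ny, nx))
                            queue.append((ny, nx))
                if len(component) < min_size:
                    for y, x in component:
                        mask[y, x] = False  # erase the small island
    return mask
```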

Logging

The program provides detailed logging information:

  • Detection confidence scores
  • Applied padding values
  • Classification results
  • Processing status for each image

License

Copyright (c) 2025 University of Washington
Licensed under the MIT License. See LICENSE file for details.

Acknowledgments