
CursorTracker

Unsupervised mouse cursor detection and tracking in instructional videos using tracking-by-detection.

Quick Start

# Install dependencies
poetry install && poetry shell

# From YouTube URL - single command for everything
python cursor_tracker.py \
  --url https://youtube.com/watch?v=VIDEO_ID \
  --output-dir ./data/my_video

# View results
open data/my_video/our_results_1/tracked_video_our_results_1.mp4

Features

  • Fully Unsupervised: Automatically discovers cursor templates, no manual annotation needed
  • End-to-End Pipeline: YouTube URL → Download → Extract → Track → Visualize
  • Robust Tracking: Handles fast motion (over 200px per frame) and instant appearance changes
  • Visual Output: Generates annotated videos with bounding boxes around detected cursors

How It Works

  1. Unsupervised Template Discovery: Uses background subtraction + blob detection to identify cursor templates
  2. Multi-Scale Template Matching: Generates cursor proposals for each frame
  3. Spatiotemporal Path Optimization: Finds optimal tracking trajectory through entire video
  4. Visualization: Draws bounding boxes on frames and creates annotated video

Installation

# Install Poetry
curl -sSL https://install.python-poetry.org | python3 -

# Install dependencies
git clone https://github.com/yourusername/CursorTracker.git
cd CursorTracker
poetry install
poetry shell

# Create directories
mkdir -p data templates saved_models

Usage

YouTube Videos (Recommended)

Basic usage:

python cursor_tracker.py \
  --url "https://youtube.com/watch?v=VIDEO_ID" \
  --output-dir ./data/my_video

Options:

# Custom quality
--quality 1080p  # Options: 144p, 360p, 480p, 720p, 1080p, 1440p, 2160p

# Process specific frames
--start-frame 100 --end-frame 500

# Skip tracking (preprocessing only)
--skip-tracking

# Custom configuration
--config my_config.yaml

Local Video Files

# Step 1: Preprocess video
python preprocess_video.py \
  --video_path /path/to/video.mp4 \
  --output_dir ./data/my_video \
  --extract_templates

# Step 2: Track cursor
python cursor_tracker_dp.py \
  --video_name my_video \
  --base_dir ./data

# Step 3: Visualize (optional - automatic with YouTube pipeline)
python visualize_results.py \
  --video_name my_video \
  --base_dir ./data

Output Structure

data/my_video/
├── original_video.mp4              # Downloaded video
├── images/                         # Extracted frames
├── background/                     # Background masks
├── estimated_templates/            # Auto-discovered cursor templates
└── our_results_1/
    ├── our_results.txt             # Tracking results (CSV)
    ├── visualizations/             # Annotated frames
    └── tracked_video_our_results_1.mp4  # Annotated video
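
The results file can be loaded with Python's standard csv module. A minimal sketch, assuming each row holds a frame index followed by bounding-box coordinates; check the header of our_results.txt for the actual column order and delimiter:

import csv

def load_results(path):
    # Column names here are assumptions for illustration, not the repo's schema
    rows = []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if not row or row[0].startswith("#"):
                continue  # skip blank and comment lines
            frame, x, y, w, h = (float(v) for v in row[:5])
            rows.append({"frame": int(frame), "x": x, "y": y, "w": w, "h": h})
    return rows

results = load_results("data/my_video/our_results_1/our_results.txt")
print(f"{len(results)} tracked frames")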

Visualization

Automatic (YouTube Pipeline)

Visualizations are generated automatically when using cursor_tracker.py.

Manual (Standalone)

# --bbox_color is in BGR order; "0,255,0" draws green boxes
python visualize_results.py \
  --video_name my_video \
  --base_dir ./data \
  --bbox_color "0,255,0" \
  --bbox_thickness 2 \
  --fps 30 \
  --quality 9

Configuration

Edit config/config.yaml to customize:

template_matching:
  score_threshold: 0.5          # Min template match score
  use_laplacian: true           # Edge detection
  template_vicinity: 300        # Temporal window for templates
  max_scale: 2                  # Max template scale factor
  nms_overlap_threshold: 0.3    # IoU threshold for NMS

tracking:
  enabled: true                 # Enable path optimization
  dist_threshold: 150           # Max pixel distance between frames
  scale_threshold: 1.3          # Max scale change ratio
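
PyYAML is a core dependency, so the configuration can be read as a nested dictionary. A minimal sketch (the repo's own loading code may differ):

import yaml

with open("config/config.yaml") as f:
    cfg = yaml.safe_load(f)

# Nested keys mirror the YAML structure shown above
score_threshold = cfg["template_matching"]["score_threshold"]   # e.g. 0.5
dist_threshold = cfg["tracking"]["dist_threshold"]              # e.g. 150
print(score_threshold, dist_threshold)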

Performance

Tested on 8 Adobe Photoshop instructional videos (3595 frames):

Method                      VIOU    Success Rate
CursorTracker (Ours)        0.365   ~87%
Faster-RCNN                 0.05    ~25%
Online Trackers (TLD/MIL)   0.03    ~15%

  • Speed: ~0.5 seconds/frame (1280×720)
  • Robustness: Handles 200+ pixel movements and instant appearance changes
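
VIOU is presumably the per-frame intersection-over-union between tracked and ground-truth boxes averaged over a video; a sketch of that computation under this assumption (not code from the repo):

def iou(a, b):
    # Boxes are (x, y, w, h)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2 = min(a[0] + a[2], b[0] + b[2])
    iy2 = min(a[1] + a[3], b[1] + b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def video_iou(pred_boxes, gt_boxes):
    # Assumed definition: mean per-frame IoU across the whole video
    return sum(iou(p, g) for p, g in zip(pred_boxes, gt_boxes)) / len(gt_boxes)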

Key Scripts

Script                 Purpose
cursor_tracker.py      Main pipeline: YouTube → Track → Visualize
preprocess_video.py    Extract frames + background masks from local video
extract_templates.py   Discover cursor templates from preprocessed data
cursor_tracker_dp.py   Run cursor tracking with DP path optimization
visualize_results.py   Generate annotated frames and video

Dependencies

Core:

  • Python >=3.10,<3.13
  • OpenCV, NumPy, scikit-image
  • PyYAML, tqdm, imageio
  • youtube-downloader (git dependency)

Optional (install with poetry install --with ml):

  • TensorFlow, Keras (for CNN filtering)

Citation

@inproceedings{cursortracker2020,
  title={Mouse Cursor Detection and Tracking in Instructional Videos},
  booktitle={IEEE Winter Conference on Applications of Computer Vision (WACV)},
  year={2020}
}

Troubleshooting

Few templates discovered?

  • Check background subtraction quality in background/ folder
  • Adjust --consecutive_frames parameter in template extraction

Poor tracking results?

  • Tune dist_threshold and scale_threshold in config
  • Try adjusting score_threshold (lower = more proposals)

Out of memory?

  • Reduce template_vicinity parameter
  • Process in segments with --start-frame / --end-frame

Algorithm Overview

Phase 1: Unsupervised Template Discovery

  • Apply MOG background subtraction
  • Detect blobs (moving objects) in difference images
  • Track sequences where exactly 1 blob appears for N consecutive frames
  • Extract and save cursor templates from these sequences
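
A minimal sketch of this phase with OpenCV; MOG2 stands in for the MOG variant used by the repo, and the run length and area threshold are illustrative:

import cv2

subtractor = cv2.createBackgroundSubtractorMOG2(history=200, detectShadows=False)
cap = cv2.VideoCapture("data/my_video/original_video.mp4")
templates, run = [], 0
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)   # clean up speckle noise
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    blobs = [c for c in contours if cv2.contourArea(c) > 20]  # ignore tiny blobs
    if len(blobs) == 1:            # exactly one moving blob: likely the cursor
        run += 1
        if run >= 5:               # N consecutive single-blob frames
            x, y, w, h = cv2.boundingRect(blobs[0])
            templates.append(frame[y:y + h, x:x + w].copy())
    else:
        run = 0
cap.release()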

Phase 2: Multi-Scale Template Matching

  • Select templates from temporal vicinity of current frame
  • Generate multi-scale template versions
  • Perform normalized cross-correlation matching
  • Apply non-maximum suppression to proposals
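
A sketch of the matching step, assuming normalized cross-correlation via cv2.matchTemplate; the scale set and threshold mirror the config above but the function itself is illustrative:

import cv2
import numpy as np

def propose(frame_gray, template_gray, scales=(0.5, 1.0, 1.5, 2.0), score_threshold=0.5):
    # Return (x, y, w, h, score) proposals from one template at several scales
    proposals = []
    for s in scales:
        t = cv2.resize(template_gray, None, fx=s, fy=s)
        if t.shape[0] >= frame_gray.shape[0] or t.shape[1] >= frame_gray.shape[1]:
            continue  # template must be smaller than the frame
        res = cv2.matchTemplate(frame_gray, t, cv2.TM_CCOEFF_NORMED)
        ys, xs = np.where(res >= score_threshold)
        for x, y in zip(xs, ys):
            proposals.append((int(x), int(y), t.shape[1], t.shape[0], float(res[y, x])))
    return proposals  # non-maximum suppression (IoU threshold) would follow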

Phase 3: Optimal Path Search

  • Model as graph optimization problem
  • Find highest-scoring spatiotemporal path through video
  • Enforce distance and scale constraints between consecutive frames
  • Output optimal cursor trajectory
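
A Viterbi-style dynamic-programming sketch of the path search over per-frame proposal lists. It treats the distance and scale limits from config.yaml as penalties rather than hard constraints and assumes every frame has at least one proposal; it illustrates the idea rather than reproducing cursor_tracker_dp.py:

import math

def best_path(proposals_per_frame, dist_threshold=150, scale_threshold=1.3, penalty=1e3):
    # proposals_per_frame: one list per frame of (x, y, w, h, score) tuples
    best = [p[4] for p in proposals_per_frame[0]]   # cumulative score per proposal, frame 0
    back = [[-1] * len(proposals_per_frame[0])]     # backpointers per frame
    for t in range(1, len(proposals_per_frame)):
        cur, links = [], []
        for x, y, w, h, s in proposals_per_frame[t]:
            top, arg = -math.inf, 0
            for j, (px, py, pw, ph, _) in enumerate(proposals_per_frame[t - 1]):
                cost = 0.0
                if math.hypot(x - px, y - py) > dist_threshold:
                    cost += penalty                 # discourage implausible jumps
                if max(w / pw, pw / w) > scale_threshold:
                    cost += penalty                 # discourage sudden scale changes
                cand = best[j] + s - cost
                if cand > top:
                    top, arg = cand, j
            cur.append(top)
            links.append(arg)
        best = cur
        back.append(links)
    i = max(range(len(best)), key=best.__getitem__)  # best proposal in the last frame
    path = []
    for t in range(len(proposals_per_frame) - 1, -1, -1):
        path.append(proposals_per_frame[t][i])
        i = back[t][i]
    return list(reversed(path))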

Key Insight: Cursors in screencasts exhibit unique motion signatures (movement while background stays static), enabling unsupervised discovery without labeled training data.

License

MIT