Description
Prerequisites
The current implementation uses a sliding window approach for inference, which is slow because the classifier has to run on every window position. Before starting, you should have a functioning CNN binary classifier and a crop_to_bbox utility already implemented.
Motivation
To improve inference speed, we need to transition from the computationally expensive sliding window to a Selective Search region proposal method. Region proposal is just a CV heuristic that groups pixels with similar color and texture to generate a list of rectangles that COULD contain a valid object. This effectively turns our object localizer into a two-stage model: first propose candidate regions, then classify each one.
Overview
Refactor the existing localization logic to utilize Selective Search for region proposals, followed by CNN classification for detection.
Implementation Checklist
- Create Region Proposal Class: Implement a separate class for the region proposal step using OpenCV's Selective Search. The link in the Reference section shows what that might look like.
  - Note: Since the algorithm outputs a list of rectangle coordinates and the CNN requires fixed-size images, ensure the output regions are cropped and resized correctly; this logic should already exist in the crop_to_bbox method in the PascalVOCDataset.
  - I recommend extracting the cropping and resizing logic from crop_to_bbox into a standalone utility function, for better separation of responsibility.
- Encapsulate Object Localizer: Create a top-level class for the object localizer that inherits from torch.nn.Module. This class should encapsulate both the selective search (for region proposals) and the binary classifier (for detection). Create a new file for this.
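The two checklist items could fit together roughly as in the sketch below. It is a sketch under several assumptions, not the project's actual code. The names crop_and_resize, SelectiveSearchProposer, and ObjectLocalizer are hypothetical, as are the 64x64 input size and the 0.5 threshold. The classifier is assumed to return one logit per crop. The proposer needs the opencv-contrib-python build, which provides cv2.ximgproc. Any channel ordering or normalization the existing classifier was trained with would still need to be applied.

```python
import torch
import torch.nn.functional as F


def crop_and_resize(image, box, size=(64, 64)):
    """Crop a (C, H, W) tensor to box=(x, y, w, h) and resize it to `size`.

    Hypothetical standalone helper standing in for the logic currently
    inside crop_to_bbox.
    """
    x, y, w, h = box
    crop = image[:, y:y + h, x:x + w].unsqueeze(0).float()
    resized = F.interpolate(crop, size=size, mode="bilinear", align_corners=False)
    return resized.squeeze(0)


class SelectiveSearchProposer:
    """Region proposals from OpenCV Selective Search."""

    def __init__(self, max_proposals=200, fast=True):
        self.max_proposals = max_proposals
        self.fast = fast

    def __call__(self, image_bgr):
        # Imported lazily so the rest of the sketch runs without OpenCV.
        import cv2

        ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
        ss.setBaseImage(image_bgr)  # expects an HxWx3 uint8 BGR array
        if self.fast:
            ss.switchToSelectiveSearchFast()
        else:
            ss.switchToSelectiveSearchQuality()
        rects = ss.process()  # rows of (x, y, w, h)
        return [tuple(int(v) for v in r) for r in rects[: self.max_proposals]]


class ObjectLocalizer(torch.nn.Module):
    """Two-stage localizer: propose regions, then classify each crop."""

    def __init__(self, classifier, proposer, input_size=(64, 64), threshold=0.5):
        super().__init__()
        self.classifier = classifier  # assumed: one logit per input crop
        self.proposer = proposer      # any callable: image -> [(x, y, w, h)]
        self.input_size = input_size
        self.threshold = threshold

    @torch.no_grad()
    def forward(self, image_bgr):
        """Return [(box, score), ...] for proposals scoring above threshold."""
        boxes = self.proposer(image_bgr)
        if not boxes:
            return []
        # HxWxC -> CxHxW; no normalization is applied in this sketch.
        image = torch.as_tensor(image_bgr).permute(2, 0, 1)
        batch = torch.stack(
            [crop_and_resize(image, b, self.input_size) for b in boxes]
        )
        scores = torch.sigmoid(self.classifier(batch)).flatten()
        return [
            (b, s.item()) for b, s in zip(boxes, scores) if s >= self.threshold
        ]
```

Passing the proposer in as a plain callable means the localizer can be unit-tested with a stub that returns fixed boxes, so Selective Search does not have to run in tests. All proposals are classified in one batch, which is where most of the speedup over a per-window loop would be expected to come from.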
Reference
For implementation details on Selective Search in C++, see: LearnOpenCV - Selective Search for Object Detection