siiena25/ImageSearch

Visual Product Search — On-Device Image Search for Marketplaces

Master's project @ USI, 2026 · 🌐 Project Website · 📄 Poster

Foundation models like CLIP and SigLIP 2 match a photo of a dress to the word "dress" almost perfectly — but ask them to distinguish a floral print from a polka-dot one of the same shape, and they quietly fail.
This project dissects where semantic embeddings stop being enough for e-commerce visual search and rebuilds the pipeline around that gap. A lightweight category router dispatches queries to domain-specific experts, and the whole stack is compressed into an on-device Android demo.


Demo

demo.mp4

(If the video doesn't auto-play above, see demo.mp4 in the repo root.)

Result screenshots

(Screenshots of query → top results; the final set is in screens/test5/.)

Pipeline overview

User photo
    │
    ▼
① U2Net-P segmentation        — remove background / model / studio backdrop
    │  RGBA crop of the product
    ▼
② Category router             — zero-shot: cosine(crop embedding, category text prompts)
    │  top-1 / top-2 category labels
    ▼
③ Semantic embedding          — FashionSigLIP (fashion) or SigLIP 2 (other)
    │  768-dim L2-normalised vector
    ▼
④ Cosine retrieval            — dot-product over catalog filtered by category → top-50
    │
    ▼
⑤ Texture + colour rerank     — DINOv2 patch features · HSV colour signature · colour gate
    │
    ▼
Top-10 results shown in the Android app
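
Steps ② and ④ of the diagram can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the code in category_router.py or retrieve_topk_multicat.py: the function names (`route_category`, `retrieve`), the catalog record layout, and the toy 3-dim vectors standing in for the 768-dim embeddings are all hypothetical.

```python
import math

def l2_normalise(v):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def route_category(query_vec, prompt_vecs, top_n=2):
    """Step 2 (assumed form): rank categories by cosine(query, text-prompt embedding)."""
    scores = {cat: dot(query_vec, vec) for cat, vec in prompt_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

def retrieve(query_vec, catalog, allowed_categories, k=50):
    """Step 4 (assumed form): dot-product search restricted to the routed categories."""
    allowed = set(allowed_categories)
    scored = [
        (dot(query_vec, item["vec"]), item["id"])
        for item in catalog
        if item["category"] in allowed
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [item_id for _, item_id in scored[:k]]

# Toy data: hypothetical prompt embeddings and catalog entries.
prompts = {
    "dress": l2_normalise([1.0, 0.1, 0.0]),
    "jeans": l2_normalise([0.0, 1.0, 0.1]),
    "watch": l2_normalise([0.1, 0.0, 1.0]),
}
catalog = [
    {"id": "dress_01", "category": "dress", "vec": l2_normalise([0.9, 0.2, 0.1])},
    {"id": "dress_02", "category": "dress", "vec": l2_normalise([0.6, 0.6, 0.0])},
    {"id": "jeans_01", "category": "jeans", "vec": l2_normalise([0.1, 0.9, 0.2])},
    {"id": "watch_01", "category": "watch", "vec": l2_normalise([0.0, 0.1, 0.9])},
]

query = l2_normalise([1.0, 0.15, 0.05])
top_categories = route_category(query, prompts, top_n=1)
results = retrieve(query, catalog, top_categories, k=50)
print(top_categories, results)
```

Because every vector is L2-normalised up front, the retrieval loop needs only dot products, and filtering by category before scoring keeps the candidate set small.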

Repository structure

ImageSearch/
├── category_router.py           # zero-shot category router
├── color_v10.py                 # HSV colour signature (mirrored to Kotlin)
├── retrieve_topk_multicat.py    # CLI retrieval pipeline
├── bench_onnx_desktop.py        # recall@10 benchmark
├── requirements.txt
├── data/                        # catalogs, embeddings, demo query images
├── models/                      # ONNX models fp32+int8 (generated by scripts)
├── android_bundle/              # assets for the Android app (generated by scripts)
├── screens/test5/               # final result screenshots
├── docs/                        # GitHub Pages project website
└── scripts/                     # preprocessing & build scripts (run in order below)
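
The tree describes color_v10.py as an HSV colour signature, and the pipeline mentions a colour gate in the rerank step. As a rough sketch of what such a signature and gate could look like (an assumption, not the contents of color_v10.py), the example below builds a hue histogram over opaque, saturated pixels and drops candidates whose histogram barely overlaps the query's. The bin count, thresholds, and function names are hypothetical.

```python
import colorsys

def hsv_signature(pixels, hue_bins=12, min_sat=0.2, min_val=0.2):
    """Normalised hue histogram; pixels are (r, g, b, a) tuples with 0-255 channels."""
    hist = [0.0] * hue_bins
    for r, g, b, a in pixels:
        if a < 128:                       # transparent: removed by segmentation
            continue
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        if s < min_sat or v < min_val:    # hue is unreliable for grey or dark pixels
            continue
        hist[min(int(h * hue_bins), hue_bins - 1)] += 1.0
    total = sum(hist)
    return [c / total for c in hist] if total else hist

def colour_overlap(sig_a, sig_b):
    """Histogram intersection: 1.0 for identical signatures, 0.0 for disjoint ones."""
    return sum(min(x, y) for x, y in zip(sig_a, sig_b))

def colour_gate(query_sig, candidate_sigs, min_overlap=0.3):
    """Keep candidates with enough colour overlap, best match first."""
    kept = [
        (colour_overlap(query_sig, sig), cid)
        for cid, sig in candidate_sigs.items()
        if colour_overlap(query_sig, sig) >= min_overlap
    ]
    kept.sort(reverse=True)
    return [cid for _, cid in kept]

RED, DARK_RED, BLUE = (255, 0, 0, 255), (200, 30, 30, 255), (0, 0, 255, 255)
TRANSPARENT_BLUE = (0, 0, 255, 0)         # ignored because alpha is 0

query_sig = hsv_signature([RED, DARK_RED, TRANSPARENT_BLUE])
candidates = {
    "red_dress": hsv_signature([RED, RED, DARK_RED]),
    "blue_dress": hsv_signature([BLUE, BLUE]),
}
kept = colour_gate(query_sig, candidates)
print(kept)
```

Working in HSV rather than RGB keeps the signature fairly stable under brightness changes, which is one plausible reason to use it for product photos taken in uneven lighting.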

Installation

# 1. Clone
git clone https://github.com/<your-username>/ImageSearch.git
cd ImageSearch

# 2. Create virtual environment (Python 3.12 recommended)
python3 -m venv .venv
source .venv/bin/activate

# 3. Install dependencies
pip install -r requirements.txt

Requirements: Python 3.10–3.12, ~8 GB RAM (16 GB recommended for the export step).
A GPU is optional but significantly speeds up the Grounded-SAM 2 catalog segmentation in Step 2.


Running the pipeline (step by step)

Run every script from the project root (ImageSearch/).

Step 1 — Download & build the raw catalog

python3 scripts/build_catalog_from_asos.py   # dresses from ASOS CSV
python3 scripts/build_multicat_catalog.py    # t-shirts, jeans, watches from HF datasets
python3 scripts/refetch_multicat_hires.py    # re-download hi-res images

Step 2 — Segment catalog images (background / model removal)

python3 scripts/segment_multicat.py          # Grounded-SAM 2; GPU recommended

Output: data/images_sam2_multicat/{category}/*.jpg

Step 3 — Build embeddings & texture features

python3 scripts/build_embeddings_multicat.py  # FashionSigLIP / SigLIP 2 → data/
python3 scripts/build_patch_features_dino.py  # DINOv2 texture vectors   → data/

Step 4 — Export ONNX models

python3 scripts/export_models.py             # fp32 + int8 → models/
# Re-quantize only (if fp32 already exists):
python3 scripts/requantize_int8_for_android.py
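
To make the fp32 versus int8 distinction concrete, here is a minimal sketch of affine per-tensor int8 quantisation in plain Python. It only illustrates the arithmetic; it is an assumption that the export scripts use a comparable scheme, and the helper names are hypothetical rather than taken from export_models.py.

```python
def quantise_int8(values):
    """Map floats to int8 with one scale and zero point for the whole tensor."""
    lo = min(min(values), 0.0)            # the range must contain 0 so that
    hi = max(max(values), 0.0)            # zero is exactly representable
    scale = (hi - lo) / 255.0 or 1.0      # fall back to 1.0 for an all-zero tensor
    zero_point = round(-128 - lo / scale)
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantise(q, scale, zero_point):
    """Recover approximate floats from the int8 codes."""
    return [(x - zero_point) * scale for x in q]

weights = [-0.5, 0.0, 0.25, 1.0]
q, scale, zero_point = quantise_int8(weights)
restored = dequantise(q, scale, zero_point)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, zero_point, max_error)
```

Each weight is stored in one byte instead of four, at the cost of a rounding error bounded by roughly one quantisation step (`scale`).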

Step 5 — Build category visual prototypes

python3 scripts/build_category_visual_prototypes.py

Step 6 — Export Android asset bundle

python3 scripts/export_catalog_for_android.py  # → android_bundle/

Step 7 — Quick desktop test (optional)

python3 retrieve_topk_multicat.py --query data/multicat_demo_dress.jpg
python3 retrieve_topk_multicat.py --query data/multicat_demo_jeans.jpg --save-grid data/result.jpg

Step 8 — Recall@10 benchmark (optional)

python3 bench_onnx_desktop.py
# Expected: mean recall@10 ≥ 0.90
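
For reference, mean recall@k can be computed as below. This sketch uses the standard definition (the fraction of a query's relevant items that appear in its top-k results, averaged over queries); whether bench_onnx_desktop.py uses exactly this definition is an assumption, and the function names and toy data are hypothetical.

```python
def recall_at_k(retrieved, relevant, k=10):
    """Fraction of the relevant items found among the first k retrieved IDs."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)

def mean_recall_at_k(runs, k=10):
    """Average recall@k over (retrieved, relevant) pairs, one pair per query."""
    return sum(recall_at_k(ret, rel, k) for ret, rel in runs) / len(runs)

runs = [
    (["a", "b", "c"], {"a", "c"}),   # both relevant items retrieved -> 1.0
    (["x", "y", "z"], {"y", "q"}),   # one of two retrieved          -> 0.5
]
mean_recall = mean_recall_at_k(runs, k=10)
print(mean_recall)
```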

Android app

  1. Copy android_bundle/ to ImageSearchApp/app/src/main/assets/bundle/
  2. Open ImageSearchApp/ in Android Studio (Meerkat or newer).
  3. Build & run on a Pixel 10 (API 36 / Android 16) or a compatible emulator.

All ML inference runs fully on-device — no internet required after first launch.


Models used

Model            Role                                      Format
U2Net-P          Background / model removal                ONNX int8
FashionSigLIP    Fashion semantic embeddings               ONNX int8
SigLIP 2         General embeddings + category router      ONNX int8
DINOv2-S         Texture / patch features for reranking    ONNX int8

Key results

Metric                                     Value
Recall@10 (4-category benchmark)           0.93
Per-model inference latency (Pixel 10)     ~130–180 ms
U2Net-P segmentation latency               ~2500 ms
Total pipeline latency (end-to-end)        < 4 s

Scaling to production

This demo runs on-device with ~700 catalog items across 6 categories.
For a real marketplace (millions of items, all categories):

  • Replace the on-device ONNX runtime with a backend inference service (FastAPI / Triton Inference Server)
  • Use a vector search library or database (Faiss, Qdrant, Weaviate) for low-latency approximate nearest-neighbour (ANN) search
  • Index the catalog with HNSW or IVF instead of brute-force dot products
  • Expand the category taxonomy; fine-tune embeddings per vertical if needed
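
To show what IVF indexing buys over the brute-force scan used in the demo, here is a toy inverted-file index in plain Python. It is a sketch under stated assumptions: the centroids are given rather than learned with k-means, the names `build_ivf` and `search_ivf` are hypothetical, and a real deployment would rely on a library such as Faiss instead of this code.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def build_ivf(vectors, centroids):
    """Assign every catalog vector to the cell of its most similar centroid."""
    lists = {i: [] for i in range(len(centroids))}
    for item_id, vec in vectors.items():
        cell = max(range(len(centroids)), key=lambda i: dot(vec, centroids[i]))
        lists[cell].append(item_id)
    return lists

def search_ivf(query, vectors, centroids, lists, k=10, nprobe=1):
    """Score only the items in the nprobe cells closest to the query."""
    cells = sorted(
        range(len(centroids)),
        key=lambda i: dot(query, centroids[i]),
        reverse=True,
    )[:nprobe]
    candidates = [item_id for c in cells for item_id in lists[c]]
    candidates.sort(key=lambda item_id: dot(query, vectors[item_id]), reverse=True)
    return candidates[:k]

# Toy 2-dim data; inner product is the similarity score.
centroids = [[1.0, 0.0], [0.0, 1.0]]
vectors = {
    "a": [0.9, 0.1],
    "b": [0.8, 0.3],
    "c": [0.1, 0.9],
    "d": [0.2, 0.8],
}
lists = build_ivf(vectors, centroids)
query = [1.0, 0.2]
fast = search_ivf(query, vectors, centroids, lists, k=10, nprobe=1)
exhaustive = search_ivf(query, vectors, centroids, lists, k=10, nprobe=2)
print(fast, exhaustive)
```

With `nprobe=1` only half of the toy catalog is scored; raising `nprobe` trades speed for recall until, at the number of cells, the search becomes exhaustive again.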

About

End-to-end on-device visual product search for marketplaces. Combines SAM 2 segmentation · FashionSigLIP / SigLIP 2 embeddings · DINOv2 texture · HSV colour reranking · category router.
