Master's project @ USI, 2026 · 🌐 Project Website · 📄 Poster
Foundation models like CLIP and SigLIP 2 match a photo of a dress to the word "dress" almost perfectly — but ask them to distinguish a floral print from a polka-dot one of the same shape, and they quietly fail.
This project dissects where semantic embeddings stop being enough for e-commerce visual search and rebuilds the pipeline around that gap. A lightweight category router dispatches queries to domain-specific experts, and the whole stack is compressed into an on-device Android demo.
demo.mp4
(If the video doesn't auto-play above, see demo.mp4 in the repo root.)
| Query → Top results | Query → Top results |
|---|---|
| ![]() | ![]() |
| ![]() | ![]() |
User photo
│
▼
① U2Net-P segmentation — remove background / model / studio backdrop
│ RGBA crop of the product
▼
② Category router — zero-shot: cosine(crop embedding, category text prompts)
│ top-1 / top-2 category labels
▼
③ Semantic embedding — FashionSigLIP (fashion) or SigLIP 2 (other)
│ 768-dim L2-normalised vector
▼
④ Cosine retrieval — dot-product over catalog filtered by category → top-50
│
▼
⑤ Texture + colour rerank — DINOv2 patch features · HSV colour signature · colour gate
│
▼
Top-10 results shown in the Android app
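Steps ② and ④ of the diagram can be sketched with plain NumPy. This is a minimal illustration rather than the project's code: the function names, the toy labels, and the random vectors standing in for real SigLIP embeddings are all assumptions. Because every vector is L2-normalised, a dot product is the same as cosine similarity.

```python
import numpy as np

def l2_normalise(x: np.ndarray) -> np.ndarray:
    """Scale vectors to unit length so a dot product equals cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def route_category(query_vec, prompt_vecs, labels, top_n=2):
    """Step 2 (hypothetical helper): rank category labels by cosine similarity."""
    scores = prompt_vecs @ query_vec
    order = np.argsort(-scores)[:top_n]
    return [labels[i] for i in order]

def retrieve(query_vec, catalog_vecs, catalog_cats, allowed, k=50):
    """Step 4 (hypothetical helper): dot-product search within routed categories."""
    idx = np.flatnonzero(np.isin(catalog_cats, allowed))  # category filter
    scores = catalog_vecs[idx] @ query_vec
    top = np.argsort(-scores)[:k]                          # best scores first
    return idx[top], scores[top]

# Toy data: random vectors stand in for text-prompt and catalog embeddings.
rng = np.random.default_rng(0)
labels = ["dress", "jeans", "watch"]
prompt_vecs = l2_normalise(rng.normal(size=(3, 768)))
catalog_vecs = l2_normalise(rng.normal(size=(300, 768)))
catalog_cats = np.array(labels)[rng.integers(0, 3, size=300)]

# A query placed close to the "jeans" prompt, with a little noise.
query_vec = l2_normalise(prompt_vecs[1] + 0.01 * rng.normal(size=768))

routed = route_category(query_vec, prompt_vecs, labels)
ids, scores = retrieve(query_vec, catalog_vecs, catalog_cats, routed, k=50)
print(routed, ids[:5])
```

Passing both top-1 and top-2 labels to the filter, as done here, is one way to soften router mistakes. Whether the project filters on one label or two is not stated in this README.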
ImageSearch/
├── category_router.py # zero-shot category router
├── color_v10.py # HSV colour signature (mirrored to Kotlin)
├── retrieve_topk_multicat.py # CLI retrieval pipeline
├── bench_onnx_desktop.py # recall@10 benchmark
├── requirements.txt
├── data/ # catalogs, embeddings, demo query images
├── models/ # ONNX models fp32+int8 (generated by scripts)
├── android_bundle/ # assets for the Android app (generated by scripts)
├── screens/test5/ # final result screenshots
├── docs/ # GitHub Pages project website
└── scripts/ # preprocessing & build scripts (run in order below)
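The tree lists `color_v10.py` as an HSV colour signature. The sketch below shows one plausible shape for such a signature, using only the standard library. It is not the contents of `color_v10.py`: the bin count, the grey thresholds, the alpha cut-off, and both function names are assumptions made for illustration.

```python
import colorsys

def hsv_signature(pixels, hue_bins=12, min_alpha=128):
    """Hue histogram over opaque pixels, plus one extra bin for greys.

    pixels: iterable of (r, g, b, a) tuples with 0-255 channels.
    Returns hue_bins + 1 floats that sum to 1 (all zeros if no opaque pixel).
    """
    hist = [0.0] * (hue_bins + 1)
    for r, g, b, a in pixels:
        if a < min_alpha:              # transparent: removed by segmentation
            continue
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        if s < 0.15 or v < 0.15:       # too grey or too dark for a stable hue
            hist[hue_bins] += 1.0
        else:
            hist[min(int(h * hue_bins), hue_bins - 1)] += 1.0
    total = sum(hist)
    return [x / total for x in hist] if total else hist

def colour_similarity(sig_a, sig_b):
    """Histogram intersection: 1.0 for identical signatures, 0.0 for disjoint."""
    return sum(min(a, b) for a, b in zip(sig_a, sig_b))

# Toy RGBA crops: a red product on a transparent background, and a blue one.
red = [(220, 20, 30, 255)] * 90 + [(0, 0, 0, 0)] * 10
blue = [(20, 30, 220, 255)] * 100

sig_red, sig_blue = hsv_signature(red), hsv_signature(blue)
print(colour_similarity(sig_red, sig_red))   # identical signatures
print(colour_similarity(sig_red, sig_blue))  # no shared hue bins
```

A signature this small is easy to port line by line to another language, which matters if the same computation has to match on desktop and on Android.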
# 1. Clone
git clone https://github.com/<your-username>/ImageSearch.git
cd ImageSearch
# 2. Create virtual environment (Python 3.12 recommended)
python3 -m venv .venv
source .venv/bin/activate
# 3. Install dependencies
pip install -r requirements.txt

Requirements: Python 3.10–3.12, ~8 GB RAM (16 GB recommended for export step).
GPU optional but significantly speeds up the SAM 2 segmentation step.
Run every script from the project root (ImageSearch/).
python3 scripts/build_catalog_from_asos.py # dresses from ASOS CSV
python3 scripts/build_multicat_catalog.py # t-shirts, jeans, watches from HF datasets
python3 scripts/refetch_multicat_hires.py # re-download hi-res images
python3 scripts/segment_multicat.py # Grounded-SAM 2; GPU recommended
Output: data/images_sam2_multicat/{category}/*.jpg
python3 scripts/build_embeddings_multicat.py # FashionSigLIP / SigLIP 2 → data/
python3 scripts/build_patch_features_dino.py # DINOv2 texture vectors → data/
python3 scripts/export_models.py # fp32 + int8 → models/
# Re-quantize only (if fp32 already exists):
python3 scripts/requantize_int8_for_android.py
python3 scripts/build_category_visual_prototypes.py
python3 scripts/export_catalog_for_android.py # → android_bundle/
python3 retrieve_topk_multicat.py --query data/multicat_demo_dress.jpg
python3 retrieve_topk_multicat.py --query data/multicat_demo_jeans.jpg --save-grid data/result.jpg
python3 bench_onnx_desktop.py
# Expected: mean recall@10 ≥ 0.90

- Copy android_bundle/ → ImageSearchApp/app/src/main/assets/bundle/
- Open ImageSearchApp/ in Android Studio (Meerkat or newer).
- Build & run on a Pixel 10 (API 36 / Android 16) or a compatible emulator.
All ML inference runs fully on-device — no internet required after first launch.
| Model | Role | Format |
|---|---|---|
| U2Net-P | Background / model removal | ONNX int8 |
| FashionSigLIP | Fashion semantic embeddings | ONNX int8 |
| SigLIP 2 | General embeddings + category router | ONNX int8 |
| DINOv2-S | Texture / patch features for reranking | ONNX int8 |
| Metric | Value |
|---|---|
| Recall@10 (4-category benchmark) | 0.93 |
| Per-model inference latency (Pixel 10) | ~130–180 ms |
| U2Net-P segmentation latency | ~2 500 ms |
| Total pipeline latency (end-to-end) | < 4 s |
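For readers unfamiliar with the metric, recall@10 can be computed as below. This README does not say how `bench_onnx_desktop.py` defines relevance, so the sketch assumes the common definition (the share of relevant items found among the first k results) and uses made-up item ids.

```python
def recall_at_k(retrieved, relevant, k=10):
    """Fraction of relevant ids that appear among the first k retrieved ids."""
    if not relevant:
        raise ValueError("relevant set must not be empty")
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)

def mean_recall_at_k(runs, k=10):
    """Average recall@k over (retrieved, relevant) pairs, one pair per query."""
    return sum(recall_at_k(ret, rel, k) for ret, rel in runs) / len(runs)

# Hypothetical queries: ranked result ids and the ids counted as correct.
runs = [
    (["a", "b", "c"], {"a"}),            # target found
    (["x", "y", "z"], {"q"}),            # target missed
    (["m", "n", "o", "p"], {"n", "p"}),  # both targets found
]
print(mean_recall_at_k(runs, k=10))
```

With a single relevant item per query, this reduces to the fraction of queries whose target shows up in the top 10.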
This demo runs on-device with ~700 catalog items across 6 categories.
For a real marketplace (millions of items, all categories):
- Replace on-device ONNX runtime with a backend service (FastAPI / Triton Inference Server)
- Use a vector search library or database (Faiss, Qdrant, Weaviate) for low-latency approximate nearest-neighbour (ANN) search
- Add HNSW / IVF indexing
- Expand category taxonomy; fine-tune embeddings per vertical if needed
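To make the IVF idea from the list above concrete, here is a small NumPy sketch of an inverted-file index: vectors are grouped around centroids, and a query scans only the few lists whose centroids are closest. It illustrates the principle only. A production system would use a library such as Faiss, and the function names, list count, and random data are assumptions.

```python
import numpy as np

def build_ivf(vectors, n_lists=8, n_iter=10, seed=0):
    """Cluster unit vectors with a few spherical k-means steps.

    Returns the centroids and, per centroid, the ids of its member vectors.
    """
    rng = np.random.default_rng(seed)
    start = rng.choice(len(vectors), n_lists, replace=False)
    centroids = vectors[start].copy()
    for _ in range(n_iter):
        assign = np.argmax(vectors @ centroids.T, axis=1)
        for c in range(n_lists):
            members = vectors[assign == c]
            if len(members):  # keep the old centroid if a cluster empties
                mean = members.mean(axis=0)
                centroids[c] = mean / np.linalg.norm(mean)
    # Final assignment, consistent with the returned centroids.
    assign = np.argmax(vectors @ centroids.T, axis=1)
    lists = [np.flatnonzero(assign == c) for c in range(n_lists)]
    return centroids, lists

def search_ivf(query, vectors, centroids, lists, k=10, n_probe=2):
    """Scan only the n_probe lists whose centroids best match the query."""
    probe = np.argsort(-(centroids @ query))[:n_probe]
    cand = np.concatenate([lists[c] for c in probe])
    scores = vectors[cand] @ query
    return cand[np.argsort(-scores)[:k]]

rng = np.random.default_rng(1)
vectors = rng.normal(size=(2000, 64))
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)
query = vectors[0]

centroids, lists = build_ivf(vectors)
approx = search_ivf(query, vectors, centroids, lists, k=10, n_probe=2)
exact = np.argsort(-(vectors @ query))[:10]
print(len(set(approx) & set(exact)), "of 10 exact neighbours found")
```

Raising `n_probe` trades speed for accuracy. Probing every list scans the whole catalog and reproduces the exact result.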



