Merged
25 changes: 13 additions & 12 deletions index.html
@@ -136,23 +136,24 @@ <h3>Probing V-JEPA 2: What Does a Video Model Actually See?</h3>
 </article>

 <article class="project-card">
-<div class="teaser" role="img" aria-label="Two hands playing rock-paper-scissors, but one holds a banana instead of a valid sign, illustrating anomaly detection.">
+<div class="teaser" role="img" aria-label="Visual product search for marketplace items using image embeddings, segmentation, and on-device retrieval.">
 <img src="assets/group_B.png" alt="" style="position:absolute; inset:0; width:100%; height:100%; object-fit:cover; z-index:2;">
 <span class="teaser-label" style="z-index:3;">Group B</span>
 </div>
 <div class="project-content">
-<p class="project-meta">Vision-language embeddings, segmentation, on-device retrieval</p>
-<h3>One Photo, Many Aisles: Visual Search Across a Heterogeneous Marketplace</h3>
+<p class="project-meta">Visual search, marketplace retrieval, segmentation, Android demo</p>
+<h3>Visual Search for Marketplace</h3>
 <p class="project-abstract">
-A single foundation model rarely knows a sofa, a smartphone and a floral dress equally well — yet a real
-marketplace catalog mixes all of them on the same shelf. This project studies where general-purpose visual
-encoders like CLIP and SigLIP 2 stop being enough for e-commerce retrieval, and what has to be rebuilt
-around them when the catalog is not one domain but twenty. Each query is first stripped of its context —
-mannequins, human models, living-room scenes, studio gradients — then routed to a category-specific
-expert whose embeddings are reranked with the fine-grained color and texture cues that generic models
-quietly discard. The whole pipeline is then compressed into an on-device Android demo, raising a second
-question the paper versions of Google Lens rarely address in the open: how much of a foundation-model
-retrieval system actually survives when it has to run on a phone?
+This project explores visual product search for marketplace applications, where users can search for visually similar
+items using a single photo. I build a retrieval pipeline based on vision-language embeddings, object segmentation,
+category-aware routing, and color-aware reranking to improve search quality for fashion and marketplace-style product
+images. The final system is demonstrated in an Android app that performs on-device image-based retrieval over a local
+product catalog.
 </p>
+<p>
+<a href="https://github.com/siiena25/ImageSearch" target="_blank" rel="noopener noreferrer">
+GitHub / Code
+</a>
+</p>
 <label class="project-toggle-label">
 <input class="project-toggle" type="checkbox" aria-label="Toggle full project pitch">