Closes the last two SerpAPI quality issues seen in prod after PR #100 merged #101

Merged
Bogdan-ca merged 3 commits into main from feat/dynamic-shopping on May 26, 2026

@Bogdan-ca

Collaborator
  1. Wrong-product images — SerpAPI sometimes returns a thumbnail from a neighboring shopping
    row (e.g. a Zara card carrying an image.hm.com URL). Now rejected by a per-brand image-CDN
    whitelist (zara.net for Zara, static.nike.com for Nike, etc.).

  2. Empty / $0 results from Google Shopping redirect responses — SerpAPI's google_shopping
    engine returns two response shapes. The previous strict-host filter rejected every row in shape B
    (where product_link is a google.com redirect and the thumbnail is proxied via
    gstatic.com). Now:

    • When the product_link host is google.com, trust the source field as the retailer label
      (the whitelist still applies: non-Google URLs with spoofed source claims are still rejected).
    • Accept gstatic.com im

Bogdan-ca added 2 commits May 26, 2026 13:01
SerpAPI's google_shopping engine occasionally cross-contaminates
thumbnails between adjacent results (observed in prod: a Zara product
card carrying an image.hm.com URL, a Uniqlo card carrying a static.zara.net
URL). Cards rendered correctly but looked broken because the image
clearly didn't represent the product.

Added a per-brand image-host whitelist (e.g. Zara -> zara.net/zara.com);
rows whose image is hosted anywhere else are now rejected. Marketplaces
(Vinted, Depop, Grailed, eBay) are exempt because sellers upload
arbitrary imagery. This also tends to drop category-page rows since
those usually come back with mismatched or placeholder images.

Existing tests updated to use brand-aligned CDN hosts; new cases cover
the cross-contamination scenarios and the marketplace exemption.
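
The whitelist check described in this commit could look roughly like the sketch below. It is an illustration only, not the code in `ai-service/recommend_purchases.py`: the names `BRAND_IMAGE_HOSTS`, `MARKETPLACES`, `host_matches`, and `image_host_ok` are hypothetical, the table holds only the hosts named in this PR, and rejecting brands that have no whitelist entry is an assumption.

```python
from urllib.parse import urlparse

# Hypothetical table: only the hosts mentioned in this PR are listed.
BRAND_IMAGE_HOSTS = {
    "zara": ("zara.net", "zara.com"),
    "nike": ("static.nike.com",),
}

# Marketplaces are exempt because sellers upload arbitrary imagery.
MARKETPLACES = {"vinted", "depop", "grailed", "ebay"}


def host_matches(host: str, allowed: str) -> bool:
    """True when host equals allowed or is a subdomain of it."""
    return host == allowed or host.endswith("." + allowed)


def image_host_ok(brand: str, image_url: str) -> bool:
    """Accept the image only if it is hosted on the brand's own CDN."""
    brand_key = brand.strip().lower()
    if brand_key in MARKETPLACES:
        return True
    allowed = BRAND_IMAGE_HOSTS.get(brand_key)
    if allowed is None:
        # Assumption: a brand without a whitelist entry is rejected.
        return False
    host = (urlparse(image_url).hostname or "").lower()
    return any(host_matches(host, a) for a in allowed)
```

Matching on the parsed hostname with a leading-dot suffix check, rather than a substring test on the whole URL, keeps look-alike hosts such as `evilzara.net` from passing.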

SerpAPI's google_shopping engine returns two response shapes depending
on the query: (a) direct retailer URLs with retailer-hosted thumbnails,
or (b) Google Shopping redirect URLs (host=google.com) with thumbnails
proxied through encrypted-tbn*.gstatic.com. The previous strict-host
filters rejected every shape-(b) row, leaving the recommender empty
for many real queries.

Now:
- When the product_link host is google.com (and only then), trust the
  `source` field as the retailer label. This stays safe — a non-Google
  URL with a spoofed source claim is still rejected.
- Accept gstatic.com image hosts unconditionally. Google manages the
  proxy and assigns the image, so we treat it as trusted-by-source.

Verified live against SerpAPI with 5 realistic queries: 15 accepted
recommendations, all with consistent brand, name, price, and image.

Tests cover the redirect path, the spoofed-source attack, and the
gstatic image exemption.
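
A minimal sketch of the two rules above, under stated assumptions: `RETAILER_HOSTS`, `resolve_retailer`, and `is_google_proxied_image` are hypothetical names, the retailer table is illustrative, and the actual logic in `ai-service/recommend_purchases.py` may differ. The `product_link` and `source` keys are the row fields named in this PR.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical retailer whitelist, keyed by lowercase retailer label.
RETAILER_HOSTS = {
    "zara": ("zara.com",),
    "nike": ("nike.com",),
}


def _host(url: str) -> str:
    return (urlparse(url).hostname or "").lower()


def _under(host: str, domain: str) -> bool:
    """True when host equals domain or is a subdomain of it."""
    return host == domain or host.endswith("." + domain)


def resolve_retailer(row: dict) -> Optional[str]:
    """Return the whitelisted retailer for a row, or None to reject it."""
    link_host = _host(row.get("product_link") or "")
    if _under(link_host, "google.com"):
        # Shape (b): Google redirect. Trust `source`, but only if whitelisted.
        source = (row.get("source") or "").strip().lower()
        return source if source in RETAILER_HOSTS else None
    # Shape (a): direct URL. Ignore `source` and derive the retailer from
    # the link host, so a spoofed source claim cannot get through.
    for retailer, domains in RETAILER_HOSTS.items():
        if any(_under(link_host, d) for d in domains):
            return retailer
    return None


def is_google_proxied_image(image_url: str) -> bool:
    """Thumbnails served from gstatic.com are accepted unconditionally."""
    return _under(_host(image_url), "gstatic.com")
```

In this sketch the `source` field is only read on the google.com branch, which is what keeps the spoofed-source case rejected for every other host.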
Comment thread ai-service/recommend_purchases.py Fixed
…ring sanitization'

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
@Bogdan-ca Bogdan-ca merged commit 64ab77b into main May 26, 2026
6 checks passed