Align search URLs with upstream sites #15

Open
hqhq1025 wants to merge 1 commit into aiming-lab:main from hqhq1025:codex/fix-search-url-realism

hqhq1025 commented May 14, 2026

Summary

Makes known search URL shapes realistic while keeping old /search?q=... routes as compatibility aliases.

Canonical search URLs added/emitted:

  • Amazon: /s?k=<query> with /search?q=<query> kept as legacy alias.
  • Booking: /searchresults.html?ss=<query> with /search?q=<query> kept as legacy alias.
  • Google Maps: /maps/search/<query> with /search?q=<query> kept as legacy alias.
  • ESPN: /search/_/q/<query> with /search?q=<query> kept as legacy alias.
  • Apple: /search/<query> with /search?q=<query> kept as legacy alias.
  • Coursera: /search?query=<query> with /search?q=<query> kept as legacy alias.
  • Hugging Face: /search/full-text?q=<query> with /search?q=<query> kept as legacy alias.
  • Cambridge Dictionary: /search/direct/?datasetsearch=english&q=<query> with /search?q=<query> kept as legacy alias.
  • Cambridge Thesaurus: /search/english-thesaurus/direct/?datasetsearch=english-thesaurus&q=<query> with /thesaurus?q=<query> kept as legacy alias.
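
The canonical shapes listed above fall into two groups: query-string URLs and path-based URLs. The following is a minimal sketch of how they could be generated, not the code in this PR. The names `QUERY_STRING_SHAPES`, `PATH_SHAPES`, and `canonical_search_url`, and the `cambridge_thesaurus` key, are hypothetical; the other site keys follow the `sites/` directory names used in the Verification section, and the URL shapes are copied from the list above.

```python
from urllib.parse import quote, urlencode

# Hypothetical lookup tables; the URL shapes themselves come from the PR summary.
# Each entry: (path, query parameter name, fixed extra parameters).
QUERY_STRING_SHAPES = {
    "amazon": ("/s", "k", {}),
    "booking": ("/searchresults.html", "ss", {}),
    "coursera": ("/search", "query", {}),
    "huggingface": ("/search/full-text", "q", {}),
    "cambridge_dictionary": ("/search/direct/", "q", {"datasetsearch": "english"}),
    "cambridge_thesaurus": (
        "/search/english-thesaurus/direct/",
        "q",
        {"datasetsearch": "english-thesaurus"},
    ),
}

# Sites whose canonical search URL carries the query in the path.
PATH_SHAPES = {
    "google_map": "/maps/search/",
    "espn": "/search/_/q/",
    "apple": "/search/",
}


def canonical_search_url(site: str, query: str) -> str:
    """Build the upstream-shaped search URL for a site (illustrative only)."""
    if site in PATH_SHAPES:
        # Percent-encode the query so it stays a single path segment.
        return PATH_SHAPES[site] + quote(query, safe="")
    path, param, fixed = QUERY_STRING_SHAPES[site]
    # Fixed parameters come first, then the user's query.
    return path + "?" + urlencode({**fixed, param: query})
```

For example, `canonical_search_url("google_map", "central park")` yields `/maps/search/central%20park`, matching the route exercised in the Verification section.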

Audited and documented as already close to upstream shape or intentionally different:

  • Google Search: /search?q=<query> plus vertical params such as tbm=....
  • GitHub: /search?q=<query>&type=....
  • BBC: /search?q=<query>.
  • arXiv: /search/?query=<query>&searchtype=....
  • WolframAlpha: /input?i=<query> for computation and /search?q=... for topic search.
  • Google Flights: primary flight searches already use /flights?...; generic /search?q=... is a local airport/city/airline helper.

For path-based search URLs, small submit handlers rewrite the destination before navigation so the browser location becomes the upstream-shaped URL. Existing no-JS/query-string aliases still resolve through the same search view.
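
As a rough illustration of why both URL forms can share one search view, the sketch below recovers the same query string from either the path-based canonical URL or the legacy `/search?q=...` alias. It is a Python approximation under stated assumptions, not the submit handlers or routes added in this PR; `PATH_PREFIXES` and `extract_query` are hypothetical names.

```python
from typing import Optional
from urllib.parse import parse_qs, unquote, urlsplit

# Path prefixes taken from the PR summary; the dict itself is hypothetical.
PATH_PREFIXES = {
    "google_map": "/maps/search/",
    "espn": "/search/_/q/",
    "apple": "/search/",
}


def extract_query(site: str, url: str) -> Optional[str]:
    """Return the search query from a canonical or legacy URL, else None."""
    parts = urlsplit(url)
    prefix = PATH_PREFIXES[site]

    # Canonical form: the query is the remainder of the path after the prefix.
    if parts.path.startswith(prefix) and len(parts.path) > len(prefix):
        return unquote(parts.path[len(prefix):])

    # Legacy alias: /search?q=<query> (parse_qs already percent-decodes).
    if parts.path == "/search":
        values = parse_qs(parts.query).get("q")
        return values[0] if values else None

    return None
```

Both `extract_query("espn", "/search/_/q/lakers")` and `extract_query("espn", "/search?q=lakers")` return `"lakers"`, so a single view can serve either route.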

Also adds:

  • docs/search-url-realism.md
  • scripts/check_search_url_realism.py

Verification

  • python3 scripts/check_search_url_realism.py
  • python3 -m py_compile sites/amazon/app.py sites/booking/app.py sites/google_map/app.py sites/espn/app.py sites/apple/app.py sites/coursera/app.py sites/huggingface/app.py sites/cambridge_dictionary/app.py scripts/check_search_url_realism.py
  • Flask test-client route checks for canonical and legacy URLs:
    • Amazon /s?k=xbox and /search?q=xbox
    • Booking /searchresults.html?ss=Paris and /search?q=Paris
    • Google Maps /maps/search/central%20park and /search?q=central%20park
    • ESPN /search/_/q/lakers and /search?q=lakers
    • Apple /search/iphone and /search?q=iphone
    • Coursera /search?query=python and /search?q=python
    • Cambridge /search/direct/?datasetsearch=english&q=hello and /search?q=hello
  • The Hugging Face route was covered only by the static regression check and py_compile; a full render smoke test requires HF-managed image assets that are not present in this isolated worktree.
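
The test-client checks above can be pictured with a small stand-alone Flask app. This is a sketch assuming Flask is installed; it does not reproduce `sites/amazon/app.py`, and `render_results`, `canonical_search`, `legacy_search`, and `check_routes` are hypothetical names. Only the two Amazon URL shapes come from this PR.

```python
from flask import Flask, request

app = Flask(__name__)


def render_results(query: str) -> str:
    # Hypothetical stand-in for the site's real search view.
    return f"results for: {query}"


@app.route("/s")
def canonical_search() -> str:
    # Canonical Amazon-style shape: /s?k=<query>
    return render_results(request.args.get("k", ""))


@app.route("/search")
def legacy_search() -> str:
    # Legacy compatibility alias: /search?q=<query>
    return render_results(request.args.get("q", ""))


def check_routes() -> None:
    """Request both URL shapes and confirm each renders the query."""
    client = app.test_client()
    for url in ("/s?k=xbox", "/search?q=xbox"):
        response = client.get(url)
        assert response.status_code == 200, url
        assert b"xbox" in response.data, url
```

Calling `check_routes()` raises an `AssertionError` naming the failing URL if either route stops resolving.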

Fixes #14.

hqhq1025 force-pushed the codex/fix-search-url-realism branch from 3cc9d1d to 6ffc69d on May 14, 2026 07:15
hqhq1025 force-pushed the codex/fix-search-url-realism branch from 6ffc69d to 0dc28ee on May 14, 2026 07:19

Development

Successfully merging this pull request may close these issues.

Search URLs should match upstream site shapes