docs/search-url-realism.md (53 additions, 0 deletions)
@@ -0,0 +1,53 @@
# Search URL Realism

## Policy

Search forms should emit the URL shape used by the real upstream site whenever
that shape is known. Legacy `/search?q=...` routes remain as compatibility
aliases for existing benchmark tasks, hand-written trajectories, and old links.

This keeps the user-visible behavior realistic without breaking existing local
WebHarbor consumers.

## Canonical Search URLs

| Site | Canonical URL | Legacy alias |
| --- | --- | --- |
| Amazon | `/s?k=<query>` | `/search?q=<query>` |
| Booking | `/searchresults.html?ss=<query>` | `/search?q=<query>` |
| Google Maps | `/maps/search/<query>` | `/search?q=<query>` |
| ESPN | `/search/_/q/<query>` | `/search?q=<query>` |
| Apple | `/search/<query>` | `/search?q=<query>` |
| Coursera | `/search?query=<query>` | `/search?q=<query>` |
| Hugging Face | `/search/full-text?q=<query>` | `/search?q=<query>` |
| Cambridge Dictionary | `/search/direct/?datasetsearch=english&q=<query>` | `/search?q=<query>` |
| Cambridge Thesaurus | `/search/english-thesaurus/direct/?datasetsearch=english-thesaurus&q=<query>` | `/thesaurus?q=<query>` |
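
The table can also be read as data. Below is a minimal Python sketch, assuming a caller wants to generate canonical URLs or upgrade a legacy `/search?q=<query>` link (for example when porting an old trajectory). `CANONICAL` and `canonical_from_legacy` are hypothetical names, not part of WebHarbor, and the site keys are illustrative. The Cambridge Thesaurus row is left out because its legacy alias is `/thesaurus?q=<query>` rather than `/search?q=<query>`. Query-string values are form-encoded (spaces become `+`), while path segments are percent-encoded (spaces become `%20`).

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

# Hypothetical mapping: site key -> builder for the canonical search URL.
# Path-based shapes use quote(..., safe="") so a "/" inside the query cannot
# be mistaken for a path separator; query-string shapes use urlencode.
CANONICAL = {
    "amazon": lambda q: "/s?" + urlencode({"k": q}),
    "booking": lambda q: "/searchresults.html?" + urlencode({"ss": q}),
    "google_maps": lambda q: "/maps/search/" + quote(q, safe=""),
    "espn": lambda q: "/search/_/q/" + quote(q, safe=""),
    "apple": lambda q: "/search/" + quote(q, safe=""),
    "coursera": lambda q: "/search?" + urlencode({"query": q}),
    "huggingface": lambda q: "/search/full-text?" + urlencode({"q": q}),
    "cambridge_dictionary": lambda q: (
        "/search/direct/?" + urlencode({"datasetsearch": "english", "q": q})
    ),
}


def canonical_from_legacy(site: str, legacy_url: str) -> str:
    """Rewrite a legacy /search?q=<query> URL into the site's canonical shape."""
    parts = urlsplit(legacy_url)
    if parts.path != "/search":
        raise ValueError(f"not a legacy search URL: {legacy_url}")
    # parse_qs decodes %XX and "+" back into the plain query text.
    query = parse_qs(parts.query).get("q", [""])[0].strip()
    return CANONICAL[site](query)
```

For example, `canonical_from_legacy("booking", "/search?q=Paris")` returns `"/searchresults.html?ss=Paris"`.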

Some sites already matched their upstream search shape closely and are
documented rather than changed here:

- Google Search: `/search?q=<query>` plus vertical parameters such as `tbm=...`.
- GitHub: `/search?q=<query>&type=...`.
- BBC: `/search?q=<query>`.
- arXiv: `/search/?query=<query>&searchtype=...`.
- WolframAlpha: `/input?i=<query>` for computation and `/search?q=...` for
topic search.
- Google Flights: primary flight searches already use `/flights?...`; the
generic `/search?q=...` page is a local airport/city/airline helper.

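An HTML form submitted with `method="get"` serializes its fields into the
query string and cannot place a field value in the URL path, so the path-based
canonical URLs (Google Maps, ESPN, and Apple) use a small submit handler that
rewrites the destination before navigation. If JavaScript is unavailable, the
form submits its fields as an ordinary query string to the prefix in its
`action` attribute (for example `/maps/search/` on Google Maps); the routes
accept that fallback where possible, and the old alias remains available.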
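
To make the rewrite concrete, here is a Python sketch of what a path-search submit handler computes, modeled on the one in `sites/apple/static/js/main.js`. It is an illustration, not repository code: `path_search_url` and its arguments are hypothetical, and the form is modeled as a list of `(name, value)` pairs. The query-string tail uses `urlencode`, which differs from the browser's `URLSearchParams` serialization for a few characters such as `*` and `~`.

```python
from typing import List, Optional, Tuple
from urllib.parse import quote, urlencode

# encodeURIComponent leaves A-Z a-z 0-9 - _ . ! ~ * ' ( ) unescaped; quote()
# always keeps letters, digits and "_.-~", so adding "!*'()" matches it for ASCII.
ENCODE_URI_COMPONENT_SAFE = "!*'()"


def path_search_url(
    prefix: str, fields: List[Tuple[str, str]], name: str = "q"
) -> Optional[str]:
    """Return the URL a path-search submit handler would navigate to.

    The trimmed value of the first `name` field moves into the path after
    `prefix`; every other field stays in the query string. An empty query
    returns None, meaning the handler does not intercept the submit.
    """
    query = next((value for key, value in fields if key == name), "").strip()
    if not query:
        return None
    # Mirror params.delete('q'): drop every field with that name from the tail.
    rest = urlencode([(key, value) for key, value in fields if key != name])
    path = prefix + quote(query, safe=ENCODE_URI_COMPONENT_SAFE)
    return path + ("?" + rest if rest else "")
```

For example, `path_search_url("/search/", [("q", "iphone 15")])` returns `"/search/iphone%2015"`, and a blank query returns `None`, so the browser falls back to the plain form submission described above.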

## Regression Check

Run:

```bash
python3 scripts/check_search_url_realism.py
```

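The check is a static text search over the site sources, not an HTTP test. It
verifies that the canonical route decorators and query-parameter reads are
present, that templates and JavaScript emit the canonical form `action`,
parameter names, and path-search submit handlers, and that specific legacy
`q=` form markup no longer appears in the Amazon, Booking, and Coursera
templates.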
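
Because `check_search_url_realism.py` matches substrings, it cannot tell whether a legacy rule and its canonical rule decorate the same view function. One way to tighten that, sketched here under the assumption that each site declares routes with literal-string `@app.route(...)` decorators, is to parse the module with Python's `ast` and compare decorator paths per function. `routes_by_handler` and `aliases_share_handler` are hypothetical helpers, not part of the existing script.

```python
import ast
from typing import Dict, Set


def routes_by_handler(source: str) -> Dict[str, Set[str]]:
    """Map each function name to the literal paths in its `@<app>.route(...)` decorators."""
    routes: Dict[str, Set[str]] = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for dec in node.decorator_list:
            # Only decorators shaped like <name>.route("<string>", ...) are counted.
            if (
                isinstance(dec, ast.Call)
                and isinstance(dec.func, ast.Attribute)
                and dec.func.attr == "route"
                and dec.args
                and isinstance(dec.args[0], ast.Constant)
                and isinstance(dec.args[0].value, str)
            ):
                routes.setdefault(node.name, set()).add(dec.args[0].value)
    return routes


def aliases_share_handler(source: str, legacy: str, canonical: str) -> bool:
    """True if a single function carries both the legacy and the canonical rule."""
    return any(
        legacy in paths and canonical in paths
        for paths in routes_by_handler(source).values()
    )
```

For Amazon, `aliases_share_handler(Path("sites/amazon/app.py").read_text(), "/search", "/s")` should be true while both decorators sit on `search()`; the module is only parsed, never imported, so Flask does not need to be installed.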
scripts/check_search_url_realism.py (152 additions, 0 deletions)
@@ -0,0 +1,152 @@
#!/usr/bin/env python3
"""Regression checks for realistic search URL shapes."""

from pathlib import Path


ROOT = Path(__file__).resolve().parents[1]  # repository root (parent of scripts/)

REQUIRED = [  # (label, path relative to ROOT, substrings that must all appear)
    (
        "Amazon supports canonical /s?k= search",
        "sites/amazon/app.py",
        ["@app.route('/s')", "request.args.get('k')"],
    ),
    (
        "Amazon search UI emits k= on /s",
        "sites/amazon/templates/base.html",
        ['action="/s"', 'name="k"'],
    ),
    (
        "Booking supports canonical /searchresults.html?ss=",
        "sites/booking/app.py",
        ["@app.route('/searchresults.html')", "request.args.get('ss')"],
    ),
    (
        "Booking search UI emits ss=",
        "sites/booking/templates/index.html",
        ['action="/searchresults.html"', 'name="ss"'],
    ),
    (
        "Google Maps supports canonical /maps/search/<query>",
        "sites/google_map/app.py",
        ['@app.route("/maps/search/")', '@app.route("/maps/search/<path:maps_query>")'],
    ),
    (
        "Google Maps search UI emits path search",
        "sites/google_map/templates/_search_bar.html",
        ['action="/maps/search/"', 'data-path-search="maps"'],
    ),
    (
        "Google Maps path-search submit handler exists",
        "sites/google_map/static/js/main.js",
        ['form[data-path-search="maps"]', "'/maps/search/' + encodeURIComponent(query)"],
    ),
    (
        "ESPN supports canonical /search/_/q/<query>",
        "sites/espn/app.py",
        ["@app.route('/search/_/q/<path:espn_query>')"],
    ),
    (
        "ESPN search UI emits path search",
        "sites/espn/templates/search.html",
        ['action="/search/_/q/"', 'data-path-search="espn"'],
    ),
    (
        "ESPN path-search submit handler exists",
        "sites/espn/static/js/main.js",
        ['form[data-path-search="espn"]', "'/search/_/q/' + encodeURIComponent(query)"],
    ),
    (
        "Apple supports canonical /search/<query>",
        "sites/apple/app.py",
        ["@app.route('/search/<path:apple_query>')"],
    ),
    (
        "Apple search UI emits path search",
        "sites/apple/templates/search.html",
        ['action="/search/"', 'data-path-search="apple"'],
    ),
    (
        "Apple path-search submit handler exists",
        "sites/apple/static/js/main.js",
        ['form[data-path-search="apple"]', "'/search/' + encodeURIComponent(query)"],
    ),
    (
        "Coursera supports canonical query= search",
        "sites/coursera/app.py",
        ["request.args.get('query')"],
    ),
    (
        "Coursera search UI emits query=",
        "sites/coursera/templates/base.html",
        ['action="/search"', 'name="query"'],
    ),
    (
        "Hugging Face supports canonical full-text search",
        "sites/huggingface/app.py",
        ['@app.route("/search/full-text")'],
    ),
    (
        "Hugging Face search UI emits full-text path",
        "sites/huggingface/templates/base.html",
        ['action="/search/full-text"', 'name="q"'],
    ),
    (
        "Cambridge Dictionary supports direct search path",
        "sites/cambridge_dictionary/app.py",
        ["@app.route('/search/direct/')", "@app.route('/search/english/direct/')"],
    ),
    (
        "Cambridge Dictionary search UI emits direct path",
        "sites/cambridge_dictionary/templates/base.html",
        [
            'action="{{ \'/search/english-thesaurus/direct/\' if _is_thes else \'/search/direct/\' }}"',
            'name="datasetsearch"',
            "'english-thesaurus' if _is_thes else 'english'",
        ],
    ),
]

FORBIDDEN = [  # (label, path relative to ROOT, substrings that must not appear)
    (
        "Amazon nav should not emit legacy q search",
        "sites/amazon/templates/base.html",
        ['action="{{ url_for(\'search\') }}"', 'name="q" placeholder="Search Amazon"'],
    ),
    (
        "Booking homepage should not emit legacy q search",
        "sites/booking/templates/index.html",
        ['action="{{ url_for(\'search\') }}"', 'name="q" placeholder="Where are you going?"'],
    ),
    (
        "Coursera nav should not emit legacy q search",
        "sites/coursera/templates/base.html",
        ['name="q" placeholder="What do you want to learn?"'],
    ),
]


def main():
    failed = False
    for label, rel, needles in REQUIRED:
        text = (ROOT / rel).read_text(encoding="utf-8")
        for needle in needles:
            if needle not in text:
                print(f"{label}: missing {needle!r} in {rel}")
                failed = True

    for label, rel, needles in FORBIDDEN:
        text = (ROOT / rel).read_text(encoding="utf-8")
        for needle in needles:
            if needle in text:
                print(f"{label}: forbidden {needle!r} in {rel}")
                failed = True

    if failed:
        raise SystemExit(1)  # non-zero exit status so CI marks the check as failed
    print("Search URL realism checks passed")


if __name__ == "__main__":
    main()
sites/amazon/app.py (2 additions, 1 deletion)
@@ -646,8 +646,9 @@ def _match(t):


 @app.route('/search')
+@app.route('/s')
 def search():
-    q = request.args.get('q', '').strip()
+    q = (request.args.get('q') or request.args.get('k') or '').strip()
     query_obj = Product.query
     query_obj = _apply_filters(query_obj) # apply structural filters first
     candidates = query_obj.all()
sites/amazon/templates/base.html (2 additions, 2 deletions)
@@ -24,7 +24,7 @@
 </div>
 </a>

-<form class="nav-search nav-search-form" action="{{ url_for('search') }}" method="get">
+<form class="nav-search nav-search-form" action="/s" method="get">
 <select class="nav-search-dropdown" name="dept" aria-label="Department">
 <option>All</option>
 <option>Electronics</option>
@@ -33,7 +33,7 @@
 <option>Fashion</option>
 <option>Beauty</option>
 </select>
-<input type="text" class="nav-search-input" name="q" placeholder="Search Amazon" value="{{ request.args.get('q', '') }}" autocomplete="off">
+<input type="text" class="nav-search-input" name="k" placeholder="Search Amazon" value="{{ request.args.get('k') or request.args.get('q', '') }}" autocomplete="off">
 <button type="submit" class="nav-search-btn" aria-label="Search">🔍</button>
 </form>
sites/amazon/templates/search.html (4 additions, 4 deletions)
@@ -5,8 +5,8 @@
 {% set qs = request.args %}
 <div class="results-layout">
 <aside class="results-sidebar">
-<form method="GET" action="{{ url_for('search') }}" id="filter-form">
-<input type="hidden" name="q" value="{{ query }}">
+<form method="GET" action="/s" id="filter-form">
+<input type="hidden" name="k" value="{{ query }}">
 {# Preserve the currently-selected sort across filter-form submits #}
 <input type="hidden" name="sort" value="{{ current_sort or '' }}">

@@ -32,7 +32,7 @@ <h3 style="margin-top:20px">Customer Reviews</h3>
 </label>
 </li>
 {% endfor %}
-<li><a href="{{ url_for('search', q=query) }}">Clear rating filter</a></li>
+<li><a href="{{ url_for('search', k=query) }}">Clear rating filter</a></li>
 </ul>

 <h3 style="margin-top:20px">Price</h3>
@@ -122,7 +122,7 @@ <h1>{{ results|length }} results</h1>
 </a>
 <div>
 <h2 class="result-card-title"><a href="{{ url_for('product_detail', slug=p.slug) }}">{{ p.name }}</a></h2>
-<div class="result-card-brand">by <a href="{{ url_for('search', q=p.brand) }}">{{ p.brand }}</a></div>
+<div class="result-card-brand">by <a href="{{ url_for('search', k=p.brand) }}">{{ p.brand }}</a></div>
 <div class="result-card-rating rating">
 <span class="stars">{% for i in range(p.rating|int) %}★{% endfor %}{% for i in range(5 - p.rating|int) %}☆{% endfor %}</span>
 <span>{{ p.rating }} ({{ '{:,}'.format(p.review_count) }})</span>
sites/apple/app.py (3 additions, 2 deletions)
@@ -1259,8 +1259,9 @@ def _apply_sort(results, key):


 @app.route('/search')
-def search():
-    q = request.args.get('q', '').strip()
+@app.route('/search/<path:apple_query>')
+def search(apple_query=''):
+    q = (apple_query or request.args.get('q', '')).strip()
     query_obj = Product.query
     query_obj = _apply_product_filters(query_obj)
     candidates = query_obj.all()
sites/apple/static/js/main.js (16 additions, 0 deletions)
@@ -5,9 +5,25 @@
 document.addEventListener('DOMContentLoaded', () => {
 initNavbar();
 initFlashMessages();
+initPathSearchForms();
 initScrollAnimations();
 });

+function initPathSearchForms() {
+document.querySelectorAll('form[data-path-search="apple"]').forEach(form => {
+form.addEventListener('submit', event => {
+const input = form.querySelector('input[name="q"]');
+const query = input ? input.value.trim() : '';
+if (!query) return;
+event.preventDefault();
+const params = new URLSearchParams(new FormData(form));
+params.delete('q');
+const suffix = params.toString();
+window.location.href = '/search/' + encodeURIComponent(query) + (suffix ? '?' + suffix : '');
+});
+});
+}
+
 /* --- Navbar scroll effect --- */
 function initNavbar() {
 const nav = document.getElementById('globalnav');
sites/apple/templates/search.html (1 addition, 1 deletion)
@@ -4,7 +4,7 @@
 {% block content %}
 <div class="search-page">
 <div class="search-input-wrapper">
-<form action="{{ url_for('search') }}" method="GET">
+<form action="/search/" method="GET" data-path-search="apple">
 <input type="text" name="q" class="search-input" value="{{ query }}"
 placeholder="Search apple.com" autofocus>
 </form>
sites/booking/app.py (2 additions, 1 deletion)
@@ -940,8 +940,9 @@ def _is_beach_relevant_city(city):


 @app.route('/search')
+@app.route('/searchresults.html')
 def search():
-    q = (request.args.get('q') or '').strip()
+    q = (request.args.get('q') or request.args.get('ss') or '').strip()
     dest = (request.args.get('dest') or request.args.get('destination') or '').strip()
     near = (request.args.get('near') or '').strip()
     city_id = request.args.get('city_id', type=int)
sites/booking/templates/airport_taxis.html (2 additions, 2 deletions)
@@ -6,9 +6,9 @@
 <h1>Airport taxis — easy door-to-door transfers</h1>
 <p>Pre-book your ride to your destination for a stress-free start.</p>
 </div>
-<form class="search-bar" action="{{ url_for('search') }}" method="get">
+<form class="search-bar" action="/searchresults.html" method="get">
 <div class="search-field">
-<input type="text" name="q" placeholder="Airport">
+<input type="text" name="ss" placeholder="Airport">
 </div>
 <div class="search-field">
 <input type="text" placeholder="Hotel or address">
sites/booking/templates/attractions.html (2 additions, 2 deletions)
@@ -6,10 +6,10 @@
 <h1>Attractions, tours & experiences</h1>
 <p>Discover incredible things to do around the world — skip the lines.</p>
 </div>
-<form class="search-bar" action="{{ url_for('search') }}" method="get">
+<form class="search-bar" action="/searchresults.html" method="get">
 <div class="search-field">
 <svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><path d="M21 10c0 7-9 13-9 13s-9-6-9-13a9 9 0 0118 0z"/></svg>
-<input type="text" name="q" placeholder="Where do you want to go?">
+<input type="text" name="ss" placeholder="Where do you want to go?">
 </div>
 <div class="search-field">
 <input type="text" maxlength="10" inputmode="numeric" pattern="\d{4}-\d{2}-\d{2}" placeholder="YYYY-MM-DD" autocomplete="off" data-datepicker="single">
sites/booking/templates/car_rentals.html (2 additions, 2 deletions)
@@ -6,10 +6,10 @@
 <h1>Car rentals for any kind of trip</h1>
 <p>Great deals at over 60,000 locations worldwide.</p>
 </div>
-<form class="search-bar" action="{{ url_for('search') }}" method="get">
+<form class="search-bar" action="/searchresults.html" method="get">
 <div class="search-field">
 <svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><circle cx="12" cy="12" r="10"/></svg>
-<input type="text" name="q" placeholder="Pick-up location">
+<input type="text" name="ss" placeholder="Pick-up location">
 </div>
 <div class="search-field">
 <input type="text" maxlength="10" inputmode="numeric" pattern="\d{4}-\d{2}-\d{2}" placeholder="Pick-up (YYYY-MM-DD)" autocomplete="off" data-datepicker="checkin">
sites/booking/templates/flights.html (2 additions, 2 deletions)
@@ -6,10 +6,10 @@
 <h1>Book cheap flights</h1>
 <p>Find low fares to top destinations on our official site.</p>
 </div>
-<form class="search-bar" action="{{ url_for('search') }}" method="get">
+<form class="search-bar" action="/searchresults.html" method="get">
 <div class="search-field">
 <svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><path d="M2 12l20-8-8 20-2-9-10-3z"/></svg>
-<input type="text" name="q" placeholder="From">
+<input type="text" name="ss" placeholder="From">
 </div>
 <div class="search-field">
 <svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><path d="M2 12l20-8-8 20-2-9-10-3z" transform="rotate(90 12 12)"/></svg>
sites/booking/templates/index.html (2 additions, 2 deletions)
@@ -6,10 +6,10 @@
 <h1>Find your next stay</h1>
 <p>Search low prices on hotels, homes and much more...</p>
 </div>
-<form class="search-bar" action="{{ url_for('search') }}" method="get">
+<form class="search-bar" action="/searchresults.html" method="get">
 <div class="search-field">
 <svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><path d="M21 10c0 7-9 13-9 13s-9-6-9-13a9 9 0 0118 0z"/><circle cx="12" cy="10" r="3"/></svg>
-<input type="text" name="q" placeholder="Where are you going?" autocomplete="off" required minlength="2">
+<input type="text" name="ss" placeholder="Where are you going?" autocomplete="off" required minlength="2">
 </div>
 <div class="search-field">
 <svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><rect x="3" y="4" width="18" height="18" rx="2" ry="2"/><line x1="16" y1="2" x2="16" y2="6"/><line x1="8" y1="2" x2="8" y2="6"/><line x1="3" y1="10" x2="21" y2="10"/></svg>