Add CarMax mirror (port 40015) by Violet24K · Pull Request #24 · aiming-lab/WebHarbor

Violet24K · 2026-05-15T17:12:56Z

Adds a Flask mirror of carmax.com as the 16th
WebHarbor site, with full inventory search, vehicle research, comparison,
sell-my-car appraisal, financing pre-qualification, reserve, test drive,
and checkout flows.

Companion HuggingFace PR: https://huggingface.co/datasets/ChilleD/WebHarbor/discussions/15

What's in this PR

Site code (`sites/carmax/`)

File	Lines	Purpose
`app.py`	1,997	Flask app: 13 SQLAlchemy models, 10 WTForms, 59 routes
`seed_data.py`	904	Idempotent seed (12 stores, 141 vehicles, 5 users, 20 reviews, 10 articles)
`templates/*.html`	1,519 (44 files)	base + macros + 42 page templates
`static/css/main.css`	221	CarMax navy (`#1660a8`) + yellow (`#FFD900`) brand styling
`scrape_carmax.py`	129	Reproducible httpx fetch of evox stock photos
`scrape_articles.py`	107	Reproducible fetch of article hero images
`tasks.jsonl`	20	WebVoyager benchmark tasks

Registration (3 files modified)

websyn_start.sh: added carmax to SITES and replaced the three
hardcoded site counts (15) with ${#SITES[@]}, so future additions no
longer require three separate edits.
control_server.py: added 'carmax' to the SITES list.
Dockerfile: EXPOSE 8101 40000-40015 (was 40000-40014).
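
All three registration edits encode the same fact (how many sites exist), which is why `${#SITES[@]}` helps. The same idea can be pushed one step further by deriving every port from the one ordered list. This is a minimal sketch, assuming that site *i* in `SITES` listens on `40000 + i` (consistent with carmax being the 16th site on port 40015, but not stated in the PR); `site_ports`, `expose_line`, and `EXTRA_PORT` are hypothetical names, not existing code.

```python
# Hypothetical sketch: derive every port from a single ordered SITES list,
# assuming site i listens on BASE_PORT + i (carmax is 16th -> 40015).
BASE_PORT = 40000
EXTRA_PORT = 8101  # the non-site port that also appears in the EXPOSE line


def site_ports(sites):
    """Map each site name to its port, in list order."""
    return {name: BASE_PORT + index for index, name in enumerate(sites)}


def expose_line(sites):
    """Build the Dockerfile EXPOSE directive covering every site port."""
    last_port = BASE_PORT + len(sites) - 1
    return f"EXPOSE {EXTRA_PORT} {BASE_PORT}-{last_port}"
```

With this shape, adding a 17th site means appending one name; the EXPOSE range and port lookups follow automatically.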

Quality-of-life additions

.gitattributes: forces LF line endings on *.sh and Dockerfile so that a
Windows checkout doesn't break the container entrypoint. I hit exactly
this during initial Docker testing (exec /opt/websyn_start.sh: no such file or directory).
scripts/verify_carmax.sh: single-command end-to-end verifier for the new
site (build → run → reset → md5sum).

Mirror functional coverage

59 routes across these areas:

Inventory — /cars, /cars/<make>, /cars/<make>/<model>, /cars/<make>/<model>/<year>, /cars/<make>/<model>/<trim>, /cars/<make>/<model>/<trim>/<year>, with filter params for body style, drive type, fuel type, mileage cap, price range, color, store, etc.
Vehicle detail — full specs, features, customer reviews, similar vehicles, financing estimate
Research — model overview + year-by-year pages with RepairPal ratings, trims, FAQs
Comparison — anonymous/authed compare tool (up to 4 vehicles)
Saved cars — heart / unheart per-user
Sell my car — appraisal form → instant offer page with 7-day validity
Pre-qualification — soft-credit form → personalized monthly payment range
Financing — landing page + CarMax Auto Finance / external lender / cash options at checkout
Stores — 12 real CarMax locations across CA/TX/FL/GA/NY/IL/MD/MA/WA/AZ/CO/NC
Reserve / Test drive — auth-gated booking flows
Checkout — full order flow with MaxCare warranty and trade-in appraisal application
Account — orders, reservations, test drives, appraisals, saved cars, edit profile, change password
Articles + FAQ — 10 articles, 4 FAQ categories
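
The PR does not say how the vehicle-detail financing estimate or the pre-qualification payment range is computed. A standard fixed-rate amortization is one plausible model, sketched below; `monthly_payment`, `payment_range`, and their parameters are hypothetical, and treating the down payment as a simple subtraction from price is an assumption.

```python
def monthly_payment(principal, apr_percent, months):
    """Fixed-rate amortized payment: P * r * (1+r)^n / ((1+r)^n - 1)."""
    if months <= 0:
        raise ValueError("months must be positive")
    r = apr_percent / 100 / 12  # monthly interest rate
    if r == 0:
        return principal / months  # zero-APR edge case: straight division
    growth = (1 + r) ** months
    return principal * r * growth / (growth - 1)


def payment_range(price, down_payment, months, apr_low, apr_high):
    """Monthly payment range for a pre-qual APR band (hypothetical helper)."""
    financed = max(price - down_payment, 0.0)
    return (
        round(monthly_payment(financed, apr_low, months), 2),
        round(monthly_payment(financed, apr_high, months), 2),
    )
```

For example, $10,000 at 12% APR over 12 months comes to about $888.49/month. In a checkout flow, an applied trade-in offer could be folded into `down_payment`, though whether the mirror does it that way is not stated.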

Search uses token-overlap scoring with per-field weights
(make/model = 5, trim/body/color = 3, features/specs = 1). It is explicitly
NOT strict-AND, so a query like "honda civic sport" still returns results
when one token misses on a given vehicle.
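
A minimal sketch of that scoring, using the weights stated above. The field names (`body_style`, `features`, and so on) and the choice to credit each query token once, at the highest-weight field it matches, are assumptions; the actual `app.py` may sum across fields or tokenize differently.

```python
import re

# Weights from the PR description; field names are assumptions.
FIELD_WEIGHTS = {
    "make": 5, "model": 5,
    "trim": 3, "body_style": 3, "color": 3,
    "features": 1, "specs": 1,
}


def tokenize(value):
    """Lowercase alphanumeric tokens; lists (e.g. features) are joined first."""
    if isinstance(value, (list, tuple)):
        value = " ".join(map(str, value))
    return set(re.findall(r"[a-z0-9]+", str(value).lower()))


def score(query_tokens, vehicle):
    """Sum, over query tokens, the weight of the best field each token hits."""
    total = 0
    for token in query_tokens:
        total += max(
            (w for field, w in FIELD_WEIGHTS.items()
             if token in tokenize(vehicle.get(field, ""))),
            default=0,  # a missed token costs nothing: this is not strict-AND
        )
    return total


def search(query, vehicles):
    """Vehicles with a positive score, best first (ties keep input order)."""
    tokens = tokenize(query)
    scored = [(score(tokens, v), v) for v in vehicles]
    ranked = sorted((pair for pair in scored if pair[0] > 0),
                    key=lambda pair: pair[0], reverse=True)
    return [v for _, v in ranked]
```

With "honda civic sport", a Civic Sport scores 13, a Civic LX scores 10, and an Accord Sport scores 8, so all three are returned in that order instead of only the exact match.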

Benchmark tasks

sites/carmax/tasks.jsonl ships 20 tasks following the WebVoyager
schema (web_name, id, ques, web, upstream_url):

6 Easy (2-3 steps): inventory search by year/make/model, trim-specific search, sorted filters, vehicle detail spec reading, store locator, FAQ
9 Medium (4-6 steps): research-page navigation, sell-my-car form, register + pre-qual, reserve, test drive, cheapest-vehicle + store cross-check, article read, value-page lookup, MaxCare tier comparison
5 Hard (7+ steps, multi-step reasoning): 3-way vehicle comparison, register + pre-qualify + report APR, saved-cars disambiguation, trade-in appraisal applied at checkout with custom finance terms, Dan's order history audit
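
Because every task must follow the WebVoyager schema, a quick structural check of `tasks.jsonl` can catch missing keys or duplicate IDs before review. This sketch checks only the five keys named above plus ID uniqueness; `validate_tasks` is a hypothetical helper, and any example values used with it are illustrative, not taken from the real file.

```python
import json

# WebVoyager-style keys listed in the PR.
REQUIRED_KEYS = ("web_name", "id", "ques", "web", "upstream_url")


def validate_tasks(lines):
    """Return human-readable problems found in tasks.jsonl lines (empty = OK)."""
    problems, seen_ids = [], set()
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            task = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(task, dict):
            problems.append(f"line {lineno}: expected a JSON object")
            continue
        missing = [key for key in REQUIRED_KEYS if key not in task]
        if missing:
            problems.append(f"line {lineno}: missing {', '.join(missing)}")
        task_id = task.get("id")
        if task_id is not None:
            if task_id in seen_ids:
                problems.append(f"line {lineno}: duplicate id {task_id!r}")
            seen_ids.add(task_id)
    return problems
```

An open file handle can be passed directly, e.g. `validate_tasks(open("sites/carmax/tasks.jsonl", encoding="utf-8"))`. It does not verify answers; that still needs the hand-tracing described below.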

I hand-traced each task against the seed DB: every task has a verifiable
answer, and for every task that asks for spec-level info, the answer is
not visible at the search-result level.

Verification

md5sum sites/carmax/instance/carmax.db sites/carmax/instance_seed/carmax.db
c6e3b281258bd8a460f7030a54b74c21 instance/carmax.db
c6e3b281258bd8a460f7030a54b74c21 instance_seed/carmax.db

Idempotency

Both seed_database() (line 675) and seed_benchmark_users() (line 722)
gate the entire function on a populated-DB check rather than checking
row by row. Every seeded created_at / saved_at / added_at value uses a
frozen SEED_NOW = datetime(2026, 1, 15, 12, 0, 0) (18 references), and
seed_data.py makes zero calls to datetime.utcnow().
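
The pattern can be illustrated without SQLAlchemy. This sketch uses the stdlib `sqlite3` module and a hypothetical `store` table (the real seed works through the SQLAlchemy models), but it shows the two properties described above: a single function-level gate, and timestamps taken from a frozen constant rather than the wall clock.

```python
import sqlite3
from datetime import datetime

SEED_NOW = datetime(2026, 1, 15, 12, 0, 0)  # frozen reference time, as in the PR


def seed_stores(conn, stores):
    """Insert stores only if the table is empty; return True if seeding ran."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS store (name TEXT, state TEXT, created_at TEXT)"
    )
    (count,) = conn.execute("SELECT COUNT(*) FROM store").fetchone()
    if count > 0:
        return False  # function-level gate: skip everything, not row by row
    conn.executemany(
        "INSERT INTO store (name, state, created_at) VALUES (?, ?, ?)",
        # Every row gets the same frozen timestamp, so re-seeding is byte-stable.
        [(name, state, SEED_NOW.isoformat()) for name, state in stores],
    )
    conn.commit()
    return True
```

Calling it twice leaves exactly one copy of the data, and every `created_at` is `2026-01-15T12:00:00` regardless of when the seed runs.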

Asset side (HuggingFace dataset)

carmax.tar.gz (~280 MB) was uploaded to ChilleD/WebHarbor via
https://huggingface.co/datasets/ChilleD/WebHarbor/discussions/15, and this
PR bumps .assets-revision to that PR's merge SHA.

Contents of the tarball (extracts in place into sites/carmax/):

instance_seed/carmax.db — the frozen seed DB
static/images/vehicles/ — 738 real CarMax stock photos covering
115/138 unique (year, make, model) tuples (~86% coverage)
static/images/articles/ — 10 article hero images

The 18 missing (year, make, model) tuples (Ford F-150 all years, BMW 3
Series all years, Mercedes-Benz C-Class all years, 2023 Toyota Corolla
/ Kia Sorento / Subaru Outback, 2021-22 Hyundai Elantra) have no evox
stock photos on the carmax CDN, so those vehicles fall back to a
CarMax-branded SVG placeholder. This matches the live site's behavior
for those exact combinations.

Test users (benchmark)

Five users with password CarMax!2026, each pre-populated for
auth-gated tasks:

Email	First name	Pre-qual?	Saved	Reservation	Test drive	Appraisal	Order
`alice.j@test.com`	Alice	✓	2 (Civic + CR-V)	1	1 (at-home)	1 active	—
`bob.k@test.com`	Bob	✓	2	—	1 (in-store)	1 active	—
`carol.l@test.com`	Carol	✓	1	—	—	1 active	—
`dan.m@test.com`	Dan	—	1	—	—	—	1 (CMX-2026-000001, ready_for_pickup, with MaxCare gold)
`emma.n@test.com`	Emma	✓	—	—	—	—	—

(The skill suggests bob.c / carol.d / david.k with password TestPass123!,
but since tasks.jsonl references the emails above throughout, I kept this
slightly different set. It is functionally equivalent.)

Pre-PR checks

python3 -m py_compile sites/carmax/app.py — clean
python3 -m py_compile sites/carmax/seed_data.py — clean
bash scripts/build.sh webharbor:dev — succeeds (image ~6.2 GB)
Container boots, all 16 sites alive
All 16 sites return HTTP 200
/reset/carmax byte-identical (md5 above)
Each task in tasks.jsonl has a verifiable answer in the seed
Phase-3 walkthrough (info-leak / superficial-completion / distractor checks): 3 issues found, 3 fixed (Task 13 disambiguation, dan's order total, Turbo feature cross-field consistency)
Phase-4 hardening (13 leak archetypes + 4 dimensions): no real leaks; one minor task rephrasing applied

Anything that might want reviewer attention

Benchmark user emails deviate from the skill's recommended
bob.c@test.com / carol.d@test.com set; they were kept for tasks.jsonl
internal consistency.
Vehicles in 18 (year, make, model) combinations show a placeholder image
(image coverage is not 100%) because the carmax CDN has no evox photos for
those combinations. This could be remediated by sourcing from a different
CDN if the maintainer requires 100% coverage.
SEED_NOW = datetime(2026, 1, 15, 12, 0, 0) matches the project's existing
2026 date-pinning convention; please flag it if a different reference date
is preferred.

Happy to address any review feedback.


…com.
- 13 SQLAlchemy models (User / Store / Vehicle / SavedVehicle / Comparison + ComparisonItem / Reservation / TestDrive / Appraisal / FinancePreQual / Order / Review / Article)
- 59 routes covering search / browse / detail / research / compare / saved / sell-my-car / pre-qual / reserve / test-drive / checkout / account / articles / FAQ / MaxCare / stores / auth
- Token-overlap scored search with multi-field weighting
- 141 deterministically-seeded vehicles across 31 templates
- 12 real CarMax store locations
- 5 benchmark users with pre-populated saved/reservation/test-drive/appraisal/order data
- 20 WebVoyager tasks in tasks.jsonl (6 Easy / 9 Medium / 5 Hard, including 2 disambiguation tasks)
- Idempotent seed at function level; byte-identical reset verified

Violet24K added 6 commits May 14, 2026 22:28

adding carmax phase 1 (427ad6a)
phase 1 almost done (d3c4380)
Create tasks.jsonl (177bdf1)
phase1 docker check passed, phase 2 & 3 finished; update LR windows/linux issue (1b577f6)
phase 3 done. scraped images (a680728)

sarendis56 mentioned this pull request May 16, 2026: Add Compass real-estate mirror (port 40015) #25 (Open)

Violet24K wants to merge 6 commits into aiming-lab:main from Violet24K:main
