Add Compass real-estate mirror (port 40015) #25

Open: sarendis56 wants to merge 1 commit into aiming-lab:main from sarendis56:add-compass-mirror

TL;DR

Adds a Flask mirror of compass.com as the 16th
WebHarbor site, with browse / search / filter, listing detail, agent
directory, account flows (save, tour, inquiry, saved search, collection),
and 18 WebVoyager-format benchmark tasks.

Companion HuggingFace PR: https://huggingface.co/datasets/ChilleD/WebHarbor/discussions/3

Note: PRs #11, #12, #24 also claim port 40015. Whichever lands first
keeps it; happy to rebase onto the next free port if this isn't merged
first.

What's in this PR

Site code (sites/compass/)

| File | Lines | Purpose |
| --- | --- | --- |
| app.py | 1,011 | Flask app: 10 SQLAlchemy models, 35+ routes, token-overlap scored search |
| seed_data.py | 659 | Idempotent seed (524 listings, 20 agents, 10 cities, 4 benchmark users) |
| templates/*.html | 33 files | base + 32 page templates, hand-rolled Compass look |
| static/css/compass.css | 327 | White / black / serif palette matching the real site |
| listings_clean.json | 524 records | Normalized scrape output consumed by seed_data.py at build time |
| tasks.jsonl | 18 | WebVoyager benchmark tasks (3 hard multi-step) |
| _health.py | | End-to-end health probe |
| requirements.txt | | Pinned to image's Flask / SQLAlchemy versions |

Registration (3 files modified, must stay in sync per AGENTS.md)

  • websyn_start.sh: compass appended to SITES=( … ), and the two 15s
    in the ready-count log lines bumped to 16.
  • control_server.py: 'compass' appended to SITES.
  • Dockerfile: EXPOSE 8101 40000-40015.

Verification

All checks in AGENTS.md § Pre-PR checks pass.

  • python3 -m py_compile sites/compass/{app.py,seed_data.py} — clean.
  • ./scripts/build.sh webharbor:dev — image builds.
  • docker run on alt ports 8201 / 41000-41015:
    • /health reports all 16 sites alive with PIDs.
    • All 16 root paths return 200.
  • Byte-identical reset (the strict invariant):
    curl -X POST :8201/reset/compass
    md5sum instance/compass.db instance_seed/compass.db
    # 2a7458e3b6c3e3d0b39c32cca5d0f519  both files
    
  • All 18 tasks in tasks.jsonl walk end-to-end against the running mirror.
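
The byte-identical reset check above can also be scripted. The following is a minimal sketch, not part of this PR: the helper names `md5_of` and `reset_is_byte_identical` are hypothetical, and it assumes the reset endpoint has already been called so that both database files exist on disk.

```python
import hashlib


def md5_of(path):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def reset_is_byte_identical(live_db, seed_db):
    """True when the live database matches the seed copy byte for byte."""
    return md5_of(live_db) == md5_of(seed_db)
```

Calling `reset_is_byte_identical("instance/compass.db", "instance_seed/compass.db")` after a reset mirrors the `md5sum` comparison shown in the list.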

Design notes

  • Determinism. Seed passwords use PBKDF2 with a fixed per-email salt
    (`sha1("salt-" + email)[:8]`), not bcrypt, because bcrypt's random salt
    breaks byte-identical reset. `User.check_password` accepts both the
    PBKDF2 and bcrypt hash prefixes, so future writes from the running app
    (which uses Flask-Bcrypt) still authenticate.
  • Search scoring. Token-overlap with city / state / neighborhood boosts
    rather than strict `LIKE %q% AND %q%` — matches the booking-site pattern
    in `sites/booking/app.py`.
  • No task-info leaks. Homepage panels (Newest, Luxury) sort by
    `Listing.id` rather than `price.desc()` so the answers to Tasks 11 / 17
    don't surface for free in the hero grid. Co-op pool was backfilled to
    >= 5 candidates per filter combo used in tasks.
  • Real assets. All listing photos are the actual
    `compass.com/m/0//600x400.webp` images, resolved via Playwright
    and downloaded with httpx. No placeholders, no AI stock photos.
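
The determinism note above can be illustrated with the standard library alone. This is a hedged sketch rather than the code in `app.py` or `seed_data.py`: the function names, the iteration count, the `pbkdf2:sha256:<iterations>$<salt>$<hex>` storage format, and the reading of `sha1(...)[:8]` as the first 8 hex characters are all assumptions. The bcrypt branch takes a caller-supplied verifier with the argument order `(stored_hash, password)`.

```python
import hashlib
import hmac

# Common bcrypt hash prefixes; which ones the app actually sees is an assumption.
BCRYPT_PREFIXES = ("$2a$", "$2b$", "$2y$")


def fixed_salt(email):
    """Deterministic per-email salt: first 8 hex chars of sha1("salt-" + email)."""
    return hashlib.sha1(("salt-" + email).encode("utf-8")).hexdigest()[:8]


def seed_hash(password, email, iterations=100_000):
    """PBKDF2 hash that is identical on every run for the same inputs."""
    salt = fixed_salt(email)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt.encode("utf-8"), iterations
    )
    return f"pbkdf2:sha256:{iterations}${salt}${digest.hex()}"


def check_password(stored, password, bcrypt_check=None):
    """Dispatch on the stored hash prefix: PBKDF2 (seed) or bcrypt (runtime)."""
    if stored.startswith("pbkdf2:"):
        method, salt, expected = stored.split("$", 2)
        _, algo, iterations = method.split(":")
        digest = hashlib.pbkdf2_hmac(
            algo, password.encode("utf-8"), salt.encode("utf-8"), int(iterations)
        )
        return hmac.compare_digest(digest.hex(), expected)
    if stored.startswith(BCRYPT_PREFIXES) and bcrypt_check is not None:
        # bcrypt_check is a hypothetical hook, e.g. a Flask-Bcrypt verifier.
        return bcrypt_check(stored, password)
    return False
```

Because the salt depends only on the email, re-running the seed writes the same bytes into the database, which is what the byte-identical reset invariant needs. The trade-off is that a fixed salt is weaker than a random one, which is acceptable here only because the accounts are benchmark fixtures.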

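The search-scoring note above describes token overlap with location boosts. A rough sketch of that idea follows; the field names, the boost weights, and the tie-break on `id` are hypothetical choices for illustration and are not taken from `app.py` or `sites/booking/app.py`.

```python
import re

# Hypothetical boost weights for location fields.
BOOSTS = {"city": 2.0, "state": 1.0, "neighborhood": 1.5}


def tokenize(text):
    """Lowercase a string and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(query, listing):
    """Count query tokens found in the listing text, plus weighted location hits."""
    query_tokens = tokenize(query)
    if not query_tokens:
        return 0.0
    body = tokenize(listing.get("title", "") + " " + listing.get("description", ""))
    total = float(len(query_tokens & body))
    for field, boost in BOOSTS.items():
        total += boost * len(query_tokens & tokenize(listing.get(field, "")))
    return total


def search(query, listings):
    """Return listings with a positive score, best first, ties broken by id."""
    scored = [(score(query, item), item) for item in listings]
    hits = [pair for pair in scored if pair[0] > 0]
    hits.sort(key=lambda pair: (-pair[0], pair[1]["id"]))
    return [item for _, item in hits]
```

Unlike a strict `LIKE %q% AND %q%` filter, a listing that matches only some query tokens still appears in the results, just with a lower rank.
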
Assets

Heavy assets (`instance_seed/compass.db`, `static/images/`, ~129 MB
packed) ship via the companion HuggingFace PR linked above.
`.assets-revision` already pins `main`, so once the HF PR merges this
code PR Just Works.

Adds a Flask mirror of compass.com as the 16th WebHarbor site, with
browse / search / filter, listing detail, agent directory, account
flows (save, tour, inquiry, saved search, collection), and 18
WebVoyager-format benchmark tasks.

sites/compass/:
- app.py (1011 lines): 10 SQLAlchemy models, 35+ routes,
  token-overlap scored search with city/state/neighborhood boosts.
  User.check_password accepts both pbkdf2 and bcrypt prefixes so
  seed-time PBKDF2 hashes (deterministic) coexist with runtime
  Flask-Bcrypt writes.
- seed_data.py (659 lines): idempotent function-level gates;
  PBKDF2 with fixed per-email salt to preserve byte-identical reset;
  Co-op pool backfilled to keep filter-based tasks at >=5 candidates.
- 33 Jinja templates + 327-line hand-rolled CSS (white/black/serif
  to match the real Compass palette).
- tasks.jsonl: 18 WebVoyager tasks (3 hard multi-step).
- listings_clean.json: 524 normalized listings consumed by seed_data
  at build time (committed alongside the mirror, per the convention
  used by booking/, arxiv/, etc.).

Registration (3 files, must stay in sync per AGENTS.md):
- websyn_start.sh: compass appended to SITES, two ready-count 15s -> 16.
- control_server.py: 'compass' appended to SITES.
- Dockerfile: EXPOSE 8101 40000-40015.

Heavy assets (instance_seed/compass.db, static/images/, ~129 MB
packed) ship via the companion HuggingFace PR
ChilleD/WebHarbor#3. .assets-revision already pins main, so once
that merges this Just Works.

Byte-identical reset verified:
  md5sum instance/compass.db instance_seed/compass.db
  -> 2a7458e3b6c3e3d0b39c32cca5d0f519 (both files).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>