Scrape place reservation provider links #45

Merged
michaelmwu merged 7 commits into main from michaelmwu/find-table-reservation-links on Jun 11, 2026

@michaelmwu michaelmwu commented Jun 11, 2026

Summary

  • add PlaceReservationLink and expose reservation_links on PlaceDetails
  • collect visible reservation links from the place overview panel
  • click the Google Maps Find a Table/reservation dialog and collect provider links for multi-provider cases
  • normalize provider labels and dedupe links
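A minimal sketch of the model change in the first bullet, assuming plain dataclasses. Only `PlaceReservationLink` (with `label` and `url`), `PlaceDetails`, `reservation_links`, and `to_dict()` are named in this PR; the `name` field, the example URL, and the exact serialized shape are illustrative assumptions.

```python
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class PlaceReservationLink:
    label: str
    url: str


@dataclass
class PlaceDetails:
    # "name" is a stand-in for the real model's other fields (assumption).
    name: str
    # A default_factory avoids sharing one mutable list between instances.
    reservation_links: list[PlaceReservationLink] = field(default_factory=list)

    def to_dict(self) -> dict:
        return {
            "name": self.name,
            "reservation_links": [asdict(link) for link in self.reservation_links],
        }


details = PlaceDetails(
    name="Example Place",
    reservation_links=[PlaceReservationLink("TableCheck", "https://provider.example/r/1")],
)
serialized = details.to_dict()
```

Defaulting to an empty list means existing callers that never pass `reservation_links` keep working and still serialize a predictable key.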

Validation

  • uv run python3 -m unittest tests.test_place_scraper.PlaceScraperTests.test_collect_reservation_dialog_snapshot_clicks_and_reads_provider_links tests.test_place_scraper.PlaceScraperTests.test_merge_reservation_links_dedupes_overview_and_dialog_links tests.test_place_scraper.PlaceScraperTests.test_build_place_details_preserves_reservation_links tests.test_place_scraper.PlaceScraperTests.test_normalize_reservation_links_cleans_google_dialog_labels
  • scripts/lint.sh
  • scripts/typecheck.sh
  • live scrape: Mark's Tokyo -> website http://markstokyo.com/, reservation TableCheck
  • live scrape: Sushi Tokyo Ten -> website http://sushitokyo-ten.com/, reservations Ikyu, AutoReserve, TableCheck

Summary by CodeRabbit

  • New Features

    • Place pages now surface reservation/booking links with cleaned provider labels, validated HTTP(S) URLs, and deduplication (removes invalid or duplicate links).
  • Improvements

    • Improved link normalization: unwraps redirect URLs, filters non-HTTP schemes, and prefers direct provider links over generic platform reserve pages.
  • Tests

    • Added tests for dialog extraction, merging/normalization, deduplication, and serialized reservation links.
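The normalization steps listed under Improvements could look roughly like the sketch below. It assumes the redirect wrapper is a Google `/url` link carrying the target in a `q` or `url` query parameter; the function names `unwrap_redirect` and `normalize_url` are hypothetical and are not taken from `place_scraper.py`.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def unwrap_redirect(url: str) -> str:
    """Return the redirect target if url looks like a Google /url wrapper."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    is_google = host == "google.com" or host.endswith(".google.com")
    if is_google and parts.path == "/url":
        query = parse_qs(parts.query)  # parse_qs also percent-decodes values
        target = query.get("q") or query.get("url")
        if target:
            return target[0]
    return url


def normalize_url(url: str) -> Optional[str]:
    """Unwrap redirects, then keep only absolute HTTP(S) URLs."""
    candidate = unwrap_redirect(url.strip())
    parts = urlsplit(candidate)
    if parts.scheme.lower() not in {"http", "https"} or not parts.hostname:
        return None
    return candidate
```

Returning `None` for rejected links lets the caller drop them with a simple filter before deduplication.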

Copilot AI review requested due to automatic review settings June 11, 2026 02:30
@coderabbitai

coderabbitai Bot commented Jun 11, 2026

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2e8c8e76-eb69-4767-84fc-314d293b3aff

📥 Commits

Reviewing files that changed from the base of the PR and between eff4af3 and 2fe4beb.

📒 Files selected for processing (2)
  • src/gmaps_scraper/place_scraper.py
  • tests/test_place_scraper.py
Walkthrough

Adds end-to-end extraction, merging, normalization, and wiring of reservation provider links into PlaceDetails, plus unit tests.

Changes

Reservation Links Collection and Normalization

| Layer / File(s) | Summary |
| --- | --- |
| Reservation link data model (src/gmaps_scraper/models.py, src/gmaps_scraper/place_scraper.py) | Introduces PlaceReservationLink dataclass (label, url), adds reservation_links to PlaceDetails with default empty list, serializes in to_dict(), and imports the model in place_scraper. |
| Primary DOM extraction for reservation links (src/gmaps_scraper/place_scraper.py) | Extends the JavaScript place panel extractor with helpers to identify reservation anchors, derive provider labels, and collect up to a bounded set of reservation links included in the extraction payload. |
| Reservation dialog interaction JavaScript (src/gmaps_scraper/place_scraper.py) | Adds JS snippets that click the first visible reservation button, scan reservation dialogs/panels (with hostname filtering and page-body fallback), clean labels, and return up to eight provider links. |
| Dialog snapshot collection and merging (src/gmaps_scraper/place_scraper.py) | Adds _collect_reservation_dialog_snapshot() to run the dialog JS, integrates dialog snapshot collection into DOM snapshot flow, and merges dialog-sourced links into the main snapshot. |
| Link merging and normalization (src/gmaps_scraper/place_scraper.py) | Implements _merge_reservation_links() and _normalize_reservation_links() to validate/normalize URLs, deduplicate by normalized URL, filter Google "maps reserve" links when non-Google providers exist, and generate cleaned provider labels via heuristics and known-host mappings. |
| Test coverage for reservation functionality (tests/test_place_scraper.py) | Adds tests for dialog snapshot extraction (button present/absent), link merging/deduplication, PlaceDetails construction and to_dict() including reservation_links, and normalization/label-cleaning behavior. |
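The merge and filter rule from the "Link merging and normalization" row can be sketched as follows. This is an illustration under stated assumptions, not the PR's `_merge_reservation_links()`: the name `merge_links`, the dict shape, the trailing-slash dedupe key, and the `google.com`-only host check are all simplifications.

```python
from urllib.parse import urlsplit


def is_google_host(host: str) -> bool:
    # Simplified check for illustration; the PR uses a broader pattern.
    return host == "google.com" or host.endswith(".google.com")


def merge_links(overview: list[dict], dialog: list[dict]) -> list[dict]:
    """Merge overview and dialog links, dedupe, and prefer provider links."""
    seen: set[str] = set()
    merged: list[dict] = []
    for link in [*overview, *dialog]:
        url = link["url"].strip()
        key = url.rstrip("/")  # crude normalization: ignore a trailing slash
        if not url or key in seen:
            continue
        seen.add(key)
        merged.append({"label": link["label"], "url": url})
    providers = [
        link for link in merged
        if not is_google_host((urlsplit(link["url"]).hostname or "").lower())
    ]
    # Keep Google reserve links only when no direct provider link exists.
    return providers or merged


overview = [
    {"label": "Reserve", "url": "https://www.google.com/maps/reserve/example"},
    {"label": "Provider A", "url": "https://provider-a.example/r/1"},
]
dialog = [
    {"label": "Provider A", "url": "https://provider-a.example/r/1/"},
    {"label": "Provider B", "url": "https://provider-b.example/r/2"},
]
result = merge_links(overview, dialog)
```

Overview links are iterated first, so when the same URL appears in both sources the overview entry wins.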

🎯 3 (Moderate) | ⏱️ ~25 minutes

"I hop through dialog trees and nets,
I tidy labels, prune the dupe;
With tiny paws I stitch up sets,
A rabbit's map of booking soup. 🐇"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 6.38% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Scrape place reservation provider links' clearly and directly describes the main feature added: collection of reservation provider links from Google Maps place pages. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Copilot AI left a comment

Pull request overview

This PR extends the place scraping pipeline to collect and surface third-party reservation/booking provider links from Google Maps place pages, including multi-provider “Find a table” dialog flows, and exposes them on PlaceDetails.

Changes:

  • Introduces PlaceReservationLink and adds reservation_links to PlaceDetails serialization.
  • Extracts reservation links from the place overview panel and, when applicable, clicks into the reservation dialog to capture provider links.
  • Normalizes provider labels and deduplicates reservation links across overview + dialog sources, with accompanying unit tests.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/gmaps_scraper/place_scraper.py | Adds JS extractors for reservation links + dialog interaction, merges snapshots, and normalizes/dedupes reservation link payloads into model objects. |
| src/gmaps_scraper/models.py | Adds PlaceReservationLink dataclass and exposes reservation_links on PlaceDetails + to_dict(). |
| tests/test_place_scraper.py | Adds unit tests for dialog snapshot collection, merge/dedupe behavior, label normalization, and PlaceDetails preservation/filtering. |

Comment thread src/gmaps_scraper/place_scraper.py Outdated
Comment thread src/gmaps_scraper/place_scraper.py Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e28a563fd7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/gmaps_scraper/place_scraper.py Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 64ac31ebb3

Comment thread src/gmaps_scraper/place_scraper.py Outdated
Copilot AI review requested due to automatic review settings June 11, 2026 08:04

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comment thread src/gmaps_scraper/place_scraper.py Outdated
Comment thread src/gmaps_scraper/place_scraper.py Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7370830c63

Comment thread src/gmaps_scraper/place_scraper.py Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2580f5025e

Comment thread src/gmaps_scraper/place_scraper.py Outdated

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
tests/test_place_scraper.py (1)

198-204: ⚡ Quick win

Strengthen this test to assert behavior, not JS string tokens.

At Line 198, this test is implementation-coupled (assertIn on JS source). It can still pass after meaningful behavior regressions. Prefer asserting the extractor outcome on a minimal provider-popup DOM fixture (including no role="dialog"), then validate extracted links and close behavior.

As per coding guidelines, “Tests should encode intent, not just exercise code paths; a test is weak if it would keep passing after the relevant scraper behavior or business rule is broken.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_place_scraper.py` around lines 198 - 204, Replace the current
implementation-coupled assertions in
test_reservation_dialog_extractor_trusts_provider_popup_without_dialog_role with
a behavioral test: build a minimal provider-popup DOM fixture (no role="dialog")
that includes a sample reservation link and a close button, run the reservation
dialog extractor using the same logic that injects/executes
_PLACE_RESERVATION_DIALOG_JS, then assert the extractor returns the expected
link(s) and that invoking the extracted close action triggers the provider-close
behavior (e.g., calls closeProviderRoot or removes the popup from the DOM); keep
references to _PLACE_RESERVATION_DIALOG_JS and the test name so reviewers can
find the replacement.

Source: Coding guidelines

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d532c53a-e4bb-46c1-941e-806d876ba4f3

📥 Commits

Reviewing files that changed from the base of the PR and between 64ac31e and 2580f50.

📒 Files selected for processing (2)
  • src/gmaps_scraper/place_scraper.py
  • tests/test_place_scraper.py

Copilot AI review requested due to automatic review settings June 11, 2026 13:59

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comment thread src/gmaps_scraper/place_scraper.py
Comment thread src/gmaps_scraper/place_scraper.py Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: eff4af322a

Comment thread src/gmaps_scraper/place_scraper.py Outdated

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
tests/test_place_scraper.py (1)

203-206: ⚡ Quick win

Use a less brittle assertion for sort-order intent.

At Line 203, the test hard-codes the full return expression string. This can fail on harmless formatting/refactors while behavior is unchanged. Prefer assertRegex/token-based checks that verify descending area sort intent.

As per coding guidelines, “Tests should encode intent, not just exercise code paths; a test is weak if it would keep passing after the relevant scraper behavior or business rule is broken.”
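One way to express the suggested token-based check is sketched below. `SAMPLE_JS` is a stand-in for `_PLACE_RESERVATION_DIALOG_JS` (its real contents are not shown here), and the test class and pattern names are hypothetical. The pattern tolerates whitespace and optional parentheses but still requires the right-hand area to come before the minus sign.

```python
import re
import unittest

# Stand-in for _PLACE_RESERVATION_DIALOG_JS, copied from the reviewed lines.
SAMPLE_JS = """
}).sort((left, right) => {
  const leftRect = left.getBoundingClientRect();
  const rightRect = right.getBoundingClientRect();
  return (rightRect.width * rightRect.height) - (leftRect.width * leftRect.height);
});
"""

# "right area minus left area" encodes a descending sort by area.
DESCENDING_AREA_SORT = re.compile(
    r"return\s*\(?\s*rightRect\.width\s*\*\s*rightRect\.height\s*\)?"
    r"\s*-\s*\(?\s*leftRect\.width\s*\*\s*leftRect\.height"
)


class SortIntentTest(unittest.TestCase):
    def test_sorts_candidates_by_descending_area(self) -> None:
        self.assertRegex(SAMPLE_JS, DESCENDING_AREA_SORT)
```

A regex like this still inspects source text rather than behavior, so it only reduces brittleness; a DOM-fixture test would be needed to verify the sort outcome itself.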

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_place_scraper.py` around lines 203 - 206, The test currently
asserts the exact return expression string in _PLACE_RESERVATION_DIALOG_JS which
is brittle; change the assertion to verify the descending-area sort intent
instead (e.g., use self.assertRegex or token checks against
_PLACE_RESERVATION_DIALOG_JS) by asserting a regex that matches 'return'
followed by an expression containing 'rightRect.width' and 'rightRect.height'
separated by a multiplication, a minus sign, and 'leftRect.width' and
'leftRect.height' (ensuring right-side area minus left-side area), or
alternatively assert presence of the tokens 'rightRect.width',
'rightRect.height', '-', 'leftRect.width', 'leftRect.height' in that relative
order to capture intent without relying on exact formatting.

Source: Coding guidelines

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e9c503f3-2503-4bb8-8803-ac659e974ccc

📥 Commits

Reviewing files that changed from the base of the PR and between 845a07a and eff4af3.

📒 Files selected for processing (2)
  • src/gmaps_scraper/place_scraper.py
  • tests/test_place_scraper.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/gmaps_scraper/place_scraper.py

Copilot AI review requested due to automatic review settings June 11, 2026 15:35

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2fe4beb118

Comment on lines +1347 to +1350

    }).sort((left, right) => {
      const leftRect = left.getBoundingClientRect();
      const rightRect = right.getBoundingClientRect();
      return (rightRect.width * rightRect.height) - (leftRect.width * leftRect.height);

P2 Badge Choose the narrowest provider popup root

When the reservation popup lacks role='dialog' (the fallback this code is meant to handle), every visible ancestor div containing the popup text also matches this filter; sorting by descending area then selects the outer Maps container rather than the popup itself. Because that sets hasTrustedProviderRoot and the later link loop no longer requires provider-host or reservation evidence, ordinary external links in the same container (for example the place website) can be emitted as reservation links. Prefer the smallest matching provider panel or keep provider/evidence checks for links collected from this fallback root.
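The selection problem can be illustrated outside the browser. The sketch below models matching elements as simple rectangles in Python; the `Candidate` type, the element names, and the sizes are invented for illustration, and the real logic lives in the JavaScript extractor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    width: float
    height: float

    @property
    def area(self) -> float:
        return self.width * self.height


def pick_largest(candidates: list[Candidate]) -> Candidate:
    # Mirrors the reviewed sort: descending area, then take the first element.
    return sorted(candidates, key=lambda c: c.area, reverse=True)[0]


def pick_narrowest(candidates: list[Candidate]) -> Candidate:
    # The review's suggestion: prefer the smallest matching container.
    return min(candidates, key=lambda c: c.area)


# Every ancestor that contains the popup text also matches the filter.
matches = [
    Candidate("maps-container", 1200, 900),
    Candidate("side-panel", 400, 900),
    Candidate("provider-popup", 360, 240),
]
largest = pick_largest(matches)
narrowest = pick_narrowest(matches)
```

With nested matches, the largest candidate is the outermost ancestor, which is why unrelated links in the same container can leak into the result.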

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

Comment on lines +1326 to +1331

    const rejectHostPattern = new RegExp(
      String.raw`(^|\.)google(?:\.[a-z]{2,}){1,2}$`
      + String.raw`|(^|\.)gstatic\.com$`
      + String.raw`|(^|\.)googleusercontent\.com$`,
      "i",
    );

Comment on lines +4167 to +4168

    def _is_google_host(host: str) -> bool:
        return re.search(r"(^|\.)google(?:\.[a-z0-9-]+){1,2}$", host) is not None

Comment on lines +4892 to +4898

    tokens = label.split()
    unique_tokens = list(dict.fromkeys(token.casefold() for token in tokens))
    if len(unique_tokens) == 1 and len(tokens) > 1:
        label = tokens[0]
    if "." in label or re.fullmatch(r"(?:https?://)?[A-Za-z0-9.-]+/?", label):
        return fallback
    return label
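To see what the quoted lines do, they can be wrapped in a small function and run on sample labels. The wrapper name `clean_label` and its signature are hypothetical, since the enclosing function is not shown here; the body is copied from the excerpt.

```python
import re


def clean_label(label: str, fallback: str) -> str:
    """Hypothetical wrapper around the excerpt above."""
    tokens = label.split()
    # Collapse labels that repeat one word, e.g. "Ikyu Ikyu" -> "Ikyu".
    unique_tokens = list(dict.fromkeys(token.casefold() for token in tokens))
    if len(unique_tokens) == 1 and len(tokens) > 1:
        label = tokens[0]
    # Labels that look like a host or bare URL are replaced by the fallback.
    if "." in label or re.fullmatch(r"(?:https?://)?[A-Za-z0-9.-]+/?", label):
        return fallback
    return label


two_words = clean_label("Open Table", "fallback")
host_like = clean_label("tablecheck.com", "TableCheck")
single_word = clean_label("Ikyu", "fallback")
repeated_word = clean_label("Ikyu Ikyu", "fallback")
```

As written, the `fullmatch` pattern also accepts any single alphanumeric word, so `"Ikyu"` and the collapsed `"Ikyu Ikyu"` both return the fallback. Whether that is intended depends on how the caller computes `fallback`, which the excerpt does not show.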
@michaelmwu michaelmwu merged commit b6b5612 into main Jun 11, 2026
5 checks passed