Scrape place reservation provider links #45
📝 Walkthrough
Adds end-to-end extraction, merging, normalization, and wiring of reservation provider links into PlaceDetails, plus unit tests.
Changes: Reservation Links Collection and Normalization
🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Pull request overview
This PR extends the place scraping pipeline to collect and surface third-party reservation/booking provider links from Google Maps place pages, including multi-provider “Find a table” dialog flows, and exposes them on PlaceDetails.
Changes:
- Introduces `PlaceReservationLink` and adds `reservation_links` to `PlaceDetails` serialization.
- Extracts reservation links from the place overview panel and, when applicable, clicks into the reservation dialog to capture provider links.
- Normalizes provider labels and deduplicates reservation links across the overview and dialog sources, with accompanying unit tests.
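The merge/dedupe step can be sketched as follows. This is a minimal illustration, not the PR's code: representing links as plain `{"provider", "url"}` dicts, the names `merge_reservation_links` and `_dedupe_key`, and the URL normalization rules (case-insensitive host, ignored trailing slash and fragment) are all assumptions.

```python
from urllib.parse import urlsplit, urlunsplit


def _dedupe_key(url: str) -> str:
    """Normalize a URL for duplicate detection (hypothetical rules)."""
    parts = urlsplit(url.strip())
    # Lowercase scheme and host, ignore a trailing slash and the fragment;
    # keep the query, since a provider may encode the venue in it.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path.rstrip("/"), parts.query, "")
    )


def merge_reservation_links(
    overview: list[dict[str, str]], dialog: list[dict[str, str]]
) -> list[dict[str, str]]:
    """Merge overview links first, then dialog links, keeping first-seen order."""
    merged: dict[str, dict[str, str]] = {}
    for link in [*overview, *dialog]:
        url = link.get("url", "")
        if not url:
            continue
        key = _dedupe_key(url)
        existing = merged.get(key)
        if existing is None:
            merged[key] = dict(link)  # copy so callers' dicts are not mutated
        elif not existing.get("provider") and link.get("provider"):
            # A later duplicate (e.g. from the dialog) may carry the only label.
            existing["provider"] = link["provider"]
    return list(merged.values())
```

Keying on a normalized URL while storing the first-seen original keeps output order stable (dicts preserve insertion order), so overview links always precede dialog-only links.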
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/gmaps_scraper/place_scraper.py | Adds JS extractors for reservation links + dialog interaction, merges snapshots, and normalizes/dedupes reservation link payloads into model objects. |
| src/gmaps_scraper/models.py | Adds PlaceReservationLink dataclass and exposes reservation_links on PlaceDetails + to_dict(). |
| tests/test_place_scraper.py | Adds unit tests for dialog snapshot collection, merge/dedupe behavior, label normalization, and PlaceDetails preservation/filtering. |
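A rough shape for the `models.py` change: a nested dataclass serialized through `to_dict()`. The field names `provider` and `url`, and the trimmed-down `PlaceDetails` with only `name`, are assumptions for illustration; the real model has more fields.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class PlaceReservationLink:
    # Assumed fields; the PR's model may carry more or differently named ones.
    provider: str
    url: str

    def to_dict(self) -> dict[str, str]:
        return {"provider": self.provider, "url": self.url}


@dataclass
class PlaceDetails:
    # Heavily trimmed stand-in: only the field relevant to this PR is shown.
    name: str
    # default_factory avoids sharing one mutable list across instances.
    reservation_links: list[PlaceReservationLink] = field(default_factory=list)

    def to_dict(self) -> dict[str, Any]:
        return {
            "name": self.name,
            # Serialize nested dataclasses explicitly so the output is JSON-ready.
            "reservation_links": [link.to_dict() for link in self.reservation_links],
        }
```

Places without reservation providers still serialize `reservation_links` as an empty list, which keeps the output schema stable for consumers.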
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e28a563fd7
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
💡 Codex Review
Reviewed commit: 64ac31ebb3
💡 Codex Review
Reviewed commit: 7370830c63
💡 Codex Review
Reviewed commit: 2580f5025e
🧹 Nitpick comments (1)
tests/test_place_scraper.py (1)
198-204: ⚡ Quick win
Strengthen this test to assert behavior, not JS string tokens.
At Line 198, this test is implementation-coupled (`assertIn` on the JS source). It can still pass after meaningful behavior regressions. Prefer asserting the extractor outcome on a minimal provider-popup DOM fixture (one without `role="dialog"`), then validate the extracted links and the close behavior.
As per coding guidelines, “Tests should encode intent, not just exercise code paths; a test is weak if it would keep passing after the relevant scraper behavior or business rule is broken.”
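The real extractor is JavaScript evaluated in the browser, so a faithful version of this test needs a browser page. As a browser-free illustration of asserting on outcomes rather than source tokens, here is a Python stand-in built on the standard-library `html.parser`. The `provider-popup` class, the fixture markup, and the names `PopupLinkParser` and `extract_popup_links` are all hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional


class PopupLinkParser(HTMLParser):
    """Collect <a href> values and a close button inside a popup root.

    Hypothetical stand-in: the popup root is found by CSS class rather than
    role="dialog". Assumes markup without unclosed void elements (e.g. a bare
    <img>), because the open-element stack is tracked naively.
    """

    def __init__(self, popup_class: str) -> None:
        super().__init__()
        self.popup_class = popup_class
        self._open: list[bool] = []  # one flag per open element: is it a popup root?
        self.links: list[str] = []
        self.has_close_button = False

    def handle_starttag(self, tag: str, attrs: list[tuple[str, Optional[str]]]) -> None:
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        self._open.append(self.popup_class in classes)
        if not any(self._open):
            return  # outside the popup: ignore, e.g. the place website link
        href = attr_map.get("href")
        if tag == "a" and href:
            self.links.append(href)
        elif tag == "button" and (attr_map.get("aria-label") or "").casefold() == "close":
            self.has_close_button = True

    def handle_endtag(self, tag: str) -> None:
        if self._open:
            self._open.pop()


def extract_popup_links(html: str, popup_class: str = "provider-popup") -> tuple[list[str], bool]:
    parser = PopupLinkParser(popup_class)
    parser.feed(html)
    parser.close()
    return parser.links, parser.has_close_button


# Fixture: a popup WITHOUT role="dialog", next to an unrelated website link.
FIXTURE = """
<div id="maps-pane">
  <a href="https://example-restaurant.test/">Website</a>
  <div class="provider-popup">
    <a href="https://www.tablecheck.com/shops/demo/reserve">TableCheck</a>
    <button aria-label="Close">x</button>
  </div>
</div>
"""
```

A test in this style would break if the extractor started emitting the website link or stopped finding the close button, which is exactly the regression a token check on the JS source can miss.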
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_place_scraper.py` around lines 198 - 204, replace the current implementation-coupled assertions in test_reservation_dialog_extractor_trusts_provider_popup_without_dialog_role with a behavioral test: build a minimal provider-popup DOM fixture (no role="dialog") that includes a sample reservation link and a close button, run the reservation dialog extractor using the same logic that injects/executes _PLACE_RESERVATION_DIALOG_JS, then assert the extractor returns the expected link(s) and that invoking the extracted close action triggers the provider-close behavior (e.g., calls closeProviderRoot or removes the popup from the DOM); keep references to _PLACE_RESERVATION_DIALOG_JS and the test name so reviewers can find the replacement.
Source: Coding guidelines
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d532c53a-e4bb-46c1-941e-806d876ba4f3
📒 Files selected for processing (2)
src/gmaps_scraper/place_scraper.py
tests/test_place_scraper.py
💡 Codex Review
Reviewed commit: eff4af322a
🧹 Nitpick comments (1)
tests/test_place_scraper.py (1)
203-206: ⚡ Quick win
Use a less brittle assertion for sort-order intent.
At Line 203, the test hard-codes the full return expression string. This can fail on harmless formatting changes or refactors while behavior is unchanged. Prefer `assertRegex` or token-based checks that verify the descending-area sort intent.
As per coding guidelines, “Tests should encode intent, not just exercise code paths; a test is weak if it would keep passing after the relevant scraper behavior or business rule is broken.”
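A formatting-tolerant pattern in the spirit of this suggestion might look like the sketch below. `DESCENDING_AREA_RE` and `sorts_by_descending_area` are hypothetical names; the pattern tolerates whitespace, line breaks, and optional parentheses but still requires "right area minus left area". In the unittest itself, `self.assertRegex(_PLACE_RESERVATION_DIALOG_JS, DESCENDING_AREA_RE)` would work, since `assertRegex` accepts a compiled pattern. If the ordering is later flipped to prefer the narrowest root (as the Codex comment on this same expression proposes), the two sides of the pattern would swap.

```python
import re

# Tolerates spacing, newlines, and optional parentheses, but still requires
# "return <right area> - <left area>", i.e. a descending sort by area.
DESCENDING_AREA_RE = re.compile(
    r"return\s*\(?\s*rightRect\.width\s*\*\s*rightRect\.height\s*\)?"
    r"\s*-\s*"
    r"\(?\s*leftRect\.width\s*\*\s*leftRect\.height\s*\)?"
)


def sorts_by_descending_area(js_source: str) -> bool:
    return DESCENDING_AREA_RE.search(js_source) is not None
```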
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_place_scraper.py` around lines 203 - 206, the test currently asserts the exact return expression string in _PLACE_RESERVATION_DIALOG_JS, which is brittle; change the assertion to verify the descending-area sort intent instead (e.g., use self.assertRegex or token checks against _PLACE_RESERVATION_DIALOG_JS) by asserting a regex that matches 'return' followed by an expression containing 'rightRect.width' and 'rightRect.height' separated by a multiplication, a minus sign, and 'leftRect.width' and 'leftRect.height' (ensuring right-side area minus left-side area), or alternatively assert presence of the tokens 'rightRect.width', 'rightRect.height', '-', 'leftRect.width', 'leftRect.height' in that relative order to capture intent without relying on exact formatting.
Source: Coding guidelines
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e9c503f3-2503-4bb8-8803-ac659e974ccc
📒 Files selected for processing (2)
src/gmaps_scraper/place_scraper.py
tests/test_place_scraper.py
🚧 Files skipped from review as they are similar to previous changes (1)
- src/gmaps_scraper/place_scraper.py
💡 Codex Review
Reviewed commit: 2fe4beb118
```js
}).sort((left, right) => {
  const leftRect = left.getBoundingClientRect();
  const rightRect = right.getBoundingClientRect();
  return (rightRect.width * rightRect.height) - (leftRect.width * leftRect.height);
```
Choose the narrowest provider popup root
When the reservation popup lacks role='dialog' (the fallback this code is meant to handle), every visible ancestor div containing the popup text also matches this filter; sorting by descending area then selects the outer Maps container rather than the popup itself. Because that sets hasTrustedProviderRoot and the later link loop no longer requires provider-host or reservation evidence, ordinary external links in the same container (for example the place website) can be emitted as reservation links. Prefer the smallest matching provider panel or keep provider/evidence checks for links collected from this fallback root.
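Both remedies in this comment can be illustrated with plain data in place of DOM elements. This is a hypothetical Python sketch, not the extractor's code: `RootCandidate`, `choose_provider_root`, `accept_link`, and the evidence terms in `RESERVATION_HINTS` are invented for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class RootCandidate:
    """Hypothetical stand-in for a visible element whose text matched the popup."""
    name: str
    width: float
    height: float

    @property
    def area(self) -> float:
        return self.width * self.height


def choose_provider_root(candidates: Iterable[RootCandidate]) -> Optional[RootCandidate]:
    # Smallest area wins: the innermost matching container, not an ancestor
    # such as the outer Maps pane that merely contains the popup.
    visible = [c for c in candidates if c.area > 0]
    return min(visible, key=lambda c: c.area, default=None)


# Hypothetical evidence terms, still checked for links under a "trusted" root.
RESERVATION_HINTS = ("reserv", "booking", "book a table")


def accept_link(host: str, text: str, provider_hosts: frozenset[str]) -> bool:
    host = host.casefold()
    if any(host == h or host.endswith("." + h) for h in provider_hosts):
        return True
    lowered = text.casefold()
    return any(hint in lowered for hint in RESERVATION_HINTS)
```

Either safeguard alone would keep the place website out of `reservation_links` in the scenario described; keeping both costs little.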
```js
const rejectHostPattern = new RegExp(
  String.raw`(^|\.)google(?:\.[a-z]{2,}){1,2}$`
  + String.raw`|(^|\.)gstatic\.com$`
  + String.raw`|(^|\.)googleusercontent\.com$`,
  "i",
);
```
```python
import re


def _is_google_host(host: str) -> bool:
    # Matches google.<tld> hosts and their subdomains (case-sensitive).
    return re.search(r"(^|\.)google(?:\.[a-z0-9-]+){1,2}$", host) is not None
```
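The JS `rejectHostPattern` and the Python `_is_google_host` differ as excerpted: the JS pattern is case-insensitive (`"i"`), requires alphabetic labels of two or more letters, and also covers `gstatic.com` and `googleusercontent.com`, while the Python regex is case-sensitive and accepts any `[a-z0-9-]+` label, so `_is_google_host("WWW.GOOGLE.COM")` returns `False`. Hostnames taken from `urllib.parse.urlsplit(...).hostname` are already lowercased, so the case gap may not matter in practice. If the two sides are meant to agree, one option is a single Python pattern mirroring the JS one; `GOOGLE_OWNED_HOST_RE` and `is_google_owned_host` are hypothetical names.

```python
import re

# Hypothetical Python mirror of the JS rejectHostPattern; re.IGNORECASE
# plays the role of the JS "i" flag.
GOOGLE_OWNED_HOST_RE = re.compile(
    r"(^|\.)google(?:\.[a-z]{2,}){1,2}$"
    r"|(^|\.)gstatic\.com$"
    r"|(^|\.)googleusercontent\.com$",
    re.IGNORECASE,
)


def is_google_owned_host(host: str) -> bool:
    # Strip whitespace and a trailing root dot ("google.com.") before matching.
    return GOOGLE_OWNED_HOST_RE.search(host.strip().rstrip(".")) is not None
```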
```python
tokens = label.split()
unique_tokens = list(dict.fromkeys(token.casefold() for token in tokens))
if len(unique_tokens) == 1 and len(tokens) > 1:
    label = tokens[0]
if "." in label or re.fullmatch(r"(?:https?://)?[A-Za-z0-9.-]+/?", label):
    return fallback
return label
```
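As excerpted, the second check's `re.fullmatch(r"(?:https?://)?[A-Za-z0-9.-]+/?", label)` also matches any single bare word such as `Ikyu` or `TableCheck`, so a collapsed label like `TableCheck TableCheck` becomes `TableCheck` and then still returns `fallback`. That may be intended if `fallback` is already the preferred provider name; the excerpt does not show. If the goal is only to reject URL-like labels, a sketch could treat just dotted or scheme-prefixed strings as URLs. `normalize_provider_label` and `_SCHEME_URL_RE` are hypothetical names.

```python
import re

# Only scheme-prefixed strings count as URL-like here; dotted hosts are
# already caught by the "." check, so bare words such as "Ikyu" survive.
_SCHEME_URL_RE = re.compile(r"https?://\S*")


def normalize_provider_label(label: str, fallback: str) -> str:
    label = label.strip()
    tokens = label.split()
    if not tokens:
        return fallback
    # Collapse labels that repeat one word, e.g. "TableCheck TableCheck".
    unique_tokens = list(dict.fromkeys(token.casefold() for token in tokens))
    if len(unique_tokens) == 1 and len(tokens) > 1:
        label = tokens[0]
    if "." in label or _SCHEME_URL_RE.fullmatch(label):
        return fallback
    return label
```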
Summary
`PlaceReservationLink` and expose `reservation_links` on `PlaceDetails`
Validation
- `uv run python3 -m unittest tests.test_place_scraper.PlaceScraperTests.test_collect_reservation_dialog_snapshot_clicks_and_reads_provider_links tests.test_place_scraper.PlaceScraperTests.test_merge_reservation_links_dedupes_overview_and_dialog_links tests.test_place_scraper.PlaceScraperTests.test_build_place_details_preserves_reservation_links tests.test_place_scraper.PlaceScraperTests.test_normalize_reservation_links_cleans_google_dialog_labels`
- `scripts/lint.sh`
- `scripts/typecheck.sh`
- http://markstokyo.com/, reservation: TableCheck
- http://sushitokyo-ten.com/, reservations: Ikyu, AutoReserve, TableCheck

Summary by CodeRabbit
New Features
Improvements
Tests