feat: add Claude guidelines for scraper creation and reviews #1808
Luis-manzur wants to merge 1 commit into main
Conversation
grossir left a comment:
- Has inexact info (e.g. "using WebDriven")
- Is repetitive (way too much text on usage of titlecase, which is very straightforward)
- Has redundant recommendations (e.g. creating a session, which is already created by AbstractSite)
- Is missing important heuristics (testing backscrapers, actually counting the number of results and comparing against what was parsed, watching for silently skipped rows and bad XPaths)

For the heuristics, check the PRs reviewed and find the usual pain points.

I would recommend discussing ideas in the issue before implementation. Otherwise the PR itself becomes the discussion, which works too, but only after trying to get a more polished starting point.
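The "count the results and compare with what was parsed" heuristic above could look something like the sketch below. This is a hypothetical illustration, not juriscraper code: `parse_rows` and its inputs are invented names, and a real scraper would be walking lxml rows rather than dicts. The point is simply to fail loudly instead of skipping bad rows silently.

```python
# Hypothetical sketch of the review heuristic: compare the number of results
# the page reports against the number of cases actually parsed, and raise on
# unparseable rows instead of silently `continue`-ing past them.

def parse_rows(rows, expected_count):
    """Parse result rows, failing loudly if any row is dropped."""
    cases = []
    for row in rows:
        name = row.get("name")
        date = row.get("date")
        if not name or not date:
            # A silent `continue` here would hide data loss from reviewers
            raise ValueError(f"Unparseable row: {row!r}")
        cases.append({"name": name, "date": date})
    if len(cases) != expected_count:
        raise ValueError(
            f"Expected {expected_count} results, parsed {len(cases)}"
        )
    return cases
```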
> **Why OpinionSiteLinear:**
> - Modern, maintainable architecture
> - Handles any data source (JSON APIs, HTML, XML)
> - Built-in pagination support
> - Built-in pagination support

?
> ## Choosing the Right Base Class
>
> **⚠️ CRITICAL: All new scrapers MUST use `OpinionSiteLinear` or `OpinionSiteLinearWebDriven`.**
We don't use OpinionSiteLinearWebDriven because CL doesn't support it?
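For reference, the `OpinionSiteLinear` pattern the quoted guideline mandates can be sketched with a self-contained mock. Everything below is a hypothetical stand-in, not juriscraper's real API: `FakeOpinionSiteLinear`, `ExampleScraper`, and the canned case data are invented to show the shape, where the subclass only fills `self.cases` with plain-string dicts and the base class handles the rest.

```python
# Simplified, self-contained mock of the OpinionSiteLinear pattern.
# The real base class lives in juriscraper; these names only illustrate it.

class FakeOpinionSiteLinear:
    """Hypothetical stand-in for juriscraper's OpinionSiteLinear."""

    def __init__(self):
        self.cases = []  # subclasses append one dict of strings per opinion

    def _process_html(self):
        raise NotImplementedError


class ExampleScraper(FakeOpinionSiteLinear):
    def _process_html(self):
        # A real scraper would walk self.html with XPath; this is canned data.
        self.cases.append({
            "name": "Smith v. Jones",
            "date": "01/15/2024",
            "url": "https://example.com/opinion.pdf",
        })


site = ExampleScraper()
site._process_html()
```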
> - [ ] Complete docstring with CourtID, Court Short Name, Author, Reviewer, History
> - [ ] Clear comments for complex logic or non-obvious code
> - [ ] Type hints for methods in new scrapers
Also missing: any PR should edit CHANGES.md, citing the original issue number, too.
> from juriscraper.lib.string_utils import convert_date_string
>
> # Auto-detect format
> case_dict["date"] = convert_date_string("01/15/2024")
This is automatically done by OpinionSiteLinear.
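To illustrate what "auto-detect format" means here, below is a minimal stdlib-only sketch of format-guessing date parsing. It is not juriscraper's `convert_date_string` (which, as the comment notes, the scraper doesn't need to call itself anyway); `parse_date_guess` and its format list are assumptions for illustration only.

```python
# Illustrative-only mini version of "auto-detect" date parsing: try a few
# common US court-site formats in order and return the first that fits.
from datetime import date, datetime


def parse_date_guess(text: str) -> date:
    """Hypothetical helper: guess the format of a date string."""
    for fmt in ("%m/%d/%Y", "%Y-%m-%d", "%B %d, %Y"):
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue  # wrong format, try the next one
    raise ValueError(f"Unrecognized date format: {text!r}")
```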
> ## Advanced Topics
>
> ### PDF Content Extraction
> Some scrapers need to extract metadata from downloaded PDFs:
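In juriscraper this kind of post-download metadata extraction is typically done by giving the scraper an `extract_from_text` method that receives the text already extracted from the document. The body below is a hypothetical sketch (the regex and field names are invented, and real implementations vary by court), meant only to show the general shape: plain text in, metadata dict out.

```python
# Hypothetical sketch of extracting metadata from already-extracted PDF text.
# The regex and the "docket" key are illustrative assumptions, not a real
# court's format.
import re


def extract_from_text(scraped_text: str) -> dict:
    """Pull a docket number out of a document's text, if present."""
    metadata = {}
    match = re.search(r"No\.\s*([\d-]+)", scraped_text)
    if match:
        metadata["docket"] = match.group(1)
    return metadata
```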
> self.court_id = self.__module__
>
> # Session persists across requests
> self.request["session"] = requests.Session()
This is already present? See juriscraper/juriscraper/AbstractSite.py, line 64 (commit 8640947).
It may be worth considering some of the findings in this paper.
Related issue: add new Claude development guidelines #1793