
feat: add Claude guidelines for scraper creation and reviews #1808

Closed
Luis-manzur wants to merge 1 commit into main from 1793-create-a-claudemd-file-for-the-opinions-module
Conversation

@Luis-manzur

add new Claude development guidelines #1793

@Luis-manzur Luis-manzur requested a review from grossir February 17, 2026 18:45
@Luis-manzur Luis-manzur linked an issue Feb 17, 2026 that may be closed by this pull request
@Luis-manzur Luis-manzur moved this to PRs to Review in Sprint (Case Law) Feb 17, 2026

@grossir grossir left a comment

  • Has inexact info (e.g., recommending `OpinionSiteLinearWebDriven`)
  • Is repetitive (far too much text on titlecase usage, which is very straightforward)
  • Has redundant recommendations (e.g., creating a session, which `AbstractSite` already creates)
  • Is missing important heuristics (testing backscrapers, actually counting the number of results and comparing it with what was parsed, rows being skipped silently, bad XPaths)

For the heuristics, check the PRs reviewed and find the usual pain points

I would recommend discussing ideas in the issue before implementation. Otherwise the PR itself becomes the discussion, which works too, but only after trying to get to a more polished starting point.
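
To make the result-counting heuristic concrete, here is a minimal sketch of what such a check could look like. It is not juriscraper code: `parse_rows`, `verify_counts`, and `REQUIRED_FIELDS` are hypothetical names, and it assumes the source exposes a total (for example a count in a JSON response) that is independent of the parsing loop.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical field names; a real scraper defines its own required keys.
REQUIRED_FIELDS = ("name", "url", "date", "docket")


def parse_rows(rows):
    """Split raw rows into parsed cases and explicitly recorded skips."""
    cases, skipped = [], []
    for index, row in enumerate(rows):
        missing = [field for field in REQUIRED_FIELDS if not row.get(field)]
        if missing:
            # Log the skip instead of dropping the row silently.
            logger.warning("Skipping row %d: missing %s", index, missing)
            skipped.append((index, missing))
            continue
        cases.append({field: row[field] for field in REQUIRED_FIELDS})
    return cases, skipped


def verify_counts(expected_total, cases, skipped):
    """Raise if the handled rows do not add up to what the source reported."""
    handled = len(cases) + len(skipped)
    if handled != expected_total:
        raise ValueError(
            f"Source reported {expected_total} results, "
            f"but {len(cases)} were parsed and {len(skipped)} skipped"
        )
```

A mismatch here usually points at a bad XPath or a row format the parser did not anticipate, which is why the check raises instead of only logging.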

**Why OpinionSiteLinear:**
- Modern, maintainable architecture
- Handles any data source (JSON APIs, HTML, XML)
- Built-in pagination support

@grossir grossir Feb 17, 2026

  • Built-in pagination support

?


## Choosing the Right Base Class

**⚠️ CRITICAL: All new scrapers MUST use `OpinionSiteLinear` or `OpinionSiteLinearWebDriven`.**

We don't use `OpinionSiteLinearWebDriven`, since CL (CourtListener) doesn't support it?

No mention of ClusterSite

- [ ] Complete docstring with CourtID, Court Short Name, Author, Reviewer, History
- [ ] Clear comments for complex logic or non-obvious code
- [ ] Type hints for methods in new scrapers

Missing: any PR should also edit CHANGES.md, citing the original issue number.

from juriscraper.lib.string_utils import convert_date_string

# Auto-detect format
case_dict["date"] = convert_date_string("01/15/2024")

this is automatically done by OpinionSiteLinear

return [convert_date_string(case["date"]) for case in self.cases]
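
As a rough illustration of why the explicit conversion in the guideline is redundant, the sketch below mimics the pattern in the line quoted above: the base class converts dates when they are read, so a scraper only needs to store the raw string. `LinearBaseSketch`, `ExampleScraper`, and `convert_date_string_stub` are hypothetical stand-ins, not juriscraper classes, and the stub only handles `MM/DD/YYYY`.

```python
from datetime import date, datetime


def convert_date_string_stub(text):
    """Hypothetical stand-in for convert_date_string; MM/DD/YYYY only."""
    return datetime.strptime(text, "%m/%d/%Y").date()


class LinearBaseSketch:
    """Simplified stand-in for a linear base class; not the real one."""

    def __init__(self):
        self.cases = []

    def _get_case_dates(self):
        # Conversion happens once, here, for every case dict.
        return [convert_date_string_stub(case["date"]) for case in self.cases]


class ExampleScraper(LinearBaseSketch):
    def _process_html(self):
        # The scraper stores the raw string and does no conversion itself.
        self.cases.append({"name": "Doe v. Roe", "date": "01/15/2024"})
```

Under this assumption, converting the date inside the scraper as well would at best duplicate work the base class already performs.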

## Advanced Topics

### PDF Content Extraction
Some scrapers need to extract metadata from downloaded PDFs:

not only PDFs

self.court_id = self.__module__

# Session persists across requests
self.request["session"] = requests.Session()

this is already present?

"session": requests.session(),

@grossir grossir moved this from PRs to Review to PRs to improve in Sprint (Case Law) Feb 17, 2026
@grossir grossir assigned Luis-manzur and unassigned grossir Feb 17, 2026
@Brennan-Chesley-FLP

It may be worth considering some of the findings in this paper.

@grossir grossir closed this Mar 18, 2026
@github-project-automation github-project-automation Bot moved this from PRs to improve to Done in Sprint (Case Law) Mar 18, 2026

Development

Successfully merging this pull request may close these issues.

Create a CLAUDE.md file for the opinions module

3 participants