26 commits
1952933
feat: Add comprehensive citation parsing beyond case law
BuffaloJames Sep 3, 2025
21e16ee
Fix test failures and update changelog
BuffaloJames Sep 19, 2025
a15d7d5
Fix AG opinion citation detection
BuffaloJames Sep 19, 2025
17fa281
Fix lint issues - trailing whitespace and Ruff formatting
BuffaloJames Sep 19, 2025
95f6c56
Fix pre-commit errors: Remove duplicate ExtendedCitationTokenizer cla…
BuffaloJames Sep 19, 2025
0d66f43
Delete scratch work
BuffaloJames Sep 19, 2025
41703a8
Fix pre-commit issues and enhance constitution citation parsing
BuffaloJames Sep 20, 2025
7d71b04
Delete README_ENHANCEMENT.md
BuffaloJames Sep 20, 2025
e9b581e
Delete Background.md
BuffaloJames Sep 20, 2025
ef4e07d
Merge branch 'main' of https://github.com/BuffaloJames/eyecite
BuffaloJames Sep 20, 2025
4bc9b7f
Update benchmark.yml
BuffaloJames Sep 20, 2025
8539a00
Update benchmark.yml
BuffaloJames Sep 25, 2025
069b7c0
Update benchmark.yml
BuffaloJames Sep 25, 2025
75780a6
Update benchmark.yml
BuffaloJames Sep 25, 2025
1d3d1bf
Update benchmark.yml
BuffaloJames Sep 25, 2025
b98f5ae
Update benchmark.yml
BuffaloJames Sep 25, 2025
448ada4
Update benchmark.yml
BuffaloJames Sep 25, 2025
c87971c
Update benchmark.yml
BuffaloJames Sep 25, 2025
c9ed7b6
Update benchmark.yml
BuffaloJames Sep 25, 2025
4dc094f
Update benchmark.yml
BuffaloJames Sep 25, 2025
5809424
Update benchmark.yml
BuffaloJames Sep 25, 2025
9f749ae
Update benchmark.yml
BuffaloJames Sep 25, 2025
2026731
Fix Poetry Hyperscan dependency resolution and enable ruff unsafe fixes
BuffaloJames Sep 25, 2025
b51363b
Fix Poetry dependency-groups syntax for hyperscan marker
BuffaloJames Sep 25, 2025
c165a33
Remove hyperscan marker and update pre-commit config
BuffaloJames Sep 25, 2025
1ebec14
Apply ruff unsafe-fixes for duplicate dictionary keys
BuffaloJames Sep 25, 2025
277 changes: 106 additions & 171 deletions .github/workflows/benchmark.yml
@@ -1,201 +1,136 @@
name: Benchmark Pull Request
name: CI

on:
push:
branches: [main]
pull_request:
repository_dispatch:
types: [ reporters-db-pr ]

env:
main: "$(/usr/bin/git log -1 --format='%H')"

jobs:
benchmark:
name: PR comment
if: github.event_name == 'pull_request'
lint:
runs-on: ubuntu-latest
steps:
- name: Check out repository
uses: actions/checkout@v4

- uses: actions/checkout@v4
- name: Set up Python
id: setup-python
uses: actions/setup-python@v5
with:
python-version: 3.12

- name: Install uv
uses: astral-sh/setup-uv@v6
python-version: "3.11"
- name: Install Poetry
uses: snok/install-poetry@v1
with:
enable-cache: true
version: "0.7.x"

- name: Add or Update comment on PR that Test is running
uses: marocchino/sticky-pull-request-comment@v2
virtualenvs-create: true
virtualenvs-in-project: true
- name: Load cached venv
id: cached-poetry-dependencies
uses: actions/cache@v3
with:
recreate: true
message: |
Eyecite Benchmarking in progress...

For details, see: https://github.com/freelawproject/eyecite/actions/workflows/benchmark.yml

This message will be updated when the test is completed.

- name: Install Python dependencies
run: |
uv sync --frozen --no-group dev --group benchmark
source .venv/bin/activate
echo "$VIRTUAL_ENV/bin" >> $GITHUB_PATH
echo "VIRTUAL_ENV=$VIRTUAL_ENV" >> $GITHUB_ENV

- name: Setup variables I
id: branch1
path: .venv
key: venv-${{ runner.os }}-${{ steps.setup-python.outputs.python-version }}-${{ hashFiles('**/poetry.lock') }}
- name: Install dependencies
if: steps.cached-poetry-dependencies.outputs.cache-hit != 'true'
run: poetry install --sync
- name: Run linters
run: |
echo ${{ github.event.issue.pull_request }}
echo "::set-output name=filepath::benchmark/${{ env.main }}.json"
echo "::set-output name=hash::${{ env.main }}"
#----------------------------------------------
# Download Testing File
#
# We generated our testing datasets with the following command:
#
# root@maintenance:/opt/courtlistener# PGPASSWORD=$DB_PASSWORD psql \
# --host $DB_HOST \
# --username $DB_USER \
# --dbname courtlistener \
# --command \
# 'set statement_timeout to 0;
# COPY (
# SELECT \
# id, plain_text, html, html_lawbox, html_columbia, html_anon_2020, xml_harvard \
# FROM \
# search_opinion \
# TABLESAMPLE BERNOULLI (0.1) \
# ) \
# TO STDOUT \
# WITH (FORMAT csv, ENCODING utf8, HEADER); \
# ' \
# | bzip2 \
# | aws s3 cp - s3://com-courtlistener-storage/bulk-data/eyecite/tests/ten-percent.csv.bz2 \
# --acl public-read
#----------------------------------------------
- name: Download Testing File
run: |
curl https://storage.courtlistener.com/bulk-data/eyecite/tests/one-percent.csv.bz2 --output benchmark/bulk-file.csv.bz2

- name: Run first benchmark
run: |
python benchmark/benchmark.py --branches ${{ steps.branch1.outputs.hash }}
git stash --include-untracked
poetry run ruff check .
poetry run black --check .

test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v5
with:
repository: freelawproject/eyecite
ref: main

- name: Install dependencies 2
run: uv sync --frozen --no-group dev --group benchmark

- name: Setup variables II
id: branch2
run: |
echo "::set-output name=filepath::benchmark/${{ env.main }}.json"
echo "::set-output name=hash::${{ env.main }}"

- name: Run second benchmark
run: |
git stash pop
python benchmark/benchmark.py --branches ${{ steps.branch1.outputs.hash }} ${{ steps.branch2.outputs.hash }} --pr ${{ github.event.number }}
mkdir results
mv benchmark/output.csv benchmark/${{ steps.branch1.outputs.hash }}.json benchmark/${{ steps.branch2.outputs.hash }}.json benchmark/report.md benchmark/chart.png results/

#----------------------------------------------
# Upload to Github PR
#----------------------------------------------
- name: Pushes test file
uses: dmnemec/copy_file_to_another_repo_action@main
env:
API_TOKEN_GITHUB: ${{ secrets.FREELAWBOT_TOKEN }}
python-version: "3.11"
- name: Install Poetry
uses: snok/install-poetry@v1
with:
user_email: 'info@free.law'
user_name: 'freelawbot'
source_file: 'results/'
destination_repo: 'freelawproject/eyecite'
destination_folder: '${{ github.event.number }}'
destination_branch: 'artifacts'
commit_message: 'feat(ci): Add artifacts for PR# ${{ github.event.number }}'

- name: Add or Update PR Comment from Generated Report
uses: marocchino/sticky-pull-request-comment@v2
virtualenvs-create: true
virtualenvs-in-project: true
- name: Load cached venv
id: cached-poetry-dependencies
uses: actions/cache@v3
with:
recreate: true
path: results/report.md
path: .venv
key: venv-${{ runner.os }}-${{ steps.setup-python.outputs.python-version }}-${{ hashFiles('**/poetry.lock') }}
- name: Install dependencies
if: steps.cached-poetry-dependencies.outputs.cache-hit != 'true'
run: poetry install --sync
- name: Run tests
run: poetry run pytest --maxfail=1 --disable-warnings -q --cov=.
- name: Upload coverage to Codecov
uses: codecov/codecov-action@v4
with:
token: ${{ secrets.CODECOV_TOKEN }}

dispatch:
name: Reporters-DB-Dipatch
if: github.event_name == 'repository_dispatch'
benchmark:
runs-on: ubuntu-latest
steps:
- name: Check out repository
uses: actions/checkout@v4

- uses: actions/checkout@v4
- name: Set up Python
id: setup-python
uses: actions/setup-python@v5
with:
python-version: 3.12

- name: Install uv
uses: astral-sh/setup-uv@v6
python-version: "3.11"
- name: Install Poetry
uses: snok/install-poetry@v1
with:
enable-cache: true
version: "0.7.x"

- name: Add or Update comment on PR that Test is running
virtualenvs-create: true
virtualenvs-in-project: true
- name: Load cached venv
id: cached-poetry-dependencies
uses: actions/cache@v3
with:
path: .venv
key: venv-${{ runner.os }}-${{ steps.setup-python.outputs.python-version }}-${{ hashFiles('**/poetry.lock') }}
- name: Install dependencies
if: steps.cached-poetry-dependencies.outputs.cache-hit != 'true'
run: poetry install --sync
- name: Run benchmarks
run: poetry run pytest --benchmark-only --benchmark-json=benchmark.json
- name: Convert benchmark results
run: |
python - <<'EOF'
import json
with open("benchmark.json") as f:
    data = json.load(f)
with open("benchmark-results.txt", "w") as f:
    for bench in data["benchmarks"]:
        f.write(f"{bench['fullname']}: {bench['stats']['mean']:.6f} sec\n")
EOF
- name: Benchmark Pull Request
uses: marocchino/sticky-pull-request-comment@v2
with:
recreate: true
header: benchmark
GITHUB_TOKEN: ${{ secrets.FREELAWBOT_TOKEN }}
number: ${{ github.event.client_payload.pr_number }}
repo: reporters-db
message: |
Eyecite Benchmarking in progress ...
This message will be updated when the test is completed.
path: benchmark-results.txt

- name: Install Python dependencies
run: |
uv sync --frozen --no-group dev --group benchmark
source .venv/bin/activate
echo "$VIRTUAL_ENV/bin" >> $GITHUB_PATH
echo "VIRTUAL_ENV=$VIRTUAL_ENV" >> $GITHUB_ENV

- name: Run Tests
run: |
uv pip install "git+https://github.com/freelawproject/reporters-db.git"
echo ${{ github.event.client_payload.pr_number }}
curl https://storage.courtlistener.com/bulk-data/eyecite/tests/one-percent.csv.bz2 --output benchmark/bulk-file.csv.bz2
python benchmark/benchmark.py --branches original
uv pip install "git+https://github.com/freelawproject/reporters-db.git@${{ github.event.client_payload.commit }}"
python benchmark/benchmark.py --branches original update --reporters --pr ${{ github.event.client_payload.pr_number }}
mkdir results
mv benchmark/output.csv benchmark/original.json benchmark/update.json benchmark/report.md benchmark/chart.png results/

- name: Pushes test file
uses: dmnemec/copy_file_to_another_repo_action@main
env:
API_TOKEN_GITHUB: ${{ secrets.FREELAWBOT_TOKEN }}
docs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v5
with:
user_email: 'info@free.law'
user_name: 'freelawbot'
source_file: 'results/'
destination_repo: 'freelawproject/reporters-db'
destination_folder: '${{ github.event.client_payload.pr_number }}'
destination_branch: 'artifacts'
commit_message: 'feat(ci): Add artifacts for PR #${{ github.event.client_payload.pr_number }}'

- name: Add or Update PR Comment from Generated Report
uses: marocchino/sticky-pull-request-comment@v2
python-version: "3.11"
- name: Install Poetry
uses: snok/install-poetry@v1
with:
recreate: true
GITHUB_TOKEN: ${{ secrets.FREELAWBOT_TOKEN }}
path: results/report.md
number: ${{ github.event.client_payload.pr_number }}
repo: reporters-db
virtualenvs-create: true
virtualenvs-in-project: true
- name: Load cached venv
id: cached-poetry-dependencies
uses: actions/cache@v3
with:
path: .venv
key: venv-${{ runner.os }}-${{ steps.setup-python.outputs.python-version }}-${{ hashFiles('**/poetry.lock') }}
- name: Install dependencies
if: steps.cached-poetry-dependencies.outputs.cache-hit != 'true'
run: poetry install --sync
- name: Build docs
run: poetry run mkdocs build --strict
- name: Deploy docs
if: github.ref == 'refs/heads/main'
uses: peaceiris/actions-gh-pages@v4
with:
github_token: ${{ secrets.FREELAWBOT_TOKEN }}
publish_dir: ./site
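
The "Convert benchmark results" step in the new `benchmark` job embeds a short Python script in a heredoc. As a rough illustration only, the same conversion could be written as a standalone function that is easier to test locally. The function name `summarize_benchmarks` and its parameters are hypothetical and not part of this PR; the JSON keys (`benchmarks`, `fullname`, `stats.mean`) are the ones the workflow script already reads.

```python
import json
from pathlib import Path


def summarize_benchmarks(json_path, out_path):
    """Write one '<fullname>: <mean> sec' line per benchmark; return the lines.

    Hypothetical helper mirroring the inline workflow script. It assumes the
    input JSON has a top-level "benchmarks" list whose entries carry
    "fullname" and "stats" -> "mean", as the workflow step does.
    """
    data = json.loads(Path(json_path).read_text())
    lines = [
        f"{bench['fullname']}: {bench['stats']['mean']:.6f} sec"
        for bench in data["benchmarks"]
    ]
    # End the file with a newline when there is at least one result.
    Path(out_path).write_text("\n".join(lines) + ("\n" if lines else ""))
    return lines
```

Under these assumptions, calling `summarize_benchmarks("benchmark.json", "benchmark-results.txt")` would produce the same file that the sticky PR comment step reads via `path: benchmark-results.txt`.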

9 changes: 4 additions & 5 deletions .pre-commit-config.yaml
@@ -7,7 +7,7 @@ default_language_version:
python: "python3.12"
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v5.0.0
rev: v6.0.0
hooks:
- id: check-added-large-files
- id: check-ast
@@ -19,14 +19,13 @@ repos:
- id: debug-statements
- id: detect-private-key
- id: fix-byte-order-marker
- id: fix-encoding-pragma
args: [--remove]

- id: trailing-whitespace
args: [--markdown-linebreak-ext=md]

- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.11.8
rev: v0.13.2
hooks:
- id: ruff
args: [ --fix ]
args: [ --fix, --unsafe-fixes ]
- id: ruff-format
2 changes: 1 addition & 1 deletion CHANGES.md
@@ -5,7 +5,7 @@
The following changes are not yet released, but are code complete:

Features:
-
- Add extended citation models for constitutions, regulations, court rules, legislative bills, session laws, journal articles, scientific identifiers, and attorney general opinions

Changes:
-