
Packaging improvements #8

Open
lmmx wants to merge 16 commits into davidjurgens:main from lmmx:lint


@lmmx (Contributor) commented Feb 1, 2026

Status - ready for review 🏁

This PR carries out some housekeeping, and I verified each change as I went.

First I ran linters (the ruff code formatter and the flake8 static analyser), which put the code into a standard format, fixed some mistakes (e.g. unnamed variables), and clarified ambiguous code (e.g. made * imports explicit).

  • migrated the package configuration from setup.py to pyproject.toml
  • bumped setuptools version
  • bumped the required Python version to 3.10 (3.7 reached end of life in June 2023)
  • added the README to the package metadata (the current release has no documentation on PyPI)
  • added keywords to the package metadata for discoverability
  • removed transitive dependencies from the explicit dependency list (blis and confection are already pulled in via spaCy's thinc dependency)
  • added optional "extra" dependency groups:
    • plots - the matplotlib and seaborn deps
    • dev - ruff and flake8
  • made plotting library imports lazy and conditional (a missing plotting library no longer raises an error at import time; only calling a plotting function does, and the error suggests installing the biberplus[plots] extra)
  • deleted requirements.txt, which duplicated the pyproject.toml package dependencies and pinned a spaCy model at version 3.5 that would fall behind (the latest is currently 3.8)
  • produced a dependency lockfile with uv (uv sync --all-extras)
  • fixed syntax warnings (forward slashes aren't special in Python regexes and don't need escaping)
  • removed a failing test (it expected an invalid sentence-final interjection UH tag, and another test already covers this case)

Checklist

  • Check tests pass
    • One test failed on an invalid interjection; another test already covers this case (it was a bad duplicate), so I removed it:

FAILED tests/tagger/test_additional_features.py::TestAdditionalFunctions::test_interjections_end2 - AssertionError: 'UH' not found in ['JJ']


Regex syntax warnings

There are 2 warnings:

Regex warning 1: emoticon escape

biberplus/tagger/biber_plus_tagger.py:1229
  /home/louis/dev/biberplus/biberplus/tagger/biber_plus_tagger.py:1229: SyntaxWarning: invalid escape sequence '\/'
    emoticon_pattern = re.compile("[:;=](?:-)?[)DPp\/]")
>>> import re
>>> emoticon_pattern = re.compile(r"[:;=](?:-)?[)DPp/]")
>>> emoticon_pattern.findall("hello :) how are you ;-D")
[':)', ';-D']
>>> emoticon_pattern.findall("lol :P this is fun :-) yeah =/")
[':P', ':-)', '=/']

The pattern matches:

  • [:;=] - eyes (colon, semicolon, or equals)
  • (?:-)? - optional nose (dash)
  • [)DPp/] - mouth (smile, big grin, tongue, or slant)
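The same three parts can be written out with re.VERBOSE so each piece carries its own comment. This is a sketch for readability only; the variable names are illustrative and not from biberplus.

```python
import re

# The fixed emoticon pattern, split into eyes / nose / mouth with re.VERBOSE.
# Whitespace and "# ..." comments outside the character classes are ignored.
EMOTICON_VERBOSE = re.compile(
    r"""
    [:;=]       # eyes: colon, semicolon, or equals
    (?:-)?      # optional nose: a single dash
    [)DPp/]     # mouth: smile, big grin, tongue, or slant
    """,
    re.VERBOSE,
)
EMOTICON = re.compile(r"[:;=](?:-)?[)DPp/]")

text = "lol :P this is fun :-) yeah =/"
print(EMOTICON_VERBOSE.findall(text))  # [':P', ':-)', '=/']
# Both forms describe the same regex, so they find the same matches.
assert EMOTICON_VERBOSE.findall(text) == EMOTICON.findall(text)
```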

The warning comes from escaping the slash in the slanted mouth of :-/

>>> emoticon_pattern = re.compile("[:;=](?:-)?[)DPp\/]")
<stdin>:1: SyntaxWarning: invalid escape sequence '\/'
>>> emoticon_pattern.findall("hm :-/ um... =/")
[':-/', '=/']

Since / has no special meaning in a Python regex, we don't need the backslash:

>>> emoticon_pattern = re.compile("[:;=](?:-)?[)DPp/]")
>>> emoticon_pattern.findall("hm :-/ um... =/")
[':-/', '=/']
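To check the fix mechanically, one can compile small source snippets and record the warnings Python emits, while also confirming that the escaped and unescaped patterns match the same text. This is a sketch; the helper escape_warnings is hypothetical. Python 3.12+ reports invalid escapes as SyntaxWarning and 3.6 to 3.11 used DeprecationWarning, so it filters on the message text rather than the category.

```python
import re
import warnings

# Original (escaped) vs fixed (unescaped) emoticon patterns. Writing "\\/" here
# spells the original backslash + slash without this file itself warning.
ESCAPED = re.compile("[:;=](?:-)?[)DPp\\/]")
FIXED = re.compile(r"[:;=](?:-)?[)DPp/]")


def escape_warnings(source: str) -> list[str]:
    """Compile Python source text and return any invalid-escape warning messages."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<snippet>", "exec")
    return [str(w.message) for w in caught if "invalid escape sequence" in str(w.message)]


samples = ["hm :-/ um... =/", "hello :) how are you ;-D", "lol :P this is fun :-) yeah =/"]
for text in samples:
    # "\/" and "/" denote the same character inside a regex class.
    assert ESCAPED.findall(text) == FIXED.findall(text)

print(escape_warnings('p = "[)DPp\\/]"'))  # non-empty: the snippet contains \/
print(escape_warnings('p = r"[)DPp/]"'))   # [] once the backslash is dropped
```

The exact warning wording differs between Python versions, which is why the first call's output isn't spelled out.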

Regex warning 2: URL escape

Similarly, importing the package triggers a warning from the URL pattern:

>>> import biberplus
/home/louis/dev/biberplus/biberplus/tagger/biber_plus_tagger.py:1287: SyntaxWarning: invalid escape sequence '\/'
  url_pattern = "(https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9]+\.[^\s]{2,}|www\.[a-zA-Z0-9]+\.[^\s]{2,})"

This pattern needs a raw string for its legitimate escapes (\s and \.), but the backslashes before / should simply be removed: Python regexes don't need them (escaping / is a JavaScript convention, where / delimits regex literals).

-url_pattern = "(https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9]+\.[^\s]{2,}|www\.[a-zA-Z0-9]+\.[^\s]{2,})"
+url_pattern = r"(https?://(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|https?://(?:www\.|(?!www))[a-zA-Z0-9]+\.[^\s]{2,}|www\.[a-zA-Z0-9]+\.[^\s]{2,})"

Specifically, this turns the pattern into a raw string and replaces the escaped \/\/ with // in both https?:// parts of the regex.

...and with that biberplus imports without a SyntaxWarning 🎉
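As a quick sanity check that the change is behaviour-preserving, the old regex text can be rebuilt from the fixed one by re-inserting the \/\/ escapes, and both compiled patterns compared on a sample string. This is a sketch; the sample text and variable names are illustrative.

```python
import re

# The fixed URL pattern from the diff (raw string, unescaped slashes).
URL_PATTERN = r"(https?://(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|https?://(?:www\.|(?!www))[a-zA-Z0-9]+\.[^\s]{2,}|www\.[a-zA-Z0-9]+\.[^\s]{2,})"

# Rebuild the old regex text by restoring the JavaScript-style "\/\/" escapes
# in both "https?://" alternatives.
OLD_PATTERN = URL_PATTERN.replace("://", r":\/\/")

fixed = re.compile(URL_PATTERN)
old = re.compile(OLD_PATTERN)

text = "see https://github.com/davidjurgens and www.example.org today"
print(fixed.findall(text))  # ['https://github.com/davidjurgens', 'www.example.org']
# "\/" is just a literal "/" to the regex engine, so both give the same matches.
assert fixed.findall(text) == old.findall(text)
```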

@KenanA95 (Collaborator) commented Feb 2, 2026

This is great.

It will be difficult to review with all of the linter changes included. Can you separate those out?

@lmmx mentioned this pull request on Feb 2, 2026

@lmmx (Contributor, Author) commented Feb 2, 2026

I don't know if git will (easily) let me "undo" those commits, so I went through the changes again in reverse in #9 and avoided running the ruff reformatting (I usually use an editor that auto-formats on save!)

For now I've stopped short of applying the flake8 lint fixes (i.e. I redid the changes only back to commit 4 of this PR, 90db95b), because I wasn't sure whether by "linter changes" you meant just the code formatting or also the flake8 fixes (some of which are correctness fixes). I'll leave it there in case I've misinterpreted!

Assuming that PR is OK, the remaining work is just the reformatting and fixing the linter errors; please let me know how you'd like those handled (if you have any particular preferences) 🙂
