bugfix(patterns): word-boundary slug match — fix false positive on short slugs #6

Open
CryptoJones wants to merge 1 commit into main from bugfix/patterns-slug-word-boundary

Conversation

@CryptoJones
Owner

_slug_in_project did a naive substring search:

if slug.lower() in file_text.lower(): return True

That false-positives on any short slug whose letters appear inside a
longer word. Concrete cases:

slug "auth" matches "author", "authentic", "authority"
slug "api" matches "apiary", "rapidly", "capital"
slug "db" matches "feedback", "sandbox"
slug "validate-numbers" matches "validate-numbers-attempt"
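
The false positives are easy to reproduce with a plain substring check. A minimal sketch: `naive_match` is a hypothetical stand-in for the old check, not the project's actual function, and the word pairs are taken from the cases above.

```python
def naive_match(slug: str, file_text: str) -> bool:
    # Hypothetical stand-in for the old substring check in _slug_in_project.
    return slug.lower() in file_text.lower()


# Each slug is "found" even though the text only contains a longer token.
cases = [
    ("auth", "author"),
    ("api", "apiary"),
    ("validate-numbers", "validate-numbers-attempt"),
]
for slug, text in cases:
    print(f"{slug!r} in {text!r}: {naive_match(slug, text)}")
```

All three pairs report a match, which is what hides the slug from the UNUSED findings.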

When that happens, the pattern is reported as "used elsewhere" and
silently does NOT appear in the UNUSED findings — masking genuinely
unused patterns and undermining the whole point of the report.

Fix: compile a word-boundary regex with custom boundary chars
(\w + -) so kebab-case slugs match as complete tokens but not as
prefixes/suffixes of longer kebab identifiers, and short slugs don't
match inside longer words.

(?<![\w-]){re.escape(slug)}(?![\w-])
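
A minimal sketch of how that pattern could be compiled and applied. The helper names `slug_pattern` and `slug_in_text` are hypothetical, and `re.IGNORECASE` is an assumption that mirrors the `.lower()` comparison in the old code; the PR text does not say whether the fix stays case-insensitive.

```python
import re


def slug_pattern(slug: str) -> "re.Pattern[str]":
    # The lookbehind and lookahead reject a match whose neighbour is a word
    # character or '-'. re.escape keeps the '-' in the slug literal.
    # re.IGNORECASE is an assumption carried over from the old .lower() check.
    return re.compile(rf"(?<![\w-]){re.escape(slug)}(?![\w-])", re.IGNORECASE)


def slug_in_text(slug: str, file_text: str) -> bool:
    # True only when the slug appears as a complete token in file_text.
    return slug_pattern(slug).search(file_text) is not None
```

Because `\w` includes the underscore, a slug such as `auth` is also not matched inside `auth_token`, and the `-` in the boundary set keeps it from matching inside `pre-auth`.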

Tests added (3):

  • slug auth in a project that contains author -> still UNUSED
  • slug validate-numbers in a project that contains
    validate-numbers-attempt -> still UNUSED
  • positive control: exact validate-numbers mention -> NOT unused
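
For illustration, the three cases can be expressed against a reduced model of the report. This is a sketch under assumptions: `unused_slugs` and the sample texts are hypothetical and are not the repository's test code, which presumably exercises the real UNUSED findings.

```python
import re


def unused_slugs(slugs: list[str], project_texts: list[str]) -> list[str]:
    # Hypothetical reduction of the report: a slug counts as unused when no
    # project text mentions it as a complete token.
    def mentioned(slug: str) -> bool:
        pattern = re.compile(rf"(?<![\w-]){re.escape(slug)}(?![\w-])")
        return any(pattern.search(text) for text in project_texts)

    return [slug for slug in slugs if not mentioned(slug)]


def test_short_slug_inside_longer_word_stays_unused() -> None:
    assert unused_slugs(["auth"], ["written by the author"]) == ["auth"]


def test_kebab_prefix_of_longer_identifier_stays_unused() -> None:
    texts = ["see validate-numbers-attempt"]
    assert unused_slugs(["validate-numbers"], texts) == ["validate-numbers"]


def test_exact_mention_is_not_unused() -> None:
    assert unused_slugs(["validate-numbers"], ["see validate-numbers"]) == []
```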

150/150 tests pass; ruff + mypy clean.
