
feat(analyzer): add generic VIN and IMEI recognizers #2070

Open
thatomokoena wants to merge 10 commits into data-privacy-stack:main from thatomokoena:feature/generic-vin-imei-recognizers



thatomokoena commented Jun 17, 2026


Change Description

  • Adds generic VinRecognizer for VIN (17-character ISO 3779 vehicle identifiers) with pattern matching, context words, and North American mod-11 check-digit validation. A valid check digit boosts confidence to MAX_SCORE for any region; an invalid check digit causes rejection for North American WMIs (first character 1-5); for non-North American VINs, a mismatch keeps the base pattern score.
  • Adds generic ImeiRecognizer for IMEI (15-digit mobile device identifiers) with a formatted pattern (##-######-######-#), context words, and Luhn checksum invalidation (matches that fail the Luhn check are discarded). Bare 15-digit matching is omitted to avoid collisions with other 15-digit Luhn-valid identifiers such as AMEX credit card numbers.
  • Registers both recognizers in default_recognizers.yaml (enabled by default), exports them from predefined_recognizers, and documents them in supported_entities.md and CHANGELOG.md.
  • Adds test_vin_recognizer.py and test_imei_recognizer.py; all tests pass locally, and ruff check passes on the new recognizer source files.
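The VIN scoring rules in the first bullet can be illustrated with a standalone sketch. It is not the PR's code: `expected_check_digit` and `validate_vin` are hypothetical names, and the return values follow what we understand to be the `True` / `False` / `None` convention of Presidio's `validate_result` (boost to max score, reject, leave the pattern score unchanged). The transliteration table and weights are the standard North American mod-11 scheme.

```python
from typing import Optional

# Hypothetical sketch, not the PR's vin_recognizer.py.
# North American transliteration: letters map to digits.
# I, O and Q never appear in a valid VIN, so they are absent here.
TRANSLITERATION = {str(d): d for d in range(10)}
TRANSLITERATION.update({
    "A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6, "G": 7, "H": 8,
    "J": 1, "K": 2, "L": 3, "M": 4, "N": 5, "P": 7, "R": 9,
    "S": 2, "T": 3, "U": 4, "V": 5, "W": 6, "X": 7, "Y": 8, "Z": 9,
})
# Positional weights; position 9 (the check digit itself) has weight 0.
WEIGHTS = (8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2)


def expected_check_digit(vin: str) -> Optional[str]:
    """Return the mod-11 check digit for a 17-char VIN, or None if malformed."""
    vin = vin.upper()
    if len(vin) != 17 or any(ch not in TRANSLITERATION for ch in vin):
        return None
    remainder = sum(TRANSLITERATION[ch] * w for ch, w in zip(vin, WEIGHTS)) % 11
    return "X" if remainder == 10 else str(remainder)


def validate_vin(vin: str) -> Optional[bool]:
    """Hypothetical validate_result-style check.

    True  -> check digit matches (any region): boost to max score
    False -> North American WMI (first char 1-5) with a bad check digit: reject
    None  -> other regions with a mismatch: keep the base pattern score
    """
    expected = expected_check_digit(vin)
    if expected is None:
        return False  # assumption: treat malformed input as invalid
    if vin.upper()[8] == expected:
        return True
    if vin[0] in "12345":
        return False
    return None


for candidate in ("1HGCM82633A004352", "1hgcm82633a004352",
                  "1HGCM82643A004352", "WHGCM82633A004352"):
    print(candidate, validate_vin(candidate))
# 1HGCM82633A004352 True
# 1hgcm82633a004352 True
# 1HGCM82643A004352 False
# WHGCM82633A004352 None
```

The third candidate changes only the check digit (position 9), so a North American prefix leads to rejection. The fourth swaps the first character for a non-North American `W`, which shifts the computed check digit to `X`; the mismatch is then tolerated and the base score would stand.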

Issue reference

N/A

Checklist

  • I have reviewed the contribution guidelines
  • I have signed the CLA (if required)
  • My code includes unit tests
  • All unit tests and lint checks pass locally
  • My PR contains documentation updates / additions if required

Add VinRecognizer and ImeiRecognizer as enabled predefined recognizers with
pattern matching, context support, and checksum validation for vehicle and
mobile device identifiers.

Co-authored-by: Cursor <cursoragent@cursor.com>
Copilot AI review requested due to automatic review settings June 17, 2026 14:02
@thatomokoena


@microsoft-github-policy-service agree

Copilot AI left a comment


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds two new generic predefined recognizers (VIN and IMEI) to Presidio Analyzer and wires them into the default registry/config, along with tests and documentation updates.

Changes:

  • Introduces VinRecognizer with a VIN regex plus check-digit validation logic.
  • Introduces ImeiRecognizer with IMEI regex patterns plus Luhn checksum invalidation.
  • Registers both recognizers in package exports, default YAML config, supported entities docs, and changelog.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| `presidio-analyzer/presidio_analyzer/predefined_recognizers/generic/vin_recognizer.py` | New VIN recognizer implementation with mod-11 check digit logic |
| `presidio-analyzer/presidio_analyzer/predefined_recognizers/generic/imei_recognizer.py` | New IMEI recognizer implementation with Luhn validation |
| `presidio-analyzer/presidio_analyzer/predefined_recognizers/generic/__init__.py` | Exposes new recognizers from the generic package |
| `presidio-analyzer/presidio_analyzer/predefined_recognizers/__init__.py` | Exposes new recognizers at the top-level predefined module |
| `presidio-analyzer/presidio_analyzer/conf/default_recognizers.yaml` | Enables loading the new recognizers by default |
| `presidio-analyzer/tests/test_vin_recognizer.py` | Adds unit tests for VIN detection and validation behavior |
| `presidio-analyzer/tests/test_imei_recognizer.py` | Adds unit tests for IMEI detection and invalidation behavior |
| `docs/supported_entities.md` | Documents newly supported IMEI and VIN entities |
| `CHANGELOG.md` | Records the addition of VIN and IMEI recognizers |

Copilot AI review requested due to automatic review settings June 18, 2026 06:22

Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

thatomokoena and others added 2 commits June 18, 2026 08:39
Reject invalid North American VIN check digits for WMI prefixes 1-5 while
preserving base scores for non-NA VINs. Remove bare 15-digit IMEI pattern
to avoid collisions with AMEX and other Luhn identifiers.

Co-authored-by: Cursor <cursoragent@cursor.com>
Wire sanitize_value through invalidate_result so custom separator
replacement_pairs are honored, matching other pattern recognizers.

Co-authored-by: Cursor <cursoragent@cursor.com>
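The two commits above interact: Luhn alone cannot tell a bare 15-digit IMEI from a 15-digit AMEX number, and the checksum has to run on the value after separators are stripped. A minimal standalone sketch, assuming hypothetical helpers `sanitize`, `luhn_valid`, and `invalidate_imei`; the PR wires Presidio's own `sanitize_value` and `replacement_pairs`, which this sketch only imitates and does not call.

```python
from typing import Sequence, Tuple

# Hypothetical defaults: strip dashes and spaces before checking digits.
DEFAULT_REPLACEMENT_PAIRS: Sequence[Tuple[str, str]] = (("-", ""), (" ", ""))


def sanitize(text: str, replacement_pairs: Sequence[Tuple[str, str]]) -> str:
    """Apply each (old, new) replacement in order, like a sanitize_value step."""
    for old, new in replacement_pairs:
        text = text.replace(old, new)
    return text


def luhn_valid(digits: str) -> bool:
    """Standard Luhn check: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def invalidate_imei(
    match_text: str,
    replacement_pairs: Sequence[Tuple[str, str]] = DEFAULT_REPLACEMENT_PAIRS,
) -> bool:
    """Hypothetical invalidate_result-style check: True means drop the match."""
    digits = sanitize(match_text, replacement_pairs)
    return not (len(digits) == 15 and digits.isdigit() and luhn_valid(digits))


print(invalidate_imei("49-015420-323751-8"))               # False (kept)
print(invalidate_imei("49-015420-323751-9"))               # True (bad check digit)
print(invalidate_imei("49.015420.323751.8"))               # True ('.' not stripped)
print(invalidate_imei("49.015420.323751.8", [(".", "")]))  # False (custom pair honored)
# A 15-digit AMEX test number is also Luhn-valid, so a bare 15-digit
# pattern plus Luhn would wrongly accept it as an IMEI.
print(luhn_valid("378282246310005"))                       # True
```

The third and fourth calls show why the sanitized value must reach the checksum: without the caller's custom pair, a dotted IMEI is discarded even though its digits are valid.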
Copilot AI review requested due to automatic review settings June 18, 2026 06:44

Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

Align IMEI Luhn description and VIN validate_result return docs with
actual behavior per Copilot review feedback.

Co-authored-by: Cursor <cursoragent@cursor.com>
@thatoisnaked


Hi @SharonHart @omri374. When you have time, would you be willing to take a look at this PR for adding IMEI and VIN recognizers? Happy to address any feedback. Thanks!

Reflect two new enabled predefined recognizers in the registry test
assertion.

Co-authored-by: Cursor <cursoragent@cursor.com>
Copilot AI review requested due to automatic review settings June 18, 2026 12:32

thatomokoena commented Jun 18, 2026


Hi @SharonHart. The CI failure on PR #2070 is fixed.

The Test Analyzer job was failing because test_recognizer_registry.py still expected 28 enabled predefined recognizers; adding VinRecognizer and ImeiRecognizer bumped the count to 30. I updated the assertion accordingly and pushed commit b3086be (test(analyzer): update recognizer registry count for VIN and IMEI). The registry, VIN, and IMEI tests all pass locally.

CI should be green on the next run. When you have a moment, could you take another look? Thanks!

Copilot AI left a comment


Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Comment thread CHANGELOG.md Outdated
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 18, 2026 13:27

Copilot AI left a comment


Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

thatomokoena and others added 2 commits June 18, 2026 15:38
…rivacy-stack#2070

Document space-delimited IMEI formats in the changelog and add
ImeiRecognizer and VinRecognizer to PREDEFINED_RECOGNIZERS so default
engine tests cover the new recognizers.

Co-authored-by: Cursor <cursoragent@cursor.com>
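The changelog note above covers space-delimited IMEIs alongside the dashed `##-######-######-#` form. One way to express both, sketched with a hypothetical regex (the PR's actual patterns and scores are not shown in this thread), is a single pattern whose backreference forces the same separator throughout, so bare 15-digit runs such as card numbers never match.

```python
import re

# Hypothetical pattern: 2-6-6-1 digit groups joined by one separator type.
# \1 reuses the first separator, so mixed separators are not matched.
IMEI_FORMATTED = re.compile(r"\b\d{2}([- ])\d{6}\1\d{6}\1\d\b")

text = ("Device 49-015420-323751-8 and backup 49 015420 323751 8; "
        "card 378282246310005; mixed 49-015420 323751-8")
matches = [m.group(0) for m in IMEI_FORMATTED.finditer(text)]
print(matches)
# ['49-015420-323751-8', '49 015420 323751 8']
```

Requiring a separator is what keeps the AMEX number out: the Luhn check would accept it, so the pattern itself has to exclude it.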
Copilot AI review requested due to automatic review settings June 22, 2026 06:36

Copilot AI left a comment


Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

Comment on lines 19 to 21
|IBAN_CODE|The International Bank Account Number (IBAN) is an internationally agreed system of identifying bank accounts across national borders to facilitate the communication and processing of cross border transactions with a reduced risk of transcription errors.|Pattern match, context and checksum|
|IMEI|International Mobile Equipment Identity, a 15-digit identifier for mobile devices.|Pattern match, context and checksum|
|IP_ADDRESS|An Internet Protocol (IP) address (either IPv4 or IPv6).|Pattern match, context and checksum|
Comment on lines +21 to +24
("Vehicle VIN is 1HGCM82633A004352", 1, ((15, 32),), ((0.5, "max"),)),
("chassis number 1HGCM82633A004352 recorded", 1, ((15, 32),), ((0.5, "max"),)),
("vin: 1hgcm82633a004352", 1, ((5, 22),), ((0.5, "max"),)),
("The vehicle identification number is 1HGCM82633A004352", 1, ((37, 54),), ((0.5, "max"),)),
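Each fixture tuple above appears to read as (text, expected match count, expected (start, end) spans, expected (min, max) score ranges), with `"max"` standing in for the recognizer's maximum score. The spans can be double-checked with a plain regex. This sketch assumes a case-insensitive 17-character pattern that excludes I, O and Q; that is an assumption, not the PR's exact regex.

```python
import re

# Assumed VIN pattern: 17 characters from 0-9 and A-Z minus I, O, Q,
# matched case-insensitively (the lowercase fixture needs that).
VIN_PATTERN = re.compile(r"\b[A-HJ-NPR-Z0-9]{17}\b", re.IGNORECASE)

FIXTURES = [
    ("Vehicle VIN is 1HGCM82633A004352", ((15, 32),)),
    ("chassis number 1HGCM82633A004352 recorded", ((15, 32),)),
    ("vin: 1hgcm82633a004352", ((5, 22),)),
    ("The vehicle identification number is 1HGCM82633A004352", ((37, 54),)),
]

for text, expected_spans in FIXTURES:
    spans = tuple(m.span() for m in VIN_PATTERN.finditer(text))
    assert spans == expected_spans, (text, spans)
print("all fixture spans match")
```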
Comment on lines +51 to +53
if fn_score == "max":
    fn_score = max_score
assert_result_within_score_range(

3 participants