feat(analyzer): add recognizer-level threshold config by rodboev · Pull Request #2116 · data-privacy-stack/presidio

rodboev · 2026-06-28T04:08:46Z

Change Description

Adds analyzer-level YAML configuration for recognizer-specific score thresholds in presidio-analyzer. The new configuration lets users set a threshold for a recognizer and optionally override it for a specific entity type, while preserving the current scalar score_threshold and default_score_threshold behavior when the new config is absent.

Proposed YAML shape:

recognizer_score_thresholds:
  CreditCardRecognizer:
    default: 0.4
    CREDIT_CARD: 0.7
  SpacyRecognizer:
    PERSON: 0.6

Recognizer and entity thresholds are applied in the existing result-filtering path after recognizer execution and context enhancement, before duplicate removal collapses equivalent spans. Explicit score_threshold arguments remain a global per-call override, and unmatched recognizers continue to fall back to default_score_threshold.

This also updates the analyzer docs, the no-code tutorial example, and the root changelog.

Issue reference

Fixes #1572

Tests

poetry run pytest tests/test_analyzer_engine.py tests/test_analyzer_engine_provider.py tests/test_configuration_validator.py -q
poetry run ruff check presidio_analyzer/analyzer_engine.py presidio_analyzer/analyzer_engine_provider.py presidio_analyzer/input_validation/schemas.py tests/test_analyzer_engine.py tests/test_analyzer_engine_provider.py tests/test_configuration_validator.py

Note on CHANGELOG

Update CHANGELOG.md under [unreleased] → Analyzer → Added.

Checklist

I have reviewed the contribution guidelines
I have signed the CLA (if required)
My code includes unit tests
All unit tests and lint checks pass locally
My PR contains documentation updates / additions if required

rodboev · 2026-06-28T04:09:00Z

@microsoft-github-policy-service agree

Copilot

Pull request overview

Adds support in presidio-analyzer for configuring score thresholds per recognizer (and optionally per entity within that recognizer) via YAML, while preserving the existing global default_score_threshold / request-level score_threshold behavior.

Changes:

Introduces recognizer_score_thresholds configuration, applies it during result filtering (before duplicate collapse) when no request-level score_threshold is provided.
Adds validation/normalization for the new configuration shape (including numeric shorthand) and expands unit test coverage for precedence and error cases.
Updates analyzer configuration docs, the no-code tutorial, and the root changelog to document the new option.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
presidio-analyzer/presidio_analyzer/analyzer_engine.py	Adds recognizer/entity-specific threshold support and applies filtering before deduplication.
presidio-analyzer/presidio_analyzer/analyzer_engine_provider.py	Loads `recognizer_score_thresholds` from analyzer YAML and passes it into `AnalyzerEngine`.
presidio-analyzer/presidio_analyzer/input_validation/schemas.py	Validates and normalizes `recognizer_score_thresholds` in analyzer configuration validation.
presidio-analyzer/presidio_analyzer/conf/default_analyzer_full.yaml	Documents the new YAML option with a commented example.
presidio-analyzer/tests/test_analyzer_engine.py	Adds deterministic tests covering precedence, shorthand, invalid values, and dedupe ordering.
presidio-analyzer/tests/test_analyzer_engine_provider.py	Verifies provider passes/normalizes thresholds from YAML and that defaults remain unchanged otherwise.
presidio-analyzer/tests/test_configuration_validator.py	Adds validator tests for valid shorthand + invalid threshold structures/types/ranges.
docs/tutorial/08_no_code.md	Updates no-code YAML example and explains when to use recognizer-level thresholds.
docs/analyzer/analyzer_engine_provider.md	Documents the new configuration key and provides a YAML example.
CHANGELOG.md	Adds an Unreleased entry documenting the new analyzer YAML capability.

rodboev added 6 commits June 27, 2026 23:16

Allow per-recognizer analyzer tuning from YAML

9f4ec7d

Keep boolean flags from slipping into threshold config

cf7c572

Keep analyzer docs aligned with the new threshold contract

0e162d2

Keep direct threshold config aligned with YAML validation

53ad752

Preserve lenient recognizers when duplicate spans collide

87c455b

Keep threshold validation errors tied to the config key

89ff6ce

Copilot AI review requested due to automatic review settings June 28, 2026 04:08

github-actions Bot added the external label Jun 28, 2026

Copilot started reviewing on behalf of rodboev June 28, 2026 04:09 View session

Copilot AI reviewed Jun 28, 2026

View reviewed changes

Comment thread presidio-analyzer/presidio_analyzer/input_validation/schemas.py

Comment thread presidio-analyzer/presidio_analyzer/analyzer_engine.py Outdated

Comment thread presidio-analyzer/presidio_analyzer/analyzer_engine.py

Comment thread presidio-analyzer/presidio_analyzer/analyzer_engine.py

Keep constructor threshold hints aligned with shared validation

9ebc0e8

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(analyzer): add recognizer-level threshold config#2116

feat(analyzer): add recognizer-level threshold config#2116
rodboev wants to merge 7 commits into
data-privacy-stack:mainfrom
rodboev:pr/recognizer-threshold-config

rodboev commented Jun 28, 2026 •

edited

Loading

Uh oh!

rodboev commented Jun 28, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

rodboev commented Jun 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Change Description

Issue reference

Tests

Note on CHANGELOG

Checklist

Uh oh!

rodboev commented Jun 28, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

rodboev commented Jun 28, 2026 •

edited

Loading