feat(analyzer): Add ONNX Runtime backend to HuggingFaceNerRecognizer #2086

Open
yuriihavrylko wants to merge 7 commits into data-privacy-stack:main from yuriihavrylko:feat/huggingface-recognizer-onnx

@yuriihavrylko


Change Description

Adds an optional ONNX Runtime inference backend to HuggingFaceNerRecognizer, selected via a new backend parameter: "torch" (the default, behavior unchanged) or "ort". The ort backend loads token-classification models through optimum.onnxruntime, enabling:

  • Pre-quantized ONNX variants (FP16/INT8/4-bit) published on the HF Hub
  • Hardware selection via ONNX Runtime execution providers (CUDA, TensorRT, OpenVINO, CoreML, ROCm) without code changes
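
As a rough illustration of the execution-provider point above, the sketch below picks the first preferred provider that an ONNX Runtime build reports as available. The helper name `choose_provider` and the preference order are assumptions made for illustration and are not part of this PR; the provider strings are the standard ONNX Runtime identifiers.

```python
from typing import Iterable, Sequence

# Illustrative preference order (an assumption, not taken from the PR).
PREFERRED_PROVIDERS = (
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "ROCMExecutionProvider",
    "OpenVINOExecutionProvider",
    "CoreMLExecutionProvider",
    "CPUExecutionProvider",
)


def choose_provider(
    available: Iterable[str],
    preferred: Sequence[str] = PREFERRED_PROVIDERS,
) -> str:
    """Return the first preferred provider that is present in `available`."""
    available_set = set(available)
    for name in preferred:
        if name in available_set:
            return name
    raise RuntimeError(f"No usable execution provider in {sorted(available_set)}")


if __name__ == "__main__":
    # Hand-written list standing in for what a CUDA build might report.
    print(choose_provider(["CUDAExecutionProvider", "CPUExecutionProvider"]))
```

With onnxruntime installed, `available` would typically come from `onnxruntime.get_available_providers()`, and the chosen name would be supplied as the `provider` loader kwarg in the recognizer config.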

How

  • backend parameter on HuggingFaceNerRecognizer. For "ort", the recognizer explicitly pre-loads the model with ORTModelForTokenClassification.from_pretrained() and hands the model object to the pipeline. This scopes loader kwargs (subfolder, file_name, provider, …) to the model loader only, which is required for mixed-layout repos (onnx-community/*, etc.) that keep ONNX files under onnx/ while config and tokenizer files live at the repo root. Passing these kwargs at the pipeline level breaks on such repos.
  • **model_kwargs pass-through (mirrors GLiNERRecognizer): extra kwargs, whether from Python or YAML, are forwarded to the active backend's loader. Future loader options need no recognizer changes.
  • onnxruntime extra in pyproject.toml: pip install 'presidio-analyzer[onnxruntime]' (CPU build). GPU/accelerator builds are installed directly instead of the extra (see docs) because the onnxruntime* packages ship the same Python module and conflict.
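
The kwarg scoping described in the list above can be sketched roughly as follows. This is not the PR's implementation: the function name `build_pipeline_kwargs` and the `ort_loader` hook are hypothetical, and routing torch-side kwargs through the pipeline's `model_kwargs` argument is an assumption. The point it shows is that on the "ort" path the loader kwargs reach only `from_pretrained()`, while the tokenizer is still resolved from the repo root.

```python
from typing import Any, Callable, Dict, Optional


def build_pipeline_kwargs(
    model_name: str,
    backend: str = "torch",
    ort_loader: Optional[Callable[..., Any]] = None,  # hypothetical hook for tests
    **model_kwargs: Any,
) -> Dict[str, Any]:
    """Return keyword arguments intended for transformers.pipeline()."""
    common = {
        "task": "token-classification",
        "tokenizer": model_name,  # tokenizer/config are read from the repo root
        "aggregation_strategy": "simple",
    }
    if backend == "torch":
        # Assumption: extra kwargs are handed to the pipeline's model loader.
        return {**common, "model": model_name, "model_kwargs": dict(model_kwargs)}
    if backend == "ort":
        if ort_loader is None:
            # Imported lazily so the torch path works without optimum installed.
            from optimum.onnxruntime import ORTModelForTokenClassification

            ort_loader = ORTModelForTokenClassification.from_pretrained
        # Loader kwargs (subfolder, file_name, provider, ...) stop here and
        # never reach the pipeline call itself.
        model = ort_loader(model_name, **model_kwargs)
        return {**common, "model": model}
    raise ValueError(f"Unknown backend {backend!r}; expected 'torch' or 'ort'")
```

A caller would then build the pipeline with `transformers.pipeline(**build_pipeline_kwargs(...))`; because the pre-loaded model object is passed in, `subfolder` and `file_name` are never applied to tokenizer or config resolution.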

Usage

- name: "HF NER (ONNX)"
  type: predefined
  class_name: HuggingFaceNerRecognizer
  model_name: onnx-community/stanford-deidentifier-base-ONNX
  backend: ort
  subfolder: onnx
  file_name: model_quantized.onnx
  label_mapping:
    PATIENT: PERSON
    HCW: PERSON
    PHONE: PHONE_NUMBER

Breaking change

Unknown constructor kwargs were previously logged and dropped; they are now forwarded to the model loader and raise TypeError if the loader rejects them. Typos in YAML recognizer configs that loaded silently before will now fail loudly at startup. This is intentional (silent misconfiguration of a PII detector is worse than a startup error), but existing configs with stray keys need cleanup.
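
The difference can be sketched with a stand-in loader. Everything here is illustrative: `strict_loader`, `load_old`, and `load_new` are hypothetical names, and the "old" path is a simplified imitation of the log-and-drop behavior described above, not the recognizer's actual code.

```python
import inspect
import logging
from typing import Any, Dict

logger = logging.getLogger("hf_ner_sketch")


def strict_loader(
    model_name: str, *, subfolder: str = "", file_name: str = "model.onnx"
) -> Dict[str, Any]:
    """Stand-in for a model loader that rejects unknown keyword arguments."""
    return {"model_name": model_name, "subfolder": subfolder, "file_name": file_name}


def load_old(model_name: str, **kwargs: Any) -> Dict[str, Any]:
    """Imitates the previous behavior: unknown kwargs are logged and dropped."""
    accepted = inspect.signature(strict_loader).parameters
    known = {k: v for k, v in kwargs.items() if k in accepted}
    for key in sorted(set(kwargs) - set(known)):
        logger.warning("Ignoring unknown argument %r", key)
    return strict_loader(model_name, **known)


def load_new(model_name: str, **kwargs: Any) -> Dict[str, Any]:
    """Imitates the new behavior: everything is forwarded to the loader."""
    return strict_loader(model_name, **kwargs)


if __name__ == "__main__":
    # A typo in the key: 'file_nmae' instead of 'file_name'.
    print(load_old("some/model", file_nmae="model_quantized.onnx"))
    try:
        load_new("some/model", file_nmae="model_quantized.onnx")
    except TypeError as exc:
        print(f"startup error: {exc}")
```

In the old path the typo silently falls back to the default file; in the new path the same config fails at load time.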

Issue reference

None

Checklist

  • I have reviewed the contribution guidelines
  • I have signed the CLA (if required)
  • My code includes unit tests
  • All unit tests and lint checks pass locally
  • My PR contains documentation updates / additions if required

@yuriihavrylko yuriihavrylko requested a review from a team as a code owner June 20, 2026 13:59
Copilot AI review requested due to automatic review settings June 20, 2026 13:59

Copilot AI left a comment

Pull request overview

This PR extends presidio-analyzer’s HuggingFaceNerRecognizer to support an optional ONNX Runtime inference path (via Optimum), enabling execution-provider based acceleration and use of pre-quantized ONNX models while preserving the existing PyTorch/transformers behavior by default.

Changes:

  • Added backend selection (torch default / ort) and **model_kwargs pass-through to support Optimum ORT model loading and future loader options.
  • Added optional dependency group (onnxruntime) plus configuration/docs for running HF NER via ONNX Runtime execution providers.
  • Expanded unit tests and added new end-to-end tests for real model loading (torch + ort paths, when optional deps are present).

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| presidio-analyzer/presidio_analyzer/predefined_recognizers/ner/huggingface_ner_recognizer.py | Implements backend selection and ORT loading path, and forwards extra loader kwargs. |
| presidio-analyzer/pyproject.toml | Adds an onnxruntime optional-dependency group for Optimum/ORT usage. |
| presidio-analyzer/presidio_analyzer/input_validation/schemas.py | Updates config validation dump behavior to exclude None values. |
| presidio-analyzer/presidio_analyzer/conf/hf_ner_onnx.yaml | Adds a sample analyzer config using the new ORT backend with a mixed-layout HF repo. |
| presidio-analyzer/tests/test_huggingface_ner_recognizer.py | Updates mocked tests to reflect model_kwargs and adds ORT-path unit coverage. |
| presidio-analyzer/tests/test_huggingface_ner_recognizer_e2e.py | Adds new opt-in E2E tests that exercise real transformers/optimum pipelines. |
| mkdocs.yml | Adds the new documentation page to the Analyzer docs nav. |
| docs/analyzer/recognizer_registry_provider.md | Documents the new backend parameter and links to backend guidance. |
| docs/analyzer/nlp_engines/gpu_usage.md | Links GPU usage docs to the new HF NER backend guidance. |
| docs/analyzer/huggingface_ner_inference.md | New detailed guide for torch vs ORT backends, installation, and execution-provider configuration. |

Comment thread presidio-analyzer/presidio_analyzer/conf/hf_ner_onnx.yaml Outdated
Copilot AI review requested due to automatic review settings June 26, 2026 21:30

Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

Comment on lines +490 to +494
mock_hf_pipeline.assert_called_once_with(
"token-classification",
model="test-model",
tokenizer="test-model",
aggregation_strategy="simple",

Contributor Author

These are false positives: EntityRecognizer.__init__ calls self.load(), so constructing the recognizer already triggers pipeline/ORT model creation, and the assertions run against that call.

I verified that the test passes as written. Adding an explicit rec.load() would be a redundant no-op, because load() returns early when the pipeline is already built.
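
The pattern described in this reply can be reproduced in isolation. The classes below are simplified stand-ins rather than the real Presidio classes, and the `ner_pipeline` attribute and `pipeline_factory` argument are hypothetical names; the sketch only shows why a mock assertion can pass without an explicit load() call when the base constructor already calls load().

```python
from unittest import mock


class EntityRecognizerSketch:
    """Stand-in base class whose constructor triggers load()."""

    def __init__(self) -> None:
        self.load()

    def load(self) -> None:
        raise NotImplementedError


class HFRecognizerSketch(EntityRecognizerSketch):
    def __init__(self, model_name: str, pipeline_factory) -> None:
        self.model_name = model_name
        self._pipeline_factory = pipeline_factory
        self.ner_pipeline = None
        super().__init__()  # calls load(), which builds the pipeline

    def load(self) -> None:
        if self.ner_pipeline is not None:
            return  # already built: a second load() is a no-op
        self.ner_pipeline = self._pipeline_factory(
            "token-classification",
            model=self.model_name,
            tokenizer=self.model_name,
            aggregation_strategy="simple",
        )


factory = mock.Mock()
rec = HFRecognizerSketch("test-model", factory)

# Construction alone has already called the factory once.
factory.assert_called_once_with(
    "token-classification",
    model="test-model",
    tokenizer="test-model",
    aggregation_strategy="simple",
)

rec.load()  # early-returns, so the call count stays at one
factory.assert_called_once()
```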

Comment thread presidio-analyzer/tests/test_huggingface_ner_recognizer.py
Comment thread presidio-analyzer/tests/test_huggingface_ner_recognizer.py
2 participants