
Security: Potential Regular Expression Denial of Service (ReDoS) #2095

Open
tuanaiseo wants to merge 1 commit into data-privacy-stack:main from tuanaiseo:contribai/fix/security/potential-regular-expression-denial-of-s


@tuanaiseo


Problem

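Multiple recognizers use complex regular expressions that could be vulnerable to ReDoS attacks if an attacker can supply crafted input. The EmailRecognizer in particular uses a pattern whose quantified character classes overlap: in the local part, a repeated class that accepts `.` and word characters is followed by a second class that accepts the same word characters, and in the domain, `\w+([-.]\w+)*\.\w+` lets the literal `\.` line up with any dot the repetition has already consumed. On long inputs this could cause excessive (catastrophic) backtracking.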

Severity: medium
File: presidio-analyzer/presidio_analyzer/predefined_recognizers/generic/email_recognizer.py

Solution

Review and optimize regex patterns to prevent catastrophic backtracking. Consider using a regex timeout or implementing pattern complexity limits. Use more specific character classes and avoid nested quantifiers where possible.
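A minimal, standard-library-only sketch for checking whether a pattern backtracks excessively is to time a single `re.search` over a long input that contains no `@`, so every match attempt must fail. The input shape `"a." * n` and the helper name `time_search` are assumptions chosen for illustration; they are not part of Presidio. Python's built-in `re` module has no timeout, so this only measures; if an actual limit is wanted, the third-party `regex` package documents a `timeout` keyword argument.

```python
import re
import time

# Patterns copied verbatim from the diff in this PR (old = before, new = proposed).
OLD_EMAIL = r"\b((([!#$%&'*+\-/=?^_`{|}~\w])|([!#$%&'*+\-/=?^_`{|}~\w][!#$%&'*+\-/=?^_`{|}~\.\w]{0,}[!#$%&'*+\-/=?^_`{|}~\w]))[@]\w+([-.]\w+)*\.\w+([-.]\w+)*)\b"
NEW_EMAIL = r"\b[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]+@[A-Za-z0-9]+([-.][A-Za-z0-9]+)*\.[A-Za-z]{2,}\b"


def time_search(pattern: str, text: str) -> tuple[float, bool]:
    """Return (elapsed seconds, whether a match was found) for one re.search call.

    Hypothetical helper: stdlib `re` cannot abort a slow search, so this only measures.
    """
    compiled = re.compile(pattern)  # compile outside the timed region
    start = time.perf_counter()
    found = compiled.search(text) is not None
    return time.perf_counter() - start, found


if __name__ == "__main__":
    # Hypothetical adversarial input: no "@", and a word boundary before every "a".
    for n in (500, 1000, 2000):
        text = "a." * n
        old_s, old_found = time_search(OLD_EMAIL, text)
        new_s, new_found = time_search(NEW_EMAIL, text)
        print(f"n={n}: old={old_s:.4f}s new={new_s:.4f}s matched={old_found or new_found}")
```

On this input the old pattern's repeated local-part class accepts `.`, so from every word boundary it scans to the end of the string and then backtracks, which should make its time grow roughly quadratically with `n`. The new local-part class excludes `.`, so each attempt fails after one character. Absolute timings depend on the machine and Python version.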

Changes

  • presidio-analyzer/presidio_analyzer/predefined_recognizers/generic/email_recognizer.py (modified)

Testing

  • Existing tests pass
  • Manual review completed
  • No new warnings/errors introduced


Signed-off-by: tuanaiseo <221258316+tuanaiseo@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 23, 2026 23:14

Copilot AI left a comment


Pull request overview

This PR addresses a potential Regular Expression Denial of Service (ReDoS) risk in Presidio Analyzer’s EmailRecognizer by simplifying the email detection regex used by the generic predefined recognizers.

Changes:

  • Replaced the previous complex email-matching regex with a simpler, more ReDoS-resistant pattern.
  • Kept the recognizer structure and tldextract-based validate_result() behavior unchanged.

 Pattern(
     "Email (Medium)",
-    r"\b((([!#$%&'*+\-/=?^_`{|}~\w])|([!#$%&'*+\-/=?^_`{|}~\w][!#$%&'*+\-/=?^_`{|}~\.\w]{0,}[!#$%&'*+\-/=?^_`{|}~\w]))[@]\w+([-.]\w+)*\.\w+([-.]\w+)*)\b",
+    r"\b[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]+@[A-Za-z0-9]+([-.][A-Za-z0-9]+)*\.[A-Za-z]{2,}\b",
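The diff only shows the pattern strings, so a quick way to compare their behavior is to run both through Python's `re` module on a few sample strings. This is a sketch: the sample sentences and the `first_match` helper are hypothetical, and Presidio's own `PatternRecognizer` may compile patterns with different flags (or a different regex engine), so the spans it reports could differ.

```python
import re
from typing import Optional

# Patterns copied verbatim from the diff in this PR (old = before, new = proposed).
OLD_EMAIL = r"\b((([!#$%&'*+\-/=?^_`{|}~\w])|([!#$%&'*+\-/=?^_`{|}~\w][!#$%&'*+\-/=?^_`{|}~\.\w]{0,}[!#$%&'*+\-/=?^_`{|}~\w]))[@]\w+([-.]\w+)*\.\w+([-.]\w+)*)\b"
NEW_EMAIL = r"\b[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]+@[A-Za-z0-9]+([-.][A-Za-z0-9]+)*\.[A-Za-z]{2,}\b"


def first_match(pattern: str, text: str) -> Optional[str]:
    """Return the first substring matched by `pattern`, or None (hypothetical helper)."""
    match = re.search(pattern, text)
    return match.group(0) if match else None


# Hypothetical sample sentences; not taken from Presidio's test suite.
SAMPLES = [
    "Contact john.doe@example.com today",
    "Write to support@example.org.",
]

if __name__ == "__main__":
    for sample in SAMPLES:
        print(repr(sample))
        print("  old:", first_match(OLD_EMAIL, sample))
        print("  new:", first_match(NEW_EMAIL, sample))
```

Both patterns return `support@example.org` for the second sample. For the first, the old pattern returns `john.doe@example.com` while the new one returns `doe@example.com`: `.` is not in the new local-part class, and `\b` lets the match start right after the dot.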