Fix FI HETU accepting 29 Feb of a non-leap century #2111

Open
jichaowang02-lang wants to merge 1 commit into
data-privacy-stack:mainfrom
jichaowang02-lang:fix/fi-hetu-leap-century-date

@jichaowang02-lang

Contributor

Change Description

validate_result checks the date with:

datetime.strptime(date_part, "%d%m%y")   # only sees the 2-digit year -> "00" is parsed as 2000

The actual century is carried by the 7th character (the separator), which strptime never receives. So 29 Feb of a non-leap century is wrongly accepted:

FiPersonalIdentityCodeRecognizer().analyze("290200+311B", ["FI_PERSONAL_IDENTITY_CODE"])
#   -> recognized at score 1.0

290200+311B denotes 29 February 1800 (+ = the 1800s). 1800 is not a leap year (divisible by 100 but not by 400), so that date does not exist. strptime nevertheless validates it as 29 Feb 2000, which is a leap year.
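The mismatch can be reproduced with the standard library alone. The snippet below is only an illustration of the behaviour described above, not code from the recognizer; the variable names are made up for this example.

```python
from datetime import date, datetime

# "%y" only sees "00" and parses it as 2000, a leap year,
# so the 2-digit check accepts 29 February.
parsed = datetime.strptime("290200", "%d%m%y")
print(parsed.year)  # 2000

# With the century that "+" implies (the 1800s), the same day does not exist.
try:
    date(1800, 2, 29)
    exists_in_1800 = True
except ValueError:
    exists_in_1800 = False

print(exists_in_1800)  # False
```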

Fix

Resolve the century from the separator (+ → 1800s; - Y X W V U → 1900s; A B C D E F → 2000s) and validate the full date. Unknown separators fall back to the previous strptime check, so this change is independent of the separator character-class (it does not touch which separators are allowed).
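A minimal sketch of the approach described above, for readers who want to see its shape without opening the diff. It is not the code in this PR: the names `CENTURY_BY_SEPARATOR` and `is_valid_hetu_date` are hypothetical, and the sketch checks only the date part, not the check character.

```python
from datetime import date, datetime

# Separator -> century base, following the mapping in the description.
CENTURY_BY_SEPARATOR = {"+": 1800}
CENTURY_BY_SEPARATOR.update({ch: 1900 for ch in "-YXWVU"})
CENTURY_BY_SEPARATOR.update({ch: 2000 for ch in "ABCDEF"})


def is_valid_hetu_date(code: str) -> bool:
    """Hypothetical helper: is DDMMYY a real date in the separator's century?"""
    if len(code) < 7:
        return False
    date_part, separator = code[:6], code[6]
    if not (date_part.isascii() and date_part.isdigit()):
        return False

    century = CENTURY_BY_SEPARATOR.get(separator)
    try:
        if century is None:
            # Unknown separator: fall back to the previous 2-digit-year check.
            datetime.strptime(date_part, "%d%m%y")
        else:
            day, month, yy = (int(date_part[i:i + 2]) for i in (0, 2, 4))
            date(century + yy, month, day)  # raises ValueError if the date does not exist
    except ValueError:
        return False
    return True


print(is_valid_hetu_date("290200+311B"))  # False: 29 Feb 1800 does not exist
print(is_valid_hetu_date("131052-308T"))  # True: 13 Oct 1952
```

Keeping the fallback for unknown separators means the sketch never rejects a code that the old check accepted on separator grounds alone, which matches the stated goal of leaving the allowed separator set untouched.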

Checklist

  • I have reviewed the contribution guidelines
  • I have added tests to cover my changes
  • All new and existing tests passed

Tests

$ pytest tests/test_fi_personal_identity_code_recognizer.py -q
32 passed

Adds 290200+311B (expected: not detected), which fails before the fix because it was accepted at score 1.0. Valid codes such as 131052-308T are unchanged.

validate_result checked the date with datetime.strptime(date_part, "%d%m%y"),
which only sees the 2-digit year and maps it to the 2000s. The century is
actually carried by the 7th character (the separator), which strptime never
receives. So 29 Feb of a non-leap century is wrongly accepted: "290200+311B"
denotes 29 February 1800 ("+" = 1800s), but 1800 is not a leap year, yet
strptime validates it as 29 Feb 2000 (which is a leap year).

Resolve the century from the separator and validate the full date. Unknown
separators fall back to the previous strptime check, so this is independent of
the separator character-class handling.

Adds a regression case (290200+311B) that was wrongly accepted at score 1.0.
Copilot AI review requested due to automatic review settings June 27, 2026 16:17

Copilot AI left a comment


Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.
