fix: dosage proximity assignment, RxNorm threshold, non-Latin detection #24

Merged
SPerekrestova merged 6 commits into main from fix/dosage-and-fallback
Mar 25, 2026

Conversation

@SPerekrestova
Owner

Summary

  • Add character offset to Dosage dataclass for position-aware matching
  • Assign dosages to drugs by proximity (nearest within 50 chars) instead of all drugs sharing the first dosage (MED-2)
  • Raise RxNorm fallback threshold from 6.0 to 10.0 to eliminate false positives like "Take Action" (HIGH-4)
  • Detect predominantly non-Latin text and return an explanatory note instead of empty results (LOW-2)
  • Add note field to AnalyzeResponse schema
  • Align dosage/NER coordinate systems by extracting dosages from cleaned text (from code review)

Test plan

  • 153 tests pass (5 new tests for proximity, threshold, non-Latin)
  • Deploy and test multi-drug text: "Lisinopril 10mg, Metformin 500mg" — each drug should get its own dosage
  • Verify "Take 1 tablet daily" no longer matches the "Take Action" drug
  • Verify Chinese/Arabic text returns a note explaining Latin-only support

🤖 Generated with Claude Code

SPerekrestova and others added 6 commits March 24, 2026 21:54
The start field records where each dosage appears in the source text,
enabling position-based drug-to-dosage assignment in the next commit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
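
A minimal sketch of what a position-aware `Dosage` might look like. Only the `Dosage` name and its `start` field come from this PR; the `value` field, the regex, and `extract_dosages` are hypothetical stand-ins for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Dosage:
    value: str  # hypothetical field: the matched dosage text, e.g. "10mg"
    start: int  # character offset of the dosage in the source text


# Hypothetical pattern; the project's real dosage regex is not shown in the PR.
_DOSAGE_RE = re.compile(r"\d+(?:\.\d+)?\s?(?:mg|mcg|g|ml)\b", re.IGNORECASE)


def extract_dosages(text: str) -> list[Dosage]:
    """Return every dosage found in `text` together with its offset."""
    return [Dosage(m.group(0), m.start()) for m in _DOSAGE_RE.finditer(text)]


dosages = extract_dosages("Lisinopril 10mg, Metformin 500mg")
```

Recording `start` at extraction time means later stages can compare positions without searching the text again.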
Each drug now gets the dosage nearest to it in the text, rather than
all drugs sharing the first dosage found. Uses character offsets from
the Dosage dataclass. Max distance threshold: 50 chars.

Fixes QA report MED-2.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
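
The proximity rule could be sketched roughly as follows. The PR only states "nearest within 50 chars"; how distance is measured is an assumption here (gap between the drug span and the dosage start, with the limit treated as inclusive), and `nearest_dosage` is a hypothetical helper, not the project's `_nearest_dosage`.

```python
from dataclasses import dataclass
from typing import Optional

MAX_DISTANCE = 50  # max distance threshold from the commit message


@dataclass
class Dosage:
    value: str  # hypothetical field
    start: int  # character offset in the text


def _gap(drug_start: int, drug_end: int, dosage: Dosage) -> int:
    """Assumed distance: characters between the drug span and the dosage start."""
    if dosage.start >= drug_end:
        return dosage.start - drug_end  # dosage follows the drug
    return max(drug_start - dosage.start, 0)  # dosage precedes (or overlaps) the drug


def nearest_dosage(
    drug_start: int,
    drug_end: int,
    dosages: list[Dosage],
    max_distance: int = MAX_DISTANCE,
) -> Optional[Dosage]:
    """Return the closest dosage within `max_distance`, or None if none qualifies."""
    best: Optional[Dosage] = None
    best_gap: Optional[int] = None
    for dosage in dosages:
        gap = _gap(drug_start, drug_end, dosage)
        if gap <= max_distance and (best_gap is None or gap < best_gap):
            best, best_gap = dosage, gap
    return best


# Offsets for "Lisinopril 10mg, Metformin 500mg"
dosages = [Dosage("10mg", 11), Dosage("500mg", 27)]
lisinopril = nearest_dosage(0, 10, dosages)
metformin = nearest_dosage(17, 26, dosages)
```

Measuring from the end of the drug span matters in this example: a plain start-to-start distance would place Metformin (start 17) closer to "10mg" (start 11) than to its own "500mg" (start 27).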
Eliminates false positive drug matches like 'Take Action' (score 8.9)
and 'Tice BCG' (score 7.5) which occurred when instruction text or
dosage-only text was sent to the RxNorm approximate search.

Existing tests updated to use scores above the new threshold so they
continue to test the intended behavior (empty-name filtering).

Fixes QA report HIGH-4.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
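
For illustration, the threshold filter might look like the sketch below. The candidate dict shape, the `filter_candidates` name, and the inclusive comparison are assumptions. The "Take Action" and "Tice BCG" scores are the ones quoted above; the other entries are made-up illustrative values.

```python
RXNORM_MIN_SCORE = 10.0  # raised from 6.0 in this PR


def filter_candidates(candidates: list[dict], min_score: float = RXNORM_MIN_SCORE) -> list[dict]:
    """Keep candidates that have a non-empty name and a score at or above min_score."""
    kept = []
    for candidate in candidates:
        name = (candidate.get("name") or "").strip()
        score = float(candidate.get("score", 0.0))
        if name and score >= min_score:
            kept.append(candidate)
    return kept


candidates = [
    {"name": "Take Action", "score": 8.9},  # false positive cited in the commit
    {"name": "Tice BCG", "score": 7.5},     # false positive cited in the commit
    {"name": "", "score": 12.0},            # illustrative: empty name is always dropped
    {"name": "Metformin", "score": 11.2},   # illustrative score above the threshold
]

kept_names = [c["name"] for c in filter_candidates(candidates)]
old_names = [c["name"] for c in filter_candidates(candidates, min_score=6.0)]
```

With the old 6.0 threshold both false positives pass the filter; at 10.0 only the illustrative high-scoring match remains.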
When input text is predominantly non-Latin characters, return empty
drugs list with a note explaining that only Latin-script drug names
are supported. Mixed-script text with mostly Latin still processes
normally.

Fixes QA report LOW-2.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
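
One way to detect predominantly non-Latin input is to classify letters by their Unicode character names. This is a sketch under assumptions: the 0.5 cutoff, the function names, the response dict shape, and the note wording are all hypothetical, since the PR only says "predominantly".

```python
import unicodedata

# Hypothetical wording; the actual note text is not shown in the PR.
NON_LATIN_NOTE = "Only Latin-script drug names are supported."


def is_predominantly_non_latin(text: str, threshold: float = 0.5) -> bool:
    """True when more than `threshold` of the alphabetic characters are non-Latin."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False  # digits/punctuation only: nothing to classify
    non_latin = sum(1 for ch in letters if "LATIN" not in unicodedata.name(ch, ""))
    return non_latin / len(letters) > threshold


def analyze_stub(text: str) -> dict:
    """Hypothetical early-exit: skip extraction and attach a note for non-Latin input."""
    if is_predominantly_non_latin(text):
        return {"drugs": [], "note": NON_LATIN_NOTE}
    return {"drugs": [], "note": None}  # real extraction would run here
```

Counting only alphabetic characters keeps digits, units, and punctuation from skewing the ratio, so mixed input such as "Metformin 500mg 每天" is still treated as mostly Latin.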
Extract dosages from cleaned text (not original) so character offsets
match NER entity positions. Also type _nearest_dosage and
_enrich_ner_results params as list[Dosage]. Caught by code review.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
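
The sketch below shows why the two coordinate systems can disagree. The cleaning step is an assumption (whitespace collapsing via a hypothetical `clean_text`); the PR does not say what the real cleaning does, only that NER positions refer to the cleaned text.

```python
import re


def clean_text(text: str) -> str:
    """Hypothetical cleaning step: collapse whitespace runs and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


original = "Lisinopril    10mg,\n\n  Metformin 500mg"
cleaned = clean_text(original)

# The same dosage sits at different offsets in the two strings.
offset_in_original = original.find("500mg")
offset_in_cleaned = cleaned.find("500mg")
shift = offset_in_original - offset_in_cleaned
```

If dosage offsets came from `original` while entity offsets came from `cleaned`, every distance after the first collapsed run would be off by the accumulated shift, which can change the nearest-dosage result or push a valid pair past the 50-character limit.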
Combines dosage/fallback changes with input hardening (PR #22) and
infrastructure (PR #23) changes. All 160 tests pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@SPerekrestova SPerekrestova merged commit 8b53f4c into main Mar 25, 2026
1 check passed
@SPerekrestova SPerekrestova deleted the fix/dosage-and-fallback branch March 25, 2026 10:46