fix: dosage proximity assignment, RxNorm threshold, non-Latin detection by SPerekrestova · Pull Request #24 · SPerekrestova/pillchecker-api

SPerekrestova · 2026-03-25T10:41:43Z

Summary

- Add character offset to Dosage dataclass for position-aware matching
- Assign dosages to drugs by proximity (nearest within 50 chars) instead of all drugs sharing the first dosage (MED-2)
- Raise RxNorm fallback threshold from 6.0 to 10.0 to eliminate false positives like "Take Action" (HIGH-4)
- Detect predominantly non-Latin text and return explanatory note instead of empty results (LOW-2)
- Add note field to AnalyzeResponse schema
- Align dosage/NER coordinate systems by extracting dosages from cleaned text (from code review)

Test plan

- 153 tests pass (5 new tests for proximity, threshold, non-Latin)
- Deploy and test multi-drug text: "Lisinopril 10mg, Metformin 500mg". Each drug should get its own dosage
- Verify "Take 1 tablet daily" no longer matches the drug "Take Action"
- Verify Chinese/Arabic text returns a note explaining Latin-only support

🤖 Generated with Claude Code

The start field records where each dosage appears in the source text, enabling position-based drug-to-dosage assignment in the next commit. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
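As a rough illustration of what a position-aware `Dosage` might look like, the sketch below records each match's offset in a `start` field. Only the `Dosage` name and its `start` field come from this PR; the `value` and `unit` fields, the regex, and the `extract_dosages` helper are assumptions made for the example.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern: a number followed by a common unit.
DOSAGE_RE = re.compile(r"\b(\d+(?:\.\d+)?)\s*(mg|mcg|g|ml|iu)\b", re.IGNORECASE)


@dataclass(frozen=True)
class Dosage:
    value: str  # assumed field, e.g. "10"
    unit: str   # assumed field, e.g. "mg"
    start: int  # character offset of the match in the source text


def extract_dosages(text: str) -> list[Dosage]:
    """Return every dosage found in text, in order of appearance."""
    return [
        Dosage(m.group(1), m.group(2).lower(), m.start())
        for m in DOSAGE_RE.finditer(text)
    ]
```

For the test-plan input `"Lisinopril 10mg, Metformin 500mg"` this sketch reports offsets 11 and 27.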

Each drug now gets the dosage nearest to it in the text, rather than all drugs sharing the first dosage found. Uses character offsets from the Dosage dataclass. Max distance threshold: 50 chars. Fixes QA report MED-2. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
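The proximity rule described above could be sketched as follows. The 50-character limit is from this PR; the function signature, the `text` field, and the way distance is measured (the gap between the drug span and the dosage span) are assumptions, since the PR does not say which measure it uses.

```python
from dataclasses import dataclass
from typing import Optional

MAX_DISTANCE = 50  # threshold stated in this PR


@dataclass(frozen=True)
class Dosage:
    text: str   # assumed field, e.g. "10mg"
    start: int  # character offset in the source text

    @property
    def end(self) -> int:
        return self.start + len(self.text)


def span_gap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Characters between two half-open spans; 0 if they touch or overlap."""
    return max(0, b_start - a_end, a_start - b_end)


def nearest_dosage(
    drug_start: int,
    drug_end: int,
    dosages: list[Dosage],
    max_distance: int = MAX_DISTANCE,
) -> Optional[Dosage]:
    """Pick the dosage closest to the drug span, or None if none is near enough."""
    best = min(
        dosages,
        key=lambda d: span_gap(drug_start, drug_end, d.start, d.end),
        default=None,
    )
    if best is None:
        return None
    if span_gap(drug_start, drug_end, best.start, best.end) > max_distance:
        return None
    return best
```

The choice of measure matters for the test-plan example: `Metformin` starts 6 characters after the start of `10mg` but 10 characters before the start of `500mg`, so a plain start-to-start distance would pick the wrong dosage, while the span gap picks `500mg`.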

Eliminates false positive drug matches like 'Take Action' (score 8.9) and 'Tice BCG' (score 7.5) which occurred when instruction text or dosage-only text was sent to the RxNorm approximate search. Existing tests updated to use scores above the new threshold so they continue to test the intended behavior (empty-name filtering). Fixes QA report HIGH-4. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
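A minimal sketch of the score gate, assuming approximate-search candidates arrive as dictionaries with `name` and `score` keys. That shape, the `best_candidate` helper, and the inclusive `>=` comparison are assumptions; only the 10.0 threshold, the quoted scores, and the empty-name filtering come from this PR.

```python
from typing import Optional

RXNORM_MIN_SCORE = 10.0  # raised from 6.0 in this PR


def best_candidate(
    candidates: list[dict],
    min_score: float = RXNORM_MIN_SCORE,
) -> Optional[dict]:
    """Return the highest-scoring named candidate at or above min_score."""
    eligible = [
        c for c in candidates
        if c.get("name") and float(c["score"]) >= min_score
    ]
    return max(eligible, key=lambda c: float(c["score"]), default=None)
```

With this gate, a candidate such as `{"name": "Take Action", "score": 8.9}` is rejected, whereas the old 6.0 threshold would have let it through.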

When input text is predominantly non-Latin characters, return empty drugs list with a note explaining that only Latin-script drug names are supported. Mixed-script text with mostly Latin still processes normally. Fixes QA report LOW-2. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
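One way to detect predominantly non-Latin input is to compare the share of non-Latin letters against a cutoff. The sketch below is an assumption-laden illustration: the 0.5 cutoff, the helper names, the dictionary response shape, and the note wording are all hypothetical, and the PR does not say how "predominantly" is defined.

```python
import unicodedata
from typing import Optional

# Hypothetical wording; the actual note text is not shown in this PR.
NON_LATIN_NOTE = "Only Latin-script drug names are supported."


def is_predominantly_non_latin(text: str, cutoff: float = 0.5) -> bool:
    """True when more than `cutoff` of the letters are not Latin-script."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False  # digits or punctuation only: nothing to judge
    non_latin = sum(
        1 for ch in letters if "LATIN" not in unicodedata.name(ch, "")
    )
    return non_latin / len(letters) > cutoff


def non_latin_response(text: str) -> Optional[dict]:
    """Return an early response for non-Latin input, else None to continue."""
    if is_predominantly_non_latin(text):
        return {"drugs": [], "note": NON_LATIN_NOTE}
    return None
```

Digits and punctuation are ignored, so a dosage such as `100mg` next to a Chinese drug name does not tip the text toward Latin on its digits alone.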

Extract dosages from cleaned text (not original) so character offsets match NER entity positions. Also type _nearest_dosage and _enrich_ner_results params as list[Dosage]. Caught by code review. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
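The offset mismatch is easy to reproduce. In the sketch below, `clean` is a hypothetical stand-in that collapses whitespace (the PR does not describe what its cleaning step does), and `dosage_offset` is an illustrative helper, not code from the repository.

```python
import re
from typing import Optional

DOSE_RE = re.compile(r"\d+\s*mg")


def clean(text: str) -> str:
    """Hypothetical cleaning step: collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text).strip()


def dosage_offset(text: str) -> Optional[int]:
    """Offset of the first dosage in text, or None if there is none."""
    match = DOSE_RE.search(text)
    return match.start() if match else None


raw = "Lisinopril\n\n\n   10mg"
cleaned = clean(raw)

raw_offset = dosage_offset(raw)          # position in the original text
cleaned_offset = dosage_offset(cleaned)  # position in the text NER sees
```

Here the dosage sits at offset 16 in the raw text but at offset 11 in the cleaned text. If NER entity positions refer to the cleaned text, only the second offset is comparable with them.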

Combines dosage/fallback changes with input hardening (PR #22) and infrastructure (PR #23) changes. All 160 tests pass. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

SPerekrestova and others added 6 commits March 24, 2026 21:54

feat: add character offset to Dosage dataclass

4b7c569

merge: resolve conflicts with main (input hardening + infrastructure)

463d4ba

SPerekrestova merged commit 8b53f4c into main Mar 25, 2026
1 check passed

SPerekrestova deleted the fix/dosage-and-fallback branch March 25, 2026 10:46

SPerekrestova merged 6 commits into main from fix/dosage-and-fallback
