fix: dosage proximity assignment, RxNorm threshold, non-Latin detection #24
Merged
SPerekrestova merged 6 commits into main on Mar 25, 2026
The start field records where each dosage appears in the source text, enabling position-based drug-to-dosage assignment in the next commit. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Each drug now gets the dosage nearest to it in the text, rather than all drugs sharing the first dosage found. Uses character offsets from the Dosage dataclass. Max distance threshold: 50 chars. Fixes QA report MED-2. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
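A minimal sketch of the proximity assignment described in this commit, not the code from the diff. It assumes distance is measured between start offsets, and the `text` field and the `nearest_dosage` name are hypothetical; only the `start` field and the 50-character limit come from the commit messages.

```python
from dataclasses import dataclass
from typing import Optional

MAX_DISTANCE = 50  # characters; the threshold named in the commit message


@dataclass(frozen=True)
class Dosage:
    text: str   # hypothetical field: the matched dosage string
    start: int  # character offset of the dosage in the source text


def nearest_dosage(
    drug_start: int,
    dosages: list[Dosage],
    max_distance: int = MAX_DISTANCE,
) -> Optional[Dosage]:
    """Return the dosage closest to drug_start, or None if none is in range.

    Assumption: distance is the absolute difference between start offsets.
    """
    best: Optional[Dosage] = None
    best_dist: Optional[int] = None
    for dosage in dosages:
        dist = abs(dosage.start - drug_start)
        if dist > max_distance:
            continue  # too far away to belong to this drug
        if best_dist is None or dist < best_dist:
            best, best_dist = dosage, dist
    return best


text = "aspirin 100 mg and ibuprofen 200 mg"
dosages = [
    Dosage("100 mg", text.index("100 mg")),
    Dosage("200 mg", text.index("200 mg")),
]
aspirin_dose = nearest_dosage(text.index("aspirin"), dosages)
ibuprofen_dose = nearest_dosage(text.index("ibuprofen"), dosages)
```

With this input each drug picks up its own dosage instead of both sharing "100 mg", and a drug more than 50 characters from every dosage gets none.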
Eliminates false positive drug matches like 'Take Action' (score 8.9) and 'Tice BCG' (score 7.5) which occurred when instruction text or dosage-only text was sent to the RxNorm approximate search. Existing tests updated to use scores above the new threshold so they continue to test the intended behavior (empty-name filtering). Fixes QA report HIGH-4. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
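The filtering idea can be sketched as below. The commit message does not state the threshold value, so `MIN_SCORE = 10.0` is a placeholder assumption chosen only to sit above the two reported false positives; `Candidate` and `filter_candidates` are likewise hypothetical names, not the project's API.

```python
from dataclasses import dataclass

# Placeholder value: the real threshold is not given in the commit message.
MIN_SCORE = 10.0


@dataclass(frozen=True)
class Candidate:
    name: str     # drug name returned by the approximate search
    score: float  # match score returned alongside it


def filter_candidates(
    candidates: list[Candidate],
    min_score: float = MIN_SCORE,
) -> list[Candidate]:
    """Keep candidates with a non-empty name and a score at or above min_score."""
    return [
        c for c in candidates
        if c.name.strip() and c.score >= min_score
    ]


candidates = [
    Candidate("Take Action", 8.9),  # false positive cited in the commit
    Candidate("Tice BCG", 7.5),     # false positive cited in the commit
    Candidate("ibuprofen", 12.0),   # hypothetical high-confidence match
    Candidate("", 15.0),            # empty name is dropped regardless of score
]
kept = filter_candidates(candidates)
```

Keeping the empty-name check separate from the score check is why the existing tests needed scores above the new threshold: otherwise the score filter would reject the candidate before the empty-name filter was exercised.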
When input text is predominantly non-Latin characters, return empty drugs list with a note explaining that only Latin-script drug names are supported. Mixed-script text with mostly Latin still processes normally. Fixes QA report LOW-2. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
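One way to implement such a check, as a sketch under stated assumptions: "predominantly" is taken to mean more than half of the alphabetic characters, and the script is inferred from Unicode character names. The function name and the 0.5 ratio are hypothetical and not taken from the diff.

```python
import unicodedata


def is_predominantly_non_latin(text: str, threshold: float = 0.5) -> bool:
    """Return True when more than `threshold` of the letters are non-Latin.

    Digits, punctuation, and whitespace are ignored, so "200 mg" counts
    as two Latin letters. Text without any letters is treated as Latin.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False
    non_latin = sum(
        1 for ch in letters
        if "LATIN" not in unicodedata.name(ch, "")
    )
    return non_latin / len(letters) > threshold


cyrillic_only = is_predominantly_non_latin("ибупрофен 200 мг")
latin_only = is_predominantly_non_latin("Take ibuprofen 200 mg")
mostly_latin = is_predominantly_non_latin("ibuprofen 200 mg (ибупрофен)")
```

A caller would return an empty drugs list together with the explanatory note when the check is true, and fall through to normal processing otherwise.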
Extract dosages from cleaned text (not original) so character offsets match NER entity positions. Also type _nearest_dosage and _enrich_ner_results params as list[Dosage]. Caught by code review. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
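The offset mismatch this commit fixes can be illustrated with a small sketch. The cleaning step (whitespace collapsing) and the dosage regex are assumptions made for the example; the project's actual cleaning and extraction logic is not shown in this page.

```python
import re

# Hypothetical dosage pattern, for illustration only.
DOSAGE_RE = re.compile(r"\d+(?:\.\d+)?\s*(?:mg|mcg|ml|g)\b", re.IGNORECASE)


def clean_text(text: str) -> str:
    """Assumed cleaning step: collapse whitespace runs to single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def dosage_offsets(text: str) -> list[tuple[str, int]]:
    """Return (matched text, start offset) for every dosage found in text."""
    return [(m.group(0), m.start()) for m in DOSAGE_RE.finditer(text)]


raw = "Ibuprofen    \n\n  200 mg"
cleaned = clean_text(raw)

raw_offsets = dosage_offsets(raw)          # offsets into the original text
clean_offsets = dosage_offsets(cleaned)    # offsets into the cleaned text
```

If NER runs on the cleaned text, its entity positions refer to `cleaned`, so only `clean_offsets` can be compared against them; offsets taken from `raw` drift by however much whitespace was removed.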
Summary
- Dosage dataclass for position-aware matching
- note field to AnalyzeResponse schema

Test plan
🤖 Generated with Claude Code