Reducing RAM footprint + Adding preferred term output #67
fschlatt wants to merge 12 commits into Georgetown-IR-Lab:master from
Hmm, after finding a bug in my code that excluded a good portion of terms from the simstring DB, the RAM reduction isn't as large as I had hoped. A couple of things are fixed, but large UMLS sets still cannot be processed.
Hi! Active Subset: excludes "legacy" sources that have not been updated for several years in the UMLS Metathesaurus.
Hello, thank you for your commit, but how can we get the synonyms and the source (SNOMED, MSH, etc.)?
The preferred term is now also returned for every match (useful for normalizing terms in text).
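To illustrate how a preferred term per match can be used for normalization, here is a minimal sketch that replaces each matched span with its preferred term. It assumes each match is a dict with character offsets `start`/`end` and a preferred-term string under a hypothetical key `preferred_term`; the actual key name in this PR's output may differ. Overlapping matches are resolved by keeping the earliest-starting, longest span, which is one reasonable policy among several.

```python
from typing import Dict, List


def normalize_text(text: str, matches: List[Dict]) -> str:
    """Replace each matched span in `text` with its preferred term.

    Assumes each match has 'start' and 'end' (character offsets, end
    exclusive) and a hypothetical 'preferred_term' key.
    """
    # Earliest start first; for equal starts, the longest span first.
    ordered = sorted(matches, key=lambda m: (m["start"], -(m["end"] - m["start"])))
    pieces = []
    cursor = 0
    for m in ordered:
        if m["start"] < cursor:
            continue  # overlaps a span that was already replaced
        pieces.append(text[cursor:m["start"]])
        pieces.append(m["preferred_term"])
        cursor = m["end"]
    pieces.append(text[cursor:])
    return "".join(pieces)


text = "Patient has heart attack and high blood pressure."
matches = [
    {"start": 12, "end": 24, "preferred_term": "myocardial infarction"},
    {"start": 29, "end": 48, "preferred_term": "hypertension"},
]
print(normalize_text(text, matches))
# Patient has myocardial infarction and hypertension.
```

The preferred terms in the example are illustrative strings, not values taken from a UMLS release.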
The RAM footprint is reduced by removing the sets in which all terms were accumulated. Instead, only a set of already-saved terms is kept per concept. As a consequence, duplicate terms can be inserted into the simstring database when two identical terms belong to different UMLS concepts. As a fix, duplicates returned from the simstring database are removed at match time.
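The indexing trade-off described above can be sketched as follows. This is not the PR's code: the function names are hypothetical, a plain list stands in for the simstring database writer, and it assumes `(cui, term)` rows arrive grouped by CUI (e.g. when streaming MRCONSO.RRF in order). Only the current concept's seen-term set is held in memory, so a term shared by two concepts is written twice and must be deduplicated when candidates are retrieved.

```python
from itertools import groupby
from typing import Iterable, List, Tuple


def build_simstring_entries(rows: Iterable[Tuple[str, str]]) -> List[str]:
    """Collect terms to insert into the simstring DB.

    Assumes rows are (cui, term) pairs already grouped by CUI.
    """
    entries: List[str] = []  # stand-in for a simstring database writer
    for _cui, group in groupby(rows, key=lambda r: r[0]):
        seen = set()  # per-concept set, discarded when the next CUI starts
        for _, term in group:
            if term not in seen:
                seen.add(term)
                entries.append(term)  # may repeat a term from another CUI
    return entries


def dedupe_candidates(candidates: Iterable[str]) -> List[str]:
    """Drop duplicate simstring hits at match time, preserving order."""
    return list(dict.fromkeys(candidates))


rows = [
    ("C1", "cold"), ("C1", "cold"), ("C1", "common cold"),
    ("C2", "cold"), ("C2", "low temperature"),
]
entries = build_simstring_entries(rows)
print(entries)                     # ['cold', 'common cold', 'cold', 'low temperature']
print(dedupe_candidates(entries))  # ['cold', 'common cold', 'low temperature']
```

The design exchanges a global set of every term (large for full UMLS) for a small per-concept set, at the cost of some duplicate simstring entries and a cheap dedup step per lookup.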