Investigate use of all trait names vs. preferred trait names in OT evidence generation #384

@apriltuesday

Context of issue:
When we do trait mapping (automated and manual), we use only preferred names, but when we annotate we attempt to use all names. Because we retain previous mappings even when their trait names no longer appear among the preferred names in current ClinVar, obsolete mappings are not just retained but can also be used in annotation without ever being updated.

Example - in ClinVar:

    <TraitSet Type="Disease" ID="6307">
      <Trait ID="4675" Type="Disease">
        <Name>
          <ElementValue Type="Preferred">Malignant tumor of urinary bladder</ElementValue>
          <XRef ID="Bladder+cancer/7822" DB="Genetic Alliance"/>
          <XRef ID="399326009" DB="SNOMED CT"/>
        </Name>
        <Name>
          <ElementValue Type="Alternate">Urinary bladder cancer</ElementValue>
          <XRef ID="MONDO:0001187" DB="MONDO"/>
        </Name>
        <Name>
          <ElementValue Type="Alternate">Urinary Bladder Neoplasms</ElementValue>
          <XRef ID="D001749" DB="MeSH"/>
        </Name>
        <Name>
          <ElementValue Type="Alternate">Bladder cancer</ElementValue>
        </Name>
        <AttributeSet>
          <Attribute Type="keyword">Hereditary cancer syndrome</Attribute>
        </AttributeSet>
        <XRef ID="MONDO:0001187" DB="MONDO"/>
        <XRef ID="C0005684" DB="MedGen"/>
        <XRef Type="MIM" ID="109800" DB="OMIM"/>
      </Trait>
    </TraitSet>
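
To make the preferred vs. alternate distinction concrete, here is a minimal Python sketch that splits a trait's names by the `Type` attribute of each `ElementValue`. It uses a trimmed copy of the record above; `names_by_type` is a hypothetical helper for illustration, not a function from the pipeline.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the ClinVar trait above (XRefs and attributes omitted).
TRAIT_XML = """
<Trait ID="4675" Type="Disease">
  <Name>
    <ElementValue Type="Preferred">Malignant tumor of urinary bladder</ElementValue>
  </Name>
  <Name>
    <ElementValue Type="Alternate">Urinary bladder cancer</ElementValue>
  </Name>
  <Name>
    <ElementValue Type="Alternate">Urinary Bladder Neoplasms</ElementValue>
  </Name>
  <Name>
    <ElementValue Type="Alternate">Bladder cancer</ElementValue>
  </Name>
</Trait>
"""


def names_by_type(trait_elem):
    """Group trait names by the Type attribute of their ElementValue.

    Hypothetical helper: returns e.g. {'Preferred': [...], 'Alternate': [...]}.
    """
    names = {}
    for element_value in trait_elem.findall('./Name/ElementValue'):
        names.setdefault(element_value.get('Type'), []).append(element_value.text)
    return names


trait = ET.fromstring(TRAIT_XML)
names = names_by_type(trait)

# What trait mapping uses today: preferred names only.
preferred_names = names.get('Preferred', [])
# What annotation attempts to use: every name, preferred or alternate.
all_names = [name for group in names.values() for name in group]
```

With this record, `preferred_names` holds the single preferred name, while `all_names` also includes the three alternates that annotation may currently look up.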

In latest mappings:

    # preferred name yields up-to-date mapping
    $ grep -i '^Malignant tumor of urinary bladder' latest_mappings.tsv
    malignant tumor of urinary bladder	http://purl.obolibrary.org/obo/MONDO_0004986	urinary bladder carcinoma

    # alternate name yields obsolete mapping
    $ grep -i '^Urinary bladder cancer' latest_mappings.tsv
    urinary bladder cancer	http://www.ebi.ac.uk/efo/EFO_0000292	bladder carcinoma

In #383 we modified annotated XML generation to use only preferred names, observing that this decreased trait coverage only slightly while significantly decreasing the number of obsolete EFO terms used.

The goal of this issue is to assess the impact of making a similar change for OT evidence string generation (which is more complicated due to how it groups and explodes traits) and, if the impact is acceptable, to make the change.
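
As a rough sketch of how that impact could be measured, the Python below counts covered traits and obsolete terms used under both strategies, using only the example trait and the two mapping rows shown above. Everything here is an assumption for illustration: `summarise`, the dict layout, and the `OBSOLETE_TERMS` set (seeded with the one mapping this issue calls obsolete) are hypothetical, and the sketch ignores the grouping and exploding of traits that real evidence string generation performs.

```python
# Two rows from latest_mappings.tsv as quoted in this issue (name -> ontology URI).
LATEST_MAPPINGS = {
    'malignant tumor of urinary bladder': 'http://purl.obolibrary.org/obo/MONDO_0004986',
    'urinary bladder cancer': 'http://www.ebi.ac.uk/efo/EFO_0000292',
}

# Assumption: the mapping target this issue describes as obsolete.
OBSOLETE_TERMS = {'http://www.ebi.ac.uk/efo/EFO_0000292'}

# The example trait, with names split into preferred and alternate.
TRAITS = [
    {
        'preferred': ['Malignant tumor of urinary bladder'],
        'alternate': ['Urinary bladder cancer', 'Urinary Bladder Neoplasms', 'Bladder cancer'],
    },
]


def mapped_terms(names, mappings):
    """Return the ontology terms found for any of the given names (case-insensitive)."""
    return {mappings[name.lower()] for name in names if name.lower() in mappings}


def summarise(traits, mappings, obsolete, preferred_only):
    """Return (number of traits with a mapping, number of distinct obsolete terms used)."""
    covered = 0
    obsolete_used = set()
    for trait in traits:
        names = list(trait['preferred'])
        if not preferred_only:
            names += trait['alternate']
        terms = mapped_terms(names, mappings)
        if terms:
            covered += 1
        obsolete_used |= terms & obsolete
    return covered, len(obsolete_used)


all_names_result = summarise(TRAITS, LATEST_MAPPINGS, OBSOLETE_TERMS, preferred_only=False)
preferred_result = summarise(TRAITS, LATEST_MAPPINGS, OBSOLETE_TERMS, preferred_only=True)
```

For this single trait both strategies cover it, but only the all-names strategy picks up the obsolete term. Running the same comparison over a full ClinVar release would show the coverage and obsolete-term trade-off for evidence strings.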
