How to deal with the same proteins with slightly different names from the RefSeq_bac database? #52

@monaparizadeh

Hi Sam,

In my statistical analysis, some functions from the RefSeq_bac database are being treated as different proteins only because of small differences in their names, such as a dash (e.g. "(3R)-hydroxymyristoyl ACP dehydratase" vs. "(3R)-hydroxymyristoyl-ACP dehydratase"), a comma, or lowercase/uppercase letters (e.g. "(2fe-2S)-binding domain-containing protein" vs. "(2Fe-2S)-binding domain-containing protein").
Others are partial and complete sequences of the same protein (e.g. "(2Fe-2S) ferredoxin" and "(2Fe-2S) ferredoxin, partial").

Do you correct these names in the database itself, or after the annotation-aggregation step? If so, could you please guide me on how to do it?

-Mona
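
For illustration, here is a minimal sketch of how such variants could be collapsed after annotation aggregation, before the statistics are run. It assumes the aggregated results can be loaded as a mapping from protein name to count; `normalize_name` and `merge_counts` are hypothetical helper names, not part of any existing pipeline. The rules only cover the kinds of differences listed above (case, dashes, commas, and a trailing ", partial"), and whether partial and complete entries should really be merged is a judgment call for the analysis.

```python
import re


def normalize_name(name: str) -> str:
    """Build a comparison key for a protein name (hypothetical helper)."""
    key = name.strip().casefold()
    # Drop a trailing ", partial" so fragments group with the full protein.
    key = re.sub(r",\s*partial$", "", key)
    # Treat dashes and commas as spaces, so "X-ACP" and "X ACP" match.
    key = re.sub(r"[-,]", " ", key)
    # Collapse runs of whitespace left over from the substitutions.
    return re.sub(r"\s+", " ", key).strip()


def merge_counts(counts: dict[str, int]) -> dict[str, int]:
    """Sum counts of names sharing a key, labelled by the first spelling seen."""
    labels: dict[str, str] = {}
    merged: dict[str, int] = {}
    for name, n in counts.items():
        label = labels.setdefault(normalize_name(name), name)
        merged[label] = merged.get(label, 0) + n
    return merged


# Example counts are made up for illustration only.
example = {
    "(2Fe-2S) ferredoxin": 3,
    "(2Fe-2S) ferredoxin, partial": 2,
    "(3R)-hydroxymyristoyl ACP dehydratase": 4,
    "(3R)-hydroxymyristoyl-ACP dehydratase": 1,
}
merged_example = merge_counts(example)
```

Because dashes are replaced everywhere, two names that genuinely differ only by a dash would also be merged, so it is worth reviewing the grouped names before trusting the merged counts.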
