Both the large and medium datasets on /allen contain duplicate gene names whose values differ. Assuming these samples aren't outliers, we need a step in the pipeline that de-duplicates gene names: the duplicate IDs are not only confusing for users but also cause projection to crash.
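As a starting point for discussion, here is a minimal sketch of one possible approach: keep every row and make repeated names unique by appending a numeric suffix. This is only an assumption about what the de-duplication could look like; merging or dropping duplicate rows are equally valid options that this issue has not decided between. The function names `duplicated_names` and `make_unique` and the `-` separator are hypothetical and do not refer to existing pipeline code.

```python
from collections import Counter


def duplicated_names(names):
    """Return the sorted gene names that occur more than once (useful for validation)."""
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)


def make_unique(names, sep="-"):
    """Return a copy of `names` where repeats get a numeric suffix.

    The first occurrence keeps its name; later occurrences become
    name + sep + index. Suffixed names that would collide with a name
    already in use are skipped.
    """
    taken = set(names)   # every name already in use, original or generated
    next_idx = {}        # next suffix to try for each name seen so far
    out = []
    for name in names:
        if name not in next_idx:
            # First occurrence: keep the name unchanged.
            next_idx[name] = 1
            out.append(name)
            continue
        idx = next_idx[name]
        candidate = f"{name}{sep}{idx}"
        while candidate in taken:
            idx += 1
            candidate = f"{name}{sep}{idx}"
        taken.add(candidate)
        next_idx[name] = idx + 1
        out.append(candidate)
    return out
```

Because the duplicated names carry different values, suffixing keeps all of the data and leaves the choice of which row to trust for later. A validation check could then assert that `duplicated_names` returns an empty list on the output.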
Tasks:
Validation: