Duplicate rows in GTEx v8 PFB export #113

@ianfore

I followed the instructions on Accessing GTEx v8 phenotypic data here.

When PyPFB is run on the export of the GTEx data, the resulting sequencing.tsv file contains duplicates of almost all rows. The other TSV files do not have duplicates. It is not clear whether the duplicates exist in the PFB file or are generated when PyPFB converts it to TSV. Given that sequencing.tsv is the only file that shows this problem, the more likely explanation is that the duplication is present in the PFB.
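To put a number on the duplication in the converted file, exact repeated rows can be counted. This is a minimal sketch using only the Python standard library, not part of PyPFB. The function name `count_duplicate_rows` is hypothetical, and it assumes that a "duplicate" means every column matches and that the first line of the file is a header.

```python
import csv
from collections import Counter


def count_duplicate_rows(tsv_path):
    """Return (total_rows, distinct_rows, extra_copies) for a TSV file.

    Assumption: two rows are duplicates only if every column is identical.
    """
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # skip the header row
        counts = Counter(tuple(row) for row in reader)
    total = sum(counts.values())
    distinct = len(counts)
    # extra_copies is the number of rows that would be dropped by de-duplication
    return total, distinct, total - distinct
```

Running it on sequencing.tsv and on one of the other TSV files should show a non-zero third value only for sequencing.tsv if the observation above holds.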

46 rows in sequencing.tsv do not appear to be duplicated. These are the sequencing files that relate to the project as a whole rather than to samples (see parent_type). This also suggests that the duplicates are present in the PFB/Avro and are not generated by PyPFB.
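The split between duplicated and non-duplicated rows can be checked against parent_type with a short script. This is a sketch under assumptions: it treats `parent_type` as a column header in sequencing.tsv, treats rows as duplicates only when all columns match, and uses a hypothetical function name, `duplicates_by_column`.

```python
import csv
from collections import Counter


def duplicates_by_column(tsv_path, column="parent_type"):
    """Map each value of `column` to (rows seen once, rows seen more than once).

    Counts are over distinct rows, so a row that appears twice adds 1 to the
    second number. Assumption: `column` is present in the header line.
    """
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        col = header.index(column)  # raises ValueError if the column is absent
        counts = Counter(tuple(row) for row in reader)

    summary = {}
    for row, n in counts.items():
        seen_once, seen_repeated = summary.get(row[col], (0, 0))
        if n == 1:
            seen_once += 1
        else:
            seen_repeated += 1
        summary[row[col]] = (seen_once, seen_repeated)
    return summary
```

If the observation above is right, the project-level parent_type value should account for the 46 rows seen once, and the sample-level value should account for the repeated rows.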

See also the related pull request, which deals with linking between the TSV files.
