Data corruption #28

@i-strielkov

Hi, we have been using your great tool for several years, and it has saved us a lot of disk space! Recently, however, we have encountered an error during DSRC encoding. The algorithm occasionally skips a number of reads at seemingly random positions and then continues. The resulting file contains artifacts like this:

@L183:321:CAFVJANXX:6:2213:18462:88964 3:N:0:0
TATAAATGGATTCTCTTTGTCCATGATCACAAAATAAGAAT@L183:321:CAFVJANXX:6:2213:5699:93216 3:N:0:0
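To locate damaged records like the one above in a decompressed file, a generic FASTQ sanity check can help. The sketch below is not part of DSRC. It assumes strict 4-line records and an uppercase `ACGTN` alphabet (IUPAC ambiguity codes would be flagged too). The names `open_fastq` and `iter_corrupt_records` are hypothetical helpers chosen for illustration.

```python
import gzip
from typing import Iterable, Iterator, TextIO, Tuple

# Assumption: only A, C, G, T and N are valid sequence characters.
VALID_BASES = frozenset("ACGTN")


def open_fastq(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def iter_corrupt_records(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (record_index, reason) for each malformed 4-line FASTQ record.

    Only the first problem found in a record is reported. Once a record is
    damaged, the 4-line framing may shift, so the first report is the most
    meaningful one.
    """
    it = iter(lines)
    idx = 0
    while True:
        header = next(it, None)
        if header is None:
            return  # clean end of input
        rest = [next(it, None) for _ in range(3)]
        if None in rest:
            yield idx, "truncated record"
            return
        header = header.rstrip("\r\n")
        seq, plus, qual = (s.rstrip("\r\n") for s in rest)
        if not header.startswith("@"):
            yield idx, "header does not start with '@'"
        elif not plus.startswith("+"):
            yield idx, "separator line does not start with '+'"
        elif not set(seq.upper()) <= VALID_BASES:
            # An embedded header such as "...AAT@L183:321:..." lands here.
            yield idx, "non-nucleotide characters in sequence"
        elif len(seq) != len(qual):
            yield idx, "sequence/quality length mismatch"
        idx += 1
```

For example, `with open_fastq(path) as fh: print(next(iter_corrupt_records(fh), None))` would print the index of the first suspicious record, or `None` if every record passes these checks.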

Renaming the reads solves the problem.
Do you happen to know what may cause such issues?
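The renaming workaround can be sketched as a streaming pass that replaces every header with a sequential ID before compression. This is an assumption about what "renaming" means here; the issue does not say which naming scheme was used. The function name `rename_fastq_headers` and the `read<N>` scheme are hypothetical. Original read names are discarded, so keep a separate mapping if you need them later.

```python
from typing import Iterable, Iterator


def rename_fastq_headers(lines: Iterable[str], prefix: str = "read") -> Iterator[str]:
    """Yield FASTQ lines with each header replaced by a sequential ID.

    Assumes strict 4-line records (no wrapped sequence or quality lines).
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\r\n")
        pos = i % 4
        if pos == 0:
            yield f"@{prefix}{i // 4 + 1}\n"  # new header: @read1, @read2, ...
        elif pos == 2:
            yield "+\n"  # drop the optional repeated name on the '+' line
        else:
            yield line + "\n"  # sequence and quality lines are kept unchanged
```

With `src` opened via `gzip.open(path, "rt")` and `dst` a writable text file, `dst.writelines(rename_fastq_headers(src))` produces the renamed FASTQ.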

The problems were encountered with this public dataset: https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-10175/
In particular, the problem can be reproduced with this file: http://ftp.sra.ebi.ac.uk/vol1/run/ERR539/ERR5396174/AML_low_input_AAAACT_r2.fq.gz

Many thanks in advance for any information,
Best,
Ievgen
