Low number of good pairs after filtering 

Hi,

I am trying to get duplex sequencing up and running, but I get a very low number of 'good pairs' after filtering and, consequently, a very low number of called duplex reads. For example:



| Metric | Value |
| --- | --- |
| Total reads | 28801102 |
| Read pairs (n) | 10901142 |
| Paired (%) | 75 |
| Good pairs | 1151139 |
| Good pairs (%) | 4 |
| BAM duplex reads | 1002726 |
| Percentage of original reads (%) | 3.48 |
| Mapped (%) | 94 |


So in this example, I am left with only 4% of the original reads.
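To make the percentages easier to check, here is a small sketch that recomputes them from the counts reported above. The `pct` helper is a hypothetical name, and reading "Paired (%)" as two reads per candidate pair divided by total reads is my assumption, not something the tool's output states.

```bash
#!/usr/bin/env bash
# Counts copied from the summary in this post.
total_reads=28801102
read_pairs=10901142     # candidate pairs before filtering
good_pairs=1151139      # pairs left after filtering
duplex_reads=1002726    # duplex reads in the BAM

# pct NUMERATOR DENOMINATOR: print 100 * n / d with two decimals.
# (hypothetical helper, not part of duplex_tools or Guppy)
pct() {
    awk -v n="$1" -v d="$2" 'BEGIN { printf "%.2f", 100 * n / d }'
}

# Assumption: "Paired (%)" counts both reads of each candidate pair.
echo "Reads in candidate pairs:       $(pct $((2 * read_pairs)) "$total_reads")%"
echo "Good pairs vs. total reads:     $(pct "$good_pairs" "$total_reads")%"
echo "Good pairs vs. candidate pairs: $(pct "$good_pairs" "$read_pairs")%"
echo "Duplex reads vs. total reads:   $(pct "$duplex_reads" "$total_reads")%"
```

Seen this way, only roughly 10.6% of the candidate pairs survive filtering, so most of the loss appears to happen at the filtering step rather than at the pairing step.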

I am following the recommended basic usage, first pairing and filtering with duplex_tools and then duplex basecalling with Guppy:

`duplex_tools pairs_from_summary $output_dir/sequencing_summary.txt $output_dir`

`duplex_tools filter_pairs $output_dir/pair_ids.txt $output_dir`


```
nanopore_guppy guppy_basecaller_duplex \
        --input_path $input_dir \
        -r --save_path $duplex_dir \
        --device auto \
        --config $model \
        --duplex_pairing_mode from_pair_list \
        --duplex_pairing_file $output_dir/pair_ids_filtered.txt \
        --align_ref $ref \
        --bam_out
```
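A quick way to see how many pairs are dropped between the two duplex_tools steps is to count the lines in both pair lists before starting the Guppy run. This is only a sketch: `count_pairs` is a hypothetical helper, and it assumes both files hold one pair per line and sit in `$output_dir` (a header line, if there is one, would be counted as well).

```bash
#!/usr/bin/env bash
# count_pairs FILE: print the number of non-empty lines in FILE.
# Assumption: one pair per line; a header line, if any, is counted too.
count_pairs() {
    awk 'NF { n++ } END { print n + 0 }' "$1"
}

# Falls back to the current directory if output_dir is not set.
output_dir="${output_dir:-.}"
all_pairs="$output_dir/pair_ids.txt"
kept_pairs="$output_dir/pair_ids_filtered.txt"

if [ -f "$all_pairs" ] && [ -f "$kept_pairs" ]; then
    echo "candidate pairs: $(count_pairs "$all_pairs")"
    echo "filtered pairs:  $(count_pairs "$kept_pairs")"
else
    echo "pair lists not found in $output_dir" >&2
fi
```

If the two counts roughly match the "Read pairs (n)" and "Good pairs" figures above, that would confirm the drop happens in `filter_pairs` rather than later in basecalling.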


Questions:

1. Why do I get so few good pairs and, subsequently, so few good reads? 4% is a bit useless.
2. Should I skip the filtering step and run the second Guppy run with `pair_ids.txt` instead?

Lastly, the duplex basecalling workflow could benefit from simplification. Dorado's usage looks good, but I am getting errors, so it's not working for me at the moment. It would be great if the Guppy workflow could be simplified too!


Low number of good pairs after filtering #52
