Low number of good pairs after filtering  #52

@myxotheles

Hi,

I am trying to get duplex sequencing up and running, but I get a very low number of 'good pairs' after filtering and, consequently, a very low number of called duplex reads. For example:

Total Reads 28801102
Read pairs (n) 10901142
Paired (%) 75
Good pairs 1151139
Good pairs (%) 4
BAM Duplex reads 1002726
Percentage of original reads (%) 3.48
Mapped 94%

So in this example, I am left with only 4% of the original reads.
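For reference, the percentages above can be re-derived from the raw counts. This is only a sketch of the arithmetic: the denominators are my assumption about how the summary was computed (everything relative to total reads, with each pair consuming two reads), and the variable names are made up for illustration.

```python
# Counts copied from the summary above.
total_reads = 28_801_102
read_pairs = 10_901_142
good_pairs = 1_151_139
duplex_reads = 1_002_726


def pct(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Assumption: "Paired (%)" counts both reads of every candidate pair.
paired_pct = pct(2 * read_pairs, total_reads)
# Assumption: "Good pairs (%)" is good pairs relative to total reads.
good_pct = pct(good_pairs, total_reads)
duplex_pct = pct(duplex_reads, total_reads)
# Share of candidate pairs that survive filter_pairs.
good_of_pairs_pct = pct(good_pairs, read_pairs)

print(f"paired reads:             {paired_pct:.2f}%")
print(f"good pairs / total reads: {good_pct:.2f}%")
print(f"duplex reads / total:     {duplex_pct:.2f}%")
print(f"good pairs / candidates:  {good_of_pairs_pct:.2f}%")
```

Under these assumptions, roughly one in ten candidate pairs passes the filter, which is the step where most of the loss happens.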

I am following the basic usage as recommended:

duplex_tools pairs_from_summary $output_dir/sequencing_summary.txt $output_dir

duplex_tools filter_pairs $output_dir/pair_ids.txt $output_dir

nanopore_guppy guppy_basecaller_duplex \
        --input_path $input_dir \
        -r --save_path $duplex_dir \
        --device auto \
        --config $model \
        --duplex_pairing_mode from_pair_list \
        --duplex_pairing_file $output_dir/pair_ids_filtered.txt \
        --align_ref $ref \
        --bam_out
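To see how many pairs are lost at the filtering step, the two pair lists can be compared directly. A minimal sketch, assuming both `pair_ids.txt` and `pair_ids_filtered.txt` hold one pair per line as two whitespace-separated read IDs with no header line (I have not verified the exact file format); `count_pairs` and `retention` are hypothetical helper names, not part of duplex_tools.

```python
import sys


def count_pairs(path):
    """Count lines that contain exactly two whitespace-separated fields.

    Assumption: one read-ID pair per line, no header line.
    """
    n = 0
    with open(path) as fh:
        for line in fh:
            if len(line.split()) == 2:
                n += 1
    return n


def retention(candidates_path, filtered_path):
    """Return (candidate pairs, kept pairs, kept fraction)."""
    candidates = count_pairs(candidates_path)
    kept = count_pairs(filtered_path)
    frac = kept / candidates if candidates else 0.0
    return candidates, kept, frac


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python retention.py pair_ids.txt pair_ids_filtered.txt
    candidates, kept, frac = retention(sys.argv[1], sys.argv[2])
    print(f"{kept} of {candidates} pairs kept ({100 * frac:.1f}%)")
```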

Questions:

Why do I get so few good pairs and subsequently good reads? 4% is a bit useless.
Should I skip the filtering step and run the second guppy run with pair_ids.txt instead?

Lastly, the duplex basecalling workflow could benefit from simplification. The Dorado usage looks good, but I am getting errors, so it's not working for me at the moment. It would be great if guppy could be simplified!
