Description
Hi,
I am trying to get duplex sequencing up and running, but I get a very low number of 'good pairs' after filtering and, consequently, a very low number of called duplex reads. For example:
Total reads: 28801102
Read pairs (n): 10901142
Paired (%): 75
Good pairs: 1151139
Good pairs (%): 4
BAM duplex reads: 1002726
Percentage of original reads (%): 3.48
Mapped (%): 94
So in this example, I am left with only 4% of the original reads.
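For reference, here is a minimal sketch of how the percentages above relate to the raw counts. The denominators are my assumption (they are not stated in any tool output I have): "Paired" is taken as two reads per pair over total reads, and the other two figures as counts over total reads. Under that reading the reported values are reproduced to within rounding, and the filter step turns out to keep only about 1 in 10 of the candidate pairs.

```python
# Counts copied from the summary above.
total_reads = 28_801_102
read_pairs = 10_901_142
good_pairs = 1_151_139
duplex_reads = 1_002_726


def pct(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Assumption: each pair accounts for two of the original reads.
paired_pct = pct(2 * read_pairs, total_reads)
# Assumption: these two are expressed relative to the total read count.
good_pairs_pct = pct(good_pairs, total_reads)
duplex_pct = pct(duplex_reads, total_reads)
# Share of candidate pairs that survive filter_pairs.
pairs_kept_pct = pct(good_pairs, read_pairs)

print(f"Paired (%):                {paired_pct:.1f}")
print(f"Good pairs (%):            {good_pairs_pct:.1f}")
print(f"Duplex reads (%):          {duplex_pct:.2f}")
print(f"Pairs kept by filter (%):  {pairs_kept_pct:.1f}")
```

So the large drop happens at the filtering step, not at the pairing step.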
I am following the recommended basic usage:
duplex_tools pairs_from_summary $output_dir/sequencing_summary.txt $output_dir
duplex_tools filter_pairs $output_dir/pair_ids.txt $output_dir
nanopore_guppy guppy_basecaller_duplex \
--input_path $input_dir \
-r --save_path $duplex_dir \
--device auto \
--config $model \
--duplex_pairing_mode from_pair_list \
--duplex_pairing_file $output_dir/pair_ids_filtered.txt \
--align_ref $ref \
--bam_out
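To check how many pairs the filter step drops, I put together the rough sketch below. It assumes that pair_ids.txt and pair_ids_filtered.txt each hold one read pair per line with no header line; I have not confirmed that against the duplex_tools documentation, so please correct me if the format differs. The function names are my own, not part of duplex_tools.

```python
def count_pairs(path):
    """Count non-blank lines, assuming one read pair per line (no header)."""
    with open(path) as handle:
        return sum(1 for line in handle if line.strip())


def retention(all_pairs_path, filtered_pairs_path):
    """Return (total pairs, kept pairs, kept fraction) for the two pair lists."""
    total = count_pairs(all_pairs_path)
    kept = count_pairs(filtered_pairs_path)
    fraction = kept / total if total else 0.0
    return total, kept, fraction
```

Calling retention("pair_ids.txt", "pair_ids_filtered.txt") with the paths from my output directory should give the same counts as "Read pairs" and "Good pairs" in the summary, if the one-pair-per-line assumption holds.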
Questions:
Why do I get so few good pairs and subsequently good reads? 4% is a bit useless.
Should I skip the filtering step and run the second guppy run with pair_ids.txt instead?
Lastly, the duplex basecalling could benefit from simplification. Dorado usage looks good, but I am getting errors, so it's not working at the moment. It would be great if guppy could be simplified!