-
Notifications
You must be signed in to change notification settings - Fork 6
Description
Hi Author(s),
I think it is super cool that you've implemented this dynamic strategy!
We where trying to normalize some amplicons on a given chromosome, so we made sequences that where unique to these amplicons and provided these as a reference file. However the minimum genome size of 10k made this impossible.
There is an unclear error when no sequence is long enough in the reference genome
2023-06-08 15:35:31,905 reading reference file
Traceback (most recent call last):
File "bossruns.py", line 14, in <module>
bossrun.initialise_OTUs()
File "/home/grid/BOSS-RUNS/bossruns/BR_core.py", line 1374, in initialise_OTUs
otu.init_reference_wgs(ref=self.args.ref, mmi=self.args.ref_idx, reject_refs=self.otu.reject_refs)
File "/home/grid/BOSS-RUNS/bossruns/BR_core.py", line 603, in init_reference_wgs
reference.load_reference()
File "/home/grid/BOSS-RUNS/bossruns/BR_reference.py", line 90, in load_reference
self.genome = integer_genome(chromosome_sequences=self.chromosome_sequences)
File "/home/grid/BOSS-RUNS/bossruns/BR_utils.py", line 204, in integer_genome
genome = np.concatenate(genome)
File "<__array_function__ internals>", line 200, in concatenate
ValueError: need at least one array to concatenate
After some digging it apears that the mmi property at 604 is None.
If we hardcode the mmi, and set min_len in BR_reference to 100: we get slightly further into the code:
2023-06-08 15:48:57,585 finished init -------
2023-06-08 15:48:57,585 loading mapping index: None
Traceback (most recent call last):
File "bossruns.py", line 15, in <module>
bossrun.initialise_merged()
File "/home/grid/BOSS-RUNS/bossruns/BR_core.py", line 1426, in initialise_merged
self.mapper.init_mappy(ref=ref)
File "/home/grid/BOSS-RUNS/bossruns/BR_mapper.py", line 25, in init_mappy
self.aligner = mappy.Aligner(fn_idx_in=ref, preset="map-ont") # , extra_flags=0x400)
File "python/mappy.pyx", line 144, in mappy.Aligner.__cinit__
TypeError: descriptor 'encode' for 'str' objects doesn't apply to a 'NoneType' object
If we print ref here, it is None
If we check the values at line 1426 in BR_core we get
Checking args.mu
Namespace(alpha=300, alphaPrior=1, base_quality_threshold=1, bucket_size=20000, ckp=None, conditions=False, cov_until=5, deletion_error=0.05, deletion_substitution_ratio=0.4, device='X2', err_missed_deletion=0.1, eta=11, fastq_dir='/data/./TP53_bossrun_test_01/TP53_multiplex/20230607_1545_X2_FAW73389_935289a8/fastq_pass', flank=10000, host='localhost', mu=400, out_dir='./bossruns_TP53_sampling', p0=0.1, ploidy=1, port=None, ref='/data/references/TP53_sampling.fa', ref_idx=None, ref_priors=1, reject_refs=None, rho=300, run_name='TP53_sampling', substitution_error=0.06, testing=False, theta=0.01, vcf=None, wait=90, wgs=1, window=100, windowSize=2000)
Checking ref variable in BR_mapper
None
Traceback (most recent call last):
File "bossruns.py", line 15, in <module>
bossrun.initialise_merged()
File "/home/grid/BOSS-RUNS/bossruns/BR_core.py", line 1428, in initialise_merged
self.mapper.init_mappy(ref=ref)
File "/home/grid/BOSS-RUNS/bossruns/BR_mapper.py", line 27, in init_mappy
self.aligner = mappy.Aligner(fn_idx_in=ref, preset="map-ont") # , extra_flags=0x400)
File "python/mappy.pyx", line 144, in mappy.Aligner.__cinit__
TypeError: descriptor 'encode' for 'str' objects doesn't apply to a 'NoneType' object
If i hardcode ref here, i get slightly futher, as it is actually aligning reads:
...
2023-06-08 15:57:47,973 reading file: /data/./TP53_bossrun_test_01/TP53_multiplex/20230607_1545_X2_FAW73389_935289a8/fastq_pass/FAW73389_pass_935289a8_f7ada654_10.fastq.gz
2023-06-08 15:57:48,130 processing 4000 reads in this batch
2023-06-08 15:59:29,869 Number of mapped reads: 291241, unmapped reads: 96759
2023-06-08 16:00:06,494 Counts and rel. proportions of observed reads:
2023-06-08 16:00:06,494 bl4: 27002 0.07
2023-06-08 16:00:06,494 bl5: 5773 0.015
2023-06-08 16:00:06,494 bl6: 95893 0.247
2023-06-08 16:00:06,494 bl7: 2277 0.006
2023-06-08 16:00:06,494 bl8: 58456 0.151
2023-06-08 16:00:06,494 bl9: 3670 0.009
2023-06-08 16:00:06,494 bl10: 822 0.002
2023-06-08 16:00:06,495 bl11: 89223 0.23
2023-06-08 16:00:06,495 bl12: 4950 0.013
2023-06-08 16:00:06,495 bl13: 3078 0.008
2023-06-08 16:00:06,495 bl15: 97 0.0
2023-06-08 16:00:06,899 updating read length distribution
2023-06-08 16:00:06,905 new approx ccl
2023-06-08 16:00:06,905 [ 856 963 1070 1202 1312 1394 1462 1531 1639 2178]
2023-06-08 16:00:06,905 updating scores
2023-06-08 16:00:13,063 switch count: off 0; on 7
Traceback (most recent call last):
File "bossruns.py", line 24, in <module>
next_update = bossrun.process_batch()
File "/home/grid/BOSS-RUNS/bossruns/BR_core.py", line 2404, in process_batch
self.update_strategy()
File "/home/grid/BOSS-RUNS/bossruns/BR_core.py", line 2051, in update_strategy
BossRun.flush_strat_buckets(strat=strat, otu=otu, switches=switches, window=self.args.window)
File "/home/grid/BOSS-RUNS/bossruns/BR_core.py", line 2011, in flush_strat_buckets
c_switches_adj = adjust_length(original=prev_strat, expanded=c_switches_rep)
File "/home/grid/BOSS-RUNS/bossruns/BR_utils.py", line 85, in adjust_length
assert repl.shape[0] == original.shape[0]
AssertionError
This is where i decided to stop digging in the codebase to hardcode "fixes".
Is there a way to make this work? How would you procede? An alternative is to add 5kb of noise to the start and end of the amplicon sequences, but that feels very hacky.