Hi,
Thanks for maintaining this tool and the great read the docs. I am having some issues running the die_test with my data and was hoping you might be able to provide some guidance. I am importing a transcriptome I generated using bambu and then filtered using SQANTI using the approach outlined in the tutorial (https://isotools.readthedocs.io/en/latest/notebooks/03b_transcriptome_import.html). I use hg38 in the from_reference command and then the novel transcriptome as the transcripts file in the add_sample_from_csv command (with infer_genes=True). However, when I try to run die_test on the object, I get the following error:
INFO:testing differential isoform expression (DIE) for GB2 (3) vs NSC (6).
ERROR:chi2_contingency([[ 706 1218]
[ 3 0]
[ 0 0]
[ 2 0]
[ 0 1]
[ 0 0]
[ 0 0]
[ 0 0]])
ValueError Traceback (most recent call last)
Cell In[20], line 2
1 grouping=dict(GB2=["GB2 1", "GB2 2", "GB2 3"], NSC=["H14 1","H14 2","H14 3","H9 1","H9 2","H9 3"])
----> 2 isoseq.die_test(groups=grouping,min_cov=1)
File ~/.conda/envs/medaka/lib/python3.10/site-packages/isotools/_transcriptome_stats.py:434, in die_test(self, groups, min_cov, n_isoforms, padj_method, **kwargs)
428 groupnames, groups, grp_idx = _check_groups(self, groups)
429 logger.info(
430 "testing differential isoform expression (DIE) for %s.",
431 " vs ".join(f"{groupnames[i]} ({len(groups[i])})" for i in range(2)),
432 )
--> 434 result = [
435 (gene.id, gene.name, gene.chrom, gene.strand, gene.start, gene.end)
436 + gene.die_test(grp_idx, min_cov, n_isoforms)
437 for gene in self.iter_genes(**kwargs)
438 ]
439 result = pd.DataFrame(
440 result,
441 columns=[
(...)
451 ],
452 )
453 mask = np.isfinite(result["pvalue"])
File ~/.conda/envs/medaka/lib/python3.10/site-packages/isotools/_transcriptome_stats.py:436, in (.0)
428 groupnames, groups, grp_idx = _check_groups(self, groups)
429 logger.info(
430 "testing differential isoform expression (DIE) for %s.",
431 " vs ".join(f"{groupnames[i]} ({len(groups[i])})" for i in range(2)),
432 )
434 result = [
435 (gene.id, gene.name, gene.chrom, gene.strand, gene.start, gene.end)
--> 436 + gene.die_test(grp_idx, min_cov, n_isoforms)
437 for gene in self.iter_genes(**kwargs)
438 ]
439 result = pd.DataFrame(
440 result,
441 columns=[
(...)
451 ],
452 )
453 mask = np.isfinite(result["pvalue"])
File ~/.conda/envs/medaka/lib/python3.10/site-packages/isotools/gene.py:1333, in Gene.die_test(self, groups, min_cov, n_isoforms)
1331 idx = np.array(range(cov.shape[0]))
1332 try:
-> 1333 _, pval, _, _ = chi2_contingency(cov)
1334 except ValueError:
1335 logger.error(f"chi2_contingency({cov})")
File ~/.local/lib/python3.10/site-packages/scipy/stats/contingency.py:340, in chi2_contingency(observed, correction, lambda_)
336 if np.any(expected == 0):
337 # Include one of the positions where expected is zero in
338 # the exception message.
339 zeropos = list(zip(*np.nonzero(expected == 0)))[0]
--> 340 raise ValueError("The internally computed table of expected "
341 f"frequencies has a zero element at {zeropos}.")
343 # The degrees of freedom
344 dof = expected.size - sum(expected.shape) + expected.ndim - 1
ValueError: The internally computed table of expected frequencies has a zero element at (2, 0).
I have tried to change the n_isoforms and min_coverage but neither is fixing this issue. To try and remove isoforms with 0 counts in all samples, I created a new csv of the filtered counts and tried imported that as well but the same error still pops up. Have you seen this error before? It seems that this arises from isoforms that are not expressed in either condition but as mentioned it seems that removing them from the input counts file does not fix the issue. Thanks so much.
Hi,
Thanks for maintaining this tool and the great read the docs. I am having some issues running the die_test with my data and was hoping you might be able to provide some guidance. I am importing a transcriptome I generated using bambu and then filtered using SQANTI using the approach outlined in the tutorial (https://isotools.readthedocs.io/en/latest/notebooks/03b_transcriptome_import.html). I use hg38 in the from_reference command and then the novel transcriptome as the transcripts file in the add_sample_from_csv command (with infer_genes=True). However, when I try to run die_test on the object, I get the following error:
I have tried to change the n_isoforms and min_coverage but neither is fixing this issue. To try and remove isoforms with 0 counts in all samples, I created a new csv of the filtered counts and tried imported that as well but the same error still pops up. Have you seen this error before? It seems that this arises from isoforms that are not expressed in either condition but as mentioned it seems that removing them from the input counts file does not fix the issue. Thanks so much.