When reclustering a given subset of cells with the bc.tl.rc.recluster() function, it seems that the highly variable genes (HVGs) are computed only from the previously selected HVGs (adata.var) instead of from all genes (adata.raw.var). Going back to the full set of genes is needed to give the algorithm a chance to find the genes that are most variable specifically within the subset of cells.
The problematic part of the function seems to be this assignment in line 97:
cluster_subset.raw = cluster_subset
Suggested fix:
# Create a new anndata object for the subcluster analysis.
# IMPORTANT: We use adata.raw to get the expression data for ALL genes.
cluster_subset = cluster_subset.raw.to_adata()
# Re-create .raw before running HVG selection: cluster_subset.raw.var will keep
# all genes, whereas cluster_subset.var will hold only the newly identified HVGs.
cluster_subset.raw = cluster_subset
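To illustrate why this matters, here is a toy sketch in plain Python. It is not besca or scanpy code: the gene names and expression values are made up, and ranking by plain variance is only a stand-in for the real HVG selection. It shows that a gene which varies only inside the subset can never be picked if the search is limited to the HVGs found on the full dataset.

```python
from statistics import pvariance

# Toy expression values per gene across 8 cells (made-up numbers).
# Cells 0-3 form the subset being reclustered, cells 4-7 are the rest.
expression = {
    "geneA": [1, 1, 1, 1, 9, 9, 9, 9],  # separates the subset from the rest
    "geneB": [2, 2, 2, 2, 8, 8, 8, 8],  # separates the subset from the rest
    "geneC": [0, 6, 0, 6, 3, 3, 3, 3],  # varies only inside the subset
    "geneD": [3, 3, 3, 3, 3, 3, 3, 3],  # flat everywhere
}
all_genes = list(expression)
all_cells = list(range(8))
subset_cells = [0, 1, 2, 3]


def top_variable_genes(genes, cells, n_top):
    """Rank genes by plain variance over the given cells (stand-in for HVG selection)."""
    variances = {g: pvariance([expression[g][c] for c in cells]) for g in genes}
    return sorted(genes, key=lambda g: variances[g], reverse=True)[:n_top]


# HVGs selected on the full dataset (what adata.var would hold).
previous_hvg = top_variable_genes(all_genes, all_cells, n_top=2)

# Current behaviour: reclustering only searches within the previous HVGs.
hvg_from_previous = top_variable_genes(previous_hvg, subset_cells, n_top=1)

# Suggested behaviour: reclustering searches within all genes again.
hvg_from_all = top_variable_genes(all_genes, subset_cells, n_top=1)

print("previous HVGs:", previous_hvg)
print("subset HVG from previous HVGs:", hvg_from_previous)
print("subset HVG from all genes:", hvg_from_all)
```

In this toy example geneC is the only gene that varies within the subset, but it is not among the HVGs of the full dataset, so it is only found when the selection starts again from all genes.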
Thanks for checking and fixing it.
Best,
Llucia