feat: `AnnData.unwriteable` based on `AnnData._reduce` + `iter_outer` + refactorings of other relevant functions (#2372)
Merged
Changes from all commits:
All 39 commits are authored by ilan-gold:

- `b907554` feat: `AnnData.can_write` based on `AnnData.fold`
- `19daed5` chore: docs
- `4125375` refactor: use accessors
- `8be5ba2` fix: DFS order + fixes
- `0f4d1b0` chore: add test for `uns`
- `9338baa` Merge branch 'main' into ig/fold_can_write
- `69daf90` feat: `raw` + `uns` traversal
- `932d766` fix: `fold` -> `reduce`
- `e0f3ee2` chore: docs
- `436fc68` Merge branch 'main' into ig/fold_can_write
- `ee04741` fix: `meth` not `func`
- `6d6f454` fix: `fold` not `reduce` in relnote
- `1f77a4c` fix: nested
- `91adffe` chore: more `func` clarification
- `928b72a` fix: link
- `19a915d` fix: link
- `c0886fe` refactor: simpler
- `6cffc05` fix: relnote number
- `44890eb` Merge branch 'main' into ig/fold_can_write
- `39800aa` refactor: use `iter`
- `6cb401b` fix: oops
- `1dfdd96` fix: why was this deleted?
- `9ad937f` fix: doc string
- `0c03ffb` fix: docs
- `f00db89` fix: remove `parent_type`
- `cabc914` Merge branch 'main' into ig/fold_can_write
- `9fa978a` fix: writing none
- `e7b201f` fix use `set`
- `95136a2` fix use `set`
- `4eba690` fix: docs
- `fdd6b7c` fix: remove unused docs / private type
- `5760cb2` fix: nexting
- `7382b67` fix: ok
- `3c747e1` fix: handle bad categoricals
- `371d535` fix: handle index / awkward
- `d8f66c3` Merge branch 'main' into ig/fold_can_write
- `5a0fded` refactor: `can_write` -> `unwriteable`
- `a6374ce` Merge branch 'ig/fold_can_write' of github.com:scverse/anndata into i…
- `1ad82f4` Merge branch 'main' into ig/fold_can_write
Release note (new file):

```diff
@@ -0,0 +1 @@
+New {meth}`anndata.AnnData.unwriteable` for checking if an `AnnData` can be written {user}`ilan-gold`
```
```diff
@@ -4,7 +4,7 @@
 from __future__ import annotations

-from collections import OrderedDict
+from collections import OrderedDict, defaultdict
 from collections.abc import Mapping, MutableMapping, Sequence
 from copy import copy, deepcopy
 from functools import singledispatchmethod
@@ -26,8 +26,10 @@
 from .. import utils
 from .._settings import settings
 from ..compat import (
+    AwkArray,
     DaskArray,
     IndexManager,
+    XDataset,
     ZarrArray,
     _move_adj_mtx,
     has_xp,
@@ -39,6 +41,7 @@
     axis_len,
     deprecation_msg,
     ensure_df_homogeneous,
+    iter_outer,
     raise_value_error_if_multiindex_columns,
     set_module,
     warn,
@@ -62,9 +65,12 @@
     from scipy import sparse
     from zarr.storage import StoreLike

+    from anndata.typing import RWAble
+
+    from .._types import ReduceFunc
     from ..acc import AdRef, Array, MapAcc, RefAcc
-    from ..compat import XDataset
-    from ..typing import Index, Index1D, _Index1DNorm, _XDataType
+    from ..compat import CSArray, CSMatrix
+    from ..typing import AxisStorable, Index, Index1D, _Index1DNorm, _XDataType
     from .aligned_mapping import AxisArraysView, LayersView, PairwiseArraysView
```
```diff
@@ -512,53 +518,54 @@ def _init_as_actual(  # noqa: PLR0912, PLR0913, PLR0915
     def __sizeof__(
         self, *, show_stratified: bool = False, with_disk: bool = False
     ) -> int:
-        def get_size(X) -> int:
-            def cs_to_bytes(X) -> int:
-                return int(X.data.nbytes + X.indptr.nbytes + X.indices.nbytes)
+        def cs_to_bytes(X: CSArray | CSMatrix) -> int:
+            return int(X.data.nbytes + X.indptr.nbytes + X.indices.nbytes)

+        def get_size(X: RWAble) -> int:
             if isinstance(X, h5py.Dataset) and with_disk:
                 return int(np.array(X.shape).prod() * X.dtype.itemsize)
             elif isinstance(X, BaseCompressedSparseDataset) and with_disk:
                 return cs_to_bytes(X._to_backed())
             elif issparse(X):
                 return cs_to_bytes(X)
             elif isinstance(X, dict | MutableMapping):
                 return sum(get_size(v) for v in X.values())
             else:
                 return X.__sizeof__()

-        sizes = {}
-        attrs = ["X", "_obs", "_var"]
-        attrs_multi = ["_uns", "_obsm", "_varm", "varp", "_obsp", "_layers"]
-        for attr in attrs + attrs_multi:
-            if attr in attrs_multi:
-                keys = getattr(self, attr).keys()
-                s = sum(get_size(getattr(self, attr)[k]) for k in keys)
-            else:
-                s = get_size(getattr(self, attr))
-            if s > 0 and show_stratified:
+        def fold_size(
+            elem: _XDataType | AxisStorable | pd.DataFrame | XDataset,
+            *,
+            accumulate: dict[str, int],
+            attr_name: str | None,  # TODO: type
+        ):
+            if elem is None:
+                size = 0
+            elif elem is self.raw:
+                size = (
+                    get_size(elem.X)
+                    + get_size(elem.var)
+                    + sum(get_size(v) for v in elem.varm.values())
+                )
+            else:
+                size = get_size(elem)
+            accumulate[attr_name] = size
+            if size > 0 and show_stratified:
                 from tqdm import tqdm

-                print(
-                    f"Size of {attr.replace('_', '.'):<7}: {tqdm.format_sizeof(s, 'B')}"
-                )
-            sizes[attr] = s
-        return sum(sizes.values())
+                print(f"Size of {attr_name}: {tqdm.format_sizeof(size, 'B')}")
+            return accumulate
+
+        return sum(self._reduce(fold_size, init=defaultdict(int)).values())
```

Review comment from ilan-gold (author) on the removed `attrs_multi` line:

> FWIW this never handled
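The refactor above turns `__sizeof__` from a hard-coded attribute loop into a fold over the object's top-level elements. A minimal, anndata-independent sketch of that pattern (the `elems` dict is a stand-in for `iter_outer`, and `sys.getsizeof` stands in for the type-dispatching `get_size`; none of these stand-ins are anndata's real API):

```python
import sys
from collections import defaultdict
from collections.abc import MutableMapping


def get_size(x) -> int:
    """Recurse into mappings; fall back to the object's own size."""
    if isinstance(x, MutableMapping):
        return sum(get_size(v) for v in x.values())
    return sys.getsizeof(x)


def fold_size(elem, *, accumulate: dict, attr_name: str) -> dict:
    # Missing elements contribute zero; everything else is measured.
    accumulate[attr_name] = 0 if elem is None else get_size(elem)
    return accumulate


# Stand-in for iterating an AnnData object's top-level elements:
elems = {"X": [1.0] * 100, "obsm": {"pca": [0.0] * 10}, "raw": None}
sizes = defaultdict(int)
for name, elem in elems.items():
    sizes = fold_size(elem, accumulate=sizes, attr_name=name)
total = sum(sizes.values())
```

The fold threads one `accumulate` dict through every element, so the per-attribute sizes stay available for the stratified report while the caller only needs the sum.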
```diff
     def _gen_repr(self, n_obs, n_vars) -> str:
         backed_at = f" backed at {str(self.filename)!r}" if self.isbacked else ""
         descr = f"AnnData object with n_obs × n_vars = {n_obs} × {n_vars}{backed_at}"
-        for attr in [
-            "obs",
-            "var",
-            "uns",
-            "obsm",
-            "varm",
-            "layers",
-            "obsp",
-            "varp",
-        ]:
-            keys = getattr(self, attr).keys()
-            if len(keys) > 0:
-                descr += f"\n {attr}: {str(list(keys))[1:-1]}"
+        for attr_name, elem in iter_outer(self):
+            if attr_name not in {"raw", "X"}:
+                keys = elem.keys()
+                if len(keys) > 0:
+                    descr += f"\n {attr_name}: {str(list(keys))[1:-1]}"
         return descr

     def __repr__(self) -> str:
```
```diff
@@ -1383,27 +1390,16 @@ def to_memory(self, *, copy: bool = False) -> AnnData:
             mem = backed[backed.obs["cluster"] == "a", :].to_memory()
         """
         new = {}
-        for attr_name in [
-            "X",
-            "obs",
-            "var",
-            "obsm",
-            "varm",
-            "obsp",
-            "varp",
-            "layers",
-            "uns",
-        ]:
-            attr = getattr(self, attr_name, None)
-            if attr is not None:
-                new[attr_name] = to_memory(attr, copy=copy)
-
-        if self.raw is not None:
-            new["raw"] = {
-                "X": to_memory(self.raw.X, copy=copy),
-                "var": to_memory(self.raw.var, copy=copy),
-                "varm": to_memory(self.raw.varm, copy=copy),
-            }
+        for attr_name, attr in iter_outer(self):
+            if attr is not None:
+                if attr is self.raw:
+                    new["raw"] = {
+                        "X": to_memory(self.raw.X, copy=copy),
+                        "var": to_memory(self.raw.var, copy=copy),
+                        "varm": to_memory(self.raw.varm, copy=copy),
+                    }
+                else:
+                    new[attr_name] = to_memory(attr, copy=copy)

         if self.isbacked:
             self.file.close()
```
```diff
@@ -1436,6 +1432,100 @@ def copy(self, filename: PathLike[str] | str | None = None) -> AnnData:
             write_h5ad(filename, self)
             return read_h5ad(filename, backed=mode)

+    def _reduce[T](
+        self,
+        func: ReduceFunc[T],
+        *,
+        init: T,
+    ) -> T:
+        """Accumulate a value starting from `init` by iterating over the parent "elems" of the `AnnData` object, i.e., `raw`, `obs`, `varp`, etc.
+
+        Parameters
+        ----------
+        func
+            The function that performs the accumulation.
+        init
+            The starting value.
+
+        Returns
+        -------
+        The accumulated value.
+        """
+        accumulate = init
+        for attr_name, attr in iter_outer(self):
+            accumulate = func(attr, accumulate=accumulate, attr_name=attr_name)
+        return accumulate
+
+    def unwriteable(self, *, store_type: Literal["h5", "zarr"] | None) -> bool:
+        """Whether or not an `AnnData` object can be written to disk for a given store type.
+
+        Parameters
+        ----------
+        store_type
+            Which backing store; `None` indicates that it can be written to either.
+
+        Returns
+        -------
+        Whether or not this object is writeable.
+        While the return type may change to include richer output about which elements cannot be written,
+        this new type's evaluation as a boolean will not change from the current behavior, i.e.,
+        `bool(adata.unwriteable())` will always evaluate the same.
+        """
+        from anndata._io.specs.registry import _REGISTRY
+
+        writeable_elems = {
+            src_type
+            for (dest_type, src_type, __) in _REGISTRY.write
+            if store_type is None or store_type in dest_type.__module__
+        }
+
+        def predicate(  # noqa: PLR0911
+            elem: RWAble,
+            *,
+            accumulate: bool,
+            attr_name: str | None = None,  # TODO: type
+        ):
+            if elem is None:
+                return accumulate
+            if isinstance(elem, AnnData):
+                return accumulate and elem.unwriteable(store_type=store_type)
+            if isinstance(elem, pd.Categorical):
+                return accumulate and predicate(elem.categories, accumulate=accumulate)
+            if isinstance(elem, pd.Series | pd.Index):
+                # matches behavior in methods.py
+                return accumulate and predicate(elem._values, accumulate=accumulate)
+            if isinstance(elem, AwkArray):
+                import awkward as ak
+
+                container = ak.to_buffers(ak.to_packed(elem))
+                return accumulate and all(
+                    predicate(v, accumulate=accumulate) for v in container[2].values()
+                )
+            if attr_name == "raw":
+                accumulate = accumulate and type(elem.X) in writeable_elems
+                return accumulate and all(
+                    predicate(e[attr], accumulate=accumulate)
+                    for e in [elem.var, elem.varm]
+                    for attr in e
+                )
+            if attr_name in {
+                "obs",
+                "obsm",
+                "varm",
+                "var",
+                "layers",
+                "varp",
+                "obsp",
+                "uns",
+            } or isinstance(elem, pd.DataFrame | XDataset | MutableMapping):
+                return accumulate and all(
+                    predicate(elem[k], accumulate=accumulate) for k in elem
+                )
+            return accumulate and type(elem) in writeable_elems
+
+        return self._reduce(predicate, init=True)
+
     def var_names_make_unique(self, join: str = "-") -> None:
         # Important to go through the setter so obsm dataframes are updated too
         self.var_names = utils.make_index_unique(self.var.index, join)
```
Review thread:

> why is this gone?

> It got moved to the end, absorbed by `obsp: ...`

> hmm, maybe replace `obsp: ...` with ... then to make that clear