feat: AnnData.can_write based on AnnData.fold + refactored AnnData.__sizeof__#2372
feat: AnnData.can_write based on AnnData.fold + refactored AnnData.__sizeof__#2372
AnnData.can_write based on AnnData.fold + refactored AnnData.__sizeof__#2372Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2372 +/- ##
==========================================
- Coverage 87.36% 85.40% -1.97%
==========================================
Files 49 49
Lines 7787 7830 +43
==========================================
- Hits 6803 6687 -116
- Misses 984 1143 +159
|
AnnData.can_write based on AnnData.foldAnnData.can_write based on AnnData.fold + refactored AnnData.__sizeof__
| } | ||
|
|
||
| def predicate(x: RWAble, *, accumulate: bool, ref_acc: AdRef | RefAcc | None): | ||
| if isinstance(ref_acc, AdRef) or ref_acc is None: |
There was a problem hiding this comment.
Here and above uns is ref_acc is None but I feel like there should be a better way to describe this!
There was a problem hiding this comment.
right, let’s talk about this!
|
Just looking for a first-pass vibe check on this! |
| if is_ad_ref: | ||
| key = type(ref_acc.acc) | ||
| elif is_ref_acc: | ||
| key = ref_acc.parent_type | ||
| else: | ||
| key = None |
There was a problem hiding this comment.
this was discussed during the PR for reference accessors but I think it should be easier to traverse up the tree. open to ideas!
flying-sheep
left a comment
There was a problem hiding this comment.
Very exciting! I think this mainly needs some serious documenting and examples, not much else.
For examples, maybe first something that just handles one kind of accessor/reference first to show how one can short-circuit by returning the accumulator early.
Regarding naming: the canonical name in the stdlib is reduce, it even used to be in builtins, but maybe the scientific python world deviates for no good reason again. It’s also fair if you have other objections to reduce!
| init: T, | ||
| order: Literal["DFS-pre", "DFS-post"] = "DFS-post", | ||
| ) -> T: | ||
| """Accumulate a value starting from init by iterating over the "elems"/leaf nodes of the AnnData object. |
There was a problem hiding this comment.
Please explain what this function passes to ref_acc for which elements.
Either
- link explicitly to
FoldFunc(additionally to the typehints auto link) and explain it there, or - explain it here and direct people from the
FoldFuncdocs to here.
| (is_ad_ref := isinstance(ref_acc, AdRef)) | ||
| or ( | ||
| is_ref_acc := isinstance( | ||
| ref_acc, LayerAcc | MultiAcc | GraphAcc | ||
| ) | ||
| ) |
There was a problem hiding this comment.
this is
isinstance(ref_acc, AdRef | RefAcc) and not isinstance(ref_acc, MetaAcc)because you want to handle each homogeneous array.
I think there are some ways to make this more ergonomic, e.g.:
- we can add more marker superclasses that handle this use case, or
- we could change the implementation of fold/reduce so it descends into things and returning or raising some marker would prevent descent. that might help handling things as early as possible and therefore having less complicated logic.
|
Hi @ilan-gold, interesting direction! The HTML repr PR (#2236) intersects with HTML repr (#2236): traversal overlapThe SECTION_ORDER = ("X", "obs", "var", "uns", "obsm", "varm", "layers", "obsp", "varp", "raw")
for section in SECTION_ORDER:
parts.append(_render_section(adata, section, context))This is structurally very similar to
On the positive side, the per-column iteration of obs/var via The repr would naturally use DFS-pre order (render section header first, then its entries), so it's good that option exists. If Serialization checks:
|
maybe TODOs:
fold?reduce?write_{zarr,h5ad}to prevent partial stores from being written?We introduce
AnnData.foldandAnnData.can_writefor "crawling" anAnnDataand accumulating, implementing acan_writeto tell if anAnnDatacan be written to a given format.I'm also open to
reduceas the name instead offold.For #2236, I think we could implement this more complexly by building on
foldand accumulating asetofAdRef(or similar) objects pointing at what can't be written. Open to implement this! I reimplemented__sizeof__to stress-test using reference-accessors and it went ok, but it's not amazingly smooth (likeunsand trying to figure out if you are in a "top-level" elem likeobsorvarm)If zarr-developers/zarr-python#3613 ever happens (I hope so) or we do #2238, this will be crucial to making things smoother with hdf5 for preventing incompatibilities.
backedinternal checking #2369 and moving towards something raised in feat: Add HTML representation #2236