[DNM/RFC] Track dependents without weakrefs by fjetter · Pull Request #831 · dask/dask-expr

fjetter · 2024-02-01T16:28:40Z

@phofl this is an implementation that would track the "expression graph" on the instances themselves without needing weakref semantics.

In a nutshell, every expression carries a dictionary that maps expression names to the names of their dependencies. When the dependents are needed, this mapping can be inverted.
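A minimal sketch of that inversion, assuming the mapping has the shape `{name: set of dependency names}`. The helper `invert_dependencies` and the node names are hypothetical and not taken from the PR:

```python
from collections import defaultdict


def invert_dependencies(dependencies):
    """Turn {name: dependency names} into {name: dependent names}."""
    dependents = defaultdict(set)
    for name, deps in dependencies.items():
        dependents[name]  # touch the key so nodes without dependents still appear
        for dep in deps:
            dependents[dep].add(name)
    return dict(dependents)


# Hypothetical expression graph: "sum" and "mean" both read from "filter".
dependencies = {
    "read": set(),
    "filter": {"read"},
    "sum": {"filter"},
    "mean": {"filter"},
}
dependents = invert_dependencies(dependencies)
```

Because only names are stored, nothing in the mapping keeps a reference to another expression's dependents, which is roughly why no weakrefs are needed in this kind of design.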

I haven't tested the performance of this yet. Instance creation is a little more expensive now, but I would be surprised if the overhead were significant. Inverting the mapping to get the dependents should cost the same as the weakref implementation we have right now.

Tests are currently failing because the DelayedExpr thingy is a little broken, but the rest passes. I'll look at performance next and see whether this can be easily combined with #798. This draft PR is just an FYI.

fjetter · 2024-02-01T16:32:55Z

dask_expr/_core.py

+    def __hash__(self):
+        raise TypeError("Don't!")


"Funny" story: the above implementation maintains two dicts, one that maps names to dependencies and one that maps names to the actual objects. Initially I tried to use the Expr objects themselves, with a mapping from Expr -> set[Expr]. That triggered funny recursion errors, and it took me a while to understand that...

Whenever there is a hash collision, the implementation of set falls back to __eq__ to compare the new object with the one already stored under that hash, i.e. it evaluates old == new. However, Expr.__eq__ doesn't return a bool but defines another expression... 💥
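A toy reproduction of the hazard, as a sketch only. `Node` is a hypothetical stand-in for Expr, and the hash is forced to collide. This toy silently drops elements rather than recursing like the real code did, but the root cause is the same: `__eq__` returns an object instead of a bool.

```python
class Node:
    """Toy stand-in for an expression; every name here is hypothetical."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return 0  # force every instance into the same hash bucket

    def __eq__(self, other):
        # Like a lazy expression API: build a new node instead of returning a bool.
        return Node(f"({self.name} == {other.name})")


a, b = Node("a"), Node("b")
seen = {a, b}  # the collision makes set evaluate a == b, which is a truthy Node
```

The set treats the truthy result as "equal", so `b` is never stored, and any other `Node` also appears to be a member.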

mrocklin · 2024-02-01

Yeah, defining __eq__ is pretty hazardous in Python. It would be reasonable, I think, to drop Expr.__eq__ and use explicit Eq(a, b) calls instead, and then rely on Frame.__eq__ for user comfort.

Or rather, use whatever class is defined in collections.py to hold the __eq__ method
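A rough sketch of how that suggestion could look, assuming a split between internal expression nodes and a user-facing wrapper. The classes below are simplified, hypothetical stand-ins and not the actual dask-expr classes:

```python
class Expr:
    """Hypothetical expression node; keeps Python's identity-based __eq__/__hash__."""

    def __init__(self, *operands):
        self.operands = operands


class Eq(Expr):
    """Explicit lazy equality expression, built on purpose via Eq(a, b)."""


class Frame:
    """Hypothetical user-facing wrapper that owns the overloaded operator."""

    def __init__(self, expr):
        self.expr = expr

    def __eq__(self, other):
        # User comfort: `frame_a == frame_b` still builds a lazy comparison.
        return Frame(Eq(self.expr, other.expr))

    __hash__ = None  # wrappers are not meant to be used as set members or dict keys


x, y = Expr(), Expr()
nodes = {x, y}               # safe: comparisons inside the set return plain bools
lazy = Frame(x) == Frame(y)  # wraps Eq(x, y)
```

With this split, internal bookkeeping can put expressions into sets and dicts without ever triggering expression construction.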

dask_expr/_core.py

phofl · 2024-02-01T16:58:59Z

I have to take a closer look, but I like the solution. When I tried the same I only stored the dependents on the expression which caused me trouble while recreating the mapping. Simply building the mapping continuously and storing it on the expression should circumvent the issues I ran into.

Thanks for taking a stab at this


phofl · 2024-02-14T18:06:51Z

dask_expr/tests/test_shuffle.py

 def test_set_index_triggers_calc_when_accessing_divisions(pdf, df):
    divisions_lru.data = OrderedDict()
-    query = df.set_index("x")
+    query = df.fillna(random.randint(1, 100)).set_index("x")


This one is funny: the previous expression was cached, so with this PR we never calculated divisions. Using a random number here avoids the cache hit.
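To illustrate the effect with a sketch: assume a cache keyed by deterministic expression names. The `cache`, `divisions_for`, and the name strings below are hypothetical and only mimic the behaviour described in the comment:

```python
import random
from collections import OrderedDict

# Hypothetical stand-in for a cache keyed by deterministic expression names.
cache = OrderedDict()
computed = []  # records every name that actually triggered a calculation


def divisions_for(name):
    """Return cached divisions for `name`, computing them only on a cache miss."""
    if name not in cache:
        computed.append(name)
        cache[name] = f"divisions-of-{name}"
    return cache[name]


divisions_for("set_index-x")  # first call computes
divisions_for("set_index-x")  # same name: served from the cache, nothing computed

# A random argument changes the name, so this call is a miss again.
fresh_name = f"fillna-{random.randint(1, 100)}-set_index-x"
divisions_for(fresh_name)
```

An expression built identically in an earlier test gets the same name and hits the cache, so the test would pass without exercising the calculation it is meant to check.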

phofl · 2024-02-14T22:43:12Z

Fix for failing tests is here: #880

Will merge into this branch

fjetter added 2 commits February 1, 2024 17:22

Track dependents without weakrefs

7700b82

delete the old thing

3531583

fjetter commented Feb 1, 2024

phofl reviewed Feb 1, 2024

dask_expr/_core.py (outdated)

fjetter mentioned this pull request Feb 1, 2024

Expr as singleton #798

Merged

phofl added 8 commits February 14, 2024 11:53

Update dask_expr/_core.py

af85220

Merge remote-tracking branch 'upstream/main' into dependents_wout_wea…

caf94ce

…krefs # Conflicts: # dask_expr/_core.py # dask_expr/_expr.py # dask_expr/io/_delayed.py

Update

a9f9904

Update

3155405

Merge branch 'main' into dependents_wout_weakrefs

cf5d77e

Don't run concurrently

39b3a26

Don't run concurrently

431c302

Update

1faa475

phofl reviewed Feb 14, 2024

Fix reads from local dir that changes directory

bda2577

phofl added 2 commits February 14, 2024 23:43

Merge branch 'chdir' into dependents_wout_weakrefs

babb1ee

Update to keep track of refs

97029db

fjetter closed this Jul 10, 2025

fjetter wants to merge 13 commits into dask:main from fjetter:dependents_wout_weakrefs

mrocklin Feb 1, 2024

3 participants
