Conversation
|
Two thoughts:
|
|
Also, I agree that this isn't high priority. I'd be fine personally if people ignored this until after the dask migration is done (which is high priority I think). |
Yeah, I agree that something else should probably walk the tree to keep the protocol code simple. The current implementation is just a very simple way to demonstrate the concept.
I'm personally open to anything. I ended up using "exec" instead of "apply", since apply already has a meaning in pandas. My only hesitation from using "pandas" in the name is that it would be nice to use the same language for array expressions in the future.
Yup - Dask-expr + cudf is pretty much completely broken at the moment, so this proposal is pretty low on my list as well. Just want to make sure the "exec" idea is visible, and that there is a space to discuss it. |
|
Also cc'ing @TomNicholas and @tomwhite who have expressed interest in using dask-expr for things other than Dask. This PR isn't mature, but it's a good example of feasibility. |
Note that this is not a high-priority, but I explored the idea a few months ago and wanted to share the branch in case others had interest.
The idea here is that the expression system used in dask-expr also makes it pretty easy to directly execute the query using the backend library (rather than constructing a task graph and scheduling the tasks). I'd expect this to be useful for both debugging and for smallish-data applications. For the latter case, this effectively allows the user to apply query optimization to a "serial" pandas or cudf query.
This is only a half-baked idea for now. Only a small subset of expression types are supported (e.g.
FromPandas,Blockwise,SortValues,SetIndex,Merge,GroupbyAggregation).