Define Array methods for working with ArrayRef#9295
Define Array methods for working with ArrayRef#9295scovich wants to merge 3 commits intoapache:mainfrom
Conversation
|
@alamb interested in your thoughts here |
|
We typically just use |
| /// ``` | ||
| fn as_any(&self) -> &dyn Any; | ||
|
|
||
| /// Attempts to recover an [`ArrayRef`] from `&dyn Array`, returning `None` if not Arc-backed. |
There was a problem hiding this comment.
this API makes sense to me as a user of the library, for what it is worth, and would save some allocations rather than going to/from ArrayData
|
|
||
| fn as_array_ref_opt(&self) -> Option<ArrayRef> { | ||
| // Recursively unwrap nested Arcs to find the deepest one | ||
| Some(innermost_array_ref(Cow::Borrowed(self))) |
There was a problem hiding this comment.
I am probably confused, but why not just Arc::clone(self)? This is impl ... for ArrayRef which is Arc<dyn Array>
So technically speaking as_array_ref_opt would be recovering the ArrayRef
Are there usecases when we have wrapped multiple levels of Arrays? I think this (code tries to handle the case of Arc<Arc<PrimitiveArary>> or something, which I don't think is common 🤔
There was a problem hiding this comment.
Hehe, yes we absolutely do have Arc<Arc<...>> and there are even unit tests verifying they behave correctly.
That was a surprise to me when I started exploring the possibility to have Array: Any.
So this code might arc-clone self... but not if self harbors a deeper ArrayRef inside.
There was a problem hiding this comment.
I agree it is possible, but why on earth would you do such a thing? Tbh I'm not sure why Array is implemented for Arc<dyn Array>...
There was a problem hiding this comment.
IMO both impl Array for Arc<dyn Array> and impl<T: Array> Array for &T were questionable (design?) decisions back whenever they showed up.
I expect it was intended to be ergonomic?
- The former allows passing
&ArrayRefdirectly to functions that expect&dyn Array, without needing to litter the code with explicitas_ref()calls. - The latter allows code that expects
impl AsRef<dyn Array>to work with both owned and borrowed operands.
Part of me would love to just remove both of those, but that would definitely be a massive breaking change. As in, massive number of compilation failures (even if easily fixed).
Even within arrow-rs itself, the whole ArrayAccessor and ArrayIterator abstraction would break if we took away the blanket impl for &T, because ArrayAccessor::Item is owned for some array types (primitive, boolean, etc) and borrowed for others (GenericByteArray). I spent a few minutes trying to untangle it, without success.
There was a problem hiding this comment.
I tried removing impl Array for Arc<dyn Array> and the biggest problem is that Scalar<ArrayRef> stops working. It's not immediately clear whether/how to fix that. In particular, there's no obvious replacement for Scalar::new(new_null_array(...)).
Other smaller messes include:
- Quite a few call sites that call
Arc::new(<expression returning ArrayRef>) as ArrayRef(there's that wrapping!) - A surprising number of calls to
<ArrayRef as Array>::shrink_to_fit()(fake-mutable, no-op) - Lots of calls to
<ArrayRef as Array>::into_data()(fake-mutable, equivalent toArray::to_data) - A lot of code that has an
ArrayRefor&ArrayRef, and which lacks the necessary.as_ref()call. Mostly unit tests. - Missing
impl Datum for ArrayRef
Just getting arrow-rs to compile without it was an impressive amount of churn:
82 files changed, 883 insertions(+), 732 deletions(-)
There was a problem hiding this comment.
If nothing else, we can at least clean up call sites that are needlessly creating Arc<Arc<dyn Array>>:
| // - Same allocation/pointer is guaranteed by check #2 | ||
| // - Arc::from_raw is only called once on this pointer | ||
|
|
||
| // Unwrap nested Arc<dyn Array> if present (zero clones if not nested) |
There was a problem hiding this comment.
I feel like this is code I am not qualified to review.
I did some more googling, and it seems like we should be able to do soemthing similar using a trick like this:
https://stackoverflow.com/questions/76222743/upcasting-an-arcdyn-trait-to-an-arcdyn-any
However, I couldn't make the trait impls all line up 🤔
There was a problem hiding this comment.
If I understand correctly, the SO question is trying to go from Arc<dyn Foo> to Arc<dyn Any>. The two solutions are:
- use trait upcasting
- define an extension trait with a helper method that receives
Arc<Self>
The former requires Foo: Any + 'static as well as the unstable trait-upcasting feature. Even if the feature stabilizes, we still have the problem that Array does not satisfy Any + 'static.
The latter I thought would make the extension trait not dyn-compatible... but the code in that accepted SO answer seems to compile... let me look into that and get back to you.
There was a problem hiding this comment.
Ah -- apparently the 'static constraint allows Arc<Self> methods to be dyn-compatible (TIL!).
Unfortunately, that would impose a transitive requirement of Array: 'static which is incompatible with a bunch of existing code, including the existing impl<T> Array for &T, trait ArrayAccessor: Array, and any code that does anything with a &'a dyn Array (which is ~all of them, if you account for anonymous/erased lifetimes).
Which issue does this PR close?
Rationale for this change
Two things:
&ArrayRefcan coerce to&dyn Array, but there is currently no way to recover the originalArrayRefinstance afterward, becauseArray::as_any()returns the referent of theArrayRef.Arc<T: Array>can coerce toArrayRef, but there is currently no way to recover the orignalArc<T>becauseArraydoes not (and indeed cannot) satisfy theAny + 'staticbound thatArc::downcastrequires.What changes are included in this PR?
Define a new
Array::as_array_ref()method, with a provided implementation that returns None.<Arc<T> as Array>::as_array_ref()returnsSome(self)and<&T as Array>::as_array_ref()delegates to its referent.Also define a new
Array::to_array_ref()method that attempts to recover anArrayRefbut which falls back to creating a new Array instance viaArray->ArrayData->make_array. This provides a simple way to obtain an ownedArrayReffrom any&dyn Array, optimized to a cheap Arc clone in case the trait object is already anArrayRef.Define a new freestanding
downcast_array_reffunction that attempts to recover the originalArc<T>from an ownedArrayRef-- mirroring the behavior ofArc::downcast, but without the restrictive type bounds.Are these changes tested?
Yes, new doc tests validate the code.
Are there any user-facing changes?
New user-facing functions described above.