feat: introduce stack-allocated PyBuffer#5894
Conversation
davidhewitt
left a comment
There was a problem hiding this comment.
Thanks very much for this. Various thoughts around the accessor methods.
Also, out of scope for this PR, but I keep wondering if we should provide iterators for these structures. Especially with the strides / suboffsets etc, it's not necessarily trivial to get this right.
Yes, I think it would definitely be useful as well. I am not yet sure how this would look like though. |
davidhewitt
left a comment
There was a problem hiding this comment.
Thanks for the continued work here, looking great!
Implementing Drop directly on PyUntypedBufferView is functionally equivalent to what I had in mind from the "drop guard", so gets a 👍 from me.
Afraid I had quite a few more thoughts around some of the edge cases (plus some ideas we can probably ignore). Hopefully helps us get to the right eventual abstraction!
5430e96 to
48b5fdd
Compare
Merging this PR will degrade performance by 10.73%
|
| Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|
| ❌ | bench_pyclass_create |
3.9 µs | 4.4 µs | -10.73% |
Comparing winstxnhdw:feat/stacked-pybuffer (7f2de53) with main (962a535)
Footnotes
-
1 benchmark was skipped, so the baseline result was used instead. If it was deleted from the codebase, click here and archive it to remove it from the performance reports. ↩
b4f1eab to
dd63129
Compare
davidhewitt
left a comment
There was a problem hiding this comment.
Very cool!
After reading through this implementation a couple of times, I really like the expressiveness this gives in the type system. I have some ideas which might refine it further (see BufferFlags struct idea).
Main concern is about generic code bloat from explosion of generic parameters. I am unsure if there's a way that we can mitigate that at all.
|
Thanks for all of this - would be interested to see what you think of these suggestions (none are mandatory, I'm just pushing out ideas of what I think I like, which may not be to others' taste!) |
|
#5870 has now merged, we'll want to add a similar API here. |
fcb9203 to
c1872f3
Compare
|
Sorry for the delay. I've been a little burnt out from school + work. I've adapted your suggestions and modified them to be what I think is appropriate. Also, if you do This is ok. let bytes = PyBytes::new(py, b"abcde");
let flags = PyBufferFlags::simple()
.writable()
.format()
.c_contiguous()
.strides();
PyUntypedBufferView::with_flags(&bytes, flags, |view| {
...
})This is not ok. let bytes = PyBytes::new(py, b"abcde");
let flags = PyBufferFlags::simple()
.writable()
.format()
.full() // errors here
.c_contiguous()
.strides();
PyUntypedBufferView::with_flags(&bytes, flags, |view| {
...
}) |
davidhewitt
left a comment
There was a problem hiding this comment.
Thanks, this is looking really cool. We're obviously venturing into new territory here so please bear with me if we have a few rounds of review while we experiment with what feels good (both to implement and to use).
A bunch of comments applied. I suspect we'll learn from these.
davidhewitt
left a comment
There was a problem hiding this comment.
Thanks, I think this is looking in pretty good shape. Let's see if we get an answer on python/cpython#148431 regarding format / itemsize interactions just in case that encourages us to make some other tweaks.
I would love feedback from some buffer API consumers about how this looks.
Ping @alex @kylebarron, I believe you both use PyO3's buffer API, please forgive the ping. I would love to ask:
- Would this view type be useful to you for performance benefit over the
PyBuffer<T>case which incurs an unconditional heap allocation? (I think it seems worth it for some consumers even if it doesn't affect your use cases, so this won't block merge if you say no.) - As consumers of the API, what do you think of the
PyBufferFlagstype which encodes the request and what fields from the exporter are meaningful in the type system. See e.g. #5894 (comment) - if the buffer request didn't ask for shape information, then.shape()accessor is not available, for example. I think we've cooked up something pretty nice here, though it would be nice to know whether others agree before we merge it 😅
See also docs for this PyUntypedBufferView on this branch at https://69dbb09a02f842688b3dcc9f--pyo3.netlify.app/main/doc/pyo3/buffer/struct.pyuntypedbufferview
|
|
||
| /// Type-safe buffer request flags. The const parameters encode which fields | ||
| /// the exporter is required to fill. | ||
| pub struct PyBufferFlags< |
There was a problem hiding this comment.
A thought: if we wanted to change CONTIGUITY to be an enum later (when possible) or add SUBOFFSETS, we would need users to be unable to directly name this type.
I think we might be able to achieve that by making this struct something like PyBufferFlagsImpl and having the real PyBufferFlags type be a thin re-export around it which uses the PyBufferFlagsImpl as a type parameter, similar to the way the view types do this.
I'm unsure of the merits of doing so, what do you think? It would be nice to have control over the implementation later if it doesn't add tons of complexity for both us and readers.
There was a problem hiding this comment.
I agree that we shouldn't expose PyBufferFlags to minimise breaking changes. I've made the change, but I think it looks a bit ugly. I've made it a separate commit so that you can easily view the diff, and for me to revert if needed. Let me know what you think.
There was a problem hiding this comment.
It doesn't seem too bad to me, what bit do you dislike?
If you don't like the name FlagsImpl, one thought I have is that we could rename PyBufferFlags -> PyBufferRequest and PyBufferFlagsImpl -> PyBufferFlags.
"Compound Requests" is the term given in the Python docs for the flag combinations, after all.
See also my other comment on .format() method which might hide many of the uses of FlagsImpl if applied to all possible patterns.
|
I'm not sure I have enough context to answer these questions. Do you have a full usage example perhaps?
What is
That does look cool. For my usage I tend to want to expose an input Python buffer as either I think this is sound if the Python user doesn't mutate the buffer while the Rust code is running? My use cases usually expect immutable buffers, even though there's no invariant to force that. I don't think I'd have much benefit from a |
|
I've also been thinking if we could implement the extractor for the view buffers. It would definitely be nicer to use. I'll investigate. |
davidhewitt
left a comment
There was a problem hiding this comment.
Thanks, some responses to the PyBufferFlagsImpl changes.
|
|
||
| /// Type-safe buffer request flags. The const parameters encode which fields | ||
| /// the exporter is required to fill. | ||
| pub struct PyBufferFlags< |
There was a problem hiding this comment.
It doesn't seem too bad to me, what bit do you dislike?
If you don't like the name FlagsImpl, one thought I have is that we could rename PyBufferFlags -> PyBufferRequest and PyBufferFlagsImpl -> PyBufferFlags.
"Compound Requests" is the term given in the Python docs for the flag combinations, after all.
See also my other comment on .format() method which might hide many of the uses of FlagsImpl if applied to all possible patterns.
| /// A [struct module style](https://docs.python.org/3/c-api/buffer.html#c.Py_buffer.format) | ||
| /// string describing the contents of a single item. | ||
| #[inline] | ||
| pub fn format(&self) -> &CStr { |
There was a problem hiding this comment.
As part of simplifying the amount of FlagsImpl bouncing around, I came up with this commit: 3a977f3
Two upsides to that commit:
- The
.format()method is now always available, but has a trait bound with a friendly error message if the buffer request does not provide (or imply) format information. - There is now no mention of
FlagsImplat all on the.format()method, just a trait bound onFlagsgeneric parameter.
What do you think of that? If you like it, maybe you can pull it and do similar for the other accessor methods?
There was a problem hiding this comment.
Wow, I didn't know you could do something like this. That looks good.
There was a problem hiding this comment.
There is now no mention of FlagsImpl at all on the .format() method, just a trait bound on Flags generic parameter.
I am not sure what's a good middle-ground for this. Your commit only has IncludesFormat, but technically, we can go all the way to include all of these traits as well.
- CanRequestFormat
- CanRequestShape
- CanRequestStrides
- CanRequestIndirect
- CanRequestWritable
- CanRequestContiguity
- IncludesFormat
- IncludesShape
- IncludesStrides
- IncludesSuboffsets
- GuaranteesWritable
- GuaranteesCContiguous
- GuaranteesFContiguous
Doing this will get rid much of the FlagsImpl, but the cost of that is adding many more lines of code.
You're absolutely correct - for large buffers uses the allocation overhead can be irrelevant, and keeping the buffer information on the stack has the downside that you can't keep the buffer export alive for later consumption. OP noticed the performance advantage was relevant to their use case, however. I can imagine this stack-based form is generally better for short-lived temporary buffers used to read (or write) a small bytes export.
Can you clarify what you're thinking here? Maybe I can offer ideas :) |
Yes, that sounds about right. @alex has previously written about how the current buffer API is deficient of guarantees against data races, for now I guess we live with that. I think @ngoldbaum is also thinking about this problem from time to time (with all three of free-threading, numpy, and PyO3 hats on, even). |
Currently, our API is kinda like this, and it isn't really coherent with the rest of PyO3's API. #[pyfunction]
fn sum_f32(py: Python<'_>, buf: PyUntypedBuffer) -> PyResult<f32> {
let obj = buf.obj(py).unwrap();
PyBufferView::<f32>::with_flags(obj, PyBufferFlags::contig(), |buf| {
Ok(buf.as_contiguous_slice(py).iter().map(|x| x.get()).sum())
})?
}Ideally, I think we would want something like this. #[pyfunction]
fn sum_f32(
py: Python<'_>,
buf: PyUntypedBufferWith<BufferRequest::ContigRo>,
) -> PyResult<f32> {
let buf = buf.as_typed::<f32>()?;
Ok(buf
.as_contiguous_slice(py)
.iter()
.map(|x| x.get())
.sum())
} |
a6d4f80 to
d22c422
Compare
Co-authored-by: David Hewitt <mail@davidhewitt.dev>
d22c422 to
7f2de53
Compare
Summary
Currently, the overhead of
PyBuffertakes a couple of microseconds to allocate on the heap, which may be too much overhead for some workloads.This PR implements a pinned stack-allocated
PyUntypedBuffervariant,PyUntypedBufferView.Closes #5836