feat: introduce stack-allocated `PyBuffer` by winstxnhdw · Pull Request #5894 · PyO3/pyo3

winstxnhdw · 2026-03-19T10:10:14Z

Summary

Currently, PyBuffer allocates its Py_buffer on the heap, which takes a couple of microseconds; that overhead may be too much for some workloads.

This PR implements a pinned stack-allocated PyUntypedBuffer variant, PyUntypedBufferView.
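To make the stack-versus-heap distinction concrete, here is a minimal std-only sketch of a closure-scoped, pinned, stack-allocated view. It is not PyO3's implementation. ToyExporter stands in for a Python object, RawBuffer stands in for Py_buffer, and incrementing or decrementing a counter stands in for PyObject_GetBuffer / PyBuffer_Release. All names are hypothetical.

```rust
use std::cell::Cell;
use std::marker::PhantomPinned;
use std::pin::{pin, Pin};

/// Toy stand-in for an object implementing the buffer protocol (hypothetical).
struct ToyExporter {
    data: Vec<u8>,
    /// Outstanding exports, akin to the count bytearray keeps so it can refuse resizing.
    exports: Cell<usize>,
}

/// Toy stand-in for `Py_buffer`, filled in place by the exporter.
struct RawBuffer {
    buf: *const u8,
    len: usize,
    /// Opts out of `Unpin`, so a pinned value cannot be moved by safe code.
    _pin: PhantomPinned,
}

/// Stack-allocated view that releases its export when dropped (hypothetical).
struct BufferView<'a> {
    raw: RawBuffer,
    owner: &'a ToyExporter,
}

impl BufferView<'_> {
    fn as_bytes<'s>(self: Pin<&'s Self>) -> &'s [u8] {
        // SAFETY: `buf`/`len` describe `owner.data`, which outlives the view
        // and is never mutated while an export is outstanding in this sketch.
        unsafe { std::slice::from_raw_parts(self.raw.buf, self.raw.len) }
    }
}

impl Drop for BufferView<'_> {
    fn drop(&mut self) {
        // Plays the role of `PyBuffer_Release`.
        self.owner.exports.set(self.owner.exports.get() - 1);
    }
}

/// Acquires a view on this stack frame, lends it to `f`, then releases it.
fn with_view<R>(obj: &ToyExporter, f: impl FnOnce(Pin<&BufferView<'_>>) -> R) -> R {
    obj.exports.set(obj.exports.get() + 1);
    let view = pin!(BufferView {
        raw: RawBuffer { buf: obj.data.as_ptr(), len: obj.data.len(), _pin: PhantomPinned },
        owner: obj,
    });
    // `view` is dropped when this function returns (or unwinds), so `f`
    // can neither move it nor keep the export alive past the call.
    f(view.as_ref())
}

fn main() {
    let obj = ToyExporter { data: b"abcde".to_vec(), exports: Cell::new(0) };
    let total = with_view(&obj, |view| view.as_bytes().iter().map(|&b| u32::from(b)).sum::<u32>());
    println!("sum = {total}, outstanding exports = {}", obj.exports.get()); // sum = 495, outstanding exports = 0
}
```

Because the closure only ever receives a borrowed Pin<&BufferView>, there is no heap allocation, but also no way to hold the export for later; that trade-off comes up again later in the thread.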

Closes #5836

davidhewitt

Thanks very much for this. Various thoughts around the accessor methods.

Also, out of scope for this PR, but I keep wondering if we should provide iterators for these structures. Especially with the strides / suboffsets etc, it's not necessarily trivial to get this right.
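To illustrate why iteration is easy to get wrong, here is a minimal std-only sketch that walks a buffer's shape and strides (in bytes, as in Py_buffer, and possibly negative) in C order, yielding each item's byte offset relative to buf. It assumes suboffsets are NULL (no PIL-style indirection); StridedOffsets is a hypothetical name, not a PyO3 API.

```rust
/// Yields the byte offset of every item of an N-dimensional buffer,
/// visiting indices in C (row-major) order. Suboffsets are not handled.
struct StridedOffsets<'a> {
    shape: &'a [usize],
    strides: &'a [isize],
    index: Vec<usize>,
    done: bool,
}

impl<'a> StridedOffsets<'a> {
    fn new(shape: &'a [usize], strides: &'a [isize]) -> Self {
        assert_eq!(shape.len(), strides.len(), "one stride per dimension");
        // Any zero-length dimension means there are no items at all.
        let done = shape.contains(&0);
        Self { shape, strides, index: vec![0; shape.len()], done }
    }
}

impl Iterator for StridedOffsets<'_> {
    type Item = isize;

    fn next(&mut self) -> Option<isize> {
        if self.done {
            return None;
        }
        let offset: isize = self
            .index
            .iter()
            .zip(self.strides)
            .map(|(&i, &s)| i as isize * s)
            .sum();
        // Advance the index like an odometer: bump the last axis and carry.
        // With ndim == 0 (a scalar) the loop is empty and we stop after one item.
        self.done = true;
        for axis in (0..self.index.len()).rev() {
            self.index[axis] += 1;
            if self.index[axis] < self.shape[axis] {
                self.done = false;
                break;
            }
            self.index[axis] = 0;
        }
        Some(offset)
    }
}

fn main() {
    // A 2x3 Fortran-ordered array of 1-byte items: element (r, c) is at r + 2c.
    let data = [b'a', b'd', b'b', b'e', b'c', b'f'];
    let shape: [usize; 2] = [2, 3];
    let strides: [isize; 2] = [1, 2];
    let row_major: String = StridedOffsets::new(&shape, &strides)
        .map(|off| data[off as usize] as char)
        .collect();
    println!("{row_major}"); // abcdef
}
```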

winstxnhdw · 2026-03-20T16:47:05Z

> Also, out of scope for this PR, but I keep wondering if we should provide iterators for these structures. Especially with the strides / suboffsets etc, it's not necessarily trivial to get this right.

Yes, I think it would definitely be useful as well. I'm not yet sure what it would look like, though.

davidhewitt

Thanks for the continued work here, looking great!

Implementing Drop directly on PyUntypedBufferView is functionally equivalent to what I had in mind from the "drop guard", so gets a 👍 from me.

Afraid I had quite a few more thoughts around some of the edge cases (plus some ideas we can probably ignore). Hopefully helps us get to the right eventual abstraction!

codspeed-hq · 2026-03-24T18:46:30Z

Merging this PR will degrade performance by 10.73%

⚠️

Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

❌ 1 regressed benchmark
✅ 104 untouched benchmarks
⏩ 1 skipped benchmark¹

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

	Benchmark	`BASE`	`HEAD`	Efficiency
❌	`bench_pyclass_create`	3.9 µs	4.4 µs	-10.73%

Comparing winstxnhdw:feat/stacked-pybuffer (7f2de53) with main (962a535)

¹ 1 benchmark was skipped, so the baseline result was used instead. If it was deleted from the codebase, archive it in CodSpeed to remove it from the performance reports.

davidhewitt

Very cool!

After reading through this implementation a couple of times, I really like the expressiveness this gives in the type system. I have some ideas which might refine it further (see BufferFlags struct idea).

Main concern is about generic code bloat from explosion of generic parameters. I am unsure if there's a way that we can mitigate that at all.
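One common mitigation for monomorphization bloat (not something this thread settled on) is to keep each generic method a thin shim that forwards to a non-generic worker, so only the shim is duplicated per flags combination. A minimal sketch, with a hypothetical View<FORMAT, SHAPE> standing in for the real view type:

```rust
/// Hypothetical view with type-level flags: every distinct combination of
/// const parameters is a distinct type, so generic method bodies are
/// instantiated once per combination that is actually used.
struct View<const FORMAT: bool, const SHAPE: bool> {
    bytes: Vec<u8>,
}

impl<const FORMAT: bool, const SHAPE: bool> View<FORMAT, SHAPE> {
    /// Thin generic shim: only this one-line body is duplicated per instantiation.
    fn checksum(&self) -> u64 {
        checksum_impl(&self.bytes)
    }
}

/// Non-generic worker: a single copy in the binary (unless the optimizer
/// decides to inline it), however many flag combinations exist.
fn checksum_impl(bytes: &[u8]) -> u64 {
    bytes
        .iter()
        .fold(0u64, |acc, &b| acc.wrapping_mul(31).wrapping_add(u64::from(b)))
}

fn main() {
    let a: View<true, false> = View { bytes: b"abc".to_vec() };
    let b: View<false, true> = View { bytes: b"abc".to_vec() };
    // Two distinct types, one shared worker.
    println!("{} {}", a.checksum(), b.checksum()); // 96354 96354
}
```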

davidhewitt · 2026-03-31T09:20:15Z

Thanks for all of this - would be interested to see what you think of these suggestions (none are mandatory, I'm just pushing out ideas of what I think I like, which may not be to others' taste!)

davidhewitt · 2026-03-31T13:35:31Z

#5870 has now been merged; we'll want to add a similar API here.

winstxnhdw · 2026-04-07T21:38:27Z

Sorry for the delay. I've been a little burnt out from school + work. I've adapted your suggestions and modified them to be what I think is appropriate.

Also, if you do PyBufferFlags::simple().full(), you can no longer append, say, format() to it, because full() already implies format(). It's a nice DX win that I haven't really seen in other libraries like reqwest or tokio.

This is ok.

let bytes = PyBytes::new(py, b"abcde");
let flags = PyBufferFlags::simple()
    .writable() // type-level only: at runtime, `bytes` refuses writable requests
    .format()
    .c_contiguous()
    .strides();

PyUntypedBufferView::with_flags(&bytes, flags, |view| {
    // ... use `view` here ...
})?;

This is not ok.

let bytes = PyBytes::new(py, b"abcde");
let flags = PyBufferFlags::simple()
    .writable()
    .format()
    .full() // errors here: full() implies format() (and writable()), already requested above
    .c_contiguous()
    .strides();

PyUntypedBufferView::with_flags(&bytes, flags, |view| {
    // ... use `view` here ...
})?;
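A minimal sketch of how const-generic typestate can make a builder method disappear once its flag is set, assuming a hypothetical Flags type with one bool per flag (not the PR's actual PyBufferFlags). The bit values are CPython's PyBUF_* constants, so full() yields PyBUF_FULL:

```rust
// PyBUF_* values from CPython's buffer protocol.
const PYBUF_WRITABLE: i32 = 0x0001;
const PYBUF_FORMAT: i32 = 0x0004;
// PyBUF_INDIRECT implies PyBUF_STRIDES (0x0010), which implies PyBUF_ND (0x0008).
const PYBUF_INDIRECT: i32 = 0x0100 | 0x0010 | 0x0008;

/// Hypothetical request builder: each const parameter records whether
/// that flag has already been set.
#[derive(Clone, Copy, Debug)]
struct Flags<const WRITABLE: bool, const FORMAT: bool, const INDIRECT: bool>;

impl Flags<false, false, false> {
    fn simple() -> Self {
        Flags
    }

    /// PyBUF_FULL = INDIRECT | WRITABLE | FORMAT, so it is only offered
    /// while none of those have been set yet.
    fn full(self) -> Flags<true, true, true> {
        Flags
    }
}

impl<const F: bool, const I: bool> Flags<false, F, I> {
    fn writable(self) -> Flags<true, F, I> {
        Flags
    }
}

impl<const W: bool, const I: bool> Flags<W, false, I> {
    fn format(self) -> Flags<W, true, I> {
        Flags
    }
}

impl<const W: bool, const F: bool, const I: bool> Flags<W, F, I> {
    /// The raw flags an implementation would pass to PyObject_GetBuffer.
    fn bits(self) -> i32 {
        let mut bits = 0;
        if W {
            bits |= PYBUF_WRITABLE;
        }
        if F {
            bits |= PYBUF_FORMAT;
        }
        if I {
            bits |= PYBUF_INDIRECT;
        }
        bits
    }
}

fn main() {
    let ok = Flags::simple().writable().format();
    println!("{:#06x}", ok.bits()); // 0x0005

    let full = Flags::simple().full();
    println!("{:#06x}", full.bits()); // 0x011d

    // Neither of these compiles: `full` only exists on Flags<false, false, false>,
    // and `format` only exists while FORMAT is still false.
    // Flags::simple().writable().full();
    // full.format();
}
```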

davidhewitt

Thanks, this is looking really cool. We're obviously venturing into new territory here so please bear with me if we have a few rounds of review while we experiment with what feels good (both to implement and to use).

A bunch of comments applied. I suspect we'll learn from these.

davidhewitt

Thanks, I think this is looking in pretty good shape. Let's see if we get an answer on python/cpython#148431 regarding format / itemsize interactions just in case that encourages us to make some other tweaks.

I would love feedback from some buffer API consumers about how this looks.

Pinging @alex and @kylebarron, since I believe you both use PyO3's buffer API (please forgive the ping). I would love to ask:

Would this view type be useful to you for performance benefit over the PyBuffer<T> case which incurs an unconditional heap allocation? (I think it seems worth it for some consumers even if it doesn't affect your use cases, so this won't block merge if you say no.)
As consumers of the API, what do you think of the PyBufferFlags type which encodes the request and what fields from the exporter are meaningful in the type system. See e.g. #5894 (comment) - if the buffer request didn't ask for shape information, then .shape() accessor is not available, for example. I think we've cooked up something pretty nice here, though it would be nice to know whether others agree before we merge it 😅

See also docs for this PyUntypedBufferView on this branch at https://69dbb09a02f842688b3dcc9f--pyo3.netlify.app/main/doc/pyo3/buffer/struct.pyuntypedbufferview

davidhewitt · 2026-04-13T10:05:35Z

+
+/// Type-safe buffer request flags. The const parameters encode which fields
+/// the exporter is required to fill.
+pub struct PyBufferFlags<


A thought: if we wanted to change CONTIGUITY to be an enum later (when possible) or add SUBOFFSETS, we would need users to be unable to directly name this type.

I think we might be able to achieve that by making this struct something like PyBufferFlagsImpl and having the real PyBufferFlags type be a thin re-export around it which uses the PyBufferFlagsImpl as a type parameter, similar to the way the view types do this.

I'm unsure of the merits of doing so, what do you think? It would be nice to have control over the implementation later if it doesn't add tons of complexity for both us and readers.
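The idea can be sketched with a pub type inside a private module: downstream code can hold and pass values of it, but cannot write its name, so its const parameters could later change (e.g. CONTIGUITY becoming an enum). All names here (buffer, FlagsImpl, Request, simple) are hypothetical:

```rust
mod buffer {
    mod private {
        /// Implementation detail: `pub` so it can appear in public signatures,
        /// but inside a private module, so downstream code cannot name it.
        #[derive(Clone, Copy, Debug, PartialEq, Eq)]
        pub struct FlagsImpl<const FORMAT: bool, const CONTIGUITY: u8>;
    }

    use self::private::FlagsImpl;

    /// Public, nameable wrapper whose type parameter is opaque to users.
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    pub struct Request<F>(F);

    pub fn simple() -> Request<FlagsImpl<false, 0>> {
        Request(FlagsImpl)
    }

    impl<const C: u8> Request<FlagsImpl<false, C>> {
        pub fn format(self) -> Request<FlagsImpl<true, C>> {
            Request(FlagsImpl)
        }
    }

    impl<const F: bool, const C: u8> Request<FlagsImpl<F, C>> {
        pub fn has_format(&self) -> bool {
            F
        }
    }
}

fn main() {
    // Users can build, hold and pass requests around...
    let req = buffer::simple().format();
    println!("format requested: {}", req.has_format());
    // ...but cannot spell `buffer::private::FlagsImpl<true, 0>`, because
    // `private` is not accessible outside `buffer`.
}
```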

I agree that we shouldn't expose PyBufferFlags to minimise breaking changes. I've made the change, but I think it looks a bit ugly. I've made it a separate commit so that you can easily view the diff, and for me to revert if needed. Let me know what you think.

kylebarron · 2026-04-13T17:55:59Z

I'm not sure I have enough context to answer these questions. Do you have a full usage example perhaps?

> Would this view type be useful to you for performance benefit over the PyBuffer<T> case which incurs an unconditional heap allocation?

What is PyBuffer<T> allocating? It's not copying the entire buffer, right? Do you mean it's essentially Boxing the reference to the Python data? If so, that seems like a small price to pay to be amortized over the entire input buffer (I tend to use the buffer protocol with large inputs, so for a small overhead that's amortized over the entire input I don't mind much).

> As consumers of the API, what do you think of the PyBufferFlags type which encodes the request and what fields from the exporter are meaningful in the type system. See e.g. #5894 (comment) - if the buffer request didn't ask for shape information, then .shape() accessor is not available, for example. I think we've cooked up something pretty nice here, though it would be nice to know whether others agree before we merge it 😅

That does look cool.

For my usage, I tend to want to expose an input Python buffer as either Bytes or an Arrow Buffer. See for example the FromPyObject impl for my PyBytes, for which I then (unsafely) implement AsRef<[u8]> so that I can interpret the Python memory region for Bytes::from_owner.

I think this is sound if the Python user doesn't mutate the buffer while the Rust code is running? My use cases usually expect immutable buffers, even though there's no invariant to force that.

I don't think I'd have much benefit from a FnOnce closure that can access the Python region just once, even though that does look safer than my usage.

winstxnhdw · 2026-04-14T04:09:17Z

I've also been thinking if we could implement the extractor for the view buffers. It would definitely be nicer to use. I'll investigate.

davidhewitt

Thanks, some responses to the PyBufferFlagsImpl changes.

davidhewitt · 2026-04-14T07:40:13Z

+
+/// Type-safe buffer request flags. The const parameters encode which fields
+/// the exporter is required to fill.
+pub struct PyBufferFlags<


It doesn't seem too bad to me, what bit do you dislike?

If you don't like the name FlagsImpl, one thought I have is that we could rename PyBufferFlags -> PyBufferRequest and PyBufferFlagsImpl -> PyBufferFlags.

"Compound Requests" is the term given in the Python docs for the flag combinations, after all.

See also my other comment on .format() method which might hide many of the uses of FlagsImpl if applied to all possible patterns.

davidhewitt · 2026-04-14T07:44:15Z

+    /// A [struct module style](https://docs.python.org/3/c-api/buffer.html#c.Py_buffer.format)
+    /// string describing the contents of a single item.
+    #[inline]
+    pub fn format(&self) -> &CStr {


As part of simplifying the amount of FlagsImpl bouncing around, I came up with this commit: 3a977f3

Two upsides to that commit:

The .format() method is now always available, but has a trait bound with a friendly error message if the buffer request does not provide (or imply) format information.

There is now no mention of FlagsImpl at all on the .format() method, just a trait bound on Flags generic parameter.

What do you think of that? If you like it, maybe you can pull it and do similar for the other accessor methods?
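A minimal sketch of the pattern described above, assuming a hypothetical Request<FORMAT> marker and View type: the method-level where clause keeps .format() visible on every view, while #[diagnostic::on_unimplemented] (stable since Rust 1.78) swaps the generic unsatisfied-bound error for a targeted message.

```rust
use std::ffi::CStr;
use std::marker::PhantomData;

/// Hypothetical request marker: FORMAT records whether format info was requested.
struct Request<const FORMAT: bool>;

/// Implemented only for requests that include format information.
#[diagnostic::on_unimplemented(
    message = "this buffer request does not include format information",
    label = "format was not requested",
    note = "add `.format()` (or a request that implies it) to the buffer flags"
)]
trait IncludesFormat {}

impl IncludesFormat for Request<true> {}

/// Hypothetical view, generic over the request that produced it.
struct View<F> {
    format: &'static CStr,
    _flags: PhantomData<F>,
}

impl<F> View<F> {
    /// Always present on the type, but only callable when `F: IncludesFormat`.
    fn format(&self) -> &CStr
    where
        F: IncludesFormat,
    {
        self.format
    }
}

fn main() {
    let with_format: View<Request<true>> = View { format: c"f", _flags: PhantomData };
    println!("{}", with_format.format().to_str().unwrap()); // f

    let _without: View<Request<false>> = View { format: c"B", _flags: PhantomData };
    // _without.format(); // rejected at compile time with the custom message above
}
```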

Wow, I didn't know you could do something like this. That looks good.

> There is now no mention of FlagsImpl at all on the .format() method, just a trait bound on Flags generic parameter.

I'm not sure what a good middle ground is here. Your commit only has IncludesFormat, but technically we could go all the way and add all of these traits as well:

CanRequestFormat, CanRequestShape, CanRequestStrides, CanRequestIndirect, CanRequestWritable, CanRequestContiguity

IncludesFormat, IncludesShape, IncludesStrides, IncludesSuboffsets

GuaranteesWritable, GuaranteesCContiguous, GuaranteesFContiguous

Doing this would get rid of much of the FlagsImpl, but at the cost of many more lines of code.

davidhewitt · 2026-04-14T07:54:20Z

> What is PyBuffer<T> allocating? It's not copying the entire buffer, right? Do you mean it's essentially Boxing the reference to the Python data? If so, that seems like a small price to pay to be amortized over the entire input buffer (I tend to use the buffer protocol with large inputs, so for a small overhead that's amortized over the entire input I don't mind much).

You're absolutely correct - for large buffers the allocation overhead can be irrelevant, and keeping the buffer information on the stack has the downside that you can't keep the buffer export alive for later consumption. OP noticed the performance advantage was relevant to their use case, however.

I can imagine this stack-based form is generally better for short-lived temporary buffers used to read (or write) a small bytes export.

> I've also been thinking if we could implement the extractor for the view buffers. It would definitely be nicer to use. I'll investigate.

Can you clarify what you're thinking here? Maybe I can offer ideas :)

davidhewitt · 2026-04-14T07:55:36Z

> I think this is sound if the Python user doesn't mutate the buffer while the Rust code is running? My use cases usually expect immutable buffers, even though there's no invariant to force that.

Yes, that sounds about right. @alex has previously written about how the current buffer API lacks guarantees against data races; for now I guess we live with that. I think @ngoldbaum is also thinking about this problem from time to time (with all three of free-threading, numpy, and PyO3 hats on, even).

winstxnhdw · 2026-04-14T14:19:53Z

> Can you clarify what you're thinking here? Maybe I can offer ideas :)

Currently, our API is kinda like this, and it isn't really coherent with the rest of PyO3's API.

#[pyfunction]
fn sum_f32(py: Python<'_>, buf: PyUntypedBuffer) -> PyResult<f32> {
    let obj = buf.obj(py).unwrap();
    PyBufferView::<f32>::with_flags(obj, PyBufferFlags::contig(), |buf| {
        Ok(buf.as_contiguous_slice(py).iter().map(|x| x.get()).sum())
    })?
}

Ideally, I think we would want something like this.

#[pyfunction]
fn sum_f32(
    py: Python<'_>,
    buf: PyUntypedBufferWith<BufferRequest::ContigRo>,
) -> PyResult<f32> {
    let buf = buf.as_typed::<f32>()?;
    Ok(buf
        .as_contiguous_slice(py)
        .iter()
        .map(|x| x.get())
        .sum())
}
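One way the request in such a signature could be expressed is as a marker type carrying its flags as an associated constant, which an extractor can read without any runtime argument. A minimal std-only sketch; BufferRequest, ContigRo, FullRo and BufferWith are hypothetical (plain marker structs rather than a BufferRequest::ContigRo path), and the constants are CPython's PyBUF_CONTIG_RO and PyBUF_FULL_RO values:

```rust
use std::marker::PhantomData;

/// Hypothetical: a buffer request expressed as a type, so it can appear in a
/// `#[pyfunction]` signature and be read by an extractor.
trait BufferRequest {
    /// Flags an extractor would pass to PyObject_GetBuffer.
    const FLAGS: i32;
}

/// PyBUF_CONTIG_RO (== PyBUF_ND): C-contiguous, read-only.
struct ContigRo;
impl BufferRequest for ContigRo {
    const FLAGS: i32 = 0x0008;
}

/// PyBUF_FULL_RO (== PyBUF_INDIRECT | PyBUF_FORMAT).
struct FullRo;
impl BufferRequest for FullRo {
    const FLAGS: i32 = 0x011c;
}

/// Stand-in for the `PyUntypedBufferWith<...>` idea above (hypothetical).
struct BufferWith<R: BufferRequest> {
    requested: i32,
    _request: PhantomData<R>,
}

impl<R: BufferRequest> BufferWith<R> {
    /// Stand-in for an extractor: the flags come from the type parameter alone.
    fn extract() -> Self {
        Self { requested: R::FLAGS, _request: PhantomData }
    }
}

fn main() {
    let contig = BufferWith::<ContigRo>::extract();
    let full = BufferWith::<FullRo>::extract();
    println!("{:#06x} {:#06x}", contig.requested, full.requested); // 0x0008 0x011c
}
```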


winstxnhdw commented Mar 19, 2026

davidhewitt reviewed Mar 19, 2026

davidhewitt reviewed Mar 22, 2026

winstxnhdw force-pushed the feat/stacked-pybuffer branch 3 times, most recently from 5430e96 to 48b5fdd on March 22, 2026 18:11

winstxnhdw force-pushed the feat/stacked-pybuffer branch from b4f1eab to dd63129 on March 27, 2026 15:03

winstxnhdw commented Mar 30, 2026

winstxnhdw requested a review from davidhewitt March 30, 2026 10:09

davidhewitt reviewed Mar 31, 2026

winstxnhdw force-pushed the feat/stacked-pybuffer branch 6 times, most recently from fcb9203 to c1872f3 on April 7, 2026 21:31

winstxnhdw requested a review from davidhewitt April 9, 2026 06:39

davidhewitt reviewed Apr 12, 2026

winstxnhdw requested a review from davidhewitt April 12, 2026 14:39

davidhewitt reviewed Apr 13, 2026

davidhewitt reviewed Apr 14, 2026

winstxnhdw force-pushed the feat/stacked-pybuffer branch 7 times, most recently from a6d4f80 to d22c422 on April 16, 2026 20:33

winstxnhdw and others added 17 commits April 17, 2026 04:35

feat: introduce stack-allocated PyBuffer (60a27b0)
docs: update CHANGELOG (10ac306)
refactor: apply suggestions (c416df7)
refactor: use assume_init (61731d5)
    Co-authored-by: David Hewitt <mail@davidhewitt.dev>
refactor: apply some suggestions (9918908)
fix: handle PyBUF_WRITABLE (8a3f5b6)
refactor: encode compile-time buffer field availability (4dd25c2)
style: clean up (afc060f)
tests: extend coverage (52674c5)
chore: add obj API (68ac13f)
refactor: use PyBufferFlags (2e3801b)
feat: add flag builder (5eeecfe)
refactor: apply suggestions (59bd694)
refactor: hide suboffsets if guaranteed null (1e08989)
refactor: hide PyBufferFlags (a924b10)
refactor: apply trivial suggestions (436c9b8)
refactor: add diagnostic traits (7f2de53)

winstxnhdw force-pushed the feat/stacked-pybuffer branch from d22c422 to 7f2de53 on April 16, 2026 20:35

winstxnhdw mentioned this pull request on Apr 16, 2026 in style: apply cargo fmt #5981 (Open)
