Improve (de-)serialization performance for scalar arrays by 124C41p · Pull Request #517 · danielgtaylor/python-betterproto

124C41p · 2023-08-06T22:07:53Z

Fixes #515

Since (de-)serialization is implemented purely in Python, it is quite slow compared to native implementations. I try to work around that by not deserializing repeated scalar fields immediately, and instead wrapping their byte representation in the ScalarArray[T] class. This class acts like a list: you can call len(a), a[i], and list(a) on any ScalarArray a, and only at that point do we actually deserialize (which is still very slow for big arrays).
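
To illustrate the lazy-wrapper idea, here is a minimal sketch using only the standard library. It is not the code from this PR: the class name LazyDoubleArray is hypothetical, it handles only the `double` wire type, and it assumes the packed payload is a plain run of little-endian 8-byte values.

```python
import struct
from collections.abc import Sequence


class LazyDoubleArray(Sequence):
    """Hypothetical stand-in for the ScalarArray idea (not the PR's class).

    Keeps the packed bytes as received and decodes elements only on access.
    """

    _ITEM = struct.Struct("<d")  # fixed-width protobuf scalars are little-endian

    def __init__(self, data: bytes) -> None:
        if len(data) % self._ITEM.size:
            raise ValueError("byte length is not a multiple of the item size")
        self._data = data

    def __len__(self) -> int:
        # No decoding needed: the length follows from the byte count.
        return len(self._data) // self._ITEM.size

    def __getitem__(self, index):
        if isinstance(index, slice):
            return [self[i] for i in range(*index.indices(len(self)))]
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("index out of range")
        # Decode exactly one element from its offset in the buffer.
        return self._ITEM.unpack_from(self._data, index * self._ITEM.size)[0]

    def __bytes__(self) -> bytes:
        # Serialization can hand back the stored bytes unchanged.
        return self._data


payload = struct.pack("<3d", 1.0, 2.5, -4.0)
values = LazyDoubleArray(payload)
```

With this sketch, `len(values)` and `bytes(values)` never touch the individual elements, while `values[i]` and `list(values)` pay the per-element decoding cost in Python, which matches the trade-off described above.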

On the other hand, when using numpy you can also call np.asarray(a) on any ScalarArray a to turn it into a numpy array in no time. Conversely, any numpy array b can be turned into a ScalarArray by calling ScalarArray.from_numpy(b); the result can then be passed to a betterproto dataclass field (instead of a list) for faster serialization.
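
The reason the bulk path is fast is that the packed bytes can be reinterpreted in one call instead of being decoded element by element. The sketch below shows that principle with the standard-library `array` module rather than numpy; the helper names doubles_from_packed and packed_from_doubles are hypothetical and are not part of this PR or of betterproto. With numpy, `np.frombuffer` plays a comparable role.

```python
import struct
import sys
from array import array


def doubles_from_packed(data: bytes) -> array:
    """Decode packed little-endian doubles with a single bulk call."""
    out = array("d")
    out.frombytes(data)  # raises ValueError if the length is not a multiple of the item size
    if sys.byteorder == "big":
        out.byteswap()  # the wire format is little-endian
    return out


def packed_from_doubles(values: array) -> bytes:
    """Encode an array of doubles as packed little-endian bytes."""
    if sys.byteorder == "big":
        values = array("d", values)  # copy so the caller's array is untouched
        values.byteswap()
    return values.tobytes()


wire = struct.pack("<4d", 0.0, 1.5, -2.0, 1e9)
decoded = doubles_from_packed(wire)
```

Both directions run in C over the whole buffer, so the cost per element is far lower than a Python-level loop over `struct.unpack`.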

I tried to be as non-breaking as possible. That is, you can use lists everywhere you used them before. However, it was necessary to generate Sequence[T] type hints where List[T] hints were generated before. Also note that ScalarArray is an immutable data structure. So you might not be able to use .append() or .insert() on repeated fields as before (although it should be possible to make ScalarArray mutable if really needed).
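
As a rough illustration of what the Sequence[T] hint means for calling code, the sketch below uses a tuple as a stand-in for an immutable repeated field. The function names are hypothetical; the point is only that read access works for any sequence, while mutation needs an explicit copy into a list.

```python
from typing import List, Sequence


def total(values: Sequence[float]) -> float:
    # Read-only access works the same for lists and immutable sequences.
    return sum(values)


def appended(values: Sequence[float], extra: float) -> List[float]:
    # An immutable sequence has no .append(), so copy into a list first.
    mutable = list(values)
    mutable.append(extra)
    return mutable


frozen = (1.0, 2.0)  # tuple standing in for an immutable repeated field
```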

What do you think about this approach?


Gobot1234 · 2023-12-08T13:03:01Z

Superseded by #545

124C41p marked this pull request as ready for review August 11, 2023 09:00

124C41p and others added 3 commits August 12, 2023 14:12

Class ScalarArray introduced

aeaef01

This increases (de-)serialization speed of repeated scalar fields (of fixed length) drastically in the case they are used as numpy arrays.

Clean Up

126505a

Unreachable code removed; generic type parameter removed from ScalarArray for compatibility with Python 3.7 and 3.8; code (auto-)reformatted.

Merge branch 'master' into scalar_array_speedup

4947666

cetanu self-assigned this Oct 16, 2023

cetanu added the enhancement (New feature or request) and low priority labels Oct 16, 2023

Gobot1234 closed this Dec 8, 2023

Gobot1234 mentioned this pull request Mar 23, 2024

Use identity check with PLACEHOLDER instead of equality test #560

Merged

7 tasks

124C41p wants to merge 3 commits into danielgtaylor:master from 124C41p:scalar_array_speedup

3 participants
