[WIP] Zero Copy String Deserialization#88
Conversation
…dation on deserialization
|
Because Strings encoded in a proto message should be UTF8, I added a feature flag |
nipunn1313
left a comment
There was a problem hiding this comment.
This looks super good!
Saw that tests weren't passing - so poke at that a bit more.
| [dependencies] | ||
| bytes = "0.5.6" | ||
| pb-jelly = "0.0.5" | ||
| pb-jelly = { path = "../pb-jelly" } |
There was a problem hiding this comment.
Per https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#multiple-locations - you can actually specify both path and version
Actually, in this case, since examples aren't being posted to crates.io, we probably should keep it as only a path dependency.
| versions = { | ||
| u"lazy_static": u' version = "1.4.0" ', | ||
| u"pb-jelly": u' version = "0.0.5" ', | ||
| u"pb-jelly": u' path = "../../../../../pb-jelly" ', |
There was a problem hiding this comment.
hmmm, we gotta find a better way to do the right thing here =\
Bazel and Spec.toml totally sidestep this issue - it's actually the entire point of Spec.toml - to template the location/version of the dependency.
This seems like a place where we'd accidentally have the wrong version provided - and it seems like our test suite will not actually be using the right pb-jelly dependency here, unless you make this change.
Maybe one idea - is that the version of pb-jelly-gen could get passed in as an argument to codegen.py - where our testsuite could inject ../../../../../pb-jelly. Then we're not hard coding a version number here.
Perhaps an idea for a separate issue?
| # of bytes is UTF-8 on deserialization shouldn't be needed. As a precaution though, we do this | ||
| # validation step. But validation can take a relatively large amount of time, so for users | ||
| # who wish to skip this validation, we provide the `zero_copy_string_no_utf8_check` feature flag. | ||
| zero_copy_string_no_utf8_check = [] |
There was a problem hiding this comment.
Is it possible for cargo bench to test with and w/o this feature flag?
Seems like it might not be possible w/ just how cargo bench works.
I thought this was a pretty interesting task, adding zero copy de-serialization for Strings! Still a work in progress, but basically I created a type
StrByteswhich is a wrapper around aBytesstruct, and on creation we assert it's valid UTF-8, which it should be because based on the protobuf spec, strings are encoded in valid UTF-8.Benchmarks (2014 MacBook Pro with an i7)
Note: The reason zero copy strings are not as fast as zero copy bytes is because we do the extra validation step