Support optimized Alternator Vector type#420
Conversation
5fe7764 to
a9aecc0
Compare
|
Thanks! This PR has a bunch of other, unrelated commits - is it planned to merge them separately? |
Yes #392 |
nyh
left a comment
There was a problem hiding this comment.
I'll let the vector-store developers review the full contents of this PR (which seems to have some unrelated patches in it). But I'm "approve"ing because the good news is that this patch does exactly what Alternator needs to support the optimized vector type. I checked, and all the tests I wrote for Alternator with this type pass with this vector-store patch.
a9aecc0 to
e216d88
Compare
|
Changelog:
|
|
BTW we're still waiting for scylladb/scylladb#29554 to decide how would we name the type. It is possible that the Vector won't be encoded with |
This discussion is relevant for the tests that need to send Alternator commands, but not the actual code. The actual code that reads the data will continue, like today, to have a single byte "5" followed by 32-bit floats. This shouldn't change, because the whole point of this feature was to make this thing efficient - and it can't be 64-bit floats or anything else. |
|
@QuerthDP can you please rebase this PR? As I noted above, the name "V" changed to the new name "FLOAT32VECTOR" in the latest version of scylladb/scylladb#29554, but I don't think this makes any difference for your patch. Thanks. I would love to see this patch merged even before the everlasting merge freeze on scylla.git is lifted, so when we want to merge scylladb/scylladb#29554, the vector store would be ready. |
I was waiting with this patch as it's blocked by both scylladb/scylladb#29554 and #392. If you really want this patch to be merged before. I may split it into separate feature and test PRs, but we would need to have it mind, that the feature will not be CI tested in that case (as we already did for the previous patch, so probably not a big deal). |
|
I see. I just updated the vector store (and forgot to use your branch) and was surprised to see this feature disappeared :-) I forgot it was an unmerged PR. I guess we can wait. The splitting of feature and test is also good I think, but it's not urgent and will cause you more work. |
TBH with so many merge conflicts after rebase on master, it would be less work to split the patch and wait for Alternator API to get in first :) |
Add support for Alternator's optimized FLOAT32VECTOR representation stored as type tag 0x05 followed by big-endian f32 values. Keep support for the existing JSON list representation under type tag 0x04 and split the parsing paths to handle both encodings explicitly. Also add validation for malformed binary payload lengths and cover the new encoding with unit tests.
e216d88 to
d8e7154
Compare
|
Changelog:
|
|
|
Motivation
Alternator can store vectors using an optimized binary encoding, but the vector extraction path only supported the JSON list form. That made
FLOAT32VECTOR-encoded vectors unusable even though they represent the same logical data.Supporting both encodings makes the vector handling path consistent with Alternator behavior and ensures queries work regardless of which representation was used to write the item.
Summary
Add support for Alternator's optimized
FLOAT32VECTORtype encoding.Until now, vector extraction only handled the existing JSON-based DynamoDB list representation. This change adds support for the optimized binary encoding used by Alternator.
Changes
FLOAT32VECTORtype payloads encoded as sequential big-endianf32valuesLrepresentationvector-storeTest Coverage
Fixes: VECTOR-650