
Support optimized Alternator Vector type #420

Merged
ewienik merged 1 commit into scylladb:master from QuerthDP:add-alternator-v-type
May 6, 2026

Conversation

@QuerthDP

@QuerthDP QuerthDP commented Apr 20, 2026

Member

Motivation

Alternator can store vectors using an optimized binary encoding, but the vector extraction path only supported the JSON list form. That made FLOAT32VECTOR-encoded vectors unusable even though they represent the same logical data.

Supporting both encodings makes the vector handling path consistent with Alternator behavior and ensures queries work regardless of which representation was used to write the item.

Summary

Add support for Alternator's optimized FLOAT32VECTOR type encoding.

Until now, vector extraction only handled the existing JSON-based DynamoDB list representation. This change adds support for the optimized binary encoding used by Alternator.

Changes

  • add parsing for Alternator FLOAT32VECTOR type payloads encoded as sequential big-endian f32 values
  • keep support for the existing JSON L representation
  • validate malformed binary payloads before decoding
  • add unit coverage for the new parsing path in vector-store
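
For readers who want to see what the binary parsing step amounts to, here is a minimal sketch, assuming the payload (with any type tag already stripped) is nothing but consecutive 4-byte big-endian f32 values. The names `decode_be_f32` and `BlobError` are hypothetical and are not taken from the actual patch.

```rust
/// Hypothetical error type for this sketch; not the crate's real error type.
#[derive(Debug, PartialEq)]
pub enum BlobError {
    /// The payload length (in bytes) is not a multiple of four.
    BadLength(usize),
}

/// Decode a payload made of consecutive big-endian f32 values.
/// The length is validated first, so a truncated payload is rejected
/// instead of being silently cut short.
pub fn decode_be_f32(payload: &[u8]) -> Result<Vec<f32>, BlobError> {
    if payload.len() % 4 != 0 {
        return Err(BlobError::BadLength(payload.len()));
    }
    Ok(payload
        .chunks_exact(4)
        .map(|c| f32::from_be_bytes([c[0], c[1], c[2], c[3]]))
        .collect())
}
```

Checking the length before decoding matters because `chunks_exact` would otherwise drop trailing bytes without reporting them.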

Test Coverage

  • unit tests for Alternator vector blob parsing

Fixes: VECTOR-650

@nyh

nyh commented Apr 20, 2026

Contributor

Thanks! This PR has a bunch of other, unrelated commits - is it planned to merge them separately?

@QuerthDP

Member Author

> Thanks! This PR has a bunch of other, unrelated commits - is it planned to merge them separately?

Yes #392

Comment thread crates/vector-store/src/vector.rs Outdated
Comment thread crates/vector-store/src/vector.rs
Comment thread crates/vector-store/src/vector.rs

@nyh nyh left a comment

Contributor


I'll let the vector-store developers review the full contents of this PR (which seems to have some unrelated patches in it). But I'm "approve"ing because the good news is that this patch does exactly what Alternator needs to support the optimized vector type. I checked, and all the tests I wrote for Alternator with this type pass with this vector-store patch.

@QuerthDP

Member Author

Changelog:

  • added commit messages
  • renamed parsing functions
  • adjusted comments and error messages

@QuerthDP QuerthDP requested a review from ewienik April 23, 2026 13:31
@QuerthDP

Member Author

@ewienik you can take a look at the af1a7ed patch. The previous commits come from #392 and the last one is based on them, so let's not review those yet.

@QuerthDP

Member Author

BTW we're still waiting for scylladb/scylladb#29554 to decide how we would name the type. It is possible that the Vector won't be encoded with the V discriminator.

Comment thread crates/vector-store/src/vector.rs Outdated
Comment thread crates/vector-store/src/vector.rs Outdated
Comment thread crates/vector-store/src/vector.rs Outdated
@ewienik

ewienik commented Apr 23, 2026

Collaborator

> @ewienik you can take a look at the af1a7ed patch. The previous commits come from #392 and the last one is based on them, so let's not review those yet.

The commit seems ok

@nyh

nyh commented Apr 23, 2026

Contributor

> BTW we're still waiting for scylladb/scylladb#29554 to decide how we would name the type. It is possible that the Vector won't be encoded with the V discriminator.

This discussion is relevant for the tests that need to send Alternator commands, but not the actual code. The actual code that reads the data will continue, like today, to have a single byte "5" followed by 32-bit floats. This shouldn't change, because the whole point of this feature was to make this thing efficient - and it can't be 64-bit floats or anything else.
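
To make the layout described above concrete, here is a rough sketch of what a writer would produce, assuming the type byte is the value 0x05 (as the commit message in this PR states) and each element is a big-endian f32. The function name `encode_float32vector` is hypothetical and exists only for illustration; it is not Alternator's or the vector store's API.

```rust
/// Type tag for the optimized vector encoding, as described in this thread.
const FLOAT32VECTOR_TAG: u8 = 0x05;

/// Build the on-disk form: one tag byte followed by 4 bytes per element.
/// Hypothetical helper for illustration only.
pub fn encode_float32vector(values: &[f32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(1 + 4 * values.len());
    out.push(FLOAT32VECTOR_TAG);
    for v in values {
        // Each element is written as a big-endian IEEE 754 single.
        out.extend_from_slice(&v.to_be_bytes());
    }
    out
}
```

Under these assumptions an n-element vector always occupies 1 + 4n bytes, which is the efficiency argument made above: the size does not depend on how the numbers would print as text.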

@nyh

nyh commented May 6, 2026

Contributor

@QuerthDP can you please rebase this PR? As I noted above, the name "V" changed to the new name "FLOAT32VECTOR" in the latest version of scylladb/scylladb#29554, but I don't think this makes any difference for your patch.

Thanks. I would love to see this patch merged even before the everlasting merge freeze on scylla.git is lifted, so when we want to merge scylladb/scylladb#29554, the vector store would be ready.

@QuerthDP

QuerthDP commented May 6, 2026

Member Author

> Thanks. I would love to see this patch merged even before the everlasting merge freeze on scylla.git is lifted, so when we want to merge scylladb/scylladb#29554, the vector store would be ready.

I was holding this patch back because it's blocked by both scylladb/scylladb#29554 and #392.
Without the Scylla one, the tests won't pass, because our CI only runs against a scylla-nightly build, so we need the Scylla patch first.
Without the Alternator test API (on which this PR is based) we cannot write tests at all!

If you really want this patch merged sooner, I can split it into separate feature and test PRs, but we would need to keep in mind that the feature will not be CI-tested in that case (as we already did for the previous patch, so probably not a big deal).

@nyh

nyh commented May 6, 2026

Contributor

I see. I just updated the vector store (and forgot to use your branch) and was surprised to see this feature disappeared :-) I forgot it was an unmerged PR.

I guess we can wait. The splitting of feature and test is also good I think, but it's not urgent and will cause you more work.

@QuerthDP

QuerthDP commented May 6, 2026

Member Author

> I see. I just updated the vector store (and forgot to use your branch) and was surprised to see this feature disappeared :-) I forgot it was an unmerged PR.
>
> I guess we can wait. The splitting of feature and test is also good I think, but it's not urgent and will cause you more work.

TBH with so many merge conflicts after rebase on master, it would be less work to split the patch and wait for Alternator API to get in first :)

Add support for Alternator's optimized FLOAT32VECTOR representation stored
as type tag 0x05 followed by big-endian f32 values.

Keep support for the existing JSON list representation under type tag 0x04
and split the parsing paths to handle both encodings explicitly.

Also add validation for malformed binary payload lengths and cover the new
encoding with unit tests.
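
As an illustration of the split described in this commit message, the following sketch dispatches on the leading type tag. It assumes tag 0x04 marks a JSON body and tag 0x05 marks big-endian f32 values, as stated above. The names `Decoded`, `ParseError` and `parse_vector_attr` are hypothetical, and the JSON branch only hands back the raw bytes, since this thread does not show how the real code parses the JSON list.

```rust
const TAG_JSON: u8 = 0x04;
const TAG_FLOAT32VECTOR: u8 = 0x05;

/// Result of looking at the tag; hypothetical type for this sketch.
#[derive(Debug, PartialEq)]
pub enum Decoded<'a> {
    /// Raw JSON bytes, left unparsed here.
    Json(&'a [u8]),
    /// Values decoded from the optimized binary encoding.
    Floats(Vec<f32>),
}

#[derive(Debug, PartialEq)]
pub enum ParseError {
    Empty,
    UnknownTag(u8),
    /// Binary body length (in bytes) is not a multiple of four.
    BadLength(usize),
}

/// Split off the tag byte and route the body to the matching path.
pub fn parse_vector_attr(raw: &[u8]) -> Result<Decoded<'_>, ParseError> {
    let (&tag, body) = raw.split_first().ok_or(ParseError::Empty)?;
    match tag {
        TAG_JSON => Ok(Decoded::Json(body)),
        TAG_FLOAT32VECTOR => {
            if body.len() % 4 != 0 {
                return Err(ParseError::BadLength(body.len()));
            }
            Ok(Decoded::Floats(
                body.chunks_exact(4)
                    .map(|c| f32::from_be_bytes([c[0], c[1], c[2], c[3]]))
                    .collect(),
            ))
        }
        other => Err(ParseError::UnknownTag(other)),
    }
}
```

Keeping the two branches explicit means a malformed binary payload fails with its own error rather than falling through to the JSON path.
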
@QuerthDP QuerthDP force-pushed the add-alternator-v-type branch from e216d88 to d8e7154 Compare May 6, 2026 08:32
@QuerthDP

QuerthDP commented May 6, 2026

Member Author

Changelog:

  • rebased on master
  • removed the Validator tests, those will be included in separate PR

@QuerthDP

QuerthDP commented May 6, 2026

Member Author

cargo test failed due to a known issue: VECTOR-552
Rerunning.

@QuerthDP QuerthDP marked this pull request as ready for review May 6, 2026 08:38
@QuerthDP QuerthDP requested a review from ewienik May 6, 2026 08:38
@ewienik ewienik added this pull request to the merge queue May 6, 2026
Merged via the queue into scylladb:master with commit 57ba683 May 6, 2026
34 of 36 checks passed