feat(pyramid,sindi,warp): add MARK_REMOVE support#2136
Hidden character warning
Conversation
Merge ProtectionsYour pull request matches the following merge protections and will not be merged until they are valid. 🟢 Require kind labelWonderful, this rule succeeded.
🟢 Require version labelWonderful, this rule succeeded.
|
There was a problem hiding this comment.
Code Review
This pull request implements the Remove method and tracks deleted elements via delete_count_ across the Pyramid, SINDI, and WARP algorithms. However, several critical issues were identified: first, delete_count_ is not restored during deserialization in Pyramid and SINDI, leading to incorrect element counts after loading; second, there are concurrency synchronization mismatches in Pyramid (mismatched locks between Add and Remove) and SINDI (using label_lookup_mutex_ instead of global_mutex_); and third, WARP's delete_count_ is not atomic, introducing a data race during concurrent reads and writes.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| std::unique_ptr<BasicSearcher> searcher_ = nullptr; | ||
| int64_t max_capacity_{0}; | ||
| int64_t cur_element_count_{0}; | ||
| std::atomic<int64_t> delete_count_{0}; |
There was a problem hiding this comment.
The delete_count_ member variable is initialized to 0 but is never restored during index deserialization in Pyramid::Deserialize(). This will cause GetNumElements() and GetNumberRemoved() to return incorrect values after the index is loaded from disk. You should restore delete_count_ in Pyramid::Deserialize() by reading the number of deleted IDs from the label table:
delete_count_ = label_table_->GetAllDeletedIds().size();| if (mode != RemoveMode::MARK_REMOVE) { | ||
| throw VsagException(ErrorType::INVALID_ARGUMENT, "Pyramid only supports MARK_REMOVE"); | ||
| } | ||
| std::scoped_lock label_lock(this->label_lookup_mutex_); |
There was a problem hiding this comment.
There is a concurrency synchronization mismatch between Pyramid::Add and Pyramid::Remove. In Pyramid::Add, new elements are inserted into label_table_ under cur_element_count_mutex_ without acquiring label_lookup_mutex_. In Pyramid::Remove, elements are marked as removed under label_lookup_mutex_ without acquiring cur_element_count_mutex_. Since LabelTable is not thread-safe for concurrent insertions and removals, this mismatch allows concurrent modifications to the label table, leading to data races and potential corruption of label_remap_. You should ensure both operations are synchronized under a common lock.
| std::scoped_lock label_lock(this->label_lookup_mutex_); | |
| std::scoped_lock lock(this->label_lookup_mutex_, this->cur_element_count_mutex_); |
|
|
||
| int64_t cur_element_count_{0}; | ||
|
|
||
| std::atomic<int64_t> delete_count_{0}; |
There was a problem hiding this comment.
The delete_count_ member variable is initialized to 0 but is never restored during index deserialization in SINDI::Deserialize(). This will cause GetNumElements() and GetNumberRemoved() to return incorrect values after the index is loaded from disk. You should restore delete_count_ in SINDI::Deserialize() by reading the number of deleted IDs from the label table:
delete_count_ = label_table_->GetAllDeletedIds().size();| if (mode != RemoveMode::MARK_REMOVE) { | ||
| throw VsagException(ErrorType::INVALID_ARGUMENT, "SINDI only supports MARK_REMOVE"); | ||
| } | ||
| std::scoped_lock label_lock(this->label_lookup_mutex_); |
There was a problem hiding this comment.
In SINDI, concurrent operations like Add and Search are synchronized using global_mutex_. Add locks global_mutex_ in exclusive mode, and Search locks it in shared mode. Neither of them locks label_lookup_mutex_. By locking only label_lookup_mutex_ in Remove, it can run concurrently with Add and Search, leading to data races on label_table_ (especially label_remap_ which is not thread-safe for concurrent read/write) and other shared states. You should lock global_mutex_ instead.
| std::scoped_lock label_lock(this->label_lookup_mutex_); | |
| std::scoped_lock lock(this->global_mutex_); |
| } | ||
| std::scoped_lock label_lock(this->label_lookup_mutex_); | ||
| uint32_t delete_count = this->label_table_->MarkRemove(ids); | ||
| delete_count_ += delete_count; |
There was a problem hiding this comment.
In WARP, delete_count_ is a regular uint64_t and is not atomic. Since GetNumElements() and GetNumberRemoved() read delete_count_ concurrently without acquiring label_lookup_mutex_, updating it here via delete_count_ += delete_count; introduces a data race. Consider changing delete_count_ to std::atomic<uint64_t> (or std::atomic<int64_t>) to ensure thread-safe concurrent reads and writes.
21e13b7 to
f85ccf7
Compare
There was a problem hiding this comment.
Pull request overview
Adds MARK_REMOVE-style deletion support to the Pyramid, SINDI, and WARP inner indexes by introducing Remove() implementations and exposing removed/active element counts via GetNumberRemoved() and updated GetNumElements() behavior.
Changes:
- Implement
Remove(ids, RemoveMode::MARK_REMOVE)for WARP, Pyramid, and SINDI usinglabel_table_->MarkRemove(ids). - Track removed counts (new
delete_count_atomics for Pyramid/SINDI) and updateGetNumElements()to return active elements. - Add
GetNumberRemoved()for Pyramid/SINDI (WARP already had it).
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| src/algorithm/warp/warp.h | Declares Remove() for WARP. |
| src/algorithm/warp/warp.cpp | Implements WARP::Remove() (MARK_REMOVE only) and updates delete counter. |
| src/algorithm/sindi/sindi.h | Updates element counting, adds GetNumberRemoved(), and declares Remove() plus delete_count_. |
| src/algorithm/sindi/sindi.cpp | Implements SINDI::Remove() (MARK_REMOVE only) and updates delete counter. |
| src/algorithm/pyramid/pyramid.h | Declares GetNumberRemoved()/Remove() and adds delete_count_. |
| src/algorithm/pyramid/pyramid.cpp | Updates GetNumElements(), implements GetNumberRemoved()/Remove() for MARK_REMOVE. |
| int64_t | ||
| Pyramid::GetNumElements() const { | ||
| return base_codes_->TotalCount(); | ||
| return cur_element_count_ - delete_count_; |
| Pyramid::Remove(const std::vector<int64_t>& ids, RemoveMode mode) { | ||
| if (mode != RemoveMode::MARK_REMOVE) { | ||
| throw VsagException(ErrorType::INVALID_ARGUMENT, "Pyramid only supports MARK_REMOVE"); | ||
| } | ||
| std::scoped_lock label_lock(this->label_lookup_mutex_); |
| SINDI::Remove(const std::vector<int64_t>& ids, RemoveMode mode) { | ||
| if (mode != RemoveMode::MARK_REMOVE) { | ||
| throw VsagException(ErrorType::INVALID_ARGUMENT, "SINDI only supports MARK_REMOVE"); | ||
| } | ||
| std::scoped_lock label_lock(this->label_lookup_mutex_); |
| int64_t | ||
| GetNumElements() const override; | ||
|
|
||
| int64_t | ||
| GetNumberRemoved() const override; | ||
|
|
||
| uint32_t | ||
| Remove(const std::vector<int64_t>& ids, RemoveMode mode) override; | ||
|
|
| int64_t | ||
| GetNumElements() const override { | ||
| return cur_element_count_; | ||
| return cur_element_count_ - delete_count_; | ||
| } | ||
|
|
||
| int64_t | ||
| GetNumberRemoved() const override { | ||
| return delete_count_; | ||
| } | ||
|
|
||
| uint32_t | ||
| Remove(const std::vector<int64_t>& ids, RemoveMode mode) override; | ||
|
|
f85ccf7 to
a1b928e
Compare
a1b928e to
2c60931
Compare
| auto combined_filter = std::make_shared<CombinedFilter>(); | ||
| combined_filter->AppendFilter(this->label_table_->GetDeletedIdsFilter()); | ||
| if (filter != nullptr) { | ||
| search_param.is_inner_id_allowed = | ||
| std::make_shared<InnerIdWrapperFilter>(filter, *label_table_); | ||
| combined_filter->AppendFilter( | ||
| std::make_shared<InnerIdWrapperFilter>(filter, *this->label_table_)); | ||
| } | ||
| search_param.is_inner_id_allowed = combined_filter; |
| auto combined_filter = std::make_shared<CombinedFilter>(); | ||
| combined_filter->AppendFilter(this->label_table_->GetDeletedIdsFilter()); | ||
| if (filter != nullptr) { | ||
| search_param.is_inner_id_allowed = | ||
| std::make_shared<InnerIdWrapperFilter>(filter, *label_table_); | ||
| combined_filter->AppendFilter( | ||
| std::make_shared<InnerIdWrapperFilter>(filter, *this->label_table_)); | ||
| } | ||
| search_param.is_inner_id_allowed = combined_filter; |
| uint32_t | ||
| Pyramid::Remove(const std::vector<int64_t>& ids, RemoveMode mode) { | ||
| if (mode != RemoveMode::MARK_REMOVE) { | ||
| throw VsagException(ErrorType::INVALID_ARGUMENT, "Pyramid only supports MARK_REMOVE"); | ||
| } | ||
| std::scoped_lock lock(this->label_lookup_mutex_, this->cur_element_count_mutex_); | ||
| uint32_t delete_count = this->label_table_->MarkRemove(ids); | ||
| delete_count_ += delete_count; | ||
| return delete_count; |
| auto combined_filter = std::make_shared<CombinedFilter>(); | ||
| combined_filter->AppendFilter(this->label_table_->GetDeletedIdsFilter()); | ||
| if (filter != nullptr) { | ||
| ft = std::make_shared<InnerIdWrapperFilter>(filter, *this->label_table_); | ||
| combined_filter->AppendFilter( | ||
| std::make_shared<InnerIdWrapperFilter>(filter, *this->label_table_)); | ||
| } | ||
| inner_param.is_inner_id_allowed = ft; | ||
| inner_param.is_inner_id_allowed = combined_filter; | ||
|
|
| auto combined_filter = std::make_shared<CombinedFilter>(); | ||
| combined_filter->AppendFilter(this->label_table_->GetDeletedIdsFilter()); | ||
| if (filter != nullptr) { | ||
| ft = std::make_shared<InnerIdWrapperFilter>(filter, *this->label_table_); | ||
| combined_filter->AppendFilter( | ||
| std::make_shared<InnerIdWrapperFilter>(filter, *this->label_table_)); | ||
| } | ||
| inner_param.is_inner_id_allowed = ft; | ||
| inner_param.is_inner_id_allowed = combined_filter; | ||
|
|
| SINDI::Remove(const std::vector<int64_t>& ids, RemoveMode mode) { | ||
| if (mode != RemoveMode::MARK_REMOVE) { | ||
| throw VsagException(ErrorType::INVALID_ARGUMENT, "SINDI only supports MARK_REMOVE"); | ||
| } | ||
| std::scoped_lock lock(this->global_mutex_); | ||
| uint32_t delete_count = this->label_table_->MarkRemove(ids); | ||
| delete_count_ += delete_count; | ||
| return delete_count; |
| uint32_t | ||
| WARP::Remove(const std::vector<int64_t>& ids, RemoveMode mode) { | ||
| if (mode != RemoveMode::MARK_REMOVE) { | ||
| throw VsagException(ErrorType::INVALID_ARGUMENT, "WARP only supports MARK_REMOVE"); | ||
| } | ||
| std::scoped_lock label_lock(this->label_lookup_mutex_); | ||
| uint32_t delete_count = this->label_table_->MarkRemove(ids); | ||
| delete_count_ += delete_count; | ||
| return delete_count; | ||
| } |
| uint32_t | ||
| WARP::Remove(const std::vector<int64_t>& ids, RemoveMode mode) { | ||
| if (mode != RemoveMode::MARK_REMOVE) { | ||
| throw VsagException(ErrorType::INVALID_ARGUMENT, "WARP only supports MARK_REMOVE"); | ||
| } | ||
| std::scoped_lock label_lock(this->label_lookup_mutex_); | ||
| uint32_t delete_count = this->label_table_->MarkRemove(ids); | ||
| delete_count_ += delete_count; | ||
| return delete_count; | ||
| } |
Add Remove() method to Pyramid, SINDI, and WARP indexes to support MARK_REMOVE mode. This enables marking vectors as deleted without physical removal. Changes: - WARP: Implement Remove() using existing delete_count_ - Pyramid: Add delete_count_, implement GetNumElements/GetNumberRemoved/Remove - SINDI: Add delete_count_, implement GetNumElements/GetNumberRemoved/Remove All three indexes now: - Support Remove(ids, RemoveMode::MARK_REMOVE) - Return correct counts via GetNumElements() and GetNumberRemoved() - Throw exception for unsupported FORCE_REMOVE mode Signed-off-by: LHT129 <tianlan.lht@antgroup.com> Co-authored-by: opencode <opencode@anthropic.com>
2c60931 to
044304a
Compare
|
Thanks for the careful reviews! Addressed in 044304a: Correctness (gemini)
Perf / fast-path (copilot)
Feature flags (copilot)
Tests (copilot)
All 22050 assertions in the 3 new test cases pass locally. |
Summary
Add Remove() method to Pyramid, SINDI, and WARP indexes to support MARK_REMOVE mode, enabling marking vectors as deleted without physical removal.
Changes
WARP Index
Pyramid Index
SINDI Index
Implementation Details
All three indexes:
Test Plan
Related Issue
Closes #2135
Reference