Skip to content

feat(pyramid,sindi,warp): add MARK_REMOVE support#2136

Open
LHT129 wants to merge 1 commit into
antgroup:mainfrom
LHT129:2026-06-03-支持pyramid-sindi-warp-mark-remove

Hidden character warning

The head ref may contain hidden characters: "2026-06-03-\u652f\u6301pyramid-sindi-warp-mark-remove"
Open

feat(pyramid,sindi,warp): add MARK_REMOVE support#2136
LHT129 wants to merge 1 commit into
antgroup:mainfrom
LHT129:2026-06-03-支持pyramid-sindi-warp-mark-remove

Conversation

@LHT129
Copy link
Copy Markdown
Collaborator

@LHT129 LHT129 commented Jun 3, 2026

Summary

Add Remove() method to Pyramid, SINDI, and WARP indexes to support MARK_REMOVE mode, enabling marking vectors as deleted without physical removal.

Changes

WARP Index

  • Implement Remove() method using existing delete_count_ member
  • Support RemoveMode::MARK_REMOVE only

Pyramid Index

  • Add std::atomic<int64_t> delete_count_ member variable
  • Modify GetNumElements() to return cur_element_count_ - delete_count_
  • Add GetNumberRemoved() method
  • Implement Remove() method

SINDI Index

  • Add std::atomic<int64_t> delete_count_ member variable
  • Modify GetNumElements() to return cur_element_count_ - delete_count_
  • Add GetNumberRemoved() method
  • Implement Remove() method

Implementation Details

All three indexes:

  • Support Remove(ids, RemoveMode::MARK_REMOVE) operation
  • Return correct active element count via GetNumElements()
  • Return correct deleted element count via GetNumberRemoved()
  • Throw VsagException for unsupported FORCE_REMOVE mode
  • Use label_table_->MarkRemove(ids) for marking deletions
  • Thread-safe with std::scoped_lock protection

Test Plan

  • Build: ✅ Release build successful
  • Format: ✅ Code formatted with make fmt
  • Existing tests: Pyramid, SINDI, WARP tests pass

Related Issue

Closes #2135

Reference

  • HGraph Remove: src/algorithm/hgraph/hgraph_modify.cpp:22-45
  • BruteForce Remove: src/algorithm/bruteforce/bruteforce.cpp:149-187

Copilot AI review requested due to automatic review settings June 3, 2026 08:38
@LHT129 LHT129 self-assigned this Jun 3, 2026
@LHT129 LHT129 added kind/improvement Code improvements (variable/function renaming, refactoring, etc. ) version/1.0 labels Jun 3, 2026
@mergify
Copy link
Copy Markdown
Contributor

mergify Bot commented Jun 3, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Require kind label

Wonderful, this rule succeeded.
  • label~=^kind/

🟢 Require version label

Wonderful, this rule succeeded.
  • label~=^version/

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements the Remove method and tracks deleted elements via delete_count_ across the Pyramid, SINDI, and WARP algorithms. However, several critical issues were identified: first, delete_count_ is not restored during deserialization in Pyramid and SINDI, leading to incorrect element counts after loading; second, there are concurrency synchronization mismatches in Pyramid (mismatched locks between Add and Remove) and SINDI (using label_lookup_mutex_ instead of global_mutex_); and third, WARP's delete_count_ is not atomic, introducing a data race during concurrent reads and writes.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

std::unique_ptr<BasicSearcher> searcher_ = nullptr;
int64_t max_capacity_{0};
int64_t cur_element_count_{0};
std::atomic<int64_t> delete_count_{0};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The delete_count_ member variable is initialized to 0 but is never restored during index deserialization in Pyramid::Deserialize(). This will cause GetNumElements() and GetNumberRemoved() to return incorrect values after the index is loaded from disk. You should restore delete_count_ in Pyramid::Deserialize() by reading the number of deleted IDs from the label table:

delete_count_ = label_table_->GetAllDeletedIds().size();

Comment thread src/algorithm/pyramid/pyramid.cpp Outdated
if (mode != RemoveMode::MARK_REMOVE) {
throw VsagException(ErrorType::INVALID_ARGUMENT, "Pyramid only supports MARK_REMOVE");
}
std::scoped_lock label_lock(this->label_lookup_mutex_);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

There is a concurrency synchronization mismatch between Pyramid::Add and Pyramid::Remove. In Pyramid::Add, new elements are inserted into label_table_ under cur_element_count_mutex_ without acquiring label_lookup_mutex_. In Pyramid::Remove, elements are marked as removed under label_lookup_mutex_ without acquiring cur_element_count_mutex_. Since LabelTable is not thread-safe for concurrent insertions and removals, this mismatch allows concurrent modifications to the label table, leading to data races and potential corruption of label_remap_. You should ensure both operations are synchronized under a common lock.

Suggested change
std::scoped_lock label_lock(this->label_lookup_mutex_);
std::scoped_lock lock(this->label_lookup_mutex_, this->cur_element_count_mutex_);


int64_t cur_element_count_{0};

std::atomic<int64_t> delete_count_{0};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The delete_count_ member variable is initialized to 0 but is never restored during index deserialization in SINDI::Deserialize(). This will cause GetNumElements() and GetNumberRemoved() to return incorrect values after the index is loaded from disk. You should restore delete_count_ in SINDI::Deserialize() by reading the number of deleted IDs from the label table:

delete_count_ = label_table_->GetAllDeletedIds().size();

Comment thread src/algorithm/sindi/sindi.cpp Outdated
if (mode != RemoveMode::MARK_REMOVE) {
throw VsagException(ErrorType::INVALID_ARGUMENT, "SINDI only supports MARK_REMOVE");
}
std::scoped_lock label_lock(this->label_lookup_mutex_);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

In SINDI, concurrent operations like Add and Search are synchronized using global_mutex_. Add locks global_mutex_ in exclusive mode, and Search locks it in shared mode. Neither of them locks label_lookup_mutex_. By locking only label_lookup_mutex_ in Remove, it can run concurrently with Add and Search, leading to data races on label_table_ (especially label_remap_ which is not thread-safe for concurrent read/write) and other shared states. You should lock global_mutex_ instead.

Suggested change
std::scoped_lock label_lock(this->label_lookup_mutex_);
std::scoped_lock lock(this->global_mutex_);

}
std::scoped_lock label_lock(this->label_lookup_mutex_);
uint32_t delete_count = this->label_table_->MarkRemove(ids);
delete_count_ += delete_count;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

In WARP, delete_count_ is a regular uint64_t and is not atomic. Since GetNumElements() and GetNumberRemoved() read delete_count_ concurrently without acquiring label_lookup_mutex_, updating it here via delete_count_ += delete_count; introduces a data race. Consider changing delete_count_ to std::atomic<uint64_t> (or std::atomic<int64_t>) to ensure thread-safe concurrent reads and writes.

@LHT129 LHT129 force-pushed the 2026-06-03-支持pyramid-sindi-warp-mark-remove branch from 21e13b7 to f85ccf7 Compare June 3, 2026 08:40
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds MARK_REMOVE-style deletion support to the Pyramid, SINDI, and WARP inner indexes by introducing Remove() implementations and exposing removed/active element counts via GetNumberRemoved() and updated GetNumElements() behavior.

Changes:

  • Implement Remove(ids, RemoveMode::MARK_REMOVE) for WARP, Pyramid, and SINDI using label_table_->MarkRemove(ids).
  • Track removed counts (new delete_count_ atomics for Pyramid/SINDI) and update GetNumElements() to return active elements.
  • Add GetNumberRemoved() for Pyramid/SINDI (WARP already had it).

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
src/algorithm/warp/warp.h Declares Remove() for WARP.
src/algorithm/warp/warp.cpp Implements WARP::Remove() (MARK_REMOVE only) and updates delete counter.
src/algorithm/sindi/sindi.h Updates element counting, adds GetNumberRemoved(), and declares Remove() plus delete_count_.
src/algorithm/sindi/sindi.cpp Implements SINDI::Remove() (MARK_REMOVE only) and updates delete counter.
src/algorithm/pyramid/pyramid.h Declares GetNumberRemoved()/Remove() and adds delete_count_.
src/algorithm/pyramid/pyramid.cpp Updates GetNumElements(), implements GetNumberRemoved()/Remove() for MARK_REMOVE.

Comment thread src/algorithm/pyramid/pyramid.cpp Outdated
int64_t
Pyramid::GetNumElements() const {
return base_codes_->TotalCount();
return cur_element_count_ - delete_count_;
Comment thread src/algorithm/pyramid/pyramid.cpp Outdated
Comment on lines +428 to +432
Pyramid::Remove(const std::vector<int64_t>& ids, RemoveMode mode) {
if (mode != RemoveMode::MARK_REMOVE) {
throw VsagException(ErrorType::INVALID_ARGUMENT, "Pyramid only supports MARK_REMOVE");
}
std::scoped_lock label_lock(this->label_lookup_mutex_);
Comment thread src/algorithm/sindi/sindi.cpp Outdated
Comment on lines +835 to +839
SINDI::Remove(const std::vector<int64_t>& ids, RemoveMode mode) {
if (mode != RemoveMode::MARK_REMOVE) {
throw VsagException(ErrorType::INVALID_ARGUMENT, "SINDI only supports MARK_REMOVE");
}
std::scoped_lock label_lock(this->label_lookup_mutex_);
Comment on lines 168 to +176
int64_t
GetNumElements() const override;

int64_t
GetNumberRemoved() const override;

uint32_t
Remove(const std::vector<int64_t>& ids, RemoveMode mode) override;

Comment on lines 110 to +122
int64_t
GetNumElements() const override {
return cur_element_count_;
return cur_element_count_ - delete_count_;
}

int64_t
GetNumberRemoved() const override {
return delete_count_;
}

uint32_t
Remove(const std::vector<int64_t>& ids, RemoveMode mode) override;

@LHT129 LHT129 force-pushed the 2026-06-03-支持pyramid-sindi-warp-mark-remove branch from f85ccf7 to a1b928e Compare June 3, 2026 09:11
@LHT129 LHT129 force-pushed the 2026-06-03-支持pyramid-sindi-warp-mark-remove branch from a1b928e to 2c60931 Compare June 4, 2026 03:26
Copilot AI review requested due to automatic review settings June 4, 2026 03:26
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.

Comment thread src/algorithm/pyramid/pyramid.cpp Outdated
Comment on lines +264 to +270
auto combined_filter = std::make_shared<CombinedFilter>();
combined_filter->AppendFilter(this->label_table_->GetDeletedIdsFilter());
if (filter != nullptr) {
search_param.is_inner_id_allowed =
std::make_shared<InnerIdWrapperFilter>(filter, *label_table_);
combined_filter->AppendFilter(
std::make_shared<InnerIdWrapperFilter>(filter, *this->label_table_));
}
search_param.is_inner_id_allowed = combined_filter;
Comment thread src/algorithm/pyramid/pyramid.cpp Outdated
Comment on lines +309 to +315
auto combined_filter = std::make_shared<CombinedFilter>();
combined_filter->AppendFilter(this->label_table_->GetDeletedIdsFilter());
if (filter != nullptr) {
search_param.is_inner_id_allowed =
std::make_shared<InnerIdWrapperFilter>(filter, *label_table_);
combined_filter->AppendFilter(
std::make_shared<InnerIdWrapperFilter>(filter, *this->label_table_));
}
search_param.is_inner_id_allowed = combined_filter;
Comment on lines +433 to +441
uint32_t
Pyramid::Remove(const std::vector<int64_t>& ids, RemoveMode mode) {
if (mode != RemoveMode::MARK_REMOVE) {
throw VsagException(ErrorType::INVALID_ARGUMENT, "Pyramid only supports MARK_REMOVE");
}
std::scoped_lock lock(this->label_lookup_mutex_, this->cur_element_count_mutex_);
uint32_t delete_count = this->label_table_->MarkRemove(ids);
delete_count_ += delete_count;
return delete_count;
Comment on lines +276 to 283
auto combined_filter = std::make_shared<CombinedFilter>();
combined_filter->AppendFilter(this->label_table_->GetDeletedIdsFilter());
if (filter != nullptr) {
ft = std::make_shared<InnerIdWrapperFilter>(filter, *this->label_table_);
combined_filter->AppendFilter(
std::make_shared<InnerIdWrapperFilter>(filter, *this->label_table_));
}
inner_param.is_inner_id_allowed = ft;
inner_param.is_inner_id_allowed = combined_filter;

Comment on lines +439 to 446
auto combined_filter = std::make_shared<CombinedFilter>();
combined_filter->AppendFilter(this->label_table_->GetDeletedIdsFilter());
if (filter != nullptr) {
ft = std::make_shared<InnerIdWrapperFilter>(filter, *this->label_table_);
combined_filter->AppendFilter(
std::make_shared<InnerIdWrapperFilter>(filter, *this->label_table_));
}
inner_param.is_inner_id_allowed = ft;
inner_param.is_inner_id_allowed = combined_filter;

Comment on lines +842 to +849
SINDI::Remove(const std::vector<int64_t>& ids, RemoveMode mode) {
if (mode != RemoveMode::MARK_REMOVE) {
throw VsagException(ErrorType::INVALID_ARGUMENT, "SINDI only supports MARK_REMOVE");
}
std::scoped_lock lock(this->global_mutex_);
uint32_t delete_count = this->label_table_->MarkRemove(ids);
delete_count_ += delete_count;
return delete_count;
Comment on lines +673 to +682
uint32_t
WARP::Remove(const std::vector<int64_t>& ids, RemoveMode mode) {
if (mode != RemoveMode::MARK_REMOVE) {
throw VsagException(ErrorType::INVALID_ARGUMENT, "WARP only supports MARK_REMOVE");
}
std::scoped_lock label_lock(this->label_lookup_mutex_);
uint32_t delete_count = this->label_table_->MarkRemove(ids);
delete_count_ += delete_count;
return delete_count;
}
Comment on lines +673 to +682
uint32_t
WARP::Remove(const std::vector<int64_t>& ids, RemoveMode mode) {
if (mode != RemoveMode::MARK_REMOVE) {
throw VsagException(ErrorType::INVALID_ARGUMENT, "WARP only supports MARK_REMOVE");
}
std::scoped_lock label_lock(this->label_lookup_mutex_);
uint32_t delete_count = this->label_table_->MarkRemove(ids);
delete_count_ += delete_count;
return delete_count;
}
Add Remove() method to Pyramid, SINDI, and WARP indexes to support
MARK_REMOVE mode. This enables marking vectors as deleted without
physical removal.

Changes:
- WARP: Implement Remove() using existing delete_count_
- Pyramid: Add delete_count_, implement GetNumElements/GetNumberRemoved/Remove
- SINDI: Add delete_count_, implement GetNumElements/GetNumberRemoved/Remove

All three indexes now:
- Support Remove(ids, RemoveMode::MARK_REMOVE)
- Return correct counts via GetNumElements() and GetNumberRemoved()
- Throw exception for unsupported FORCE_REMOVE mode

Signed-off-by: LHT129 <tianlan.lht@antgroup.com>
Co-authored-by: opencode <opencode@anthropic.com>
@LHT129 LHT129 force-pushed the 2026-06-03-支持pyramid-sindi-warp-mark-remove branch from 2c60931 to 044304a Compare June 4, 2026 07:39
@pull-request-size pull-request-size Bot added size/L and removed size/M labels Jun 4, 2026
@LHT129
Copy link
Copy Markdown
Collaborator Author

LHT129 commented Jun 4, 2026

Thanks for the careful reviews! Addressed in 044304a:

Correctness (gemini)

  • Pyramid/SINDI Deserialize now restores delete_count_ from label_table_->GetAllDeletedIds().size() (WARP already done). Note: tombstone persistence requires support_tombstone_=true on the label table; left as opt-in to match HGraph defaults.
  • Pyramid::Remove now acquires both label_lookup_mutex_ and cur_element_count_mutex_ to match Add.
  • SINDI::Remove now locks global_mutex_ to align with Add/Search.
  • WARP delete_count_ is now std::atomic<uint64_t>.

Perf / fast-path (copilot)

  • Pyramid/SINDI KnnSearch+RangeSearch now pass nullptr instead of an empty CombinedFilter so the unfiltered fast-path is preserved when there are no deletions or external filter.

Feature flags (copilot)

  • All three indexes now set SUPPORT_DELETE_BY_ID in InitFeatures().

Tests (copilot)

  • Added [ft][remove] test cases for Pyramid, SINDI and WARP covering: initial counts, FORCE_REMOVE rejection, mark-remove count, duplicate-remove (zero deletions), and search exclusion of removed ids. Persistence of tombstones across serialize/deserialize is out of scope here (consistent with HGraph), and would require enabling the existing support_tombstone_ option.

All 22050 assertions in the 3 new test cases pass locally.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/improvement Code improvements (variable/function renaming, refactoring, etc. ) module/testing size/L version/1.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add MARK_REMOVE support for Pyramid, SINDI, and WARP indexes

2 participants