Skip to content

Fix/sindi auto heap insert#2147

Open
Roxanne0321 wants to merge 1 commit into
antgroup:mainfrom
Roxanne0321:fix/sindi-auto-heap-insert
Open

Fix/sindi auto heap insert#2147
Roxanne0321 wants to merge 1 commit into
antgroup:mainfrom
Roxanne0321:fix/sindi-auto-heap-insert

Conversation

@Roxanne0321
Copy link
Copy Markdown
Collaborator

Summary

  • Remove the public SINDI search parameter use_term_lists_heap_insert.
  • Derive the heap insertion strategy internally from doc_prune_ratio and query_prune_ratio.
  • Update SINDI analyzer logic, tests, and SINDI documentation in English and Chinese.

Details

SINDI now selects the heap insertion path automatically:

  • When doc_prune_ratio is 0 or 0.1, and query_prune_ratio is 0 or 0.1, use the original use_term_lists_heap_insert = false path.
  • In all other cases, use the original use_term_lists_heap_insert = true path.

Validation

  • clang-format --version -> 15.0.7
  • git diff --check
  • cmake --build build --target unittests -j2
    • SINDI and algorithm_test compiled successfully.
    • Final tests/unittests link failed due to local OpenBLAS/libgfortran unresolved symbol: _gfortran_concat_string.
  • cmake --build build --target functests -j2
    • tests/test_sindi.cpp compiled successfully.
    • Final tests/functests link failed with the same local _gfortran_concat_string issue.

Copilot AI review requested due to automatic review settings June 5, 2026 07:12
@mergify
Copy link
Copy Markdown
Contributor

mergify Bot commented Jun 5, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Require kind label

Wonderful, this rule succeeded.
  • label~=^kind/

🟢 Require version label

Wonderful, this rule succeeded.
  • label~=^version/

@Roxanne0321 Roxanne0321 added kind/improvement Code improvements (variable/function renaming, refactoring, etc. ) version/1.0 and removed module/testing module/docs labels Jun 5, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR removes the use_term_lists_heap_insert SINDI search parameter and switches heap insertion strategy selection to be automatic based on doc_prune_ratio (build-time) and query_prune_ratio (search-time).

Changes:

  • Removed use_term_lists_heap_insert from SINDI search parameters (parsing, serialization, constants, tests, and docs).
  • Added automatic heap insertion strategy selection via SINDI::UseTermListsHeapInsert(...) and wired it into search/analyzer paths.
  • Updated tests to exercise pruning ratios instead of the removed boolean flag, and updated docs to reflect the new behavior.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
tests/test_sindi.cpp Updates tests to generate search params using query_prune_ratio instead of the removed boolean flag.
tests/test_memleak.cpp Changes index parameter handling to use fmt::format(parameter, dim) inside ScopedIndex.
src/inner_string_params.h Removes the string key constant for use_term_lists_heap_insert.
src/analyzer/sindi_analyzer.cpp Switches analyzer decision to SINDI::UseTermListsHeapInsert(search_param).
src/algorithm/sindi/sindi_test.cpp Removes use_term_lists_heap_insert from unit test JSON strings.
src/algorithm/sindi/sindi_parameter_test.cpp Adds an assertion that ToJson() no longer emits the removed key.
src/algorithm/sindi/sindi_parameter.h Removes the boolean field from SINDISearchParameter.
src/algorithm/sindi/sindi_parameter.cpp Removes parse/serialize support for use_term_lists_heap_insert.
src/algorithm/sindi/sindi.h Adds UseTermListsHeapInsert(...) and stores doc_prune_ratio_.
src/algorithm/sindi/sindi.cpp Implements automatic strategy selection and updates search call sites accordingly.
docs/sindi.md Removes the parameter section and explains automatic strategy selection.
docs/docs/zh/src/indexes/sindi.md Removes the parameter row and explains automatic strategy selection (ZH).
docs/docs/en/src/indexes/sindi.md Removes the parameter row and explains automatic strategy selection (EN).

Comment thread tests/test_memleak.cpp Outdated
public:
explicit ScopedIndex(const std::string& name,
const std::string& build_parameters,
const std::string& parameter,
Comment thread tests/test_memleak.cpp Outdated
Allocator* allocator) {
resource_ = std::make_unique<vsag::Resource>(allocator, nullptr);
engine_ = std::make_unique<vsag::Engine>(resource_.get());
std::string build_parameters = fmt::format(parameter, dim);
Comment on lines +141 to +145
bool use_term_list_ordered_insert,
const SparseVector* original_query = nullptr) const;

bool
UseTermListsHeapInsert(const SINDISearchParameter& search_param) const;
Comment on lines +452 to 460
bool
SINDI::UseTermListsHeapInsert(const SINDISearchParameter& search_param) const {
auto is_special_prune_ratio = [](float ratio) {
constexpr float kEpsilon = 1e-6F;
return std::abs(ratio) < kEpsilon || std::abs(ratio - 0.1F) < kEpsilon;
};
return not(is_special_prune_ratio(doc_prune_ratio_) &&
is_special_prune_ratio(search_param.query_prune_ratio));
}
Comment on lines 150 to 152
} else {
n_candidate = DEFAULT_N_CANDIDATE;
}
Comment on lines +452 to 460
bool
SINDI::UseTermListsHeapInsert(const SINDISearchParameter& search_param) const {
auto is_special_prune_ratio = [](float ratio) {
constexpr float kEpsilon = 1e-6F;
return std::abs(ratio) < kEpsilon || std::abs(ratio - 0.1F) < kEpsilon;
};
return not(is_special_prune_ratio(doc_prune_ratio_) &&
is_special_prune_ratio(search_param.query_prune_ratio));
}
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request removes the manual use_term_lists_heap_insert search parameter from the SINDI index, allowing the heap-insertion strategy to be chosen automatically based on doc_prune_ratio and query_prune_ratio. This change is reflected across documentation, tests, parameter parsing, and the core implementation. However, a critical issue was identified in tests/test_memleak.cpp where bypassing the generate_param helper function results in an invalid parameter string, causing CreateIndex to fail at runtime. It is recommended to revert this change to keep the memory leak tests functional.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread tests/test_memleak.cpp Outdated
Comment on lines +97 to +104
explicit ScopedIndex(const std::string& name,
const std::string& build_parameters,
const std::string& parameter,
int64_t num_vectors,
int64_t dim,
Allocator* allocator) {
resource_ = std::make_unique<vsag::Resource>(allocator, nullptr);
engine_ = std::make_unique<vsag::Engine>(resource_.get());
std::string build_parameters = fmt::format(parameter, dim);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The changes in tests/test_memleak.cpp bypass the generate_param helper function, which constructs the valid JSON parameter string required by CreateIndex. Instead, the raw quantization type string (e.g., "fp32") is passed directly as parameter to ScopedIndex, where it is formatted with dim. This results in an invalid parameter string (e.g., "fp32" instead of a valid JSON object), causing CreateIndex to fail at runtime. Since the build/link failed locally, this runtime failure was not caught. Reverting this change to use generate_param is necessary to keep the memory leak tests functional.

Suggested change
explicit ScopedIndex(const std::string& name,
const std::string& build_parameters,
const std::string& parameter,
int64_t num_vectors,
int64_t dim,
Allocator* allocator) {
resource_ = std::make_unique<vsag::Resource>(allocator, nullptr);
engine_ = std::make_unique<vsag::Engine>(resource_.get());
std::string build_parameters = fmt::format(parameter, dim);
explicit ScopedIndex(const std::string& name,
const std::string& build_parameters,
int64_t num_vectors,
int64_t dim,
Allocator* allocator) {
resource_ = std::make_unique<vsag::Resource>(allocator, nullptr);
engine_ = std::make_unique<vsag::Engine>(resource_.get());

Comment thread tests/test_memleak.cpp Outdated
Comment on lines 186 to 188
for (const auto& [index_name, index_parameter, search_parameter, num_vectors, dim] :
INDEX_PARAMS) {
auto index_parameter = generate_param(index_name, base_quantization_type, dim);
ScopedIndex scoped_index(index_name, index_parameter, num_vectors, dim, allocator);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

Revert this loop to use the generate_param helper function to construct the valid JSON parameter string for ScopedIndex.

Suggested change
for (const auto& [index_name, index_parameter, search_parameter, num_vectors, dim] :
INDEX_PARAMS) {
auto index_parameter = generate_param(index_name, base_quantization_type, dim);
ScopedIndex scoped_index(index_name, index_parameter, num_vectors, dim, allocator);
for (const auto& [index_name, base_quantization_type, search_parameter, num_vectors, dim] :
INDEX_PARAMS) {
auto index_parameter = generate_param(index_name, base_quantization_type, dim);
ScopedIndex scoped_index(index_name, index_parameter, num_vectors, dim, allocator);

@Roxanne0321 Roxanne0321 force-pushed the fix/sindi-auto-heap-insert branch from 6e4f0a7 to 4fe0c89 Compare June 5, 2026 07:15
Signed-off-by: Roxanne0321 <liruoxvan020321@qq.com>
Assisted-by: Codex:GPT-5
Copilot AI review requested due to automatic review settings June 5, 2026 08:40
@Roxanne0321 Roxanne0321 force-pushed the fix/sindi-auto-heap-insert branch from 4fe0c89 to fdbf5ab Compare June 5, 2026 08:40
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.

Comment on lines 150 to 153
} else {
n_candidate = DEFAULT_N_CANDIDATE;
}

if (json[INDEX_SINDI].Contains(SPARSE_USE_TERM_LISTS_HEAP_INSERT)) {
use_term_lists_heap_insert = json[INDEX_SINDI][SPARSE_USE_TERM_LISTS_HEAP_INSERT].GetBool();
} else {
use_term_lists_heap_insert = true;
}
}
Comment on lines +452 to 460
bool
SINDI::UseTermListsHeapInsert(const SINDISearchParameter& search_param) const {
auto is_special_prune_ratio = [](float ratio) {
constexpr float epsilon = 1e-6F;
return std::abs(ratio) < epsilon || std::abs(ratio - 0.1F) < epsilon;
};
return not(is_special_prune_ratio(doc_prune_ratio_) &&
is_special_prune_ratio(search_param.query_prune_ratio));
}
Comment on lines 138 to +145
search_impl(const SparseTermComputerPtr& computer,
const InnerSearchParam& inner_param,
Allocator* allocator,
bool use_term_lists_heap_insert,
bool use_term_list_ordered_insert,
const SparseVector* original_query = nullptr) const;

bool
UseTermListsHeapInsert(const SINDISearchParameter& search_param) const;
Comment thread tests/test_sindi.cpp
Comment on lines +268 to +269
auto search_param_with_prune_ratio =
fixtures::SINDITestIndex::GenerateSearchParameter(GENERATE(0.0F, 0.2F));
Comment thread docs/sindi.md
Comment on lines +180 to +181
SINDI chooses the heap insertion strategy automatically from `doc_prune_ratio`
and `query_prune_ratio`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/improvement Code improvements (variable/function renaming, refactoring, etc. ) module/docs module/testing size/L version/1.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants