Fix race condition, memory management, debug output, and hashtable lookup in sorting.cpp (#5078)#5078
Closed
alibeklfc wants to merge 1 commit intofacebookresearch:mainfrom
Closed
Fix race condition, memory management, debug output, and hashtable lookup in sorting.cpp (#5078)#5078alibeklfc wants to merge 1 commit intofacebookresearch:mainfrom
alibeklfc wants to merge 1 commit intofacebookresearch:mainfrom
Conversation
Contributor
|
@alibeklfc has exported this pull request. If you are a Meta employee, you can view the originating Diff in D100317917. |
…okup in sorting.cpp (facebookresearch#5078) Summary: Four fixes in `faiss/utils/sorting.cpp`: **1. OpenMP directive fix in `fvec_argsort_parallel`** The initialization loop used `#pragma omp parallel` without the `for` directive. This caused every thread to execute the entire loop independently rather than distributing iterations. With `nt` threads, each `permA[i]` was written by all `nt` threads concurrently — a data race under the C++ memory model (multiple unsynchronized writes to the same non-atomic location), and O(n * nt) wasted work instead of O(n). Fixed by changing to `#pragma omp parallel for`. In practice, all threads write the same value (`permA[i] = i`), so the output was always correct despite the UB. The fix eliminates the undefined behavior and the redundant work. **2. RAII memory management in `fvec_argsort_parallel`** Replaced `new size_t[n]` / `delete[] perm2` with `std::vector<size_t>`. The old code had no realistic exception path between allocation and deallocation (all intermediate code is either C functions or non-throwing OpenMP regions), but the manual `new`/`delete` pattern is fragile against future edits that might introduce a throwing path. The `std::vector` provides RAII lifetime management with no behavioral change. **3. Removed debug `printf` in `fvec_argsort_parallel`** A leftover `printf("merge %d %d, %d threads\n", ...)` in the parallel merge loop wrote to stdout during normal operation. Removed. **4. Missing early termination in `hashtable_int64_to_int64_lookup`** The linear probing loop did not check for empty slots (`tab[slot * 2] == -1`). In an open-addressing hash table with no deletion support, an empty slot is definitive proof that the key was not inserted — the insert function would have placed it there or earlier. Without this check, lookups for absent keys probed every slot in the bucket before the wrap-around termination at `slot == hk_i`. The fix adds the standard empty-slot check, matching the structure of the insert function (`hashtable_int64_to_int64_add`). This is a performance optimization — the old code always returned the correct result (`-1` after a full bucket scan), just slower. Differential Revision: D100317917
afca08b to
12bae86
Compare
Contributor
|
This pull request has been merged in aa3ce37. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary:
Four fixes in
faiss/utils/sorting.cpp:1. OpenMP directive fix in
fvec_argsort_parallelThe initialization loop used
#pragma omp parallelwithout thefordirective. This caused every thread to execute the entire loop independently rather than distributing iterations. Withntthreads, eachpermA[i]was written by allntthreads concurrently — a data race under the C++ memory model (multiple unsynchronized writes to the same non-atomic location), and O(n * nt) wasted work instead of O(n). Fixed by changing to#pragma omp parallel for.In practice, all threads write the same value (
permA[i] = i), so the output was always correct despite the UB. The fix eliminates the undefined behavior and the redundant work.2. RAII memory management in
fvec_argsort_parallelReplaced
new size_t[n]/delete[] perm2withstd::vector<size_t>. The old code had no realistic exception path between allocation and deallocation (all intermediate code is either C functions or non-throwing OpenMP regions), but the manualnew/deletepattern is fragile against future edits that might introduce a throwing path. Thestd::vectorprovides RAII lifetime management with no behavioral change.3. Removed debug
printfinfvec_argsort_parallelA leftover
printf("merge %d %d, %d threads\n", ...)in the parallel merge loop wrote to stdout during normal operation. Removed.4. Missing early termination in
hashtable_int64_to_int64_lookupThe linear probing loop did not check for empty slots (
tab[slot * 2] == -1). In an open-addressing hash table with no deletion support, an empty slot is definitive proof that the key was not inserted — the insert function would have placed it there or earlier. Without this check, lookups for absent keys probed every slot in the bucket before the wrap-around termination atslot == hk_i. The fix adds the standard empty-slot check, matching the structure of the insert function (hashtable_int64_to_int64_add). This is a performance optimization — the old code always returned the correct result (-1after a full bucket scan), just slower.Differential Revision: D100317917