[bug] Fix severe performance bug caused by usage of float#94
Conversation
|
Hi, working on it, expect some more concrete feedback shortly. |
|
Thank you for spotting this bug. Given the huge allocations going on, investigating the problem directly on the source code is inconvenient, so this is a synthetic case extracting the relevant code: (https://godbolt.org/z/x4MdnTjW1) #include <boost/core/lightweight_test.hpp>
#include <boost/multi_index/detail/bucket_array.hpp>
using size_type=std::size_t;
size_type bucket_count=0;
float mlf=1.0;
size_type max_load=0;
struct dummy_bucket_array:boost::multi_index::detail::bucket_array_base<>
{
using super=boost::multi_index::detail::bucket_array_base<>;
using super::size_index;
using super::sizes;
};
void calculate_max_load()
{
float fml=mlf*static_cast<float>(bucket_count);
max_load=(std::numeric_limits<size_type>::max)();
if(max_load>fml)max_load=static_cast<size_type>(fml);
}
void reserve_for_insert(size_type n)
{
bool rehashed = false;
if(n>max_load){
size_type bc =(std::numeric_limits<size_type>::max)();
float fbc=1.0f+static_cast<float>(n)/mlf;
if(bc>fbc)bc =static_cast<size_type>(fbc);
std::size_t size_index=dummy_bucket_array::size_index(bc);
if(dummy_bucket_array::sizes[size_index]>bucket_count)rehashed=true;
bucket_count=dummy_bucket_array::sizes[size_index];
calculate_max_load();
}
}
int main()
{
for(std::size_t n:{50331653ul,201326609ul,402653185ul}){
reserve_for_insert(n);
auto bc0=bucket_count;
reserve_for_insert(max_load+1);
BOOST_TEST_GT(bucket_count,bc0);
}
return boost::report_errors();
}And it indeed fails at the selected values for This is a fixed version (https://godbolt.org/z/8jq47GqK5): #include <boost/core/lightweight_test.hpp>
#include <boost/multi_index/detail/bucket_array.hpp>
using size_type=std::size_t;
size_type bucket_count=0;
float mlf=1.0;
size_type max_load=0;
struct dummy_bucket_array:boost::multi_index::detail::bucket_array_base<>
{
using super=boost::multi_index::detail::bucket_array_base<>;
using super::size_index;
using super::sizes;
};
void calculate_max_load()
{
float fml=mlf*static_cast<float>(bucket_count);
max_load=(std::numeric_limits<size_type>::max)();
if(max_load>fml)max_load=static_cast<size_type>(fml);
}
void reserve_for_insert(size_type n)
{
bool rehashed = false;
if(n>max_load){
size_type bc =(std::numeric_limits<size_type>::max)();
float fbc=1.0f+static_cast<float>(n)/mlf;
if(bc>fbc)bc =(std::max)(static_cast<size_type>(fbc),bucket_count+1); //ONLY LINE CHANGED
std::size_t size_index=dummy_bucket_array::size_index(bc);
if(dummy_bucket_array::sizes[size_index]>bucket_count)rehashed=true;
bucket_count=dummy_bucket_array::sizes[size_index];
calculate_max_load();
}
}
int main()
{
for(std::size_t n:{50331653ul,201326609ul,402653185ul}){
reserve_for_insert(n);
auto bc0=bucket_count;
reserve_for_insert(max_load+1);
BOOST_TEST_GT(bucket_count,bc0);
}
return boost::report_errors();
}This is a one-line change. Doing the calculations with Could you please undo your changes in the PR, replace with if(bc>fbc)bc =(std::max)(static_cast<size_type>(fbc),bucket_count()+1);and rerun your tests? Looking fwd to your feedback. Thank you! |
|
Hi, I implemented it, and still works just fine! output of my program with the new fix: |
|
I've slightly modified your PR and merged it. This change will ship in Boost 1.92 (Aug 2026). Thanks for your contribution! May I ask, which project/product do you work on that is handling 400M+ elements? That makes for a sweet poster for Boost.Flyweight and Boost.MultiIndex. |
|
Hi, sure! In Recombee, we use Just one caveat regarding the 400M+ unique elements: we do really have that many elements in the running service, but in the production-tested code, they are not stored inside a single Now that I know where the bug was, we might start using a single factory. But we also might not, as the current solution works, and you know the drill: "If it ain't broke, don't fix it." |
Intro
There's a bug when using hashed_index with large amounts of elements.
It leads to the index being rehashed on multiple subsequent inserts at specific points
after about 50M elements, which ruins the amortized constant complexity of insert.
I discovered this when using
boost::flyweightwithhashed_factory, and unfortunately,this rendered it unusuable (since we had 400M+ unique elements to insert).
The bug:
Internal functions in
hashed_index.hppuse type float when calculating bucket count.From some point, float's mantissa is not wide enough to represent number of elements,
and starts rounding to the nearest representable value, which is a problem here:
The example program below "observes" when rehashes happen by measuring insert time:
output when I run it with hashed_index.cpp as it is:
container size: 49158, insert time: 1 ms
container size: 98318, insert time: 3 ms
container size: 196614, insert time: 8 ms
container size: 393242, insert time: 9 ms
container size: 786434, insert time: 18 ms
container size: 1572870, insert time: 36 ms
container size: 3145740, insert time: 77 ms
container size: 6291470, insert time: 150 ms
container size: 12582918, insert time: 292 ms
container size: 25165845, insert time: 587 ms
container size: 50331653, insert time: 1020 ms
container size: 50331654, insert time: 1178 ms
container size: 100663321, insert time: 2364 ms
container size: 201326609, insert time: 4037 ms
container size: 201326610, insert time: 4049 ms
container size: 201326611, insert time: 4046 ms
container size: 201326612, insert time: 4041 ms
container size: 201326613, insert time: 5151 ms
container size: 201326614, insert time: 4505 ms
container size: 201326615, insert time: 4121 ms
container size: 201326616, insert time: 4736 ms
container size: 402653185, insert time: 10695 ms
container size: 402653186, insert time: 10467 ms
container size: 402653187, insert time: 9885 ms
container size: 402653188, insert time: 9453 ms
container size: 402653189, insert time: 9682 ms
container size: 402653190, insert time: 9782 ms
container size: 402653191, insert time: 8572 ms
container size: 402653192, insert time: 8470 ms
container size: 402653193, insert time: 9243 ms
container size: 402653194, insert time: 8909 ms
container size: 402653195, insert time: 8564 ms
container size: 402653196, insert time: 8632 ms
container size: 402653197, insert time: 9015 ms
container size: 402653198, insert time: 11286 ms
container size: 402653199, insert time: 8990 ms
container size: 402653200, insert time: 8911 ms
container size: 402653201, insert time: 15674 ms
as we can see, the container rehashes on multiple subsequent events.
output when I run it with the fix (replacing all floats with doubles):