Skip to content

[bug] Fix severe performance bug caused by usage of float#94

Merged
joaquintides merged 5 commits into
boostorg:developfrom
kraldan:fix_float_bug
Apr 24, 2026
Merged

[bug] Fix severe performance bug caused by usage of float#94
joaquintides merged 5 commits into
boostorg:developfrom
kraldan:fix_float_bug

Conversation

@kraldan
Copy link
Copy Markdown
Contributor

@kraldan kraldan commented Apr 11, 2026

Intro

There's a bug when using hashed_index with large amounts of elements.
It leads to the index being rehashed on multiple subsequent inserts at specific points
after about 50M elements, which ruins the amortized constant complexity of insert.
I discovered this when using boost::flyweight with hashed_factory, and unfortunately,
this rendered it unusuable (since we had 400M+ unique elements to insert).

The bug:

Internal functions in hashed_index.hpp use type float when calculating bucket count.
From some point, float's mantissa is not wide enough to represent number of elements,
and starts rounding to the nearest representable value, which is a problem here:

  void reserve_for_insert(size_type n)
  {
    if(n>max_load){
      size_type bc =(std::numeric_limits<size_type>::max)();
      float     fbc=1.0f+static_cast<float>(n)/mlf;
      if(bc>fbc)bc =static_cast<size_type>(fbc);
      unchecked_rehash(bc);
    }
  }

The example program below "observes" when rehashes happen by measuring insert time:

#include <boost/multi_index_container.hpp>
#include <boost/multi_index/hashed_index.hpp>

#include <iostream>
#include <chrono>
#include <format>

using namespace boost::multi_index;

using time_unit = std::chrono::milliseconds;

void note_time(time_unit time, size_t container_size)
{
    static unsigned long max = 0;
    const double tol = 0.25;        // we assume that if an insert takes at least 25 % of the known maximum time,
                                    // then it was due to rehashing
    const unsigned long ms = std::chrono::duration_cast<std::chrono::milliseconds>(time).count();
    if (ms > max)
    {
        max = ms;
    }
    // don't print results below 1ms
    if (ms >= 1 && ms >= max * tol)
    {
        std::cout << std::format("container size: {}, insert time: {} ms\n", container_size, ms);
    }
}

int main()
{
    using container_t = multi_index_container<
        int,
        indexed_by<
            hashed_unique<identity<int>>
        >
    >;

    container_t c;

    for (int i = 0; i < 405'000'000; ++i) {
        using clock = std::chrono::steady_clock;
        auto t0 = clock::now();
        c.insert(i);
        auto t1 = clock::now();
        auto duration = t1 - t0;
        note_time(std::chrono::duration_cast<time_unit>(duration), c.size());
    }
}

output when I run it with hashed_index.cpp as it is:

container size: 49158, insert time: 1 ms
container size: 98318, insert time: 3 ms
container size: 196614, insert time: 8 ms
container size: 393242, insert time: 9 ms
container size: 786434, insert time: 18 ms
container size: 1572870, insert time: 36 ms
container size: 3145740, insert time: 77 ms
container size: 6291470, insert time: 150 ms
container size: 12582918, insert time: 292 ms
container size: 25165845, insert time: 587 ms
container size: 50331653, insert time: 1020 ms
container size: 50331654, insert time: 1178 ms
container size: 100663321, insert time: 2364 ms
container size: 201326609, insert time: 4037 ms
container size: 201326610, insert time: 4049 ms
container size: 201326611, insert time: 4046 ms
container size: 201326612, insert time: 4041 ms
container size: 201326613, insert time: 5151 ms
container size: 201326614, insert time: 4505 ms
container size: 201326615, insert time: 4121 ms
container size: 201326616, insert time: 4736 ms

container size: 402653185, insert time: 10695 ms
container size: 402653186, insert time: 10467 ms
container size: 402653187, insert time: 9885 ms
container size: 402653188, insert time: 9453 ms
container size: 402653189, insert time: 9682 ms
container size: 402653190, insert time: 9782 ms
container size: 402653191, insert time: 8572 ms
container size: 402653192, insert time: 8470 ms
container size: 402653193, insert time: 9243 ms
container size: 402653194, insert time: 8909 ms
container size: 402653195, insert time: 8564 ms
container size: 402653196, insert time: 8632 ms
container size: 402653197, insert time: 9015 ms
container size: 402653198, insert time: 11286 ms
container size: 402653199, insert time: 8990 ms
container size: 402653200, insert time: 8911 ms
container size: 402653201, insert time: 15674 ms

as we can see, the container rehashes on multiple subsequent events.

output when I run it with the fix (replacing all floats with doubles):

container size: 49158, insert time: 1 ms
container size: 98318, insert time: 2 ms
container size: 196614, insert time: 5 ms
container size: 393242, insert time: 10 ms
container size: 786434, insert time: 18 ms
container size: 1572870, insert time: 36 ms
container size: 3145740, insert time: 73 ms
container size: 6291470, insert time: 146 ms
container size: 12582918, insert time: 293 ms
container size: 25165844, insert time: 626 ms
container size: 50331654, insert time: 1167 ms
container size: 100663320, insert time: 2314 ms
container size: 201326612, insert time: 4636 ms
container size: 402653190, insert time: 11913 ms

@joaquintides
Copy link
Copy Markdown
Member

Hi, working on it, expect some more concrete feedback shortly.

@joaquintides
Copy link
Copy Markdown
Member

joaquintides commented Apr 20, 2026

Thank you for spotting this bug. Given the huge allocations going on, investigating the problem directly on the source code is inconvenient, so this is a synthetic case extracting the relevant code:

(https://godbolt.org/z/x4MdnTjW1)

#include <boost/core/lightweight_test.hpp>
#include <boost/multi_index/detail/bucket_array.hpp>

using size_type=std::size_t;

size_type bucket_count=0;
float     mlf=1.0;
size_type max_load=0;

struct dummy_bucket_array:boost::multi_index::detail::bucket_array_base<>
{
  using super=boost::multi_index::detail::bucket_array_base<>;
  using super::size_index;
  using super::sizes;
};

void calculate_max_load()
{
  float fml=mlf*static_cast<float>(bucket_count);
  max_load=(std::numeric_limits<size_type>::max)();
  if(max_load>fml)max_load=static_cast<size_type>(fml);
}

void reserve_for_insert(size_type n)
{
  bool rehashed = false;
  if(n>max_load){
    size_type bc =(std::numeric_limits<size_type>::max)();
    float     fbc=1.0f+static_cast<float>(n)/mlf;
    if(bc>fbc)bc =static_cast<size_type>(fbc);
    std::size_t size_index=dummy_bucket_array::size_index(bc);
    if(dummy_bucket_array::sizes[size_index]>bucket_count)rehashed=true;
    bucket_count=dummy_bucket_array::sizes[size_index];
    calculate_max_load();
  }
}

int main()
{
  for(std::size_t n:{50331653ul,201326609ul,402653185ul}){
    reserve_for_insert(n);
    auto bc0=bucket_count;
    reserve_for_insert(max_load+1);
    BOOST_TEST_GT(bucket_count,bc0);
  }
  return boost::report_errors();
}

And it indeed fails at the selected values for n. The reason is that, due to rounding errors, reserve_for_insert(max_load+1) does not increase bucket_count and max_load stays the same.

This is a fixed version (https://godbolt.org/z/8jq47GqK5):

#include <boost/core/lightweight_test.hpp>
#include <boost/multi_index/detail/bucket_array.hpp>

using size_type=std::size_t;

size_type bucket_count=0;
float     mlf=1.0;
size_type max_load=0;

struct dummy_bucket_array:boost::multi_index::detail::bucket_array_base<>
{
  using super=boost::multi_index::detail::bucket_array_base<>;
  using super::size_index;
  using super::sizes;
};

void calculate_max_load()
{
  float fml=mlf*static_cast<float>(bucket_count);
  max_load=(std::numeric_limits<size_type>::max)();
  if(max_load>fml)max_load=static_cast<size_type>(fml);
}

void reserve_for_insert(size_type n)
{
  bool rehashed = false;
  if(n>max_load){
    size_type bc =(std::numeric_limits<size_type>::max)();
    float     fbc=1.0f+static_cast<float>(n)/mlf;
    if(bc>fbc)bc =(std::max)(static_cast<size_type>(fbc),bucket_count+1); //ONLY LINE CHANGED
    std::size_t size_index=dummy_bucket_array::size_index(bc);
    if(dummy_bucket_array::sizes[size_index]>bucket_count)rehashed=true;
    bucket_count=dummy_bucket_array::sizes[size_index];
    calculate_max_load();
  }
}

int main()
{
  for(std::size_t n:{50331653ul,201326609ul,402653185ul}){
    reserve_for_insert(n);
    auto bc0=bucket_count;
    reserve_for_insert(max_load+1);
    BOOST_TEST_GT(bucket_count,bc0);
  }
  return boost::report_errors();
}

This is a one-line change. Doing the calculations with doubles is not my prefered solution because rounding errors could persist at greater values of n and because the C++ standard mandates that the load factor be a float.

Could you please undo your changes in the PR, replace

if(bc>fbc)bc =static_cast<size_type>(fbc);

with

    if(bc>fbc)bc =(std::max)(static_cast<size_type>(fbc),bucket_count()+1);

and rerun your tests? Looking fwd to your feedback. Thank you!

@kraldan
Copy link
Copy Markdown
Contributor Author

kraldan commented Apr 23, 2026

Hi, I implemented it, and still works just fine!

output of my program with the new fix:

container size: 49158, insert time: 1 ms
container size: 98318, insert time: 5 ms
container size: 196614, insert time: 5 ms
container size: 393242, insert time: 13 ms
container size: 786434, insert time: 22 ms
container size: 1572870, insert time: 38 ms
container size: 3145740, insert time: 81 ms
container size: 6291470, insert time: 149 ms
container size: 12582918, insert time: 302 ms
container size: 25165845, insert time: 600 ms
container size: 50331653, insert time: 1226 ms
container size: 100663321, insert time: 2409 ms
container size: 201326609, insert time: 4998 ms
container size: 402653185, insert time: 11808 ms

@joaquintides joaquintides merged commit ac4971a into boostorg:develop Apr 24, 2026
36 checks passed
@joaquintides
Copy link
Copy Markdown
Member

I've slightly modified your PR and merged it. This change will ship in Boost 1.92 (Aug 2026). Thanks for your contribution!

May I ask, which project/product do you work on that is handling 400M+ elements? That makes for a sweet poster for Boost.Flyweight and Boost.MultiIndex.

joaquintides added a commit that referenced this pull request Apr 24, 2026
@kraldan
Copy link
Copy Markdown
Contributor Author

kraldan commented Apr 28, 2026

Hi, sure! In Recombee, we use boost::flyweight in a C++ backend service that's one of the backbones of our real-time recommendations. Feel free to use that information.

Just one caveat regarding the 400M+ unique elements: we do really have that many elements in the running service, but in the production-tested code, they are not stored inside a single hashed_factory. Rather, they are split into 8 factories each having 50M+ elements. The reason for this is that when I first started using boost::flyweight and encountered the bug from this PR, I thought I didn't have the experience/skill necessary to navigate the Boost codebase to find and fix the bug directly. So instead, I "dodged" it by splitting the load into 8 factories and thus avoiding the problematic rehashes at 200M and 400M elements.

Now that I know where the bug was, we might start using a single factory. But we also might not, as the current solution works, and you know the drill: "If it ain't broke, don't fix it."

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants