Skip to content

Using unlikely markers for PR386#387

Open
lemire wants to merge 3 commits into
mainfrom
pr386
Open

Using unlikely markers for PR386#387
lemire wants to merge 3 commits into
mainfrom
pr386

Conversation

@lemire
Copy link
Copy Markdown
Member

@lemire lemire commented Jun 6, 2026

PR #386 improves performance when parsing short strings.

However, it does so by marking functions as cold.

I think that a better option is to make branches as unlikely.

This is what this PR does.

fcostaoliveira and others added 3 commits June 3, 2026 09:30
parsed_number_string_t carries two span<UC const> members (integer, fraction)
that are only read on the rare slow paths (digit_comp, and the >19-significant-
digit truncation recompute). Materializing them on every parse forces the ~56/64-
byte struct to be written out and marshaled through the by-value return, which
shows up as backend/store pressure on the hot path.

This adds a runtime `store_spans` flag (default true, so all existing callers are
unchanged) to parse_number_string; from_chars_float_advanced parses with it false,
attempts the Clinger and Eisel-Lemire fast paths inline, and only re-parses with
spans on the two rare slow branches. The re-parse is pushed into a single
`fastfloat_noinline` (noinline+cold) helper so the force-inlined hot scanner is
emitted once rather than duplicated into the caller (without this the extra inline
copies regress some targets, e.g. ARM gcc, by bloating the hot frame and lengthening
the loop-carried dependency chain).

A runtime flag is used deliberately rather than a template parameter: a template
would create a second instantiation of the whole scanner whose icache cost wipes
out the gain.

Measured (per-parser microbench, median of 5, pinned core), fast_float from_chars
<double>/<float>, vs the current tip:
  - Intel Ice Lake (Xeon 8360Y): +17-19% (gcc), Intel TMA shows backend-bound
    26.0% -> 2.2% and retiring 60.3% -> 77.3% on short floats (the eliminated span
    spill), with -36% pipeline slots.
  - Intel Cascade Lake (Xeon 6248): +18-22% (gcc), +13-23% (clang).
  - ARM Neoverse-V2 (Graviton4): +73-196% (gcc), +8-11% (clang) -- the struct spill
    dominated the gcc hot loop there.
Correctness: the full float exhaustive suite (exhaustive32, exhaustive32_64,
exhaustive32_midpoint, random64) passes, and a 2^32 sweep is byte-identical to the
current tip. Public from_chars / from_chars_advanced / parsed_number_string_t are
unchanged.
// under -std=c++17, where using it would trip -Wc++20-extensions/-Werror.
#if (__cplusplus >= 202002L || \
(defined(_MSVC_LANG) && _MSVC_LANG >= 202002L)) && \
defined(__has_cpp_attribute) && __has_cpp_attribute(unlikely) >= 201803L
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If defined(__has_cpp_attribute) is false then __has_cpp_attribute(foo) is ill formed because the undefined macro is replaced with 0 and you get 0(foo) which is an error.

Copy link
Copy Markdown
Contributor

@jwakely jwakely Jun 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can avoid that problem with:

#if (__cplusplus >= 202002L ||                                                 \
     (defined(_MSVC_LANG) && _MSVC_LANG >= 202002L)) &&                        \
    defined(__has_cpp_attribute)
#if __has_cpp_attribute(unlikely) >= 201803L
#define fastfloat_unlikely(x) (x) [[unlikely]]
#endif
#endif

#ifndef fastfloat_unlikely
#if defined(__GNUC__) || defined(__clang__)
#define fastfloat_unlikely(x) (__builtin_expect(!!(x), 0))
#else
#define fastfloat_unlikely(x) (x)
#endif
#endif

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively something like this keeps all the messy conditions in one place, separate from the main if-else logic:

#ifdef __has_cpp_attribute
#if __has_cpp_attribute(unlikely) /* && any extra conditions you want/need */
#define FASTFLOAT_USE_UNLIKELY_ATTR 1
#endif
#endif

#ifdef FASTFLOAT_USE_UNLIKELY_ATTR
#define fastfloat_unlikely(x) (x) [[unlikely]]
#elif defined(__GNUC__) || defined(__clang__)
#define fastfloat_unlikely(x) (__builtin_expect(!!(x), 0))
#else
#define fastfloat_unlikely(x) (x)
#endif

// attribute in C++20 or newer, otherwise to __builtin_expect on GCC/Clang, or
// to a no-op elsewhere (e.g. pre-C++20 MSVC, which has no equivalent hint).
// The [[unlikely]] branch is gated on the language version, not just on
// __has_cpp_attribute: GCC and Clang report the attribute as available even
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know Clang gives that warning, but I don't think GCC does. It's annoying, because it defeats the entire purpose of standardizing things like __has_cpp_attribute and feature test macros, which was to get rid of the fragile preprocessor checks for specific versions of specific compilers 😭

You could put this around the whole of parse_number.h where the attribute is used:

#ifdef __clang__
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wc++20-extensions"
#endif
...
...
#ifdef __clang__
#pragma clang diagnostic pop
#endif

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would allow the attribute to be used if __has_cpp_attribute(unlikely) is true, without the __cplusplus >= 202002L check alongside it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants