gh-143732: Add tier2 specialization for TO_BOOL#148271

Open
eendebakpt wants to merge 10 commits into python:main from eendebakpt:to_bool_specialization

Conversation

Contributor

@eendebakpt eendebakpt commented Apr 8, 2026

See discussion at #148113.

This PR adds two tier2 opcodes for specialization of TO_BOOL. The `*args` and `**kwargs` arguments are marked in the tier2 optimizer as tuple and dict, respectively.

This PR adds no additional type recording or tier1 opcodes; that is left to follow-up PRs.
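The type guarantees the optimizer relies on can be checked from pure Python (a minimal illustration, not part of the PR itself): CPython always packs excess positional arguments into a tuple and excess keyword arguments into a dict, so TO_BOOL on `args` or `kwargs` can always take the tuple/dict fast path.

```python
def probe(*args, **kwargs):
    # CPython guarantees these exact types, regardless of how
    # the function is called -- this is what lets the tier2
    # optimizer mark them as tuple and dict.
    return type(args), type(kwargs)

assert probe(1, 2, x=3) == (tuple, dict)
assert probe() == (tuple, dict)
```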

Benchmark                  main      branch
to_bool_dict_false         7.34 ms   7.37 ms: 1.00x slower
to_bool_bytes_true         10.7 ms   10.8 ms: 1.01x slower
to_bool_kwargs_nonempty    1.25 sec  911 ms: 1.37x faster
to_bool_kwargs_empty       949 ms    580 ms: 1.64x faster
to_bool_varargs_nonempty   1.23 sec  888 ms: 1.39x faster
to_bool_varargs_empty      943 ms    575 ms: 1.64x faster
Details
"""Benchmark for TO_BOOL specializations and kwargs type information.

Tests the JIT optimizer's ability to specialize TO_BOOL for:
- dict (truthiness checks)
- **kwargs dict type (known to be dict at the optimizer level, uses _TO_BOOL_DICT)
- *args tuple type (known to be tuple at the optimizer level, uses _TO_BOOL_SIZED)
"""

import pyperf


# --- TO_BOOL from dict ---

def to_bool_dict_true(n):
    d = {"a": 1}
    count = 0
    for _ in range(n):
        if d:
            count += 1
    return count


def to_bool_dict_false(n):
    d = {}
    count = 0
    for _ in range(n):
        if d:
            count += 1
    return count


# --- TO_BOOL from bytes (no tier1 specialization, uses generic _TO_BOOL) ---

def to_bool_bytes_true(n):
    b = b"hello"
    count = 0
    for _ in range(n):
        if b:
            count += 1
    return count


def to_bool_bytes_false(n):
    b = b""
    count = 0
    for _ in range(n):
        if b:
            count += 1
    return count


# --- TO_BOOL with **kwargs ---

def kwargs_to_bool_inner(**kwargs):
    """kwargs is guaranteed to be a dict by CPython."""
    count = 0
    for _ in range(200):
        if kwargs:
            count += 1
    return count


def to_bool_kwargs_nonempty(n):
    for _ in range(n):
        kwargs_to_bool_inner(x=1, y=2)


def to_bool_kwargs_empty(n):
    for _ in range(n):
        kwargs_to_bool_inner()


# --- TO_BOOL with *args (tuple, uses _TO_BOOL_SIZED) ---

def varargs_to_bool_inner(*args):
    """args is guaranteed to be a tuple by CPython."""
    count = 0
    for _ in range(200):
        if args:
            count += 1
    return count


def to_bool_varargs_nonempty(n):
    for _ in range(n):
        varargs_to_bool_inner(1, 2, 3)


def to_bool_varargs_empty(n):
    for _ in range(n):
        varargs_to_bool_inner()


# --- kwargs type used in dict operations ---

def kwargs_dict_ops_inner(**kwargs):
    """Test that kwargs is known to be dict for various operations."""
    total = 0
    for _ in range(200):
        total += len(kwargs)
        if "key" in kwargs:
            total += 1
    return total


def kwargs_dict_ops(n):
    for _ in range(n):
        kwargs_dict_ops_inner(key=42, other=99)


N = 500_000

runner = pyperf.Runner()

runner.bench_func("to_bool_dict_true", to_bool_dict_true, N)
runner.bench_func("to_bool_dict_false", to_bool_dict_false, N)
runner.bench_func("to_bool_bytes_true", to_bool_bytes_true, N)
runner.bench_func("to_bool_bytes_false", to_bool_bytes_false, N)
runner.bench_func("to_bool_kwargs_nonempty", to_bool_kwargs_nonempty, N)
runner.bench_func("to_bool_kwargs_empty", to_bool_kwargs_empty, N)
runner.bench_func("to_bool_varargs_nonempty", to_bool_varargs_nonempty, N)
runner.bench_func("to_bool_varargs_empty", to_bool_varargs_empty, N)
runner.bench_func("kwargs_dict_ops", kwargs_dict_ops, N)

@eendebakpt eendebakpt marked this pull request as ready for review April 8, 2026 22:21
_REPLACE_WITH_TRUE +
POP_TOP;

tier2 op(_TO_BOOL_DICT, (value -- res)) {
Contributor

You can merge this with _TO_BOOL_SIZED by using the fact that both do a fixed-offset lookup.
In the tier2 optimizer you can set the offset for where the size is stored, do size = *(Py_ssize_t *)((char *)obj + offset), and check that directly.
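The fixed-offset read the comment proposes can be illustrated from Python with ctypes (a sketch of the memory layout only; the real uop would do this in C, and `id()` returning the object address plus the header size from `object.__basicsize__` are CPython-specific assumptions):

```python
import ctypes

# In CPython, ob_size sits immediately after the PyObject header in
# any PyVarObject (tuple, list, bytes, ...). object.__basicsize__ is
# the header size, so this is the offset the uop would burn in.
OB_SIZE_OFFSET = object.__basicsize__

def var_size(obj):
    # Read the Py_ssize_t at (char *)obj + offset, as the merged
    # _TO_BOOL_SIZED uop would.
    return ctypes.c_ssize_t.from_address(id(obj) + OB_SIZE_OFFSET).value

assert var_size((1, 2, 3)) == 3
assert var_size(()) == 0
assert var_size(b"abc") == 3
```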

Contributor Author

I see what you mean. It goes into the internals of PyDict (doing manual offset calculations instead of using PyDict_GET_SIZE), and we also need to store the offset somewhere. So I think this is too much complication just to get rid of one tier2 opcode.

Contributor

we also need to store the offset somewhere.

You can store it in the instruction operand0.

REPLACE_OP(this_instr, _TO_BOOL_DICT, 0, 0);
}
else if (tp == &PyTuple_Type ||
tp == &PySet_Type ||
Contributor

This is incorrect for set, as it does not use PyObject_VAR_HEAD; it works by accident because `fill` sits at that offset, and `fill` gives the wrong answer when the set contains dummy entries.
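The accident the comment describes can be made visible with ctypes on a default CPython build (the field offsets below are a layout assumption: PySetObject stores `fill`, entries including dummies, and then `used`, live entries only, as the first two Py_ssize_t fields after the object header):

```python
import ctypes

HEADER = object.__basicsize__              # size of the PyObject header
SSIZE = ctypes.sizeof(ctypes.c_ssize_t)

def set_fill_used(s):
    # Layout assumption: fill then used directly after the header,
    # as in current default CPython builds.
    addr = id(s)
    fill = ctypes.c_ssize_t.from_address(addr + HEADER).value
    used = ctypes.c_ssize_t.from_address(addr + HEADER + SSIZE).value
    return fill, used

s = {1}
s.discard(1)                    # leaves a dummy entry behind
fill, used = set_fill_used(s)
assert (fill, used) == (1, 0)   # fill still counts the dummy
assert not s                    # truthiness must come from used, not fill
```

So a fixed-offset read at the "size" position would see 1 for this empty set, which is why set/frozenset need separate handling.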

Contributor Author

Good catch! I updated the PR to handle the set/frozenset separately.

We can also use your suggestion to fold everything into _TO_BOOL_SIZED. That means we have to load the offset at runtime (a minor cost), but it does keep the number of ops lower. I implemented this in main...eendebakpt:to_bool_specialization_v2.

Contributor

That means we have to load the offset at runtime (minor cost), but it does keep the number of ops lower.

I don't think so: in the JIT the offset would be burned into the machine code itself, so it is fixed rather than looked up at runtime.

Member

@markshannon left a comment

The problem with recording uops not being allowed after specializing uops has been fixed, so you can add a recording uop to _TO_BOOL and use the recorded information for better specialization.
#148285

}
}

op(_TO_BOOL_DICT, (value -- res)) {
Member

_TO_BOOL_DICT gets inserted by this pass, so this code will never be executed.
Same for _TO_BOOL_SIZED and _TO_BOOL_ANY_SET below.
