Possible reference leak of the argument tuple in FunctionCall() #534

Description

@K-ANOY

What happens?

FunctionCall() passes a newly created tuple directly to PyObject_CallObject():

File: src/map.cpp

Function: FunctionCall

auto *df_obj = PyObject_CallObject(function, PyTuple_Pack(1, in_df.ptr()));

PyTuple_Pack() returns a new reference, and PyObject_CallObject() does not
steal the reference to its args argument. Because the tuple pointer is never
stored in a local variable, nothing can pass it to Py_DECREF() afterwards.

As a result, every invocation leaks one tuple. The tuple also owns a reference
to in_df, so the input pandas DataFrame remains alive after FunctionCall()
returns. This occurs on both successful and failed calls.

The function is used during bind-time schema inference and query execution, so
the leak is reachable through ordinary DuckDBPyRelation.map() operations.

The handling of df_obj is unrelated and correct:

auto df = py::reinterpret_steal<py::object>(df_obj);

PyObject_CallObject() returns a new reference on success, which
reinterpret_steal() adopts.

To Reproduce

This issue can be confirmed by inspecting the reference ownership in
src/map.cpp.

In FunctionCall(), the argument tuple is created inline:

auto *df_obj = PyObject_CallObject(function, PyTuple_Pack(1, in_df.ptr()));

According to the CPython C API reference ownership rules:

  1. PyTuple_Pack() returns a new reference.
  2. PyObject_CallObject() does not steal the reference passed as args.
  3. The tuple pointer is not stored, so there is no subsequent
    Py_DECREF() for that new reference.
  4. The tuple therefore leaks on every call and retains its reference to
    in_df.
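
The four steps above also point at the usual remedy: keep the tuple pointer and release it once the call has returned. The sketch below illustrates this, again assuming CPython and ctypes.pythonapi rather than the actual src/map.cpp code. FakeFrame and input_survives are hypothetical names, and a weak reference is used to check whether the input object is still alive after the call.

```python
import ctypes
import gc
import weakref

api = ctypes.pythonapi
# Declare only the fixed argument of the variadic PyTuple_Pack; c_void_p
# leaves ownership of the new tuple reference with the caller.
api.PyTuple_Pack.restype = ctypes.c_void_p
api.PyTuple_Pack.argtypes = [ctypes.c_ssize_t]
api.PyObject_CallObject.restype = ctypes.py_object
api.PyObject_CallObject.argtypes = [ctypes.py_object, ctypes.c_void_p]
api.Py_DecRef.restype = None
api.Py_DecRef.argtypes = [ctypes.c_void_p]


class FakeFrame:
    """Hypothetical stand-in for the pandas DataFrame held by in_df."""


def input_survives(release_args):
    """Report whether the input is still alive after the call returns."""
    in_df = FakeFrame()
    probe = weakref.ref(in_df)
    # Step 1: PyTuple_Pack() hands back a new reference.
    args_ptr = api.PyTuple_Pack(1, ctypes.py_object(in_df))
    try:
        # Step 2: the call borrows the tuple and does not steal it.
        api.PyObject_CallObject(lambda df: None, args_ptr)
    finally:
        if release_args:
            # The release that step 3 notes is missing.
            api.Py_DecRef(args_ptr)
    del in_df
    gc.collect()
    # Step 4: a leaked tuple keeps the input reachable.
    return probe() is not None


if __name__ == "__main__":
    print("input alive without release:", input_survives(False))
    print("input alive with release:", input_survives(True))
```

In C++ the equivalent would be to hold the tuple in an owning handle, or to decrement it explicitly on every path after PyObject_CallObject() returns. Which form fits the codebase is left to the maintainers.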

This issue is specific to the Python API and is not reproducible through plain
SQL in the DuckDB CLI.

OS:

x86_64

DuckDB Package Version:

latest version

Python Version:

3.12

Full Name:

Ksx

Affiliation:

SMU

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have not tested with any build

Did you include all relevant data sets for reproducing the issue?

No - Other reason (please specify in the issue body)

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration to reproduce the issue?

  • Yes, I have
