Skip to content

Enable multi-threaded execution for ScalarFunction#1140

Open
tanakakc wants to merge 3 commits intosuketa:mainfrom
tanakakc:feature/scalar-function-multithread
Open

Enable multi-threaded execution for ScalarFunction#1140
tanakakc wants to merge 3 commits intosuketa:mainfrom
tanakakc:feature/scalar-function-multithread

Conversation

@tanakakc
Copy link

@tanakakc tanakakc commented Mar 13, 2026

Summary

  • Implement executor thread pattern for ScalarFunction to enable multi-threaded execution without SET threads=1
  • Release GVL during duckdb_query() in connection.c to prevent deadlock with the executor thread
  • Remove SET threads=1 requirement from scalar function registration and all related tests

Approach

DuckDB worker threads (non-Ruby native threads) cannot call rb_funcall() directly because they don't hold the GVL. This PR implements a 3-way dispatch in scalar_function_callback:

  1. Ruby thread with GVL → direct execution
  2. Ruby thread without GVL → reacquire GVL via rb_thread_call_with_gvl()
  3. Non-Ruby thread (DuckDB worker) → dispatch to a global executor thread via pthread_mutex/pthread_cond

This pattern is inspired by FFI gem's async callback dispatcher. The global executor thread is a Ruby thread created via rb_thread_create() that waits for callback requests and executes them with GVL protection.

Changes

File Description
ext/duckdb/scalar_function.c 3-way dispatch + global executor thread implementation
ext/duckdb/connection.c Release GVL during duckdb_query() via rb_thread_call_without_gvl
lib/duckdb/connection.rb Remove check_threads from register_scalar_function
lib/duckdb/scalar_function.rb Remove threads=1 documentation note
test/duckdb_test/scalar_function_test.rb Remove SET threads=1, add multi-thread test with 1M rows
test/duckdb_test/connection_test.rb Remove SET threads=1 from scalar function tests
test/duckdb_test/gc_stress_test.rb Keep SET threads=1 only for table function tests

Test plan

  • All 706 tests pass (0 failures, 0 errors, 5 skips)
  • Verified Case 3 (executor thread dispatch) is exercised with 1M row dataset via debug logging
  • GC stress tests pass with both scalar and table functions
  • Docker build and test (docker compose run --rm ubuntu bash -c "rake build && rake test")

Closes #1135

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Scalar functions can run safely under multi-threaded execution.
  • Improvements

    • Query execution no longer blocks the main runtime, improving concurrency and responsiveness.
    • Callback handling for scalar functions extended to support non-main/worker threads.
  • Tests

    • Added a multithreaded scalar function test and updated related tests to allow multi-threaded runs.
  • Documentation

    • Removed guidance recommending forcing single-threaded execution for scalar functions.

ScalarFunction previously required `SET threads=1` because DuckDB worker
threads called Ruby callbacks without holding the GVL. This implements an
executor thread pattern (inspired by FFI's async callback dispatcher) with
3-way dispatch:

- Case 1: Ruby thread with GVL - direct execution
- Case 2: Ruby thread without GVL - reacquire via rb_thread_call_with_gvl
- Case 3: Non-Ruby thread (DuckDB worker) - dispatch to global executor
  thread via pthread mutex/condvar

Also releases GVL during duckdb_query() in connection.c to prevent deadlock
with the executor thread.

Closes suketa#1135

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai
Copy link

coderabbitai bot commented Mar 13, 2026

📝 Walkthrough

Walkthrough

Adds a GVL-free query execution path and a thread-safe dispatch for Ruby-backed scalar-function callbacks, removing the prior requirement to set DuckDB threads=1; corresponding docs and tests updated for multi-threaded scalar-function support.

Changes

Cohort / File(s) Summary
Connection GVL-free query path
ext/duckdb/connection.c
Adds a nogvl worker and args struct; extracts SQL while holding GVL, calls duckdb_query without GVL via rb_thread_call_without_gvl, protects the Ruby String from GC, and surfaces DuckDB errors after the nogvl call.
Scalar function threading & executor
ext/duckdb/scalar_function.c
Implements global executor thread, request queue, mutex/cond synchronization, thread-detection (GVL/native), and three dispatch paths: direct (Ruby+GVL), rb_thread_call_with_gvl (Ruby without GVL), and enqueue-to-executor (non-Ruby threads). Adds startup/shutdown and error reporting back to DuckDB.
Ruby API docs/validation changes
lib/duckdb/connection.rb, lib/duckdb/scalar_function.rb
Removes runtime thread-count validation in register_scalar_function and deletes documentation advising SET threads=1.
Tests updated and new multithread test
test/duckdb_test/connection_test.rb, test/duckdb_test/gc_stress_test.rb, test/duckdb_test/scalar_function_test.rb
Removes global SET threads=1 in many tests, adds per-test SET threads=1 where table functions require it, and adds test_scalar_function_with_multithread validating scalar functions under multi-threaded execution.

Sequence Diagram

sequenceDiagram
    participant RubyThread as Ruby Thread
    participant DuckDBWorker as DuckDB Worker
    participant Executor as Global Executor
    participant RubyVM as Ruby VM

    alt Ruby thread holds GVL
        RubyThread->>RubyVM: run callback directly
        RubyVM-->>RubyThread: return value
        RubyThread-->>DuckDBWorker: deliver result
    else Ruby thread without GVL
        DuckDBWorker->>RubyThread: request callback
        RubyThread->>RubyVM: rb_thread_call_with_gvl (acquire GVL)
        RubyVM-->>RubyThread: return value
        RubyThread-->>DuckDBWorker: deliver result
    else Non-Ruby worker thread
        DuckDBWorker->>Executor: enqueue callback_request
        DuckDBWorker->>DuckDBWorker: wait on condvar for completion
        Executor->>RubyVM: acquire GVL, run callback
        RubyVM-->>Executor: return value (or exception)
        Executor->>DuckDBWorker: signal completion with result/error
        DuckDBWorker->>DuckDBWorker: resume with result
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • suketa

Poem

🐰 I nudged the GVL and hopped away,
Executors hum while callbacks play.
Threads now dance where once was one,
Tiny paws applaud the job well done. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 29.41% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically summarizes the main objective of the PR: enabling multi-threaded execution for ScalarFunction, which is the primary technical change across all modified files.
Linked Issues check ✅ Passed The PR successfully implements all coding requirements from #1135: removes SET threads=1 requirement for scalar functions, implements a 3-way thread dispatch mechanism to safely handle callbacks from DuckDB worker threads without GVL, and verifies correctness with multi-threaded tests.
Out of Scope Changes check ✅ Passed All changes are directly aligned with the objective to enable multi-threaded scalar function execution. Changes include the executor thread implementation, GVL management, test updates removing thread restrictions, and documentation cleanup—all in scope.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Remove extra blank lines and unnecessary Metrics/AbcSize disable directive.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🧹 Nitpick comments (2)
ext/duckdb/scalar_function.c (1)

72-223: Prefix newly added executor symbols with rbduckdb_.

New globals/helpers (g_executor_*, executor_*, dispatch_callback_to_executor, callback_with_gvl) don’t follow the project’s C symbol prefix rule.

As per coding guidelines, ext/duckdb/**/*.c: C symbols must be prefixed with rbduckdb_ to avoid namespace conflicts.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ext/duckdb/scalar_function.c` around lines 72 - 223, The new executor globals
and functions must be renamed to use the project prefix; rename all occurrences
of g_executor_mutex, g_executor_cond, g_request_list, g_executor_thread,
g_executor_started to rbduckdb_g_executor_mutex, rbduckdb_g_executor_cond,
rbduckdb_g_request_list, rbduckdb_g_executor_thread, rbduckdb_g_executor_started
and rename struct executor_wait_data, executor_wait_func, executor_stop_func,
execute_callback_protected, executor_thread_func, ensure_executor_started,
dispatch_callback_to_executor, and callback_with_gvl to
rbduckdb_executor_wait_data (or rbduckdb_executor_wait_* for struct),
rbduckdb_executor_wait_func, rbduckdb_executor_stop_func,
rbduckdb_execute_callback_protected, rbduckdb_executor_thread_func,
rbduckdb_ensure_executor_started, rbduckdb_dispatch_callback_to_executor, and
rbduckdb_callback_with_gvl respectively; update every reference in this file to
the new names (including usages inside rb_thread_create,
rb_thread_call_without_gvl, pthread calls, and any forward declarations) and
keep the same static linkage and behavior. Ensure mutex/cond initializers and
rb_global_variable use the new symbol names so no other code changes are
required.
ext/duckdb/connection.c (1)

150-166: Rename new C symbols to follow extension namespacing rules.

query_nogvl_args / duckdb_query_nogvl should be prefixed with rbduckdb_ to avoid symbol collisions and match project conventions.

As per coding guidelines, ext/duckdb/**/*.c: C symbols must be prefixed with rbduckdb_ to avoid namespace conflicts.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ext/duckdb/connection.c` around lines 150 - 166, The new C symbols violate
extension namespacing: rename struct query_nogvl_args to
rbduckdb_query_nogvl_args and function duckdb_query_nogvl to
rbduckdb_query_nogvl, and update all uses/declares accordingly (e.g., cast
sites, initializations, and any thread/GVL call sites that reference these
symbols) so the struct and function are consistently prefixed with rbduckdb_ to
follow ext/duckdb naming conventions and avoid collisions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ext/duckdb/connection.c`:
- Around line 182-195: The code currently passes a Ruby string buffer pointer
(from StringValueCStr into sql) into rb_thread_call_without_gvl which can lead
to the Ruby string being mutated concurrently; fix by making a C-owned copy of
the SQL bytes before releasing the GVL: allocate a buffer (e.g., via
xmalloc/strdup) sized from RSTRING_LEN(str), copy the contents, set args.sql to
that copied buffer, call rb_thread_call_without_gvl(duckdb_query_nogvl, &args,
...), then free the copied buffer after the call; keep RB_GC_GUARD(str) as
before and ensure retval/out_result usage in struct query_nogvl_args remains
unchanged.
- Around line 185-193: The code passes args.con (ctx->con) into
duckdb_query_nogvl and then drops the GVL via rb_thread_call_without_gvl,
allowing another Ruby thread to call disconnect and invalidate the handle; fix
by acquiring a connection lifetime guard (e.g. increment a reference count or
take the connection mutex) before filling struct query_nogvl_args.args.con and
calling rb_thread_call_without_gvl(duckdb_query_nogvl,...), and release that
guard immediately after the call returns; also ensure disconnect checks/releases
the same guard so it waits or defers actual teardown until the guard is
released.

In `@ext/duckdb/scalar_function.c`:
- Around line 349-356: The current Case 1 path calls
execute_callback_protected((VALUE)&arg) directly and can let Ruby exceptions
escape; change Case 1 to use the same protected/error-conversion path as Cases
2/3 by invoking rb_protect (or the existing execute_callback_protected wrapper
that uses rb_protect) so any exception is captured and then translated to
duckdb_scalar_function_set_error; locate the branch using ruby_native_thread_p()
/ ruby_thread_has_gvl_p() and replace the direct call to
execute_callback_protected with the protected invocation (keeping the
rb_thread_call_with_gvl(callback_with_gvl, &arg) behavior for Case 2) so all
cases funnel through the exception-to-duckdb_scalar_function_set_error
conversion.
- Around line 8-9: Add a configure-time feature check for ruby_thread_has_gvl_p:
update extconf.rb to call have_func("ruby_thread_has_gvl_p") and then wrap the
extern declaration and any uses of ruby_thread_has_gvl_p in scalar_function.c
with `#ifdef` HAVE_RUBY_THREAD_HAS_GVL_P / `#endif` so compilation/linking won't
fail on Ruby ≤3.2; leave the extern and usage of ruby_native_thread_p unchanged.

In `@test/duckdb_test/scalar_function_test.rb`:
- Around line 122-125: Remove the redundant RuboCop disable for Metrics/AbcSize
in the method definition for test_scalar_function_with_two_parameters: edit the
line that currently reads "def test_scalar_function_with_two_parameters #
rubocop:disable Metrics/MethodLength, Metrics/AbcSize" to only disable
Metrics/MethodLength (i.e., drop ", Metrics/AbcSize") so the
Lint/RedundantCopDisableDirective is resolved while keeping the necessary
Metrics/MethodLength exemption.
- Line 106: There are leftover unnecessary blank lines in
test/duckdb_test/scalar_function_test.rb (they originated when per-test "SET
threads=1" lines were removed) causing Layout/EmptyLines linter failures; remove
the empty lines at the specified locations (lines around 106, 125, 145, 164,
183, 202, 221, 240, 259, 281, 303, 325, 351, 374, 397, 421, 444, 467, 490-491,
521-522, 553, 583-584, 604-605, 625-626, 669-670, 686-687, 703-704) so there are
no stray blank-only lines between tests or at file end, and run the linter to
confirm Layout/EmptyLines is satisfied.
- Around line 722-739: The test test_scalar_function_with_multithread currently
doesn't force a multi-threaded executor path; before creating the table and
registering the scalar function, explicitly enable multi-threading on the
connection by setting threads > 1 (e.g. call `@con.execute`('PRAGMA threads=2') or
use the DB config API) so the callback dispatch will run the multi-threaded case
when `@con.execute`('CREATE TABLE large_test ...') and
`@con.register_scalar_function`(sf) are exercised; keep the existing
Gem.win_platform? skip and the rest of the test unchanged.

---

Nitpick comments:
In `@ext/duckdb/connection.c`:
- Around line 150-166: The new C symbols violate extension namespacing: rename
struct query_nogvl_args to rbduckdb_query_nogvl_args and function
duckdb_query_nogvl to rbduckdb_query_nogvl, and update all uses/declares
accordingly (e.g., cast sites, initializations, and any thread/GVL call sites
that reference these symbols) so the struct and function are consistently
prefixed with rbduckdb_ to follow ext/duckdb naming conventions and avoid
collisions.

In `@ext/duckdb/scalar_function.c`:
- Around line 72-223: The new executor globals and functions must be renamed to
use the project prefix; rename all occurrences of g_executor_mutex,
g_executor_cond, g_request_list, g_executor_thread, g_executor_started to
rbduckdb_g_executor_mutex, rbduckdb_g_executor_cond, rbduckdb_g_request_list,
rbduckdb_g_executor_thread, rbduckdb_g_executor_started and rename struct
executor_wait_data, executor_wait_func, executor_stop_func,
execute_callback_protected, executor_thread_func, ensure_executor_started,
dispatch_callback_to_executor, and callback_with_gvl to
rbduckdb_executor_wait_data (or rbduckdb_executor_wait_* for struct),
rbduckdb_executor_wait_func, rbduckdb_executor_stop_func,
rbduckdb_execute_callback_protected, rbduckdb_executor_thread_func,
rbduckdb_ensure_executor_started, rbduckdb_dispatch_callback_to_executor, and
rbduckdb_callback_with_gvl respectively; update every reference in this file to
the new names (including usages inside rb_thread_create,
rb_thread_call_without_gvl, pthread calls, and any forward declarations) and
keep the same static linkage and behavior. Ensure mutex/cond initializers and
rb_global_variable use the new symbol names so no other code changes are
required.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3920366a-3df8-4f5d-a450-672cff5c1a78

📥 Commits

Reviewing files that changed from the base of the PR and between 429d306 and 52975ec.

📒 Files selected for processing (7)
  • ext/duckdb/connection.c
  • ext/duckdb/scalar_function.c
  • lib/duckdb/connection.rb
  • lib/duckdb/scalar_function.rb
  • test/duckdb_test/connection_test.rb
  • test/duckdb_test/gc_stress_test.rb
  • test/duckdb_test/scalar_function_test.rb
💤 Files with no reviewable changes (3)
  • lib/duckdb/connection.rb
  • lib/duckdb/scalar_function.rb
  • test/duckdb_test/connection_test.rb

Comment on lines +182 to +195
/* Extract C string before releasing GVL (StringValueCStr is a Ruby operation) */
const char *sql = StringValueCStr(str);

struct query_nogvl_args args = {
.con = ctx->con,
.sql = sql,
.out_result = &(ctxr->result),
.retval = DuckDBError,
};

rb_thread_call_without_gvl(duckdb_query_nogvl, &args, RUBY_UBF_IO, 0);

RB_GC_GUARD(str);

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Copy SQL into C-owned memory before releasing the GVL.

Line 183 stores a Ruby String buffer pointer and then releases the GVL; another Ruby thread can mutate that String while duckdb_query is reading it.

💡 Suggested fix
-    const char *sql = StringValueCStr(str);
+    VALUE sql_str = rb_obj_as_string(str);
+    long sql_len = RSTRING_LEN(sql_str);
+    char *sql_buf = ALLOC_N(char, (size_t)sql_len + 1);
+    memcpy(sql_buf, RSTRING_PTR(sql_str), (size_t)sql_len);
+    sql_buf[sql_len] = '\0';

     struct query_nogvl_args args = {
         .con = ctx->con,
-        .sql = sql,
+        .sql = sql_buf,
         .out_result = &(ctxr->result),
         .retval = DuckDBError,
     };

     rb_thread_call_without_gvl(duckdb_query_nogvl, &args, RUBY_UBF_IO, 0);
-
-    RB_GC_GUARD(str);
+    xfree(sql_buf);
+    RB_GC_GUARD(sql_str);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ext/duckdb/connection.c` around lines 182 - 195, The code currently passes a
Ruby string buffer pointer (from StringValueCStr into sql) into
rb_thread_call_without_gvl which can lead to the Ruby string being mutated
concurrently; fix by making a C-owned copy of the SQL bytes before releasing the
GVL: allocate a buffer (e.g., via xmalloc/strdup) sized from RSTRING_LEN(str),
copy the contents, set args.sql to that copied buffer, call
rb_thread_call_without_gvl(duckdb_query_nogvl, &args, ...), then free the copied
buffer after the call; keep RB_GC_GUARD(str) as before and ensure
retval/out_result usage in struct query_nogvl_args remains unchanged.

Comment on lines +185 to +193
struct query_nogvl_args args = {
.con = ctx->con,
.sql = sql,
.out_result = &(ctxr->result),
.retval = DuckDBError,
};

rb_thread_call_without_gvl(duckdb_query_nogvl, &args, RUBY_UBF_IO, 0);

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Guard connection lifetime while query runs without GVL.

args.con is used outside the GVL with no in-flight query guard. Another Ruby thread can call disconnect concurrently and invalidate the same handle during duckdb_query.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ext/duckdb/connection.c` around lines 185 - 193, The code passes args.con
(ctx->con) into duckdb_query_nogvl and then drops the GVL via
rb_thread_call_without_gvl, allowing another Ruby thread to call disconnect and
invalidate the handle; fix by acquiring a connection lifetime guard (e.g.
increment a reference count or take the connection mutex) before filling struct
query_nogvl_args.args.con and calling
rb_thread_call_without_gvl(duckdb_query_nogvl,...), and release that guard
immediately after the call returns; also ensure disconnect checks/releases the
same guard so it waits or defers actual teardown until the guard is released.

Comment on lines +8 to +9
extern int ruby_thread_has_gvl_p(void);
extern int ruby_native_thread_p(void);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Are ruby_thread_has_gvl_p(void)andruby_native_thread_p(void) public/exported Ruby C APIs across Ruby 2.7, 3.0, 3.1, 3.2, and 3.3? Which headers should be included, and are feature checks/macros required before use?

💡 Result:

ruby_thread_has_gvl_p(void)

  • Ruby 2.7, 3.0, 3.1, 3.2, 3.3: Not a public/installed C API (i.e., not declared by the public headers you’re supposed to include). It exists in the implementation, but its declaration lived in an internal header (internal/thread.h). (docs.ruby-lang.org)
  • Header: none public in those versions; do not rely on internal headers (internal/thread.h is not part of the supported extension API). (docs.ruby-lang.org)
  • Feature checks/macros: if you choose to use it anyway, you must treat it as optional:
    • Use mkmf have_func("ruby_thread_has_gvl_p") and compile-time-guard on the generated HAVE_RUBY_THREAD_HAS_GVL_P (or whatever macro your extconf emits).
    • Otherwise you risk build failures (missing prototype / missing symbol depending on platform/build). (bugs.ruby-lang.org)
  • Status change: Ruby core discussion/changes to expose it as public happened after 3.3 (Feature #20877 / #20750 discussion), so you cannot assume it’s public in 2.7–3.3. (bugs.ruby-lang.org)

ruby_native_thread_p(void)

  • Ruby 2.7, 3.0, 3.1, 3.2, 3.3: Public/installed API (declared from the main public header set; documented as “Queries if the thread which calls this function is a ruby’s thread”). (docs.ruby-lang.org)
  • Header to include: #include <ruby.h> is sufficient (it brings in the public Ruby C API surface). (docs.ruby-lang.org)
  • Feature checks/macros:
    • Generally no extra feature check is required if you support Ruby 2.7+.
    • If you want to be extra defensive/portable across nonstandard builds, you can still have_func("ruby_native_thread_p") and guard on HAVE_RUBY_NATIVE_THREAD_P. (docs.ruby-lang.org)

Citations:


🏁 Script executed:

find . -name "extconf.rb" -type f

Repository: suketa/ruby-duckdb

Length of output: 85


🏁 Script executed:

cat -n ./ext/duckdb/extconf.rb

Repository: suketa/ruby-duckdb

Length of output: 2065


🏁 Script executed:

cat -n ./ext/duckdb/scalar_function.c | head -50

Repository: suketa/ruby-duckdb

Length of output: 2325


🏁 Script executed:

grep -n "ruby_thread_has_gvl_p\|ruby_native_thread_p" ./ext/duckdb/scalar_function.c

Repository: suketa/ruby-duckdb

Length of output: 225


🏁 Script executed:

sed -n '345,365p' ./ext/duckdb/scalar_function.c

Repository: suketa/ruby-duckdb

Length of output: 795


Add feature checks for ruby_thread_has_gvl_p in extconf.rb.

ruby_thread_has_gvl_p() is not a public C API in Ruby 3.2 (the minimum supported version); it only became public after Ruby 3.3. The current extern declaration will cause compilation or linking failures. Add have_func("ruby_thread_has_gvl_p") in extconf.rb and guard its use with a compile-time check (e.g., #ifdef HAVE_RUBY_THREAD_HAS_GVL_P). ruby_native_thread_p() is public and requires no changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ext/duckdb/scalar_function.c` around lines 8 - 9, Add a configure-time
feature check for ruby_thread_has_gvl_p: update extconf.rb to call
have_func("ruby_thread_has_gvl_p") and then wrap the extern declaration and any
uses of ruby_thread_has_gvl_p in scalar_function.c with `#ifdef`
HAVE_RUBY_THREAD_HAS_GVL_P / `#endif` so compilation/linking won't fail on Ruby
≤3.2; leave the extern and usage of ruby_native_thread_p unchanged.

Comment on lines +349 to +356
if (ruby_native_thread_p()) {
if (ruby_thread_has_gvl_p()) {
/* Case 1: Ruby thread with GVL - call directly */
execute_callback_protected((VALUE)&arg);
} else {
/* Case 2: Ruby thread without GVL - reacquire GVL */
rb_thread_call_with_gvl(callback_with_gvl, &arg);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Use the same protected error path for Case 1 callbacks.

Case 1 (Line 352) bypasses rb_protect, while Cases 2/3 capture exceptions and convert them to duckdb_scalar_function_set_error. This inconsistency can propagate Ruby exceptions through DuckDB callback frames.

💡 Suggested fix
     if (ruby_native_thread_p()) {
         if (ruby_thread_has_gvl_p()) {
-            /* Case 1: Ruby thread with GVL - call directly */
-            execute_callback_protected((VALUE)&arg);
+            /* Case 1: Ruby thread with GVL - protected execution */
+            callback_with_gvl(&arg);
         } else {
             /* Case 2: Ruby thread without GVL - reacquire GVL */
             rb_thread_call_with_gvl(callback_with_gvl, &arg);
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ext/duckdb/scalar_function.c` around lines 349 - 356, The current Case 1 path
calls execute_callback_protected((VALUE)&arg) directly and can let Ruby
exceptions escape; change Case 1 to use the same protected/error-conversion path
as Cases 2/3 by invoking rb_protect (or the existing execute_callback_protected
wrapper that uses rb_protect) so any exception is captured and then translated
to duckdb_scalar_function_set_error; locate the branch using
ruby_native_thread_p() / ruby_thread_has_gvl_p() and replace the direct call to
execute_callback_protected with the protected invocation (keeping the
rb_thread_call_with_gvl(callback_with_gvl, &arg) behavior for Case 2) so all
cases funnel through the exception-to-duckdb_scalar_function_set_error
conversion.

Comment on lines +122 to +125
def test_scalar_function_with_two_parameters # rubocop:disable Metrics/MethodLength, Metrics/AbcSize
skip 'Scalar functions with Ruby test' if Gem.win_platform?

@con.execute('SET threads=1')

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Remove the now-redundant Metrics/AbcSize disable.

Line 122 now triggers Lint/RedundantCopDisableDirective, so the file stays red until this annotation is dropped.

🧹 Proposed fix
-    def test_scalar_function_with_two_parameters # rubocop:disable Metrics/MethodLength, Metrics/AbcSize
+    def test_scalar_function_with_two_parameters # rubocop:disable Metrics/MethodLength
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def test_scalar_function_with_two_parameters # rubocop:disable Metrics/MethodLength, Metrics/AbcSize
skip 'Scalar functions with Ruby test' if Gem.win_platform?
@con.execute('SET threads=1')
def test_scalar_function_with_two_parameters # rubocop:disable Metrics/MethodLength
skip 'Scalar functions with Ruby test' if Gem.win_platform?
🧰 Tools
🪛 GitHub Actions: Linter

[error] 122-122: Lint/RedundantCopDisableDirective: Unnecessary disabling of Metrics/AbcSize.


[error] 125-125: Layout/EmptyLines: Extra blank line detected.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/duckdb_test/scalar_function_test.rb` around lines 122 - 125, Remove the
redundant RuboCop disable for Metrics/AbcSize in the method definition for
test_scalar_function_with_two_parameters: edit the line that currently reads
"def test_scalar_function_with_two_parameters # rubocop:disable
Metrics/MethodLength, Metrics/AbcSize" to only disable Metrics/MethodLength
(i.e., drop ", Metrics/AbcSize") so the Lint/RedundantCopDisableDirective is
resolved while keeping the necessary Metrics/MethodLength exemption.

Comment on lines +722 to +739
def test_scalar_function_with_multithread
skip 'Scalar functions with Ruby test' if Gem.win_platform?

# Use range() to generate enough rows to trigger DuckDB's multi-threaded execution
@con.execute('CREATE TABLE large_test AS SELECT range::INTEGER AS value FROM range(1000000)')

sf = DuckDB::ScalarFunction.new
sf.name = 'triple'
sf.add_parameter(DuckDB::LogicalType::INTEGER)
sf.return_type = DuckDB::LogicalType::BIGINT
sf.set_function { |v| v * 3 }

@con.register_scalar_function(sf)
result = @con.execute('SELECT SUM(triple(value)) FROM large_test')

# sum(0..999999) * 3 = 499999500000 * 3 = 1499998500000
assert_equal 1_499_998_500_000, result.first.first
end
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Show the test body, any thread-setting statements in this file, and the callback dispatch logic.
rg -n -A18 -B2 'def test_scalar_function_with_multithread' test/duckdb_test/scalar_function_test.rb
printf '\n--- any SET threads in scalar_function_test.rb ---\n'
rg -n 'SET threads' test/duckdb_test/scalar_function_test.rb
printf '\n--- callback dispatch in ext/duckdb/scalar_function.c ---\n'
rg -n -A20 -B5 'scalar_function_callback|ruby_native_thread_p|dispatch_callback_to_executor' ext/duckdb/scalar_function.c

Repository: suketa/ruby-duckdb

Length of output: 7052


Explicitly set threads > 1 in this multithread test to exercise the executor dispatch path.

The test currently does not set threads. Without this, DuckDB may execute callbacks in the Ruby thread (Cases 1–2 in the callback dispatch logic), bypassing the new executor dispatch (Case 3) introduced in this PR. This means the test can pass on single-threaded configs without actually covering the multi-threaded behavior it claims to test.

Proposed fix
    def test_scalar_function_with_multithread
      skip 'Scalar functions with Ruby test' if Gem.win_platform?

+      `@con.execute`('SET threads=4')
+
      # Use range() to generate enough rows to trigger DuckDB's multi-threaded execution
      `@con.execute`('CREATE TABLE large_test AS SELECT range::INTEGER AS value FROM range(1000000)')
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/duckdb_test/scalar_function_test.rb` around lines 722 - 739, The test
test_scalar_function_with_multithread currently doesn't force a multi-threaded
executor path; before creating the table and registering the scalar function,
explicitly enable multi-threading on the connection by setting threads > 1 (e.g.
call `@con.execute`('PRAGMA threads=2') or use the DB config API) so the callback
dispatch will run the multi-threaded case when `@con.execute`('CREATE TABLE
large_test ...') and `@con.register_scalar_function`(sf) are exercised; keep the
existing Gem.win_platform? skip and the rest of the test unchanged.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
test/duckdb_test/scalar_function_test.rb (1)

687-704: ⚠️ Potential issue | 🟠 Major

Force threads > 1 to actually exercise the multithread dispatch path.

At Line 690+, this test currently depends on environment defaults. In single-thread configs it can pass without covering the new executor-thread callback route.

💡 Suggested patch
 def test_scalar_function_with_multithread
   skip 'Scalar functions with Ruby test' if Gem.win_platform?
 
+  `@con.execute`('SET threads=4')
+
   # Use range() to generate enough rows to trigger DuckDB's multi-threaded execution
   `@con.execute`('CREATE TABLE large_test AS SELECT range::INTEGER AS value FROM range(1000000)')
#!/bin/bash
# Verify whether thread settings are explicitly controlled in tests/helpers,
# and inspect this test block directly.

rg -n -C2 "SET\\s+threads|PRAGMA\\s+threads|current_setting\\('threads'\\)"
rg -n -A25 -B3 "def test_scalar_function_with_multithread" test/duckdb_test/scalar_function_test.rb
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/duckdb_test/scalar_function_test.rb` around lines 687 - 704, The test
test_scalar_function_with_multithread relies on environment defaults and may not
exercise the multithread path; explicitly set DuckDB threads > 1 at the start of
the test (for example run a PRAGMA/SET like "PRAGMA threads=4" via `@con.execute`)
before creating the large_test table and registering the scalar function, and
optionally restore the previous thread setting at the end of the test to avoid
affecting other tests; update the test body around the `@con.execute` calls
(test_scalar_function_with_multithread, creation of large_test, and registration
of sf) to enforce the multi-thread configuration.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@test/duckdb_test/scalar_function_test.rb`:
- Around line 687-704: The test test_scalar_function_with_multithread relies on
environment defaults and may not exercise the multithread path; explicitly set
DuckDB threads > 1 at the start of the test (for example run a PRAGMA/SET like
"PRAGMA threads=4" via `@con.execute`) before creating the large_test table and
registering the scalar function, and optionally restore the previous thread
setting at the end of the test to avoid affecting other tests; update the test
body around the `@con.execute` calls (test_scalar_function_with_multithread,
creation of large_test, and registration of sf) to enforce the multi-thread
configuration.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ebcb8809-717d-4fb1-bb6e-fac60b547a10

📥 Commits

Reviewing files that changed from the base of the PR and between 52975ec and d060f36.

📒 Files selected for processing (1)
  • test/duckdb_test/scalar_function_test.rb

MSVC does not provide <pthread.h>. Use CRITICAL_SECTION and
CONDITION_VARIABLE (Windows API) on MSVC, keeping pthread on
POSIX and MinGW-w64 (mingw, ucrt) which provide winpthreads.

Conditional on _MSC_VER to minimize the scope of changes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
ext/duckdb/scalar_function.c (2)

21-22: ⚠️ Potential issue | 🟠 Major

Guard ruby_thread_has_gvl_p behind configure-time feature detection.

ruby_thread_has_gvl_p() is not guaranteed on all supported Ruby builds; unconditional declaration/usage can break compile or link. Please gate declaration and call sites with HAVE_RUBY_THREAD_HAS_GVL_P generated from have_func("ruby_thread_has_gvl_p") in extconf.rb.

#!/bin/bash
# Verify feature detection + guards for ruby_thread_has_gvl_p
rg -n --type=ruby 'have_func\("ruby_thread_has_gvl_p"\)' -C2
rg -n --type=c 'HAVE_RUBY_THREAD_HAS_GVL_P|ruby_thread_has_gvl_p\(' -C2
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ext/duckdb/scalar_function.c` around lines 21 - 22, The declaration and any
uses of ruby_thread_has_gvl_p must be guarded by a configure-time macro; update
extconf.rb to call have_func("ruby_thread_has_gvl_p") so it defines
HAVE_RUBY_THREAD_HAS_GVL_P, then wrap the extern declaration and all call sites
of ruby_thread_has_gvl_p in `#ifdef` HAVE_RUBY_THREAD_HAS_GVL_P / `#endif` blocks
(or provide a safe fallback when the macro is not defined). Specifically,
protect the extern int ruby_thread_has_gvl_p(void); line and any places that
call ruby_thread_has_gvl_p() so compilation/linking won’t fail on Rubies that
lack that symbol.

429-433: ⚠️ Potential issue | 🟠 Major

Route Case 1 through the same protected error path as Case 2/3.

Case 1 directly calls execute_callback_protected, so Ruby exceptions can bypass duckdb_scalar_function_set_error conversion. Use the same protected wrapper path for consistency.

💡 Suggested fix
     if (ruby_native_thread_p()) {
         if (ruby_thread_has_gvl_p()) {
-            /* Case 1: Ruby thread with GVL - call directly */
-            execute_callback_protected((VALUE)&arg);
+            /* Case 1: Ruby thread with GVL - protected execution */
+            callback_with_gvl(&arg);
         } else {
             /* Case 2: Ruby thread without GVL - reacquire GVL */
             rb_thread_call_with_gvl(callback_with_gvl, &arg);
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ext/duckdb/scalar_function.c` around lines 429 - 433, The current Case 1 path
directly calls execute_callback_protected((VALUE)&arg), allowing Ruby exceptions
to bypass duckdb_scalar_function_set_error; change Case 1 to use the same
protected wrapper used by Case 2/3 (i.e., invoke execute_callback_protected via
rb_protect/its existing wrapper and handle the VALUE/exception result the same
way the other branches do), so any Ruby exception is caught and converted by
duckdb_scalar_function_set_error; update the branch around
ruby_native_thread_p() / ruby_thread_has_gvl_p() to call the protected wrapper
and follow the same post-call exception handling logic as the other cases
(reference execute_callback_protected and duckdb_scalar_function_set_error).
🧹 Nitpick comments (1)
ext/duckdb/scalar_function.c (1)

109-303: Align newly added C helper symbols with the rbduckdb_ prefix guideline.

The newly introduced helper symbols (executor_wait_func, executor_thread_func, dispatch_callback_to_executor, callback_with_gvl, callback_request, g_request_list, etc.) are unprefixed. Renaming to rbduckdb_* would keep this file aligned with repository naming rules.

As per coding guidelines: ext/duckdb/**/*.c: C symbols must be prefixed with rbduckdb_ to avoid namespace conflicts.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ext/duckdb/scalar_function.c` around lines 109 - 303, The new
executor-related C symbols must be renamed to the rbduckdb_ prefix to follow
project namespace rules: rename functions executor_wait_func,
executor_stop_func, execute_callback_protected, executor_thread_func,
ensure_executor_started, dispatch_callback_to_executor, callback_with_gvl to
rbduckdb_wait_func, rbduckdb_stop_func, rbduckdb_execute_callback_protected,
rbduckdb_thread_func, rbduckdb_ensure_started, rbduckdb_dispatch_callback,
rbduckdb_callback_with_gvl respectively; rename types/vars callback_request,
callback_arg, executor_wait_data, g_request_list, g_executor_thread,
g_executor_started, g_executor_mutex/cond/lock/cond names to use rbduckdb_
prefix and update every use site in this file (including rb_thread_create,
rb_thread_call_without_gvl, rb_protect, rb_errinfo handling, and any
enqueue/dequeue and signaling code) so all declarations and references remain
consistent. Ensure header declarations (if any) are updated to the new names and
that platform-specific code paths (MSC_VER vs pthread) reference the renamed
symbols.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@ext/duckdb/scalar_function.c`:
- Around line 21-22: The declaration and any uses of ruby_thread_has_gvl_p must
be guarded by a configure-time macro; update extconf.rb to call
have_func("ruby_thread_has_gvl_p") so it defines HAVE_RUBY_THREAD_HAS_GVL_P,
then wrap the extern declaration and all call sites of ruby_thread_has_gvl_p in
`#ifdef` HAVE_RUBY_THREAD_HAS_GVL_P / `#endif` blocks (or provide a safe fallback
when the macro is not defined). Specifically, protect the extern int
ruby_thread_has_gvl_p(void); line and any places that call
ruby_thread_has_gvl_p() so compilation/linking won’t fail on Rubies that lack
that symbol.
- Around line 429-433: The current Case 1 path directly calls
execute_callback_protected((VALUE)&arg), allowing Ruby exceptions to bypass
duckdb_scalar_function_set_error; change Case 1 to use the same protected
wrapper used by Case 2/3 (i.e., invoke execute_callback_protected via
rb_protect/its existing wrapper and handle the VALUE/exception result the same
way the other branches do), so any Ruby exception is caught and converted by
duckdb_scalar_function_set_error; update the branch around
ruby_native_thread_p() / ruby_thread_has_gvl_p() to call the protected wrapper
and follow the same post-call exception handling logic as the other cases
(reference execute_callback_protected and duckdb_scalar_function_set_error).

---

Nitpick comments:
In `@ext/duckdb/scalar_function.c`:
- Around line 109-303: The new executor-related C symbols must be renamed to the
rbduckdb_ prefix to follow project namespace rules: rename functions
executor_wait_func, executor_stop_func, execute_callback_protected,
executor_thread_func, ensure_executor_started, dispatch_callback_to_executor,
callback_with_gvl to rbduckdb_wait_func, rbduckdb_stop_func,
rbduckdb_execute_callback_protected, rbduckdb_thread_func,
rbduckdb_ensure_started, rbduckdb_dispatch_callback, rbduckdb_callback_with_gvl
respectively; rename types/vars callback_request, callback_arg,
executor_wait_data, g_request_list, g_executor_thread, g_executor_started,
g_executor_mutex/cond/lock/cond names to use rbduckdb_ prefix and update every
use site in this file (including rb_thread_create, rb_thread_call_without_gvl,
rb_protect, rb_errinfo handling, and any enqueue/dequeue and signaling code) so
all declarations and references remain consistent. Ensure header declarations
(if any) are updated to the new names and that platform-specific code paths
(MSC_VER vs pthread) reference the renamed symbols.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 371e8706-18bc-4e6b-a19e-1d4c3d67c5cb

📥 Commits

Reviewing files that changed from the base of the PR and between d060f36 and 37d3d26.

📒 Files selected for processing (1)
  • ext/duckdb/scalar_function.c

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

make DuckDB::ScalarFunction work in multi thread duckdb.

1 participant