Enable multi-threaded execution for ScalarFunction#1140
Enable multi-threaded execution for ScalarFunction#1140tanakakc wants to merge 3 commits intosuketa:mainfrom
Conversation
ScalarFunction previously required `SET threads=1` because DuckDB worker threads called Ruby callbacks without holding the GVL. This implements an executor thread pattern (inspired by FFI's async callback dispatcher) with 3-way dispatch: - Case 1: Ruby thread with GVL - direct execution - Case 2: Ruby thread without GVL - reacquire via rb_thread_call_with_gvl - Case 3: Non-Ruby thread (DuckDB worker) - dispatch to global executor thread via pthread mutex/condvar Also releases GVL during duckdb_query() in connection.c to prevent deadlock with the executor thread. Closes suketa#1135 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
📝 WalkthroughWalkthroughAdds a GVL-free query execution path and a thread-safe dispatch for Ruby-backed scalar-function callbacks, removing the prior requirement to set DuckDB Changes
Sequence DiagramsequenceDiagram
participant RubyThread as Ruby Thread
participant DuckDBWorker as DuckDB Worker
participant Executor as Global Executor
participant RubyVM as Ruby VM
alt Ruby thread holds GVL
RubyThread->>RubyVM: run callback directly
RubyVM-->>RubyThread: return value
RubyThread-->>DuckDBWorker: deliver result
else Ruby thread without GVL
DuckDBWorker->>RubyThread: request callback
RubyThread->>RubyVM: rb_thread_call_with_gvl (acquire GVL)
RubyVM-->>RubyThread: return value
RubyThread-->>DuckDBWorker: deliver result
else Non-Ruby worker thread
DuckDBWorker->>Executor: enqueue callback_request
DuckDBWorker->>DuckDBWorker: wait on condvar for completion
Executor->>RubyVM: acquire GVL, run callback
RubyVM-->>Executor: return value (or exception)
Executor->>DuckDBWorker: signal completion with result/error
DuckDBWorker->>DuckDBWorker: resume with result
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Remove extra blank lines and unnecessary Metrics/AbcSize disable directive. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 7
🧹 Nitpick comments (2)
ext/duckdb/scalar_function.c (1)
72-223: Prefix newly added executor symbols withrbduckdb_.New globals/helpers (
g_executor_*,executor_*,dispatch_callback_to_executor,callback_with_gvl) don’t follow the project’s C symbol prefix rule.As per coding guidelines,
ext/duckdb/**/*.c: C symbols must be prefixed withrbduckdb_to avoid namespace conflicts.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ext/duckdb/scalar_function.c` around lines 72 - 223, The new executor globals and functions must be renamed to use the project prefix; rename all occurrences of g_executor_mutex, g_executor_cond, g_request_list, g_executor_thread, g_executor_started to rbduckdb_g_executor_mutex, rbduckdb_g_executor_cond, rbduckdb_g_request_list, rbduckdb_g_executor_thread, rbduckdb_g_executor_started and rename struct executor_wait_data, executor_wait_func, executor_stop_func, execute_callback_protected, executor_thread_func, ensure_executor_started, dispatch_callback_to_executor, and callback_with_gvl to rbduckdb_executor_wait_data (or rbduckdb_executor_wait_* for struct), rbduckdb_executor_wait_func, rbduckdb_executor_stop_func, rbduckdb_execute_callback_protected, rbduckdb_executor_thread_func, rbduckdb_ensure_executor_started, rbduckdb_dispatch_callback_to_executor, and rbduckdb_callback_with_gvl respectively; update every reference in this file to the new names (including usages inside rb_thread_create, rb_thread_call_without_gvl, pthread calls, and any forward declarations) and keep the same static linkage and behavior. Ensure mutex/cond initializers and rb_global_variable use the new symbol names so no other code changes are required.ext/duckdb/connection.c (1)
150-166: Rename new C symbols to follow extension namespacing rules.
query_nogvl_args/duckdb_query_nogvlshould be prefixed withrbduckdb_to avoid symbol collisions and match project conventions.As per coding guidelines,
ext/duckdb/**/*.c: C symbols must be prefixed withrbduckdb_to avoid namespace conflicts.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ext/duckdb/connection.c` around lines 150 - 166, The new C symbols violate extension namespacing: rename struct query_nogvl_args to rbduckdb_query_nogvl_args and function duckdb_query_nogvl to rbduckdb_query_nogvl, and update all uses/declares accordingly (e.g., cast sites, initializations, and any thread/GVL call sites that reference these symbols) so the struct and function are consistently prefixed with rbduckdb_ to follow ext/duckdb naming conventions and avoid collisions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ext/duckdb/connection.c`:
- Around line 182-195: The code currently passes a Ruby string buffer pointer
(from StringValueCStr into sql) into rb_thread_call_without_gvl which can lead
to the Ruby string being mutated concurrently; fix by making a C-owned copy of
the SQL bytes before releasing the GVL: allocate a buffer (e.g., via
xmalloc/strdup) sized from RSTRING_LEN(str), copy the contents, set args.sql to
that copied buffer, call rb_thread_call_without_gvl(duckdb_query_nogvl, &args,
...), then free the copied buffer after the call; keep RB_GC_GUARD(str) as
before and ensure retval/out_result usage in struct query_nogvl_args remains
unchanged.
- Around line 185-193: The code passes args.con (ctx->con) into
duckdb_query_nogvl and then drops the GVL via rb_thread_call_without_gvl,
allowing another Ruby thread to call disconnect and invalidate the handle; fix
by acquiring a connection lifetime guard (e.g. increment a reference count or
take the connection mutex) before filling struct query_nogvl_args.args.con and
calling rb_thread_call_without_gvl(duckdb_query_nogvl,...), and release that
guard immediately after the call returns; also ensure disconnect checks/releases
the same guard so it waits or defers actual teardown until the guard is
released.
In `@ext/duckdb/scalar_function.c`:
- Around line 349-356: The current Case 1 path calls
execute_callback_protected((VALUE)&arg) directly and can let Ruby exceptions
escape; change Case 1 to use the same protected/error-conversion path as Cases
2/3 by invoking rb_protect (or the existing execute_callback_protected wrapper
that uses rb_protect) so any exception is captured and then translated to
duckdb_scalar_function_set_error; locate the branch using ruby_native_thread_p()
/ ruby_thread_has_gvl_p() and replace the direct call to
execute_callback_protected with the protected invocation (keeping the
rb_thread_call_with_gvl(callback_with_gvl, &arg) behavior for Case 2) so all
cases funnel through the exception-to-duckdb_scalar_function_set_error
conversion.
- Around line 8-9: Add a configure-time feature check for ruby_thread_has_gvl_p:
update extconf.rb to call have_func("ruby_thread_has_gvl_p") and then wrap the
extern declaration and any uses of ruby_thread_has_gvl_p in scalar_function.c
with `#ifdef` HAVE_RUBY_THREAD_HAS_GVL_P / `#endif` so compilation/linking won't
fail on Ruby ≤3.2; leave the extern and usage of ruby_native_thread_p unchanged.
In `@test/duckdb_test/scalar_function_test.rb`:
- Around line 122-125: Remove the redundant RuboCop disable for Metrics/AbcSize
in the method definition for test_scalar_function_with_two_parameters: edit the
line that currently reads "def test_scalar_function_with_two_parameters #
rubocop:disable Metrics/MethodLength, Metrics/AbcSize" to only disable
Metrics/MethodLength (i.e., drop ", Metrics/AbcSize") so the
Lint/RedundantCopDisableDirective is resolved while keeping the necessary
Metrics/MethodLength exemption.
- Line 106: There are leftover unnecessary blank lines in
test/duckdb_test/scalar_function_test.rb (they originated when per-test "SET
threads=1" lines were removed) causing Layout/EmptyLines linter failures; remove
the empty lines at the specified locations (lines around 106, 125, 145, 164,
183, 202, 221, 240, 259, 281, 303, 325, 351, 374, 397, 421, 444, 467, 490-491,
521-522, 553, 583-584, 604-605, 625-626, 669-670, 686-687, 703-704) so there are
no stray blank-only lines between tests or at file end, and run the linter to
confirm Layout/EmptyLines is satisfied.
- Around line 722-739: The test test_scalar_function_with_multithread currently
doesn't force a multi-threaded executor path; before creating the table and
registering the scalar function, explicitly enable multi-threading on the
connection by setting threads > 1 (e.g. call `@con.execute`('PRAGMA threads=2') or
use the DB config API) so the callback dispatch will run the multi-threaded case
when `@con.execute`('CREATE TABLE large_test ...') and
`@con.register_scalar_function`(sf) are exercised; keep the existing
Gem.win_platform? skip and the rest of the test unchanged.
---
Nitpick comments:
In `@ext/duckdb/connection.c`:
- Around line 150-166: The new C symbols violate extension namespacing: rename
struct query_nogvl_args to rbduckdb_query_nogvl_args and function
duckdb_query_nogvl to rbduckdb_query_nogvl, and update all uses/declares
accordingly (e.g., cast sites, initializations, and any thread/GVL call sites
that reference these symbols) so the struct and function are consistently
prefixed with rbduckdb_ to follow ext/duckdb naming conventions and avoid
collisions.
In `@ext/duckdb/scalar_function.c`:
- Around line 72-223: The new executor globals and functions must be renamed to
use the project prefix; rename all occurrences of g_executor_mutex,
g_executor_cond, g_request_list, g_executor_thread, g_executor_started to
rbduckdb_g_executor_mutex, rbduckdb_g_executor_cond, rbduckdb_g_request_list,
rbduckdb_g_executor_thread, rbduckdb_g_executor_started and rename struct
executor_wait_data, executor_wait_func, executor_stop_func,
execute_callback_protected, executor_thread_func, ensure_executor_started,
dispatch_callback_to_executor, and callback_with_gvl to
rbduckdb_executor_wait_data (or rbduckdb_executor_wait_* for struct),
rbduckdb_executor_wait_func, rbduckdb_executor_stop_func,
rbduckdb_execute_callback_protected, rbduckdb_executor_thread_func,
rbduckdb_ensure_executor_started, rbduckdb_dispatch_callback_to_executor, and
rbduckdb_callback_with_gvl respectively; update every reference in this file to
the new names (including usages inside rb_thread_create,
rb_thread_call_without_gvl, pthread calls, and any forward declarations) and
keep the same static linkage and behavior. Ensure mutex/cond initializers and
rb_global_variable use the new symbol names so no other code changes are
required.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3920366a-3df8-4f5d-a450-672cff5c1a78
📒 Files selected for processing (7)
ext/duckdb/connection.cext/duckdb/scalar_function.clib/duckdb/connection.rblib/duckdb/scalar_function.rbtest/duckdb_test/connection_test.rbtest/duckdb_test/gc_stress_test.rbtest/duckdb_test/scalar_function_test.rb
💤 Files with no reviewable changes (3)
- lib/duckdb/connection.rb
- lib/duckdb/scalar_function.rb
- test/duckdb_test/connection_test.rb
| /* Extract C string before releasing GVL (StringValueCStr is a Ruby operation) */ | ||
| const char *sql = StringValueCStr(str); | ||
|
|
||
| struct query_nogvl_args args = { | ||
| .con = ctx->con, | ||
| .sql = sql, | ||
| .out_result = &(ctxr->result), | ||
| .retval = DuckDBError, | ||
| }; | ||
|
|
||
| rb_thread_call_without_gvl(duckdb_query_nogvl, &args, RUBY_UBF_IO, 0); | ||
|
|
||
| RB_GC_GUARD(str); | ||
|
|
There was a problem hiding this comment.
Copy SQL into C-owned memory before releasing the GVL.
Line 183 stores a Ruby String buffer pointer and then releases the GVL; another Ruby thread can mutate that String while duckdb_query is reading it.
💡 Suggested fix
- const char *sql = StringValueCStr(str);
+ VALUE sql_str = rb_obj_as_string(str);
+ long sql_len = RSTRING_LEN(sql_str);
+ char *sql_buf = ALLOC_N(char, (size_t)sql_len + 1);
+ memcpy(sql_buf, RSTRING_PTR(sql_str), (size_t)sql_len);
+ sql_buf[sql_len] = '\0';
struct query_nogvl_args args = {
.con = ctx->con,
- .sql = sql,
+ .sql = sql_buf,
.out_result = &(ctxr->result),
.retval = DuckDBError,
};
rb_thread_call_without_gvl(duckdb_query_nogvl, &args, RUBY_UBF_IO, 0);
-
- RB_GC_GUARD(str);
+ xfree(sql_buf);
+ RB_GC_GUARD(sql_str);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ext/duckdb/connection.c` around lines 182 - 195, The code currently passes a
Ruby string buffer pointer (from StringValueCStr into sql) into
rb_thread_call_without_gvl which can lead to the Ruby string being mutated
concurrently; fix by making a C-owned copy of the SQL bytes before releasing the
GVL: allocate a buffer (e.g., via xmalloc/strdup) sized from RSTRING_LEN(str),
copy the contents, set args.sql to that copied buffer, call
rb_thread_call_without_gvl(duckdb_query_nogvl, &args, ...), then free the copied
buffer after the call; keep RB_GC_GUARD(str) as before and ensure
retval/out_result usage in struct query_nogvl_args remains unchanged.
| struct query_nogvl_args args = { | ||
| .con = ctx->con, | ||
| .sql = sql, | ||
| .out_result = &(ctxr->result), | ||
| .retval = DuckDBError, | ||
| }; | ||
|
|
||
| rb_thread_call_without_gvl(duckdb_query_nogvl, &args, RUBY_UBF_IO, 0); | ||
|
|
There was a problem hiding this comment.
Guard connection lifetime while query runs without GVL.
args.con is used outside the GVL with no in-flight query guard. Another Ruby thread can call disconnect concurrently and invalidate the same handle during duckdb_query.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ext/duckdb/connection.c` around lines 185 - 193, The code passes args.con
(ctx->con) into duckdb_query_nogvl and then drops the GVL via
rb_thread_call_without_gvl, allowing another Ruby thread to call disconnect and
invalidate the handle; fix by acquiring a connection lifetime guard (e.g.
increment a reference count or take the connection mutex) before filling struct
query_nogvl_args.args.con and calling
rb_thread_call_without_gvl(duckdb_query_nogvl,...), and release that guard
immediately after the call returns; also ensure disconnect checks/releases the
same guard so it waits or defers actual teardown until the guard is released.
| extern int ruby_thread_has_gvl_p(void); | ||
| extern int ruby_native_thread_p(void); |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
Are ruby_thread_has_gvl_p(void)andruby_native_thread_p(void) public/exported Ruby C APIs across Ruby 2.7, 3.0, 3.1, 3.2, and 3.3? Which headers should be included, and are feature checks/macros required before use?
💡 Result:
ruby_thread_has_gvl_p(void)
- Ruby 2.7, 3.0, 3.1, 3.2, 3.3: Not a public/installed C API (i.e., not declared by the public headers you’re supposed to include). It exists in the implementation, but its declaration lived in an internal header (
internal/thread.h). (docs.ruby-lang.org) - Header: none public in those versions; do not rely on internal headers (
internal/thread.his not part of the supported extension API). (docs.ruby-lang.org) - Feature checks/macros: if you choose to use it anyway, you must treat it as optional:
- Use
mkmfhave_func("ruby_thread_has_gvl_p")and compile-time-guard on the generatedHAVE_RUBY_THREAD_HAS_GVL_P(or whatever macro your extconf emits). - Otherwise you risk build failures (missing prototype / missing symbol depending on platform/build). (bugs.ruby-lang.org)
- Use
- Status change: Ruby core discussion/changes to expose it as public happened after 3.3 (Feature
#20877/#20750discussion), so you cannot assume it’s public in 2.7–3.3. (bugs.ruby-lang.org)
ruby_native_thread_p(void)
- Ruby 2.7, 3.0, 3.1, 3.2, 3.3: Public/installed API (declared from the main public header set; documented as “Queries if the thread which calls this function is a ruby’s thread”). (docs.ruby-lang.org)
- Header to include:
#include <ruby.h>is sufficient (it brings in the public Ruby C API surface). (docs.ruby-lang.org) - Feature checks/macros:
- Generally no extra feature check is required if you support Ruby 2.7+.
- If you want to be extra defensive/portable across nonstandard builds, you can still
have_func("ruby_native_thread_p")and guard onHAVE_RUBY_NATIVE_THREAD_P. (docs.ruby-lang.org)
Citations:
- 1: https://docs.ruby-lang.org/capi/en/master/d8/dfd/internal_2thread_8h.html
- 2: https://docs.ruby-lang.org/capi/en/master/d8/dfd/internal_2thread_8h.html
- 3: https://bugs.ruby-lang.org/issues/20750
- 4: https://bugs.ruby-lang.org/issues/20877
- 5: https://docs.ruby-lang.org/capi/en/master/d5/d88/default_8c_source.html?utm_source=openai
- 6: https://docs.ruby-lang.org/capi/en/master/d5/d88/default_8c_source.html?utm_source=openai
- 7: https://docs.ruby-lang.org/capi/en/master/d5/d88/default_8c_source.html?utm_source=openai
🏁 Script executed:
find . -name "extconf.rb" -type fRepository: suketa/ruby-duckdb
Length of output: 85
🏁 Script executed:
cat -n ./ext/duckdb/extconf.rbRepository: suketa/ruby-duckdb
Length of output: 2065
🏁 Script executed:
cat -n ./ext/duckdb/scalar_function.c | head -50Repository: suketa/ruby-duckdb
Length of output: 2325
🏁 Script executed:
grep -n "ruby_thread_has_gvl_p\|ruby_native_thread_p" ./ext/duckdb/scalar_function.cRepository: suketa/ruby-duckdb
Length of output: 225
🏁 Script executed:
sed -n '345,365p' ./ext/duckdb/scalar_function.cRepository: suketa/ruby-duckdb
Length of output: 795
Add feature checks for ruby_thread_has_gvl_p in extconf.rb.
ruby_thread_has_gvl_p() is not a public C API in Ruby 3.2 (the minimum supported version); it only became public after Ruby 3.3. The current extern declaration will cause compilation or linking failures. Add have_func("ruby_thread_has_gvl_p") in extconf.rb and guard its use with a compile-time check (e.g., #ifdef HAVE_RUBY_THREAD_HAS_GVL_P). ruby_native_thread_p() is public and requires no changes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ext/duckdb/scalar_function.c` around lines 8 - 9, Add a configure-time
feature check for ruby_thread_has_gvl_p: update extconf.rb to call
have_func("ruby_thread_has_gvl_p") and then wrap the extern declaration and any
uses of ruby_thread_has_gvl_p in scalar_function.c with `#ifdef`
HAVE_RUBY_THREAD_HAS_GVL_P / `#endif` so compilation/linking won't fail on Ruby
≤3.2; leave the extern and usage of ruby_native_thread_p unchanged.
| if (ruby_native_thread_p()) { | ||
| if (ruby_thread_has_gvl_p()) { | ||
| /* Case 1: Ruby thread with GVL - call directly */ | ||
| execute_callback_protected((VALUE)&arg); | ||
| } else { | ||
| /* Case 2: Ruby thread without GVL - reacquire GVL */ | ||
| rb_thread_call_with_gvl(callback_with_gvl, &arg); | ||
| } |
There was a problem hiding this comment.
Use the same protected error path for Case 1 callbacks.
Case 1 (Line 352) bypasses rb_protect, while Cases 2/3 capture exceptions and convert them to duckdb_scalar_function_set_error. This inconsistency can propagate Ruby exceptions through DuckDB callback frames.
💡 Suggested fix
if (ruby_native_thread_p()) {
if (ruby_thread_has_gvl_p()) {
- /* Case 1: Ruby thread with GVL - call directly */
- execute_callback_protected((VALUE)&arg);
+ /* Case 1: Ruby thread with GVL - protected execution */
+ callback_with_gvl(&arg);
} else {
/* Case 2: Ruby thread without GVL - reacquire GVL */
rb_thread_call_with_gvl(callback_with_gvl, &arg);
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ext/duckdb/scalar_function.c` around lines 349 - 356, The current Case 1 path
calls execute_callback_protected((VALUE)&arg) directly and can let Ruby
exceptions escape; change Case 1 to use the same protected/error-conversion path
as Cases 2/3 by invoking rb_protect (or the existing execute_callback_protected
wrapper that uses rb_protect) so any exception is captured and then translated
to duckdb_scalar_function_set_error; locate the branch using
ruby_native_thread_p() / ruby_thread_has_gvl_p() and replace the direct call to
execute_callback_protected with the protected invocation (keeping the
rb_thread_call_with_gvl(callback_with_gvl, &arg) behavior for Case 2) so all
cases funnel through the exception-to-duckdb_scalar_function_set_error
conversion.
| def test_scalar_function_with_two_parameters # rubocop:disable Metrics/MethodLength, Metrics/AbcSize | ||
| skip 'Scalar functions with Ruby test' if Gem.win_platform? | ||
|
|
||
| @con.execute('SET threads=1') | ||
|
|
There was a problem hiding this comment.
Remove the now-redundant Metrics/AbcSize disable.
Line 122 now triggers Lint/RedundantCopDisableDirective, so the file stays red until this annotation is dropped.
🧹 Proposed fix
- def test_scalar_function_with_two_parameters # rubocop:disable Metrics/MethodLength, Metrics/AbcSize
+ def test_scalar_function_with_two_parameters # rubocop:disable Metrics/MethodLength📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def test_scalar_function_with_two_parameters # rubocop:disable Metrics/MethodLength, Metrics/AbcSize | |
| skip 'Scalar functions with Ruby test' if Gem.win_platform? | |
| @con.execute('SET threads=1') | |
| def test_scalar_function_with_two_parameters # rubocop:disable Metrics/MethodLength | |
| skip 'Scalar functions with Ruby test' if Gem.win_platform? |
🧰 Tools
🪛 GitHub Actions: Linter
[error] 122-122: Lint/RedundantCopDisableDirective: Unnecessary disabling of Metrics/AbcSize.
[error] 125-125: Layout/EmptyLines: Extra blank line detected.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/duckdb_test/scalar_function_test.rb` around lines 122 - 125, Remove the
redundant RuboCop disable for Metrics/AbcSize in the method definition for
test_scalar_function_with_two_parameters: edit the line that currently reads
"def test_scalar_function_with_two_parameters # rubocop:disable
Metrics/MethodLength, Metrics/AbcSize" to only disable Metrics/MethodLength
(i.e., drop ", Metrics/AbcSize") so the Lint/RedundantCopDisableDirective is
resolved while keeping the necessary Metrics/MethodLength exemption.
| def test_scalar_function_with_multithread | ||
| skip 'Scalar functions with Ruby test' if Gem.win_platform? | ||
|
|
||
| # Use range() to generate enough rows to trigger DuckDB's multi-threaded execution | ||
| @con.execute('CREATE TABLE large_test AS SELECT range::INTEGER AS value FROM range(1000000)') | ||
|
|
||
| sf = DuckDB::ScalarFunction.new | ||
| sf.name = 'triple' | ||
| sf.add_parameter(DuckDB::LogicalType::INTEGER) | ||
| sf.return_type = DuckDB::LogicalType::BIGINT | ||
| sf.set_function { |v| v * 3 } | ||
|
|
||
| @con.register_scalar_function(sf) | ||
| result = @con.execute('SELECT SUM(triple(value)) FROM large_test') | ||
|
|
||
| # sum(0..999999) * 3 = 499999500000 * 3 = 1499998500000 | ||
| assert_equal 1_499_998_500_000, result.first.first | ||
| end |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Show the test body, any thread-setting statements in this file, and the callback dispatch logic.
rg -n -A18 -B2 'def test_scalar_function_with_multithread' test/duckdb_test/scalar_function_test.rb
printf '\n--- any SET threads in scalar_function_test.rb ---\n'
rg -n 'SET threads' test/duckdb_test/scalar_function_test.rb
printf '\n--- callback dispatch in ext/duckdb/scalar_function.c ---\n'
rg -n -A20 -B5 'scalar_function_callback|ruby_native_thread_p|dispatch_callback_to_executor' ext/duckdb/scalar_function.cRepository: suketa/ruby-duckdb
Length of output: 7052
Explicitly set threads > 1 in this multithread test to exercise the executor dispatch path.
The test currently does not set threads. Without this, DuckDB may execute callbacks in the Ruby thread (Cases 1–2 in the callback dispatch logic), bypassing the new executor dispatch (Case 3) introduced in this PR. This means the test can pass on single-threaded configs without actually covering the multi-threaded behavior it claims to test.
Proposed fix
def test_scalar_function_with_multithread
skip 'Scalar functions with Ruby test' if Gem.win_platform?
+ `@con.execute`('SET threads=4')
+
# Use range() to generate enough rows to trigger DuckDB's multi-threaded execution
`@con.execute`('CREATE TABLE large_test AS SELECT range::INTEGER AS value FROM range(1000000)')🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/duckdb_test/scalar_function_test.rb` around lines 722 - 739, The test
test_scalar_function_with_multithread currently doesn't force a multi-threaded
executor path; before creating the table and registering the scalar function,
explicitly enable multi-threading on the connection by setting threads > 1 (e.g.
call `@con.execute`('PRAGMA threads=2') or use the DB config API) so the callback
dispatch will run the multi-threaded case when `@con.execute`('CREATE TABLE
large_test ...') and `@con.register_scalar_function`(sf) are exercised; keep the
existing Gem.win_platform? skip and the rest of the test unchanged.
There was a problem hiding this comment.
♻️ Duplicate comments (1)
test/duckdb_test/scalar_function_test.rb (1)
687-704:⚠️ Potential issue | 🟠 MajorForce
threads > 1to actually exercise the multithread dispatch path.At Line 690+, this test currently depends on environment defaults. In single-thread configs it can pass without covering the new executor-thread callback route.
💡 Suggested patch
def test_scalar_function_with_multithread skip 'Scalar functions with Ruby test' if Gem.win_platform? + `@con.execute`('SET threads=4') + # Use range() to generate enough rows to trigger DuckDB's multi-threaded execution `@con.execute`('CREATE TABLE large_test AS SELECT range::INTEGER AS value FROM range(1000000)')#!/bin/bash # Verify whether thread settings are explicitly controlled in tests/helpers, # and inspect this test block directly. rg -n -C2 "SET\\s+threads|PRAGMA\\s+threads|current_setting\\('threads'\\)" rg -n -A25 -B3 "def test_scalar_function_with_multithread" test/duckdb_test/scalar_function_test.rb🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/duckdb_test/scalar_function_test.rb` around lines 687 - 704, The test test_scalar_function_with_multithread relies on environment defaults and may not exercise the multithread path; explicitly set DuckDB threads > 1 at the start of the test (for example run a PRAGMA/SET like "PRAGMA threads=4" via `@con.execute`) before creating the large_test table and registering the scalar function, and optionally restore the previous thread setting at the end of the test to avoid affecting other tests; update the test body around the `@con.execute` calls (test_scalar_function_with_multithread, creation of large_test, and registration of sf) to enforce the multi-thread configuration.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@test/duckdb_test/scalar_function_test.rb`:
- Around line 687-704: The test test_scalar_function_with_multithread relies on
environment defaults and may not exercise the multithread path; explicitly set
DuckDB threads > 1 at the start of the test (for example run a PRAGMA/SET like
"PRAGMA threads=4" via `@con.execute`) before creating the large_test table and
registering the scalar function, and optionally restore the previous thread
setting at the end of the test to avoid affecting other tests; update the test
body around the `@con.execute` calls (test_scalar_function_with_multithread,
creation of large_test, and registration of sf) to enforce the multi-thread
configuration.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ebcb8809-717d-4fb1-bb6e-fac60b547a10
📒 Files selected for processing (1)
test/duckdb_test/scalar_function_test.rb
MSVC does not provide <pthread.h>. Use CRITICAL_SECTION and CONDITION_VARIABLE (Windows API) on MSVC, keeping pthread on POSIX and MinGW-w64 (mingw, ucrt) which provide winpthreads. Conditional on _MSC_VER to minimize the scope of changes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
♻️ Duplicate comments (2)
ext/duckdb/scalar_function.c (2)
21-22:⚠️ Potential issue | 🟠 MajorGuard
ruby_thread_has_gvl_pbehind configure-time feature detection.
ruby_thread_has_gvl_p()is not guaranteed on all supported Ruby builds; unconditional declaration/usage can break compile or link. Please gate declaration and call sites withHAVE_RUBY_THREAD_HAS_GVL_Pgenerated fromhave_func("ruby_thread_has_gvl_p")inextconf.rb.#!/bin/bash # Verify feature detection + guards for ruby_thread_has_gvl_p rg -n --type=ruby 'have_func\("ruby_thread_has_gvl_p"\)' -C2 rg -n --type=c 'HAVE_RUBY_THREAD_HAS_GVL_P|ruby_thread_has_gvl_p\(' -C2🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ext/duckdb/scalar_function.c` around lines 21 - 22, The declaration and any uses of ruby_thread_has_gvl_p must be guarded by a configure-time macro; update extconf.rb to call have_func("ruby_thread_has_gvl_p") so it defines HAVE_RUBY_THREAD_HAS_GVL_P, then wrap the extern declaration and all call sites of ruby_thread_has_gvl_p in `#ifdef` HAVE_RUBY_THREAD_HAS_GVL_P / `#endif` blocks (or provide a safe fallback when the macro is not defined). Specifically, protect the extern int ruby_thread_has_gvl_p(void); line and any places that call ruby_thread_has_gvl_p() so compilation/linking won’t fail on Rubies that lack that symbol.
429-433:⚠️ Potential issue | 🟠 MajorRoute Case 1 through the same protected error path as Case 2/3.
Case 1 directly calls
execute_callback_protected, so Ruby exceptions can bypassduckdb_scalar_function_set_errorconversion. Use the same protected wrapper path for consistency.💡 Suggested fix
if (ruby_native_thread_p()) { if (ruby_thread_has_gvl_p()) { - /* Case 1: Ruby thread with GVL - call directly */ - execute_callback_protected((VALUE)&arg); + /* Case 1: Ruby thread with GVL - protected execution */ + callback_with_gvl(&arg); } else { /* Case 2: Ruby thread without GVL - reacquire GVL */ rb_thread_call_with_gvl(callback_with_gvl, &arg); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ext/duckdb/scalar_function.c` around lines 429 - 433, The current Case 1 path directly calls execute_callback_protected((VALUE)&arg), allowing Ruby exceptions to bypass duckdb_scalar_function_set_error; change Case 1 to use the same protected wrapper used by Case 2/3 (i.e., invoke execute_callback_protected via rb_protect/its existing wrapper and handle the VALUE/exception result the same way the other branches do), so any Ruby exception is caught and converted by duckdb_scalar_function_set_error; update the branch around ruby_native_thread_p() / ruby_thread_has_gvl_p() to call the protected wrapper and follow the same post-call exception handling logic as the other cases (reference execute_callback_protected and duckdb_scalar_function_set_error).
🧹 Nitpick comments (1)
ext/duckdb/scalar_function.c (1)
109-303: Align newly added C helper symbols with therbduckdb_prefix guideline.The newly introduced helper symbols (
executor_wait_func,executor_thread_func,dispatch_callback_to_executor,callback_with_gvl,callback_request,g_request_list, etc.) are unprefixed. Renaming torbduckdb_*would keep this file aligned with repository naming rules.As per coding guidelines:
ext/duckdb/**/*.c: C symbols must be prefixed withrbduckdb_to avoid namespace conflicts.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ext/duckdb/scalar_function.c` around lines 109 - 303, The new executor-related C symbols must be renamed to the rbduckdb_ prefix to follow project namespace rules: rename functions executor_wait_func, executor_stop_func, execute_callback_protected, executor_thread_func, ensure_executor_started, dispatch_callback_to_executor, callback_with_gvl to rbduckdb_wait_func, rbduckdb_stop_func, rbduckdb_execute_callback_protected, rbduckdb_thread_func, rbduckdb_ensure_started, rbduckdb_dispatch_callback, rbduckdb_callback_with_gvl respectively; rename types/vars callback_request, callback_arg, executor_wait_data, g_request_list, g_executor_thread, g_executor_started, g_executor_mutex/cond/lock/cond names to use rbduckdb_ prefix and update every use site in this file (including rb_thread_create, rb_thread_call_without_gvl, rb_protect, rb_errinfo handling, and any enqueue/dequeue and signaling code) so all declarations and references remain consistent. Ensure header declarations (if any) are updated to the new names and that platform-specific code paths (MSC_VER vs pthread) reference the renamed symbols.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@ext/duckdb/scalar_function.c`:
- Around line 21-22: The declaration and any uses of ruby_thread_has_gvl_p must
be guarded by a configure-time macro; update extconf.rb to call
have_func("ruby_thread_has_gvl_p") so it defines HAVE_RUBY_THREAD_HAS_GVL_P,
then wrap the extern declaration and all call sites of ruby_thread_has_gvl_p in
`#ifdef` HAVE_RUBY_THREAD_HAS_GVL_P / `#endif` blocks (or provide a safe fallback
when the macro is not defined). Specifically, protect the extern int
ruby_thread_has_gvl_p(void); line and any places that call
ruby_thread_has_gvl_p() so compilation/linking won’t fail on Rubies that lack
that symbol.
- Around line 429-433: The current Case 1 path directly calls
execute_callback_protected((VALUE)&arg), allowing Ruby exceptions to bypass
duckdb_scalar_function_set_error; change Case 1 to use the same protected
wrapper used by Case 2/3 (i.e., invoke execute_callback_protected via
rb_protect/its existing wrapper and handle the VALUE/exception result the same
way the other branches do), so any Ruby exception is caught and converted by
duckdb_scalar_function_set_error; update the branch around
ruby_native_thread_p() / ruby_thread_has_gvl_p() to call the protected wrapper
and follow the same post-call exception handling logic as the other cases
(reference execute_callback_protected and duckdb_scalar_function_set_error).
---
Nitpick comments:
In `@ext/duckdb/scalar_function.c`:
- Around line 109-303: The new executor-related C symbols must be renamed to the
rbduckdb_ prefix to follow project namespace rules: rename functions
executor_wait_func, executor_stop_func, execute_callback_protected,
executor_thread_func, ensure_executor_started, dispatch_callback_to_executor,
callback_with_gvl to rbduckdb_wait_func, rbduckdb_stop_func,
rbduckdb_execute_callback_protected, rbduckdb_thread_func,
rbduckdb_ensure_started, rbduckdb_dispatch_callback, rbduckdb_callback_with_gvl
respectively; rename types/vars callback_request, callback_arg,
executor_wait_data, g_request_list, g_executor_thread, g_executor_started,
g_executor_mutex/cond/lock/cond names to use rbduckdb_ prefix and update every
use site in this file (including rb_thread_create, rb_thread_call_without_gvl,
rb_protect, rb_errinfo handling, and any enqueue/dequeue and signaling code) so
all declarations and references remain consistent. Ensure header declarations
(if any) are updated to the new names and that platform-specific code paths
(MSC_VER vs pthread) reference the renamed symbols.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 371e8706-18bc-4e6b-a19e-1d4c3d67c5cb
📒 Files selected for processing (1)
ext/duckdb/scalar_function.c
Summary
ScalarFunctionto enable multi-threaded execution withoutSET threads=1duckdb_query()inconnection.cto prevent deadlock with the executor threadSET threads=1requirement from scalar function registration and all related testsApproach
DuckDB worker threads (non-Ruby native threads) cannot call
rb_funcall()directly because they don't hold the GVL. This PR implements a 3-way dispatch inscalar_function_callback:rb_thread_call_with_gvl()pthread_mutex/pthread_condThis pattern is inspired by FFI gem's async callback dispatcher. The global executor thread is a Ruby thread created via
rb_thread_create()that waits for callback requests and executes them with GVL protection.Changes
ext/duckdb/scalar_function.cext/duckdb/connection.cduckdb_query()viarb_thread_call_without_gvllib/duckdb/connection.rbcheck_threadsfromregister_scalar_functionlib/duckdb/scalar_function.rbthreads=1documentation notetest/duckdb_test/scalar_function_test.rbSET threads=1, add multi-thread test with 1M rowstest/duckdb_test/connection_test.rbSET threads=1from scalar function teststest/duckdb_test/gc_stress_test.rbSET threads=1only for table function testsTest plan
docker compose run --rm ubuntu bash -c "rake build && rake test")Closes #1135
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Improvements
Tests
Documentation