Summary
On JRuby, SpawnWithTimeout#wait_for_process_raw raises Errno::ESRCH instead of returning a timed-out result when the subprocess exits between the Timeout::Error rescue and the Process.kill('KILL', pid) call. This race condition prevents callers from receiving the expected ProcessExecuter::TimeoutError.
Version
process_executer 4.0.2
Platform
JRuby 10.0.0.1 on Ubuntu (Linux). Likely affects all JRuby versions.
Root Cause
In lib/process_executer/commands/spawn_with_timeout.rb, the wait_for_process_raw method:
def wait_for_process_raw
  timed_out = false
  process_status =
    begin
      Timeout.timeout(options.timeout_after) { Process.wait2(pid).last }
    rescue Timeout::Error
      Process.kill('KILL', pid) # <-- raises Errno::ESRCH on JRuby if process already exited
      timed_out = true
      Process.wait2(pid).last
    end
  [process_status, timed_out]
end
When a very short timeout fires (e.g., timeout_after: 0.001), there is a race condition between the Timeout::Error being raised and Process.kill('KILL', pid) executing. If the subprocess exits naturally during this window, Process.kill raises Errno::ESRCH ("No such process") because the PID no longer exists.
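The underlying OS behavior can be observed with the standard library alone. The sketch below assumes a Unix-like system with a `true` executable. It uses Process.detach, whose thread reaps the child in the background, as a rough stand-in for the JVM's reaper. It sends signal 0 (an existence check that delivers nothing) instead of KILL so that a recycled PID can never be signalled by accident.

```ruby
# Sketch: signalling a PID that has already been reaped raises Errno::ESRCH.
# Assumes a Unix-like OS with a `true` executable on PATH.

pid = Process.spawn('true')

# Process.detach starts a thread that waits on the child for us; joining it
# guarantees the child has exited and been removed from the process table,
# roughly what the JVM's background reaper does on JRuby.
Process.detach(pid).join

outcome =
  begin
    # Signal 0 performs the existence/permission check without delivering a
    # signal, so it is harmless even if the PID were ever reused.
    Process.kill(0, pid)
    :process_still_exists
  rescue Errno::ESRCH
    :esrch
  end

puts outcome
```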
Why This Is Practically JRuby-Specific
The race condition is theoretically possible on any Ruby implementation, but Errno::ESRCH is practically JRuby-specific due to how child processes are reaped:
CRuby (MRI): When a child process exits, it becomes a zombie — it stays in the process table until Process.wait/Process.wait2 is called by the parent. Since the only Process.wait2 call in the code is after Process.kill, the PID is still valid when kill runs, even if the process already exited. Process.kill on a zombie succeeds silently — the signal is simply discarded.
JRuby: The JVM manages child processes and can automatically reap them in a background thread. This removes the PID from the process table before Process.kill runs, causing Errno::ESRCH. That's why it manifests reliably on JRuby with very short timeouts but not on CRuby.
There are theoretical edge cases on CRuby (e.g., a SIGCHLD handler that calls Process.wait), but in practice with process_executer's code, this is a JRuby-specific issue.
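The CRuby half of this explanation can be checked the same way: an exited child that has not yet been waited on stays a zombie, and signalling it succeeds. This sketch assumes CRuby on a Unix-like system with a `true` executable. The short sleep only gives the child time to exit and is not a synchronization guarantee, so the checks below hold whether or not the child has finished when the signal is sent.

```ruby
# Sketch (CRuby, Unix-like OS): an exited but un-reaped child is a zombie,
# and Process.kill on a zombie succeeds instead of raising Errno::ESRCH.

pid = Process.spawn('true')

# Give the child time to exit; CRuby does not reap it until we call wait.
sleep 0.2

# Sending KILL is safe here: the PID cannot be reused while it is still our
# un-reaped child. Process.kill returns the number of processes signalled.
signalled = Process.kill('KILL', pid)

# Reaping afterwards still works and returns the child's PID and status.
waited_pid, status = Process.wait2(pid)

puts "kill returned #{signalled}; reaped #{waited_pid}; exited normally: #{status.exited?}"
```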
Recommendation: Even though CRuby is not currently affected, rescuing Errno::ESRCH is good defensive programming that protects against all Ruby implementations and future behavior changes.
How to Reproduce
The failure was observed in the ruby-git CI on the jruby-10.0.0.1 build matrix entry. The test sets a global timeout of 0.001 seconds for a git clone operation:
Git.config.timeout = 0.001 # JRuby-specific timeout
Git.clone('repository.git', 'temp2', timeout: nil)
# Expected: Git::TimeoutError (wraps ProcessExecuter::TimeoutError)
# Actual: Errno::ESRCH - No such process
Standalone reproduction (requires JRuby):
require 'process_executer'
# With a very short timeout, the race condition is likely on JRuby
result = ProcessExecuter.spawn_with_timeout('echo hello', timeout_after: 0.0001)
# May raise Errno::ESRCH instead of returning a timed_out result
Actual Error
Errno::ESRCH: No such process - No such process
org/jruby/RubyProcess.java:1722:in 'kill'
lib/process_executer/commands/spawn_with_timeout.rb:154:in 'wait_for_process_raw'
lib/process_executer/commands/spawn_with_timeout.rb:130:in 'wait_for_process'
lib/process_executer/commands/spawn_with_timeout.rb:56:in 'call'
Expected Behavior
When Process.kill raises Errno::ESRCH during timeout handling, the error should be rescued and the process should still be treated as timed out. The method should call Process.wait2(pid) to collect the exit status and return [process_status, true] as it would normally.
Proposed Solution
Rescue Errno::ESRCH around the Process.kill('KILL', pid) call in wait_for_process_raw. When the process has already exited, the kill signal is unnecessary, but the timeout still occurred — so timed_out should still be set to true.
Implementation
Change wait_for_process_raw in lib/process_executer/commands/spawn_with_timeout.rb:
def wait_for_process_raw
  timed_out = false
  process_status =
    begin
      Timeout.timeout(options.timeout_after) { Process.wait2(pid).last }
    rescue Timeout::Error
      timed_out = true
      kill_process
      Process.wait2(pid).last
    end
  [process_status, timed_out]
end
Extract a new private method kill_process to encapsulate the kill-with-rescue logic:
# Send SIGKILL to the process, ignoring Errno::ESRCH if the process has already exited
#
# There is a race condition between the Timeout::Error being raised and
# Process.kill executing. If the subprocess exits naturally during this window,
# Process.kill raises Errno::ESRCH because the PID no longer exists.
#
# This is primarily observed on JRuby, where the JVM can automatically reap child
# processes in a background thread, removing the PID from the process table before
# kill runs. On CRuby, exited child processes remain as zombies until Process.wait
# is called, so kill on a zombie succeeds silently. However, rescuing Errno::ESRCH
# is good defensive programming that protects against all Ruby implementations.
#
# @return [void]
#
def kill_process
  Process.kill('KILL', pid)
rescue Errno::ESRCH
  # Process already exited — nothing to kill
end
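To try the proposed control flow without the gem, the sketch below re-creates it as two top-level methods. The names wait_for_process_raw(pid, timeout_after) and kill_process(pid) are hypothetical stand-ins for the private methods above, which read pid and options from the command object instead of taking arguments. It assumes a Unix-like system with `sleep` and `true` executables.

```ruby
require 'timeout'

# Hypothetical stand-alone version of the proposed kill_process.
def kill_process(pid)
  Process.kill('KILL', pid)
rescue Errno::ESRCH
  # Process already exited and was reaped: nothing to kill.
  nil
end

# Hypothetical stand-alone version of the proposed wait_for_process_raw.
# Returns [Process::Status, timed_out].
def wait_for_process_raw(pid, timeout_after)
  timed_out = false
  process_status =
    begin
      Timeout.timeout(timeout_after) { Process.wait2(pid).last }
    rescue Timeout::Error
      # The timeout has fired whether or not the kill below succeeds.
      timed_out = true
      kill_process(pid)
      Process.wait2(pid).last
    end
  [process_status, timed_out]
end

# A child that outlives the timeout is killed and reported as timed out.
slow_status, slow_timed_out = wait_for_process_raw(Process.spawn('sleep', '10'), 0.1)

# A child that finishes in time is reported normally.
fast_status, fast_timed_out = wait_for_process_raw(Process.spawn('true'), 5)

puts "slow: timed_out=#{slow_timed_out} termsig=#{slow_status.termsig}"
puts "fast: timed_out=#{fast_timed_out} success=#{fast_status.success?}"
```

Setting timed_out before the kill means the flag reflects the timeout itself, not whether the signal was actually delivered.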
Files to Change
- lib/process_executer/commands/spawn_with_timeout.rb:
  - Modify wait_for_process_raw to set timed_out = true before calling kill (since the timeout already occurred regardless of whether kill succeeds)
  - Extract kill_process private method with Errno::ESRCH rescue
- spec/process_executer_spawn_with_timeout_spec.rb:
  - Add test: when Process.kill raises Errno::ESRCH during timeout handling, the result should still be marked as timed_out? == true
  - Add test: when Process.kill raises Errno::ESRCH, no exception should propagate to the caller
  - Both tests should stub/mock Process.kill to raise Errno::ESRCH to make the test deterministic and platform-independent
Step-by-Step Implementation Plan (TDD)
Follow the project's strict TDD Red-Green-Refactor methodology:
Phase 1: Analysis & Checklist
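Checklist of implementation steps:
- … Process.kill to raise Errno::ESRCH during a timeout and asserts that spawn_with_timeout still returns a result with timed_out? == true (rather than raising Errno::ESRCH)
- … Errno::ESRCH in wait_for_process_raw to make the test pass
- … kill_process private method, move the timed_out = true assignment before the kill call for clarity, add YARD documentation to the new method
- … Process.kill is still called (i.e., the method attempts to kill the process) even if it raises Errno::ESRCH
- … Errno::ESRCH) to ensure the refactored code doesn't break the existing behavior; this should already pass with existing tests, but verify coverage
- … bundle exec rake for full suite + coverage, ensure 100% coverage is maintained
- … wait_for_process_raw to document the Errno::ESRCH handling and the JRuby race condition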
Testing Details
The tests should use allow(Process).to receive(:kill).and_raise(Errno::ESRCH) to simulate the race condition deterministically on any platform. Example test structure:
context 'when Process.kill raises Errno::ESRCH during timeout handling' do
  let(:command) { %w[sleep 10] }
  let(:options_hash) { { timeout_after: 0.01 } }

  before do
    allow(Process).to receive(:kill).with('KILL', anything).and_raise(Errno::ESRCH)
  end

  it 'does not raise Errno::ESRCH' do
    expect { subject }.not_to raise_error
  end

  it 'returns a result marked as timed out' do
    expect(subject).to have_attributes(timed_out?: true)
  end
end
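Note on mocking: Since Process.kill is called inside a rescue Timeout::Error block, the mock must be set up so that the timeout actually fires first. Using sleep 10 with timeout_after: 0.01 ensures the timeout fires. The mock on Process.kill then simulates the race condition where the process exited between the timeout and the kill. Because this stub never delivers SIGKILL, the sleep 10 child keeps running and the Process.wait2 call that follows the rescue blocks until the child exits on its own, so each example takes roughly 10 seconds once the fix is in place. Having the stub call the original method before raising (for example with RSpec's and_wrap_original) keeps the examples fast.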
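The same deterministic setup can be sketched without RSpec by temporarily wrapping Process.kill on the Process singleton class. This is an illustration under stated assumptions, not the project's spec: wait_with_timeout is a hypothetical compact copy of the proposed logic, and the stub first delivers the real signal and then raises Errno::ESRCH. The child is therefore really gone, and the final Process.wait2 returns at once instead of waiting for sleep 10 to finish. Assumes a Unix-like system with a `sleep` executable.

```ruby
require 'timeout'

# Hypothetical compact copy of the proposed wait_for_process_raw logic.
def wait_with_timeout(pid, timeout_after)
  Timeout.timeout(timeout_after) { [Process.wait2(pid).last, false] }
rescue Timeout::Error
  begin
    Process.kill('KILL', pid)
  rescue Errno::ESRCH
    # Simulated race: the child "already exited" before the kill.
  end
  [Process.wait2(pid).last, true]
end

kill_calls = []
process_meta = Process.singleton_class

# Stub Process.kill: record the call, really deliver the signal, then
# report "No such process" as if the child had already been reaped.
process_meta.send(:alias_method, :__original_kill, :kill)
process_meta.send(:define_method, :kill) do |signal, target_pid|
  kill_calls << [signal, target_pid]
  __original_kill(signal, target_pid)
  raise Errno::ESRCH
end

begin
  pid = Process.spawn('sleep', '10')
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  status, timed_out = wait_with_timeout(pid, 0.05)
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
ensure
  # Restore the real Process.kill.
  process_meta.send(:alias_method, :kill, :__original_kill)
  process_meta.send(:remove_method, :__original_kill)
end

puts "timed_out=#{timed_out} kill_calls=#{kill_calls.length} elapsed=#{elapsed.round(2)}s"
```

Recording the calls also covers the checklist item that Process.kill is still attempted even when it raises Errno::ESRCH.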
YARD Documentation Updates
- Add @raise documentation noting that Errno::ESRCH from Process.kill is rescued internally
- Document the JRuby race condition in the class-level documentation for SpawnWithTimeout
- Add platform compatibility notes mentioning JRuby behavior
Related Issues