Rescue Errno::ESRCH in SpawnWithTimeout#wait_for_process_raw when killing a timed-out process on JRuby #158

Description

@jcouball

Summary

On JRuby, SpawnWithTimeout#wait_for_process_raw raises Errno::ESRCH instead of returning a timed-out result when the subprocess exits between the Timeout::Error rescue and the Process.kill('KILL', pid) call. This race condition prevents callers from receiving the expected ProcessExecuter::TimeoutError.

Version

process_executer 4.0.2

Platform

JRuby 10.0.0.1 on Ubuntu (Linux). Likely affects all JRuby versions.

Root Cause

In lib/process_executer/commands/spawn_with_timeout.rb, the wait_for_process_raw method:

def wait_for_process_raw
  timed_out = false

  process_status =
    begin
      Timeout.timeout(options.timeout_after) { Process.wait2(pid).last }
    rescue Timeout::Error
      Process.kill('KILL', pid)   # <-- raises Errno::ESRCH on JRuby if process already exited
      timed_out = true
      Process.wait2(pid).last
    end

  [process_status, timed_out]
end

When a very short timeout fires (e.g., timeout_after: 0.001), there is a race condition between the Timeout::Error being raised and Process.kill('KILL', pid) executing. If the subprocess exits naturally during this window, Process.kill raises Errno::ESRCH ("No such process") because the PID no longer exists.

Why This Is Practically JRuby-Specific

The race condition is theoretically possible on any Ruby implementation, but Errno::ESRCH is practically JRuby-specific due to how child processes are reaped:

CRuby (MRI): When a child process exits, it becomes a zombie — it stays in the process table until Process.wait/Process.wait2 is called by the parent. Since the only Process.wait2 call in the code is after Process.kill, the PID is still valid when kill runs, even if the process already exited. Process.kill on a zombie succeeds silently — the signal is simply discarded.

JRuby: The JVM manages child processes and can automatically reap them in a background thread. This removes the PID from the process table before Process.kill runs, causing Errno::ESRCH. That's why it manifests reliably on JRuby with very short timeouts but not on CRuby.

There are theoretical edge cases on CRuby (e.g., a SIGCHLD handler that calls Process.wait), but in practice with process_executer's code, this is a JRuby-specific issue.
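
The zombie behavior described above can be observed directly on CRuby with a short script. This is a minimal sketch, not part of process_executer: the try_kill helper is a hypothetical name, and the script assumes a POSIX system with the true command on PATH. The probe after reaping uses signal 0, which only checks whether the PID exists and avoids signaling an unrelated process.

```ruby
# Hypothetical helper: report whether a signal can be sent to the PID.
def try_kill(signal, pid)
  Process.kill(signal, pid)
  :delivered
rescue Errno::ESRCH
  :no_such_process
end

# Spawn a child that exits immediately (assumes `true` is on PATH).
pid = Process.spawn('true')
sleep 0.5 # give the child time to exit; it has not been waited on yet

before_reap = try_kill('KILL', pid) # the unreaped child still holds its PID
Process.wait(pid)                   # reap the child; the PID is released
after_reap = try_kill(0, pid)       # signal 0 only probes for existence

puts "before reap: #{before_reap}, after reap: #{after_reap}"
```

On CRuby the first call should succeed and the second should hit Errno::ESRCH. On a runtime that reaps children in the background, the first call is the one that can fail, which is the situation this issue describes.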

Recommendation: Even though CRuby is not currently affected, rescuing Errno::ESRCH is good defensive programming that guards against this race on every Ruby implementation and against future behavior changes.

How to Reproduce

The failure was observed in the ruby-git CI on the jruby-10.0.0.1 build matrix entry. The test sets a global timeout of 0.001 seconds for a git clone operation:

Git.config.timeout = 0.001 # JRuby-specific timeout
Git.clone('repository.git', 'temp2', timeout: nil)
# Expected: Git::TimeoutError (wraps ProcessExecuter::TimeoutError)
# Actual:   Errno::ESRCH - No such process

Standalone reproduction (requires JRuby):

require 'process_executer'

# With a very short timeout, the race condition is likely on JRuby
result = ProcessExecuter.spawn_with_timeout('echo hello', timeout_after: 0.0001)
# May raise Errno::ESRCH instead of returning a timed_out result

Actual Error

Errno::ESRCH: No such process - No such process
  org/jruby/RubyProcess.java:1722:in 'kill'
  lib/process_executer/commands/spawn_with_timeout.rb:154:in 'wait_for_process_raw'
  lib/process_executer/commands/spawn_with_timeout.rb:130:in 'wait_for_process'
  lib/process_executer/commands/spawn_with_timeout.rb:56:in 'call'

Expected Behavior

When Process.kill raises Errno::ESRCH during timeout handling, the error should be rescued and the process should still be treated as timed out. The method should call Process.wait2(pid) to collect the exit status and return [process_status, true] as it would normally.

Proposed Solution

Rescue Errno::ESRCH around the Process.kill('KILL', pid) call in wait_for_process_raw. When the process has already exited, the kill signal is unnecessary, but the timeout still occurred — so timed_out should still be set to true.

Implementation

Change wait_for_process_raw in lib/process_executer/commands/spawn_with_timeout.rb:

def wait_for_process_raw
  timed_out = false

  process_status =
    begin
      Timeout.timeout(options.timeout_after) { Process.wait2(pid).last }
    rescue Timeout::Error
      timed_out = true
      kill_process
      Process.wait2(pid).last
    end

  [process_status, timed_out]
end

Extract a new private method kill_process to encapsulate the kill-with-rescue logic:

# Send SIGKILL to the process, ignoring Errno::ESRCH if the process has already exited
#
# There is a race condition between the Timeout::Error being raised and
# Process.kill executing. If the subprocess exits naturally during this window,
# Process.kill raises Errno::ESRCH because the PID no longer exists.
#
# This is primarily observed on JRuby, where the JVM can automatically reap child
# processes in a background thread, removing the PID from the process table before
# kill runs. On CRuby, exited child processes remain as zombies until Process.wait
# is called, so kill on a zombie succeeds silently. However, rescuing Errno::ESRCH
# is good defensive programming that guards against this race on every Ruby
# implementation.
#
# @return [void]
#
def kill_process
  Process.kill('KILL', pid)
rescue Errno::ESRCH
  # Process already exited — nothing to kill
end
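
To try the pattern outside the gem, the following is a minimal, self-contained sketch of the same timeout-then-kill flow. The method names wait_with_timeout and kill_ignoring_esrch are hypothetical and are not part of process_executer. The script assumes a POSIX system with sleep on PATH.

```ruby
require 'timeout'

# Hypothetical helper: send SIGKILL, treating "no such process" as success.
def kill_ignoring_esrch(pid)
  Process.kill('KILL', pid)
rescue Errno::ESRCH
  nil # the process already exited, so there is nothing to kill
end

# Hypothetical helper mirroring the shape of wait_for_process_raw.
# Returns [Process::Status, timed_out].
def wait_with_timeout(pid, timeout_after)
  timed_out = false

  status =
    begin
      Timeout.timeout(timeout_after) { Process.wait2(pid).last }
    rescue Timeout::Error
      timed_out = true          # the timeout happened whether or not kill succeeds
      kill_ignoring_esrch(pid)
      Process.wait2(pid).last   # collect the exit status of the killed child
    end

  [status, timed_out]
end

pid = Process.spawn('sleep', '10')
status, timed_out = wait_with_timeout(pid, 0.05)
puts "timed_out=#{timed_out} signaled=#{status.signaled?}"
```

Setting timed_out before the kill attempt means the flag does not depend on whether the signal was delivered, which is the ordering proposed above.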

Files to Change

  1. lib/process_executer/commands/spawn_with_timeout.rb:

    • Modify wait_for_process_raw to set timed_out = true before calling kill (since the timeout already occurred regardless of whether kill succeeds)
    • Extract kill_process private method with Errno::ESRCH rescue
  2. spec/process_executer_spawn_with_timeout_spec.rb:

    • Add test: when Process.kill raises Errno::ESRCH during timeout handling, the result should still be marked as timed_out? == true
    • Add test: when Process.kill raises Errno::ESRCH, no exception should propagate to the caller
    • Both tests should stub/mock Process.kill to raise Errno::ESRCH to make the test deterministic and platform-independent

Step-by-Step Implementation Plan (TDD)

Follow the project's strict TDD Red-Green-Refactor methodology:

Phase 1: Analysis & Checklist

Checklist of implementation steps:

  • Step 1: RED — Write a failing spec that stubs Process.kill to raise Errno::ESRCH during a timeout and asserts that spawn_with_timeout still returns a result with timed_out? == true (rather than raising Errno::ESRCH)
  • Step 2: GREEN — Rescue Errno::ESRCH in wait_for_process_raw to make the test pass
  • Step 3: REFACTOR — Extract the kill_process private method, move the timed_out = true assignment before the kill call for clarity, add YARD documentation to the new method
  • Step 4: RED — Write a spec that verifies Process.kill is still called (i.e., the method attempts to kill the process) even if it raises Errno::ESRCH. Because the implementation from Step 2 already calls Process.kill, this spec is expected to pass immediately rather than fail
  • Step 5: GREEN — Confirm that the existing implementation passes this spec
  • Step 6: RED — Write a spec for the normal timeout path (no Errno::ESRCH) to ensure the refactored code doesn't break the existing behavior. Existing tests should already cover this path, so verify coverage before adding a new spec
  • Step 7: GREEN/REFACTOR — Verify all tests pass, run bundle exec rake for full suite + coverage, ensure 100% coverage is maintained
  • Step 8: Update YARD documentation on wait_for_process_raw to document the Errno::ESRCH handling and the JRuby race condition
  • Step 9: Update CHANGELOG.md with a bug fix entry

Testing Details

The tests should stub Process.kill (for example, with allow(Process).to receive(:kill)) so that it raises Errno::ESRCH, which simulates the race condition deterministically on any platform. Example test structure:

context 'when Process.kill raises Errno::ESRCH during timeout handling' do
  let(:command) { %w[sleep 10] }
  let(:options_hash) { { timeout_after: 0.01 } }

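  before do
    # Kill the real process before raising so the later Process.wait2 call does
    # not block until `sleep 10` finishes, then simulate the race condition
    allow(Process).to receive(:kill).with('KILL', anything).and_wrap_original do |original, *args|
      original.call(*args)
      raise Errno::ESRCH
    end
  end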

  it 'does not raise Errno::ESRCH' do
    expect { subject }.not_to raise_error
  end

  it 'returns a result marked as timed out' do
    expect(subject).to have_attributes(timed_out?: true)
  end
end

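Note on mocking: Since Process.kill is called inside a rescue Timeout::Error block, the mock must be set up so that the timeout actually fires first. Using sleep 10 with timeout_after: 0.01 ensures the timeout fires. The mock on Process.kill then simulates the race condition where the process exited between the timeout and the kill. Be aware that a stub which only raises never delivers the signal, so the following Process.wait2(pid) blocks until sleep 10 finishes. Having the stub kill the real process before raising keeps the spec fast.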

YARD Documentation Updates

  • Add documentation noting that Errno::ESRCH from Process.kill is rescued internally (a YARD @note may fit better than @raise, since @raise describes errors that propagate to the caller)
  • Document the JRuby race condition in the class-level documentation for SpawnWithTimeout
  • Add platform compatibility notes mentioning JRuby behavior
