Description
When the auto-updater restarts the CIDX server via systemctl restart cidx-server, child processes spawned by subprocess.run() inside RefreshScheduler._index_source() receive SIGTERM from systemd's default KillMode=control-group. The cidx index --fts subprocess is killed mid-indexing, causing CalledProcessError with signal 15 (SIGTERM).
Reproduction Steps
- Start CIDX server with multiple Langfuse repos or golden repos registered
- Wait for refresh jobs to trigger cidx index --fts (via _index_source())
- Trigger a server restart (auto-updater deploy, or manual systemctl restart cidx-server)
- Observe logs showing SIGTERM on cidx index --fts
Expected Behavior
The drain mechanism should wait for in-flight cidx index subprocesses to complete before restarting, OR the subprocess should be gracefully handled so that the job is marked as interrupted rather than failed.
Actual Behavior
ERROR: Indexing (semantic+FTS) on source failed for langfuse_Claude_Code_unknown-global: CalledProcessError:
Command '['cidx', 'index', '--fts']' died with <Signals.SIGTERM: 15>.
Multiple repos fail simultaneously at the exact same timestamp, confirming a server-wide event (restart) rather than per-repo issues.
Root Cause
- _index_source() (refresh_scheduler.py:1122) spawns cidx index --fts via subprocess.run() inheriting the parent's process group
- The auto-updater's restart_server() (deployment_executor.py:1361) enters maintenance mode and waits for drain (300s max)
- The drain mechanism (_wait_for_drain) only checks BackgroundJobManager's in-memory job status -- it has NO visibility into the child subprocess.run() process
- When drain timeout expires (or drain succeeds but subprocess is still running), systemctl restart sends SIGTERM to the entire cgroup, killing both Python server AND child processes
- Even if drain "succeeds" (BackgroundJob shows RUNNING), the thread running _execute_refresh is blocked on subprocess.run() and cannot respond to shutdown signals
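The failure mode can be reproduced in isolation. The sketch below is not the CIDX code: it uses a stand-in child that sends itself SIGTERM (an assumption standing in for systemd signalling the cgroup) and shows that subprocess.run(check=True) surfaces this as CalledProcessError with a negative return code. The function name run_child_killed_by_sigterm is hypothetical. POSIX only.

```python
import signal
import subprocess
import sys

# Stand-in for `cidx index --fts`: a child that terminates itself with SIGTERM,
# mimicking what happens when the whole control group is signalled.
CHILD_CODE = "import os, signal; os.kill(os.getpid(), signal.SIGTERM)"


def run_child_killed_by_sigterm() -> int:
    """Run the stand-in child and return the return code seen by the parent."""
    try:
        subprocess.run([sys.executable, "-c", CHILD_CODE], check=True)
    except subprocess.CalledProcessError as exc:
        # On POSIX, a return code of -N means "terminated by signal N".
        return exc.returncode
    return 0


if __name__ == "__main__":
    code = run_child_killed_by_sigterm()
    print("killed by SIGTERM:", code == -signal.SIGTERM)
```

The negative return code is what lets the parent tell a signal-induced death apart from an ordinary non-zero exit.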
Affected Files
src/code_indexer/global_repos/refresh_scheduler.py:1122 - _index_source() subprocess.run()
src/code_indexer/server/auto_update/deployment_executor.py:1361 - restart_server() drain logic
Proposed Fix Options
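Option A (Recommended): Run cidx index in its own session and process group (start_new_session=True in subprocess.run), and have the parent catch SIGTERM and wait for the subprocess to finish naturally. Caveat: KillMode=control-group selects processes by cgroup membership, not by process group, so a new session alone likely does not keep systemd's kill from reaching the child. This option probably also needs a unit-level change (for example KillMode=mixed, with the parent waiting on the child within the unit's stop timeout) or a way to launch the child outside the unit's cgroup.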
Option B: Improve drain awareness -- have _execute_refresh check a shutdown flag before spawning subprocesses, and skip new indexing if shutdown is in progress.
Option C: Catch CalledProcessError with SIGTERM specifically in _index_source() and treat it as a retriable interruption rather than a hard failure.
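Options B and C can be combined in one place. The following is a minimal sketch, not the actual _index_source() implementation: shutdown_requested, IndexingInterrupted, and index_source are hypothetical names, and it assumes the server sets the flag when it enters maintenance mode.

```python
import signal
import subprocess
import sys
import threading

# Hypothetical flag; assumed to be set when the server enters maintenance mode.
shutdown_requested = threading.Event()


class IndexingInterrupted(Exception):
    """Hypothetical: indexing was cut short by a restart and can be retried."""


def index_source(cmd: list) -> str:
    """Return 'skipped' or 'completed'; raise IndexingInterrupted on SIGTERM."""
    # Option B: do not spawn new indexing work once shutdown has begun.
    if shutdown_requested.is_set():
        return "skipped"
    try:
        subprocess.run(cmd, check=True)
    except subprocess.CalledProcessError as exc:
        # Option C: a return code of -SIGTERM means the child was terminated
        # by a signal, not that indexing itself failed.
        if exc.returncode == -signal.SIGTERM:
            raise IndexingInterrupted("indexing interrupted by SIGTERM") from exc
        raise  # Any other failure stays a hard failure.
    return "completed"


if __name__ == "__main__":
    print(index_source([sys.executable, "-c", "pass"]))
```

The caller could then mark the job as interrupted and requeue it after the restart, instead of logging it as failed. Neither option keeps the subprocess alive through the restart; they only change how the interruption is recorded.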
Impact
Evidence (Staging Logs 2026-03-07)
03:16:32 ERROR - cidx index --fts died with SIGTERM for langfuse_Claude_Code_unknown-global
03:16:32 ERROR - cidx index --fts died with SIGTERM for langfuse_Claude_Code_seba.battig-global
[Two repos fail at EXACT same second = server-wide event]
04:37:51 ERROR - cidx index --fts died with SIGTERM [repeated pattern]
04:52:06 ERROR - cidx index --fts died with SIGTERM [repeated pattern]