We consistently see QR processes time out, but the collectcells process is not killed afterwards and the job hangs. I believe the problem is here:
`podi_collectcells.py`, line 2158:

```python
if (p.is_alive()):
    logger.warning("Timeout event triggered, shutting things down ...")
    #kill_all_child_processes(process_tracker)
    logger.info("Killing collectcells after timeout...")
    # podi_logging.print_stacktrace()
    p.terminate()
    logger.info("all done after timeout problem/error!")
    return 1
```
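There should be an additional `p.join()` after `p.terminate()` so that the terminated child process is reaped and its process-table entry is cleared. Without it, the parent collectcells process shows up as `[python] <defunct>` after exiting and cannot be killed directly (a defunct process has already exited; it only disappears once its parent reaps it).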
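A minimal, self-contained sketch of the proposed ordering, assuming collectcells runs its work in a `multiprocessing.Process` named `p` as the snippet suggests. `slow_worker` and `run_with_timeout` are hypothetical stand-ins, not functions from `podi_collectcells.py`, and the explicit `"fork"` start method is an assumption to keep the sketch POSIX-specific like the `<defunct>` symptom itself. The point is only that `terminate()` sends the signal while `join()` waits for and reaps the child.

```python
import logging
import multiprocessing
import time

logger = logging.getLogger("collectcells_sketch")

# POSIX-only assumption: force the "fork" start method so the sketch behaves
# the same on every Python version (the <defunct> symptom is POSIX-specific).
ctx = multiprocessing.get_context("fork")


def slow_worker(seconds):
    """Hypothetical stand-in for the collectcells work: it just sleeps."""
    time.sleep(seconds)


def run_with_timeout(target, args, timeout):
    """Run target(*args) in a child; terminate and reap it on timeout.

    Returns 0 if the child finished in time and 1 on timeout, mirroring the
    `return 1` in the original timeout branch.
    """
    p = ctx.Process(target=target, args=args)
    p.start()
    p.join(timeout)  # wait at most `timeout` seconds for a normal exit
    if p.is_alive():
        logger.warning("Timeout event triggered, shutting things down ...")
        p.terminate()  # sends SIGTERM and returns without waiting
        p.join()       # wait for the child to exit and reap it (no <defunct> left)
        logger.info("all done after timeout problem/error!")
        return 1
    return 0


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    print(run_with_timeout(slow_worker, (0.1,), timeout=5))   # prints 0
    print(run_with_timeout(slow_worker, (60,), timeout=0.5))  # prints 1
```

An unbounded `p.join()` is fine here because the default SIGTERM action ends the child. If the worker might install its own SIGTERM handler, a more defensive variant would be a bounded `p.join(timeout)` followed by `p.kill()` (Python 3.7+) and a final `p.join()`.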