Timeout in collectcells leaves zombie processes #13

@youngmd

We have consistently seen QR processes time out without the collectcells process being killed, so the job hangs. I believe the problem is here:

podi_collectcells.py, Line 2158:

    if (p.is_alive()):
        logger.warning("Timeout event triggered, shutting things down ...")
        #kill_all_child_processes(process_tracker)

        logger.info("Killing collectcells after timeout...")
        # podi_logging.print_stacktrace()
        p.terminate()
        logger.info("all done after timeout problem/error!")
        return 1

There should be an additional p.join() after the p.terminate() so that the terminated child is reaped and its process entry is cleared entirely. Without that, the parent collectcells process appears as [python] <defunct> after exiting and cannot be killed directly.
