Bring in climo file hanging during creation fix #453
Merged
Conversation
Sometimes the climo file generation hangs; this seems to consistently fix the issue.
brianpm approved these changes on May 14, 2026

brianpm (Collaborator) left a comment:
This looks like a clean implementation of what I was doing!
Author (Collaborator):

@brianpm These were your changes! I forgot to tag you for credit.
The current code can lend itself to climo files hanging during creation. The main improvement here is that the new implementation isolates worker processes much more cleanly and prevents nested parallelism / shared-memory issues that can cause hangs or deadlocks.
**Major Changes:**
**Cleaner worker isolation.** Workers now start with a clean state instead of inheriting everything from the parent process, which avoids inherited corrupted state and is much safer for scientific I/O workflows. A minimal sketch of this style of worker pool follows.
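For illustration only, here is a rough sketch of that kind of isolation, assuming a `multiprocessing` pool with the `'spawn'` start method; the function name and task list are hypothetical, not the actual ADF code:

```python
import multiprocessing as mp

def make_one_climo(case_name: str, ts_path: str, climo_path: str) -> str:
    # Hypothetical worker body: a 'spawn'-started process begins from a fresh
    # interpreter, so nothing broken is inherited from the parent.
    return f"{case_name}: would write climo from {ts_path} to {climo_path}"

if __name__ == "__main__":
    # 'spawn' launches brand-new interpreters instead of forking the parent,
    # so open files, locks, and thread state are not carried into the workers.
    ctx = mp.get_context("spawn")
    tasks = [("case1", "ts/case1", "climo/case1")]  # hypothetical task list
    with ctx.Pool(processes=2) as pool:
        for msg in pool.starmap(make_one_climo, tasks):
            print(msg)
```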
**Only simple arguments are passed to workers.** The original code passed the entire ADF object into every worker process. That can be problematic because it can create hangs during worker initialization. The new version passes only a simple string for `adf_user`.

**Imports and `dask` settings live inside each worker.** With multiprocessing, especially the `'spawn'` start method, importing libraries inside the workers helps avoid inheriting the parent's `dask` schedulers and `dask` multithreading. Previously that created nested parallelism, which can lead to hangs or deadlocks; now each worker process computes serially internally. A sketch of a worker written this way is below.
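Roughly speaking (again with hypothetical names rather than the actual ADF functions), a worker structured this way takes plain strings, does its imports locally, and pins `dask` to the single-threaded scheduler so nothing runs in parallel inside the process pool:

```python
def climo_worker(adf_user: str, input_path: str, output_path: str) -> str:
    # Imports live inside the worker so a spawned process builds its own module
    # state instead of inheriting schedulers or threads from the parent.
    import dask

    # The process pool already supplies the parallelism; pinning dask to the
    # single-threaded ("synchronous") scheduler keeps it from spinning up its
    # own threads inside each worker, i.e. no nested parallelism.
    dask.config.set(scheduler="synchronous")

    # ... open the time series at input_path, compute the climo, write output_path ...
    return f"{adf_user}: finished {output_path}"
```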
**Chunked, lazy reads.** Opening the time series with `open_mfdataset(..., chunks=...)` forces dask-backed lazy arrays, which guarantees that data are read lazily rather than loaded eagerly inside each worker.
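For example (the paths and chunk sizes here are illustrative, not taken from the PR):

```python
import xarray as xr

# chunks= hands the variables to dask lazily instead of loading everything
# into memory up front; twelve time steps per chunk is an illustrative choice.
ds = xr.open_mfdataset(
    "ts/case1/*.nc",
    combine="by_coords",
    chunks={"time": 12},
)
print(ds)  # the variables are dask arrays; no data has been read yet
```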
**Datasets are closed explicitly.** Without explicit closure, workers can accumulate open file descriptors and eventually hang.
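A sketch of what that looks like, with hypothetical file names:

```python
import xarray as xr

ds = xr.open_mfdataset("ts/case1/*.nc", combine="by_coords", chunks={"time": 12})
try:
    climo = ds.groupby("time.month").mean(dim="time")
    climo.to_netcdf("climo/case1_climo.nc")
finally:
    # Release the underlying netCDF handles even if the compute or write step
    # raises, so a long-lived worker does not accumulate open descriptors.
    ds.close()
```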
**Explicit garbage collection between tasks.** Scientific Python stacks sometimes retain memory after a task finishes; explicit garbage collection helps workers release it sooner between tasks.
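Something along these lines (the task structure is hypothetical):

```python
import gc

def run_one_task(task) -> None:
    ds = None
    try:
        # ... open the data, compute and write the climo file for this task ...
        pass
    finally:
        # Drop the large references and sweep now, rather than waiting for the
        # collector to get around to it between long-running tasks.
        del ds
        gc.collect()
```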
**Worker failures are handled.** Previously a worker crash could leave the run hanging; now failures are caught and reported cleanly.
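As a sketch only (not the actual error handling in the PR), the idea is that each worker reports success or failure instead of dying silently:

```python
def safe_climo_worker(adf_user: str, input_path: str, output_path: str):
    try:
        # ... the real climo computation would go here ...
        return True, f"wrote {output_path}"
    except Exception as exc:  # report instead of letting the worker die silently
        return False, f"{input_path} failed: {exc}"

# The parent can log failures and keep going instead of waiting on a dead worker:
for ok, msg in [safe_climo_worker("user", "ts/case1", "climo/case1.nc")]:
    print(("OK" if ok else "FAIL") + ": " + msg)
```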
@brianpm I tried to capture the core concepts of the changes and where they help fix the code; please let me know if I'm off or if I missed something!
EDIT: These changes were supplied by @brianpm!