Unique Scratch Directory Names #118

Merged
mxkpp merged 3 commits into development from maxkipp-unique-scratch-dir-2
Mar 13, 2026

@mxkpp mxkpp commented Mar 12, 2026

With this change, the scratch directory specified in the configuration file becomes the parent of the actual scratch directory. The actual scratch directory is created as a child of the configured path. The child directory's name is a random string that is shared among all MPI ranks.

This is to avoid file contention during concurrent execution.
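
A minimal sketch of the idea described above, not the code from this PR: one rank picks a random name and broadcasts it so that every rank resolves the same child directory. The function name `make_unique_scratch_dir` and the use of `uuid` are assumptions for illustration; `comm` is assumed to be an mpi4py-style communicator exposing `Get_rank` and `bcast`.

```python
import uuid
from pathlib import Path


def make_unique_scratch_dir(configured_dir, comm=None):
    """Create and return a uniquely named child of configured_dir.

    comm is assumed to be an mpi4py-style communicator (Get_rank, bcast);
    pass None for a serial run. All names here are hypothetical.
    """
    rank = comm.Get_rank() if comm is not None else 0
    # Only rank 0 picks the random name so every rank agrees on it.
    name = uuid.uuid4().hex if rank == 0 else None
    if comm is not None:
        name = comm.bcast(name, root=0)
    unique_dir = Path(configured_dir) / name
    # exist_ok=True lets every rank call mkdir safely on a shared filesystem.
    unique_dir.mkdir(parents=True, exist_ok=True)
    return unique_dir
```

Two concurrent runs pointed at the same configured path would then write under different child directories, which is what removes the file contention.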

The cleanup function was also modified, including some of its log messages.

TODO: when running through ngen, the MPI exit handling logic does not seem to trigger the cleanup step when the forcing engine crashes.

The forcing cleanup does trigger on normal exit when running through ngen, and on both crashes and normal exits when running through the debugger (run_bmi_forcing.py with mpirun).
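
For context on which exit paths a cleanup hook can cover, here is a hedged sketch using only the standard library. It does not reproduce the cleanup function in this PR and does not diagnose the ngen behavior noted above; `cleanup_scratch_dir` and `register_cleanup` are hypothetical names.

```python
import atexit
import shutil
from pathlib import Path


def cleanup_scratch_dir(unique_dir):
    """Remove the unique scratch directory; safe to call more than once."""
    # ignore_errors=True makes a repeat call (or a missing directory) a no-op.
    shutil.rmtree(Path(unique_dir), ignore_errors=True)


def register_cleanup(unique_dir):
    # atexit handlers run on normal interpreter shutdown and after an
    # unhandled Python exception, but not when the process is killed by an
    # unhandled signal or leaves through os._exit().
    atexit.register(cleanup_scratch_dir, unique_dir)
```

If the host process terminates without a normal Python interpreter shutdown, a hook like this would not run, so leftover directories after a crash are worth checking for separately.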

Additions

  • The scratch dir is now created as a child dir of the provided (configured) scratch dir path. The child dir's name is a random string.
  • New private attribute ConfigOptions._scratch_dir_has_been_uniquefied used for state management to ensure the uniquefication method is called no more than once.
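
The once-only guard described in the second bullet could look roughly like the sketch below. `ConfigOptionsSketch` and `uniquefy_scratch_dir` are hypothetical stand-ins; only the attribute name `_scratch_dir_has_been_uniquefied` comes from this PR, and the real method may differ.

```python
import uuid
from pathlib import Path


class ConfigOptionsSketch:
    """Hypothetical stand-in for ConfigOptions; only the guard logic is shown."""

    def __init__(self, scratch_dir):
        self.scratch_dir = Path(scratch_dir)
        self._scratch_dir_has_been_uniquefied = False

    def uniquefy_scratch_dir(self):
        # Return early on repeat calls so the path is nested only once.
        if self._scratch_dir_has_been_uniquefied:
            return self.scratch_dir
        self.scratch_dir = self.scratch_dir / uuid.uuid4().hex
        self._scratch_dir_has_been_uniquefied = True
        return self.scratch_dir
```

Without the flag, a second call would append another random component and the ranks could end up with a path nested two levels deep.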

Removals

Changes

  • The scratch dir is now created as a child dir of the provided (configured) scratch dir path. The child dir's name is a random string.
  • Some log messages in the cleanup function were altered.

Testing

  1. I ran short range forecasts and confirmed that the new randomized scratch directory is used and then cleaned up when the program exits normally.
  2. I ran the regrid.py pytests.

Screenshots

Notes

Todos

Checklist

  • PR has an informative and human-readable title
  • Changes are limited to a single goal (no scope creep)
  • Code can be automatically merged (no conflicts)
  • Code follows project standards (link if applicable)
  • Passes all existing automated tests
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future todos are captured in comments
  • Visually tested in supported browsers and devices (see checklist below 👇)
  • Project documentation has been updated (including the "Unreleased" section of the CHANGELOG)
  • Reviewers requested with the Reviewers tool ➡️

Testing checklist

Target Environment support

  • Linux

@mxkpp mxkpp requested a review from kyle-larkin March 12, 2026 22:02

mxkpp commented Mar 12, 2026

Note that the directory name uniqueness would be required even if it is later decided to move the scratch directory into /tmp/, for cases that involve concurrent calls to ngen such as GWO and PSO calibrations.

@kyle-larkin kyle-larkin left a comment

Worked as expected, I like the changes, and the plan outlined in the TODO.

So use cases such as self._output_obj.outPath in `bmi_model.py` would need to be updated to use a new configuration key for specifying the output directory for (optionally) storing permanent results. Writing to outPath could still use /tmp/ while the file is incomplete, with an OS rename moving the file to the shared location once writing has completed and the file handle has been closed.
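
The write-then-move pattern described in that TODO can be sketched as follows. This is an illustration under stated assumptions, not the planned implementation: `write_then_publish` is a hypothetical helper, and `shutil.move` is used instead of a bare rename because a plain `os.rename` fails when the temporary directory and the destination are on different filesystems.

```python
import os
import shutil
import tempfile
from pathlib import Path


def write_then_publish(data, final_path, tmp_root=None):
    """Write bytes to a temporary file, then move it to final_path."""
    final_path = Path(final_path)
    # tmp_root=None uses the system default temp directory (often /tmp).
    fd, tmp_name = tempfile.mkstemp(dir=tmp_root)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        # The handle is closed here, so the file is complete before it moves.
        # shutil.move renames when it can and otherwise copies, then deletes.
        shutil.move(tmp_name, str(final_path))
    except BaseException:
        # Do not leave a partial temp file behind if anything failed.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
    return final_path
```

Note that the move is only atomic when source and destination share a filesystem; across filesystems a reader could briefly see a partially copied file, so placing the temporary file next to the destination is a common alternative.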

This sounds like a good approach.

@mxkpp mxkpp merged commit c4c4f9f into development Mar 13, 2026
6 checks passed
@mxkpp mxkpp deleted the maxkipp-unique-scratch-dir-2 branch March 13, 2026 19:51