Conversation
…riting for who creates the hdf5 file
@bozhang-hpc Thanks for creating this feature. I think most of this makes good sense. I left review comments. A summary and some additional items:
```c
{
    meminfo_t meminfo = parse_meminfo();

    if(meminfo.MemAvailableMiB < (size_MB + meminfo.MemTotalMiB - threshold))
```
There should be a mode to tally this synthetically (i.e., increment in ls_add() and decrement in ls_free(), and act when that tally passes some threshold). A user might (and probably will) want to ration memory across server ranks in order to maintain a certain headroom per process.
Yes, we can definitely add a memory-usage tracker for ls_add() and ls_remove(). But the problem, and the reason I didn't implement it this way, is that we don't actually allocate a new memory region when we add an od to the ls (ls_remove() really does release the memory). Instead, we allocate the memory for the od before the bulk transfer, assign od->data as the bulk transfer destination, and then attach this od's pointer to the ls. So if we tracked memory usage to prevent overflow this way, we might already have hit a segfault before the check in ls_add() runs.
We can address the order-of-operations issue by keeping a memory tally for local storage and adding setters and getters for it. In practice, running out of physical memory won't segfault; the system will start swapping into virtual memory.
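To make the suggestion concrete, here is a minimal sketch of such a tally, assuming hypothetical names (`ls_tally_*`, the quota variables) that are not part of the actual DataSpaces API. The key point is that the reservation happens *before* the od buffer is allocated for the bulk transfer, so the check precedes the allocation rather than living inside ls_add():

```c
#include <assert.h>
#include <stddef.h>

/* Running count of bytes attached to local storage, plus a quota.
 * Illustrative globals; a real server would keep these per-process. */
static size_t ls_tally_bytes = 0;
static size_t ls_quota_bytes = 0;

void ls_tally_set_quota(size_t quota) { ls_quota_bytes = quota; }
size_t ls_tally_get(void) { return ls_tally_bytes; }

/* Reserve space ahead of the bulk-transfer allocation. Returns 1 if the
 * reservation fits under the quota, 0 if the caller should swap staged
 * objects out first. A quota of 0 means "unlimited". */
int ls_tally_reserve(size_t nbytes)
{
    if (ls_quota_bytes && ls_tally_bytes + nbytes > ls_quota_bytes)
        return 0;
    ls_tally_bytes += nbytes;
    return 1;
}

/* Called from the release path (e.g. alongside ls_remove()). */
void ls_tally_release(size_t nbytes)
{
    assert(nbytes <= ls_tally_bytes);
    ls_tally_bytes -= nbytes;
}
```

With this, put_rpc() would call ls_tally_reserve() before allocating od->data, avoiding the situation where the overflow has already happened by the time ls_add() checks.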
…d FILEBACKEND option for swap
…swap headers; add find_package for NetCDF
…ther functions to it
CMakeLists.txt
Outdated
```cmake
endif()

if(HDF5_FOUND OR NetCDF_FOUND)
    option(FILEBACKEND_FOUND "Option to Enable File Swap on DataSpaces Server" ON)
```
Might want `set` here instead of `option`.
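For context, a sketch of what the suggestion would look like: `option()` creates a user-editable cache entry that persists across reconfigures and is not overwritten once set, whereas here `FILEBACKEND_FOUND` is derived from the HDF5/NetCDF detection results, so a plain `set()` keeps it in sync with what was actually found:

```cmake
# Derived value, not a user choice: set() rather than option()
if(HDF5_FOUND OR NetCDF_FOUND)
    set(FILEBACKEND_FOUND ON)
else()
    set(FILEBACKEND_FOUND OFF)
endif()
```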
This PR adds a swap space for dspaces for when the staging memory is about to become insufficient.

The user can specify the staging memory quota in the `dspaces_conf.toml` file. When `put_rpc()` on the server side is about to allocate the buffer for the bulk data transfer, it checks whether the memory quota has been reached and, if so, starts writing staged data objects into an HDF5 file. The policy for which staged data object is popped out can also be specified in the `dspaces_conf.toml` file. Currently, we support FIFO (default), LIFO, and LRU.

When reading data, specifically when `get_rpc()` is called on the server side, it first checks whether the queried data object is in the staging memory. If not found, it tries to read the data back from the HDF5 file into the staging memory. If the newly allocated read buffer would also exceed the staging memory quota, it first pops objects from the staging memory according to the user-specified policy until the server has enough memory to read the data back.
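The read-back path described above can be sketched as an eviction loop. This is a hedged illustration only; the names (`staging_used`, `staging_quota`, `pop_one_object`, `make_room`) are hypothetical stand-ins for the server's internals, and the fixed object size stands in for real per-object bookkeeping:

```c
#include <stddef.h>

typedef enum { POLICY_FIFO, POLICY_LIFO, POLICY_LRU } swap_policy_t;

/* Illustrative globals for staging-memory accounting. */
static size_t staging_used  = 0;
static size_t staging_quota = 0;

/* Stand-in for writing the policy-selected staged object to the HDF5
 * swap file and freeing its staging buffer; returns bytes released.
 * A real implementation would pick the object per the FIFO/LIFO/LRU
 * policy and release its actual size. */
static size_t pop_one_object(swap_policy_t policy)
{
    (void)policy;
    size_t released = 16;   /* illustrative fixed object size */
    staging_used -= released;
    return released;
}

/* Evict staged objects until `need` more bytes fit under the quota,
 * then account for the new read buffer. Returns 0 if the request can
 * never fit, 1 on success. */
static int make_room(size_t need, swap_policy_t policy)
{
    if (need > staging_quota)
        return 0;
    while (staging_used + need > staging_quota)
        pop_one_object(policy);
    staging_used += need;
    return 1;
}
```

In this sketch, `get_rpc()` would call `make_room()` before allocating the buffer that receives the data read back from the HDF5 file, mirroring the quota check `put_rpc()` performs on the write path.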