rts: halve max heap, return memory to OS after parsing spikes #59
Closed
ccomb wants to merge 1 commit into
Conversation
The previous flags caused parsing-heavy startups to be killed by the kernel OOM-killer on a 16 GB host:

- `-M` was 75 % of RAM. Combined with MUMPS Fortran allocations (outside the GHC heap), parser-intermediate RSS overhead, and kernel/page-cache needs, RSS reached the cgroup limit before `-M` could trigger a clean Haskell heap-exhaustion exit. Now 50 %, leaving real headroom.
- `-I30` deferred the idle major GC for 30 s, so live-data drops after a parsing spike weren't reflected in heap accounting promptly. Back to the GHC default of 0.3 s.
- Added `-Fd1.0` (GHC 9.10+): decays free heap blocks back to the OS over ~1 idle period instead of the default 4.0, which pins RSS near the peak for minutes after parsing finishes.
- `-A`, `-c`, `-F1.5`, `-qg0` are left as-is; they trade off with parser throughput and shouldn't move without measurement.
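A flag-computation script along these lines could produce the new set (a sketch only; the project's actual script may differ, and in a container the cgroup limit should take precedence over `MemTotal`):

```shell
#!/bin/sh
# Sketch: emit GHC RTS flags with -M capped at 50 % of host RAM,
# leaving headroom for MUMPS Fortran allocations (outside the GHC
# heap), parser-intermediate RSS, and the kernel page cache.

# Total RAM in MiB from /proc/meminfo (the MemTotal field is in kB).
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
mem_mb=$((mem_kb / 1024))

# Half of RAM for the GHC heap.
max_heap_mb=$((mem_mb / 2))

# -I0.3: GHC-default idle major GC; -Fd1.0: return free blocks to
# the OS over ~1 idle period (GHC 9.10+).
echo "+RTS -M${max_heap_mb}M -I0.3 -Fd1.0 -RTS"
```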
ccomb added a commit that referenced this pull request on May 16, 2026
Two cheap RTS tweaks (cherry-picked from #59, minus the `-M` change):

- Add `-Fd1.0` (GHC 9.10+): decay free heap blocks back to the OS over ~1 idle period instead of the default 4.0, which keeps RSS pinned near peak for minutes after a parsing spike.
- `-I30` -> `-I0.3` (GHC default): trigger the idle-time major GC promptly. The previous 30 s deferral hid live-data drops and starved `-Fd` of free blocks to release.

Keeping `-M` at 75 % of RAM for now: dropping to 50 % may be the right call eventually, but the OpenBLAS musl crash that motivated the change in #59 is fixed independently in the previous commit. Re-evaluate `-M` once we have RSS curves on the 8 GB target.
ccomb added a commit that referenced this pull request on May 16, 2026
…ory hygiene (#60)

Two independent fixes for running VoLCA on an 8 GB RAM VM.

1) OpenBLAS pthread stack on musl. Static Alpine/musl builds segfaulted (exit 139 / SIGSEGV) inside MUMPS factorization on the first dense BLAS3 call. musl's hardcoded 128 KB default pthread stack (vs glibc's 8 MB from RLIMIT_STACK) is overflowed by OpenBLAS DYNAMIC_ARCH Fortran kernels with large auto-arrays. Patch driver/others/blas_server.c during the Docker build to call pthread_attr_setstacksize(&attr, 8 << 20) right after pthread_attr_init. Two grep guards bracket the sed so a future upstream refactor fails the Docker build loudly instead of silently regressing the runtime.

2) RTS memory return to OS. Add -Fd1.0 (GHC 9.10+) to decay free heap blocks back to the OS over ~1 idle period instead of the default 4.0, which keeps RSS pinned near peak for minutes after a parsing spike. -I30 -> -I0.3 to trigger the idle-time major GC promptly; the previous 30 s deferral hid live-data drops and starved -Fd of free blocks. -M kept at 75 % of RAM; re-evaluate once we have RSS curves on the 8 GB target.

Cherry-picked the cheap RTS subset from #59 (now closed); the OpenBLAS fix is the load-bearing change for the 8 GB target.
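The guarded-sed pattern from point 1 might look like the following in the Docker build stage. This is a sketch: the stand-in source line is an assumption (the real blas_server.c differs), and GNU sed `-i` semantics are assumed.

```shell
#!/bin/sh
set -e
# Sketch of the grep-guarded sed patch: if upstream OpenBLAS refactors
# the anchor line away, the build fails loudly at guard 1 instead of
# silently shipping a binary with the 128 KB musl default stack.

f=blas_server_excerpt.c
# Stand-in for the relevant upstream line so this sketch is runnable:
printf '  pthread_attr_init(&attr);\n' > "$f"

# Guard 1: the anchor we patch after must still exist.
grep -q 'pthread_attr_init' "$f"

# Insert an 8 MB pthread stack size right after pthread_attr_init.
# In the replacement, & is the whole match and \& a literal ampersand.
sed -i 's/pthread_attr_init(&attr);/&\n  pthread_attr_setstacksize(\&attr, 8 << 20);/' "$f"

# Guard 2: confirm the patch actually landed.
grep -q 'pthread_attr_setstacksize' "$f"
```

With `set -e`, either failing `grep -q` aborts the build at that step, which is the "fail loudly" property the commit message describes.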
Summary
On a 16 GB host, parsing-heavy startups were being killed by the kernel OOM-killer. RTS flags computed by `docker/rts-flags.sh` left no headroom and held memory long after the parsing spike was over.

- `-M`: 75 % → 50 % of RAM. The previous 75 % cap competed with MUMPS Fortran allocations (outside the GHC heap), transient parser-intermediate RSS, and kernel/page-cache needs. RSS hit the cgroup limit before `-M` could trigger a clean Haskell heap-exhaustion exit. 50 % leaves real headroom and lets the runtime fail gracefully on overload.
- `-I30` → `-I0.3` (GHC default). A 30 s idle-GC delay hid live-data drops after parsing, so the runtime kept the post-spike heap inflated for far too long.
- `-Fd1.0` (GHC 9.10+). Decays free heap blocks back to the OS over ~1 idle period. Without it, the default decay (4.0) pins RSS near the peak for minutes after parsing finishes.
- `-A`, `-c`, `-F1.5`, `-qg0` are left alone; they trade off with parser throughput and shouldn't move without benchmarking.

Expected effect
On a 16 GB host with a parsing workload:
Test plan
- Check the `RTS: ... -> +RTS ...` summary on stderr to confirm the new flags appear and `-M` is half the cgroup limit.
- (`-M2048M`).
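The "-M is half the cgroup limit" check could be scripted roughly as follows. This is a sketch with stand-in values: in a real run the limit would come from `/sys/fs/cgroup/memory.max` and the flag string from the captured stderr summary.

```shell
#!/bin/sh
# Sketch: verify that the -M value in the RTS flag summary equals
# half the cgroup memory limit. Both inputs below are stand-ins.

limit_bytes=$((16 * 1024 * 1024 * 1024))    # stand-in cgroup limit (16 GiB)
rts_flags="+RTS -M8192M -I0.3 -Fd1.0 -RTS"  # stand-in stderr summary line

# Extract the -M value in MiB and compute half the limit in MiB.
m_mb=$(echo "$rts_flags" | sed -n 's/.*-M\([0-9]*\)M.*/\1/p')
half_mb=$((limit_bytes / 2 / 1024 / 1024))

if [ "$m_mb" -eq "$half_mb" ]; then
  echo "OK: -M${m_mb}M is half the ${limit_bytes}-byte limit"
else
  echo "FAIL: -M${m_mb}M, expected -M${half_mb}M" >&2
  exit 1
fi
```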