fix(ublk): survive buffer-pool mmap OOM at worker init instead of aborting #70
Merged
…rting

The per-worker USER_COPY bounce pool mmapped + pre-faulted ~32 MB at worker init and `.expect()`-panicked on failure. With 16 workers and many exports recovering at once, a transient host memory spike at startup made one mmap return ENOMEM; the panic hook fired, then even tiny heap allocations failed and the Rust allocator called abort() (SIGABRT). systemd's on-failure restart re-slammed the same 512 MB burst into a tight host every 10s, re-triggering the OOM until it hit the start-limit and gave up: a self-amplifying crash loop that only escaped when host pressure happened to ease.

The pool is a performance optimization (bounded bounce RSS), so the correct degraded behavior when it can't be mmapped is a heap buffer, not a dead daemon:

- `worker_pool()` now returns `Option` and never panics. Success is cached; failure is deliberately NOT cached, so a later I/O retries the mmap and the worker upgrades back to the bounded fast path on its own once the host recovers. Logging is throttled to one line per degrade→recover transition (not per I/O while starved).
- New `IoBuf` enum (pooled slot or heap vec) and `acquire_io_buf()` give the hot path a single uniform buffer type. `io_task_user_copy` acquires through it; the bounded backpressure path is unchanged when the pool exists.
- New `GLOBAL_HEAP_FALLBACKS` counter + `glidefs_ublk_buffer_pool_heap_fallbacks_total` metric so sustained degradation (a worker stuck on heap buffers) is alertable.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Problem
The per-worker USER_COPY bounce pool mmaps and pre-faults ~32 MB at worker init (16 workers × 256 slots × 128 KB = 512 MB committed in a startup burst), on top of recovering ~66 exports' caches. A transient host memory spike made one mmap return `ENOMEM`; the init path did `.expect(...)` → panic → the allocator then failed even 184-byte allocations → `abort()`. systemd's `on-failure` restart re-slammed the same 512 MB burst into a tight host every 10 s, re-triggering the OOM: a self-amplifying crash loop that only escaped when host pressure happened to ease.

Fix
The pool is a performance optimization (a bounded-RSS bounce buffer). The correct degraded behavior when it can't be mmapped is a heap buffer, not a dead daemon.
- `worker_pool()` returns `Option` and never panics. Success is cached; failure is deliberately not cached, so a later I/O retries the mmap and the worker upgrades back to the bounded fast path once the host recovers. Logging is throttled to one line per degrade→recover transition (not per I/O while starved).
- `IoBuf` enum (pooled slot or heap vec) + `acquire_io_buf()` give the hot path a single uniform buffer type. The bounded async-backpressure path is unchanged when the pool exists.
- `GLOBAL_HEAP_FALLBACKS` counter + `glidefs_ublk_buffer_pool_heap_fallbacks_total` metric so a worker stuck on heap buffers (sustained degradation) is alertable.

Behavior change
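In behavioral terms, the fallback described under Fix can be sketched roughly as follows. This is an illustrative model, not the PR's code: the names `worker_pool`, `IoBuf`, `acquire_io_buf` and `GLOBAL_HEAP_FALLBACKS` come from the description above, but their signatures here are guesses, and `Pool`, `Worker`, the `try_mmap` closure (standing in for the real mmap + pre-fault call) and the `log` vector (standing in for real logging) are hypothetical. Slot accounting and backpressure are omitted.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Counter name is from the PR description; the `AtomicU64` type is an assumption.
pub static GLOBAL_HEAP_FALLBACKS: AtomicU64 = AtomicU64::new(0);

/// Hypothetical stand-in for the mmapped, pre-faulted slot pool.
pub struct Pool {
    pub slot_size: usize,
}

/// Pooled slot or heap vec, per the PR description; the fields are assumptions.
pub enum IoBuf {
    Pooled { len: usize },
    Heap(Vec<u8>),
}

/// Hypothetical per-worker state.
pub struct Worker {
    pub pool: Option<Pool>, // set only after a successful allocation
    pub degraded: bool,     // remembers the last state so logging fires on transitions only
    pub log: Vec<String>,   // stand-in for real log output
}

impl Worker {
    pub fn new() -> Self {
        Worker { pool: None, degraded: false, log: Vec::new() }
    }

    /// Never panics. Success is cached; failure is not, so the next call retries.
    pub fn worker_pool(&mut self, try_mmap: impl FnOnce() -> Option<Pool>) -> Option<&Pool> {
        if self.pool.is_none() {
            match try_mmap() {
                Some(pool) => {
                    if self.degraded {
                        self.log.push("buffer pool recovered".to_string());
                        self.degraded = false;
                    }
                    self.pool = Some(pool);
                }
                None => {
                    // Log once on entering the degraded state, not on every starved I/O.
                    if !self.degraded {
                        self.log.push("buffer pool unavailable, using heap buffers".to_string());
                        self.degraded = true;
                    }
                }
            }
        }
        self.pool.as_ref()
    }

    /// Single uniform buffer type for the hot path: pooled when possible, heap otherwise.
    pub fn acquire_io_buf(&mut self, len: usize, try_mmap: impl FnOnce() -> Option<Pool>) -> IoBuf {
        match self.worker_pool(try_mmap) {
            Some(pool) => IoBuf::Pooled { len: len.min(pool.slot_size) },
            None => {
                GLOBAL_HEAP_FALLBACKS.fetch_add(1, Ordering::Relaxed);
                IoBuf::Heap(vec![0u8; len])
            }
        }
    }
}
```

In this model a starved worker keeps serving I/O from heap buffers and bumps the counter on each fallback, while the log records only the degrade and recover transitions. Once an allocation succeeds, the pool is cached and `try_mmap` is not called again.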
Test plan
- `cargo build --release -p glidefs --features ublk`
- … `--features ublk`)
- … `systemctl reload`)

🤖 Generated with Claude Code