
feat: device alloc: appropriate entire pages, lazily #2465

Open: zyuiop wants to merge 2 commits into hermit-os:main from zyuiop:feat/devicealloc-claim-frames

@zyuiop zyuiop commented Jun 5, 2026

This is a partial rewrite of #1815.

When device memory is mapped at an offset, map it on demand instead of all at once at boot. To avoid a large number of TLB changes every time VirtIO allocates memory (which happens frequently), we keep the already-allocated pages in a dedicated list and reuse them before allocating any new frame from the physical free list.
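
To illustrate the reuse-before-claim idea, here is a minimal sketch under stated assumptions; it is not the code in this PR. `FrameSource`, `DevicePagePool`, `claim_frame`, `alloc_page`, `dealloc_page`, and the 4 KiB `PAGE_SIZE` are all hypothetical names and values chosen for the example. The page-table update is reduced to a counter, and `Vec` stands in for whatever list type a `no_std` kernel would actually use.

```rust
/// Assumed page size for this sketch (4 KiB).
const PAGE_SIZE: usize = 4096;

/// Hypothetical stand-in for the physical free list: hands out
/// consecutive page-aligned frame addresses until it runs out.
struct FrameSource {
    next: usize,
    end: usize,
}

impl FrameSource {
    fn new(start: usize, frames: usize) -> Self {
        Self { next: start, end: start + frames * PAGE_SIZE }
    }

    /// Returns a fresh frame address, or `None` when exhausted.
    fn claim_frame(&mut self) -> Option<usize> {
        if self.next >= self.end {
            return None;
        }
        let frame = self.next;
        self.next += PAGE_SIZE;
        Some(frame)
    }
}

/// Hypothetical pool of pages that were already set up for device use.
struct DevicePagePool {
    /// Pages released by the driver and ready to be handed out again.
    free_pages: Vec<usize>,
    /// Number of times a new frame had to be claimed (and, in a real
    /// kernel, mapped). Reusing a pooled page does not increase it.
    mappings_created: usize,
}

impl DevicePagePool {
    fn new() -> Self {
        Self { free_pages: Vec::new(), mappings_created: 0 }
    }

    /// Reuses a pooled page if one exists; otherwise claims a new frame.
    fn alloc_page(&mut self, frames: &mut FrameSource) -> Option<usize> {
        if let Some(page) = self.free_pages.pop() {
            return Some(page);
        }
        let frame = frames.claim_frame()?;
        // A real kernel would update the page tables here.
        self.mappings_created += 1;
        Some(frame)
    }

    /// Returns a page to the pool instead of to the physical free list.
    fn dealloc_page(&mut self, page: usize) {
        debug_assert_eq!(page % PAGE_SIZE, 0);
        self.free_pages.push(page);
    }
}
```

In this sketch, allocating a page, freeing it, and allocating again returns the same address without touching `FrameSource`, so `mappings_created` stays at 1. Only a request that finds the pool empty falls through to the free list.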

I considered using a dedicated Talc allocator instead, but that turned out to be too complicated: it would have required several VirtIO code paths to do address translation, which we don't want.

@github-actions github-actions Bot left a comment

Benchmark Results

| Benchmark | Current: 61b2b2f | Previous: 28c7089 | Performance Ratio |
| --- | --- | --- | --- |
| startup_benchmark Build Time | 114.10 s | 115.28 s | 0.99 |
| startup_benchmark File Size | 0.77 MB | 0.77 MB | 1.00 |
| Startup Time - 1 core | 0.97 s (±0.08 s) | 1.00 s (±0.03 s) | 0.97 |
| Startup Time - 2 cores | 1.00 s (±0.03 s) | 1.04 s (±0.05 s) | 0.97 |
| Startup Time - 4 cores | 1.00 s (±0.03 s) | 1.00 s (±0.06 s) | 1.00 |
| multithreaded_benchmark Build Time | 110.68 s | 118.33 s | 0.94 |
| multithreaded_benchmark File Size | 0.87 MB | 0.87 MB | 1.00 |
| Multithreaded Pi Efficiency - 2 Threads | 86.33 % (±18.50 %) | 92.34 % (±11.70 %) | 0.93 |
| Multithreaded Pi Efficiency - 4 Threads | 42.27 % (±8.63 %) | 45.83 % (±5.87 %) | 0.92 |
| Multithreaded Pi Efficiency - 8 Threads | 23.38 % (±4.51 %) | 25.71 % (±3.50 %) | 0.91 |
| micro_benchmarks Build Time | 93.83 s | 96.08 s | 0.98 |
| micro_benchmarks File Size | 0.88 MB | 0.88 MB | 1.00 |
| Scheduling time - 1 thread | 73.93 ticks (±5.07 ticks) | 73.85 ticks (±3.74 ticks) | 1.00 |
| Scheduling time - 2 threads | 41.94 ticks (±4.11 ticks) | 41.46 ticks (±4.37 ticks) | 1.01 |
| Micro - Time for syscall (getpid) | 3.87 ticks (±0.38 ticks) | 3.08 ticks (±0.19 ticks) | 1.26 |
| Memcpy speed - (built_in) block size 4096 | 73662.69 MByte/s (±50976.89 MByte/s) | 73208.48 MByte/s (±50616.82 MByte/s) | 1.01 |
| Memcpy speed - (built_in) block size 1048576 | 29617.10 MByte/s (±24255.76 MByte/s) | 29392.18 MByte/s (±24125.47 MByte/s) | 1.01 |
| Memcpy speed - (built_in) block size 16777216 | 22826.80 MByte/s (±18937.52 MByte/s) | 23751.74 MByte/s (±19605.56 MByte/s) | 0.96 |
| Memset speed - (built_in) block size 4096 | 73721.81 MByte/s (±51019.62 MByte/s) | 73313.38 MByte/s (±50689.78 MByte/s) | 1.01 |
| Memset speed - (built_in) block size 1048576 | 30379.97 MByte/s (±24707.69 MByte/s) | 30169.43 MByte/s (±24587.38 MByte/s) | 1.01 |
| Memset speed - (built_in) block size 16777216 | 23124.83 MByte/s (±19061.75 MByte/s) | 24402.29 MByte/s (±20008.02 MByte/s) | 0.95 |
| Memcpy speed - (rust) block size 4096 | 66485.41 MByte/s (±46327.60 MByte/s) | 66427.37 MByte/s (±46490.49 MByte/s) | 1.00 |
| Memcpy speed - (rust) block size 1048576 | 29532.36 MByte/s (±24236.61 MByte/s) | 29425.11 MByte/s (±24175.79 MByte/s) | 1.00 |
| Memcpy speed - (rust) block size 16777216 | 23399.38 MByte/s (±19391.96 MByte/s) | 24403.77 MByte/s (±20179.96 MByte/s) | 0.96 |
| Memset speed - (rust) block size 4096 | 66894.91 MByte/s (±46616.22 MByte/s) | 66262.32 MByte/s (±46403.75 MByte/s) | 1.01 |
| Memset speed - (rust) block size 1048576 | 30289.31 MByte/s (±24682.15 MByte/s) | 30172.11 MByte/s (±24599.55 MByte/s) | 1.00 |
| Memset speed - (rust) block size 16777216 | 23796.37 MByte/s (±19577.74 MByte/s) | 25069.95 MByte/s (±20590.63 MByte/s) | 0.95 |
| alloc_benchmarks Build Time | 91.58 s | 91.43 s | 1.00 |
| alloc_benchmarks File Size | 0.85 MB | 0.85 MB | 1.00 |
| Allocations - Allocation success | 100.00 % | 100.00 % | 1 |
| Allocations - Deallocation success | 100.00 % | 100.00 % | 1 |
| Allocations - Pre-fail Allocations | 100.00 % | 100.00 % | 1 |
| Allocations - Average Allocation time | 4442.26 Ticks (±46.38 Ticks) | 4633.96 Ticks (±62.19 Ticks) | 0.96 |
| Allocations - Average Allocation time (no fail) | 4442.26 Ticks (±46.38 Ticks) | 4633.96 Ticks (±62.19 Ticks) | 0.96 |
| Allocations - Average Deallocation time | 696.44 Ticks (±82.41 Ticks) | 683.28 Ticks (±99.25 Ticks) | 1.02 |
| mutex_benchmark Build Time | 93.18 s | 90.68 s | 1.03 |
| mutex_benchmark File Size | 0.88 MB | 0.88 MB | 1.00 |
| Mutex Stress Test Average Time per Iteration - 1 Threads | 13.82 ns (±0.52 ns) | 13.82 ns (±0.84 ns) | 1 |
| Mutex Stress Test Average Time per Iteration - 2 Threads | 17.10 ns (±0.61 ns) | 14.48 ns (±0.75 ns) | 1.18 |

This comment was automatically generated by workflow using github-action-benchmark.

@zyuiop zyuiop force-pushed the feat/devicealloc-claim-frames branch 2 times, most recently from 8d1886b to c9819a0 Compare June 5, 2026 16:11
@zyuiop zyuiop force-pushed the feat/devicealloc-claim-frames branch from c9819a0 to 61b2b2f Compare June 5, 2026 16:19