Commit afe87da
buffer: speed up Buffer.prototype.copy via view-level copy
Route the native backing of Buffer.prototype.copy (CopyImpl, the `_copy`
binding) through the new v8::ArrayBufferView::CopyArrayBufferViewBytes API
instead of v8::ArrayBuffer::CopyArrayBufferBytes. The previous binding had
to convert both views to ArrayBuffers (ArrayBufferView::Buffer()), read
their byte offsets (ByteOffset()) and test shared-ness
(IsSharedArrayBuffer()) before the copy -- around half a dozen separate V8
API calls per copy. The view-level API does all of that internally from
the views' own fields in a single call, so the binding now just forwards
the two views, the view-relative offsets and the length.
Profiling on an AMD EPYC 9135 (x86-64) attributed the small-copy cost almost
entirely to that view->buffer conversion: ArrayBufferView::Buffer() /
JSTypedArray::GetBuffer() alone was ~25% of runtime, paid on every call,
twice per copy. Resolving the buffer in JS instead (passing
source.buffer/target.buffer to the binding) was measured and is worse:
the typed-array `.buffer` getter is not JIT-inlined and dispatches through
the CEntry trampoline to a C++ builtin, costing ~36%.
The view-level copy keeps all existing semantics: byte-range clamping,
no-op (0 bytes) on a detached or immutable target, relaxed-atomic memmove
when both sides are SharedArrayBuffer-backed, plain memmove otherwise. The
JS-side view clamping in copyImpl is retained: V8 clamps to the underlying
backing store, which for pooled Buffers is the whole shared pool rather
than the individual view.
buffer-copy.js, median of 30 interleaved runs, AMD EPYC 9135 x86-64
(all changes p < 0.001, Welch t = 19-50):
partial=false bytes=8: 42.2 -> 62.4 Mops/s (+48%)
partial=true bytes=8: 42.0 -> 62.8 Mops/s (+49%)
partial=false bytes=128: 42.2 -> 61.7 Mops/s (+46%)
partial=true bytes=128: 42.0 -> 63.1 Mops/s (+50%)
partial=false bytes=1024: 35.1 -> 47.3 Mops/s (+35%)
partial=true bytes=1024: 37.8 -> 55.1 Mops/s (+46%)
The gain is largest for small/medium copies, where per-call overhead
dominates, and tapers for 1024-byte copies as the memmove itself grows.
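
For a quick local sanity check, a throughput loop along these lines could be used. This is a simplified sketch, not buffer-copy.js: `measureCopy` is a hypothetical helper, the meaning of `partial` (start the copy at the source midpoint) is an assumption, and a single timed loop is far noisier than a median over 30 interleaved runs.

```javascript
'use strict';

// Hypothetical micro-benchmark: returns Buffer.prototype.copy throughput
// in Mops/s for n back-to-back copies.
function measureCopy(bytes, partial, n) {
  const source = Buffer.alloc(bytes, 1);
  const target = Buffer.alloc(bytes);
  // Assumption: "partial" means copying only the second half of the source.
  const sourceStart = partial ? Math.floor(bytes / 2) : 0;

  const begin = process.hrtime.bigint();
  for (let i = 0; i < n; i++) {
    source.copy(target, 0, sourceStart);
  }
  const elapsedNs = Number(process.hrtime.bigint() - begin);

  return n / (elapsedNs / 1e9) / 1e6;
}

for (const bytes of [8, 128, 1024]) {
  for (const partial of [false, true]) {
    const mops = measureCopy(bytes, partial, 1e6);
    console.log(`partial=${partial} bytes=${bytes}: ${mops.toFixed(1)} Mops/s`);
  }
}
```

Absolute numbers from such a loop depend heavily on the machine and Node.js build, so only before/after ratios measured on the same host are meaningful.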
Also inline the former _copyActual helper (its only caller was copyImpl) into
copyImpl, which folds away a redundant target.byteLength read.
Refs: #55422
Signed-off-by: Robert Nagy <ronagy@icloud.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent 4fdf44b commit afe87da
3 files changed
Lines changed: 23 additions & 35 deletions