Commit afe87da

ronag and claude committed
buffer: speed up Buffer.prototype.copy via view-level copy
Route the native backing of Buffer.prototype.copy (CopyImpl, the `_copy` binding) through the new v8::ArrayBufferView::CopyArrayBufferViewBytes API instead of v8::ArrayBuffer::CopyArrayBufferBytes.

The previous binding had to convert both views to ArrayBuffers (ArrayBufferView::Buffer()), read their byte offsets (ByteOffset()) and test shared-ness (IsSharedArrayBuffer()) before the copy -- around half a dozen separate V8 API calls per copy. The view-level API does all of that internally from the views' own fields in a single call, so the binding now just forwards the two views, the view-relative offsets and the length.

Profiling on AMD EPYC 9135 (x86-64) attributed the small-copy cost almost entirely to that view->buffer conversion: ArrayBufferView::Buffer() / JSTypedArray::GetBuffer() alone was ~25% of runtime, paid every call and twice per copy. Resolving the buffer in JS instead (passing source.buffer/target.buffer to the binding) was measured and is worse: the typed-array `.buffer` getter is not JIT-inlined and dispatches through the CEntry trampoline to a C++ builtin, costing ~36%.

The view-level copy keeps all existing semantics: byte-range clamping, no-op (0 bytes) on a detached or immutable target, relaxed-atomic memmove when both sides are SharedArrayBuffer-backed, plain memmove otherwise. The JS-side view clamping in copyImpl is retained: V8 clamps to the underlying backing store, which for pooled Buffers is the whole shared pool rather than the individual view.

buffer-copy.js, median of 30 interleaved runs, AMD EPYC 9135 x86-64 (all changes p < 0.001, Welch t = 19-50):

  partial=false bytes=8:    42.2 -> 62.4 Mops/s (+48%)
  partial=true  bytes=8:    42.0 -> 62.8 Mops/s (+49%)
  partial=false bytes=128:  42.2 -> 61.7 Mops/s (+46%)
  partial=true  bytes=128:  42.0 -> 63.1 Mops/s (+50%)
  partial=false bytes=1024: 35.1 -> 47.3 Mops/s (+35%)
  partial=true  bytes=1024: 37.8 -> 55.1 Mops/s (+46%)

The gain is largest for small/medium copies, where per-call overhead dominates, and tapers for 1024-byte copies as the memmove itself grows.

Also inlines the former _copyActual helper (only caller was copyImpl) into copyImpl, folding a redundant target.byteLength read.

Refs: #55422
Signed-off-by: Robert Nagy <ronagy@icloud.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
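The point about pooled Buffers can be illustrated with plain typed arrays. The following is only a sketch: the 32-byte `pool`, the two 8-byte views and the helpers `roomInBackingStore` and `roomInView` are hypothetical names invented for illustration, not part of Node.js or V8. It shows that a clamp against the backing store alone would allow a write to run past the end of one view into its neighbour, while a clamp against the view does not.

```javascript
// Hypothetical model: one ArrayBuffer stands in for the shared Buffer pool,
// and two adjacent 8-byte views stand in for pooled Buffers.
const pool = new ArrayBuffer(32);
const first = new Uint8Array(pool, 0, 8);
const second = new Uint8Array(pool, 8, 8).fill(0xff);
const source = new Uint8Array(16).fill(1);

// Bytes that fit if only the backing store bounds the copy (illustrative).
function roomInBackingStore(view, start) {
  return view.buffer.byteLength - (view.byteOffset + start);
}

// Bytes that fit if the view itself bounds the copy (illustrative).
function roomInView(view, start) {
  return view.byteLength - start;
}

// The backing-store clamp would admit all 16 source bytes, spilling into
// `second`; the view clamp admits only the 8 bytes that `first` can hold.
const storeClamped = Math.min(source.length, roomInBackingStore(first, 0));
const viewClamped = Math.min(source.length, roomInView(first, 0));

// Copy using the view-relative clamp and check the neighbouring view.
first.set(source.subarray(0, viewClamped));
const neighbourIntact = second.every((byte) => byte === 0xff);
```

With these sizes the backing-store clamp is larger than the view clamp, which is why the JS-side clamping has to stay in place.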
1 parent 4fdf44b commit afe87da

3 files changed

Lines changed: 23 additions & 35 deletions

deps/v8/src/api/api.cc

Lines changed: 1 addition & 1 deletion
@@ -9021,7 +9021,7 @@ size_t v8::ArrayBufferView::CopyArrayBufferViewBytes(
   // A relaxed-atomic memmove is only required when both views are backed by a
   // SharedArrayBuffer; any other combination performs a plain memmove on the
   // backing store, matching v8::ArrayBuffer::CopyArrayBufferBytes.
-  if (src.is_shared && dst.is_shared) {
+  if (src.is_shared || dst.is_shared) {
     base::Relaxed_Memmove(
         reinterpret_cast<base::Atomic8*>(target_data),
         reinterpret_cast<const base::Atomic8*>(source_data), bytes_to_copy);

lib/buffer.js

Lines changed: 8 additions & 7 deletions
@@ -261,15 +261,16 @@ function copyImpl(source, target, targetStart, sourceStart, sourceEnd) {
     throw new ERR_OUT_OF_RANGE('sourceEnd', '>= 0', sourceEnd);
   }

-  if (targetStart >= target.byteLength || sourceStart >= sourceEnd)
+  const targetLength = target.byteLength;
+  if (targetStart >= targetLength || sourceStart >= sourceEnd)
     return 0;

-  return _copyActual(source, target, targetStart, sourceStart, sourceEnd);
-}
-
-function _copyActual(source, target, targetStart, sourceStart, sourceEnd) {
-  if (sourceEnd - sourceStart > target.byteLength - targetStart)
-    sourceEnd = sourceStart + target.byteLength - targetStart;
+  // Clamp the copy length to what fits in the target and what remains in the
+  // source. V8 clamps to the underlying ArrayBuffer internally, but that is the
+  // backing store rather than this view, so the view-relative clamping is done
+  // here.
+  if (sourceEnd - sourceStart > targetLength - targetStart)
+    sourceEnd = sourceStart + targetLength - targetStart;

   let nb = sourceEnd - sourceStart;
   const sourceLen = source.byteLength - sourceStart;
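The clamping in this hunk can be read in isolation as a pure function over lengths and offsets. The sketch below uses a hypothetical name, `clampedCopyLength`, and takes byte lengths instead of views. The final `nb > sourceLen` step is an assumption about how copyImpl continues past the lines shown in the hunk, inferred from the comment about "what remains in the source". Arguments are assumed to be already validated as non-negative integers.

```javascript
// Hypothetical standalone version of the length clamping shown in the hunk.
function clampedCopyLength(sourceLength, targetLength,
                           targetStart, sourceStart, sourceEnd) {
  // Nothing to copy if the target offset is past the end of the target,
  // or the requested source range is empty.
  if (targetStart >= targetLength || sourceStart >= sourceEnd)
    return 0;

  // Clamp to what fits in the target view.
  if (sourceEnd - sourceStart > targetLength - targetStart)
    sourceEnd = sourceStart + targetLength - targetStart;

  let nb = sourceEnd - sourceStart;

  // Assumption: clamp to what remains in the source view. This step is not
  // visible in the hunk and is reconstructed from the code comment.
  const sourceLen = sourceLength - sourceStart;
  if (nb > sourceLen)
    nb = sourceLen;

  return nb;
}

// A 16-byte source into an 8-byte target at offset 2 leaves room for 6 bytes.
const fitsInTarget = clampedCopyLength(16, 8, 2, 0, 16);
// A 4-byte source read from offset 1 has only 3 bytes left to give.
const leftInSource = clampedCopyLength(4, 8, 0, 1, 10);
// A target offset at the end of the target copies nothing.
const pastTargetEnd = clampedCopyLength(4, 8, 8, 0, 4);
```

Reading `target.byteLength` once into a local, as the hunk does with `targetLength`, corresponds to the `targetLength` parameter here being used three times without re-reading the view.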

src/node_buffer.cc

Lines changed: 14 additions & 27 deletions
@@ -605,33 +605,20 @@ size_t CopyImpl(Local<Value> source_obj,
                 const size_t target_start,
                 const size_t source_start,
                 const size_t to_copy) {
-  Local<ArrayBufferView> source = source_obj.As<ArrayBufferView>();
-  Local<ArrayBufferView> target = target_obj.As<ArrayBufferView>();
-
-  Local<ArrayBuffer> source_ab = source->Buffer();
-  Local<ArrayBuffer> target_ab = target->Buffer();
-
-  const size_t source_offset = source->ByteOffset() + source_start;
-  const size_t target_offset = target->ByteOffset() + target_start;
-
-  // Defer byte-range clamping and detached/immutable handling to V8. When both
-  // sides are backed by a SharedArrayBuffer the relaxed atomic overload is
-  // used, which honors the SharedArrayBuffer memory model. Any other
-  // combination (both regular, or one of each) goes through the ArrayBuffer
-  // overload: it operates on the underlying backing store regardless of
-  // shared-ness, so a plain memmove is performed (matching the historical
-  // behavior for SharedArrayBuffer-backed buffers). The V8 API has no overload
-  // that mixes ArrayBuffer and SharedArrayBuffer, so the two must never be
-  // cross-cast.
-  if (source_ab->IsSharedArrayBuffer() && target_ab->IsSharedArrayBuffer()) {
-    return source_ab.As<SharedArrayBuffer>()->CopyArrayBufferBytes(
-        source_offset,
-        to_copy,
-        target_ab.As<SharedArrayBuffer>(),
-        target_offset);
-  }
-  return source_ab->CopyArrayBufferBytes(
-      source_offset, to_copy, target_ab, target_offset);
+  // Defer byte-range clamping and detached/immutable/shared handling to V8.
+  // CopyArrayBufferViewBytes resolves the views' data pointers directly,
+  // without materializing their ArrayBuffers (ArrayBufferView::Buffer /
+  // JSTypedArray::GetBuffer), which dominates the per-call cost for small
+  // copies. When both views are backed by a SharedArrayBuffer it performs a
+  // relaxed-atomic memmove honoring the SharedArrayBuffer memory model; any
+  // other combination performs a plain memmove on the backing store (matching
+  // the historical behavior for SharedArrayBuffer-backed buffers).
+  return ArrayBufferView::CopyArrayBufferViewBytes(
+      source_obj.As<ArrayBufferView>(),
+      source_start,
+      target_obj.As<ArrayBufferView>(),
+      target_start,
+      to_copy);
 }

 // Assume caller has properly validated args.
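The equivalence this hunk relies on, that buffer-level offsets of `ByteOffset() + start` address the same bytes as view-relative offsets of `start`, can be modelled in JavaScript. This is a rough sketch only: `copyViaBuffers` and `copyViaViews` are hypothetical names that mimic the shape of the old and new native code, not the V8 APIs themselves. Both assume in-range arguments (as the native code assumes of its caller) and ignore the detached, immutable and shared cases.

```javascript
// Old shape (model): resolve each view to its ArrayBuffer, add the view's
// byteOffset to the view-relative start, then copy between the buffers.
function copyViaBuffers(source, sourceStart, target, targetStart, toCopy) {
  const sourceOffset = source.byteOffset + sourceStart;
  const targetOffset = target.byteOffset + targetStart;
  const src = new Uint8Array(source.buffer, sourceOffset, toCopy);
  const dst = new Uint8Array(target.buffer, targetOffset, toCopy);
  dst.set(src);
  return toCopy;
}

// New shape (model): forward the views and the view-relative offsets as-is.
// The parameter order mirrors the CopyArrayBufferViewBytes call in the hunk.
function copyViaViews(source, sourceStart, target, targetStart, toCopy) {
  target.set(source.subarray(sourceStart, sourceStart + toCopy), targetStart);
  return toCopy;
}

// A source view at a non-zero offset inside a larger buffer.
const sourceView = new Uint8Array(new ArrayBuffer(32), 4, 8);
sourceView.set([1, 2, 3, 4, 5, 6, 7, 8]);

// Two identical target views, also at a non-zero offset.
const targetA = new Uint8Array(new ArrayBuffer(16), 2, 8);
const targetB = new Uint8Array(new ArrayBuffer(16), 2, 8);

copyViaBuffers(sourceView, 1, targetA, 3, 4);
copyViaViews(sourceView, 1, targetB, 3, 4);

// Both strategies write bytes 2..5 of the source at target indices 3..6.
const resultA = Array.from(targetA);
const resultB = Array.from(targetB);
```

The model says nothing about performance; it only illustrates that the binding can drop the offset arithmetic without changing which bytes are copied.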