buffer: speed up Buffer.prototype.copy via view-level copy

ronag · claude · ronag · commit afe87da73dcf · 2026-06-13T22:09:21.000+02:00
Route the native backing of Buffer.prototype.copy (CopyImpl, the `_copy` binding) through the new v8::ArrayBufferView::CopyArrayBufferViewBytes API instead of v8::ArrayBuffer::CopyArrayBufferBytes.

The previous binding had to do several things before each copy. It converted both views to ArrayBuffers (ArrayBufferView::Buffer()), read their byte offsets (ByteOffset()) and tested whether they were shared (IsSharedArrayBuffer()). That is about half a dozen separate V8 API calls per copy. The view-level API does all of this internally from the views' own fields in a single call, so the binding now just forwards the two views, the view-relative offsets and the length.

Profiling on AMD EPYC 9135 (x86-64) attributed the small-copy cost almost entirely to that view-to-buffer conversion. ArrayBufferView::Buffer() / JSTypedArray::GetBuffer() alone accounted for ~25% of runtime, and it was paid on every call, twice per copy (once per view). Resolving the buffer in JS instead (passing source.buffer/target.buffer to the binding) was measured and is worse. The typed-array `.buffer` getter is not JIT-inlined and dispatches through the CEntry trampoline to a C++ builtin, which cost ~36% of runtime.

The view-level copy keeps all existing semantics:
- byte-range clamping;
- a no-op (0 bytes copied) on a detached or immutable target;
- a relaxed-atomic memmove when both sides are SharedArrayBuffer-backed;
- a plain memmove otherwise.

The JS-side view clamping in copyImpl is retained because V8 clamps to the underlying backing store. For pooled Buffers, that store is the whole shared pool rather than the individual view.
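
A minimal sketch of the semantics listed above. It uses only public Node APIs (`node:buffer`, `structuredClone` with a transfer list, `SharedArrayBuffer`). The 16-byte ArrayBuffer is an illustrative stand-in for the shared Buffer pool, and the variable names are hypothetical.

```javascript
const { Buffer } = require('node:buffer');

const source = Buffer.from('ABCDEFGHIJ'); // 10 bytes

// 1. View-relative clamping: a 4-byte Buffer over bytes 4..7 of a 16-byte
//    store (stand-in for the pool). Only 4 bytes are copied, and the bytes
//    outside the view stay zero.
const store = new ArrayBuffer(16);
const target = Buffer.from(store, 4, 4);
const copied = source.copy(target);
const storeHex = Buffer.from(store).toString('hex');

// 2. Detached target: transferring the ArrayBuffer detaches it, so the
//    view's byteLength drops to 0 and copy() is a 0-byte no-op.
const detachable = new ArrayBuffer(8);
const detachedTarget = new Uint8Array(detachable);
structuredClone(detachable, { transfer: [detachable] });
const copiedToDetached = source.copy(detachedTarget);

// 3. Both sides SharedArrayBuffer-backed (the relaxed-atomic path per the
//    commit message).
const sharedSource = Buffer.from(new SharedArrayBuffer(4));
sharedSource.write('wxyz');
const sharedTarget = Buffer.from(new SharedArrayBuffer(4));
const copiedShared = sharedSource.copy(sharedTarget);

console.log({ copied, storeHex, copiedToDetached, copiedShared,
              shared: sharedTarget.toString() });
```
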

buffer-copy.js, median of 30 interleaved runs, AMD EPYC 9135 x86-64 (all changes p < 0.001, Welch t = 19-50):

  partial=false bytes=8:    42.2 -> 62.4 Mops/s (+48%)
  partial=true  bytes=8:    42.0 -> 62.8 Mops/s (+49%)
  partial=false bytes=128:  42.2 -> 61.7 Mops/s (+46%)
  partial=true  bytes=128:  42.0 -> 63.1 Mops/s (+50%)
  partial=false bytes=1024: 35.1 -> 47.3 Mops/s (+35%)
  partial=true  bytes=1024: 37.8 -> 55.1 Mops/s (+46%)

The gain is largest for small and medium copies, where per-call overhead dominates. It tapers for 1024-byte copies (most visibly partial=false, +35%) as the memmove itself takes a larger share of each call.

Also inlines the former _copyActual helper (its only caller was copyImpl) into copyImpl, folding a redundant target.byteLength read.

Refs: #55422
Signed-off-by: Robert Nagy <ronagy@icloud.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

diff --git a/deps/v8/src/api/api.cc b/deps/v8/src/api/api.cc
@@ -9021,7 +9021,7 @@ size_t v8::ArrayBufferView::CopyArrayBufferViewBytes(
   // A relaxed-atomic memmove is only required when both views are backed by a
   // SharedArrayBuffer; any other combination performs a plain memmove on the
   // backing store, matching v8::ArrayBuffer::CopyArrayBufferBytes.
-  if (src.is_shared || dst.is_shared) {
+  if (src.is_shared && dst.is_shared) {
     base::Relaxed_Memmove(
         reinterpret_cast<base::Atomic8*>(target_data),
         reinterpret_cast<const base::Atomic8*>(source_data), bytes_to_copy);
diff --git a/lib/buffer.js b/lib/buffer.js
@@ -261,15 +261,16 @@ function copyImpl(source, target, targetStart, sourceStart, sourceEnd) {
       throw new ERR_OUT_OF_RANGE('sourceEnd', '>= 0', sourceEnd);
   }
 
-  if (targetStart >= target.byteLength || sourceStart >= sourceEnd)
+  const targetLength = target.byteLength;
+  if (targetStart >= targetLength || sourceStart >= sourceEnd)
     return 0;
 
-  return _copyActual(source, target, targetStart, sourceStart, sourceEnd);
-}
-
-function _copyActual(source, target, targetStart, sourceStart, sourceEnd) {
-  if (sourceEnd - sourceStart > target.byteLength - targetStart)
-    sourceEnd = sourceStart + target.byteLength - targetStart;
+  // Clamp the copy length to what fits in the target and what remains in the
+  // source. V8 clamps to the underlying ArrayBuffer internally, but that is the
+  // backing store rather than this view, so the view-relative clamping is done
+  // here.
+  if (sourceEnd - sourceStart > targetLength - targetStart)
+    sourceEnd = sourceStart + targetLength - targetStart;
 
   let nb = sourceEnd - sourceStart;
   const sourceLen = source.byteLength - sourceStart;
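
A minimal sketch of the length computation copyImpl performs after the inline. It is written as a hypothetical pure function, `computeCopyLength`, so the clamping can be checked in isolation. The final `nb > sourceLen` cap is an assumption about the part of copyImpl this hunk does not show.

```javascript
// Hypothetical helper mirroring copyImpl's clamping after the inline. The
// arguments are assumed to be already-validated non-negative integers, as
// copyImpl guarantees before this point.
function computeCopyLength(sourceLength, targetLength, targetStart,
                           sourceStart, sourceEnd) {
  // Early exit: target start at or past the end, or an empty source range.
  if (targetStart >= targetLength || sourceStart >= sourceEnd)
    return 0;

  // Clamp to the room left in the target view.
  if (sourceEnd - sourceStart > targetLength - targetStart)
    sourceEnd = sourceStart + targetLength - targetStart;

  // Clamp to what remains in the source view (assumed: the unshown rest of
  // copyImpl caps nb at sourceLen like this).
  let nb = sourceEnd - sourceStart;
  const sourceLen = sourceLength - sourceStart;
  if (nb > sourceLen)
    nb = sourceLen;
  return nb;
}

console.log(computeCopyLength(10, 4, 0, 0, 10)); // 4 (target is the limit)
console.log(computeCopyLength(10, 8, 2, 5, 20)); // 5 (source is the limit)
```
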
diff --git a/src/node_buffer.cc b/src/node_buffer.cc
@@ -605,33 +605,20 @@ size_t CopyImpl(Local<Value> source_obj,
                 const size_t target_start,
                 const size_t source_start,
                 const size_t to_copy) {
-  Local<ArrayBufferView> source = source_obj.As<ArrayBufferView>();
-  Local<ArrayBufferView> target = target_obj.As<ArrayBufferView>();
-
-  Local<ArrayBuffer> source_ab = source->Buffer();
-  Local<ArrayBuffer> target_ab = target->Buffer();
-
-  const size_t source_offset = source->ByteOffset() + source_start;
-  const size_t target_offset = target->ByteOffset() + target_start;
-
-  // Defer byte-range clamping and detached/immutable handling to V8. When both
-  // sides are backed by a SharedArrayBuffer the relaxed atomic overload is
-  // used, which honors the SharedArrayBuffer memory model. Any other
-  // combination (both regular, or one of each) goes through the ArrayBuffer
-  // overload: it operates on the underlying backing store regardless of
-  // shared-ness, so a plain memmove is performed (matching the historical
-  // behavior for SharedArrayBuffer-backed buffers). The V8 API has no overload
-  // that mixes ArrayBuffer and SharedArrayBuffer, so the two must never be
-  // cross-cast.
-  if (source_ab->IsSharedArrayBuffer() && target_ab->IsSharedArrayBuffer()) {
-    return source_ab.As<SharedArrayBuffer>()->CopyArrayBufferBytes(
-        source_offset,
-        to_copy,
-        target_ab.As<SharedArrayBuffer>(),
-        target_offset);
-  }
-  return source_ab->CopyArrayBufferBytes(
-      source_offset, to_copy, target_ab, target_offset);
+  // Defer byte-range clamping and detached/immutable/shared handling to V8.
+  // CopyArrayBufferViewBytes resolves the views' data pointers directly,
+  // without materializing their ArrayBuffers (ArrayBufferView::Buffer /
+  // JSTypedArray::GetBuffer), which dominates the per-call cost for small
+  // copies. When both views are backed by a SharedArrayBuffer it performs a
+  // relaxed-atomic memmove honoring the SharedArrayBuffer memory model; any
+  // other combination performs a plain memmove on the backing store (matching
+  // the historical behavior for SharedArrayBuffer-backed buffers).
+  return ArrayBufferView::CopyArrayBufferViewBytes(
+      source_obj.As<ArrayBufferView>(),
+      source_start,
+      target_obj.As<ArrayBufferView>(),
+      target_start,
+      to_copy);
 }
 
 // Assume caller has properly validated args.
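
The JS-side clamp is retained because, according to the commit message, V8 clamps against the backing store rather than the view. The following small model compares the two bounds. `backingStoreLimit` and `viewLimit` are hypothetical helpers, not Node or V8 APIs. An 8192-byte ArrayBuffer (the default `Buffer.poolSize`) stands in for the pool.

```javascript
// Hypothetical helpers comparing two upper bounds on how many bytes may be
// written starting at `start` within a typed-array view.

// Bound imposed by the whole backing store: what a callee that only clamps
// against the underlying ArrayBuffer would allow.
function backingStoreLimit(view, start) {
  return Math.max(0, view.buffer.byteLength - (view.byteOffset + start));
}

// Bound imposed by the view itself, which copyImpl enforces in JS.
function viewLimit(view, start) {
  return Math.max(0, view.byteLength - start);
}

// An 8192-byte store with a 16-byte view at offset 100, similar to a small
// pooled Buffer.
const pool = new ArrayBuffer(8192);
const slice = new Uint8Array(pool, 100, 16);

console.log(backingStoreLimit(slice, 0)); // 8092
console.log(viewLimit(slice, 0));         // 16
```

If only the backing-store bound were applied, a copy into `slice` could run 8076 bytes past the end of the view, into neighboring pooled Buffers.
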