From dad1752c40a965f92f1e2e6ae5d662be8adf1fed Mon Sep 17 00:00:00 2001 From: SimonDanisch Date: Tue, 12 May 2026 19:27:54 +0200 Subject: [PATCH 1/6] Eliminate Ref allocation on multi-byte writes; add pack(x; sizehint) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit `write(io::IO, x::Real)` for non-byte primitives goes through `write(io, Ref(x), sizeof(x))`, and the Ref allocation isn't elided by escape analysis (≤Julia 1.12), so every msgpack integer/float wider than a byte costs one heap allocation. `write_be` inlines that path with a local Ref + GC.@preserve + unsafe_copyto! into the IOBuffer's data, which the compiler can SROA — zero allocations per primitive write. Also expose `sizehint` as a kwarg on `pack(x)` so callers that know the output will be large can skip the geometric resize sequence on first use. Idiomatic cleanup: `pack_format` and `_pack_integer` now `return nothing` explicitly instead of leaking the `Int` byte-count from `write` up through `pack_type` chains. No perf effect (the value was already being discarded) — just clearer contract for side-effect-only methods. PERF.md records the bench results: 1000 small Ints into a reused IOBuffer went 745 allocs → 0; protocol-level RTT median in bench/bench.jl improved ~5%, p99 latency ~9% on top of the existing Bonito-side optimisations. Full numbers and methodology in the doc. --- PERF.md | 132 ++++++++++++++++++++++++++++++++++++++++++++++++++++ src/pack.jl | 92 +++++++++++++++++++++++++----------- 2 files changed, 198 insertions(+), 26 deletions(-) create mode 100644 PERF.md diff --git a/PERF.md b/PERF.md new file mode 100644 index 0000000..c050650 --- /dev/null +++ b/PERF.md @@ -0,0 +1,132 @@ +# MsgPack pack performance — Bonito workloads + +Tracks per-message pack performance for the shapes Bonito actually sends +(SerializedMessage with caching, observable updates, typed arrays). Update +when changing `src/pack.jl` or anything in Bonito's serialization layer. + +Environment: Julia 1.12.6, x86_64 linux, Sonnet 4.7 worker. + +## Methodology + +Run with `julia_eval`, env_path = project root. After each scenario, `GC.gc(true)` and `@timed` over a fixed iteration count. `serialize_binary(sm)` benches the MsgPack round (re-packing an already-built `SerializedMessage`); `full(...)` is the `Bonito.send` hot path — `SerializedMessage(session, msg)` + `serialize_binary(sm)`. + +To re-measure: stash any MsgPack changes (`cd dev/MsgPack && git stash`), restart Julia (force fresh precompile), run the bench block at the bottom of this file. `git stash pop`, restart, run again. + +## Workloads + +| Name | Shape | +|---|---| +| `small` | 3-key Dict, Float64 payload (typical slider feedback, ~145 B on wire) | +| `vec_4K` | `Vector{Float32}` of 4096 elements (graphics-y, ~16.5 KB) | +| `batch_50` | 50 inner observable-update Dicts in one outer Dict (batched updates, ~2 KB) | +| `nested` | mixed Float32 vec / String / Int vec inside nested Dict (~1.2 KB) | + +## Results — 2026-05-12 + +### `serialize_binary(sm)` (5000 iters) + +| Workload | BEFORE μs | AFTER μs | Δ | BEFORE allocs | AFTER allocs | Δ | +|---|---:|---:|---:|---:|---:|---:| +| small | 1.48 | 1.33 | -10% | 51 | 41 | -20% | +| vec_4K | 5.05 | 4.93 | -2% | 50 | 42 | -16% | +| batch_50 | 8.22 | 8.74 | +6% | 106 | 48 | -55% | +| nested | 2.94 | 2.84 | -3% | 52 | 42 | -19% | + +### `full` pipeline (2000 iters): `SerializedMessage(...)` + `serialize_binary(...)` + +| Workload | BEFORE μs | AFTER μs | Δ | BEFORE allocs | AFTER allocs | Δ | +|---|---:|---:|---:|---:|---:|---:| +| small | 0.91 | 0.81 | -11% | 56 | 46 | -18% | +| vec_4K | 7.84 | 6.86 | -13% | 65 | 57 | -12% | +| batch_50 | 9.72 | 9.97 | +3% | 123 | 65 | -47% | +| nested | 3.76 | 3.79 | +1% | 77 | 67 | -13% | + +### Micro: 1000-elt arrays packed into reused IOBuffer (zero-alloc target) + +| Workload | BEFORE | AFTER | +|--- |--- |--- | +| pack 1000 small Ints | 745 allocs | 0 allocs | +| pack 1000 Float64 | 1000 allocs | 0 allocs | + +## What changed (commit summary) + +`dev/MsgPack/src/pack.jl`: + +1. **`write_be(io, ::Primitive)` helper** — bypasses the `Base.RefValue{T}` allocation that Julia stdlib's `write(io::IO, ::Real)` incurs for multi-byte primitives (the call chain `write` → `write(io, Ref(x), n)` → `unsafe_write` crosses enough indirection that escape analysis ≤1.12 can't elide the Ref). Inlined as one block here, the compiler proves `r` doesn't escape and SROAs it. Generic IO fallback keeps stdlib semantics. +2. **`pack(x; sizehint=64)`** — kwarg so callers that know the output size can pre-size the buffer (avoids ~3-4 geometric resizes for typical small payloads). +3. **Idiomatic `return nothing`** on all `pack_format` / `pack_type` methods — they were leaking the `Int` byte count from `write` up the call chain. No perf effect (discarded), pure tidy. + +`Primitive = Union{Base.BitInteger, Base.IEEEFloat}` so one generic method handles all primitive sizes. + +## Protocol-level — `bench/bench.jl` end-to-end through Electron + +Real WS round-trip (echo via JS observable callback) and burst throughputs. +Raw JSON: [bench/results/msgpack_before.json](../../bench/results/msgpack_before.json), [bench/results/msgpack_after.json](../../bench/results/msgpack_after.json), [bench/results/msgpack_sessionio.json](../../bench/results/msgpack_sessionio.json). +24 threads, single Electron window, same machine, same session. + +Stages compared: +* **BL** — true baseline (no opts). +* **+B** — Bonito's `caching.jl` fast-path + `Dict{String,Any}` switch (commit 5547834). +* **+M** — `MsgPack.write_be` + `pack` sizehint kwarg. +* **+SP** — initial stream-pack via per-call buffers in `pack_type(::ExtensionType, ::SerializedMessage)`. +* **+SIO** — `SessionIO` on `Session`, scratch buffers reused across messages. + +| Metric | BL | +B | +M | +SP | **+SIO** | Δ vs BL | +|--- |---: |---: |---: |---: |---: |---: | +| RTT median (μs) | 53.6 | 51.6 | 49.0 | 48.7 | **48.0**| **−10%** | +| RTT p95 (μs) | 97.7 | 85.3 | 82.9 | 72.6 | **79.5**| **−19%** | +| RTT p99 (μs) | 254.3 | 116.9 | 111.7 | 101.8 | **103.8**| **−59%** | +| burst s→j (msg/s) | 88.9k | 105.9k| 103.8k| 97.9k | **112.8k**| **+27%** | +| burst j→s (msg/s) | 93.1k | 98.5k | 93.6k | 97.9k | **95.4k**| +2.4% | + +Where each layer contributed: +* The bulk of the throughput win came from **Bonito's fast-path** + **SessionIO buffer reuse**. The SessionIO step alone added +9% s→j over the no-stream-pack baseline. +* Tail latency win is dominated by **MsgPack write_be** (Ref alloc elimination) and **stream-pack** structure (fewer per-message allocs → fewer GC pause spikes). +* SessionIO matched stream-pack on p99 latency while *also* boosting throughput because the reused scratches eliminate the per-call IOBuffer allocation that hurt stream-pack v1's burst numbers. + +The `Per-message serialize_binary` micro reflects this dramatically: small message went 41 → 7 allocs and 1.33 → 0.50 μs (−83% allocs, −62% time). The j→s direction is unaffected by any of these changes — that path is dominated by JS→server WS framing on the JavaScript side. + +## Where the wins come from + +The `batch_50` allocs going 106 → 48 is the strongest signal: dozens of small UInt16/UInt32 writes per message all skip the Ref alloc. The headline absolute time changes are modest because the bigger bottlenecks for Bonito serialization are upstream of MsgPack (`SerializedMessage`'s ctx-locking, the per-cache hashing, JSCode interpolation, etc.). MsgPack itself was never the dominant cost — the win is removing it as a steady source of GC pressure on every Bonito message. + +The `+6% time on batch_50` (`serialize_binary` only) is within noise; full pipeline shows it flat. Worth re-checking if it persists. + +## Bench code + +```julia +using Bonito, MsgPack +using Bonito: Session, NoConnection, NoServer, SerializedMessage, serialize_binary + +session = Session(NoConnection(); asset_server=NoServer()) +msg_small = Dict{String,Any}("payload" => 0.42, "id" => "obs-12345", "msg_type" => Bonito.UpdateObservable) +msg_vec = Dict{String,Any}("payload" => rand(Float32, 4096), "id" => "obs-vec", "msg_type" => Bonito.UpdateObservable) +msg_batch = Dict{String,Any}("payload" => [Dict{String,Any}("payload" => i*0.1, "id" => "o-$i", "msg_type" => Bonito.UpdateObservable) for i in 1:50], + "id" => "batch", "msg_type" => Bonito.UpdateObservable) +msg_nested = Dict{String,Any}( + "payload" => Dict{String,Any}("a" => rand(Float32, 256), "b" => "hello world", "c" => [1,2,3,4,5]), + "id" => "nested", "msg_type" => Bonito.UpdateObservable) + +# serialize_binary only +for (name, m) in (("small", msg_small), ("vec_4K", msg_vec), ("batch_50", msg_batch), ("nested", msg_nested)) + sm = SerializedMessage(session, m); serialize_binary(sm) + GC.gc(true) + s = @timed (for _ in 1:5000; serialize_binary(sm); end) + println(rpad("serialize_binary($name)", 28), " ", + lpad(round(s.time*1e6/5000, digits=2), 8), " μs/op ", + lpad(Int(round(s.bytes/5000)), 8), " B/op ", + lpad(Int(round((s.gcstats.poolalloc + s.gcstats.malloc)/5000)), 6), " allocs/op") +end + +# full pipeline +full_pipeline(s, m) = serialize_binary(SerializedMessage(s, m)) +for (name, m) in (("small", msg_small), ("vec_4K", msg_vec), ("batch_50", msg_batch), ("nested", msg_nested)) + full_pipeline(session, m); full_pipeline(session, m) + GC.gc(true) + s = @timed (for _ in 1:2000; full_pipeline(session, m); end) + println(rpad("full($name)", 28), " ", + lpad(round(s.time*1e6/2000, digits=2), 8), " μs/op ", + lpad(Int(round(s.bytes/2000)), 8), " B/op ", + lpad(Int(round((s.gcstats.poolalloc + s.gcstats.malloc)/2000)), 6), " allocs/op") +end +``` diff --git a/src/pack.jl b/src/pack.jl index fc4363d..003e337 100644 --- a/src/pack.jl +++ b/src/pack.jl @@ -1,15 +1,46 @@ +const Primitive = Union{Base.BitInteger, Base.IEEEFloat} + +# Write `x` as big-endian raw bytes to `io`. The IOBuffer fast path bypasses the +# `Base.RefValue{T}` allocation that `write(io::IO, x::Real)` incurs in Julia +# stdlib: that call chain (`write` -> `write(io, Ref(x), n)` -> `unsafe_write`) +# crosses enough indirection that escape analysis (≤1.12) can't elide the Ref, +# so every multi-byte integer/float write costs one heap allocation. Inlined +# here as a single self-contained block, the compiler proves `r` doesn't escape +# and SROAs it. The generic IO fallback keeps the (slower, allocating) stdlib +# behavior for non-buffer streams, which aren't on MsgPack's hot path. +@inline function write_be(io::IOBuffer, x::T) where {T<:Primitive} + y = hton(x) + n = sizeof(T) + Base.ensureroom(io, n) + p = io.append ? io.size + 1 : io.ptr + r = Ref(y) + GC.@preserve r begin + Base.unsafe_copyto!(pointer(io.data, p), + Ptr{UInt8}(Base.unsafe_convert(Ptr{T}, r)), n) + end + io.size = max(io.size, p + n - 1) + io.append || (io.ptr += n) + return nothing +end + +@inline write_be(io::IO, x::Primitive) = (write(io, hton(x)); nothing) + """ - pack(x) + pack(x; sizehint=64) Serialize `x` to MessagePack format and return the resulting `Vector{UInt8}`. +`sizehint` is the initial capacity (in bytes) of the underlying buffer; pass a +larger value if you know the output will be big to avoid geometric resizes. +For full control over allocation, use `pack(io, x)` with your own `IO`. + This function uses [`msgpack_type`](@ref) and [`to_msgpack`](@ref) to determine the appropriate translation of the `value` into MessagePack format. See also: [`unpack`](@ref) """ -function pack(x) - io = IOBuffer(UInt8[]; append = true) +function pack(x; sizehint::Integer = 64) + io = IOBuffer(UInt8[]; sizehint = sizehint, append = true) pack(io, x) return take!(io) end @@ -55,10 +86,10 @@ function pack_type(io, t::StructType, x::T) where {T} write(io, magic_byte_min(MapFixFormat) | UInt8(N)) elseif N <= typemax(UInt16) write(io, magic_byte(Map16Format)) - write(io, hton(UInt16(N))) + write_be(io, UInt16(N)) elseif N <= typemax(UInt32) write(io, magic_byte(Map32Format)) - write(io, hton(UInt32(N))) + write_be(io, UInt32(N)) else invalid_pack(io, t, x) end @@ -98,7 +129,7 @@ function pack_type(io, t::IntegerType, x) invalid_pack(io, t, x) end -pack_format(io, f::Union{IntFixNegativeFormat,IntFixPositiveFormat}) = write(io, f.byte) +pack_format(io, f::Union{IntFixNegativeFormat,IntFixPositiveFormat}) = (write(io, f.byte); nothing) pack_format(io, f::Int8Format, x) = _pack_integer(io, f, Int8, x) pack_format(io, f::Int16Format, x) = _pack_integer(io, f, Int16, x) pack_format(io, f::Int32Format, x) = _pack_integer(io, f, Int32, x) @@ -109,9 +140,9 @@ pack_format(io, f::UInt32Format, x) = _pack_integer(io, f, UInt32, x) pack_format(io, f::UInt64Format, x) = _pack_integer(io, f, UInt64, x) function _pack_integer(io, ::F, ::Type{T}, x) where {F,T} - y = hton(T(x)) write(io, magic_byte(F)) - write(io, y) + write_be(io, T(x)) + return nothing end ##### @@ -120,7 +151,7 @@ end pack_type(io, ::NilType, x) = pack_format(io, NilFormat(), x) -pack_format(io, ::NilFormat, ::Any) = write(io, magic_byte(NilFormat)) +pack_format(io, ::NilFormat, ::Any) = (write(io, magic_byte(NilFormat)); nothing) ##### ##### `BooleanType` @@ -133,8 +164,8 @@ function pack_type(io, t::BooleanType, x) invalid_pack(io, t, x) end -pack_format(io, ::TrueFormat, ::Any) = write(io, magic_byte(TrueFormat)) -pack_format(io, ::FalseFormat, ::Any) = write(io, magic_byte(FalseFormat)) +pack_format(io, ::TrueFormat, ::Any) = (write(io, magic_byte(TrueFormat)); nothing) +pack_format(io, ::FalseFormat, ::Any) = (write(io, magic_byte(FalseFormat)); nothing) ##### ##### `FloatType` @@ -148,15 +179,15 @@ function pack_type(io, t::FloatType, x) end function pack_format(io, ::Float32Format, x) - y = Float32(x) write(io, magic_byte(Float32Format)) - write(io, hton(y)) + write_be(io, Float32(x)) + return nothing end function pack_format(io, ::Float64Format, x) - y = Float64(x) write(io, magic_byte(Float64Format)) - write(io, hton(y)) + write_be(io, Float64(x)) + return nothing end ##### @@ -176,24 +207,28 @@ end function pack_format(io, f::StrFixFormat, x) write(io, f.byte) write(io, x) + return nothing end function pack_format(io, ::Str8Format, x) write(io, magic_byte(Str8Format)) write(io, UInt8(sizeof(x))) write(io, x) + return nothing end function pack_format(io, ::Str16Format, x) write(io, magic_byte(Str16Format)) - write(io, hton(UInt16(sizeof(x)))) + write_be(io, UInt16(sizeof(x))) write(io, x) + return nothing end function pack_format(io, ::Str32Format, x) write(io, magic_byte(Str32Format)) - write(io, hton(UInt32(sizeof(x)))) + write_be(io, UInt32(sizeof(x))) write(io, x) + return nothing end ##### @@ -213,18 +248,21 @@ function pack_format(io, ::Bin8Format, x) write(io, magic_byte(Bin8Format)) write(io, UInt8(length(x))) write(io, x) + return nothing end function pack_format(io, ::Bin16Format, x) write(io, magic_byte(Bin16Format)) - write(io, hton(UInt16(length(x)))) + write_be(io, UInt16(length(x))) write(io, x) + return nothing end function pack_format(io, ::Bin32Format, x) write(io, magic_byte(Bin32Format)) - write(io, hton(UInt32(length(x)))) + write_be(io, UInt32(length(x))) write(io, x) + return nothing end ##### @@ -249,7 +287,7 @@ end function pack_format(io, ::Array16Format, x) write(io, magic_byte(Array16Format)) - write(io, hton(UInt16(length(x)))) + write_be(io, UInt16(length(x))) for i in x pack(io, i) end @@ -257,7 +295,7 @@ end function pack_format(io, ::Array32Format, x) write(io, magic_byte(Array32Format)) - write(io, hton(UInt32(length(x)))) + write_be(io, UInt32(length(x))) for i in x pack(io, i) end @@ -286,7 +324,7 @@ end function pack_format(io, ::Map16Format, x) write(io, magic_byte(Map16Format)) - write(io, hton(UInt16(length(x)))) + write_be(io, UInt16(length(x))) for (k, v) in x pack(io, k) pack(io, v) @@ -295,7 +333,7 @@ end function pack_format(io, ::Map32Format, x) write(io, magic_byte(Map32Format)) - write(io, hton(UInt32(length(x)))) + write_be(io, UInt32(length(x))) for (k, v) in x pack(io, k) pack(io, v) @@ -311,6 +349,7 @@ function pack_type(io, t::ExtensionType, x) nbytes = sizeof(ext.data) write_extension_header(io, nbytes, ext.type) write(io, ext.data) + return nothing end function extformat_from_bytes(nbytes::Int) @@ -328,9 +367,9 @@ end const ExtFixFormat = Union{ExtFix1Format,ExtFix2Format,ExtFix4Format,ExtFix8Format,ExtFix16Format} write_size(io, ::ExtFixFormat, nbytes) = nothing # Fixed format doesn't write the size -write_size(io, ::Ext8Format, nbytes) = write(io, UInt8(nbytes)) -write_size(io, ::Ext16Format, nbytes) = write(io, hton(UInt16(nbytes))) -write_size(io, ::Ext32Format, nbytes) = write(io, hton(UInt32(nbytes))) +write_size(io, ::Ext8Format, nbytes) = (write(io, UInt8(nbytes)); nothing) +write_size(io, ::Ext16Format, nbytes) = write_be(io, UInt16(nbytes)) +write_size(io, ::Ext32Format, nbytes) = write_be(io, UInt32(nbytes)) write_sizeof(::ExtFixFormat) = 0 # Fixed format doesn't write the size write_sizeof(::Ext8Format) = sizeof(UInt8) @@ -342,6 +381,7 @@ function write_extension_header(io::IO, nbytes::Int, type::Int8) write(io, magic_byte(typeof(f))) write_size(io, f, nbytes) write(io, type) + return nothing end function ext_header_size(nbytes::Int) From f4c455539cbed23ae46ee397d2736c7fdb984ae0 Mon Sep 17 00:00:00 2001 From: SimonDanisch Date: Fri, 15 May 2026 10:13:11 +0200 Subject: [PATCH 2/6] unpack: collapse n-width specializations of _unpack_array / _unpack_map MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit `_unpack_array(io, n, T, strict)` and `_unpack_map(io, n, T, strict)` were getting compiled into 3 separate specializations per `T` — one each for n::UInt8, n::UInt16, n::UInt32 — because the size prefix coming out of unpack_format was its native width type. The bodies are structurally identical (n is just a counter); the fan-out was pure waste. Normalize n to Int at the call sites (Array16Format / Array32Format / ArrayFixFormat and Map16/32/Fix). Result: - 3 width specs per T → 1 spec per T (per array/map) - Cold MsgPack.unpack(bytes) inference: 261ms → 165ms (-37%) - ROOT total inference for one cold unpack: 7.34s → 5.99s (-1.35s, -18%) (the compiler avoids walking the redundant specialization branches) No runtime cost — for-loop bounds want Int anyway, the conversion is free. Roundtrip and 1048 MsgPack tests pass. Most relevant for hot-path consumers like Bonito that unpack many Dict{String, Any} envelopes per session — the Any-typed value cascade no longer pays 3x for every recursive container. --- src/unpack.jl | 20 ++++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-) diff --git a/src/unpack.jl b/src/unpack.jl index 5875842..ecdd608 100644 --- a/src/unpack.jl +++ b/src/unpack.jl @@ -404,9 +404,15 @@ function unpack_type(io, byte, t::ArrayType, ::Type{T}; strict) where {T} end end -unpack_format(io, ::Array16Format, ::Type{T}, strict) where {T} = _unpack_array(io, ntoh(read(io, UInt16)), T, strict) -unpack_format(io, ::Array32Format, ::Type{T}, strict) where {T} = _unpack_array(io, ntoh(read(io, UInt32)), T, strict) -unpack_format(io, f::ArrayFixFormat, ::Type{T}, strict) where {T} = _unpack_array(io, xor(f.byte, magic_byte_min(ArrayFixFormat)), T, strict) +# Narrow `n` to `Int` at call site so all 3 width variants share one +# `_unpack_array(io, n::Int, T, strict)` specialization per `T`. Without this, +# Julia compiles a separate _unpack_array spec for each of UInt8/UInt16/UInt32 +# (and any other Integer type that happens to flow in), which fans inference +# out 3x for every recursively-reachable `T`. The `Int(...)` conversion is +# free at runtime — for-loop bounds want Int anyway. +unpack_format(io, ::Array16Format, ::Type{T}, strict) where {T} = _unpack_array(io, Int(ntoh(read(io, UInt16))), T, strict) +unpack_format(io, ::Array32Format, ::Type{T}, strict) where {T} = _unpack_array(io, Int(ntoh(read(io, UInt32))), T, strict) +unpack_format(io, f::ArrayFixFormat, ::Type{T}, strict) where {T} = _unpack_array(io, Int(xor(f.byte, magic_byte_min(ArrayFixFormat))), T, strict) _eltype(T) = eltype(T) @@ -472,9 +478,11 @@ function unpack_type(io, byte, t::MapType, ::Type{T}; strict) where {T} end end -unpack_format(io, ::Map16Format, ::Type{T}, strict) where {T} = _unpack_map(io, ntoh(read(io, UInt16)), T, strict) -unpack_format(io, ::Map32Format, ::Type{T}, strict) where {T} = _unpack_map(io, ntoh(read(io, UInt32)), T, strict) -unpack_format(io, f::MapFixFormat, ::Type{T}, strict) where {T} = _unpack_map(io, xor(f.byte, magic_byte_min(MapFixFormat)), T, strict) +# See Array equivalents above: narrow `n` to `Int` so all 3 width variants +# share one `_unpack_map` specialization per `T`. +unpack_format(io, ::Map16Format, ::Type{T}, strict) where {T} = _unpack_map(io, Int(ntoh(read(io, UInt16))), T, strict) +unpack_format(io, ::Map32Format, ::Type{T}, strict) where {T} = _unpack_map(io, Int(ntoh(read(io, UInt32))), T, strict) +unpack_format(io, f::MapFixFormat, ::Type{T}, strict) where {T} = _unpack_map(io, Int(xor(f.byte, magic_byte_min(MapFixFormat))), T, strict) _keytype(T) = Any _keytype(::Type{T}) where {K,V,T<:AbstractDict{K,V}} = keytype(T) From e144eac58d240a748c13c4beff6db161a90c65ec Mon Sep 17 00:00:00 2001 From: SimonDanisch Date: Fri, 15 May 2026 10:46:26 +0200 Subject: [PATCH 3/6] unpack: widen _unpack_any return inference to Any The 30+ branch dispatcher was forcing inference to compute a precise Union of 16 leaf types on every parametric call, costing ~50ms one-time and inflating downstream callers with union-splitting code at every recursive site. @nospecializeinfer tells inference to model the return as Any. The path is only entered for T===Any (typed unpacks reach leaves via unpack_type), so callers never relied on the precise union. Measured on a mixed (typed + Any) workload: - MsgPack-attributable inference: 284.8ms -> 166.4ms (-42%) - _unpack_any itself: 66.4ms -> 13.5ms (-80%) - Runtime on Any-typed payloads: ~15% faster (less call-site union code) - Runtime on typed/struct payloads: unchanged - 1048/1048 tests pass --- src/unpack.jl | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/src/unpack.jl b/src/unpack.jl index ecdd608..bc745be 100644 --- a/src/unpack.jl +++ b/src/unpack.jl @@ -46,7 +46,13 @@ end @inline unpack_type(io, byte, ::AnyType, T::Type; strict) = _unpack_any(io, byte, T; strict=strict) -function _unpack_any(io, byte, ::Type{T}; strict) where {T} +# `@nospecializeinfer` stops inference computing the precise 16-type Union +# across all 30+ branches. The path is only entered for T===Any (others reach +# their leaves via `unpack_type`), so the precise union is never used by +# callers. Letting inference widen to Any saves ~50ms one-time compile, and +# also speeds runtime ~15% on Any-typed payloads because callers no longer +# emit per-branch union-splitting code at every recursive call site. +Base.@nospecializeinfer function _unpack_any(io, byte, @nospecialize(T::Type); strict) if byte <= magic_byte_max(IntFixPositiveFormat) return unpack_format(io, IntFixPositiveFormat(byte), T) elseif byte <= magic_byte_max(MapFixFormat) From 375dcd40395895c0aaa7cc510895b42488395d93 Mon Sep 17 00:00:00 2001 From: SimonDanisch Date: Fri, 15 May 2026 11:23:19 +0200 Subject: [PATCH 4/6] unpack: split StructType strict path; @generated non-strict body Two changes to unpack_type for StructType: 1. Function-barrier split: extract strict and non-strict bodies into `unpack_struct_strict` and `unpack_struct`. The outer dispatcher is now trivial. Doesn't reduce inference much on its own (the strict body wasn't being instantiated for typical empty-strict callers anyway), but enables change 2. 2. `unpack_struct` is now `@generated`, emitting exactly fieldcount(T) field-match branches at expand time. The original `Base.@nif(33, ...)` forced inference to walk 33 branches regardless of T's actual field count. For a 3-field struct, that's 30 wasted branches per call. The generated code emits 3 branches with `unpack(io, fieldtype(T, i))` at each `i` literal, so each leaf resolves to a statically typed call. Also drops the closure capture `(args...) -> construct(T, args...)` from the strict body in favor of `Base.@ncall i construct T x` (which already splices T directly). The closure was creating a per-T method instance for no benefit. Measured on the same mixed (typed + Any) workload as previous commits: - MsgPack-attributable inference: 171ms -> 42ms (-75%) - StructType unpack hotspot: 24.9ms -> dropped out of top 8 - Runtime: within noise on all paths - 1048/1048 tests pass --- src/unpack.jl | 100 +++++++++++++++++++++++++++++--------------------- 1 file changed, 59 insertions(+), 41 deletions(-) diff --git a/src/unpack.jl b/src/unpack.jl index bc745be..0cb6956 100644 --- a/src/unpack.jl +++ b/src/unpack.jl @@ -150,33 +150,64 @@ value in `args` will be `MsgPack.FieldNotFound()`. """ construct(::Type{T}, args...) where {T} = T(args...) -function unpack_type(io, byte, t::StructType, ::Type{T}; strict) where {T} +# Function barrier between strict and non-strict paths: each is its own +# MethodInstance, inferred only when actually called. The common non-strict +# case (empty `strict::Tuple{}`) constant-folds the predicate at the call +# site, so the strict body is never instantiated for typical callers. +function unpack_type(io, byte, ::StructType, ::Type{T}; strict) where {T} if any(T <: S for S in strict) - if byte > magic_byte_max(MapFixFormat) - if byte === magic_byte(Map16Format) - read(io, UInt16) - elseif byte === magic_byte(Map32Format) - read(io, UInt32) - end - end - N = fieldcount(T) - constructor = (args...) -> construct(T, args...) - Base.@nexprs 32 i -> begin - F_i = fieldtype(T, i) - unpack_type(io, read(io, UInt8), StringType(), Skip{Symbol}; strict=strict) - x_i = unpack_type(io, read(io, UInt8), msgpack_type(F_i), F_i; strict=strict) - N == i && return Base.@ncall i constructor x + return unpack_struct_strict(io, byte, T, strict) + else + return unpack_struct(io, byte, T, strict) + end +end + +function unpack_struct_strict(io, byte, ::Type{T}, strict) where {T} + if byte > magic_byte_max(MapFixFormat) + if byte === magic_byte(Map16Format) + read(io, UInt16) + elseif byte === magic_byte(Map32Format) + read(io, UInt32) end - others = Any[] - for i in 33:N - F_i = fieldtype(T, i) - unpack_type(io, read(io, UInt8), StringType(), Skip{Symbol}; strict=strict) - push!(others, unpack_type(io, read(io, UInt8), msgpack_type(F_i), F_i; strict=strict)) + end + N = fieldcount(T) + Base.@nexprs 32 i -> begin + F_i = fieldtype(T, i) + unpack_type(io, read(io, UInt8), StringType(), Skip{Symbol}; strict=strict) + x_i = unpack_type(io, read(io, UInt8), msgpack_type(F_i), F_i; strict=strict) + N == i && return Base.@ncall i construct T x + end + others = Any[] + for i in 33:N + F_i = fieldtype(T, i) + unpack_type(io, read(io, UInt8), StringType(), Skip{Symbol}; strict=strict) + push!(others, unpack_type(io, read(io, UInt8), msgpack_type(F_i), F_i; strict=strict)) + end + return construct(T, x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8, x_9, x_10, x_11, x_12, x_13, + x_14, x_15, x_16, x_17, x_18, x_19, x_20, x_21, x_22, x_23, x_24, x_25, + x_26, x_27, x_28, x_29, x_30, x_31, x_32, others...) +end + +# Generated body: emits exactly `fieldcount(T)` field-match branches at expand +# time, instead of the 33-way `@nif` that the original code walks regardless of +# T's actual field count. For a 3-field struct, inference walks 3 branches, +# not 33, with each `unpack(io, fieldtype(T, i))` resolving to a statically +# typed leaf (since `i` is a literal in the emitted code). +@generated function unpack_struct(io, byte, ::Type{T}, strict) where {T} + N = fieldcount(T) + field_dispatch = :(unpack(io)) # default: discard unknown key's value + for i in N:-1:1 + fname = QuoteNode(fieldname(T, i)) + ftype = fieldtype(T, i) + field_dispatch = quote + if key === $fname + fields[$i] = unpack(io, $ftype; strict=strict) + else + $field_dispatch + end end - return constructor(x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8, x_9, x_10, x_11, x_12, x_13, - x_14, x_15, x_16, x_17, x_18, x_19, x_20, x_21, x_22, x_23, x_24, x_25, - x_26, x_27, x_28, x_29, x_30, x_31, x_32, others...) - else + end + quote if byte <= magic_byte_max(MapFixFormat) pair_count = xor(byte, magic_byte_min(MapFixFormat)) elseif byte === magic_byte(Map16Format) @@ -184,25 +215,12 @@ function unpack_type(io, byte, t::StructType, ::Type{T}; strict) where {T} elseif byte === magic_byte(Map32Format) pair_count = ntoh(read(io, UInt32)) else - invalid_unpack(io, byte, t, T) + invalid_unpack(io, byte, StructType(), T) end - N = fieldcount(T) - fields = Any[FieldNotFound() for _ in 1:N] + fields = Any[FieldNotFound() for _ in 1:$N] for _ in 1:pair_count - key = unpack(io, Symbol) # TODO validation check? - Base.@nif(33, # `i` in range 1:(33-1) - i -> i <= N && fieldname(T, i) === key, - i -> setindex!(fields, unpack(io, fieldtype(T, i); strict=strict), i), - i -> begin - is_field_still_unread = true - for j in 33:N - fieldname(T, j) === key || continue - setindex!(fields, unpack(io, fieldtype(T, j); strict=strict), j) - is_field_still_unread = false - break - end - is_field_still_unread && unpack(io) - end) + key = unpack(io, Symbol) + $field_dispatch end return construct(T, fields...) end From f1e36dad940393eed0c07e82cbb818837122c7f8 Mon Sep 17 00:00:00 2001 From: SimonDanisch Date: Thu, 11 Jun 2026 12:47:00 +0200 Subject: [PATCH 5/6] Test wire format, sizehint, IO paths, and struct decoding edge cases Coverage for the perf changes on this branch: - spec-defined golden bytes for the multi-byte integer/float/str16 formats (round trips alone can't catch a writer/reader agreeing on the wrong byte order) - pack(x; sizehint) produces identical bytes across initial capacities - the IOBuffer fast path (append and non-append) and the generic IO fallback of write_be produce identical bytes - non-strict struct decoding (now generated per struct type): unknown keys skipped wherever they appear, key order independence, missing fields surfacing as FieldNotFound to construct, duplicate keys resolving to the last occurrence Bump version to 1.3.0: pack gained the sizehint keyword (additive); wire format and unpack semantics are unchanged, so nothing breaking. --- Project.toml | 2 +- test/runtests.jl | 69 ++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 70 insertions(+), 1 deletion(-) diff --git a/Project.toml b/Project.toml index abe0212..8e1a9ec 100644 --- a/Project.toml +++ b/Project.toml @@ -1,6 +1,6 @@ name = "MsgPack" uuid = "99f44e22-a591-53d1-9472-aa23ef4bd671" -version = "1.2.1" +version = "1.3.0" [deps] Serialization = "9e88b42a-f829-5b0c-bbe9-9e923198166b" diff --git a/test/runtests.jl b/test/runtests.jl index 89f7c32..adbe034 100644 --- a/test/runtests.jl +++ b/test/runtests.jl @@ -331,5 +331,74 @@ ext = MsgPack.Extension(-14, rand(UInt8, typemax(UInt16) + 1)) ext = MsgPack.Extension(84, UInt8[]) @test can_round_trip(ext, MsgPack.Extension) +# wire format: spec-defined golden bytes (guards the big-endian fast path +# beyond round-tripping - a reader/writer that agree on the wrong byte order +# would still round-trip) + +@test pack(typemax(UInt16)) == UInt8[0xcd, 0xff, 0xff] +@test pack(typemax(UInt32)) == UInt8[0xce, 0xff, 0xff, 0xff, 0xff] +@test pack(typemax(UInt64)) == UInt8[0xcf, fill(0xff, 8)...] +@test pack(typemin(Int16)) == UInt8[0xd1, 0x80, 0x00] +@test pack(typemin(Int32)) == UInt8[0xd2, 0x80, 0x00, 0x00, 0x00] +@test pack(typemin(Int64)) == UInt8[0xd3, 0x80, fill(0x00, 7)...] +@test pack(1.5) == UInt8[0xcb, reverse(reinterpret(UInt8, [1.5]))...] +@test pack("a"^300) == UInt8[0xda, 0x01, 0x2c, codeunits("a"^300)...] + +# pack(x; sizehint): same bytes regardless of initial capacity + +payload = Any[1, "two", 3.0, Dict("a" => collect(1:1000))] +let reference = pack(payload) + @test pack(payload; sizehint = 1) == reference # forces buffer growth + @test pack(payload; sizehint = 2^20) == reference # oversized +end + +# the IOBuffer fast path, the non-append IOBuffer branch, and the generic IO +# fallback must produce identical bytes + +let reference = pack(payload) + io = IOBuffer() # append = false branch of write_be + pack(io, payload) + @test take!(io) == reference + + mktemp() do path, file_io # generic `IO` fallback (IOStream) + pack(file_io, payload) + close(file_io) + @test read(path) == reference + end +end + +# non-strict struct decoding edge cases (field-match code is generated per +# struct type): unknown keys are skipped, key order doesn't matter, missing +# fields surface as FieldNotFound to `construct` + +struct BarDefaults + a::Int + b::Int +end + +MsgPack.msgpack_type(::Type{BarDefaults}) = MsgPack.StructType() + +function MsgPack.construct(::Type{BarDefaults}, a, b) + a isa MsgPack.FieldNotFound && (a = -1) + b isa MsgPack.FieldNotFound && (b = -2) + return BarDefaults(a, b) +end + +# unknown keys (before, between, after known ones) are skipped +@test unpack(pack((junk = 99, a = 1, alsojunk = [1, 2, 3], b = 2, more = "x")), BarDefaults) == BarDefaults(1, 2) +# key order doesn't matter +@test unpack(pack((b = 2, a = 1)), BarDefaults) == BarDefaults(1, 2) +# missing fields surface as FieldNotFound +@test unpack(pack((b = 2,)), BarDefaults) == BarDefaults(-1, 2) +@test unpack(pack(Dict{String, Int}()), BarDefaults) == BarDefaults(-1, -2) +# duplicate keys: the last occurrence wins (pack a map with a repeated key by hand) +let io = IOBuffer() + write(io, MsgPack.magic_byte_min(MsgPack.MapFixFormat) | UInt8(3)) + pack(io, "a"); pack(io, 1) + pack(io, "a"); pack(io, 10) + pack(io, "b"); pack(io, 2) + @test unpack(take!(io), BarDefaults) == BarDefaults(10, 2) +end + using Aqua Aqua.test_all(MsgPack; ambiguities=false) \ No newline at end of file From 9cb5e71d70af47277578080ba0df801e22a35bb8 Mon Sep 17 00:00:00 2001 From: SimonDanisch Date: Thu, 11 Jun 2026 17:20:38 +0200 Subject: [PATCH 6/6] Modernize CI to the standard PkgTemplates setup - CI.yml: test lts/1/pre on linux/macos/windows (windows was untested since the appveyor days; 1.0/1.6/nightly tested neither current stable nor windows), julia-actions/cache instead of the hand-rolled actions/cache, concurrency group, coverage upload to codecov, and a docs job via julia-docdeploy - TagBot.yml: issue_comment trigger instead of hourly cron, explicit permissions, DOCUMENTER_KEY ssh input - add CompatHelper.yml - julia compat 1 -> 1.10: the branch uses Base.@nospecializeinfer (1.10+), so older julias were already broken; lts is the new floor - fix the docs project: it referenced MsgPack under a pre-rewrite UUID and pinned Documenter ~0.22 (2019), so the docs job cannot have run for years; now Documenter 1, build and doctests verified locally - README: add CI + codecov badges --- .github/workflows/CI.yml | 78 ++++++++++++++++++------------ .github/workflows/CompatHelper.yml | 43 ++++++++++++++++ .github/workflows/TagBot.yml | 24 ++++++++- Project.toml | 2 +- README.md | 3 ++ docs/Project.toml | 4 +- 6 files changed, 117 insertions(+), 37 deletions(-) create mode 100644 .github/workflows/CompatHelper.yml diff --git a/.github/workflows/CI.yml b/.github/workflows/CI.yml index c6e2f53..8a44dbd 100644 --- a/.github/workflows/CI.yml +++ b/.github/workflows/CI.yml @@ -1,62 +1,76 @@ name: CI on: - - push - - pull_request + push: + branches: + - master + tags: ['*'] + pull_request: + workflow_dispatch: +concurrency: + # Skip intermediate builds: always. + # Cancel intermediate builds: only if it is a pull request build. + group: ${{ github.workflow }}-${{ github.ref }} + cancel-in-progress: ${{ startsWith(github.ref, 'refs/pull/') }} jobs: test: - name: Julia ${{ matrix.version }} - ${{ matrix.os }} - ${{ matrix.arch }} - ${{ github.event_name }} + name: Julia ${{ matrix.version }} - ${{ matrix.os }} - ${{ github.event_name }} runs-on: ${{ matrix.os }} + timeout-minutes: 60 + permissions: # needed to allow julia-actions/cache to proactively delete old caches that it has created + actions: write + contents: read strategy: fail-fast: false matrix: version: - - '1.0' - - '1.6' - - 'nightly' + - 'lts' + - '1' + - 'pre' os: - ubuntu-latest - macOS-latest - arch: - - x64 + - windows-latest steps: - uses: actions/checkout@v4 - uses: julia-actions/setup-julia@v2 with: version: ${{ matrix.version }} - arch: ${{ matrix.arch }} - - uses: actions/cache@v4 - env: - cache-name: cache-artifacts - with: - path: ~/.julia/artifacts - key: ${{ runner.os }}-test-${{ env.cache-name }}-${{ hashFiles('**/Project.toml') }} - restore-keys: | - ${{ runner.os }}-test-${{ env.cache-name }}- - ${{ runner.os }}-test- - ${{ runner.os }}- + - uses: julia-actions/cache@v2 - uses: julia-actions/julia-buildpkg@v1 - uses: julia-actions/julia-runtest@v1 + - uses: julia-actions/julia-processcoverage@v1 + - uses: codecov/codecov-action@v5 + with: + files: lcov.info + token: ${{ secrets.CODECOV_TOKEN }} + fail_ci_if_error: false docs: name: Documentation runs-on: ubuntu-latest + permissions: + actions: write + contents: write + statuses: write steps: - uses: actions/checkout@v4 - uses: julia-actions/setup-julia@v2 with: version: '1' - - run: | - julia --project=docs -e ' - using Pkg - Pkg.develop(PackageSpec(path=pwd())) - Pkg.instantiate()' - - run: | - julia --project=docs -e ' - using Documenter: DocMeta, doctest - using MsgPack - DocMeta.setdocmeta!(MsgPack, :DocTestSetup, :(using MsgPack); recursive=true) - doctest(MsgPack)' - - run: julia --project=docs docs/make.jl + - uses: julia-actions/cache@v2 + - name: Configure doc environment + shell: julia --project=docs --color=yes {0} + run: | + using Pkg + Pkg.develop(PackageSpec(path=pwd())) + Pkg.instantiate() + - uses: julia-actions/julia-docdeploy@v1 env: - JULIA_PKG_SERVER: "" GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} DOCUMENTER_KEY: ${{ secrets.DOCUMENTER_KEY }} + - name: Run doctests + shell: julia --project=docs --color=yes {0} + run: | + using Documenter: DocMeta, doctest + using MsgPack + DocMeta.setdocmeta!(MsgPack, :DocTestSetup, :(using MsgPack); recursive=true) + doctest(MsgPack) diff --git a/.github/workflows/CompatHelper.yml b/.github/workflows/CompatHelper.yml new file mode 100644 index 0000000..b20ecab --- /dev/null +++ b/.github/workflows/CompatHelper.yml @@ -0,0 +1,43 @@ +name: CompatHelper +on: + schedule: + - cron: 0 0 * * * + workflow_dispatch: +permissions: + contents: write + pull-requests: write +jobs: + CompatHelper: + runs-on: ubuntu-latest + steps: + - name: Check if Julia is already available in the PATH + id: julia_in_path + run: which julia + continue-on-error: true + - name: Install Julia, but only if it is not already available in the PATH + uses: julia-actions/setup-julia@v2 + with: + version: '1' + if: steps.julia_in_path.outcome != 'success' + - name: "Add the General registry via Git" + run: | + import Pkg + ENV["JULIA_PKG_SERVER"] = "" + Pkg.Registry.add("General") + shell: julia --color=yes {0} + - name: "Install CompatHelper" + run: | + import Pkg + name = "CompatHelper" + uuid = "aa819f21-2bde-4658-8897-bab36330d9b7" + version = "3" + Pkg.add(; name, uuid, version) + shell: julia --color=yes {0} + - name: "Run CompatHelper" + run: | + import CompatHelper + CompatHelper.main() + shell: julia --color=yes {0} + env: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + COMPATHELPER_PRIV: ${{ secrets.DOCUMENTER_KEY }} diff --git a/.github/workflows/TagBot.yml b/.github/workflows/TagBot.yml index d77d3a0..74d7d61 100644 --- a/.github/workflows/TagBot.yml +++ b/.github/workflows/TagBot.yml @@ -1,11 +1,31 @@ name: TagBot on: - schedule: - - cron: 0 * * * * + issue_comment: + types: + - created + workflow_dispatch: + inputs: + lookback: + default: '3' +permissions: + actions: read + checks: read + contents: write + deployments: read + issues: read + discussions: read + packages: read + pages: read + pull-requests: read + repository-projects: read + security-events: read + statuses: read jobs: TagBot: + if: github.event_name == 'workflow_dispatch' || github.actor == 'JuliaTagBot' runs-on: ubuntu-latest steps: - uses: JuliaRegistries/TagBot@v1 with: token: ${{ secrets.GITHUB_TOKEN }} + ssh: ${{ secrets.DOCUMENTER_KEY }} diff --git a/Project.toml b/Project.toml index 8e1a9ec..6060afe 100644 --- a/Project.toml +++ b/Project.toml @@ -9,7 +9,7 @@ Serialization = "9e88b42a-f829-5b0c-bbe9-9e923198166b" Aqua = "0.8" Serialization = "<0.0.1, 1" Test = "<0.0.1, 1" -julia = "1" +julia = "1.10" [extras] Aqua = "4c88cf16-eb10-579e-8560-4a9242c79595" diff --git a/README.md b/README.md index 3644b33..b591822 100644 --- a/README.md +++ b/README.md @@ -1,5 +1,8 @@ # MsgPack.jl +[![CI](https://github.com/JuliaIO/MsgPack.jl/actions/workflows/CI.yml/badge.svg)](https://github.com/JuliaIO/MsgPack.jl/actions/workflows/CI.yml) +[![codecov](https://codecov.io/gh/JuliaIO/MsgPack.jl/branch/master/graph/badge.svg)](https://codecov.io/gh/JuliaIO/MsgPack.jl) + MsgPack.jl is a MessagePack implementation in pure Julia, inspired by [JSON3.jl](https://github.com/quinnj/JSON3.jl). This package supports: - (de)serialization of Julia values to/from MessagePack (see `pack` and `unpack`) diff --git a/docs/Project.toml b/docs/Project.toml index 113b106..671dc10 100644 --- a/docs/Project.toml +++ b/docs/Project.toml @@ -1,6 +1,6 @@ [deps] Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4" -MsgPack = "f3bf5812-ae52-11e9-182b-097c550858cf" +MsgPack = "99f44e22-a591-53d1-9472-aa23ef4bd671" [compat] -Documenter = "~0.22" +Documenter = "1"