suboptimal small matrix multiplication benchmark #183

@nsajko

Benchmark script:

const n = parse(Int, ARGS[1])
const samples = parse(Int, ARGS[2])
const evals = parse(Int, ARGS[3])

@show n
@show samples
@show evals

using BenchmarkTools, FixedSizeArrays

# FixedSizeArray operands
@btime x * y * z seconds=Inf samples=samples evals=evals setup=(x = FixedSizeArray(rand(Float32, n, n)); y = FixedSizeArray(rand(Float32, n, n)); z = FixedSizeArray(rand(Float32, n, n)););
# Array operands (baseline)
@btime x * y * z seconds=Inf samples=samples evals=evals setup=(x = rand(Float32, n, n); y = rand(Float32, n, n); z = rand(Float32, n, n););

My results for n in 0:9 (for each n, the first timing is for FixedSizeArray, the second for Array):

n = 0
samples = 20000
evals = 20
  45.050 ns (0 allocations: 0 bytes)
  40.050 ns (2 allocations: 96 bytes)
n = 1
samples = 20000
evals = 20
  317.100 ns (2 allocations: 64 bytes)
  300.050 ns (4 allocations: 160 bytes)
n = 2
samples = 20000
evals = 20
  132.250 ns (2 allocations: 96 bytes)
  99.150 ns (4 allocations: 192 bytes)
n = 3
samples = 20000
evals = 20
  131.200 ns (2 allocations: 128 bytes)
  118.700 ns (4 allocations: 224 bytes)
n = 4
samples = 20000
evals = 20
  360.200 ns (2 allocations: 192 bytes)
  353.650 ns (4 allocations: 288 bytes)
n = 5
samples = 20000
evals = 20
  435.800 ns (2 allocations: 256 bytes)
  417.300 ns (4 allocations: 352 bytes)
n = 6
samples = 20000
evals = 20
  499.950 ns (2 allocations: 352 bytes)
  463.850 ns (4 allocations: 448 bytes)
n = 7
samples = 20000
evals = 20
  565.550 ns (2 allocations: 448 bytes)
  557.550 ns (4 allocations: 544 bytes)
n = 8
samples = 20000
evals = 20
  516.450 ns (2 allocations: 576 bytes)
  500.900 ns (4 allocations: 672 bytes)
n = 9
samples = 20000
evals = 20
  633.700 ns (2 allocations: 736 bytes)
  604.650 ns (4 allocations: 832 bytes)

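There is a lot of weird stuff here (why is the n == 1 case so slow?), but the takeaway is that FixedSizeArray (FSA) is slower than Array for every n in 0:9, even though FSA makes fewer allocations (2 vs. 4 for n >= 1) and allocates fewer bytes.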
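To put a number on the gap, the timings above can be turned into FSA/Array ratios. This is only a small sketch: the values are copied by hand from the results listed above, and the variable names `fsa` and `arr` are made up for illustration.

```julia
# Minimum times in ns, copied from the results above (n = 0, 1, ..., 9).
fsa = [45.050, 317.100, 132.250, 131.200, 360.200, 435.800, 499.950, 565.550, 516.450, 633.700]
arr = [40.050, 300.050, 99.150, 118.700, 353.650, 417.300, 463.850, 557.550, 500.900, 604.650]

# A ratio above 1 means FixedSizeArray was slower than Array for that n.
for (n, f, a) in zip(0:9, fsa, arr)
    println("n = $n: FSA/Array = ", round(f / a; digits = 3))
end
```

The ratio is above 1 for every n, and it is largest at n = 2.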

Of course, the heavy lifting here is supposed to be done by BLAS, not by Julia code, so the question is where the difference comes from in the first place.
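One way to start narrowing this down might be to check which `*` method each operand type dispatches to, and to time an in-place `mul!` into a preallocated output so that allocation is taken out of the picture. The following is only a sketch under assumptions: `n = 4` is an arbitrary choice, the variable names are illustrative, and a single two-operand product may not behave the same way as the chained `x * y * z` in the script above.

```julia
using BenchmarkTools, FixedSizeArrays, LinearAlgebra, InteractiveUtils

n = 4  # arbitrary size chosen for illustration; the issue covers n in 0:9

a = rand(Float32, n, n)
b = rand(Float32, n, n)
fa = FixedSizeArray(a)
fb = FixedSizeArray(b)

# Show which method each operand type dispatches to for a two-operand product.
println(@which a * b)
println(@which fa * fb)

# Preallocate the outputs so that only the multiplication itself is timed.
c = similar(a)
fc = FixedSizeArray(similar(a))

@btime mul!($c, $a, $b)
@btime mul!($fc, $fa, $fb)
```

If the two `mul!` timings match while the `*` timings differ, the overhead would more likely sit in dispatch or output allocation than in the multiplication kernel itself.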
