suboptimal small matrix multiplication benchmark #183

Benchmark script:

```julia
const n = parse(Int, ARGS[1])
const samples = parse(Int, ARGS[2])
const evals = parse(Int, ARGS[3])

@show n
@show samples
@show evals

using BenchmarkTools, FixedSizeArrays

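# Three-matrix product on FixedSizeArray inputs; fresh random inputs are built in `setup` for each sample.
@btime x * y * z seconds=Inf samples=samples evals=evals setup=(x = FixedSizeArray(rand(Float32, n, n)); y = FixedSizeArray(rand(Float32, n, n)); z = FixedSizeArray(rand(Float32, n, n)););
# The same product on plain `Array` inputs, for comparison.
@btime x * y * z seconds=Inf samples=samples evals=evals setup=(x = rand(Float32, n, n); y = rand(Float32, n, n); z = rand(Float32, n, n););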
```
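For reproducing the runs, a small driver like the following could invoke the script once per size. This is only a sketch: the file name `bench.jl` and the helper `bench_cmd` are hypothetical, and it assumes BenchmarkTools and FixedSizeArrays are installed in the environment the child process picks up.

```julia
# Hypothetical file name for the benchmark script above.
const SCRIPT = "bench.jl"

# Build the command line for one matrix size.
# The defaults mirror the `samples` and `evals` values shown in the results.
bench_cmd(n; samples = 20000, evals = 20) =
    `$(Base.julia_cmd()) $SCRIPT $n $samples $evals`

# Only launch the runs when this file is executed directly.
if abspath(PROGRAM_FILE) == @__FILE__
    for n in 0:9
        run(bench_cmd(n))
    end
end
```

Running each size in a fresh process keeps `n`, `samples` and `evals` as `const` globals, as the script expects.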

My results for `n` in `0:9` (in each pair of timings, the first line is `FixedSizeArray` and the second is `Array`):

```none
n = 0
samples = 20000
evals = 20
  45.050 ns (0 allocations: 0 bytes)
  40.050 ns (2 allocations: 96 bytes)
n = 1
samples = 20000
evals = 20
  317.100 ns (2 allocations: 64 bytes)
  300.050 ns (4 allocations: 160 bytes)
n = 2
samples = 20000
evals = 20
  132.250 ns (2 allocations: 96 bytes)
  99.150 ns (4 allocations: 192 bytes)
n = 3
samples = 20000
evals = 20
  131.200 ns (2 allocations: 128 bytes)
  118.700 ns (4 allocations: 224 bytes)
n = 4
samples = 20000
evals = 20
  360.200 ns (2 allocations: 192 bytes)
  353.650 ns (4 allocations: 288 bytes)
n = 5
samples = 20000
evals = 20
  435.800 ns (2 allocations: 256 bytes)
  417.300 ns (4 allocations: 352 bytes)
n = 6
samples = 20000
evals = 20
  499.950 ns (2 allocations: 352 bytes)
  463.850 ns (4 allocations: 448 bytes)
n = 7
samples = 20000
evals = 20
  565.550 ns (2 allocations: 448 bytes)
  557.550 ns (4 allocations: 544 bytes)
n = 8
samples = 20000
evals = 20
  516.450 ns (2 allocations: 576 bytes)
  500.900 ns (4 allocations: 672 bytes)
n = 9
samples = 20000
evals = 20
  633.700 ns (2 allocations: 736 bytes)
  604.650 ns (4 allocations: 832 bytes)
```

Several things look odd here (why is the `n == 1` case so slow, slower even than `n == 2` and `n == 3`?), but the main takeaway is that `FixedSizeArray` (FSA) is slower than `Array` at every size measured, even though FSA allocates less.
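One way to narrow this down might be to time a single product, and then an in-place `mul!` into a preallocated output, so that allocation is taken out of the picture. The sketch below is untested against these numbers; the size `n = 4` and the single-threaded BLAS setting are my assumptions, not part of the original script.

```julia
using BenchmarkTools, FixedSizeArrays, LinearAlgebra

# Assumption: pin BLAS to one thread to reduce threading noise at tiny sizes.
BLAS.set_num_threads(1)

const n = 4  # assumed size; vary it as in the original script

# A single allocating product, i.e. one of the two multiplications in `x * y * z`.
@btime x * y setup=(x = FixedSizeArray(rand(Float32, n, n)); y = FixedSizeArray(rand(Float32, n, n)))
@btime x * y setup=(x = rand(Float32, n, n); y = rand(Float32, n, n))

# In-place product into a preallocated output, so no result allocation is timed.
@btime mul!(c, x, y) setup=(x = FixedSizeArray(rand(Float32, n, n)); y = FixedSizeArray(rand(Float32, n, n)); c = FixedSizeArray(zeros(Float32, n, n)))
@btime mul!(c, x, y) setup=(x = rand(Float32, n, n); y = rand(Float32, n, n); c = zeros(Float32, n, n))
```

If the gap shows up in the allocating product but not in the `mul!` case, that would point at how the output is created rather than at the multiplication itself.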

Of course, the heavy lifting here is supposed to be done by BLAS, not by Julia code, so the question is where the difference comes from in the first place.
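To check whether both types actually take the same path to BLAS, it may help to look at which `*` methods get selected and what the result types are. This is a rough sketch of the inspection, not a diagnosis; the 4×4 size is arbitrary, and I am assuming (without having verified it for FSA) that the BLAS-backed methods in LinearAlgebra are the ones defined for strided matrices.

```julia
using FixedSizeArrays, InteractiveUtils, LinearAlgebra

x = FixedSizeArray(rand(Float32, 4, 4))
a = rand(Float32, 4, 4)

# Which methods are selected for the two- and three-argument products?
display(@which x * x)
display(@which x * x * x)
display(@which a * a * a)

# Does each type count as a strided matrix?
@show x isa StridedMatrix
@show a isa StridedMatrix

# Result container types; an extra conversion or copy here would add overhead.
@show typeof(x * x * x)
@show typeof(a * a * a)
```

If the selected methods are the same for both types, the overhead presumably sits in something they call, such as constructing the output array, rather than in dispatch.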
