Are the write tests here measuring write time correctly? #3

@charles-turner-1

Heya,

I've been trying to replicate some of these benchmarks on some real world data we have and I'm finding some pretty different results. I've forked and modified the bench code pretty heavily to reflect our use cases a bit more closely, but when I was doing that I noticed this snippet:

        if len(dimensions) == 1:
            t3 = time.perf_counter()
            dataset[:dimensions[0]] = data
        elif len(dimensions) == 2:
            t3 = time.perf_counter()
            dataset[:dimensions[0], :dimensions[1]] = data
        else:
            t3 = time.perf_counter()
            dataset[:dimensions[0], :dimensions[1], :dimensions[2]] = data
        t4 = time.perf_counter()

        # Add up the times taken to get the total time taken to create and write all datasets
        dataset_creation_time += (t2 - t1)
        dataset_population_time += (t4 - t3)

coming from

dataset[:dimensions[0]] = data

This looks to me like you're only measuring the time to write into a buffer, rather than the time to actually write the files to disk? In a real usage scenario, I'm pretty sure disk IO, rather than filling a buffer, will dominate write time, so I don't think this is benchmarking quite what you were hoping for.
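To illustrate the distinction, here is a minimal stdlib-only sketch (not the benchmark's code) that times the same write twice: once stopping the clock right after the write call, and once after also flushing and calling `os.fsync`. The helper name `timed_write`, the 8 MiB payload, and the file names are all made up for illustration. If the benchmark writes through h5py, I'd guess the analogous step is calling `flush()` on the file (or closing it) before taking `t4`, but that's an assumption on my part.

```python
import os
import tempfile
import time


def timed_write(path, payload, sync):
    """Return seconds spent writing payload; include flush + fsync when sync is True."""
    with open(path, "wb") as f:
        start = time.perf_counter()
        f.write(payload)          # may only fill a user-space / OS buffer
        if sync:
            f.flush()             # push Python's buffer to the OS
            os.fsync(f.fileno())  # ask the OS to commit the data to the device
        return time.perf_counter() - start


payload = os.urandom(8 * 1024 * 1024)  # 8 MiB of arbitrary bytes (hypothetical size)
with tempfile.TemporaryDirectory() as tmp:
    buffered = timed_write(os.path.join(tmp, "a.bin"), payload, sync=False)
    synced = timed_write(os.path.join(tmp, "b.bin"), payload, sync=True)

print(f"buffered only: {buffered:.4f} s, with fsync: {synced:.4f} s")
```

How big the gap is will depend on the filesystem, the storage device, and the payload size, so treat the two numbers as a sanity check rather than a benchmark in their own right.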

For example, our loads and writes look something like this:

Image

which is pretty different to the results you guys found - at least at first glance!

(Ignore the high outlier for the HDF5 read; I'm pretty sure that's related to FS block caching for importing the code used to read the files off disk.)

Thanks so much for doing this work - it's something I'd never thought about much until I read the paper and it's definitely got me thinking about serialisation more deeply. I'm on my laptop right now, but when I get the chance I'll link my fork of the benchmarks too.
