pandas out_flavor for ctable by ARF1 · Pull Request #184 · Blosc/bcolz

ARF1 · 2015-05-03T14:25:35Z

Closes #176.
Simplifies implementation of #66.

Summary:

* introduction of an abstraction layer for the "results array"
* implementation of a numpy specialisation of the abstraction layer
* implementation of a pandas specialisation of the abstraction layer
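The summary above can be illustrated with a minimal sketch of such a results-array abstraction. All class and method names here (ResultArray, NumpyResultArray, PandasResultArray, set_column, result) are hypothetical and are not the API of this PR or of #187. The sketch only assumes that, as with ctable's per-column storage, a query fills the output one column chunk at a time.

```python
import numpy as np


class ResultArray:
    """Hypothetical base class: an output container a query fills column by column."""

    def __init__(self, dtype, nrows):
        self.dtype = np.dtype(dtype)
        self.nrows = nrows

    def set_column(self, name, start, values):
        """Copy values for column `name` into rows [start, start + len(values))."""
        raise NotImplementedError

    def result(self):
        """Return the finished output object."""
        raise NotImplementedError


class NumpyResultArray(ResultArray):
    """Row-major flavor: a NumPy structured array, one record per row."""

    def __init__(self, dtype, nrows):
        super().__init__(dtype, nrows)
        self._arr = np.empty(nrows, dtype=self.dtype)

    def set_column(self, name, start, values):
        # A field view of a structured array is strided: consecutive values of
        # one column are a whole record (itemsize bytes) apart in memory.
        self._arr[name][start:start + len(values)] = values

    def result(self):
        return self._arr


class PandasResultArray(ResultArray):
    """Column-major flavor: one contiguous array per column, wrapped in a DataFrame."""

    def __init__(self, dtype, nrows):
        super().__init__(dtype, nrows)
        self._cols = {name: np.empty(nrows, dtype=self.dtype[name])
                      for name in self.dtype.names}

    def set_column(self, name, start, values):
        # Each column has its own contiguous buffer, so this is a plain block copy.
        self._cols[name][start:start + len(values)] = values

    def result(self):
        import pandas as pd  # imported lazily so the NumPy flavor works without pandas
        return pd.DataFrame(self._cols, columns=list(self.dtype.names))
```

The query code only talks to set_column() and result(), so adding another output flavor means adding one subclass rather than touching the query loop.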

This is a quick hack to demonstrate the possible performance gains from using an output flavor with column-major ordering, here the pandas DataFrame.
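Why column-major ordering can help: ctable stores each column separately, so a query copies data one column at a time. The small NumPy sketch below (field names and sizes chosen arbitrarily) shows that a field of a row-major structured array is strided, while a separate per-column buffer, roughly the layout pandas uses for each column, is contiguous.

```python
import numpy as np

nrows = 1000
# Packed (unaligned) record: 4 + 8 + 2 = 14 bytes per row.
dt = np.dtype([("a", "i4"), ("b", "f8"), ("c", "i2")])

# Row-major: one structured record per row, fields interleaved in memory.
rows = np.zeros(nrows, dtype=dt)
# Column-major: one contiguous buffer per column.
cols = {name: np.zeros(nrows, dtype=dt[name]) for name in dt.names}

print(rows["b"].strides)  # (14,): consecutive 'b' values are 14 bytes apart
print(cols["b"].strides)  # (8,): consecutive 'b' values are adjacent
print(rows["b"].flags["C_CONTIGUOUS"], cols["b"].flags["C_CONTIGUOUS"])  # False True
```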

The architecture would need to be improved, since this implementation suffers a 3-4x performance penalty for db[1]-type queries due to increased Python overhead. For queries returning a larger number of rows, this penalty disappears.

Timing results in #176.

* introduction of an abstraction layer for the "output array"
* implementation of a numpy specialisation of the abstraction layer
* implementation of a pandas specialisation of the abstraction layer

FrancescAlted · 2015-05-05T17:17:26Z

Would you mind adding some benchmarks in the 'bench/' directory showing the advantage of this approach? My idea is to set up a speed regression check based on different benchmarks there.
Thanks!

ARF1 · 2015-05-05T17:59:53Z

@FrancescAlted

Would you mind adding some benchmarks in the 'bench/' directory showing the advantage of this approach?

I would be happy to. I just need to clarify what you are looking for:

This PR (pandas out_flavor) was only intended as a proof of concept; it was not really meant for inclusion in the code base. The architecture of the more general #187 (abstraction layer) is more performant (and easier to read).

Would you like me to provide a sample implementation of a pandas "out_flavor" for the new #187 (abstraction layer) instead, together with a benchmark for it, i.e. one analogous to bench/getitem.py?

Or would you like a "rawer" benchmark that avoids __getitem__() (and its overhead) and shows only the best possible performance for filling a pandas DataFrame, similar to what bench/pandas-todataframe.py does?
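A "raw" benchmark along those lines might look like the sketch below. It does not use bcolz at all: plain NumPy arrays stand in for decompressed ctable columns, and NROWS, CHUNKLEN and the field layout are arbitrary assumptions. It is not the contents of bench/pandas-todataframe.py, and no timings are claimed here.

```python
import timeit

import numpy as np

NROWS = 1_000_000
CHUNKLEN = 2 ** 14  # hypothetical chunk length, standing in for a ctable chunk

dt = np.dtype([("f0", "i8"), ("f1", "f8"), ("f2", "i4")])
# Stand-ins for the decompressed column data a ctable query would produce.
source = {name: np.arange(NROWS).astype(dt[name]) for name in dt.names}


def fill_row_major():
    """Fill a NumPy structured array (row-major) chunk by chunk, column by column."""
    out = np.empty(NROWS, dtype=dt)
    for start in range(0, NROWS, CHUNKLEN):
        stop = min(start + CHUNKLEN, NROWS)
        for name in dt.names:
            out[name][start:stop] = source[name][start:stop]
    return out


def fill_column_major():
    """Fill one contiguous array per column (the layout a DataFrame can wrap)."""
    out = {name: np.empty(NROWS, dtype=dt[name]) for name in dt.names}
    for start in range(0, NROWS, CHUNKLEN):
        stop = min(start + CHUNKLEN, NROWS)
        for name in dt.names:
            out[name][start:stop] = source[name][start:stop]
    return out


if __name__ == "__main__":
    for func in (fill_row_major, fill_column_major):
        # Best of several single runs, to reduce noise from other processes.
        best = min(timeit.repeat(func, number=1, repeat=5))
        print(f"{func.__name__}: {best:.4f} s")
```

The column dict returned by fill_column_major() can then be wrapped with pandas.DataFrame(...); pandas may copy the data while consolidating columns, so that construction step would need to be timed separately for a fair comparison.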

ARF1 · 2015-05-05T20:56:35Z

@FrancescAlted On reflection, I probably was not as clear as I could have been: when you speak of "this approach", do you mean

* the column-major (vs. row-major) result array in isolation, or
* the abstraction layer (in whatever version) plus the pandas out_flavor implementation (vs. the current non-abstracted out_flavor)?

esc · 2015-05-23T04:08:23Z

What do you want us to do with the pull-request?

ARF1 mentioned this pull request on May 3, 2015 in #176, "Pandas out_flavor for better ctable performance" (closed).

Commit 5766048: pandas out_flavor for ctable


ARF1 force-pushed the pandas_out_flavor branch from 1534fc4 to 5766048 on May 5, 2015 17:42.

ARF1 wants to merge 1 commit into Blosc:master from ARF1:pandas_out_flavor


3 participants
