Optimization related to compression, allowing multithreaded blosc2 to be used as a compression library by mkuehbach · Pull Request #747 · FAIRmat-NFDI/pynxtools

mkuehbach · 2026-03-17T13:02:34Z

Adds what is required to use blosc2, essentially making explicit what the inclusion of hdf5plugin and pandas already delivered.
Writes all non-scalar, non-string datasets using the chunked storage layout, with h5py's autochunker active by default. The main motivation is that this lets everybody take advantage of the chunk-by-chunk processing from Memory optimization NOMAD nexus parser #750 and Memory optimization PYNXTOOLS validate #752.
deflate, i.e. "gzip", is kept as the default algorithm.
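
As a rough illustration of the storage layout described above, the following sketch writes one array with h5py's chunk-shape guessing and deflate, then reads it back chunk by chunk. It is not the code in writer.py; the file name, dataset path, array size, and compression level are assumptions chosen for the example.

```python
import h5py
import numpy as np

data = np.arange(200 * 300, dtype=np.float64).reshape(200, 300)

# In-memory HDF5 file (core driver, no backing store), so nothing lands on disk.
with h5py.File("sketch.h5", "w", driver="core", backing_store=False) as h5w:
    # chunks=True asks h5py to guess a chunk shape (the "autochunker").
    # compression="gzip" selects deflate, which needs no extra plugin.
    dset = h5w.create_dataset(
        "entry/data",
        data=data,
        chunks=True,
        compression="gzip",
        compression_opts=4,  # assumed level, not taken from the PR
    )
    chunk_shape = dset.chunks
    compression = dset.compression

    # Chunk-by-chunk processing: each iteration reads one chunk-aligned
    # selection instead of loading the whole dataset into memory.
    total = 0.0
    for chunk_slices in dset.iter_chunks():
        total += float(dset[chunk_slices].sum())
```

Because the dataset is chunked, a reader can bound its memory use by the chunk size rather than by the dataset size, which is the property the linked memory-optimization work relies on.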

sherjeelshabih · 2026-03-18T09:46:40Z

Can you also add a short text to this PR, even if only for now, describing what this changes in the way the user interacts with the writer? I believe writer.py now expects a "filter" key in the Template object. It would be nice to have the "user interface" changes spelled out in the PR.

Thanks for introducing this. I hope it makes the large datasets we run into easier to handle.

mkuehbach · 2026-03-26T09:35:11Z

> Can you also add a short text to this PR, even if only for now, describing what this changes in the way the user interacts with the writer? I believe writer.py now expects a "filter" key in the Template object. It would be nice to have the "user interface" changes spelled out in the PR.
>
> Thanks for introducing this. I hope it makes the large datasets we run into easier to handle.

I need to check. I thought that "filter" is not required per se because a default kicks in, but yes, I should document this.
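
The "a default kicks in" behaviour discussed here could look roughly like the sketch below. Everything in it is hypothetical: the helper name `dataset_kwargs`, the shape of the entry dictionary, the `"level"` key, and the blosc2 settings are illustrative assumptions, not the actual writer.py or Template interface. Only the fallback to gzip when no "filter" is given reflects what the thread states.

```python
from typing import Any

# The thread states that deflate ("gzip") stays the default algorithm.
DEFAULT_FILTER = "gzip"


def dataset_kwargs(entry: dict[str, Any]) -> dict[str, Any]:
    """Map a hypothetical entry dict to h5py create_dataset keyword arguments."""
    # A missing "filter" key is not an error: the default is used instead.
    algo = entry.get("filter", DEFAULT_FILTER)
    if algo == "gzip":
        return {
            "chunks": True,
            "compression": "gzip",
            "compression_opts": entry.get("level", 4),  # hypothetical key
        }
    if algo == "blosc2":
        # Imported lazily so the gzip path works without the plugin loaded.
        import hdf5plugin

        # hdf5plugin filter objects are mappings holding the "compression"
        # and "compression_opts" values that h5py expects.
        kwargs = dict(hdf5plugin.Blosc2(cname="zstd", clevel=5))
        kwargs["chunks"] = True
        return kwargs
    raise ValueError(f"unsupported filter: {algo!r}")
```

With this shape, `h5w.create_dataset(name, data=array, **dataset_kwargs(entry))` behaves the same for callers that never mention a filter, so existing templates would keep working unchanged.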

lukaspie

Looks good, but I am not an expert

RubelMozumder

LGTM!
Though I have a general question: how effectively will this multithreading work in NOMAD, where NOMAD launches the NeXus parser in an asynchronous thread? I do not have any idea about it.

atomprobe-tc added 2 commits March 17, 2026 14:01

carried over from NXapm run-through

d4daf41

linting

32439fa

mkuehbach changed the title ~~carried over from NXapm run-through~~ Adding explicit support for blosc but keeping deflate the default Mar 17, 2026

atomprobe-tc added 2 commits March 17, 2026 14:31

invert logic

9d14e6c

linting

4b4d629

mkuehbach requested review from lukaspie and sherjeelshabih March 17, 2026 13:40

lint testing

b43a060

sherjeelshabih requested changes Mar 18, 2026

Comment thread src/pynxtools/dataconverter/writer.py Outdated

Comment thread src/pynxtools/dataconverter/writer.py Outdated

Comment thread src/pynxtools/dataconverter/writer.py

Comment thread src/pynxtools/dataconverter/chunk.py Outdated

mkuehbach mentioned this pull request Mar 26, 2026

Memory optimization NOMAD nexus parser #750

Open

14 tasks

atomprobe-tc added 2 commits March 26, 2026 10:36

Merge branch 'master' into add_blosc_but_keep_deflate_the_default

eb0fa47

fix tests for compression

031e5a7

mkuehbach changed the title ~~Adding explicit support for blosc but keeping deflate the default~~ Refactoring compression, adding support for blosc but keeping deflate the default Mar 26, 2026

Write all non-scalar non-string datasets using chunked storage layout…

d4360e6

… with h5py autochunking active by default, add documentation for blosc in learning section

mkuehbach changed the title ~~Refactoring compression, adding support for blosc but keeping deflate the default~~ Refactoring default storage layout to use chunked layout and adding support for optional blosc custom compression filter Mar 26, 2026

mkuehbach requested review from RubelMozumder and sherjeelshabih March 26, 2026 13:30

atomprobe-tc added 7 commits May 8, 2026 10:41

Merge branch 'master' into add_blosc_but_keep_deflate_the_default

78258a1

editing docs

0e8585f

remove optionality of blosc, gzip still the default compression libra…

8563fd3

…ry but blosc made available now by default

rm spurious whitespace

75218f1

remove misleading comment

2d85f6d

rm spurious whitespace

b2d271f

fix tests

2c69dd2

mkuehbach changed the title ~~Refactoring default storage layout to use chunked layout and adding support for optional blosc custom compression filter~~ Optimization related to compression, allowing multithreaded blosc2 to be used as a compression library May 8, 2026

lukaspie approved these changes May 8, 2026

Comment thread docs/learn/pynxtools/compression.md Outdated

RubelMozumder approved these changes May 8, 2026

Comment thread src/pynxtools/dataconverter/writer.py Outdated

Comment thread src/pynxtools/dataconverter/writer.py

Merge branch 'master' into add_blosc_but_keep_deflate_the_default

a4f4c0c

atomprobe-tc added 2 commits May 8, 2026 18:08

reviewer comments

3fe119a

final touches

1e88114

mkuehbach wants to merge 18 commits into master from add_blosc_but_keep_deflate_the_default

5 participants