-
Notifications
You must be signed in to change notification settings - Fork 3
Add benchmarks to vignette #41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
topepo
wants to merge
3
commits into
main
Choose a base branch
from
benchmarks
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's definitely not what I'd expect from H2O.
Possible reason could be the usage of hyper-threading. I'm not sure if all but at least some Intel processors have one floating point unit per core and with HT you can have 2 parallel threads on one core which means that in floating point intensive workload the threads will have to wait for each other.
Another possible reason related to HT is security fixes - couple years ago there were several security issues related to HT (Meltdown, Spectre,...) and one of the mitigation techniques was to disable HT altogether so I assume it might take some performance hit on some workloads.
For both cases it could help to use
nthreads=10.Another reason could be cache utilization - the more threads, the more cache invalidations => more time spent on waiting on memory.
It can also be related to how we split the data, if the dataset is small, it could very well make the training slower with higher parallelism (more time spent on communication and synchronization). Trying with bigger data could make the more parallel version perform better in this case.
Anyway, now I'm curious about it so I'll try to run the benchmark with a profiler. If I find some reason I will mention it here as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not find the definition of
sim_classificationfunction but if it takes as its first argument number of rows to generate, I would say the reason for slower parallel run is really just small data (10k rows).With 10-fold cv it will use 9k rows to train the model and if we have 20 threads we will process 450 rows per thread so the time spent on communication/synchronization might be significant when compared to the computation time and cause this behavior.
@ledell do we have any recommendation about how many threads should we use based on the dataset size? I know we have some heuristic for GLM (
nodes = rows*columns^2/(nthreads*1e8)) but I don't know if we have something like that in general.My general recommendation would be to use the H2O parallelism (using
nthreadsand/or h2o cluster) for bigger data (but I don't know where the threshold is).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tomasfryda Recommendation is to use default
nthreadsexcept if strong reason not to do so (e.g. running on laptop and don't want h2o to use all my cpus). The heuristic that you mentioned is to optimize the number ofnodesas they need to be defined beforehand, but once a node is started, it's up to H2O to optimize its behaviour according to the number of threads/cpus available, not the other way round.