Store first value in Dict directly in innerjoin#29
Open
non-Jedi wants to merge 1 commit into
Open
Conversation
This avoids allocating a Vector for the case where l does not have multiple indices with the same value. For the smoke-test benchmark in <JuliaData/DataFrames.jl#2340 (comment)>, this reduces allocations by half and overall runtime by 10%.
Member
|
Ah thanks. Another strategy I tried a while back was to first assume the join keys are distinct and then bail to a more general implementation when that's not the case. It would be awesome to compare the performance vesus this. (Sorry @non-Jedi I haven't been keeping track of PRs...) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
I saw some of the discussion in
JuliaData/DataFrames.jl#2340 and got
curious about what was possible.
This avoids allocating a Vector for the case where
ldoes not havemultiple indices with the same value. For the smoke-test benchmark in
JuliaData/DataFrames.jl#2340 (comment),
this reduces allocations by half and overall runtime by 10%.
Most of the allocations still come from this
line
which it's much less clear how to reduce allocations in. I'm not sure
how much JuliaLang/julia#24909 affects
performance in this case. One option would be to heuristically
estimate the size of
outbased on the size oflandrand callsizehint!on it; this didn't seem to help in my testing.I realize I'm only optimizing a single method of
innerjoin, but I'mnot super familiar with this field nor with the inner workings of this
package, so I leave it to you to decide if this is a worthwhile
complication and if it's relevant elsewhere in the package.