Refactor std::set<T*> to llvm::SmallPtrSet<T*, N> based on benchmark profile data #27
m-atalla wants to merge 27 commits into lac-dcc:main from
Conversation
I have but one concern about this: my evaluation may be too biased toward SQLite. I wanted to further test this change over llvm-test-suite; however, I failed to make the pass run at all, which is what took me so long to send this PR. I followed @Casperento's article on building the test suite for an arbitrary pass, btw. I just had to cut my losses with this one 😓
Thank you for your contribution. I will review it for approval as soon as possible. What issues did you encounter when trying to reproduce the article? Could you leave me some feedback? Just so you know, we are using the following fork of the test suite with the patch from the article applied: https://github.com/Casperento/llvm-test-suite/tree/daedalus-crit When reviewing your PR, I will run your branch on the test suite, so don't worry about that part for now.
No worries, take your time.
I couldn't get the test suite to run Daedalus, even though I'm pretty sure I was providing the correct path in my cmake configs. I confirmed this by building the test suite with
Oh, sure. For now, the stats from our pass are not parsable by any script in the test suite. We are currently only collecting
The current stats are only generated for our own scripts located in
Hi, sorry this one took longer than expected.
This PR includes changes addressing #16. I used the SQLite 3 amalgamation as a benchmark while working on this refactor. After identifying all `std::set<T*>` variables, I created a macro to log the size of each one during execution. I then analyzed this data with a simple ad-hoc Python script.

To determine a representative set size N, I excluded sets with 0 or 1 samples, as these are probably very specific to SQLite; the same goes for very large sets, since the majority of sets are small. I then computed the 80th percentile using `np.percentile` to find the size value below which 80% of the observed set sizes fall. This gives a reasonable cutoff that captures most realistic cases while avoiding skew from larger outliers.

Finally, this translates to a 7.63% runtime speedup of Daedalus over the SQLite benchmark. Here is the average runtime of this PR vs. the main branch over 10 runs:
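For reference, the percentile-based sizing described above can be sketched as follows. This is a minimal illustration with made-up sample sizes, not the actual logged measurements; the filtering thresholds (`> 1`, `< 100`) are assumptions standing in for the exclusions described in the PR:

```python
import numpy as np

# Hypothetical logged std::set<T*> sizes; the real data came from
# instrumenting Daedalus while compiling the SQLite 3 amalgamation.
sizes = np.array([0, 0, 1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 6, 7, 8, 12, 40, 512])

# Exclude trivially small samples and very large outliers
# (thresholds here are illustrative, not the ones actually used).
filtered = sizes[(sizes > 1) & (sizes < 100)]

# 80th percentile: the size below which 80% of observed set sizes fall.
n = int(np.ceil(np.percentile(filtered, 80)))
print(n)  # candidate N for llvm::SmallPtrSet<T*, N>
```

The resulting value is rounded up and used as the inline capacity `N`; sets that grow beyond it simply spill to the heap, so an underestimate costs performance rather than correctness.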