Skip to content

Refactor std::set<T*> to llvm::SmallPtrSet<T*, N> based on benchmark profile data#27

Open
m-atalla wants to merge 27 commits intolac-dcc:mainfrom
m-atalla:refactor-llvm-ptr-set
Open

Refactor std::set<T*> to llvm::SmallPtrSet<T*, N> based on benchmark profile data#27
m-atalla wants to merge 27 commits intolac-dcc:mainfrom
m-atalla:refactor-llvm-ptr-set

Conversation

@m-atalla
Copy link
Copy Markdown
Contributor

@m-atalla m-atalla commented Apr 25, 2025

Hi, sorry this one took longer than expected.

This PR includes changes addressing #16, I used the SQLite 3 amalgamation as a benchmark while working on this refactor. After identifying all std::set<T*> variables I created a macro to log the size of each one during execution. I then analyzed this data with an ad-hoc simple python script.

To determine a representative set size N, I excluded sets with 0 or 1 samples, as these are probably very specific to SQLite, same goes to very large sets as the majority of sets are small. I then computed the 80th percentile using np.percentile to find the size value below which 80% of the observed set sizes fall. This gives a reasonable cutoff that captures most realistic cases while avoiding skew from larger outliers.

Finally this translates to a 7.63% runtime speedup of Daedalus over the SQLite benchmark, here is the average runtime of this PR vs. the main branch over 10 runs:

main This PR
Average Time(s) 10.32 9.54

@m-atalla
Copy link
Copy Markdown
Contributor Author

m-atalla commented Apr 25, 2025

I have but one concern about this, that my evaluation is too biased to SQLite... I wanted to further test this change over llvm-test-suite however I failed to make the pass run at all, which is what took me so long to send this PR... I followed @Casperento article on building the test suite for an arbitrary pass btw. I just had to cut my losses with this one 😓

@m-atalla m-atalla changed the title Refactor llvm ptr set Refactor std::set<T*> to llvm::SmallPtrSet<T*, N> set based on benchmark profile data Apr 25, 2025
@m-atalla m-atalla changed the title Refactor std::set<T*> to llvm::SmallPtrSet<T*, N> set based on benchmark profile data Refactor std::set<T*> to llvm::SmallPtrSet<T*, N> based on benchmark profile data Apr 25, 2025
@Casperento
Copy link
Copy Markdown
Collaborator

I have but one concern about this, that my evaluation is too biased to SQLite... I wanted to further test this change over llvm-test-suite however I failed to make the pass run at all, which is what took me so long to send this PR... I followed @Casperento article on building the test suite for an arbitrary pass btw. I just had to cut my losses with this one 😓

Thank you for your contribution. I will review it for approval as soon as possible.

What issues did you encounter when trying to reproduce the article? Could you leave me some feedback?

Just so you know, we are using the following fork of the test suite with the patch from the article applied: https://github.com/Casperento/llvm-test-suite/tree/daedalus-crit

When reviewing your PR, I will run your branch on the test suite, so don't worry about that part for now.

@m-atalla
Copy link
Copy Markdown
Contributor Author

Thank you for your contribution. I will review it for approval as soon as possible.

No worries, take your time.

What issues did you encounter when trying to reproduce the article? Could you leave me some feedback?

Just so you know, we are using the following fork of the test suite with the patch from the article applied: https://github.com/Casperento/llvm-test-suite/tree/daedalus-crit

When reviewing your PR, I will run your branch on the test suite, so don't worry about that part for now.

I couldn't get the test suite to run Daedalus, I'm pretty sure I was providing the correct path in my cmake configs. I confirmed this by building the test suite with -DTEST_SUITE_COLLECT_STATS=On which collected statistics from all passes except Daedalus in the test results json.

@Casperento
Copy link
Copy Markdown
Collaborator

Thank you for your contribution. I will review it for approval as soon as possible.

No worries, take your time.

What issues did you encounter when trying to reproduce the article? Could you leave me some feedback?
Just so you know, we are using the following fork of the test suite with the patch from the article applied: https://github.com/Casperento/llvm-test-suite/tree/daedalus-crit
When reviewing your PR, I will run your branch on the test suite, so don't worry about that part for now.

I couldn't get the test suite to run Daedalus, I'm pretty sure I was providing the correct path in my cmake configs. I confirmed this by building the test suite with -DTEST_SUITE_COLLECT_STATS=On which collected statistics from all passes except Daedalus in the test results json.

Oh, sure. For now, the stats from our pass are not parsable by any script in the test suite. We are currently only collecting instcount and size..text metrics and checking for test correctness.

The current stats are only generated for our own scripts located in artifact/utils.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants