Refactor std::set<T*> to llvm::SmallPtrSet<T*, N> based on benchmark profile data by m-atalla · Pull Request #27 · lac-dcc/Daedalus

m-atalla · 2025-04-25T04:12:15Z

Hi, sorry this one took longer than expected.

This PR includes changes addressing #16. I used the SQLite 3 amalgamation as a benchmark while working on this refactor. After identifying all std::set<T*> variables, I created a macro to log the size of each one during execution. I then analyzed this data with a simple ad-hoc Python script.

To determine a representative set size N, I excluded sets with 0 or 1 samples, as these are probably very specific to SQLite. The same goes for very large sets, since the majority of sets are small. I then computed the 80th percentile using np.percentile to find the size value below which 80% of the observed set sizes fall. This gives a reasonable cutoff that captures most realistic cases while avoiding skew from larger outliers.
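
The selection described above could be sketched roughly as follows. This is not the script used for the PR: the input layout (a dict from set variable name to its logged sizes), the `max_size` cutoff for "very large" sets, the sample values, and the helper names are all hypothetical, and "0 or 1 samples" is read here as a set that was logged at most once. Rounding the result up to a whole number is also an assumption. The percentile helper uses linear interpolation, which matches the default method of np.percentile, so the sketch runs without NumPy.

```python
import math


def percentile(values, q):
    """q-th percentile with linear interpolation (np.percentile's default)."""
    if not values:
        raise ValueError("no samples to summarize")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0   # fractional index into ordered
    lo, hi = math.floor(rank), math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def pick_small_size(samples_by_set, max_size, q=80):
    """Suggest N from logged set sizes (hypothetical helper).

    samples_by_set: dict mapping a set variable name to its logged sizes.
    max_size: sizes above this are treated as outliers and dropped.
    """
    kept = []
    for sizes in samples_by_set.values():
        if len(sizes) <= 1:                 # skip sets logged 0 or 1 times
            continue
        kept.extend(s for s in sizes if s <= max_size)
    return math.ceil(percentile(kept, q))   # round up to a whole N


# Made-up measurements, for illustration only.
logged = {
    "setA": [1, 2, 3, 4, 5],
    "setB": [7],                # only one sample: excluded
    "setC": [2, 2, 1000],       # 1000 exceeds max_size: dropped
}
print(pick_small_size(logged, max_size=64))
```

Whether N is chosen once globally, as here, or separately per set variable is a design choice the description above leaves open.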

Finally, this translates to a 7.63% runtime speedup of Daedalus on the SQLite benchmark. Here is the average runtime of this PR vs. the main branch over 10 runs:

| | main | This PR |
|---|---|---|
| Average time (s) | 10.32 | 9.54 |
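
As a quick sanity check, the relative runtime reduction can be recomputed from the two averages. This sketch uses only the rounded values from the table, so it lands close to, but not exactly on, the reported 7.63%, which was presumably derived from the unrounded timings (an assumption). The function name is illustrative.

```python
def runtime_reduction_pct(baseline_s, candidate_s):
    """Percentage by which candidate_s is lower than baseline_s."""
    if baseline_s <= 0:
        raise ValueError("baseline must be positive")
    return (baseline_s - candidate_s) / baseline_s * 100.0


main_avg = 10.32   # average over 10 runs on main, rounded value from the table
pr_avg = 9.54      # average over 10 runs with this PR, rounded value from the table
print(f"{runtime_reduction_pct(main_avg, pr_avg):.2f}% less time than main")
```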

m-atalla · 2025-04-25T04:19:45Z

I have but one concern about this: my evaluation may be too biased toward SQLite. I wanted to further test this change on llvm-test-suite, but I could not get the pass to run at all, which is why it took me so long to send this PR. I followed @Casperento's article on building the test suite for an arbitrary pass, by the way. I just had to cut my losses with this one 😓

Casperento · 2025-04-26T13:50:42Z

> I have but one concern about this, that my evaluation is too biased to SQLite... I wanted to further test this change over llvm-test-suite however I failed to make the pass run at all, which is what took me so long to send this PR... I followed @Casperento article on building the test suite for an arbitrary pass btw. I just had to cut my losses with this one 😓

Thank you for your contribution. I will review it for approval as soon as possible.

What issues did you encounter when trying to reproduce the article? Could you leave me some feedback?

Just so you know, we are using the following fork of the test suite with the patch from the article applied: https://github.com/Casperento/llvm-test-suite/tree/daedalus-crit

When reviewing your PR, I will run your branch on the test suite, so don't worry about that part for now.

m-atalla · 2025-04-26T14:15:07Z

> Thank you for your contribution. I will review it for approval as soon as possible.

No worries, take your time.

> What issues did you encounter when trying to reproduce the article? Could you leave me some feedback?
>
> Just so you know, we are using the following fork of the test suite with the patch from the article applied: https://github.com/Casperento/llvm-test-suite/tree/daedalus-crit
>
> When reviewing your PR, I will run your branch on the test suite, so don't worry about that part for now.

I couldn't get the test suite to run Daedalus, even though I'm pretty sure I was providing the correct path in my CMake configuration. I confirmed the pass was not running by building the test suite with -DTEST_SUITE_COLLECT_STATS=On, which collected statistics from all passes except Daedalus in the test results JSON.

Casperento · 2025-04-26T15:57:04Z

> I couldn't get the test suite to run Daedalus, I'm pretty sure I was providing the correct path in my cmake configs. I confirmed this by building the test suite with -DTEST_SUITE_COLLECT_STATS=On which collected statistics from all passes except Daedalus in the test results json.

Oh, sure. For now, the stats from our pass are not parsable by any script in the test suite. We are currently only collecting instcount and size..text metrics and checking for test correctness.

The current stats are only generated for our own scripts located in artifact/utils.

m-atalla added 25 commits February 6, 2025 20:50

add program slice dot export flag

908e504

update function slices set to dot param

fcb0890

clean-up: clang-format changes

0985bbb

clean-up: remove redundant include

3fe0aa0

clean-up: remove redundant include

907b88b

add graphviz dot files to gitignore

764e97c

modify dump dot to generate dot files in a source-named directory

956adbb

Merge branch 'main' into main

8db6977

modify dump dot directory to use Module ID instead of source name

d79aef8

add visualizing slice dot instructions

1429dcb

Merge branch 'lac-dcc:main' into main

a1ec626

Merge branch 'lac-dcc:main' into main

013b758

add std::set instrumentation logging code

0ec7aa1

Merge remote-tracking branch 'refs/remotes/origin/main'

055084e

Merge branch 'main' into refactor-llvm-ptr-set

9331d93

refactor SmallPtrSet where std::set is constantly small on sqlite3

07f2efe

refactor commonly small std::set to SmallPtrSet

1f34969

refactor instsInSlice to SmallPtrSet

b6e4073

add timeSpent statistic

0721578

Merge remote-tracking branch 'refs/remotes/origin/main'

3b67c7b

Merge branch 'main' into refactor-llvm-ptr-set

abe26f3

remove set sizes debug statements

dac511c

update declarations to match the SmallPtrSet refactor

3dae1de

remove mistakenly committed local test runner changes

743449e

fix isSelfContained function signature

2a2e512

m-atalla added 2 commits April 25, 2025 07:22

clean up - time spent in pass statistic

c4512bb

fix spelling typo

be5db96

m-atalla changed the title ~~Refactor llvm ptr set~~ Refactor std::set<T*> to llvm::SmallPtrSet<T*, N> set based on benchmark profile data Apr 25, 2025

m-atalla changed the title ~~Refactor std::set<T*> to llvm::SmallPtrSet<T*, N> set based on benchmark profile data~~ Refactor std::set<T*> to llvm::SmallPtrSet<T*, N> based on benchmark profile data Apr 25, 2025

m-atalla wants to merge 27 commits into lac-dcc:main from m-atalla:refactor-llvm-ptr-set