Add workflow to test the performance of a package#114
Conversation
d18b430 to
889306f
Compare
FranzBusch
left a comment
There was a problem hiding this comment.
I’m curious to understand the direction here more. In the server ecosystem we are using the package-benchmark package to write performance tests for our packages. Packages like foundation have also started adopting it. Could we use that instead of the custom scripts here?
|
Do you have an example of a package using the |
|
Yes you can see it here: https://github.com/apple/swift-certificates/tree/main/Benchmarks. This packages only uses allocation metrics but another good metrics are the cpu instructions since those are stable on CI. I would really encourage us to adopt this approach instead of hand rolling performance scripts |
This adds the infrastructure to take performance measurements for PRs by taking a performance measurement of the project before the changes in the PR. This will become really useful once we can switch this to a macOS runner and are then able to measure instructions executed by an executable (like swift-format supports using `swift-format --measure-instructions`) – instruction counting isn’t available on Linux as far as I could find out. But even now, we can use this to track other metrics like binary size.
889306f to
b972a0e
Compare
|
Great suggestion with using the benchmarks package. I have updated the workflow to be based on in. Example run can be found here: ahoppen/swift-format#10 (comment) |
FranzBusch
left a comment
There was a problem hiding this comment.
Great stuff. I would like @rnro to take a look at this as well. We had some plans to upstream the existing benchmark https://github.com/apple/swift-nio/blob/main/.github/workflows/benchmarks.yml workflow from NIO here. That workflow runs benchmarks against multiple Swift versions which we found very helpful since it allowed us to catch compiler regressions/improvements quite fast.
|
I think one thing that’s worth mentioning here is that this workflow does not use a fixed baseline, which makes it very easy to handle because it means that we don’t need to ensure that performance testing always runs on exactly the same kind of machine – we don’t even need to guarantee that the Swift version stays constant. It means that we can’t use it to test for compiler regressions (because they would necessarily run in different containers / on different nodes). That sort of testing has different trade-offs, so if we want that, we should add a different kind of workflow for it, I think. |
|
@ahoppen for the record, cpu instructions is supported on linux, but you must have permissions to use the perf_event_open call, if you modify your local package-benchmark copy and/or make a fork and uncomment this line in the package: you probably will get a good hint. Likely no permissions. |
|
Or try running "perf" and see if that works. I.e. |
|
Thanks for chiming in, @hassila. I think the problem is that we will be running the performance tests inside a Docker container in GitHub actions and my understand is that you don’t have access to perf_event_open in these setups. If you know better, I would love to hear about it. |
Oh, I misread the description that instruction count was not available from the benchmark package, but you probably meant in docker. Quick google gave this as a workaround: Also ping @freef4ll - maybe you know for sure. |
|
I think I tried something like this without success, I think GitHub Actions doesn’t allow these extended privileges. But maybe I’ll give it another try when I find the time. |
|
Unfortunately it does not look like GitHub public runners expose The VMs are running in Azure under Hyper-V and while it appears that PMU events are supported, it is not enabled. |
|
Unfortunate that GitHub Action runners don’t support instruction counters by default but thanks a lot for verifying my analysis, @freef4ll! |
|
This looks great! I think it serves a different purpose to the use-case we have in the swift-server repos where we want to compare against a fix baseline so that we can check not only PRs for regressions but also timed runs. I think having both is a good thing. |
FranzBusch
left a comment
There was a problem hiding this comment.
LGTM. Just two minor comments
| else | ||
| echo "has_significant_changes=false" >> "$GITHUB_OUTPUT" | ||
| fi | ||
| - name: Install gh |
There was a problem hiding this comment.
Wondering if we would be better off just moving this into a separate job since gh is installed on the runners by default
There was a problem hiding this comment.
I tried this before but if you run it as a separate job, that shows up as a separate item in the pull request status, which is confusing.
| apt update | ||
| apt install gh -y | ||
| - name: Post comment | ||
| if: ${{ steps.compare_performance.outputs.has_significant_changes == 'true' }} |
There was a problem hiding this comment.
Should we unconditionally post this? Otherwise it might seem like nothing happened.
There was a problem hiding this comment.
I don’t think so. Most PRs don’t modify the performance of a package and since we run this unconditionally, I think posting a comment saying that the performance stayed constant just creates noise and users might get used to them and then not actually notice if a comment highlights a real performance change. If you want to double-check that it ran, you can open the GitHub action log.
This adds the infrastructure to take performance measurements for PRs by taking a performance measurement of the project before the changes in the PR.
This will become really useful once we can switch this to a macOS runner and are then able to measure instructions executed by an executable (like swift-format supports using
swift-format --measure-instructions) – instruction counting isn’t available on Linux as far as I could find out.But even now, we can use this to track other metrics like binary size.
Example run here (ignore the actual measurement, it’s bogus) ahoppen/swift-format#8 (comment)