
Commit 3cd527b

Add benchmark results
1 parent 88e14be commit 3cd527b

2 files changed

Lines changed: 205 additions & 1 deletion

File tree

docs/design/04-candumpr-architecture.md

Lines changed: 5 additions & 1 deletion
@@ -31,7 +31,7 @@ troubleshooting enabler. I'm after a solution with the minimal system performanc
3131

3232
I want a long-running daemon to log all CAN traffic to facilitate future field issues. That means:
3333

34-
* Address claim PGN requests
34+
* Address claim PGN requests upon startup and rotation
3535
* Log rotation policy
3636
* Log retention policy
3737
* Configuration
@@ -354,6 +354,10 @@ collapses suddenly (large loss %).
354354
branch predictor behavior differ on the target ARM64 platform. Use these results for relative
355355
comparison between backends, not as absolute predictions.
356356

357+
It's technically possible to measure syscalls per thread by using `perf_event_open` to set up a counter
358+
for the `raw_syscalls:sys_enter` tracepoint using the `perf-event` crate, but this doesn't work well
359+
inside the unshare user namespace without additional orchestration externally.
360+
357361
## Open questions
358362

359363
TODO

docs/design/06-benchmarks.md

Lines changed: 200 additions & 0 deletions
@@ -0,0 +1,200 @@
1+
# Benchmark results
2+
3+
The [04-candumpr-architecture.md](/docs/design/04-candumpr-architecture.md) design document proposes
4+
three different benchmarks to compare receiver backends.
5+
6+
# Benchmark A - pure CPU cost
7+
8+
```
9+
recv_cost::recv_cost::dedicated run:setup_blocking()
10+
Instructions: 454678|N/A (*********)
11+
L1 Hits: 866202|N/A (*********)
12+
LL Hits: 10504|N/A (*********)
13+
RAM Hits: 175|N/A (*********)
14+
Total read+write: 876881|N/A (*********)
15+
Estimated Cycles: 924847|N/A (*********)
16+
recv_cost::recv_cost::epoll run:setup_nonblocking()
17+
Instructions: 519312|N/A (*********)
18+
L1 Hits: 960184|N/A (*********)
19+
LL Hits: 10182|N/A (*********)
20+
RAM Hits: 53|N/A (*********)
21+
Total read+write: 970419|N/A (*********)
22+
Estimated Cycles: 1012949|N/A (*********)
23+
Comparison with dedicated run:setup_blocking()
24+
Instructions: 454678|519312 (-12.4461%) [-1.14215x]
25+
L1 Hits: 866202|960184 (-9.78792%) [-1.10850x]
26+
LL Hits: 10504|10182 (+3.16244%) [+1.03162x]
27+
RAM Hits: 175|53 (+230.189%) [+3.30189x]
28+
Total read+write: 876881|970419 (-9.63893%) [-1.10667x]
29+
Estimated Cycles: 924847|1012949 (-8.69758%) [-1.09526x]
30+
recv_cost::recv_cost::recvmmsg run:setup_nonblocking()
31+
Instructions: 468571|N/A (*********)
32+
L1 Hits: 882905|N/A (*********)
33+
LL Hits: 10191|N/A (*********)
34+
RAM Hits: 57|N/A (*********)
35+
Total read+write: 893153|N/A (*********)
36+
Estimated Cycles: 935855|N/A (*********)
37+
Comparison with dedicated run:setup_blocking()
38+
Instructions: 454678|468571 (-2.96497%) [-1.03056x]
39+
L1 Hits: 866202|882905 (-1.89182%) [-1.01928x]
40+
LL Hits: 10504|10191 (+3.07134%) [+1.03071x]
41+
RAM Hits: 175|57 (+207.018%) [+3.07018x]
42+
Total read+write: 876881|893153 (-1.82186%) [-1.01856x]
43+
Estimated Cycles: 924847|935855 (-1.17625%) [-1.01190x]
44+
Comparison with epoll run:setup_nonblocking()
45+
Instructions: 519312|468571 (+10.8289%) [+1.10829x]
46+
L1 Hits: 960184|882905 (+8.75281%) [+1.08753x]
47+
LL Hits: 10182|10191 (-0.08831%) [-1.00088x]
48+
RAM Hits: 53|57 (-7.01754%) [-1.07547x]
49+
Total read+write: 970419|893153 (+8.65093%) [+1.08651x]
50+
Estimated Cycles: 1012949|935855 (+8.23781%) [+1.08238x]
51+
recv_cost::recv_cost::uring run:setup_nonblocking()
52+
Instructions: 587770|N/A (*********)
53+
L1 Hits: 1071803|N/A (*********)
54+
LL Hits: 10210|N/A (*********)
55+
RAM Hits: 119|N/A (*********)
56+
Total read+write: 1082132|N/A (*********)
57+
Estimated Cycles: 1127018|N/A (*********)
58+
Comparison with dedicated run:setup_blocking()
59+
Instructions: 454678|587770 (-22.6436%) [-1.29272x]
60+
L1 Hits: 866202|1071803 (-19.1827%) [-1.23736x]
61+
LL Hits: 10504|10210 (+2.87953%) [+1.02880x]
62+
RAM Hits: 175|119 (+47.0588%) [+1.47059x]
63+
Total read+write: 876881|1082132 (-18.9673%) [-1.23407x]
64+
Estimated Cycles: 924847|1127018 (-17.9386%) [-1.21860x]
65+
Comparison with epoll run:setup_nonblocking()
66+
Instructions: 519312|587770 (-11.6471%) [-1.13182x]
67+
L1 Hits: 960184|1071803 (-10.4141%) [-1.11625x]
68+
LL Hits: 10182|10210 (-0.27424%) [-1.00275x]
69+
RAM Hits: 53|119 (-55.4622%) [-2.24528x]
70+
Total read+write: 970419|1082132 (-10.3234%) [-1.11512x]
71+
Estimated Cycles: 1012949|1127018 (-10.1213%) [-1.11261x]
72+
Comparison with recvmmsg run:setup_nonblocking()
73+
Instructions: 468571|587770 (-20.2799%) [-1.25439x]
74+
L1 Hits: 882905|1071803 (-17.6243%) [-1.21395x]
75+
LL Hits: 10191|10210 (-0.18609%) [-1.00186x]
76+
RAM Hits: 57|119 (-52.1008%) [-2.08772x]
77+
Total read+write: 893153|1082132 (-17.4636%) [-1.21159x]
78+
Estimated Cycles: 935855|1127018 (-16.9618%) [-1.20427x]
79+
recv_cost::recv_cost::uring_multi run:setup_nonblocking()
80+
Instructions: 628114|N/A (*********)
81+
L1 Hits: 1145140|N/A (*********)
82+
LL Hits: 11463|N/A (*********)
83+
RAM Hits: 168|N/A (*********)
84+
Total read+write: 1156771|N/A (*********)
85+
Estimated Cycles: 1208335|N/A (*********)
86+
Comparison with dedicated run:setup_blocking()
87+
Instructions: 454678|628114 (-27.6122%) [-1.38145x]
88+
L1 Hits: 866202|1145140 (-24.3584%) [-1.32202x]
89+
LL Hits: 10504|11463 (-8.36605%) [-1.09130x]
90+
RAM Hits: 175|168 (+4.16667%) [+1.04167x]
91+
Total read+write: 876881|1156771 (-24.1958%) [-1.31919x]
92+
Estimated Cycles: 924847|1208335 (-23.4610%) [-1.30652x]
93+
Comparison with epoll run:setup_nonblocking()
94+
Instructions: 519312|628114 (-17.3220%) [-1.20951x]
95+
L1 Hits: 960184|1145140 (-16.1514%) [-1.19263x]
96+
LL Hits: 10182|11463 (-11.1751%) [-1.12581x]
97+
RAM Hits: 53|168 (-68.4524%) [-3.16981x]
98+
Total read+write: 970419|1156771 (-16.1097%) [-1.19203x]
99+
Estimated Cycles: 1012949|1208335 (-16.1699%) [-1.19289x]
100+
Comparison with recvmmsg run:setup_nonblocking()
101+
Instructions: 468571|628114 (-25.4003%) [-1.34049x]
102+
L1 Hits: 882905|1145140 (-22.8998%) [-1.29701x]
103+
LL Hits: 10191|11463 (-11.0966%) [-1.12482x]
104+
RAM Hits: 57|168 (-66.0714%) [-2.94737x]
105+
Total read+write: 893153|1156771 (-22.7891%) [-1.29515x]
106+
Estimated Cycles: 935855|1208335 (-22.5500%) [-1.29116x]
107+
Comparison with uring run:setup_nonblocking()
108+
Instructions: 587770|628114 (-6.42304%) [-1.06864x]
109+
L1 Hits: 1071803|1145140 (-6.40420%) [-1.06842x]
110+
LL Hits: 10210|11463 (-10.9308%) [-1.12272x]
111+
RAM Hits: 119|168 (-29.1667%) [-1.41176x]
112+
Total read+write: 1082132|1156771 (-6.45236%) [-1.06897x]
113+
Estimated Cycles: 1127018|1208335 (-6.72967%) [-1.07215x]
114+
```
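The figures in the report above can be cross-checked by hand. The sketch below assumes the output follows the iai-style Callgrind convention, where estimated cycles are `L1 + 5 * LL + 35 * RAM`, and that each comparison line reads `left|right` with the percentage taken relative to the right-hand value. Both assumptions reproduce the numbers shown; the function names are illustrative only.

```rust
/// Estimated cycles, assuming the iai-style weighting: L1 + 5 * LL + 35 * RAM.
fn estimated_cycles(l1_hits: u64, ll_hits: u64, ram_hits: u64) -> u64 {
    l1_hits + 5 * ll_hits + 35 * ram_hits
}

/// Percentage difference of `left` relative to `right`, as in `left|right (pct%)`.
fn percent_diff(left: u64, right: u64) -> f64 {
    (left as f64 - right as f64) / right as f64 * 100.0
}

/// Signed factor as in `[-1.14215x]`: larger over smaller, negative when `left` is smaller.
fn signed_factor(left: u64, right: u64) -> f64 {
    if left >= right {
        left as f64 / right as f64
    } else {
        -(right as f64 / left as f64)
    }
}

fn main() {
    // Hit counts copied from the `dedicated run:setup_blocking()` entry.
    println!("dedicated cycles: {}", estimated_cycles(866_202, 10_504, 175));

    // Instruction counts from the epoll entry's comparison with dedicated.
    println!(
        "instructions: {:+.4}% [{:+.5}x]",
        percent_diff(454_678, 519_312),
        signed_factor(454_678, 519_312)
    );
}
```

Read this way, a negative percentage in a "Comparison with X" block means X needed fewer events than the backend the block belongs to, so the dedicated backend has the lowest estimated cycle count of the five.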
115+
116+
# Benchmark B - system impact
117+
118+
| backend | ifaces | rate | sent | recv | lost | user_ms | sys_ms | vol_csw | invol_csw |
119+
| ----------- | ------ | ---- | ----- | ----- | ---- | ------- | ------ | ------- | --------- |
120+
| dedicated | 1 | 1000 | 5000 | 5000 | 0 | 6.1 | 0.0 | 5000 | 0 |
121+
| dedicated | 1 | 2000 | 10000 | 10000 | 0 | 11.7 | 0.0 | 10000 | 0 |
122+
| dedicated | 1 | 4000 | 20000 | 20000 | 0 | 22.7 | 0.0 | 19996 | 0 |
123+
| dedicated | 2 | 1000 | 10000 | 10000 | 0 | 12.2 | 0.0 | 10000 | 0 |
124+
| dedicated | 2 | 2000 | 20000 | 20000 | 0 | 18.1 | 4.8 | 19999 | 0 |
125+
| dedicated | 2 | 4000 | 40000 | 40000 | 0 | 44.4 | 0.0 | 39997 | 0 |
126+
| dedicated | 4 | 1000 | 20000 | 20000 | 0 | 22.0 | 0.0 | 19999 | 0 |
127+
| dedicated | 4 | 2000 | 40000 | 40000 | 0 | 34.6 | 7.8 | 39993 | 12 |
128+
| dedicated | 4 | 4000 | 80000 | 80000 | 0 | 66.9 | 22.5 | 79956 | 48 |
129+
| epoll | 1 | 1000 | 5000 | 5000 | 0 | 3.9 | 3.9 | 4999 | 0 |
130+
| epoll | 1 | 2000 | 10000 | 10000 | 0 | 7.4 | 7.4 | 10000 | 0 |
131+
| epoll | 1 | 4000 | 20000 | 20000 | 0 | 14.2 | 14.3 | 19999 | 0 |
132+
| epoll | 2 | 1000 | 10000 | 9999 | 1 | 7.8 | 7.8 | 9865 | 0 |
133+
| epoll | 2 | 2000 | 20000 | 19999 | 1 | 14.6 | 14.6 | 19871 | 1 |
134+
| epoll | 2 | 4000 | 40000 | 39999 | 1 | 41.7 | 14.3 | 38664 | 1 |
135+
| epoll | 4 | 1000 | 20000 | 19997 | 3 | 26.8 | 0.0 | 16407 | 1 |
136+
| epoll | 4 | 2000 | 40000 | 39997 | 3 | 0.0 | 46.6 | 26749 | 62 |
137+
| epoll | 4 | 4000 | 80000 | 79997 | 3 | 0.0 | 103.7 | 66257 | 18 |
138+
| recvmmsg | 1 | 1000 | 5000 | 5000 | 0 | 0.0 | 7.9 | 5000 | 0 |
139+
| recvmmsg | 1 | 2000 | 10000 | 10000 | 0 | 0.0 | 15.1 | 10000 | 0 |
140+
| recvmmsg | 1 | 4000 | 20000 | 20000 | 0 | 0.0 | 28.7 | 19999 | 0 |
141+
| recvmmsg | 2 | 1000 | 10000 | 9999 | 1 | 0.0 | 15.4 | 9896 | 0 |
142+
| recvmmsg | 2 | 2000 | 20000 | 19999 | 1 | 0.0 | 29.6 | 19894 | 0 |
143+
| recvmmsg | 2 | 4000 | 40000 | 39999 | 1 | 0.0 | 57.9 | 39893 | 0 |
144+
| recvmmsg | 4 | 1000 | 20000 | 19997 | 3 | 0.0 | 26.2 | 15838 | 5 |
145+
| recvmmsg | 4 | 2000 | 40000 | 39997 | 3 | 0.0 | 52.7 | 32025 | 7 |
146+
| recvmmsg | 4 | 4000 | 80000 | 79997 | 3 | 0.0 | 101.0 | 63199 | 69 |
147+
| uring | 1 | 1000 | 5000 | 5000 | 0 | 0.0 | 7.3 | 5048 | 0 |
148+
| uring | 1 | 2000 | 10000 | 10000 | 0 | 0.0 | 14.0 | 10047 | 0 |
149+
| uring | 1 | 4000 | 20000 | 20000 | 0 | 0.0 | 26.7 | 20046 | 1 |
150+
| uring | 2 | 1000 | 10000 | 9999 | 1 | 0.0 | 14.2 | 9897 | 0 |
151+
| uring | 2 | 2000 | 20000 | 19999 | 1 | 0.0 | 27.2 | 19924 | 0 |
152+
| uring | 2 | 4000 | 40000 | 39999 | 1 | 7.6 | 44.8 | 39836 | 2 |
153+
| uring | 4 | 1000 | 20000 | 19997 | 3 | 3.8 | 20.3 | 14763 | 10 |
154+
| uring | 4 | 2000 | 40000 | 39997 | 3 | 8.1 | 42.2 | 33084 | 7 |
155+
| uring | 4 | 4000 | 80000 | 79997 | 3 | 15.3 | 78.8 | 61615 | 43 |
156+
| uring_multi | 1 | 1000 | 5000 | 5000 | 0 | 1.0 | 6.1 | 5000 | 0 |
157+
| uring_multi | 1 | 2000 | 10000 | 10000 | 0 | 1.7 | 11.3 | 10000 | 0 |
158+
| uring_multi | 1 | 4000 | 20000 | 20000 | 0 | 3.9 | 21.1 | 19997 | 0 |
159+
| uring_multi | 2 | 1000 | 10000 | 10000 | 0 | 1.3 | 7.0 | 5000 | 0 |
160+
| uring_multi | 2 | 2000 | 20000 | 20000 | 0 | 2.5 | 13.4 | 9999 | 0 |
161+
| uring_multi | 2 | 4000 | 40000 | 40000 | 0 | 4.8 | 25.6 | 19996 | 0 |
162+
| uring_multi | 4 | 1000 | 20000 | 20000 | 0 | 1.8 | 9.4 | 5000 | 2 |
163+
| uring_multi | 4 | 2000 | 40000 | 40000 | 0 | 13.6 | 9.5 | 9995 | 22 |
164+
| uring_multi | 4 | 4000 | 80000 | 80000 | 0 | 35.1 | 8.6 | 19984 | 11 |
165+
166+
# Benchmark C - system contention
167+
168+
## 4 core ~75% utilization
169+
170+
| backend | ifaces | rate | sent | recv | lost | user_ms | sys_ms | vol_csw | invol_csw |
171+
| ----------- | ------ | ---- | ----- | ----- | ---- | ------- | ------ | ------- | --------- |
172+
| dedicated | 4 | 4000 | 79991 | 79989 | 2 | 7.8 | 43.5 | 61858 | 169 |
173+
| epoll | 4 | 4000 | 79997 | 79994 | 3 | 5.7 | 40.7 | 33651 | 327 |
174+
| recvmmsg | 4 | 4000 | 79995 | 79994 | 1 | 6.3 | 40.3 | 34500 | 389 |
175+
| uring | 4 | 4000 | 80000 | 79997 | 3 | 3.4 | 46.7 | 39036 | 284 |
176+
| uring_multi | 4 | 4000 | 79993 | 79992 | 1 | 4.4 | 20.7 | 11021 | 110 |
177+
178+
## 4 core ~90% utilization
179+
180+
| backend | ifaces | rate | sent | recv | lost | user_ms | sys_ms | vol_csw | invol_csw |
181+
| ----------- | ------ | ---- | ----- | ----- | ---- | ------- | ------ | ------- | --------- |
182+
| dedicated | 4 | 4000 | 79991 | 79991 | 0 | 8.1 | 27.0 | 56873 | 81 |
183+
| epoll | 4 | 4000 | 79993 | 79991 | 2 | 9.9 | 31.7 | 40314 | 150 |
184+
| recvmmsg | 4 | 4000 | 79993 | 79991 | 2 | 7.1 | 33.7 | 39261 | 184 |
185+
| uring | 4 | 4000 | 80000 | 79991 | 9 | 7.9 | 29.2 | 37673 | 115 |
186+
| uring_multi | 4 | 4000 | 79993 | 79992 | 1 | 3.6 | 16.2 | 9232 | 64 |
187+
188+
**NOTE:** Fewer involuntary context switches under higher CPU utilization is counterintuitive, but
189+
correct. It means the receiver is being starved rather than interrupted. Compare the sys_ms kernel
190+
CPU time.
191+
192+
# Takeaways
193+
194+
* The pure CPU cost of the receive backends doesn't matter much, because the dominant cost is the
195+
syscalls and context switching
196+
* The multiplex methods are all pretty close to each other in terms of results
197+
* It appears all backends degrade gracefully when the system is under high CPU load
198+
* It doesn't look like I can eliminate dropped frames entirely
199+
* Batching receives in the multishot backend dramatically reduces kernel CPU time and context
200+
switches, more so than the other multiplex backends, even at high CPU load
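
To make the last takeaway concrete, the sketch below normalizes the 4-interface, 4000 frames/s rows of Benchmark B to per-frame figures. The `Row` struct and helper names are illustrative only; the numbers are copied from the table. User and kernel time are summed because the split between them appears noisy in individual rows.

```rust
/// One row of the Benchmark B table (only the columns used here).
struct Row {
    backend: &'static str,
    recv: u32,
    user_ms: f64,
    sys_ms: f64,
    vol_csw: u32,
}

impl Row {
    /// Voluntary context switches per received frame.
    fn csw_per_frame(&self) -> f64 {
        self.vol_csw as f64 / self.recv as f64
    }

    /// Total CPU time (user + kernel) per received frame, in microseconds.
    fn cpu_us_per_frame(&self) -> f64 {
        (self.user_ms + self.sys_ms) * 1000.0 / self.recv as f64
    }
}

/// Benchmark B rows for ifaces = 4, rate = 4000.
fn bench_b_rows() -> Vec<Row> {
    vec![
        Row { backend: "dedicated", recv: 80_000, user_ms: 66.9, sys_ms: 22.5, vol_csw: 79_956 },
        Row { backend: "epoll", recv: 79_997, user_ms: 0.0, sys_ms: 103.7, vol_csw: 66_257 },
        Row { backend: "recvmmsg", recv: 79_997, user_ms: 0.0, sys_ms: 101.0, vol_csw: 63_199 },
        Row { backend: "uring", recv: 79_997, user_ms: 15.3, sys_ms: 78.8, vol_csw: 61_615 },
        Row { backend: "uring_multi", recv: 80_000, user_ms: 35.1, sys_ms: 8.6, vol_csw: 19_984 },
    ]
}

fn main() {
    for row in bench_b_rows() {
        println!(
            "{:<12} {:.3} csw/frame  {:.3} us CPU/frame",
            row.backend,
            row.csw_per_frame(),
            row.cpu_us_per_frame()
        );
    }
}
```

On these rows the multishot backend takes roughly one voluntary context switch per four frames, while the other backends take between about 0.77 and 1.0 per frame.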
