Replies: 4 comments 7 replies
As you say, clearly there is some instability here - the 20% -ve slowdown
for L3 logging is particularly weird. I suspect it has to do with
timekeeping, e.g. NTP making the clock run slower.
I think we'll need to do multiple runs (8?) and take the median. I realise
that's painful, but I can't think of an alternative.
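For what it's worth, the take-the-median step could be sketched like this (a C++ sketch only; `median` is a hypothetical helper, not part of the existing test.sh / perf-report.py tooling):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Median of N throughput samples (e.g. K ops/sec from N benchmark runs).
// With an even sample count this takes the mean of the two middle values.
double median(std::vector<double> samples) {
    std::sort(samples.begin(), samples.end());
    std::size_t n = samples.size();
    if (n % 2 == 1) {
        return samples[n / 2];
    }
    return (samples[n / 2 - 1] + samples[n / 2]) / 2.0;
}
```

The median is preferred over the mean here because a single run perturbed by NTP or thermal throttling shifts the mean but usually leaves the median untouched.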
On Thu, 13 Jun 2024 at 02:53, Aditya P. Gurajada wrote:
Hi, @gregthelaw --
I now have the test.sh and report generation productized under the 2
commits under PR #75.
I ran a perf-test on an AWS bare-metal instance with 72 cores.
Test configuration: 32 clients, 1 million messages per client; server
thread counts: 1, 2, 4, 8.
Here are the post-processed results.
I was expecting to see increasing degradation due to concurrent threads
logging but the results don't reflect this behaviour.
Below each chunk of output, I have annotated my observations. Comments
welcome:
*Performance comparison for NumClients=32, NumOps=32000000 (32 Million),
NumThreads=1*
+-------------------------------+-------------------+----------+-------------------+----------+---------------+
| Run-Type | Server throughput | Srv:Drop | Client throughput | Cli:Drop | NumOps/thread |
+-------------------------------+-------------------+----------+-------------------+----------+---------------+
| Baseline - No logging | ~295.85 K ops/sec | 0.00 % | ~9.79 K ops/sec | 0.00 % | 32 Million |
| L3-logging (no LOC) | ~355.17 K ops/sec | 20.05 % | ~11.83 K ops/sec | 20.85 % | 32 Million |
| L3-fast logging (no LOC) | ~296.56 K ops/sec | 0.24 % | ~9.73 K ops/sec | -0.58 % | 32 Million |
| L3-fprintf() logging (no LOC) | ~270.67 K ops/sec | -8.51 % | ~8.87 K ops/sec | -9.44 % | 32 Million |
| L3-write() logging (no LOC) | ~188.06 K ops/sec | -36.43 % | ~6.09 K ops/sec | -37.77 % | 32 Million |
| L3-logging default LOC | ~296.63 K ops/sec | 0.26 % | ~9.76 K ops/sec | -0.31 % | 32 Million |
| L3-logging LOC-ELF | ~297.14 K ops/sec | 0.44 % | ~9.77 K ops/sec | -0.19 % | 32 Million |
| spdlog-logging | ~284.75 K ops/sec | -3.75 % | ~9.36 K ops/sec | -4.36 % | 32 Million |
| spdlog-backtrace-logging | ~283.64 K ops/sec | -4.13 % | ~9.31 K ops/sec | -4.90 % | 32 Million |
+-------------------------------+-------------------+----------+-------------------+----------+---------------+
   - L3-logging is 20% faster than baseline: this is totally unexpected
   and unheard of. I attribute it to some machine instability [?]
   - The remaining %-drop numbers are in line with what we've been seeing
   elsewhere.
*Performance comparison for NumClients=32, NumOps=32000000 (32 Million),
NumThreads=2*
+-------------------------------+-------------------+----------+-------------------+----------+---------------+
| Run-Type | Server throughput | Srv:Drop | Client throughput | Cli:Drop | NumOps/thread |
+-------------------------------+-------------------+----------+-------------------+----------+---------------+
| Baseline - No logging | ~457.74 K ops/sec | 0.00 % | ~15.41 K ops/sec | 0.00 % | 16 Million |
| L3-logging (no LOC) | ~442.93 K ops/sec | -3.24 % | ~14.87 K ops/sec | -3.46 % | 16 Million |
| L3-fast logging (no LOC) | ~437.40 K ops/sec | -4.44 % | ~14.68 K ops/sec | -4.75 % | 16 Million |
| L3-fprintf() logging (no LOC) | ~451.28 K ops/sec | -1.41 % | ~15.31 K ops/sec | -0.62 % | 16 Million |
| L3-write() logging (no LOC) | ~238.83 K ops/sec | -47.82 % | ~7.75 K ops/sec | -49.70 % | 16 Million |
| L3-logging default LOC | ~439.79 K ops/sec | -3.92 % | ~14.76 K ops/sec | -4.20 % | 16 Million |
| L3-logging LOC-ELF | ~439.76 K ops/sec | -3.93 % | ~14.76 K ops/sec | -4.20 % | 16 Million |
| spdlog-logging | ~406.27 K ops/sec | -11.24 % | ~13.63 K ops/sec | -11.56 % | 16 Million |
| spdlog-backtrace-logging | ~457.64 K ops/sec | -0.02 % | ~15.63 K ops/sec | 1.43 % | 16 Million |
+-------------------------------+-------------------+----------+-------------------+----------+---------------+
   - For 2 threads, the overall percentage drops seem reasonable / as
   expected.
   - Between 1 and 2 server threads, one would expect to see greater
   degradation for fprintf(), but it has gone down.
   - write() perf has worsened - I guess this could be due to concurrent
   threads logging [?]
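One plausible explanation for the write() numbers: each write() is a separate syscall, and concurrent writers serialize in the kernel on the shared file offset, whereas fprintf() accumulates in a user-space stdio buffer and only enters the kernel when the buffer fills. A minimal illustration (hypothetical helpers, not the L3 implementation):

```cpp
#include <cstdio>
#include <cstring>
#include <unistd.h>

// One write(2) syscall per message: every log line pays a kernel
// crossing, and concurrent writers contend on the shared file offset.
void log_write(int fd, const char *msg) {
    write(fd, msg, strlen(msg));
}

// fprintf() appends to a user-space stdio buffer; the kernel is entered
// only when the buffer fills (or on fflush), amortizing the syscall cost.
void log_fprintf(FILE *fp, const char *msg) {
    fprintf(fp, "%s", msg);
}
```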
*Performance comparison for NumClients=32, NumOps=32000000 (32 Million),
NumThreads=4*
+-------------------------------+-------------------+----------+-------------------+----------+---------------+
| Run-Type | Server throughput | Srv:Drop | Client throughput | Cli:Drop | NumOps/thread |
+-------------------------------+-------------------+----------+-------------------+----------+---------------+
| Baseline - No logging | ~695.46 K ops/sec | 0.00 % | ~24.44 K ops/sec | 0.00 % | 8 Million |
| L3-logging (no LOC) | ~673.46 K ops/sec | -3.16 % | ~23.68 K ops/sec | -3.11 % | 8 Million |
| L3-fast logging (no LOC) | ~658.55 K ops/sec | -5.31 % | ~22.99 K ops/sec | -5.94 % | 8 Million |
| L3-fprintf() logging (no LOC) | ~580.25 K ops/sec | -16.57 % | ~19.95 K ops/sec | -18.35 % | 8 Million |
| L3-write() logging (no LOC) | ~314.35 K ops/sec | -54.80 % | ~10.34 K ops/sec | -57.69 % | 8 Million |
| L3-logging default LOC | ~681.19 K ops/sec | -2.05 % | ~23.89 K ops/sec | -2.24 % | 8 Million |
| L3-logging LOC-ELF | ~664.07 K ops/sec | -4.51 % | ~23.27 K ops/sec | -4.79 % | 8 Million |
| spdlog-logging | ~504.25 K ops/sec | -27.49 % | ~17.06 K ops/sec | -30.18 % | 8 Million |
| spdlog-backtrace-logging | ~626.13 K ops/sec | -9.97 % | ~21.84 K ops/sec | -10.62 % | 8 Million |
+-------------------------------+-------------------+----------+-------------------+----------+---------------+
   - With 4 (and, below, 8) threads, write() and spdlog logging performance
   seems to go down quite a bit. This is consistent with the hypothesis that
   concurrent logging degrades these schemes even more.
   - But fprintf() degradation is fluctuating -- this may be due to
   file-system / glibc caching?
*Performance comparison for NumClients=32, NumOps=32000000 (32 Million),
NumThreads=8*
+-------------------------------+-------------------+----------+-------------------+----------+---------------+
| Run-Type | Server throughput | Srv:Drop | Client throughput | Cli:Drop | NumOps/thread |
+-------------------------------+-------------------+----------+-------------------+----------+---------------+
| Baseline - No logging | ~744.79 K ops/sec | 0.00 % | ~26.40 K ops/sec | 0.00 % | 4 Million |
| L3-logging (no LOC) | ~734.90 K ops/sec | -1.33 % | ~26.08 K ops/sec | -1.22 % | 4 Million |
| L3-fast logging (no LOC) | ~719.97 K ops/sec | -3.33 % | ~25.50 K ops/sec | -3.42 % | 4 Million |
| L3-fprintf() logging (no LOC) | ~681.84 K ops/sec | -8.45 % | ~23.86 K ops/sec | -9.61 % | 4 Million |
| L3-write() logging (no LOC) | ~293.85 K ops/sec | -60.55 % | ~9.63 K ops/sec | -63.51 % | 4 Million |
| L3-logging default LOC | ~720.68 K ops/sec | -3.24 % | ~25.44 K ops/sec | -3.66 % | 4 Million |
| L3-logging LOC-ELF | ~726.58 K ops/sec | -2.44 % | ~25.68 K ops/sec | -2.75 % | 4 Million |
| spdlog-logging | ~581.29 K ops/sec | -21.95 % | ~19.93 K ops/sec | -24.50 % | 4 Million |
| spdlog-backtrace-logging | ~758.00 K ops/sec | 1.77 % | ~27.20 K ops/sec | 3.01 % | 4 Million |
+-------------------------------+-------------------+----------+-------------------+----------+---------------+
*Summary Conclusions*:
   - Overall, with increased concurrency the other logging schemes show
   greater degradation.
   - We are not seeing the ~1% performance degradation with L3-logging;
   it's always 3+%. Maybe a measurement anomaly [?]
   - spdlog-backtrace logging is competitive with L3 and L3-fast logging
   in terms of degradation, but you cannot unpack the log-info after the run.
   (Backtrace only dumps while the server is running.)
   - L3-fast logging still has some non-trivial degradation. You need to
   investigate and address open issue #76 (L3-Fast logging interface
   improvements).
-
All very odd, isn't it? The numthreads=1 and numthreads=4 results are reasonable. numthreads=2 is disappointing because, as you say, spdlog appears to be competitive. But on numthreads=8 we once again see spdlog and L3 all have negative slowdown - i.e. a speedup compared to baseline. This implies that we are still not able to measure accurately.
Did you say the machine you're using has 72 physical cores? That is presumably a NUMA architecture, and weird shit can indeed happen there. Can you try with a smaller machine, specifically one that is not NUMA? And/or we may need to experiment with pinning threads to CPU cores.
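A thread-pinning experiment could look something like this on Linux (a sketch only; `pin_to_core` is a hypothetical helper built on the Linux-specific sched_setaffinity(2), not something the benchmark currently does):

```cpp
#include <sched.h>   // sched_setaffinity, cpu_set_t, CPU_ZERO, CPU_SET (Linux)

// Pin the calling thread to a single CPU core so the scheduler cannot
// migrate it across sockets mid-run (a NUMA-noise suspect).
// Returns 0 on success, -1 on failure (errno set).
int pin_to_core(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return sched_setaffinity(0, sizeof(set), &set);  // 0 = calling thread
}
```

Each server thread would call this with a distinct core number before entering its work loop; comparing pinned vs unpinned runs would show whether cross-socket migration explains the noise.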
-
I experimented with my laptop. I carefully shut down all other programs, plugged in, and deliberately throttled CPUs to try to avoid weird thermal/throttling issues. There is some bizarre effect with spdlog and two threads; everything else looks vaguely sane. There are a few -ve slowdowns, but all under 1%, so I think we conclude "no measurable difference if the change against baseline is < 1% either way." But spdlog multi-threaded is just weird: 4 and 8 threads still show less slowdown than 1, which makes no sense: there is a mutex, and it will always be uncontended in the single-threaded case.
-
Oh wait, hang on, it looks like maybe you need to configure spdlog differently for multithreaded. Do we do that?
-
Hi, @gregthelaw --
Updated 15.Jun.2024:
I now have a new commit under PR #85 that can be used to run multiple iterations of client-server performance tests. I had to massage perf-report.py to find the median value of metrics and generate the comparison report.
Here are the reports:
Perf run parameters: 32 clients, 1 million msgs/client, run 5 iterations with different server-thread configurations.
Observations: degradation for write() logging and spdlog logging does go up.
Performance comparison for 32 clients, 32000000 (32 Million) msgs, num_threads=1
Performance comparison for 32 clients, 32000000 (32 Million) msgs, num_threads=2
**** (Using median value of metric across 5 iterations) ****
Performance comparison for 32 clients, 32000000 (32 Million) msgs, num_threads=4
**** (Using median value of metric across 5 iterations) ****
Performance comparison for 32 clients, 32000000 (32 Million) msgs, num_threads=8
**** (Using median value of metric across 5 iterations) ****