Conversation
@GandalfTea heyo, #12 introduces some conflicts with this one, but we can merge your PR together later; it's just some code locations that were changed and some typings.
Should be an easy fix, but I'll need to wrap more code in tracer frames to get detailed timings. I'll resolve with master later tonight; I still have quite a few local changes to push.
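As a rough illustration of what "wrapping code in tracer frames" could look like, here is a minimal sketch assuming a Python codebase; `tracer_frame` and `emit_event` are hypothetical names for illustration, not this PR's actual API:

```python
# Hypothetical sketch: a context manager that times a named span and emits an
# event when it closes. Names and fields are placeholders, not the PR's API.
import time
from contextlib import contextmanager

def emit_event(event: dict) -> None:
    # Placeholder sink; a real tracer would push this onto its event queue.
    print(event)

@contextmanager
def tracer_frame(name: str, **tags):
    start = time.perf_counter()
    try:
        yield
    finally:
        emit_event({
            "frame": name,
            "duration_s": time.perf_counter() - start,
            **tags,
        })

# Usage: wrap the code paths whose timings should be broken out.
with tracer_frame("comm_queue_wait", node="m4_pro"):
    time.sleep(0.01)  # stand-in for waiting on the comm queue
```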
The llama3 model spends most of its time stalling on the comm queue. Also true for gpt_oss. qwen3 doesn't have this problem. Looking into this.
Force-pushed from 7c3934b to 9456d61.
For now, frames are aggregated into groups. @erhant, @andthattoo let me know if you have any other default metrics to add here. I have more info that I don't surface yet, like network statistics, that I would like to add.
Correctly aggregating the lower-level frames to get the time distribution per node. There's now a staging buffer for events that arrive before the …
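A minimal sketch of the staging-buffer idea, assuming events can reference a frame that hasn't been registered yet; class and field names here are illustrative, not this PR's actual implementation:

```python
# Hypothetical sketch: events for an unknown frame are parked, then flushed
# into the per-node time distribution once that frame is registered.
from collections import defaultdict

class FrameAggregator:
    def __init__(self):
        self.known_frames = set()
        self.staged = defaultdict(list)       # frame_id -> early events
        self.per_node_time = defaultdict(float)

    def register_frame(self, frame_id: str) -> None:
        self.known_frames.add(frame_id)
        # Flush any events that arrived before this frame was registered.
        for event in self.staged.pop(frame_id, []):
            self._aggregate(event)

    def add_event(self, event: dict) -> None:
        if event["frame"] in self.known_frames:
            self._aggregate(event)
        else:
            self.staged[event["frame"]].append(event)  # arrived early: stage it

    def _aggregate(self, event: dict) -> None:
        # Roll low-level frame timings up into a per-node time distribution.
        self.per_node_time[event["node"]] += event["duration_s"]

agg = FrameAggregator()
agg.add_event({"frame": "decode", "node": "m4", "duration_s": 0.8})  # staged
agg.register_frame("decode")                                         # flushed
print(dict(agg.per_node_time))  # {'m4': 0.8}
```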
Rough metrics for the default config (Llama 3.3 70B, 4-bit) on 2 Macs (M4, M4 Pro) and a MacBook (M3), 56 GB total RAM, TB4:
This reverts commit a12eefb6f3807ff9f1812cd755743bc4664a8714.
fixes #10
This PR implements: