@wooway777 (Collaborator) commented Jan 22, 2026

resolves #523

Hardware: MLU 590 + Phytium CPU

Single card

python examples/jiuge.py --cambricon --model_path=/data/pepe/9G7B_MHA/ --max_new_tokens=1024
 Generation completed in 2108.3 ms
 Batchsize=1  Per_Batch_Input_Len=13  Per_Batch_New_Tokens=40

 Prefill TTFT: 0.11ms  Throughput: 118.66tok/s

 Decode  Avg ITL: 51.25ms   Throughput: 19.51tok/s
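
As a quick sanity check on the log above, here is a minimal sketch that recomputes the reported rates from the other figures. The formulas (decode throughput as the inverse of the average ITL, prefill time as input length divided by prefill throughput) are assumptions about how the script derives these numbers, not something confirmed by this PR, and the function names are hypothetical.

```python
# Figures copied from the single-card log above.
INPUT_LEN = 13                 # Per_Batch_Input_Len
PREFILL_THROUGHPUT = 118.66    # tok/s
DECODE_ITL_MS = 51.25          # average inter-token latency, ms


def throughput_from_itl(itl_ms: float) -> float:
    """Tokens per second implied by an average inter-token latency (assumed formula)."""
    return 1000.0 / itl_ms


def prefill_seconds(input_len: int, throughput: float) -> float:
    """Prefill time in seconds implied by input length and throughput (assumed formula)."""
    return input_len / throughput


if __name__ == "__main__":
    print(f"decode throughput: {throughput_from_itl(DECODE_ITL_MS):.2f} tok/s")
    print(f"implied prefill time: {prefill_seconds(INPUT_LEN, PREFILL_THROUGHPUT):.2f} s")
```

Under these assumptions the decode figures agree with each other (about 19.51 tok/s), while the implied prefill time is about 0.11 s, which suggests the logged "0.11ms" may actually be in seconds. That is an inference from the numbers only.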

The distributed run could not be tested because communication fails to come up; this is probably not a code issue.

Benchmark

python test/bench/test_benchmark.py --cambricon /data/pepe/9G7B_MHA/ --bench ceval --subject middle_school_mathematics --num_samples 100 --backend cpp --ndev 1

Overall score: 61/100 = 61.00%
Total Latency: 798.5647848986555 seconds
Total Tokens Processed: 25442 tokens
Overall Throughput: 31.86 tokens/s
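
The summary figures can be cross-checked with a small sketch like the one below. It assumes overall throughput is total tokens divided by total latency and the score is correct answers over samples; the helper names are hypothetical and not taken from `test_benchmark.py`.

```python
# Figures copied from the benchmark output above.
TOTAL_LATENCY_S = 798.5647848986555
TOTAL_TOKENS = 25442
CORRECT = 61
NUM_SAMPLES = 100


def overall_throughput(tokens: int, latency_s: float) -> float:
    """Tokens per second over the whole run (assumed definition)."""
    return tokens / latency_s


def accuracy_percent(correct: int, total: int) -> float:
    """Share of correct answers, as a percentage."""
    return 100.0 * correct / total


if __name__ == "__main__":
    print(f"throughput: {overall_throughput(TOTAL_TOKENS, TOTAL_LATENCY_S):.2f} tokens/s")
    print(f"accuracy: {accuracy_percent(CORRECT, NUM_SAMPLES):.2f}%")
```

With these definitions the recomputed values match the reported 31.86 tokens/s and 61.00%.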

@wooway777 requested review from a team and zhangyue207 on January 22, 2026 02:48
@wooway777 (Collaborator, Author) commented

Single card
image
Benchmark
image
Distributed
image
Communication
image

Development

Successfully merging this pull request may close these issues.

[DEV] Cambricon SDK 1.22
