I loaded the Llama 2 model as in the example successfully, but text generation is really slow.

[1] I'm not sure whether it uses MPS to accelerate generation.
How can I confirm this?
[2] Is there a smaller LLM than 7B?
Here is my env:
- Macbook Air / M2 / 16GB / Sonoma 14.5
- Xcode 15.4
- ckpt: coreml-projects/Llama-2-7b-chat-coreml
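For reference, here is a minimal sketch of how the compute units can be pinned when loading a Core ML model in Swift (the class name `Llama_2_7b_chat` is a placeholder for whatever Xcode generates from the compiled model; I have not verified that this changes generation speed):

```swift
import CoreML

// Assumption: request GPU (Metal-backed) plus CPU explicitly,
// instead of the default .all, to see whether speed changes.
let config = MLModelConfiguration()
config.computeUnits = .cpuAndGPU

// Placeholder class name; substitute the model class Xcode
// generated for your compiled .mlpackage / .mlmodelc.
// let model = try Llama_2_7b_chat(configuration: config)
```

Comparing generation speed across `.cpuOnly`, `.cpuAndGPU`, and `.all` might at least show whether the GPU is being used at all.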