I loaded the Llama 2 model as in the example successfully, but text generation is really slow.

[1] I'm not sure whether it uses MPS to accelerate generation.
How can I confirm this?
[2] Is there a smaller LLM than 7B?
Here is my env:
- Macbook Air / M2 / 16GB / Sonoma 14.5
- Xcode 15.4
- ckpt: coreml-projects/Llama-2-7b-chat-coreml
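For reference, here is a minimal sketch of how the compute units can be pinned when loading a Core ML model in Swift (the class name `Llama_2_7b_chat` is a placeholder for whatever Xcode generates from the compiled model; I have not verified that this changes generation speed):

```swift
import CoreML

// Assumption: request GPU (Metal-backed) plus CPU explicitly,
// instead of the default .all, to see whether speed changes.
let config = MLModelConfiguration()
config.computeUnits = .cpuAndGPU

// Placeholder class name; substitute the model class Xcode
// generated for your compiled .mlpackage / .mlmodelc.
// let model = try Llama_2_7b_chat(configuration: config)
```

Comparing generation speed across `.cpuOnly`, `.cpuAndGPU`, and `.all` might at least show whether the GPU is being used at all.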