It would be helpful to add a package like [xllamacpp](https://github.com/xorbitsai/xllamacpp) to enable local VLM inference rather than relying on the Google API. The xllamacpp package supports [Vulkan](https://xorbitsai.github.io/xllamacpp/whl/vulkan) and MPS inference as well as [CUDA](https://xorbitsai.github.io/xllamacpp/whl/cu128). Sample inference code can be reviewed [here](https://github.com/xorbitsai/xllamacpp/blob/f83e5fcd8007f6f2368cc469010fee9d28971863/tests/test_server.py#L295).
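For reference, in-process inference with xllamacpp could look roughly like the sketch below. It is modeled on the linked test file and the package README; the model and projector paths are placeholders, and field names such as `model.path`, `mmproj.path`, and `n_gpu_layers` are assumptions that should be verified against the current xllamacpp API.

```python
# Minimal sketch, not a definitive integration: loads a local GGUF VLM and
# runs one chat completion entirely in-process (no Google API call).
import json

from xllamacpp import CommonParams, Server

params = CommonParams()
params.model.path = "./models/vlm-q4_k_m.gguf"        # placeholder local GGUF model
params.mmproj.path = "./models/vlm-mmproj.gguf"       # vision projector (assumed field name)
params.n_gpu_layers = 99                              # offload to Vulkan/MPS/CUDA when available

server = Server(params)

def on_ok(result):
    # Responses are expected to follow the llama.cpp server's
    # OpenAI-compatible JSON schema; verify the exact shape against the tests.
    print(json.dumps(result, indent=2))

def on_error(err):
    raise RuntimeError(err)

# Assumed to mirror the llama.cpp /v1/chat/completions endpoint,
# as exercised in the linked test_server.py.
server.handle_chat_completions(
    {"messages": [{"role": "user", "content": "Describe this image."}]},
    on_ok,
    on_error,
)
```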