Skip to content

Commit 8949066

Browse files
authored
docs: add Gemma 4 Colab notebook (abetlen#2274)
1 parent 4684985 commit 8949066

2 files changed

Lines changed: 132 additions & 1 deletion

File tree

README.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -535,7 +535,7 @@ Below are the supported multi-modal models and their respective chat handlers (P
535535
| [llama-3-vision-alpha](https://huggingface.co/abetlen/llama-3-vision-alpha-gguf) | `Llama3VisionAlphaChatHandler` | `llama-3-vision-alpha` |
536536
| [minicpm-v-2.6](https://huggingface.co/openbmb/MiniCPM-V-2_6-gguf) | `MiniCPMv26ChatHandler` | `minicpm-v-2.6` |
537537
| [qwen2.5-vl](https://huggingface.co/unsloth/Qwen2.5-VL-3B-Instruct-GGUF) | `Qwen25VLChatHandler` | `qwen2.5-vl` |
538-
| [gemma-4](https://huggingface.co/unsloth/gemma-4-E4B-it-GGUF) | `Gemma4ChatHandler` | `gemma4` |
538+
| [gemma-4](https://huggingface.co/unsloth/gemma-4-E4B-it-GGUF) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/abetlen/llama-cpp-python/blob/main/examples/colab/notebook.ipynb) | `Gemma4ChatHandler` | `gemma4` |
539539
| GGUF models with an mtmd projector and embedded chat template | `MTMDChatHandler` | `mtmd` |
540540

541541
Then you'll need to use a custom chat handler to load the clip model and process the chat messages and images.

examples/colab/notebook.ipynb

Lines changed: 131 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,131 @@
1+
{
2+
"nbformat": 4,
3+
"nbformat_minor": 5,
4+
"metadata": {
5+
"colab": {
6+
"provenance": [],
7+
"gpuType": "T4"
8+
},
9+
"accelerator": "GPU",
10+
"kernelspec": {
11+
"name": "python3",
12+
"display_name": "Python 3"
13+
},
14+
"language_info": {
15+
"name": "python"
16+
}
17+
},
18+
"cells": [
19+
{
20+
"cell_type": "markdown",
21+
"metadata": {},
22+
"source": [
23+
"# Gemma 4 12B Multimodal Chat\n",
24+
"\n",
25+
"Run Gemma 4 12B locally in Google Colab with the pre-built CUDA wheel for `llama-cpp-python`.\n",
26+
"\n",
27+
"Use a GPU runtime before running this notebook: **Runtime > Change runtime type > T4 GPU**.\n",
28+
"\n",
29+
"Current Colab CUDA images commonly provide CUDA 12 user-space libraries even when `nvidia-smi` reports a CUDA 13-capable driver, so this notebook installs the `cu125` wheel. If your runtime provides `libcudart.so.13`, switch the wheel index URL to `/whl/cu130`.\n"
30+
]
31+
},
32+
{
33+
"cell_type": "code",
34+
"execution_count": null,
35+
"metadata": {},
36+
"outputs": [],
37+
"source": [
38+
"!pip install --no-cache-dir --upgrade --force-reinstall \\\n",
39+
" \"huggingface-hub>=0.23.0\" \\\n",
40+
" llama-cpp-python \\\n",
41+
" --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu125\n"
42+
]
43+
},
44+
{
45+
"cell_type": "code",
46+
"execution_count": null,
47+
"metadata": {},
48+
"outputs": [],
49+
"source": [
50+
"from llama_cpp import Llama\n",
51+
"from llama_cpp.llama_chat_format import Gemma4ChatHandler\n",
52+
"\n",
53+
"MODEL_REPO = \"ggml-org/gemma-4-12B-it-GGUF\"\n",
54+
"MODEL_FILE = \"gemma-4-12B-it-Q4_K_M.gguf\"\n",
55+
"MMPROJ_FILE = \"mmproj-gemma-4-12B-it-Q8_0.gguf\"\n",
56+
"\n",
57+
"chat_handler = Gemma4ChatHandler.from_pretrained(\n",
58+
" repo_id=MODEL_REPO,\n",
59+
" filename=MMPROJ_FILE,\n",
60+
" verbose=False,\n",
61+
")\n",
62+
"\n",
63+
"llm = Llama.from_pretrained(\n",
64+
" repo_id=MODEL_REPO,\n",
65+
" filename=MODEL_FILE,\n",
66+
" chat_handler=chat_handler,\n",
67+
" n_gpu_layers=-1,\n",
68+
" n_ctx=8192,\n",
69+
" flash_attn=True,\n",
70+
" verbose=False,\n",
71+
")\n"
72+
]
73+
},
74+
{
75+
"cell_type": "code",
76+
"execution_count": null,
77+
"metadata": {},
78+
"outputs": [],
79+
"source": [
80+
"response = llm.create_chat_completion(\n",
81+
" messages=[\n",
82+
" {\n",
83+
" \"role\": \"user\",\n",
84+
" \"content\": \"Write the exact string `<stdio.h>` and nothing else.\",\n",
85+
" }\n",
86+
" ],\n",
87+
" max_tokens=32,\n",
88+
" temperature=0.0,\n",
89+
")\n",
90+
"\n",
91+
"print(response[\"choices\"][0][\"message\"][\"content\"])\n"
92+
]
93+
},
94+
{
95+
"cell_type": "code",
96+
"execution_count": null,
97+
"metadata": {},
98+
"outputs": [],
99+
"source": [
100+
"from IPython.display import Image, display\n",
101+
"\n",
102+
"IMAGE_URL = \"https://raw.githubusercontent.com/abetlen/llama-cpp-python/main/vendor/llama.cpp/tools/mtmd/test-1.jpeg\"\n",
103+
"\n",
104+
"display(Image(url=IMAGE_URL, width=320))\n"
105+
]
106+
},
107+
{
108+
"cell_type": "code",
109+
"execution_count": null,
110+
"metadata": {},
111+
"outputs": [],
112+
"source": [
113+
"response = llm.create_chat_completion(\n",
114+
" messages=[\n",
115+
" {\n",
116+
" \"role\": \"user\",\n",
117+
" \"content\": [\n",
118+
" {\"type\": \"text\", \"text\": \"Describe this image in one concise sentence.\"},\n",
119+
" {\"type\": \"image_url\", \"image_url\": {\"url\": IMAGE_URL}},\n",
120+
" ],\n",
121+
" }\n",
122+
" ],\n",
123+
" max_tokens=128,\n",
124+
" temperature=0.2,\n",
125+
")\n",
126+
"\n",
127+
"print(response[\"choices\"][0][\"message\"][\"content\"])\n"
128+
]
129+
}
130+
]
131+
}

0 commit comments

Comments
 (0)