Ollama 0.15.6 with SYCL backend for Intel GPU (~2x faster than Vulkan) #1
Merged
Conversation
Fix the ambiguous intel-basekit package in Dockerfile
Update Dockerfile to use Intel public ipex container
Update ipex-llm image from Intel to 2.2.0-SNAPSHOT
illlustrates -> illustrates
docs: update README.md
Update to latest ipex-llm dockerfile 20250211
…ed in Dockerfile (if no build args provided)
Dockerfile ARGs to make it easier to use latest IPEX-LLM Ollama Portable Zip
Update to ipex-llm-2.2.0b20250313
Update Intel libraries
Update default to ipex-llm v2.2.0 (guide for v2.3.0-nightly in docs)
Revised `IPEXLLM_RELEASE_REPO` value and adjusted file and path references for consistency. Updated `docker-compose.yml` with refined environment variables, device mapping, restart policies, and added necessary port bindings for better functionality and maintainability.
- level-zero v1.22.4 -> v1.28.0
- IGC v2.11.7 -> v2.28.4
- compute-runtime 25.18.33578.6 -> 26.05.37020.3
- libigdgmm 22.7.0 -> 22.9.0
- ipex-llm ollama nightly 2.3.0b20250612 -> 2.3.0b20250725
- Docker compose: disable webui auth, stateless webui volume
- README formatting and GPU model update

Co-authored-by: Cursor <cursoragent@cursor.com>
…trypoint

The IPEX-LLM bundled start-ollama.sh hardcodes OLLAMA_HOST=127.0.0.1 and OLLAMA_KEEP_ALIVE=10m, overriding docker-compose environment variables and preventing external connections through Docker port mapping.

- Add custom start-ollama.sh that honours env vars with sensible defaults
- Mount it read-only into the container
- Fix LD_LIBRARY_PATH env var syntax (: -> =)
- Add .gitignore for IDE/swap/webui data files
- Update CHANGELOG and README with fix documentation

Co-authored-by: Cursor <cursoragent@cursor.com>
… Intel GPU

Replace the IPEX-LLM portable zip (bundling a patched ollama 0.9.3 with SYCL) with the official ollama 0.15.6 release using the Vulkan backend for Intel GPU acceleration. The official ollama project does not ship a SYCL backend; Vulkan is their supported path for Intel GPUs.

- Use official ollama binary with Vulkan runner (OLLAMA_VULKAN=1)
- Strip CUDA/MLX runners from image to save space
- Add mesa-vulkan-drivers for Intel ANV Vulkan ICD
- Remove all IPEX-LLM env vars and wrapper scripts
- Simplify entrypoint to /usr/bin/ollama serve directly
- Clean up docker-compose.yml: remove IPEX build args and env vars

Tested: Intel Arc Graphics (MTL) detected, 17/17 layers offloaded to Vulkan0

Co-authored-by: Cursor <cursoragent@cursor.com>
…on Intel GPUs

Build ggml-sycl from upstream llama.cpp (commit a5bb8ba4, matching ollama's vendored ggml) using Intel oneAPI 2025.1.1 in a multi-stage Docker build.

Patch two ollama-specific API divergences via patch-sycl.py: added batch_size parameter to graph_compute, removed GGML_TENSOR_FLAG_COMPUTE skip-check that caused all compute nodes to be bypassed.

Tested: gemma3:1b — 27/27 layers on GPU, 10.2 tok/s gen, 65.3 tok/s prompt eval.

Co-authored-by: Cursor <cursoragent@cursor.com>
Rewrite README with a clear value proposition, architecture diagram, troubleshooting section, and streamlined structure. Update CHANGELOG to reflect the full history of the Vulkan-to-SYCL migration.

Co-authored-by: Cursor <cursoragent@cursor.com>
Workflow triggers on push to main/release branches, tags, PRs, and manual dispatch. Uses Docker Buildx with GHA cache for faster rebuilds. Tags images with the ollama version, git SHA, and branch/tag names.

Co-authored-by: Cursor <cursoragent@cursor.com>
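A workflow along the lines described in this commit could be sketched as follows (the file name, action versions, and tag rules here are illustrative assumptions, not taken from the PR):

```yaml
# .github/workflows/docker.yml — hypothetical sketch
name: build-and-push
on:
  push:
    branches: [main]
    tags: ['v*']
  pull_request:
  workflow_dispatch:
jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - id: meta
        uses: docker/metadata-action@v5
        with:
          images: ghcr.io/${{ github.repository }}
          tags: |
            type=ref,event=branch
            type=ref,event=tag
            type=sha
      - uses: docker/build-push-action@v6
        with:
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
```

The `cache-from`/`cache-to: type=gha` pair is what gives the "GHA cache for faster rebuilds" behaviour, and `metadata-action` produces the branch/tag/SHA image tags mentioned above.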
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Co-authored-by: Andrey Oblivantsev <eslider@gmail.com>
Summary
• Upgrade ollama from 0.9.3 (IPEX-LLM bundle) to official v0.15.6
• Replace the Vulkan backend with SYCL: custom-built ggml-sycl from upstream llama.cpp using Intel oneAPI 2025.1.1, delivering roughly 2x the inference speed on Intel GPUs
• Add GitHub Actions CI to build and push the Docker image to GHCR automatically
• Update Intel GPU driver stack to latest releases (Level-Zero 1.28.0, IGC 2.28.4, compute-runtime 26.05)
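The multi-stage build summarized above might look roughly like this; stage names, base-image tags, CMake flags, and paths are illustrative assumptions, not lifted from the actual Dockerfile:

```dockerfile
# Stage 1: compile libggml-sycl.so with Intel oneAPI (versions illustrative)
FROM intel/oneapi-basekit:2025.1.1-0-devel-ubuntu24.04 AS sycl-build
RUN git clone https://github.com/ggml-org/llama.cpp /src \
 && cd /src && git checkout a5bb8ba4
COPY patch-sycl.py /src/
RUN cd /src && python3 patch-sycl.py \
 && cmake -B build -DGGML_SYCL=ON \
      -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx \
 && cmake --build build -j

# Stage 2: slim runtime with the official ollama binary
FROM ubuntu:24.04
ARG OLLAMA_VERSION=v0.15.6
RUN apt-get update && apt-get install -y --no-install-recommends curl ca-certificates \
 && curl -fsSL "https://github.com/ollama/ollama/releases/download/${OLLAMA_VERSION}/ollama-linux-amd64.tgz" \
    | tar -xz -C /usr
# Drop the SYCL runner in next to ollama's bundled ggml libraries
COPY --from=sycl-build /src/build/bin/libggml-sycl.so /usr/lib/ollama/
ENTRYPOINT ["/usr/bin/ollama", "serve"]
```

The key idea is that the heavy oneAPI toolchain lives only in the build stage; the runtime stage ships just the official binary, the compiled SYCL runner, and the required runtime libraries.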
Why SYCL over Vulkan
Ollama's official release only ships a Vulkan backend for Intel GPUs. SYCL with oneAPI unlocks oneMKL, oneDNN, and Level-Zero direct access — benchmarks show +45-100% token/s on integrated Arc (MTL) and +57-83% on discrete Arc.
Tested on Intel Core Ultra 7 155H (Meteor Lake): gemma3:1b — 27/27 layers on GPU, 10.2 tok/s generation, 65.3 tok/s prompt eval.
How the SYCL build works
Ollama intentionally excludes the ggml-sycl source from its vendored ggml. This PR rebuilds it in a multi-stage Dockerfile, patching two ollama-specific API divergences via patch-sycl.py:
• graph_compute signature: ollama adds an int batch_size parameter
• GGML_TENSOR_FLAG_COMPUTE: ollama removes this flag; without the patch, all compute nodes are skipped, producing garbage output
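A patch script of that shape can be as simple as targeted text substitutions over the ggml-sycl sources. The sketch below is illustrative only: the function names, the exact source patterns, and the file layout are assumptions, and the real patch-sycl.py in this PR may work differently.

```python
import re

def patch_graph_compute(src: str) -> str:
    """Add ollama's extra batch_size parameter to the graph_compute
    signature (hypothetical pattern; the real declaration may differ)."""
    return src.replace(
        "graph_compute(ggml_backend_t backend, ggml_cgraph * cgraph)",
        "graph_compute(ggml_backend_t backend, ggml_cgraph * cgraph, int batch_size)",
    )

def drop_compute_flag_check(src: str) -> str:
    """Remove the GGML_TENSOR_FLAG_COMPUTE skip-check that ollama's
    vendored ggml lacks; left in place, it would bypass every compute node."""
    return re.sub(
        r"if \(!\(node->flags & GGML_TENSOR_FLAG_COMPUTE\)\) \{\s*continue;\s*\}",
        "",
        src,
    )

# Applying both passes to a source file:
snippet = "if (!(node->flags & GGML_TENSOR_FLAG_COMPUTE)) { continue; }"
print(drop_compute_flag_check(snippet).strip())  # prints an empty line
```

String-level patching like this is brittle against upstream churn, which is why pinning the exact llama.cpp commit (a5bb8ba4, matching ollama's vendored ggml) matters.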
Changes
Test plan
Note
Medium Risk
Reworks the container build/run stack (multi-stage oneAPI build, custom SYCL runner, updated Intel GPU runtimes), which could break runtime compatibility or device detection if versions/patching drift. No direct auth/data-path changes, but it modifies the primary deployment artifact and defaults (e.g., WebUI auth disabled).
Overview
Switches the Docker build to a SYCL-first Intel GPU stack. The Dockerfile is rewritten into a multi-stage build that compiles libggml-sycl.so with Intel oneAPI, bundles the required oneAPI runtime libraries, installs the official Ollama v0.15.6 binary, and removes unused runners (CUDA/MLX/Vulkan) while updating Intel GPU user-space drivers.

Adds SYCL compatibility patching and updates runtime defaults. Introduces patch-sycl.py to patch upstream ggml-sycl for Ollama API differences, and updates docker-compose.yml to pass the Ollama version build arg, expose 11434, increase shm_size, set SYCL/Level-Zero env vars, and disable WebUI auth while switching the WebUI to :latest with locally persisted data.

Adds automation and repo hygiene/docs. Adds a GitHub Actions workflow to build and push the image to GHCR with versioned tagging, plus new/updated README.md, CHANGELOG.md, .gitignore, and .github/FUNDING.yml, and removes legacy IPEX-LLM init/run scripts and the WSL2 compose file.

Written by Cursor Bugbot for commit e397010.