
Ollama 0.15.6 with SYCL backend for Intel GPU (~2x faster than Vulkan) #1

Merged
eSlider merged 50 commits into main from release/Intel-Core-Ultra-7-155H
Feb 13, 2026
Conversation


eSlider (Owner) commented Feb 12, 2026

Summary

• Upgrade ollama from 0.9.3 (IPEX-LLM bundle) to the official v0.15.6
• Replace the Vulkan backend with SYCL: a custom-built ggml-sycl from upstream llama.cpp, compiled with Intel oneAPI 2025.1.1, delivering roughly 2x the inference speed on Intel GPUs
• Add GitHub Actions CI to build and push the Docker image to GHCR automatically
• Update the Intel GPU driver stack to the latest releases (Level-Zero 1.28.0, IGC 2.28.4, compute-runtime 26.05)

Why SYCL over Vulkan

Ollama's official release ships only a Vulkan backend for Intel GPUs. SYCL with oneAPI unlocks oneMKL, oneDNN, and direct Level-Zero access; benchmarks show +45-100% tokens/s on integrated Arc (MTL) and +57-83% on discrete Arc.

| Intel GPU | Vulkan | SYCL | Gain |
| --- | --- | --- | --- |
| MTL iGPU (155H) | ~8-11 tok/s | ~16 tok/s | +45-100% |
| Arc A770 | ~30-35 tok/s | ~55 tok/s | +57-83% |
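The quoted gain ranges follow directly from the tok/s figures in the table (best and worst case against the Vulkan range); a quick arithmetic check:

```python
def gain(vulkan_low: float, vulkan_high: float, sycl: float) -> tuple[int, int]:
    """Percent speedup of SYCL over a Vulkan tok/s range (low..high gain)."""
    return (
        round(100 * (sycl / vulkan_high - 1)),  # vs. the fastest Vulkan run
        round(100 * (sycl / vulkan_low - 1)),   # vs. the slowest Vulkan run
    )

print(gain(8, 11, 16))   # MTL iGPU (155H) -> (45, 100)
print(gain(30, 35, 55))  # Arc A770       -> (57, 83)
```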

Tested on Intel Core Ultra 7 155H (Meteor Lake): gemma3:1b — 27/27 layers on GPU, 10.2 tok/s generation, 65.3 tok/s prompt eval.

How the SYCL build works

Ollama intentionally excludes ggml-sycl source from its vendored ggml. This PR rebuilds it in a multi-stage Dockerfile:

  1. Clones ollama v0.15.6 source (for ggml build system + headers)
  2. Fetches ggml-sycl from the exact llama.cpp commit (a5bb8ba4) that ollama vendors — critical for ABI compatibility
  3. Applies two patches via patch-sycl.py:
    • graph_compute signature: ollama adds an int batch_size parameter
    • GGML_TENSOR_FLAG_COMPUTE: ollama removes this flag; without the patch, all compute nodes are skipped, producing garbage output
  4. Compiles with icpx + oneAPI, bundles runtime libs into the image
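The two patches in step 3 amount to mechanical source rewrites. A hypothetical sketch of the kind of substitutions patch-sycl.py performs (illustrative only — the function names here are invented, and the real script targets the actual ggml-sycl source and may match different text):

```python
import re

def patch_graph_compute(src: str) -> str:
    """Widen graph_compute to accept ollama's extra `int batch_size` parameter.

    Ollama's vendored ggml adds the parameter, so the upstream SYCL
    definition must be changed to match, or the build fails at link time.
    """
    return re.sub(
        r"graph_compute\(ggml_backend_t backend, ggml_cgraph \* cgraph\)",
        "graph_compute(ggml_backend_t backend, ggml_cgraph * cgraph, int batch_size)",
        src,
    )

def drop_compute_flag_check(src: str) -> str:
    """Delete the GGML_TENSOR_FLAG_COMPUTE skip-check.

    Ollama removed this flag, so upstream's early `continue` would bypass
    every compute node and produce garbage output.
    """
    return src.replace(
        "if (!(node->flags & GGML_TENSOR_FLAG_COMPUTE)) continue;", "")
```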

Changes

| File | What changed |
| --- | --- |
| Dockerfile | Multi-stage build: oneAPI SYCL compile -> minimal Ubuntu runtime |
| patch-sycl.py | New — patches ggml-sycl for ollama API compatibility |
| docker-compose.yml | SYCL env vars, full /dev/dri access, debug logging |
| README.md | Full rewrite with architecture diagram, benchmarks, troubleshooting |
| CHANGELOG.md | Documents full migration history |
| .github/workflows/build-push.yml | New — CI to build and push image to GHCR |
| start-ollama.sh | Custom entrypoint fixing hardcoded OLLAMA_HOST (from IPEX-LLM era) |
| .gitignore | IDE and swap file exclusions |

Test plan

  • docker compose build completes without errors
  • curl http://localhost:11434/api/tags returns model list
  • SYCL backend detected in logs (SYCL0, Intel(R) Arc(TM) Graphics)
  • All model layers offloaded to GPU (27/27)
  • Inference produces correct output (gemma3:1b, gemma3:4b)
  • CI workflow builds successfully on GitHub Actions
  • Pre-built image pullable from GHCR
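The /api/tags check above can be scripted. A minimal sketch, assuming the standard Ollama tags response shape `{"models": [{"name": "..."}]}` (the parsing is split out so it can be exercised without a running container):

```python
import json
from urllib.request import urlopen

def model_names(payload: dict) -> list[str]:
    """Extract model names from an /api/tags response payload."""
    return [m["name"] for m in payload.get("models", [])]

def list_models(base_url: str = "http://localhost:11434") -> list[str]:
    """Query a running Ollama instance for its installed models."""
    with urlopen(f"{base_url}/api/tags") as resp:
        return model_names(json.load(resp))
```

An empty (or missing) model list then signals that the server is up but no models are pulled yet.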

Note

Medium Risk
Reworks the container build/run stack (multi-stage oneAPI build, custom SYCL runner, updated Intel GPU runtimes), which could break runtime compatibility or device detection if versions/patching drift. No direct auth/data-path changes, but it modifies the primary deployment artifact and defaults (e.g., WebUI auth disabled).

Overview
Switches the Docker build to a SYCL-first Intel GPU stack. The Dockerfile is rewritten into a multi-stage build that compiles libggml-sycl.so with Intel oneAPI, bundles required oneAPI runtime libraries, installs the official Ollama v0.15.6 binary, and removes unused runners (CUDA/MLX/Vulkan) while updating Intel GPU user-space drivers.

Adds SYCL compatibility patching and updates runtime defaults. Introduces patch-sycl.py to patch upstream ggml-sycl for Ollama API differences, and updates docker-compose.yml to pass the Ollama version build arg, expose 11434, increase shm_size, set SYCL/Level-Zero env vars, and disable WebUI auth while switching WebUI to :latest with local persisted data.

Adds automation and repo hygiene/docs. Adds a GitHub Actions workflow to build and push the image to GHCR with versioned tagging, plus new/updated README.md/CHANGELOG.md, .gitignore, and .github/FUNDING.yml, and removes legacy IPEX-LLM init/run scripts and the WSL2 compose file.

Written by Cursor Bugbot for commit e397010.

mattcurf and others added 30 commits October 28, 2024 16:33
Fix the ambiguous intel-basekit package in Dockerfile
Update Dockerfile to use Intel public ipex container
Update ipex-llm image from Intel to 2.2.0-SNAPSHOT
illlustrates -> illustrates
Update to latest ipex-llm dockerfile 20250211
…ed in Dockerfile (if no build args provided)
mattcurf and others added 17 commits March 17, 2025 09:24
Dockerfile ARGs to make it easier to use latest IPEX-LLM Ollama Portable Zip
Update to ipex-llm-2.2.0b20250313
Update default to ipex-llm v2.2.0 (guide for v2.3.0-nightly in docs)
Revised `IPEXLLM_RELEASE_REPO` value and adjusted file and path references for consistency. Updated `docker-compose.yml` with refined environment variables, device mapping, restart policies, and added necessary port bindings for better functionality and maintainability.
- level-zero v1.22.4 -> v1.28.0
- IGC v2.11.7 -> v2.28.4
- compute-runtime 25.18.33578.6 -> 26.05.37020.3
- libigdgmm 22.7.0 -> 22.9.0
- ipex-llm ollama nightly 2.3.0b20250612 -> 2.3.0b20250725
- Docker compose: disable webui auth, stateless webui volume
- README formatting and GPU model update

Co-authored-by: Cursor <cursoragent@cursor.com>
…trypoint

The IPEX-LLM bundled start-ollama.sh hardcodes OLLAMA_HOST=127.0.0.1 and
OLLAMA_KEEP_ALIVE=10m, overriding docker-compose environment variables and
preventing external connections through Docker port mapping.

- Add custom start-ollama.sh that honours env vars with sensible defaults
- Mount it read-only into the container
- Fix LD_LIBRARY_PATH env var syntax (: -> =)
- Add .gitignore for IDE/swap/webui data files
- Update CHANGELOG and README with fix documentation

Co-authored-by: Cursor <cursoragent@cursor.com>
… Intel GPU

Replace the IPEX-LLM portable zip (bundling a patched ollama 0.9.3 with SYCL)
with the official ollama 0.15.6 release using the Vulkan backend for Intel GPU
acceleration. The official ollama project does not ship a SYCL backend; Vulkan
is their supported path for Intel GPUs.

- Use official ollama binary with Vulkan runner (OLLAMA_VULKAN=1)
- Strip CUDA/MLX runners from image to save space
- Add mesa-vulkan-drivers for Intel ANV Vulkan ICD
- Remove all IPEX-LLM env vars and wrapper scripts
- Simplify entrypoint to /usr/bin/ollama serve directly
- Clean up docker-compose.yml: remove IPEX build args and env vars

Tested: Intel Arc Graphics (MTL) detected, 17/17 layers offloaded to Vulkan0
Co-authored-by: Cursor <cursoragent@cursor.com>
…on Intel GPUs

Build ggml-sycl from upstream llama.cpp (commit a5bb8ba4, matching ollama's
vendored ggml) using Intel oneAPI 2025.1.1 in a multi-stage Docker build.
Patch two ollama-specific API divergences via patch-sycl.py: added batch_size
parameter to graph_compute, removed GGML_TENSOR_FLAG_COMPUTE skip-check that
caused all compute nodes to be bypassed.

Tested: gemma3:1b — 27/27 layers on GPU, 10.2 tok/s gen, 65.3 tok/s prompt eval.
Co-authored-by: Cursor <cursoragent@cursor.com>
Rewrite README with clear value proposition, architecture diagram,
troubleshooting section, and streamlined structure. Update CHANGELOG
to reflect full history of Vulkan-to-SYCL migration.

Co-authored-by: Cursor <cursoragent@cursor.com>
Workflow triggers on push to main/release branches, tags, PRs, and
manual dispatch. Uses Docker Buildx with GHA cache for faster rebuilds.
Tags images with ollama version, git SHA, and branch/tag names.

Co-authored-by: Cursor <cursoragent@cursor.com>
@eSlider eSlider added the enhancement New feature or request label Feb 12, 2026

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.


cursoragent and others added 2 commits February 13, 2026 10:22
Co-authored-by: Andrey Oblivantsev <eslider@gmail.com>
Co-authored-by: Andrey Oblivantsev <eslider@gmail.com>
@eSlider eSlider merged commit 0ab3060 into main Feb 13, 2026
3 checks passed