Merged

50 commits
ff2a979
Merge pull request #6 from mattcurf/fix_oneapi_dependency
mattcurf Oct 28, 2024
2e18d91
Update webui
pepijndevos Nov 7, 2024
0e4cf4f
Update docker-compose-wsl2.yml
pepijndevos Nov 7, 2024
eb23896
Merge pull request #8 from pepijndevos/patch-1
mattcurf Nov 8, 2024
c74f6f2
Update Dockerfile to use Intel public ipex container
mattcurf Nov 14, 2024
91d2045
Merge pull request #11 from mattcurf/ipex_intel_image
mattcurf Nov 28, 2024
6df0d8d
Update ipex-llm image from Intel to 2.2.0-SNAPSHOT
mattcurf Jan 17, 2025
07e8a24
Merge pull request #21 from mattcurf/update_tags
mattcurf Jan 21, 2025
b74bab0
Update to latest open-webui releases
mattcurf Jan 24, 2025
d51c656
Merge pull request #24 from mattcurf/update-webui
mattcurf Jan 26, 2025
8e69333
Update README.md
mattcurf Jan 26, 2025
c230c45
Update README.md
mattcurf Jan 26, 2025
ddd565f
docs: update README.md
eltociear Jan 30, 2025
1581a50
Merge pull request #27 from eltociear/patch-1
mattcurf Feb 5, 2025
765a8c0
Update to latest ipex-llm dockerfile 20250211
mattcurf Feb 12, 2025
f08a310
Update README.md
mattcurf Feb 12, 2025
ec7dec8
Merge pull request #36 from mattcurf/updated-docker-image
mattcurf Feb 17, 2025
2fc5265
Update to use new ipex portable .zip packages
mattcurf Feb 19, 2025
dd84c20
Minor fixes
mattcurf Feb 19, 2025
c47c879
Merge branch 'main' into ollama_portable_zip
mattcurf Feb 19, 2025
fed3cf9
Update README.md
mattcurf Feb 19, 2025
fa579db
Increase context window size
mattcurf Feb 19, 2025
db8d96c
Merge pull request #39 from mattcurf/ollama_portable_zip
mattcurf Feb 22, 2025
85e28fc
Cache link
mattcurf Feb 23, 2025
d81b21c
Merge pull request #42 from mattcurf/fix-links
mattcurf Feb 23, 2025
e1da4a4
Allow for user choice of ollama portable zip at build time
blebo Mar 16, 2025
2c82aed
Update compose file with build args
blebo Mar 16, 2025
1e92fbe
Updates to allow latest ollama in compose file, with fallback to cach…
blebo Mar 16, 2025
b33c01f
Updated README.md for Dockerfile args.
blebo Mar 16, 2025
451f910
Revert compose to cached .tgz by default.
blebo Mar 17, 2025
86f0765
Merge pull request #49 from blebo/dockerfile-args
mattcurf Mar 17, 2025
61288f5
Update to ipex-llm-2.2.0b20250313
mattcurf Mar 17, 2025
6964b45
Merge pull request #50 from mattcurf/update_ipex
mattcurf Mar 17, 2025
504a1d3
Update default to ipex-llm v2.2.0 (guide for v2.3.0-nightly in docs)
blebo Apr 16, 2025
f1bbedb
Update Intel libraries
charlescng Apr 18, 2025
dea2fd0
Merge pull request #55 from charlescng/update_intel_libs
mattcurf Apr 19, 2025
8172339
Merge pull request #54 from blebo/update-ipex-v2.2.0
mattcurf Apr 20, 2025
1759294
Update Docker configurations for deployment improvements
eSlider Apr 22, 2025
c98fd71
Ignore shelf
eSlider Apr 22, 2025
0a7f974
Update Docker configurations and Intel GPU runtimes for improved perf…
eSlider Jun 20, 2025
1239010
Clean up Dockerfile by adding autoremove and autoclean commands to re…
eSlider Jun 20, 2025
96913a2
Update Intel GPU stack and ipex-llm to latest available versions
eSlider Feb 12, 2026
8debf20
Fix ollama not reachable from host due to hardcoded OLLAMA_HOST in en…
eSlider Feb 12, 2026
63c3b81
Upgrade ollama from 0.9.3 (IPEX-LLM) to 0.15.6 (official) with Vulkan…
eSlider Feb 12, 2026
c56646e
Switch GPU backend from Vulkan to SYCL for ~2x inference performance …
eSlider Feb 12, 2026
971852d
Rework README for better GitHub presentation
eSlider Feb 12, 2026
52672c3
Add GitHub Actions CI to build and push Docker image to GHCR
eSlider Feb 12, 2026
e397010
Create FUNDING.yml
eSlider Feb 12, 2026
5924879
chore: remove committed JetBrains .idea gitignore
cursoragent Feb 13, 2026
9f56e70
Remove dead libsycl glob no-op in Dockerfile
cursoragent Feb 13, 2026
3 changes: 3 additions & 0 deletions .github/FUNDING.yml
@@ -0,0 +1,3 @@
# These are supported funding model platforms

github: eSlider
88 changes: 88 additions & 0 deletions .github/workflows/build-push.yml
@@ -0,0 +1,88 @@
name: Build and push Docker image

on:
  push:
    branches:
      - main
      - master
      - "release/**"
    tags:
      - "v*"
  pull_request:
    branches:
      - main
      - master
  workflow_dispatch:
    inputs:
      ollama_version:
        description: "Ollama version to build"
        required: false
        default: "0.15.6"

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}
  OLLAMA_VERSION: "0.15.6"

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    timeout-minutes: 60

    permissions:
      contents: read
      packages: write

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Log in to GitHub Container Registry
        if: github.event_name != 'pull_request'
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Resolve Ollama version
        id: version
        run: |
          if [ "${{ github.event_name }}" = "workflow_dispatch" ] && [ -n "${{ inputs.ollama_version }}" ]; then
            echo "ollama_version=${{ inputs.ollama_version }}" >> "$GITHUB_OUTPUT"
          else
            echo "ollama_version=${{ env.OLLAMA_VERSION }}" >> "$GITHUB_OUTPUT"
          fi

      - name: Extract metadata (tags, labels)
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            # Tag with ollama version on default branch
            type=raw,value=ollama-${{ steps.version.outputs.ollama_version }},enable={{is_default_branch}}
            # Tag "latest" on default branch
            type=raw,value=latest,enable={{is_default_branch}}
            # Tag with git tag (v1.0.0 -> 1.0.0)
            type=semver,pattern={{version}}
            # Tag with branch name for release branches
            type=ref,event=branch,enable=${{ startsWith(github.ref, 'refs/heads/release/') }}
            # Tag with short SHA always
            type=sha,prefix=

      - name: Build and push
        uses: docker/build-push-action@v6
        with:
          context: .
          file: Dockerfile
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          build-args: |
            OLLAMA_VERSION=${{ steps.version.outputs.ollama_version }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
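Once this workflow has run on the default branch, the published image can be pulled from GHCR by any of the tags above. A hypothetical usage example, with `<owner>/<repo>` standing in for the actual repository slug (the workflow uses `${{ github.repository }}`):

```sh
# Moving tag that tracks the default branch
docker pull ghcr.io/<owner>/<repo>:latest

# Tag pinned to the bundled ollama version
docker pull ghcr.io/<owner>/<repo>:ollama-0.15.6

# Immutable tag for an exact commit (short SHA)
docker pull ghcr.io/<owner>/<repo>:<short-sha>
```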
9 changes: 9 additions & 0 deletions .gitignore
@@ -0,0 +1,9 @@
# IDE
.idea/

# Swap files
*.swp
*.swo

# Open WebUI local data
webui/
36 changes: 36 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,36 @@
# Changelog

## 2026-02-12 — Switch to SYCL backend

### GPU backend: Vulkan -> SYCL

- Replaced Vulkan GPU backend with custom-built SYCL backend for ~2x inference
  speed on Intel GPUs
- Multi-stage Dockerfile: builds `libggml-sycl.so` from upstream llama.cpp
  (commit `a5bb8ba4`) using Intel oneAPI 2025.1.1
- Added `patch-sycl.py` to fix two ollama-specific API divergences (see the
  sketch after this list):
  - `graph_compute` signature (`int batch_size` parameter)
  - `GGML_TENSOR_FLAG_COMPUTE` removal (critical — without this patch all
    compute nodes are skipped, producing garbage output)
- Bundled oneAPI runtime libraries (SYCL, oneMKL, oneDNN, TBB, Level-Zero)
  into the runtime image
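The patch script itself is not part of this diff excerpt, but the two fixes it describes can be sketched. A minimal illustration in Python, assuming the upstream function name ends in `graph_compute` and the flag check sits on single lines (both assumptions; the real `patch-sycl.py` may work differently):

```python
#!/usr/bin/env python3
"""Sketch of patch-sycl.py (illustrative only; the real script is not
shown in this diff). Rewrites ggml-sycl.cpp to match ollama's ggml API."""
import re
import sys

path = sys.argv[1]  # e.g. ml/backend/ggml/ggml/src/ggml-sycl/ggml-sycl.cpp
with open(path) as f:
    src = f.read()

# 1. ollama's ggml passes an extra batch-size argument to graph_compute;
#    widen the upstream signature so the call sites link up.
src = src.replace(
    "graph_compute(ggml_backend_t backend, ggml_cgraph * cgraph)",
    "graph_compute(ggml_backend_t backend, ggml_cgraph * cgraph, int batch_size)",
)

# 2. GGML_TENSOR_FLAG_COMPUTE does not exist in ollama's ggml. Upstream
#    skips nodes that lack the flag, so the check must be removed --
#    otherwise every compute node is skipped and the output is garbage.
src = re.sub(r"\n[^\n]*GGML_TENSOR_FLAG_COMPUTE[^\n]*", "", src)

with open(path, "w") as f:
    f.write(src)
```

The Dockerfile below runs the real script against the vendored `ggml-sycl.cpp` before configuring the CMake build.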

### Ollama upgrade: 0.9.3 -> 0.15.6

- Upgraded from IPEX-LLM bundled ollama 0.9.3 to official ollama v0.15.6
- Switched from IPEX-LLM portable zip to official ollama binary
- Removed CUDA/MLX/Vulkan runners from image to reduce size

### Intel GPU runtime stack

- **level-zero**: v1.22.4 -> v1.28.0
- **intel-graphics-compiler (IGC)**: v2.11.7 -> v2.28.4
- **compute-runtime**: 25.18.33578.6 -> 26.05.37020.3
- **libigdgmm**: 22.7.0 -> 22.9.0

### Docker Compose

- Device mapping changed to full `/dev/dri` access for SYCL/Level-Zero
- Added `ONEAPI_DEVICE_SELECTOR=level_zero:0` and `ZES_ENABLE_SYSMAN=1`
- Removed `OLLAMA_VULKAN=1`
- Disabled web UI authentication (`WEBUI_AUTH=False`)
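Put together, a compose service reflecting these changes might look like the following sketch (service names, image references, and ports are assumptions; the actual docker-compose.yml is not included in this excerpt):

```yaml
services:
  ollama-intel-gpu:                 # service name assumed
    build: .                        # or a prebuilt image tag
    devices:
      - /dev/dri:/dev/dri           # full DRI access for SYCL/Level-Zero
    environment:
      - ONEAPI_DEVICE_SELECTOR=level_zero:0
      - ZES_ENABLE_SYSMAN=1
      # OLLAMA_VULKAN=1 was removed along with the Vulkan backend
    ports:
      - "11434:11434"

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    environment:
      - WEBUI_AUTH=False            # web UI authentication disabled
    ports:
      - "3000:8080"
    depends_on:
      - ollama-intel-gpu
```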
199 changes: 138 additions & 61 deletions Dockerfile
@@ -1,68 +1,145 @@
FROM ubuntu:22.04
ENV DEBIAN_FRONTEND=noninteractive
ENV TZ=america/los_angeles
# =============================================================================
# Stage 1: Build ggml-sycl backend from ollama's ggml source using Intel oneAPI
# =============================================================================
FROM intel/oneapi-basekit:2025.1.1-0-devel-ubuntu24.04 AS sycl-builder

ARG OLLAMA_VERSION=0.15.6

# Clone ollama source and the MATCHING ggml-sycl source from upstream llama.cpp.
# ollama v0.15.6 vendors ggml at commit a5bb8ba4 — we MUST use the same commit
# to ensure struct layouts, operation enums, and internal APIs match exactly.
# (ollama excludes ggml-sycl from its vendored ggml, but keeps the header)
ARG GGML_COMMIT=a5bb8ba4c50257437630c136210396810741bbf7
RUN git clone --depth 1 --branch v${OLLAMA_VERSION} \
        https://github.com/ollama/ollama.git /ollama && \
    git init /tmp/llama.cpp && \
    cd /tmp/llama.cpp && \
    git remote add origin https://github.com/ggml-org/llama.cpp.git && \
    git sparse-checkout set ggml/src/ggml-sycl && \
    git fetch --depth 1 origin ${GGML_COMMIT} && \
    git checkout FETCH_HEAD && \
    cp -r /tmp/llama.cpp/ggml/src/ggml-sycl \
        /ollama/ml/backend/ggml/ggml/src/ggml-sycl && \
    rm -rf /tmp/llama.cpp

WORKDIR /ollama

# Patch ggml-sycl to match ollama's modified ggml backend API:
# 1. graph_compute has an extra int batch_size parameter in ollama
# 2. GGML_TENSOR_FLAG_COMPUTE doesn't exist in ollama's ggml
COPY patch-sycl.py /tmp/patch-sycl.py
RUN python3 /tmp/patch-sycl.py ml/backend/ggml/ggml/src/ggml-sycl/ggml-sycl.cpp

# Build the SYCL backend as a dynamic library
# Note: oneAPI env is already set in the base image, no need to source setvars.sh
RUN cmake -B build \
        -DCMAKE_BUILD_TYPE=Release \
        -DCMAKE_C_COMPILER=icx \
        -DCMAKE_CXX_COMPILER=icpx \
        -DGGML_SYCL=ON \
        -DGGML_SYCL_TARGET=INTEL \
        -DOLLAMA_RUNNER_DIR=sycl && \
    cmake --build build --parallel $(nproc) --target ggml-sycl

# Collect the SYCL runner and its oneAPI runtime dependencies into /sycl-runner
RUN mkdir -p /sycl-runner && \
    cp build/lib/ollama/libggml-sycl.so /sycl-runner/ && \
    # SYCL / DPC++ runtime
    cp /opt/intel/oneapi/compiler/latest/lib/libsycl.so* /sycl-runner/ && \
    # Unified Runtime (oneAPI 2025+) — search multiple possible locations
    find /opt/intel/oneapi -name 'libur_loader.so*' | head -3 | xargs -I{} cp {} /sycl-runner/ && \
    find /opt/intel/oneapi -name 'libur_adapter_level_zero.so*' | head -3 | xargs -I{} cp {} /sycl-runner/ && \
    find /opt/intel/oneapi -maxdepth 4 -name 'libumf.so*' | head -3 | xargs -I{} cp {} /sycl-runner/ && \
    # oneDNN
    cp /opt/intel/oneapi/dnnl/latest/lib/libdnnl.so* /sycl-runner/ 2>/dev/null; \
    # oneMKL
    cp /opt/intel/oneapi/mkl/latest/lib/libmkl_core.so* /sycl-runner/ && \
    cp /opt/intel/oneapi/mkl/latest/lib/libmkl_intel_ilp64.so* /sycl-runner/ && \
    cp /opt/intel/oneapi/mkl/latest/lib/libmkl_sycl_blas.so* /sycl-runner/ && \
    cp /opt/intel/oneapi/mkl/latest/lib/libmkl_tbb_thread.so* /sycl-runner/ && \
    # TBB
    cp /opt/intel/oneapi/tbb/latest/lib/intel64/gcc*/libtbb.so* /sycl-runner/ && \
    # Intel compiler runtime
    cp /opt/intel/oneapi/compiler/latest/lib/libsvml.so /sycl-runner/ && \
    cp /opt/intel/oneapi/compiler/latest/lib/libimf.so /sycl-runner/ && \
    cp /opt/intel/oneapi/compiler/latest/lib/libintlc.so* /sycl-runner/ && \
    cp /opt/intel/oneapi/compiler/latest/lib/libirng.so /sycl-runner/ && \
    cp /opt/intel/oneapi/compiler/latest/lib/libiomp5.so /sycl-runner/ && \
    # Level-zero PI plugin (legacy, may not exist)
    cp /opt/intel/oneapi/compiler/latest/lib/libpi_level_zero.so* /sycl-runner/ 2>/dev/null; \
    # SYCL SPIR-V fallback kernels (needed for bfloat16, complex math, etc.)
    cp /opt/intel/oneapi/compiler/latest/lib/libsycl-fallback*.spv /sycl-runner/ && \
    # Strip debug symbols to reduce size
    strip --strip-unneeded /sycl-runner/*.so* 2>/dev/null; true

# =============================================================================
# Stage 2: Runtime image
# =============================================================================
FROM ubuntu:24.04
ENV DEBIAN_FRONTEND=noninteractive \
    TZ=America/Los_Angeles

# Base packages
RUN apt update && \
    apt install --no-install-recommends -q -y \
        software-properties-common \
        ca-certificates \
        gnupg \
        wget \
        curl \
        python3 \
        python3-pip \
        ocl-icd-libopencl1

# Intel GPU compute user-space drivers
RUN mkdir -p /tmp/gpu && \
    cd /tmp/gpu && \
    wget https://github.com/oneapi-src/level-zero/releases/download/v1.18.3/level-zero_1.18.3+u22.04_amd64.deb && \
    wget https://github.com/intel/intel-graphics-compiler/releases/download/igc-1.0.17791.9/intel-igc-core_1.0.17791.9_amd64.deb && \
    wget https://github.com/intel/intel-graphics-compiler/releases/download/igc-1.0.17791.9/intel-igc-opencl_1.0.17791.9_amd64.deb && \
    wget https://github.com/intel/compute-runtime/releases/download/24.39.31294.12/intel-level-zero-gpu_1.6.31294.12_amd64.deb && \
    wget https://github.com/intel/compute-runtime/releases/download/24.39.31294.12/intel-opencl-icd_24.39.31294.12_amd64.deb && \
    wget https://github.com/intel/compute-runtime/releases/download/24.39.31294.12/libigdgmm12_22.5.2_amd64.deb && \
    dpkg -i *.deb && \
    rm *.deb

# Required compute runtime level-zero variables
ENV ZES_ENABLE_SYSMAN=1
RUN apt-get update && \
    apt-get install --no-install-recommends -q -y \
        ca-certificates \
        wget \
        zstd \
        ocl-icd-libopencl1 \
        libhwloc15 && \
    rm -rf /var/lib/apt/lists/*

# oneAPI
RUN wget -qO - https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | \
        gpg --dearmor --output /usr/share/keyrings/oneapi-archive-keyring.gpg && \
    echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | \
        tee /etc/apt/sources.list.d/oneAPI.list && \
    apt update && \
    apt install --no-install-recommends -q -y \
        intel-oneapi-common-vars=2024.0.0-49406 \
        intel-oneapi-common-oneapi-vars=2024.0.0-49406 \
        intel-oneapi-compiler-dpcpp-cpp=2024.0.2-49895 \
        intel-oneapi-dpcpp-ct=2024.0.0-49381 \
        intel-oneapi-mkl=2024.0.0-49656 \
        intel-oneapi-mpi=2021.11.0-49493 \
        intel-oneapi-dal=2024.0.1-25 \
        intel-oneapi-ippcp=2021.9.1-5 \
        intel-oneapi-ipp=2021.10.1-13 \
        intel-oneapi-tlt=2024.0.0-352 \
        intel-oneapi-ccl=2021.11.2-5 \
        intel-oneapi-dnnl=2024.0.0-49521 \
        intel-oneapi-tcm-1.0=1.0.0-435

# Required oneAPI environment variables
ENV USE_XETLA=OFF
ENV SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
ENV SYCL_CACHE_PERSISTENT=1

COPY _init.sh /usr/share/lib/init_workspace.sh
COPY _run.sh /usr/share/lib/run_workspace.sh

# Ollama via ipex-llm
RUN pip3 install --pre --upgrade ipex-llm[cpp]
# Intel GPU runtimes (release 26.05.37020.3)
# Provides level-zero, IGC, compute-runtime for Intel GPU kernel support
RUN mkdir -p /tmp/gpu && cd /tmp/gpu && \
    wget https://github.com/oneapi-src/level-zero/releases/download/v1.28.0/level-zero_1.28.0+u24.04_amd64.deb && \
    wget https://github.com/intel/intel-graphics-compiler/releases/download/v2.28.4/intel-igc-core-2_2.28.4+20760_amd64.deb && \
    wget https://github.com/intel/intel-graphics-compiler/releases/download/v2.28.4/intel-igc-opencl-2_2.28.4+20760_amd64.deb && \
    wget https://github.com/intel/compute-runtime/releases/download/26.05.37020.3/intel-ocloc-dbgsym_26.05.37020.3-0_amd64.ddeb && \
    wget https://github.com/intel/compute-runtime/releases/download/26.05.37020.3/intel-ocloc_26.05.37020.3-0_amd64.deb && \
    wget https://github.com/intel/compute-runtime/releases/download/26.05.37020.3/intel-opencl-icd_26.05.37020.3-0_amd64.deb && \
    wget https://github.com/intel/compute-runtime/releases/download/26.05.37020.3/libigdgmm12_22.9.0_amd64.deb && \
    wget https://github.com/intel/compute-runtime/releases/download/26.05.37020.3/libze-intel-gpu1_26.05.37020.3-0_amd64.deb && \
    dpkg -i *.deb *.ddeb && rm -rf /tmp/gpu

ENV OLLAMA_NUM_GPU=999
# Install official ollama binary + CPU runners (skip CUDA/MLX/Vulkan)
ARG OLLAMA_VERSION=0.15.6
RUN wget -qO- "https://github.com/ollama/ollama/releases/download/v${OLLAMA_VERSION}/ollama-linux-amd64.tar.zst" | \
        zstd -d | tar -xf - -C /usr && \
    rm -rf /usr/lib/ollama/cuda_* /usr/lib/ollama/mlx_* /usr/lib/ollama/vulkan

# Install SYCL runner from build stage
COPY --from=sycl-builder /sycl-runner/ /usr/lib/ollama/sycl/

# Clean up
RUN apt-get clean && \
    rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* && \
    apt-get autoremove -y --purge 2>/dev/null; \
    apt-get autoclean -y 2>/dev/null; true

# Serve ollama on all interfaces
ENV OLLAMA_HOST=0.0.0.0:11434

ENTRYPOINT ["/bin/bash", "/usr/share/lib/run_workspace.sh"]
# Keep models loaded in memory
ENV OLLAMA_KEEP_ALIVE=24h
ENV OLLAMA_DEFAULT_KEEPALIVE=6h

# Concurrency and resource limits
ENV OLLAMA_NUM_PARALLEL=1
ENV OLLAMA_MAX_LOADED_MODELS=1
ENV OLLAMA_MAX_QUEUE=512
ENV OLLAMA_MAX_VRAM=0

# Use all GPU layers
ENV OLLAMA_NUM_GPU=999

# Intel GPU tuning
ENV ZES_ENABLE_SYSMAN=1
ENV ONEAPI_DEVICE_SELECTOR=level_zero:0

# For Intel Core Ultra Processors (Series 1), code name Meteor Lake
ENV IPEX_LLM_NPU_MTL=1

EXPOSE 11434
ENTRYPOINT ["/usr/bin/ollama"]
CMD ["serve"]