Deployment Automation, UI, Audio, and Security Improvements for MiniCPM-o 4.5#1076
LujiaJin wants to merge 13 commits into OpenBMB:main
Conversation
- Dockerfile.backend: GPU inference container (CUDA 12.8.1)
- Dockerfile.frontend: Vue.js + Nginx multi-stage build
- docker-compose.yml: orchestration with GPU passthrough
- nginx.docker.conf: reverse proxy with SSL support
- gen_ssl_cert.sh: self-signed certificate generation
- DEPLOY_WSL2_TO_H100_ZH.md: comprehensive deployment guide
- Update .gitignore to exclude models/ and build artifacts
…ug scripts; update deployment configs
… details and structure
Pull request overview
This PR updates the MiniCPM-o web demo and adds an offline/air-gapped deployment bundle aimed at reproducible remote deployment (including mobile HTTPS access), while also upgrading the demo backend to MiniCPM-o 4.5 streaming TTS semantics.
Changes:
- Add Docker-based deployment assets under `deploy/` (backend/frontend Dockerfiles, compose file, Nginx reverse proxy template, SSL cert script, and EN/ZH deployment guides).
- Update the demo backend (`model_server.py`) for MiniCPM-o 4.5 APIs and improve the streaming audio path (in-memory WAV encoding, PCM_16).
- Add/replace UI assets (new `miniCPM4.5.svg`) and ignore `models/` in git.
Reviewed changes
Copilot reviewed 10 out of 13 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| web_demos/minicpm-o_2.6/model_server.py | Updates streaming prompt/prefill/generation logic for MiniCPM-o 4.5 and adjusts audio streaming pipeline. |
| web_demos/minicpm-o_2.6/miniCPM4.5.svg | Adds 4.5 SVG asset for frontend UI. |
| web_demos/minicpm-o_2.6/miniCPM2.6-CxDaeLI9.svg.bak | Adds a backup SVG artifact (appears unintended). |
| deploy/requirements.backend.txt | Documents backend Python dependencies for offline install workflows. |
| deploy/nginx.docker.conf | Adds Nginx reverse proxy config intended for SSE/WebSocket + HTTPS mobile access. |
| deploy/gen_ssl_cert.sh | Adds a helper script to generate self-signed certs for HTTPS. |
| deploy/docker-compose.yml | Adds compose-based orchestration for backend + frontend containers. |
| deploy/Dockerfile.frontend | Adds multi-stage frontend build (pnpm + nginx). |
| deploy/Dockerfile.backend | Adds CUDA-based backend image build for inference service. |
| deploy/DEPLOY_WSL2_TO_H100_ZH.md | Adds Chinese offline deployment guide for WSL2 → H100 workflow. |
| deploy/DEPLOY_WSL2_TO_H100_EN.md | Adds English offline deployment guide for WSL2 → H100 workflow. |
| README.md | Adds a documentation link tip near the header. |
| .gitignore | Ignores models/ directory. |
Comments suppressed due to low confidence (10)
web_demos/minicpm-o_2.6/model_server.py:576
`librosa.resample` is called for every generated audio chunk. This is relatively expensive and can add noticeable latency/CPU load during streaming. Consider either emitting 24kHz to the frontend (if supported) or switching to a faster resampler (e.g., `torchaudio.functional.resample` / `scipy.signal.resample_poly`) and reusing any state if possible.
```python
# Resample from model's 24kHz to frontend's expected 16kHz
audio_np = librosa.resample(audio_np, orig_sr=sr, target_sr=16000)
```
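The reviewer's suggestion can be sketched with `scipy.signal.resample_poly`; the function name `resample_24k_to_16k` is illustrative, and the sketch assumes mono float chunks like those produced by the streaming path above:

```python
import numpy as np
from scipy.signal import resample_poly

def resample_24k_to_16k(audio_np: np.ndarray) -> np.ndarray:
    """Polyphase 24 kHz -> 16 kHz resampling for a single audio chunk.

    The rate ratio 16000/24000 reduces to 2/3, so a small FIR polyphase
    filter suffices, which is typically cheaper per chunk than a generic
    high-quality resampler.
    """
    return resample_poly(audio_np, up=2, down=3)
```

Each 24 kHz chunk of length N comes out as a 16 kHz chunk of length ceil(2N/3), so chunk boundaries stay predictable for the frontend.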
deploy/gen_ssl_cert.sh:17
- This script uses placeholders (`<YOUR_CN>`, `<YOUR_IP1>`, etc.) inside the openssl command. If users run it without editing the script, certificate generation will fail (SAN `IP:` fields must be real IPs). Consider accepting CN/SAN as args/env vars and providing safe defaults (e.g., `CN=localhost`, `SAN=DNS:localhost,IP:127.0.0.1`).
```bash
OUT_DIR="${1:-<YOUR_CERTS_OUTPUT_DIR>}"
mkdir -p "$OUT_DIR"
echo ">>> Generating self-signed SSL certificate to $OUT_DIR ..."
openssl req -x509 -nodes -days 3650 \
  -newkey rsa:2048 \
  -keyout "$OUT_DIR/server.key" \
  -out "$OUT_DIR/server.crt" \
  -subj "/C=CN/ST=Local/L=Local/O=MiniCPMo/OU=Dev/CN=<YOUR_CN>" \
  -addext "subjectAltName=IP:<YOUR_IP1>,IP:<YOUR_IP2>,DNS:<YOUR_DNS>"
```
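A hedged sketch of the suggested rework, taking CN/SAN from env vars with safe localhost defaults. The variable names `CERT_CN` and `CERT_SAN` are our invention, and the sketch assumes OpenSSL ≥ 1.1.1 (required for `-addext`):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Output dir from $1, CN/SAN from env vars, all with safe defaults
OUT_DIR="${1:-./certs}"
CN="${CERT_CN:-localhost}"
SAN="${CERT_SAN:-DNS:localhost,IP:127.0.0.1}"

mkdir -p "$OUT_DIR"
echo ">>> Generating self-signed SSL certificate to $OUT_DIR (CN=$CN, SAN=$SAN) ..."
openssl req -x509 -nodes -days 3650 \
  -newkey rsa:2048 \
  -keyout "$OUT_DIR/server.key" \
  -out "$OUT_DIR/server.crt" \
  -subj "/C=CN/ST=Local/L=Local/O=MiniCPMo/OU=Dev/CN=$CN" \
  -addext "subjectAltName=$SAN"
```

Run with no arguments it produces a localhost cert under `./certs`; real deployments override `CERT_CN`/`CERT_SAN` without editing the script.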
deploy/docker-compose.yml:27
`${MODEL_PATH:-<YOUR_MODEL_PATH>}` uses an angle-bracket placeholder as the default. If `MODEL_PATH` is not set, Docker will try to mount a host path literally named `<YOUR_MODEL_PATH>`, which is confusing and likely wrong. Prefer a real default path (e.g., `./models/...`) or make the variable required.
```yaml
- ${MODEL_PATH:-<YOUR_MODEL_PATH>}:/models/MiniCPM-o-4_5:ro
```
deploy/docker-compose.yml:53
`${CERTS_PATH:-<YOUR_CERTS_PATH>}` uses an angle-bracket placeholder as the default. If `CERTS_PATH` is not set, Docker will mount a host path literally named `<YOUR_CERTS_PATH>`, which can lead to hard-to-debug TLS failures. Prefer a real default (e.g., `./certs`) or require the env var.
```yaml
- ${CERTS_PATH:-<YOUR_CERTS_PATH>}:/etc/nginx/certs:ro
```
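One way to make both variables required, as the comments suggest, is Compose's `${VAR:?error}` interpolation syntax. A sketch (not the PR's actual file):

```yaml
volumes:
  # Fail fast with a clear error if the env var is unset,
  # instead of mounting a literal placeholder path
  - ${MODEL_PATH:?set MODEL_PATH to your MiniCPM-o-4_5 weights dir}:/models/MiniCPM-o-4_5:ro
  - ${CERTS_PATH:?set CERTS_PATH to your TLS certs dir}:/etc/nginx/certs:ro
```

With this form, `docker compose up` aborts immediately with the given message rather than failing later at mount or TLS time.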
deploy/DEPLOY_WSL2_TO_H100_EN.md:12
- This guide claims CUDA 12.4 matches the Dockerfile base image `cuda:12.4.1`, but `deploy/Dockerfile.backend` currently uses `nvidia/cuda:12.8.1-...`. Update the guide (or the Dockerfile) so the stated CUDA/base image version is accurate.
| GPU | NVIDIA H100 (driver 550.90.12) |
| CUDA | 12.4 (fully matches the Dockerfile base image `cuda:12.4.1`) |
| Local | Win10 + WSL2 Ubuntu |
deploy/DEPLOY_WSL2_TO_H100_ZH.md:12
- This guide states CUDA 12.4 matches the Dockerfile base image `cuda:12.4.1`, but `deploy/Dockerfile.backend` currently uses `nvidia/cuda:12.8.1-...`. Please update the guide (or the Dockerfile) to keep the deployment instructions accurate.
| GPU | NVIDIA H100 (driver 550.90.12) |
| CUDA | 12.4 (exactly matches the Dockerfile base image `cuda:12.4.1`) |
| 本地 | Win10 + WSL2 Ubuntu |
deploy/Dockerfile.backend:5
- The deployment docs mention a CUDA 12.4 base image (`cuda:12.4.1`), but this Dockerfile uses `nvidia/cuda:12.8.1-...`. Please align the Dockerfile and the docs on the intended CUDA base image version.
```dockerfile
# Base image: NVIDIA CUDA 12.8 + Ubuntu 22.04
# ============================================
FROM nvidia/cuda:12.8.1-devel-ubuntu22.04
```
deploy/Dockerfile.backend:33
- This image is based on CUDA 12.8, but PyTorch is installed from the `cu124` index URL. While it may work, the mixed CUDA targeting is confusing and can lead to unexpected library/runtime issues; consider using a base image matching the wheel CUDA version (or document why the mismatch is intentional).
```dockerfile
# ---- PyTorch (CUDA 12.4) ----
RUN pip install --no-cache-dir \
    "torch>=2.3.0,<=2.8.0" \
    "torchaudio<=2.8.0" \
    --index-url https://download.pytorch.org/whl/cu124
```
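If the `cu124` wheels are the intended target, one way to remove the ambiguity is to drop the base image to the matching CUDA release, assuming no 12.8-only toolkit features are needed:

```dockerfile
FROM nvidia/cuda:12.4.1-devel-ubuntu22.04
```

Alternatively, keep 12.8 and install `cu128` wheels; either direction, picking one CUDA version end to end avoids the mixed targeting the comment flags.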
deploy/Dockerfile.backend:52
- The backend Dockerfile installs several third-party Python packages from public registries using floating versions (e.g., `accelerate`, `librosa`, `soundfile`, `onnxruntime`, `fastapi`, `uvicorn`, `aiofiles`, `pydantic`) and a version range for `torch`, so each build may pull different artifacts and execute them inside the GPU-enabled backend. This undermines reproducibility and creates a supply-chain attack surface: if any of these packages or the registry is compromised in the future, rebuilding the image could silently introduce malicious code into internal deployments. Please pin all third-party packages here to exact versions (and ideally hashes or an internal mirror) so backend images are deterministic and auditable, and updates happen only via explicit review.
```dockerfile
RUN pip install --no-cache-dir \
    "torch>=2.3.0,<=2.8.0" \
    "torchaudio<=2.8.0" \
    --index-url https://download.pytorch.org/whl/cu124

# ---- MiniCPM-o core dependencies ----
RUN pip install --no-cache-dir \
    "transformers==4.51.0" \
    accelerate \
    "minicpmo-utils[all]>=1.0.5" \
    librosa \
    soundfile \
    onnxruntime \
    sentencepiece \
    Pillow \
    numpy

# ---- Web service dependencies ----
RUN pip install --no-cache-dir \
    fastapi \
    uvicorn \
    aiofiles \
    pydantic
```
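One hedged sketch of a more deterministic layer, assuming a reviewed `requirements.lock` (a hypothetical file, e.g. generated with `pip-compile --generate-hashes`) is checked in next to the Dockerfile:

```dockerfile
# Install only from an audited, hash-locked requirements file so that
# rebuilding the image cannot silently pull different artifacts.
COPY requirements.lock /tmp/requirements.lock
RUN pip install --no-cache-dir --require-hashes -r /tmp/requirements.lock
```

With `--require-hashes`, pip refuses any package whose hash is not listed, so dependency updates can only land via an explicit lockfile change in review.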
deploy/requirements.backend.txt:29
- The backend requirements file is intended for offline `pip` installs but leaves many critical dependencies (e.g., `accelerate`, `librosa`, `soundfile`, `onnxruntime`, `fastapi`, `uvicorn[standard]`, `aiofiles`, `pydantic`, `httpx`) unpinned, so future installs may pull newer, unvetted versions from public registries. This lack of version pinning undermines reproducibility and exposes deployments to supply-chain risk: a compromised or malicious new release of any of these packages could be introduced into internal environments without any code change. Please pin these packages to specific versions (and ideally hashes that match the tested container image) so offline installs are deterministic and dependency updates go through explicit review.
```text
accelerate
minicpmo-utils[all]>=1.0.5
sentencepiece
# == Audio/Video Processing ==
librosa
soundfile
onnxruntime
Pillow
numpy
# == Web Service ==
fastapi
uvicorn[standard]
aiofiles
pydantic
httpx
```
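As one possible remedy, the loose list could be compiled into a hash-locked file with pip-tools and installed only from that. A workflow sketch (the lockfile name is illustrative, and this assumes network access at lock time only):

```shell
pip-compile --generate-hashes deploy/requirements.backend.txt -o deploy/requirements.lock
pip install --require-hashes -r deploy/requirements.lock
```

The lockfile is then what ships in the offline bundle, so air-gapped installs are byte-for-byte reproducible.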
```python
audio_stream = None
try:
    with open(input_audio_path, 'rb') as wav_file:
        audio_stream = wav_file.read()
except FileNotFoundError:
    print(f"File {input_audio_path} not found.")
    logger.warning(f"File {input_audio_path} not found.")
yield base64.b64encode(audio_stream).decode('utf-8'), "assistant:\n"
```
`audio_stream` is initialized to `None` and then base64-encoded even when the input WAV file can't be read. If `open(input_audio_path)` fails, `base64.b64encode(audio_stream)` will raise a `TypeError` and abort the stream. Handle the missing-file case explicitly (e.g., return an error event, skip the initial yield, or set `audio_stream` to `b''` and log).
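A minimal sketch of one of the suggested fixes, defaulting to an empty payload. The function name and surrounding generator shape are illustrative, not the PR's actual code:

```python
import base64
import logging

logger = logging.getLogger(__name__)

def stream_initial_audio(input_audio_path):
    """Yield the greeting audio as base64, degrading gracefully if the file is missing."""
    audio_stream = b""  # safe default instead of None, so b64encode never sees None
    try:
        with open(input_audio_path, "rb") as wav_file:
            audio_stream = wav_file.read()
    except FileNotFoundError:
        logger.warning("File %s not found; sending empty audio payload.", input_audio_path)
    # base64.b64encode(b"") is valid, so the stream no longer aborts with a TypeError
    yield base64.b64encode(audio_stream).decode("utf-8"), "assistant:\n"
```

With this shape, a missing file produces an empty audio string the frontend can ignore, instead of killing the stream with an exception.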
````text
```powershell
$env:SSH_HOST = "<YOUR_HOST>"
$env:SSH_PORT = "<YOUR_PORT>"
$env:SSH_USER = "<YOUR_USER>"

## PowerShell Daily Three-Command Quick Reference (Recommended)

```powershell
# 1) Update SSH parameters when port changes
Set-MiniCPMSSH -Port "<YOUR_PORT>" -User "<YOUR_USER>"

# 2) Start mobile mode (open tunnel + print accessible URL)
Start-MiniCPMMobile

# 3) Stop tunnel
Stop-MiniCPMMobile
scp -P $env:SSH_PORT .\file.tar.gz "$env:SSH_USER@$env:SSH_HOST:<YOUR_PATH>/deploy_pkg/"

Quick recovery after port change:

[string]$Host = "<YOUR_HOST>",
[string]$User = "<YOUR_USER>"
Restart-MiniCPMMobile
```

$env:SSH_HOST = $Host
$env:SSH_PORT = $Port
$env:SSH_USER = $User
ssh -p $env:SSH_PORT "$env:SSH_USER@$env:SSH_HOST"
scp -P $env:SSH_PORT .\file.tar.gz "$env:SSH_USER@$env:SSH_HOST:/data/minicpmo/deploy_pkg/"
```
Write-Host "[MiniCPM SSH] HOST=$env:SSH_HOST PORT=$env:SSH_PORT USER=$env:SSH_USER"
````
The top section has broken/overlapping fenced code blocks and interleaves shell commands into PowerShell blocks (e.g., a powershell block opened at line 24 is never closed before another powershell starts at line 32). This makes the guide hard to follow and renders incorrectly in Markdown viewers; please fix the fencing and separate PowerShell vs Bash snippets cleanly.
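For reference, a cleanly separated version might look like this in the guide (fences closed before the next one opens, PowerShell and Bash kept apart; the concrete commands are only examples drawn from the excerpt above):

````markdown
```powershell
# PowerShell (Windows side)
Set-MiniCPMSSH -Port "<YOUR_PORT>" -User "<YOUR_USER>"
Start-MiniCPMMobile
```

```bash
# Bash (remote side)
scp -P "$SSH_PORT" ./file.tar.gz "$SSH_USER@$SSH_HOST:<YOUR_PATH>/deploy_pkg/"
```
````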
@copilot open a new pull request to apply changes based on this feedback

@copilot open a new pull request to apply changes based on the comments in this thread
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Summary
This PR introduces improvements across deployment automation, cross-device access, audio quality, storage performance, and security hygiene for MiniCPM-o 4.5.
Changes
🚀 Remote Deployment Workflow (`deploy/`)

Added English and Chinese deployment guides (`DEPLOY_WSL2_TO_H100_EN.md`, `DEPLOY_WSL2_TO_H100_ZH.md`) covering the full workflow, including `docker run`. Users can now deploy MiniCPM-o 4.5 in offline or enterprise environments reproducibly, with both PC and mobile browser access supported.
🖼️ UI: Icon Version Patch
Replaced remaining v2.6 icons with v4.5 assets across all frontend pages. Updated icon references to `web_demos/minicpm-o_2.6/miniCPM4.5.svg`.

🔊 Audio & 💾 Storage Optimization

Refactored the audio pipeline's PCM_16 encoding path and improved buffer management and I/O handling in the backend storage layer, primarily in `web_demos/minicpm-o_2.6/model_server.py`. Reduces artifacts and latency in real-time AI voice calls, and lowers stutter during high-concurrency or long-duration sessions.

🛠️ Misc
Affected Files