Claude/fix modelscope wheels mp gpd#101
Open
mirai-gpro wants to merge 235 commits intoaigc3d:masterfrom
Open
Conversation
Root cause: defaults.py's default_setup() and default_config_parser() assume a distributed training environment with writable filesystem. On Cloud Run (read-only /app), this causes silent init failures. Changes: - app.py: Skip default_setup() entirely, manually set CPU/single-process config - app.py: Redirect save_path to /tmp (only writable dir on Cloud Run) - app.py: Add GCS FUSE mount path resolution with Docker-baked fallback - cloudbuild.yaml: Add Cloud Storage FUSE volume mount for model serving - cloudbuild.yaml: Increase max-instances to 4 - Include handoff docs and full LAM_Audio2Expression codebase https://claude.ai/code/session_01C6n4TZ9PPdx46jCevmVo7P
The LAM model file was misidentified as .tar but is actually a PyTorch weights file. Gemini renamed it to .pth on GCS. Also source wav2vec2 config.json from the model directory instead of LAM configs/. https://claude.ai/code/session_01C6n4TZ9PPdx46jCevmVo7P
- Import gourmet-sp from implementation-testing branch - Add sendAudioToExpression() to shop introduction TTS flow (firstShop and remainingShops now get lip sync data before playback) - Remove legacy event hooks in concierge-controller init() (replaced with clean linkTtsPlayer helper) - Clean up LAMAvatar.astro: remove legacy frame playback code (startFramePlaybackFromQueue, stopFramePlayback, frameQueue, etc.) - Simplify to single sync mechanism: frameBuffer + ttsPlayer.currentTime - Reduce health check interval from 2s to 10s https://claude.ai/code/session_01C6n4TZ9PPdx46jCevmVo7P
Using official LAM sample avatar as placeholder. Will be replaced with custom-generated avatar later. https://claude.ai/code/session_01C6n4TZ9PPdx46jCevmVo7P
- Add fade-in/fade-out smoothing (6 frames / 200ms) to prevent Gaussian Splat visual distortion at speech start/end - Parallelize expression generation with TTS synthesis: remaining sentence expression is pre-fetched during first sentence playback, eliminating wait time between segments - Add fetchExpressionFrames() for background expression fetch with pendingExpressionFrames buffer swap pattern - Apply same optimization to shop introduction flow https://claude.ai/code/session_01C6n4TZ9PPdx46jCevmVo7P
sendAudioToExpression fetch could hang indefinitely (Cloud Run cold start / service down), blocking await and preventing TTS play(). - Add AbortController timeout (8s) to all expression API fetches - Wrap expression await with Promise.race so TTS plays even if expression API is slow/down (lip sync degrades gracefully) - Applied to speakTextGCP, speakResponseInChunks, and shop flow https://claude.ai/code/session_01C6n4TZ9PPdx46jCevmVo7P
Root cause: sendAudioToExpression fetch hung in browser, blocking await and preventing TTS play() from ever being called. Fix: all expression API calls are now fire-and-forget - TTS playback starts immediately without waiting for expression frames. Frames arrive asynchronously and getExpressionData() picks them up in real-time from the frameBuffer. - Remove await/Promise.race from all sendAudioToExpression calls - Remove fetchExpressionFrames and pendingExpressionFrames (no longer needed - direct fire-and-forget is simpler) - Keep AbortController timeout (8s) inside sendAudioToExpression to prevent leaked connections https://claude.ai/code/session_01C6n4TZ9PPdx46jCevmVo7P
… calls
Architecture change: expression frames are now returned WITH TTS audio
from the backend, instead of the frontend calling audio2exp directly.
Backend (app_customer_support_modified.py):
- Replace fire-and-forget send_to_audio2exp with get_expression_frames
that returns {names, frames, frame_rate}
- Send MP3 directly to audio2exp (no separate PCM generation needed)
- TTS response: {success, audio, expression: {...}}
- Server-to-server communication: no CORS, stable, fast
Frontend (concierge-controller.ts):
- New queueExpressionFromTtsResponse() reads expression from TTS response
- Remove sendAudioToExpression (direct browser→audio2exp REST calls)
- Remove audio2expApiUrl, audio2expWsUrl, connectLAMAvatarWebSocket
- Remove EXPRESSION_API_TIMEOUT_MS, AbortController timeout
- Existing 1st-sentence-ahead pattern now automatically includes
expression data (no separate API call needed)
https://claude.ai/code/session_01C6n4TZ9PPdx46jCevmVo7P
…orget proxy - Backend: TTS endpoint no longer blocks on expression generation - Backend: New /api/audio2expression proxy (server-to-server, CORS-free) - Frontend: All expression calls use fireAndForgetExpression() (never blocks TTS play) - Removes ~2s first-sentence delay caused by synchronous expression in TTS https://claude.ai/code/session_01C6n4TZ9PPdx46jCevmVo7P
…aining Two bugs fixed: 1. Buffer corruption: frames from segment 1 mixed with segment 2 (ttsPlayer.currentTime resets but frameBuffer was concatenated) → Now clear buffer before each new TTS segment 2. 3-second delay: expression frames arrived after TTS started playing → Pre-fetch remaining segment's expression during first segment playback → When second segment starts, pre-fetched frames are immediately available New prefetchExpression() method returns Promise with parsed frames, applied non-blocking via .then() to never delay TTS playback. https://claude.ai/code/session_01C6n4TZ9PPdx46jCevmVo7P
Architecture change: backend includes expression data in TTS response (server-to-server audio2exp call ~150ms) instead of separate proxy. - Backend TTS endpoint calls audio2exp synchronously, includes result - Frontend applyExpressionFromTts(): instant buffer queue from TTS data - Proxy fireAndForgetExpression kept as fallback (timeout/error cases) - All 5 call sites (speakTextGCP, speakResponseInChunks x2, shop x2) updated - Removes prefetch complexity (TTS response already carries expression) Result: lip sync starts from frame 0, no 2-3 second gap. https://claude.ai/code/session_01C6n4TZ9PPdx46jCevmVo7P
Architecture redesign for true zero-delay TTS playback: - Backend TTS endpoint starts audio2exp in background thread, returns audio + expression_token immediately (no blocking) - New /api/expression/poll endpoint: frontend polls for result - Frontend pollExpression(): fire-and-forget polling at 150ms intervals - Removes sync expression, proxy, and prefetch approaches Timeline: TTS returns ~500ms, audio2exp completes ~150ms later (background), frontend first poll arrives ~200ms after TTS → expression available ~350ms after playback starts. Previous: 2-3 seconds delay or TTS blocked. https://claude.ai/code/session_01C6n4TZ9PPdx46jCevmVo7P
…aster response Backend: revert to sync expression in TTS response (remove async cache/polling). Frontend: replace pollExpression with applyExpressionFromTts (sync from TTS response). Frontend: fire sendMessage() immediately while ack plays (don't await firstAckPromise). pendingAckPromise is awaited before TTS playback to prevent ttsPlayer conflict. https://claude.ai/code/session_01C6n4TZ9PPdx46jCevmVo7P
…nterrupt) unlockAudioParams() does play→pause→reset on ttsPlayer for iOS unlock. When called during ack playback (parallel LLM mode), it kills the ack audio. Skip it when pendingAckPromise is active (audio already unlocked by ack). https://claude.ai/code/session_01C6n4TZ9PPdx46jCevmVo7P
…rentAudio safety Root cause: ack "はい" gets paused (not ended) by some interruption, so pendingAckPromise never resolves → speakResponseInChunks stuck forever. Fix 1: resolve pendingAckPromise on both 'ended' and 'pause' events. Fix 2: call stopCurrentAudio() after pendingAckPromise resolves to ensure ttsPlayer is clean before new TTS playback. https://claude.ai/code/session_01C6n4TZ9PPdx46jCevmVo7P
- Container: max-height 650px → height calc(100dvh - 40px), max-height 960px - Avatar stage: 140px → 300px (desktop), 100px → 200px (mobile) - Chat area: min-height 150px guaranteed for message display https://claude.ai/code/session_01C6n4TZ9PPdx46jCevmVo7P
Post-init camera: Z 1→0.6 (closer), Y 1.8→1.75 (slight down), FOV 50→36 (zoom in). Eliminates wasted space above avatar head in the 300px avatar-stage. https://claude.ai/code/session_01C6n4TZ9PPdx46jCevmVo7P
Previous: lookAt y=1.8 (head center) + tight zoom → mouth cut off at bottom. Fix: lower target to y=1.62 (nose/mouth center), adjust OrbitControls target to match. Camera Z=0.55, FOV=38 for balanced framing. https://claude.ai/code/session_01C6n4TZ9PPdx46jCevmVo7P
targetY 1.62→1.66 (avatar lower in frame), camera Y 1.62→1.72 (above target, slight downward angle instead of looking up from below) https://claude.ai/code/session_01C6n4TZ9PPdx46jCevmVo7P
Key improvements over existing lam_modal.py: - @modal.asgi_app() + Gradio 4.x instead of subprocess + patching - Direct Python integration with LAM pipeline (no regex patching) - Blender 4.2 included for GLB generation (OpenAvatarChat format) - Focused UI for concierge.zip generation with progress feedback - Proper ASGI serving resolves Gradio UI display issue on Modal Pipeline: Image → FLAME Tracking → LAM Inference → Blender GLB → ZIP https://claude.ai/code/session_01XXVR6KsYFAQiJjHvdzCzoK
Major update to concierge_modal.py: - Custom video upload: VHAP FLAME tracking extracts per-frame expression/pose parameters from user's own motion video - Video preprocessing pipeline: frame extraction, face detection (VGGHead), background matting, landmark detection per frame - VHAP GlobalTracker integration for multi-frame optimization - Export to NeRF dataset format (transforms.json + flame_param/*.npz) - Gradio UI: motion source selector (custom video or sample) - Preview video with optional audio from source video - Max 300 frames (10s@30fps) cap for manageable processing This enables generating high-quality concierge.zip with custom expressions/movements instead of being limited to pre-set samples. https://claude.ai/code/session_01XXVR6KsYFAQiJjHvdzCzoK
- Replace add_local_dir("./assets") with HuggingFace downloads for all
required model assets (FLAME tracking, parametric models, LAM assets)
- Remove REQUIRED_ASSET local check since assets are fetched at build time
- Build VHAP config programmatically instead of loading from YAML file
- Remove deprecated allow_concurrent_inputs parameter
- Add flame_vhap symlink for VHAP tracking compatibility
- Add critical file verification in _download_models()
Fixes FileNotFoundError: flame2023.pkl not found in container
https://claude.ai/code/session_01XXVR6KsYFAQiJjHvdzCzoK
Replace container-build-time HuggingFace downloads with add_local_dir to mount model files from the user's local LAM repo. This is faster and avoids dependency on HuggingFace availability. - Add _has_model_zoo / _has_assets detection at module level - Mount ./model_zoo and ./assets via add_local_dir (conditional) - Add _setup_paths() to bridge directory layout differences: - assets/human_parametric_models → model_zoo/human_parametric_models - flame_assets/flame2023.pkl → flame_assets/flame/ (flat layout) - flame_vhap symlink for VHAP tracker - Add model file verification with find-based search https://claude.ai/code/session_01XXVR6KsYFAQiJjHvdzCzoK
Modal requires add_local_dir to be the last image build step. Move _setup_model_paths() from run_function (build time) to _init_lam_pipeline() (container startup) to comply with this. https://claude.ai/code/session_01XXVR6KsYFAQiJjHvdzCzoK
User keeps all models under assets/ (not model_zoo/). Instead of symlinking individual subdirectories, symlink the entire model_zoo -> assets when model_zoo doesn't exist. This bridges lam_models, flame_tracking_models, and human_parametric_models all at once. Also adds model.safetensors to the verification checklist. https://claude.ai/code/session_01XXVR6KsYFAQiJjHvdzCzoK
Three files are not available locally and must be downloaded: - model.safetensors (LAM-20K model weights from 3DAIGC/LAM-20K) - template_file.fbx, animation.glb (from Ethan18/test_model LAM_assets.tar) Download runs via run_function BEFORE add_local_dir to satisfy Modal's ordering constraint. Downloads are cached in the image layer. https://claude.ai/code/session_01XXVR6KsYFAQiJjHvdzCzoK
1. Downloaded LAM assets (template_file.fbx, animation.glb) were being overwritten by the add_local_dir mount of assets/. Fix: copy extracted assets into model_zoo/ during build so they survive the mount. Update all path references accordingly. 2. Pin gradio==4.44.0 and gradio_client==1.3.0 to avoid the json_schema_to_python_type TypeError on additionalProperties. https://claude.ai/code/session_01XXVR6KsYFAQiJjHvdzCzoK
1. Switch assets download from Ethan18/test_model (incomplete) to official 3DAIGC/LAM-assets which includes sample_oac/ with template_file.fbx and animation.glb. 2. Monkey-patch gradio_client._json_schema_to_python_type to handle boolean additionalProperties schema (TypeError on bool). https://claude.ai/code/session_01XXVR6KsYFAQiJjHvdzCzoK
Instead of silently falling back to GitHub-built CUDA extensions (which produce "bird monster" artifacts), raise an error at build time if the wheels/ directory is empty. https://claude.ai/code/session_0168ZsGdzsYF7SxETMDaKgt5
…e builds Replace all GitHub source builds of diff-gaussian-rasterization, simple-knn, and nvdiffrast with the official pre-built wheels from the 3DAIGC/LAM HuggingFace Space. This eliminates the need for local wheel files and ensures bit-identical CUDA extensions to the official ModelScope demo. Removes: wheels/ directory, local wheels override logic, GitHub clone+build steps https://claude.ai/code/session_0168ZsGdzsYF7SxETMDaKgt5
…ub source builds" This reverts commit ba006bb.
GitHub source builds of diff_gaussian_rasterization, simple_knn, nvdiffrast were redundant (overwritten by ModelScope wheels) and dangerous (caused "bird monster" artifacts when wheels were missing). Removed from both app_modal.py and Dockerfile. Only the local ModelScope official wheels in wheels/ are now used. https://claude.ai/code/session_0168ZsGdzsYF7SxETMDaKgt5
…tch.py Matches official ModelScope app.py line 386. Was applied to concierge_modal.py (F1) but missed in lam_avatar_batch.py. https://claude.ai/code/session_0168ZsGdzsYF7SxETMDaKgt5
Remove URL downloads for pytorch3d and fbx. All 4 wheels (pytorch3d, diff_gaussian_rasterization, simple_knn, fbx) are now installed only from the local wheels/ directory. No URL fallback. https://claude.ai/code/session_0168ZsGdzsYF7SxETMDaKgt5
…g local wheels The local wheels/ directory only contains .gitkeep (no actual .whl files), causing the build to abort. Restore the working approach of downloading pytorch3d, diff_gaussian_rasterization, simple_knn, nvdiffrast, and fbx wheels directly from HuggingFace/official URLs during image build. https://claude.ai/code/session_0168ZsGdzsYF7SxETMDaKgt5
…requiring local wheels" This reverts commit 55d2fdd.
The relative path ./wheels was resolved against the CWD, not the script location. Use Path(__file__).resolve().parent / "wheels" to reliably find the wheels directory regardless of where modal run is invoked from. Also log discovered .whl files for debugging. https://claude.ai/code/session_0168ZsGdzsYF7SxETMDaKgt5
Windows Path() with backslashes can cause issues in some contexts. Using forward slashes which Python handles correctly on all platforms. https://claude.ai/code/session_0168ZsGdzsYF7SxETMDaKgt5
The wheel files exist on the local Windows machine but not inside the Modal container. Guard the check with MODAL_IS_REMOTE env var so it only runs during `modal deploy` locally. https://claude.ai/code/session_0168ZsGdzsYF7SxETMDaKgt5
nvdiffrast was never installed in the image definition, causing ModuleNotFoundError when _precompile_nvdiffrast() ran during image build. Added pip install from NVlabs GitHub before the precompile step. https://claude.ai/code/session_0168ZsGdzsYF7SxETMDaKgt5
… official app.py Official ModelScope app.py installs nvdiffrast from ./external/nvdiffrast/ which is bundled in the LAM repo (already cloned to /root/LAM). https://claude.ai/code/session_0168ZsGdzsYF7SxETMDaKgt5
aigc3d/LAM repo does NOT contain external/nvdiffrast/ directory - that only exists in the ModelScope Space bundle. The LAM requirements.txt specifies the ShenhanQian fork with backface-culling branch. https://claude.ai/code/session_0168ZsGdzsYF7SxETMDaKgt5
…l app.py Official app.py does: pip install ./external/nvdiffrast/ Since aigc3d/LAM clone doesn't include it, we clone NVlabs/nvdiffrast into the same path first, then install from there. https://claude.ai/code/session_0168ZsGdzsYF7SxETMDaKgt5
Use the same vendored nvdiffrast that the official ModelScope Space bundles, sourced from mirai-gpro/LAM_gpro lam-large-upload branch. https://claude.ai/code/session_0168ZsGdzsYF7SxETMDaKgt5
…ecated decorator - add_local_dir without copy=True prevents subsequent run_commands (Modal requires add_local_* to be last, or use copy=True to embed files into the image) - Rename @modal.web_endpoint to @modal.fastapi_endpoint (deprecated since 2025-03) https://claude.ai/code/session_0168ZsGdzsYF7SxETMDaKgt5
pytorch3d requires iopath for load_obj. Without it, the import falls through to a local utils.pytorch3d_load_obj fallback that fails with "No module named 'utils'" due to missing sys.path entry. https://claude.ai/code/session_0168ZsGdzsYF7SxETMDaKgt5
Root cause of "bird monster" artifacts identified: - xformers was installed (official ModelScope UNINSTALLS it) DINOv2 uses different attention paths with/without xformers - nvdiffrast was built from GitHub source (official uses pre-built wheel) - numpy was 1.26.4 (official uses 1.23.0) Changes: - Remove xformers installation, add explicit uninstall after wheels - Remove nvdiffrast GitHub clone/build, use wheel instead - Install all 5 wheels individually with --force-reinstall (matching official order) - Change numpy from 1.26.4 to 1.23.0 - Add postmortem document https://claude.ai/code/session_0168ZsGdzsYF7SxETMDaKgt5
Claude Code repeatedly replaces ModelScope official wheels with GitHub
source builds due to general knowledge bias ("GitHub is authoritative").
This has happened 10+ times across sessions, each time causing the
"bird monster" avatar corruption.
CLAUDE.md is auto-read by Claude Code at session start, enforcing:
- Never replace wheels with GitHub source builds
- Never install xformers
- Never change numpy version
- Always read project docs before modifying code
https://claude.ai/code/session_0168ZsGdzsYF7SxETMDaKgt5
…onship Clarify that lam_avatar_batch.py imports app_modal.py, and that modal run triggers the full image build + inference pipeline. https://claude.ai/code/session_0168ZsGdzsYF7SxETMDaKgt5
The --force-reinstall wheels can pull in numpy 2.x as a dependency, which crashes cpu_nms.so (compiled against numpy 1.x) at runtime with: "numpy.core.multiarray failed to import" https://claude.ai/code/session_0168ZsGdzsYF7SxETMDaKgt5
…eqs calls head_utils.prepare_motion_seqs() does not accept max_squen_length. Passing it causes TypeError at runtime. https://claude.ai/code/session_0168ZsGdzsYF7SxETMDaKgt5
The official ModelScope head_utils.py has max_squen_length param but the GitHub LAM repo version does not. Add a sed patch during container build to match the official ModelScope function signature. https://claude.ai/code/session_0168ZsGdzsYF7SxETMDaKgt5
BREAKING: The container was using `git clone https://github.com/aigc3d/LAM.git` which is a DIFFERENT codebase from the official ModelScope version. Key differences found: - head_utils.py: 421 lines (GitHub) vs 633 lines (ModelScope) - Missing max_squen_length parameter in GitHub version - Different model paths (model_zoo/ vs pretrained_models/) - Missing render_flame_mesh_gaga19() in GitHub version Now uses lam_modelscope/ directory extracted from lam-large-upload branch (the official ModelScope LAM_Large_Avatar_Model/ source). This eliminates the need for the head_utils.py sed patch. https://claude.ai/code/session_0168ZsGdzsYF7SxETMDaKgt5
…oded wheels path
- Replace add_local_dir("lam_modelscope") with add_local_dir("LAM_Large_Avatar_Model")
- Remove hardcoded C:/Users/hamad/LAM/wheels path and separate wheels mount
- Wheels now served from /root/LAM/wheels/ (included in LAM_Large_Avatar_Model/)
- Matches official ModelScope app.py structure: code + wheels in same directory
https://claude.ai/code/session_0168ZsGdzsYF7SxETMDaKgt5
…tar_Model
Wheels are at C:\Users\hamad\LAM\wheels\, not inside LAM_Large_Avatar_Model/.
Use add_local_dir("wheels") to mount them to /tmp/modelscope_wheels/ in the
container, matching the relative path from the modal run working directory.
https://claude.ai/code/session_0168ZsGdzsYF7SxETMDaKgt5
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.