Claude/platform design docs o e vkm #99

Open
mirai-gpro wants to merge 79 commits into aigc3d:master from mirai-gpro:claude/platform-design-docs-oEVkm

Conversation

@mirai-gpro

No description provided.

Test scripts to verify A2E (Audio2Expression) lip sync quality
with Japanese audio input, before investing in ZIP motion replacement
or VHAP Japanese FLAME params.

Includes:
- generate_test_audio.py: EdgeTTS Japanese/English/Chinese audio samples
- test_a2e_cpu.py: A2E model loading, Wav2Vec2 feature extraction, ZIP validation
- save_a2e_output.py: Capture A2E 52-dim ARKit blendshape output
- analyze_blendshapes.py: Lip sync quality scoring and language comparison
- setup_oac_env.py: Auto-detect known OpenAvatarChat issues (CPU mode, deps, config)
- chat_with_lam_jp.yaml: Corrected config (Gemini API + EdgeTTS ja-JP-NanamiNeural)
- run_all_tests.py: Master test runner
- TEST_PROCEDURE.md: Step-by-step test procedure

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Fix RuntimeError: Input data type <class 'list'> is not supported.
- diagnose_onnx_error.py: Tests SileroVAD ONNX, SenseVoice, data flow
- patch_vad_handler.py: Fixes timestamp[0] NoneType bug, adds defensive
  numpy type checking on ONNX inputs, handles 2/3-output model variants
- setup_oac_env.py: Adds VAD handler bug detection (check 7/7)

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Simple test script that verifies environment, model files,
data_bundle.py fix, Wav2Vec2 loading, and A2E module import.

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Gemini's OpenAI-compatible API sometimes returns delta.content as dict/list
instead of string, causing TypeError in set_main_data(). This patch script
detects and safely converts non-string content before passing to data_bundle.

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
gemini-2.0-flash returns 404 "no longer available to new users".
The error dict then cascades into the set_main_data TypeError.

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
SenseVoice auto-detection defaults to Chinese (<|zh|>), causing
Japanese speech to be misrecognized as Chinese text. This patch
forces language="ja" in the generate() call.

- patch_asr_language.py: Auto-patches asr_handler_sensevoice.py
- chat_with_lam_jp.yaml: Added language: "ja" to SenseVoice config
- TEST_PROCEDURE.md: Added Step 4.5 for patch application

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Instead of creating a separate config file, this script patches
the existing working config/chat_with_lam.yaml with 3 changes:
1. TTS voice → ja-JP-NanamiNeural
2. LLM system_prompt → Japanese
3. ASR language → ja

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Root cause analysis from production logs:
- 1st ASR call: rtf=0.629 (1.25s) - OK
- 2nd ASR call: rtf=15.027 (29.83s) - GPU memory exhausted, CPU fallback
- fastrtc 60s timeout triggers, resets frame pipeline → system unresponsive

Fix: Add torch.cuda.empty_cache() + gc.collect() after each SenseVoice
and LAM inference to free GPU memory between calls. Also adds startup
wrapper with PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True.
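The cleanup step described above can be sketched as a small helper (a sketch, assuming PyTorch; the guarded import lets it run on machines without torch or a GPU):

```python
import gc

try:
    import torch
    _HAS_CUDA = torch.cuda.is_available()
except ImportError:  # lets the sketch run without PyTorch installed
    torch, _HAS_CUDA = None, False

def free_gpu_memory():
    """Drop Python-side references, then return cached CUDA blocks.

    The fix inserts these two calls after each SenseVoice and LAM
    inference; empty_cache() only releases memory that is no longer
    referenced, hence the gc.collect() first.
    """
    gc.collect()
    if _HAS_CUDA:
        torch.cuda.empty_cache()
```

The PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True setting mentioned above is an environment variable set by the startup wrapper, not something done in code.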

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Create the missing Audio2Expression inference service that bridges
gourmet-support backend (which already has A2E hooks in /api/tts/synthesize)
with the actual Wav2Vec2 + LAM A2E decoder pipeline.

Services:
- audio2exp-service: Flask API accepting MP3 audio, returning 52-dim
  ARKit blendshape coefficients at 30fps. Includes Wav2Vec2 feature
  extraction and fallback mode when A2E decoder is unavailable.
- Frontend ExpressionManager: Maps A2E blendshapes to GVRM bone system,
  syncing with audio playback via currentTime.

Architecture: TTS → MP3 → audio2exp-service → 52-dim blendshapes → frontend

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
The a2e_engine now searches multiple patterns for the checkpoint:
- models/LAM_audio2exp_streaming.tar (flat, user's actual layout)
- models/LAM_audio2exp/pretrained_models/*.tar (OpenAvatarChat layout)
- models/LAM_audio2exp/*.tar (intermediate layout)
Falls back to rglob search if none match.
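The search order can be sketched with pathlib (only the three patterns are taken from the commit message; the function name is illustrative):

```python
from pathlib import Path
from typing import Optional

# Candidate layouts in priority order, as listed above.
CHECKPOINT_PATTERNS = [
    "models/LAM_audio2exp_streaming.tar",            # flat layout
    "models/LAM_audio2exp/pretrained_models/*.tar",  # OpenAvatarChat layout
    "models/LAM_audio2exp/*.tar",                    # intermediate layout
]

def find_checkpoint(root: Path) -> Optional[Path]:
    """Try each known layout, then fall back to a recursive search."""
    for pattern in CHECKPOINT_PATTERNS:
        matches = sorted(root.glob(pattern))
        if matches:
            return matches[0]
    return next(root.rglob("*.tar"), None)  # last resort: any .tar under root
```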

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Full drop-in replacement for gourmet-sp's concierge-controller.ts with
Audio2Expression integration applied. Key changes marked with ★ comments:
- ExpressionManager import and initialization
- session_id added to /api/tts/synthesize requests
- A2E expression data used for lip sync when available
- FFT-based lip sync preserved as fallback
- Proper cleanup in stopAvatarAnimation() and dispose()

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Replaces the scaffold version with the real concierge-controller.ts from
gourmet-sp (claude/test-concierge-modal-rewGs branch). A2E integration is
already built-in via applyExpressionFromTts() + lamAvatarController.

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
uvicorn is an ASGI server (for FastAPI/Starlette) and cannot serve a Flask
(WSGI) app. This caused the Cloud Run container to fail to start listening
on the port, resulting in a deployment timeout.

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Covers all components: backend (gourmet-support), frontend (gourmet-sp),
audio2exp-service, A2E frontend patches, official HF Spaces ZIP generation
procedure, test suite, deployment config, and end-to-end data flow diagrams.

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
The audio2exp-service returns frames as arrays of numbers (number[][]),
but applyExpressionFromTts expected objects with a .weights property
({weights: number[]}[]), causing TypeError and empty frame buffer.

Changed f.weights[i] to frameData[i] to match the actual backend format.

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
…AvatarController)

The previous implementation used window.lamAvatarController which doesn't
exist in this codebase, causing lip sync to completely fail (buffer=0,
jaw=0, mouth=0). Additionally, the data format was wrong (f.weights[i]
vs the actual number[][] response).

Now uses ExpressionManager (vrm-expression-manager.ts) which:
- Correctly handles the number[][] frame format from audio2exp-service
- Syncs to audioElement.currentTime for accurate lip sync timing
- Maps ARKit blendshapes (jawOpen, mouthFunnel, etc.) to GVRM bone system
- Calls renderer.updateLipSync() directly

Changes:
- Import ExpressionManager and initialize in init()
- Replace lamAvatarController dependency with ExpressionManager
- Add expressionManager.stop() in stopAvatarAnimation()
- All 5 call sites (speakTextGCP, speakResponseInChunks x2, shop TTS x2)
  now correctly drive lip sync through ExpressionManager

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
The import '../avatar/vrm-expression-manager' caused a Vite build error
because that file doesn't exist in gourmet-sp's src/scripts/avatar/.

Solution: inline the ExpressionManager class directly into
concierge-controller.ts. This eliminates the need to copy a separate
file into gourmet-sp and avoids import resolution issues.

The ARKIT_INDEX map is trimmed to only the 7 mouth-related blendshapes
actually used for lip sync (jawOpen, mouthFunnel, mouthPucker, etc.).

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Root cause: this.guavaRenderer doesn't exist on CoreController.
LAMAvatar.astro has its own animation loop with buffer/ttsActive state.
The ExpressionManager approach was completely wrong architecture.

Correct approach: use window.lamAvatarController exposed by LAMAvatar.astro
- setExternalTtsPlayer(): links ttsPlayer so LAMAvatar can track playback
- queueExpressionFrames(): feeds A2E frames into LAMAvatar's buffer
- clearFrameBuffer(): clears buffer on stop/new segment

Changes:
- Remove inlined ExpressionManager class (120 lines of dead code)
- Restore lamAvatarController.setExternalTtsPlayer() with retry (500ms x 20)
- applyExpressionFromTts: convert number[][] → {name: value}[] and queue
- stopAvatarAnimation: call clearFrameBuffer() to close mouth

Console should now show:
- "[Concierge] ✅ Linked ttsPlayer with LAMAvatar controller"
- "[Concierge] A2E: N frames queued @ 30fps"
- LAM Health: buffer>0, ttsActive=true during speech

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
… code

Read the ACTUAL LAMAvatar.astro, lam-websocket-manager.ts, and
audio-sync-player.ts from gourmet-sp to understand the real architecture.

Key findings:
- LAMAvatar.getExpressionData() is called at 60fps by renderer
- It reads frameBuffer[floor(ttsPlayer.currentTime * frameRate)]
- Requires: externalTtsPlayer linked, frameBuffer filled, ttsActive=true
- ttsActive is set by play event (requires setExternalTtsPlayer first)

4 chains must ALL work for lip sync:
  Chain1: Backend must return expression data (needs AUDIO2EXP_SERVICE_URL)
  Chain2: setExternalTtsPlayer must link ttsPlayer with LAMAvatar
  Chain3: applyExpressionFromTts must convert & queue frames
  Chain4: LAMAvatar renders from frameBuffer synced to currentTime

Added diagnostic logs at each chain point:
  [A2E Chain1] expression received or null (backend config issue)
  [A2E Chain2] setExternalTtsPlayer success or LAMAvatar not found
  [A2E Chain3] frames queued with jawOpen sample value

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
…meBuffer, support both frame formats

Compared with the ORIGINAL gourmet-sp concierge-controller.ts (from
claude/test-concierge-modal-rewGs branch) and found 2 bugs:

1. stopAvatarAnimation() called clearFrameBuffer() which resets
   fadeOutStartTime=null, breaking LAMAvatar's graceful 200ms fade-out.
   The ORIGINAL code trusts LAMAvatar's own ended event handler.
   → Removed clearFrameBuffer() from stopAvatarAnimation()

2. Frame data format mismatch:
   - Original gourmet-sp: f.weights[i] (expects {weights: number[]}[])
   - audio2exp-service: number[][] (raw arrays)
   → Now supports BOTH formats: Array.isArray(f) ? f : f.weights

Key fact: before A2E changes, lip sync was working via the renderer's
built-in FFT analysis. The A2E code path was dead code (AUDIO2EXP_SERVICE_URL
not set). These changes ensure A2E is a pure overlay that doesn't break
the existing FFT lip sync.

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Root cause: When AUDIO2EXP_SERVICE_URL is set, the backend returns
expression data. The original code's applyExpressionFromTts used
f.weights[i] on raw number[] arrays, causing TypeError → caught by
outer try/catch → isAISpeaking=false → STT worked (lucky bug).

My both-format fix removed this error, so audio playback proceeds.
But if the browser blocks autoplay (fires play then immediate pause),
onended never fires → playPromise never resolves → initializeSession
hangs → buttons never enabled → STT completely broken.

Fix: Add onpause deadlock prevention to ALL 8 play-and-wait patterns,
matching the existing pattern in ack playback (line 588):
  this.ttsPlayer.onpause = () => {
    if (this.ttsPlayer.currentTime < 0.1) done();
  };

This detects "play then immediate pause" (autoplay block) and resolves
the promise, preventing deadlock. Normal mid-playback pauses (currentTime
> 0.1) are not affected.

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Minimize the diff against the original gourmet-sp concierge-controller.ts.
The only substantive change is the applyExpressionFromTts method:
- Frame format: f.weights[i] → Array.isArray(f) ? f : (f.weights || [])
  (supports the number[][] format from audio2exp-service)
- Errors are handled as non-fatal via try/catch
- All other methods (speakTextGCP, STT, sendMessage, etc.) are identical to the original

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
…ration

Previous patches removed all GVRM renderer integration (import, guavaRenderer,
setupAudioAnalysis, startLipSyncLoop) and replaced with non-existent
window.lamAvatarController calls, causing all A2E data to be silently dropped
and lip sync to degrade to basic jaw flapping.

This rewrite is based on the actual production concierge-controller.ts with
minimal A2E additions:
- Restore GVRM import, guavaRenderer, setupAudioAnalysis, startLipSyncLoop
- Add a2eFrames/a2eFrameRate/a2eNames properties for expression storage
- Add setA2EFrames() to store expression data from TTS response
- Add computeMouthOpenness() to convert 52-dim ARKit blendshapes to scalar
- Modify startLipSyncLoop() to use A2E frames when available, FFT as fallback
- Override speakTextGCP() with inline fetch to include session_id
- Add session_id to ALL TTS requests (ack, chunks, shop flow)

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
…t GVRM)

Root cause: The patch was based on gourmet-support's concierge-controller.ts
which uses GVRM renderer, but the actual deployed frontend (gourmet-sp) uses
LAMAvatar.astro with a completely different rendering pipeline.

Previous patch problems:
- Added GVRM import/renderer that doesn't exist in gourmet-sp
- Missing linkTtsPlayer() - LAMAvatar never received ttsPlayer reference
  -> ttsActive=false, buffer=0, lip sync completely dead
- Added setupAudioAnalysis()/startLipSyncLoop() for FFT - unnecessary with LAMAvatar
- Called clearFrameBuffer() in stopAvatarAnimation() - breaks LAMAvatar fade-out

Fix: Use the exact gourmet-sp version which correctly:
- Links ttsPlayer to LAMAvatar via setExternalTtsPlayer() in init()
- Sends A2E frames via applyExpressionFromTts() -> lamAvatarController.queueExpressionFrames()
- Lets LAMAvatar handle all lip sync rendering internally
- Does NOT call clearFrameBuffer() in stopAvatarAnimation()

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
…rpolate frames

Changes to applyExpressionFromTts():
1. Mouth blendshape amplification: Scale jawOpen (1.4x), mouthFunnel/Pucker (1.5x),
   mouthSmile (1.3x), mouthStretch (1.2x) etc. for more visible Japanese vowel
   distinctions (あ/い/う/え/お)
2. Frame interpolation: 30fps→60fps via linear interpolation between consecutive
   frames, matching the renderer's ~60fps render loop for smoother animation
3. Diagnostic logging: jawOpen/mouthFunnel/mouthSmile max/avg values logged per
   expression segment for live quality monitoring
4. LinkTtsPlayer retry: Multiple retry attempts (500ms, 1s, 2s, 4s) with logging
   to reliably connect ttsPlayer to LAMAvatar even with async initialization

Quality context: A2E streaming model (wav2vec2-base-960h, no transformer) produces
subtle Japanese phoneme variations. Frontend amplification makes these visible.
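A Python sketch of points 1 and 2 above (the gain table and blendshape names are illustrative; real ARKit splits smile/stretch into Left/Right variants):

```python
# Illustrative per-channel gains based on the factors quoted above.
MOUTH_GAINS = {"jawOpen": 1.4, "mouthFunnel": 1.5, "mouthPucker": 1.5,
               "mouthSmile": 1.3, "mouthStretch": 1.2}

def amplify(frame, names):
    """Scale mouth blendshapes by their gain, clamped to [0, 1]."""
    return [min(1.0, v * MOUTH_GAINS.get(n, 1.0)) for v, n in zip(frame, names)]

def interpolate_2x(frames):
    """30fps -> 60fps: insert the linear midpoint between consecutive frames."""
    out = []
    for a, b in zip(frames, frames[1:]):
        out.append(a)
        out.append([(x + y) / 2 for x, y in zip(a, b)])
    out.append(frames[-1])  # last frame has no successor to blend with
    return out
```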

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
… objects)

The user rewrote audio2exp-service with a2e_engine.py (Flask) which returns
frames as plain arrays [[0.1, ...], ...] instead of the old FastAPI format
[{"weights": [0.1, ...]}, ...].

Frontend now detects both formats: Array.isArray(f) ? f : f.weights

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Step 1: Add __testLipSync() diagnostic to concierge-controller.ts patch
  - Generates 5 Japanese vowel patterns (あいうえお) with known ARKit values
  - Creates silent WAV audio, queues frames to LAMAvatar, plays through ttsPlayer
  - Verifies whether renderer supports full 52-dim blendshapes

Step 3: Fix a2e_engine.py to use the proper LAM INFER pipeline
  - Restore LAM_Audio2Expression module (engines, models, utils, configs)
  - Rewrite _load_a2e_decoder → _try_load_infer_pipeline using INFER.build()
  - Use infer_streaming_audio() with context for chunked processing
  - Includes full postprocessing: smooth_mouth, frame_blending, savitzky_golay,
    symmetrize, eye_blinks
  - Falls back to Wav2Vec2 energy-based approximation when INFER unavailable
  - Add librosa, scipy, addict to requirements.txt
  - Add libsndfile to Dockerfile

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Three issues fixed during local testing:
1. transformers v5.x requires ignore_mismatched_sizes=True and
   attn_implementation="eager" for Wav2Vec2Model.from_pretrained()
2. HuggingFace checkpoint is double-wrapped (tar.gz containing
   pretrained_models/lam_audio2exp_streaming.tar) - auto-extract
3. Bare except in infer.py swallowed tracebacks and crashed on
   uninitialized output_dict - now logs actual error and recovers

Result: audio2exp-service starts with mode="infer" and produces
52-dim ARKit blendshapes from audio input.

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Exclude downloaded model weights (wav2vec2, LAM checkpoint ~1.1GB)
from version control.

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Flask's app.run() auto-loads .env files, which crashes with
UnicodeDecodeError if a non-UTF-8 .env exists in the path.
Pass load_dotenv=False since env vars are set externally.
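A minimal sketch of the fix (the route is hypothetical; the relevant part is the load_dotenv=False argument):

```python
from flask import Flask

app = Flask(__name__)

@app.route("/healthz")
def healthz():
    return "ok"  # hypothetical endpoint so the sketch is a complete app

if __name__ == "__main__":
    # load_dotenv=False skips Flask's automatic .env discovery, which
    # raises UnicodeDecodeError when a non-UTF-8 .env sits on the path.
    app.run(host="0.0.0.0", port=8080, load_dotenv=False)
```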

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
claude and others added 30 commits February 25, 2026 08:11
A2E model output characteristics:
- jawOpen: very weak (avg ~0.05) → 1.8x to prevent mumbling
- mouthLowerDown: very strong (raw ~0.84) → 0.45x to prevent jaw pull
- All other channels: 1.0 (neutral baseline)

https://claude.ai/code/session_01TLsuhpFcQYgv6ijcrhKLfJ
Previous tuning (jawOpen 1.8x, mouthLowerDown 0.45x) caused
lipsync to appear completely stopped. Reverting to 1.0 baseline
to restore working state before re-tuning.

https://claude.ai/code/session_01TLsuhpFcQYgv6ijcrhKLfJ
…ic logs

Root cause: INFER.build() hangs indefinitely on Cloud Run CPU,
blocking the engine from ever becoming ready. All /api/audio2expression
requests return 503, so TTS responses have no expression data → no lipsync.

Changes:
1. a2e_engine.py: wrap _try_load_infer_pipeline() in a timeout thread
   (INFER_LOAD_TIMEOUT env var, default 600s). On timeout, fall back to
   Wav2Vec2 mode which provides approximate lipsync immediately.
2. a2e_engine.py: add timing logs at each step (import, config parse,
   INFER.build, model.to) to pinpoint the bottleneck.
3. Dockerfile: pre-extract model tar archive at build time, saving
   ~7 minutes of runtime extraction on every cold start.
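The timeout wrapper in point 1 can be sketched with a daemon thread (a sketch; the real code runs INFER.build() inside the loader callable):

```python
import threading

def load_with_timeout(loader, timeout_s):
    """Run a slow loader in a daemon thread; give up after timeout_s seconds.

    Returns (result, timed_out). On timeout the caller falls back to
    Wav2Vec2 mode; the daemon thread is left to finish (or not) on its own.
    """
    box = {}

    def _run():
        try:
            box["result"] = loader()
        except Exception as exc:  # surface loader failures to the caller
            box["error"] = exc

    t = threading.Thread(target=_run, daemon=True)
    t.start()
    t.join(timeout_s)
    if t.is_alive():
        return None, True  # timed out -> use the Wav2Vec2 fallback
    if "error" in box:
        raise box["error"]
    return box["result"], False
```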

https://claude.ai/code/session_01TLsuhpFcQYgv6ijcrhKLfJ
INFER.build() can complete on CPU given enough time (previous instance
logs showed successful weight loading). Default 600s was too short.

With tar pre-extraction saving 7 min, INFER.build() needs ~10-15 min.
1200s (20 min) provides sufficient margin. Wav2Vec2 fallback remains
as safety net but should not normally be needed.

https://claude.ai/code/session_01TLsuhpFcQYgv6ijcrhKLfJ
Previous successful deployment used ENGINE_LOAD_TIMEOUT=1500.
Match the INFER_LOAD_TIMEOUT default to the same proven value.

https://claude.ai/code/session_01TLsuhpFcQYgv6ijcrhKLfJ
INFER model outputs jawOpen in 0.13-0.32 range, causing mumbling appearance.
Scale all blendshapes by 1.8x (clamped to 0-1) to improve mouth visibility.
Tunable via EXPRESSION_SCALE env var without redeploying.
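A sketch of the env-var-tunable scaling (function name illustrative):

```python
import os

# Default matches the commit; override via EXPRESSION_SCALE without redeploying.
EXPRESSION_SCALE = float(os.environ.get("EXPRESSION_SCALE", "1.8"))

def scale_frame(frame, scale=EXPRESSION_SCALE):
    """Scale every blendshape coefficient, clamped to the valid [0, 1] range."""
    return [min(1.0, max(0.0, v * scale)) for v in frame]
```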

https://claude.ai/code/session_01TLsuhpFcQYgv6ijcrhKLfJ
…ode architecture

Design document covering the plan to evolve the gourmet-support system
into a reusable platform supporting multiple AI application modes
(gourmet concierge, customer support, interview) with Gemini Live API
integration, while preserving existing endpoints for alpha testing.

https://claude.ai/code/session_01TLsuhpFcQYgv6ijcrhKLfJ
Root cause: A2E fallback mode outputs noisy per-frame blendshape values
(e.g., jaw oscillating 0.5→0.09→0.27 between frames) which are applied
directly to the 3D avatar without any frame-to-frame smoothing, causing
visible choppy vibration.

Frontend fix (LAMAvatar.astro):
- Add exponential moving average (EMA) with alpha=0.35 to getExpressionData()
- Each frame blends smoothly with the previous: smoothed = prev + 0.35*(target-prev)
- At 60fps this gives ~95% convergence in ~130ms — smooth yet responsive
- Reset EMA state on buffer clear and expression reset

Backend fix (a2e_engine.py):
- Upgrade fallback smoothing from 3-frame uniform to 2-pass filter:
  Pass 1: 5-frame Gaussian-like kernel [0.06, 0.24, 0.40, 0.24, 0.06]
  Pass 2: 3-frame uniform for additional smoothness
- Approximates the INFER pipeline's savitzky_golay post-processing
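Both fixes can be sketched in Python (the frontend EMA is ported from TypeScript here for illustration; the kernels are as quoted above):

```python
def ema_smooth(prev, target, alpha=0.35):
    """Per-frame exponential moving average, as added to getExpressionData()."""
    return [p + alpha * (t - p) for p, t in zip(prev, target)]

def smooth_frames(frames):
    """Two-pass temporal filter over per-frame blendshape rows.

    Pass 1: 5-tap Gaussian-like kernel; Pass 2: 3-tap uniform average.
    Boundary frames are handled by clamping indices.
    """
    def convolve(rows, kernel):
        half = len(kernel) // 2
        n = len(rows)
        out = []
        for i in range(n):
            acc = [0.0] * len(rows[0])
            for k, w in enumerate(kernel):
                j = min(max(i + k - half, 0), n - 1)  # clamp at the edges
                acc = [a + w * v for a, v in zip(acc, rows[j])]
            out.append(acc)
        return out

    frames = convolve(frames, [0.06, 0.24, 0.40, 0.24, 0.06])
    return convolve(frames, [1 / 3, 1 / 3, 1 / 3])
```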

https://claude.ai/code/session_01TLsuhpFcQYgv6ijcrhKLfJ
Revert all changes from the 7 Claude commits and fully restore the
state of e36190d, the last revision to pass the health check.

Reverted commits:
- 770dfd1 INFER load timeout + Wav2Vec2 fallback
- 84902f6 INFER_LOAD_TIMEOUT 1200s
- 38e9f24 INFER_LOAD_TIMEOUT 1500s
- ce103ad conservative expression scaling
- 2964376 EMA temporal smoothing
- d466f6a revert to Streaming model
- 36bf69b switch to Non-Streaming full model

https://claude.ai/code/session_01TLsuhpFcQYgv6ijcrhKLfJ
Revert the changes from 2 Claude commits:
- bae5578 reset all parameters to neutral baseline
- 2964376 EMA temporal smoothing

https://claude.ai/code/session_01TLsuhpFcQYgv6ijcrhKLfJ
Given the reliability problems in the previously AI-written PLATFORM_DESIGN.md,
this is a brief for having a different AI or engineer redo the design from scratch.

Contents:
- Clear separation of confirmed facts and unverified items in the current architecture
- Reliability assessment of each section of the previous design document
- Requirements for platformization and Live API integration
- List of repositories, papers, and OSS projects to consult

https://claude.ai/code/session_01TLsuhpFcQYgv6ijcrhKLfJ
- §0: Purpose of platformization and the immediate goal (why we are doing this)
- §0.4: How short-term memory (stt_stream.py) and long-term memory (gourmet-support)
  came to be developed separately, with explicit requirements for a unified spec
- §2.3: Rationale for adopting the Live API (solving latency, interruption, and backchannel issues)
- §2.3: Details of the FLASH variant's cumulative character limit and the workaround logic (with code line numbers)
- §2.3: Explanation of the Live/REST hybrid approach
- §4: Added "unified memory design" to the required sections of the design document

https://claude.ai/code/session_01TLsuhpFcQYgv6ijcrhKLfJ
Based on DESIGN_REQUEST.md, created two new documents grounded in reading the actual code:

- PLATFORM_REQUIREMENTS.md (requirements definition)
  - Current-state analysis (confirmed facts for audio2exp-service, stt_stream.py, frontend-patches)
  - Functional requirements (multi-mode, Live API integration, memory, audio processing, avatar)
  - Non-functional requirements (performance, availability, extensibility, device support)
  - Technical constraints, open questions, risk assessment

- PLATFORM_ARCHITECTURE.md (design document)
  - Overall architecture (Gateway Layer + service layer)
  - Data flow (both the REST API path and the Live API path spelled out)
  - Live API integration design (LiveRelay, ReconnectManager, SpeechDetector)
  - Unified memory design (SessionMemory + LongTermMemory)
  - API design (new /api/v2/ endpoints plus existing-endpoint compatibility)
  - iPhone SE support strategy (criteria for choosing approach A vs. B)
  - Development roadmap (Phase 0-3)

Confirmed / inferred / unverified status is marked throughout.

https://claude.ai/code/session_01E9rf3QsqK1jCcMpd5RR9f1
Incorporate the 4-language support (ja/en/ko/zh) already implemented in
gourmet-sp / gourmet-support into the platform design:

PLATFORM_REQUIREMENTS.md:
- §4.7 Multilingual support: current implementation status organized as confirmed vs. inferred
  - Frontend: t(), LANGUAGE_CODE_MAP, splitIntoSentences()
  - Backend: language parameter, TTS language selection
  - Live API: Japanese-only constraint (ja-JP hardcoded) noted explicitly
- FR-I18N-01 through 07: added multilingual functional requirements

PLATFORM_ARCHITECTURE.md:
- §7 Multilingual design (new section):
  - LanguageConfig (language master) design
  - SpeechRules (per-language sentence splitting and incomplete-utterance detection) design
  - Dynamic speech_config.language_code setting for the Live API
  - Frontend i18n design (I18n class)
  - language field integrated into Session
- Renumbered subsequent sections to 8-12
- Added supported_languages to the API design

https://claude.ai/code/session_01E9rf3QsqK1jCcMpd5RR9f1
Implement the platform backend for using the Gemini Live API on the web,
based on the requirements and design documents.

Main components:
- server.py: FastAPI entry point (REST + WebSocket)
- live/relay.py: Browser ↔ Gemini Live API WebSocket relay
- live/reconnect.py: automatic reconnection on the cumulative character limit
- live/speech_detector.py: multilingual incomplete-utterance detection (ja/en/ko/zh)
- memory/session_memory.py: 20-turn short-term memory + context summarization
- session/manager.py: session lifecycle management
- services/a2e_client.py: async client for audio2exp-service
- i18n/language_config.py: 4 language profiles
- modes/: plugin architecture (BaseModePlugin, GourmetModePlugin)

Avatar integration: on the Live API path as well, A2E output is sent over the
Expression WebSocket to feed LAMAvatarController's frameBuffer.

https://claude.ai/code/session_01E9rf3QsqK1jCcMpd5RR9f1
- Avoid the name collision with Python's standard-library platform module
- Fix all module import paths: from platform.* → from lam_platform.*
- Fix the uvicorn app path to lam_platform.server:app
- Add a Dockerfile for Cloud Run deployment
- Add .dockerignore

https://claude.ai/code/session_01E9rf3QsqK1jCcMpd5RR9f1
- Rename the package to support_base
- Fix all module import paths to support_base.*
- Fix the Dockerfile COPY paths and CMD to support_base

https://claude.ai/code/session_01E9rf3QsqK1jCcMpd5RR9f1
Consolidate the 3 core gourmet-support files into support_base/core/:
- api_integrations.py: HotPepper/TripAdvisor/Google Places API integration (unchanged)
- long_term_memory.py: Supabase long-term memory management (unchanged)
- support_core.py: SupportSession/SupportAssistant + GCS prompt loading (import path fixes only)

Flask → FastAPI conversion:
- rest/router.py: all REST endpoints converted to a FastAPI APIRouter
  (session/start, chat, finalize, cancel, tts/synthesize, stt/transcribe, stt/stream)
- server.py: REST router mounted via include_router

Plugin improvements:
- gourmet/plugin.py: hardcoded prompt → prefer the GCS/locally loaded prompt

Config additions:
- config/settings.py: PROMPTS_BUCKET_NAME, Google/Supabase API keys
- requirements.txt: added google-cloud-*, supabase, requests

https://claude.ai/code/session_01E9rf3QsqK1jCcMpd5RR9f1
support_core.py:
- Make google.cloud.storage an optional import (starts without it installed)
- Wrap Gemini client initialization in try-except (starts without an API key set)
- Add a GCS availability check to load_prompts_from_gcs()

cloudbuild.yaml:
- GitHub → Cloud Build → Cloud Run automated deployment pipeline
- Use support_base/ as the build context
- SERVICE_NAME/REGION/MEMORY configurable via substitutions

https://claude.ai/code/session_01E9rf3QsqK1jCcMpd5RR9f1
Since this environment cannot push directly to the support-base repo,
provide a script to run on a local PC.

Usage: bash scripts/push_support_base.sh

https://claude.ai/code/session_01E9rf3QsqK1jCcMpd5RR9f1
- Replace rsync with cp -r + find (Windows lacks rsync)
- Replace mktemp -d with mkdir -p (better portability)

https://claude.ai/code/session_01E9rf3QsqK1jCcMpd5RR9f1
Build the frontend foundation for moving the gourmet concierge to the Live API.
Follows the design in PLATFORM_ARCHITECTURE.md §8; the audio code from the
existing patches (with the iPhone 16/17 fixes) is left entirely untouched,
and the modules are split along the lines of the platform design.

Layout:
- live-ws-client.ts: LiveRelay WebSocket client (follows the relay.py protocol)
- audio-io.ts: PCM I/O for the Live API (follows the existing AudioManager pattern)
- dialogue-manager.ts: common interface for switching between REST and Live
- platform-controller.ts: main controller (follows the ConciergeController pattern)
- gourmet-mode.ts: gourmet-mode-specific logic
- expression-manager.ts: re-export of vrm-expression-manager.ts
- index.astro: main page (avatar + chat panel)

https://claude.ai/code/session_01E9rf3QsqK1jCcMpd5RR9f1
Covers all 7 requirements: moving both the concierge and gourmet modes to the
Live API, external prompt storage in GCS, short/long-term memory, search SDK,
multilingual support, multi-device support, and photoreal avatar lip sync.
Implementation status and API specs are based on reading the actual support_base code.

https://claude.ai/code/session_01E9rf3QsqK1jCcMpd5RR9f1
- Corrected the frontend from gourmet-sp to gourmet-sp2 (avatar implemented and tested)
- Added the full module layout and controller hierarchy of gourmet-sp2
- Added detailed specs for the LAMAvatar component and the GS/GVRM renderers
- Added the TTS + A2E sync flow and the lip sync diagnostic test
- Added Vercel deployment setup steps (not yet linked; link planned)
- Added the iOS/Android AudioWorklet implementation differences
- Updated PWA support from "not started" to "implemented"
- Fully updated the chapter 13 implementation status from reading the actual gourmet-sp2 code

https://claude.ai/code/session_01E9rf3QsqK1jCcMpd5RR9f1
Implementation of PLATFORM_SPEC_v2.md for gourmet-sp2 frontend:

New platform modules (src/scripts/platform/):
- live-ws-client.ts: WebSocket client for support_base LiveRelay
- live-audio-io.ts: PCM 16kHz mic + PCM 24kHz playback via AudioContext
- dialogue-manager.ts: REST/Live API unified session & dialogue layer

Modified controllers (src/scripts/chat/):
- core-controller.ts: DialogueManager integration, Live API events,
  mic streaming, all APIs via /api/v2/
- concierge-controller.ts: Live API expression -> LAMAvatar,
  TTS via DialogueManager, session via /api/v2/

Deployment config:
- vercel.json: COOP/COEP headers, API proxy rewrites
- .env.example: PUBLIC_API_URL documentation

These files are committed in gourmet-sp2 repo locally and
mirrored here for reference. Build verified: astro build passes.

https://claude.ai/code/session_01E9rf3QsqK1jCcMpd5RR9f1
Handover doc for pushing the Live API platform integration
from a gourmet-sp2 Claude Code session (git proxy limitation).

https://claude.ai/code/session_01E9rf3QsqK1jCcMpd5RR9f1