Claude/platform design docs o e vkm #99

Open
mirai-gpro wants to merge 79 commits into aigc3d:master from mirai-gpro:claude/platform-design-docs-oEVkm

Conversation

@mirai-gpro

No description provided.

Test scripts to verify A2E (Audio2Expression) lip sync quality
with Japanese audio input, before investing in ZIP motion replacement
or VHAP Japanese FLAME params.

Includes:
- generate_test_audio.py: EdgeTTS Japanese/English/Chinese audio samples
- test_a2e_cpu.py: A2E model loading, Wav2Vec2 feature extraction, ZIP validation
- save_a2e_output.py: Capture A2E 52-dim ARKit blendshape output
- analyze_blendshapes.py: Lip sync quality scoring and language comparison
- setup_oac_env.py: Auto-detect known OpenAvatarChat issues (CPU mode, deps, config)
- chat_with_lam_jp.yaml: Corrected config (Gemini API + EdgeTTS ja-JP-NanamiNeural)
- run_all_tests.py: Master test runner
- TEST_PROCEDURE.md: Step-by-step test procedure

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Fix RuntimeError: Input data type <class 'list'> is not supported.
- diagnose_onnx_error.py: Tests SileroVAD ONNX, SenseVoice, data flow
- patch_vad_handler.py: Fixes timestamp[0] NoneType bug, adds defensive
  numpy type checking on ONNX inputs, handles 2/3-output model variants
- setup_oac_env.py: Adds VAD handler bug detection (check 7/7)

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Simple test script that verifies environment, model files,
data_bundle.py fix, Wav2Vec2 loading, and A2E module import.

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Gemini's OpenAI-compatible API sometimes returns delta.content as dict/list
instead of string, causing TypeError in set_main_data(). This patch script
detects and safely converts non-string content before passing to data_bundle.

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
gemini-2.0-flash returns 404 "no longer available to new users".
The error dict then cascades into the set_main_data TypeError.

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
SenseVoice auto-detection defaults to Chinese (<|zh|>), causing
Japanese speech to be misrecognized as Chinese text. This patch
forces language="ja" in the generate() call.

- patch_asr_language.py: Auto-patches asr_handler_sensevoice.py
- chat_with_lam_jp.yaml: Added language: "ja" to SenseVoice config
- TEST_PROCEDURE.md: Added Step 4.5 for patch application

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Instead of creating a separate config file, this script patches
the existing working config/chat_with_lam.yaml with 3 changes:
1. TTS voice → ja-JP-NanamiNeural
2. LLM system_prompt → Japanese
3. ASR language → ja

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Root cause analysis from production logs:
- 1st ASR call: rtf=0.629 (1.25s) - OK
- 2nd ASR call: rtf=15.027 (29.83s) - GPU memory exhausted, CPU fallback
- fastrtc 60s timeout triggers, resets frame pipeline → system unresponsive

Fix: Add torch.cuda.empty_cache() + gc.collect() after each SenseVoice
and LAM inference to free GPU memory between calls. Also adds startup
wrapper with PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True.
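The cleanup step described above can be sketched as a small helper (a sketch, assuming PyTorch; the guarded import lets it run on machines without torch or a GPU):

```python
import gc

try:
    import torch
    _HAS_CUDA = torch.cuda.is_available()
except ImportError:  # lets the sketch run without PyTorch installed
    torch, _HAS_CUDA = None, False

def free_gpu_memory():
    """Drop Python-side references, then return cached CUDA blocks.

    The fix inserts these two calls after each SenseVoice and LAM
    inference; empty_cache() only releases memory that is no longer
    referenced, hence the gc.collect() first.
    """
    gc.collect()
    if _HAS_CUDA:
        torch.cuda.empty_cache()
```

The PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True setting mentioned above is an environment variable set by the startup wrapper, not something done in code.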

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Create the missing Audio2Expression inference service that bridges
gourmet-support backend (which already has A2E hooks in /api/tts/synthesize)
with the actual Wav2Vec2 + LAM A2E decoder pipeline.

Services:
- audio2exp-service: Flask API accepting MP3 audio, returning 52-dim
  ARKit blendshape coefficients at 30fps. Includes Wav2Vec2 feature
  extraction and fallback mode when A2E decoder is unavailable.
- Frontend ExpressionManager: Maps A2E blendshapes to GVRM bone system,
  syncing with audio playback via currentTime.

Architecture: TTS → MP3 → audio2exp-service → 52-dim blendshapes → frontend

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
The a2e_engine now searches multiple patterns for the checkpoint:
- models/LAM_audio2exp_streaming.tar (flat, user's actual layout)
- models/LAM_audio2exp/pretrained_models/*.tar (OpenAvatarChat layout)
- models/LAM_audio2exp/*.tar (intermediate layout)
Falls back to rglob search if none match.
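The search order can be sketched with pathlib (only the three patterns are taken from the commit message; the function name is illustrative):

```python
from pathlib import Path
from typing import Optional

# Candidate layouts in priority order, as listed above.
CHECKPOINT_PATTERNS = [
    "models/LAM_audio2exp_streaming.tar",            # flat layout
    "models/LAM_audio2exp/pretrained_models/*.tar",  # OpenAvatarChat layout
    "models/LAM_audio2exp/*.tar",                    # intermediate layout
]

def find_checkpoint(root: Path) -> Optional[Path]:
    """Try each known layout, then fall back to a recursive search."""
    for pattern in CHECKPOINT_PATTERNS:
        matches = sorted(root.glob(pattern))
        if matches:
            return matches[0]
    return next(root.rglob("*.tar"), None)  # last resort: any .tar under root
```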

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Full drop-in replacement for gourmet-sp's concierge-controller.ts with
Audio2Expression integration applied. Key changes marked with ★ comments:
- ExpressionManager import and initialization
- session_id added to /api/tts/synthesize requests
- A2E expression data used for lip sync when available
- FFT-based lip sync preserved as fallback
- Proper cleanup in stopAvatarAnimation() and dispose()

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Replaces the scaffold version with the real concierge-controller.ts from
gourmet-sp (claude/test-concierge-modal-rewGs branch). A2E integration is
already built-in via applyExpressionFromTts() + lamAvatarController.

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
uvicorn is an ASGI server (for FastAPI/Starlette) and cannot serve a Flask
(WSGI) app. This caused the Cloud Run container to fail to start listening
on the port, resulting in a deployment timeout.

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Covers all components: backend (gourmet-support), frontend (gourmet-sp),
audio2exp-service, A2E frontend patches, official HF Spaces ZIP generation
procedure, test suite, deployment config, and end-to-end data flow diagrams.

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
The audio2exp-service returns frames as arrays of numbers (number[][]),
but applyExpressionFromTts expected objects with a .weights property
({weights: number[]}[]), causing TypeError and empty frame buffer.

Changed f.weights[i] to frameData[i] to match the actual backend format.

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
…AvatarController)

The previous implementation used window.lamAvatarController which doesn't
exist in this codebase, causing lip sync to completely fail (buffer=0,
jaw=0, mouth=0). Additionally, the data format was wrong (f.weights[i]
vs the actual number[][] response).

Now uses ExpressionManager (vrm-expression-manager.ts) which:
- Correctly handles the number[][] frame format from audio2exp-service
- Syncs to audioElement.currentTime for accurate lip sync timing
- Maps ARKit blendshapes (jawOpen, mouthFunnel, etc.) to GVRM bone system
- Calls renderer.updateLipSync() directly

Changes:
- Import ExpressionManager and initialize in init()
- Replace lamAvatarController dependency with ExpressionManager
- Add expressionManager.stop() in stopAvatarAnimation()
- All 5 call sites (speakTextGCP, speakResponseInChunks x2, shop TTS x2)
  now correctly drive lip sync through ExpressionManager

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
The import '../avatar/vrm-expression-manager' caused a Vite build error
because that file doesn't exist in gourmet-sp's src/scripts/avatar/.

Solution: inline the ExpressionManager class directly into
concierge-controller.ts. This eliminates the need to copy a separate
file into gourmet-sp and avoids import resolution issues.

The ARKIT_INDEX map is trimmed to only the 7 mouth-related blendshapes
actually used for lip sync (jawOpen, mouthFunnel, mouthPucker, etc.).

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Root cause: this.guavaRenderer doesn't exist on CoreController.
LAMAvatar.astro has its own animation loop with buffer/ttsActive state.
The ExpressionManager approach was completely wrong architecture.

Correct approach: use window.lamAvatarController exposed by LAMAvatar.astro
- setExternalTtsPlayer(): links ttsPlayer so LAMAvatar can track playback
- queueExpressionFrames(): feeds A2E frames into LAMAvatar's buffer
- clearFrameBuffer(): clears buffer on stop/new segment

Changes:
- Remove inlined ExpressionManager class (120 lines of dead code)
- Restore lamAvatarController.setExternalTtsPlayer() with retry (500ms x 20)
- applyExpressionFromTts: convert number[][] → {name: value}[] and queue
- stopAvatarAnimation: call clearFrameBuffer() to close mouth

Console should now show:
- "[Concierge] ✅ Linked ttsPlayer with LAMAvatar controller"
- "[Concierge] A2E: N frames queued @ 30fps"
- LAM Health: buffer>0, ttsActive=true during speech

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
… code

Read the ACTUAL LAMAvatar.astro, lam-websocket-manager.ts, and
audio-sync-player.ts from gourmet-sp to understand the real architecture.

Key findings:
- LAMAvatar.getExpressionData() is called at 60fps by renderer
- It reads frameBuffer[floor(ttsPlayer.currentTime * frameRate)]
- Requires: externalTtsPlayer linked, frameBuffer filled, ttsActive=true
- ttsActive is set by play event (requires setExternalTtsPlayer first)

4 chains must ALL work for lip sync:
  Chain1: Backend must return expression data (needs AUDIO2EXP_SERVICE_URL)
  Chain2: setExternalTtsPlayer must link ttsPlayer with LAMAvatar
  Chain3: applyExpressionFromTts must convert & queue frames
  Chain4: LAMAvatar renders from frameBuffer synced to currentTime

Added diagnostic logs at each chain point:
  [A2E Chain1] expression received or null (backend config issue)
  [A2E Chain2] setExternalTtsPlayer success or LAMAvatar not found
  [A2E Chain3] frames queued with jawOpen sample value

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
…meBuffer, support both frame formats

Compared with the ORIGINAL gourmet-sp concierge-controller.ts (from
claude/test-concierge-modal-rewGs branch) and found 2 bugs:

1. stopAvatarAnimation() called clearFrameBuffer() which resets
   fadeOutStartTime=null, breaking LAMAvatar's graceful 200ms fade-out.
   The ORIGINAL code trusts LAMAvatar's own ended event handler.
   → Removed clearFrameBuffer() from stopAvatarAnimation()

2. Frame data format mismatch:
   - Original gourmet-sp: f.weights[i] (expects {weights: number[]}[])
   - audio2exp-service: number[][] (raw arrays)
   → Now supports BOTH formats: Array.isArray(f) ? f : f.weights

Key fact: before A2E changes, lip sync was working via the renderer's
built-in FFT analysis. The A2E code path was dead code (AUDIO2EXP_SERVICE_URL
not set). These changes ensure A2E is a pure overlay that doesn't break
the existing FFT lip sync.

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Root cause: When AUDIO2EXP_SERVICE_URL is set, the backend returns
expression data. The original code's applyExpressionFromTts used
f.weights[i] on raw number[] arrays, causing TypeError → caught by
outer try/catch → isAISpeaking=false → STT worked (lucky bug).

My both-format fix removed this error, so audio playback proceeds.
But if the browser blocks autoplay (fires play then immediate pause),
onended never fires → playPromise never resolves → initializeSession
hangs → buttons never enabled → STT completely broken.

Fix: Add onpause deadlock prevention to ALL 8 play-and-wait patterns,
matching the existing pattern in ack playback (line 588):
  this.ttsPlayer.onpause = () => {
    if (this.ttsPlayer.currentTime < 0.1) done();
  };

This detects "play then immediate pause" (autoplay block) and resolves
the promise, preventing deadlock. Normal mid-playback pauses (currentTime
> 0.1) are not affected.

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Minimize the diff against the original gourmet-sp concierge-controller.ts.
The only substantive change is the applyExpressionFromTts method:
- Frame format: f.weights[i] → Array.isArray(f) ? f : (f.weights || [])
  (supports the number[][] format from audio2exp-service)
- Errors are handled as non-fatal via try/catch
- All other methods (speakTextGCP, STT, sendMessage, etc.) are identical to the original

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
…ration

Previous patches removed all GVRM renderer integration (import, guavaRenderer,
setupAudioAnalysis, startLipSyncLoop) and replaced with non-existent
window.lamAvatarController calls, causing all A2E data to be silently dropped
and lip sync to degrade to basic jaw flapping.

This rewrite is based on the actual production concierge-controller.ts with
minimal A2E additions:
- Restore GVRM import, guavaRenderer, setupAudioAnalysis, startLipSyncLoop
- Add a2eFrames/a2eFrameRate/a2eNames properties for expression storage
- Add setA2EFrames() to store expression data from TTS response
- Add computeMouthOpenness() to convert 52-dim ARKit blendshapes to scalar
- Modify startLipSyncLoop() to use A2E frames when available, FFT as fallback
- Override speakTextGCP() with inline fetch to include session_id
- Add session_id to ALL TTS requests (ack, chunks, shop flow)

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
…t GVRM)

Root cause: The patch was based on gourmet-support's concierge-controller.ts
which uses GVRM renderer, but the actual deployed frontend (gourmet-sp) uses
LAMAvatar.astro with a completely different rendering pipeline.

Previous patch problems:
- Added GVRM import/renderer that doesn't exist in gourmet-sp
- Missing linkTtsPlayer() - LAMAvatar never received ttsPlayer reference
  -> ttsActive=false, buffer=0, lip sync completely dead
- Added setupAudioAnalysis()/startLipSyncLoop() for FFT - unnecessary with LAMAvatar
- Called clearFrameBuffer() in stopAvatarAnimation() - breaks LAMAvatar fade-out

Fix: Use the exact gourmet-sp version which correctly:
- Links ttsPlayer to LAMAvatar via setExternalTtsPlayer() in init()
- Sends A2E frames via applyExpressionFromTts() -> lamAvatarController.queueExpressionFrames()
- Lets LAMAvatar handle all lip sync rendering internally
- Does NOT call clearFrameBuffer() in stopAvatarAnimation()

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
…rpolate frames

Changes to applyExpressionFromTts():
1. Mouth blendshape amplification: Scale jawOpen (1.4x), mouthFunnel/Pucker (1.5x),
   mouthSmile (1.3x), mouthStretch (1.2x) etc. for more visible Japanese vowel
   distinctions (あ/い/う/え/お)
2. Frame interpolation: 30fps→60fps via linear interpolation between consecutive
   frames, matching the renderer's ~60fps render loop for smoother animation
3. Diagnostic logging: jawOpen/mouthFunnel/mouthSmile max/avg values logged per
   expression segment for live quality monitoring
4. LinkTtsPlayer retry: Multiple retry attempts (500ms, 1s, 2s, 4s) with logging
   to reliably connect ttsPlayer to LAMAvatar even with async initialization

Quality context: A2E streaming model (wav2vec2-base-960h, no transformer) produces
subtle Japanese phoneme variations. Frontend amplification makes these visible.
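A Python sketch of points 1 and 2 above (the gain table and blendshape names are illustrative; real ARKit splits smile/stretch into Left/Right variants):

```python
# Illustrative per-channel gains based on the factors quoted above.
MOUTH_GAINS = {"jawOpen": 1.4, "mouthFunnel": 1.5, "mouthPucker": 1.5,
               "mouthSmile": 1.3, "mouthStretch": 1.2}

def amplify(frame, names):
    """Scale mouth blendshapes by their gain, clamped to [0, 1]."""
    return [min(1.0, v * MOUTH_GAINS.get(n, 1.0)) for v, n in zip(frame, names)]

def interpolate_2x(frames):
    """30fps -> 60fps: insert the linear midpoint between consecutive frames."""
    out = []
    for a, b in zip(frames, frames[1:]):
        out.append(a)
        out.append([(x + y) / 2 for x, y in zip(a, b)])
    out.append(frames[-1])  # last frame has no successor to blend with
    return out
```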

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
… objects)

The user rewrote audio2exp-service with a2e_engine.py (Flask) which returns
frames as plain arrays [[0.1, ...], ...] instead of the old FastAPI format
[{"weights": [0.1, ...]}, ...].

Frontend now detects both formats: Array.isArray(f) ? f : f.weights

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Step 1: Add __testLipSync() diagnostic to concierge-controller.ts patch
  - Generates 5 Japanese vowel patterns (あいうえお) with known ARKit values
  - Creates silent WAV audio, queues frames to LAMAvatar, plays through ttsPlayer
  - Verifies whether renderer supports full 52-dim blendshapes

Step 3: Fix a2e_engine.py to use the proper LAM INFER pipeline
  - Restore LAM_Audio2Expression module (engines, models, utils, configs)
  - Rewrite _load_a2e_decoder → _try_load_infer_pipeline using INFER.build()
  - Use infer_streaming_audio() with context for chunked processing
  - Includes full postprocessing: smooth_mouth, frame_blending, savitzky_golay,
    symmetrize, eye_blinks
  - Falls back to Wav2Vec2 energy-based approximation when INFER unavailable
  - Add librosa, scipy, addict to requirements.txt
  - Add libsndfile to Dockerfile

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Three issues fixed during local testing:
1. transformers v5.x requires ignore_mismatched_sizes=True and
   attn_implementation="eager" for Wav2Vec2Model.from_pretrained()
2. HuggingFace checkpoint is double-wrapped (tar.gz containing
   pretrained_models/lam_audio2exp_streaming.tar) - auto-extract
3. Bare except in infer.py swallowed tracebacks and crashed on
   uninitialized output_dict - now logs actual error and recovers

Result: audio2exp-service starts with mode="infer" and produces
52-dim ARKit blendshapes from audio input.

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Exclude downloaded model weights (wav2vec2, LAM checkpoint ~1.1GB)
from version control.

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Flask's app.run() auto-loads .env files, which crashes with
UnicodeDecodeError if a non-UTF-8 .env exists in the path.
Pass load_dotenv=False since env vars are set externally.
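A minimal sketch of the fix (the route is hypothetical; the relevant part is the load_dotenv=False argument):

```python
from flask import Flask

app = Flask(__name__)

@app.route("/healthz")
def healthz():
    return "ok"  # hypothetical endpoint so the sketch is a complete app

if __name__ == "__main__":
    # load_dotenv=False skips Flask's automatic .env discovery, which
    # raises UnicodeDecodeError when a non-UTF-8 .env sits on the path.
    app.run(host="0.0.0.0", port=8080, load_dotenv=False)
```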

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
claude and others added 30 commits February 25, 2026 08:11
A2E model output characteristics:
- jawOpen: very weak (avg ~0.05) → 1.8x to prevent mumbling
- mouthLowerDown: very strong (raw ~0.84) → 0.45x to prevent jaw pull
- All other channels: 1.0 (neutral baseline)

https://claude.ai/code/session_01TLsuhpFcQYgv6ijcrhKLfJ
Previous tuning (jawOpen 1.8x, mouthLowerDown 0.45x) caused
lipsync to appear completely stopped. Reverting to 1.0 baseline
to restore working state before re-tuning.

https://claude.ai/code/session_01TLsuhpFcQYgv6ijcrhKLfJ
…ic logs

Root cause: INFER.build() hangs indefinitely on Cloud Run CPU,
blocking the engine from ever becoming ready. All /api/audio2expression
requests return 503, so TTS responses have no expression data → no lipsync.

Changes:
1. a2e_engine.py: wrap _try_load_infer_pipeline() in a timeout thread
   (INFER_LOAD_TIMEOUT env var, default 600s). On timeout, fall back to
   Wav2Vec2 mode which provides approximate lipsync immediately.
2. a2e_engine.py: add timing logs at each step (import, config parse,
   INFER.build, model.to) to pinpoint the bottleneck.
3. Dockerfile: pre-extract model tar archive at build time, saving
   ~7 minutes of runtime extraction on every cold start.
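The timeout wrapper in point 1 can be sketched with a daemon thread (a sketch; the real code runs INFER.build() inside the loader callable):

```python
import threading

def load_with_timeout(loader, timeout_s):
    """Run a slow loader in a daemon thread; give up after timeout_s seconds.

    Returns (result, timed_out). On timeout the caller falls back to
    Wav2Vec2 mode; the daemon thread is left to finish (or not) on its own.
    """
    box = {}

    def _run():
        try:
            box["result"] = loader()
        except Exception as exc:  # surface loader failures to the caller
            box["error"] = exc

    t = threading.Thread(target=_run, daemon=True)
    t.start()
    t.join(timeout_s)
    if t.is_alive():
        return None, True  # timed out -> use the Wav2Vec2 fallback
    if "error" in box:
        raise box["error"]
    return box["result"], False
```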

https://claude.ai/code/session_01TLsuhpFcQYgv6ijcrhKLfJ
INFER.build() can complete on CPU given enough time (previous instance
logs showed successful weight loading). Default 600s was too short.

With tar pre-extraction saving 7 min, INFER.build() needs ~10-15 min.
1200s (20 min) provides sufficient margin. Wav2Vec2 fallback remains
as safety net but should not normally be needed.

https://claude.ai/code/session_01TLsuhpFcQYgv6ijcrhKLfJ
Previous successful deployment used ENGINE_LOAD_TIMEOUT=1500.
Match the INFER_LOAD_TIMEOUT default to the same proven value.

https://claude.ai/code/session_01TLsuhpFcQYgv6ijcrhKLfJ
INFER model outputs jawOpen in 0.13-0.32 range, causing mumbling appearance.
Scale all blendshapes by 1.8x (clamped to 0-1) to improve mouth visibility.
Tunable via EXPRESSION_SCALE env var without redeploying.
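A sketch of the env-var-tunable scaling (function name illustrative):

```python
import os

# Default matches the commit; override via EXPRESSION_SCALE without redeploying.
EXPRESSION_SCALE = float(os.environ.get("EXPRESSION_SCALE", "1.8"))

def scale_frame(frame, scale=EXPRESSION_SCALE):
    """Scale every blendshape coefficient, clamped to the valid [0, 1] range."""
    return [min(1.0, max(0.0, v * scale)) for v in frame]
```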

https://claude.ai/code/session_01TLsuhpFcQYgv6ijcrhKLfJ
…ode architecture

Design document covering the plan to evolve the gourmet-support system
into a reusable platform supporting multiple AI application modes
(gourmet concierge, customer support, interview) with Gemini Live API
integration, while preserving existing endpoints for alpha testing.

https://claude.ai/code/session_01TLsuhpFcQYgv6ijcrhKLfJ
Root cause: A2E fallback mode outputs noisy per-frame blendshape values
(e.g., jaw oscillating 0.5→0.09→0.27 between frames) which are applied
directly to the 3D avatar without any frame-to-frame smoothing, causing
visible choppy vibration.

Frontend fix (LAMAvatar.astro):
- Add exponential moving average (EMA) with alpha=0.35 to getExpressionData()
- Each frame blends smoothly with the previous: smoothed = prev + 0.35*(target-prev)
- At 60fps this gives ~95% convergence in ~130ms — smooth yet responsive
- Reset EMA state on buffer clear and expression reset

Backend fix (a2e_engine.py):
- Upgrade fallback smoothing from 3-frame uniform to 2-pass filter:
  Pass 1: 5-frame Gaussian-like kernel [0.06, 0.24, 0.40, 0.24, 0.06]
  Pass 2: 3-frame uniform for additional smoothness
- Approximates the INFER pipeline's savitzky_golay post-processing
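Both fixes can be sketched in Python (the frontend EMA is ported from TypeScript here for illustration; the kernels are as quoted above):

```python
def ema_smooth(prev, target, alpha=0.35):
    """Per-frame exponential moving average, as added to getExpressionData()."""
    return [p + alpha * (t - p) for p, t in zip(prev, target)]

def smooth_frames(frames):
    """Two-pass temporal filter over per-frame blendshape rows.

    Pass 1: 5-tap Gaussian-like kernel; Pass 2: 3-tap uniform average.
    Boundary frames are handled by clamping indices.
    """
    def convolve(rows, kernel):
        half = len(kernel) // 2
        n = len(rows)
        out = []
        for i in range(n):
            acc = [0.0] * len(rows[0])
            for k, w in enumerate(kernel):
                j = min(max(i + k - half, 0), n - 1)  # clamp at the edges
                acc = [a + w * v for a, v in zip(acc, rows[j])]
            out.append(acc)
        return out

    frames = convolve(frames, [0.06, 0.24, 0.40, 0.24, 0.06])
    return convolve(frames, [1 / 3, 1 / 3, 1 / 3])
```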

https://claude.ai/code/session_01TLsuhpFcQYgv6ijcrhKLfJ
Revert all changes from the 7 Claude commits and fully restore the
state of e36190d, the last revision to pass the health check.

Reverted commits:
- 770dfd1 INFER load timeout + Wav2Vec2 fallback
- 84902f6 INFER_LOAD_TIMEOUT 1200s
- 38e9f24 INFER_LOAD_TIMEOUT 1500s
- ce103ad conservative expression scaling
- 2964376 EMA temporal smoothing
- d466f6a revert to Streaming model
- 36bf69b switch to Non-Streaming full model

https://claude.ai/code/session_01TLsuhpFcQYgv6ijcrhKLfJ
Revert the changes from 2 Claude commits:
- bae5578 reset all parameters to neutral baseline
- 2964376 EMA temporal smoothing

https://claude.ai/code/session_01TLsuhpFcQYgv6ijcrhKLfJ
Given the reliability problems in the previously AI-written PLATFORM_DESIGN.md,
this is a brief for having a different AI or engineer redo the design from scratch.

Contents:
- Clear separation of confirmed facts and unverified items in the current architecture
- Reliability assessment of each section of the previous design document
- Requirements for platformization and Live API integration
- List of repositories, papers, and OSS projects to consult

https://claude.ai/code/session_01TLsuhpFcQYgv6ijcrhKLfJ
- §0: Purpose of platformization and the immediate goal (why we are doing this)
- §0.4: How short-term memory (stt_stream.py) and long-term memory (gourmet-support)
  came to be developed separately, with explicit requirements for a unified spec
- §2.3: Rationale for adopting the Live API (solving latency, interruption, and backchannel issues)
- §2.3: Details of the FLASH variant's cumulative character limit and the workaround logic (with code line numbers)
- §2.3: Explanation of the Live/REST hybrid approach
- §4: Added "unified memory design" to the required sections of the design document

https://claude.ai/code/session_01TLsuhpFcQYgv6ijcrhKLfJ
Based on DESIGN_REQUEST.md, created two new documents grounded in reading the actual code:

- PLATFORM_REQUIREMENTS.md (requirements definition)
  - Current-state analysis (confirmed facts for audio2exp-service, stt_stream.py, frontend-patches)
  - Functional requirements (multi-mode, Live API integration, memory, audio processing, avatar)
  - Non-functional requirements (performance, availability, extensibility, device support)
  - Technical constraints, open questions, risk assessment

- PLATFORM_ARCHITECTURE.md (design document)
  - Overall architecture (Gateway Layer + service layer)
  - Data flow (both the REST API path and the Live API path spelled out)
  - Live API integration design (LiveRelay, ReconnectManager, SpeechDetector)
  - Unified memory design (SessionMemory + LongTermMemory)
  - API design (new /api/v2/ endpoints plus existing-endpoint compatibility)
  - iPhone SE support strategy (criteria for choosing approach A vs. B)
  - Development roadmap (Phase 0-3)

Confirmed / inferred / unverified status is marked throughout.

https://claude.ai/code/session_01E9rf3QsqK1jCcMpd5RR9f1
Incorporate the 4-language support (ja/en/ko/zh) already implemented in
gourmet-sp / gourmet-support into the platform design:

PLATFORM_REQUIREMENTS.md:
- §4.7 Multilingual support: current implementation status organized as confirmed vs. inferred
  - Frontend: t(), LANGUAGE_CODE_MAP, splitIntoSentences()
  - Backend: language parameter, TTS language selection
  - Live API: Japanese-only constraint (ja-JP hardcoded) noted explicitly
- FR-I18N-01 through 07: added multilingual functional requirements

PLATFORM_ARCHITECTURE.md:
- §7 Multilingual design (new section):
  - LanguageConfig (language master) design
  - SpeechRules (per-language sentence splitting and incomplete-utterance detection) design
  - Dynamic speech_config.language_code setting for the Live API
  - Frontend i18n design (I18n class)
  - language field integrated into Session
- Renumbered subsequent sections to 8-12
- Added supported_languages to the API design

https://claude.ai/code/session_01E9rf3QsqK1jCcMpd5RR9f1
Implement the platform backend for using the Gemini Live API on the web,
based on the requirements and design documents.

Main components:
- server.py: FastAPI entry point (REST + WebSocket)
- live/relay.py: Browser ↔ Gemini Live API WebSocket relay
- live/reconnect.py: automatic reconnection on the cumulative character limit
- live/speech_detector.py: multilingual incomplete-utterance detection (ja/en/ko/zh)
- memory/session_memory.py: 20-turn short-term memory + context summarization
- session/manager.py: session lifecycle management
- services/a2e_client.py: async client for audio2exp-service
- i18n/language_config.py: 4 language profiles
- modes/: plugin architecture (BaseModePlugin, GourmetModePlugin)

Avatar integration: on the Live API path as well, A2E output is sent over the
Expression WebSocket to feed LAMAvatarController's frameBuffer.

https://claude.ai/code/session_01E9rf3QsqK1jCcMpd5RR9f1
- Avoid the name collision with Python's standard-library platform module
- Fix all module import paths: from platform.* → from lam_platform.*
- Fix the uvicorn app path to lam_platform.server:app
- Add a Dockerfile for Cloud Run deployment
- Add .dockerignore

https://claude.ai/code/session_01E9rf3QsqK1jCcMpd5RR9f1
- Rename the package to support_base
- Fix all module import paths to support_base.*
- Fix the Dockerfile COPY paths and CMD to support_base

https://claude.ai/code/session_01E9rf3QsqK1jCcMpd5RR9f1
Consolidate the 3 core gourmet-support files into support_base/core/:
- api_integrations.py: HotPepper/TripAdvisor/Google Places API integration (unchanged)
- long_term_memory.py: Supabase long-term memory management (unchanged)
- support_core.py: SupportSession/SupportAssistant + GCS prompt loading (import path fixes only)

Flask → FastAPI conversion:
- rest/router.py: all REST endpoints converted to a FastAPI APIRouter
  (session/start, chat, finalize, cancel, tts/synthesize, stt/transcribe, stt/stream)
- server.py: REST router mounted via include_router

Plugin improvements:
- gourmet/plugin.py: hardcoded prompt → prefer the GCS/locally loaded prompt

Config additions:
- config/settings.py: PROMPTS_BUCKET_NAME, Google/Supabase API keys
- requirements.txt: added google-cloud-*, supabase, requests

https://claude.ai/code/session_01E9rf3QsqK1jCcMpd5RR9f1
support_core.py:
- Make google.cloud.storage an optional import (starts without it installed)
- Wrap Gemini client initialization in try-except (starts without an API key set)
- Add a GCS availability check to load_prompts_from_gcs()

cloudbuild.yaml:
- GitHub → Cloud Build → Cloud Run automated deployment pipeline
- Use support_base/ as the build context
- SERVICE_NAME/REGION/MEMORY configurable via substitutions

https://claude.ai/code/session_01E9rf3QsqK1jCcMpd5RR9f1
Since this environment cannot push directly to the support-base repo,
provide a script to run on a local PC.

Usage: bash scripts/push_support_base.sh

https://claude.ai/code/session_01E9rf3QsqK1jCcMpd5RR9f1
- Replace rsync with cp -r + find (Windows lacks rsync)
- Replace mktemp -d with mkdir -p (better portability)

https://claude.ai/code/session_01E9rf3QsqK1jCcMpd5RR9f1
Build the frontend foundation for moving the gourmet concierge to the Live API.
Follows the design in PLATFORM_ARCHITECTURE.md §8; the audio code from the
existing patches (with the iPhone 16/17 fixes) is left entirely untouched,
and the modules are split along the lines of the platform design.

Layout:
- live-ws-client.ts: LiveRelay WebSocket client (follows the relay.py protocol)
- audio-io.ts: PCM I/O for the Live API (follows the existing AudioManager pattern)
- dialogue-manager.ts: common interface for switching between REST and Live
- platform-controller.ts: main controller (follows the ConciergeController pattern)
- gourmet-mode.ts: gourmet-mode-specific logic
- expression-manager.ts: re-export of vrm-expression-manager.ts
- index.astro: main page (avatar + chat panel)

https://claude.ai/code/session_01E9rf3QsqK1jCcMpd5RR9f1
Covers all 7 requirements: moving both the concierge and gourmet modes to the
Live API, external prompt storage in GCS, short/long-term memory, search SDK,
multilingual support, multi-device support, and photoreal avatar lip sync.
Implementation status and API specs are based on reading the actual support_base code.

https://claude.ai/code/session_01E9rf3QsqK1jCcMpd5RR9f1
- Corrected the frontend from gourmet-sp to gourmet-sp2 (avatar implemented and tested)
- Added the full module layout and controller hierarchy of gourmet-sp2
- Added detailed specs for the LAMAvatar component and the GS/GVRM renderers
- Added the TTS + A2E sync flow and the lip sync diagnostic test
- Added Vercel deployment setup steps (not yet linked; link planned)
- Added the iOS/Android AudioWorklet implementation differences
- Updated PWA support from "not started" to "implemented"
- Fully updated the chapter 13 implementation status from reading the actual gourmet-sp2 code

https://claude.ai/code/session_01E9rf3QsqK1jCcMpd5RR9f1
Implementation of PLATFORM_SPEC_v2.md for gourmet-sp2 frontend:

New platform modules (src/scripts/platform/):
- live-ws-client.ts: WebSocket client for support_base LiveRelay
- live-audio-io.ts: PCM 16kHz mic + PCM 24kHz playback via AudioContext
- dialogue-manager.ts: REST/Live API unified session & dialogue layer

Modified controllers (src/scripts/chat/):
- core-controller.ts: DialogueManager integration, Live API events,
  mic streaming, all APIs via /api/v2/
- concierge-controller.ts: Live API expression -> LAMAvatar,
  TTS via DialogueManager, session via /api/v2/

Deployment config:
- vercel.json: COOP/COEP headers, API proxy rewrites
- .env.example: PUBLIC_API_URL documentation

These files are committed in gourmet-sp2 repo locally and
mirrored here for reference. Build verified: astro build passes.

https://claude.ai/code/session_01E9rf3QsqK1jCcMpd5RR9f1
Handover doc for pushing the Live API platform integration
from a gourmet-sp2 Claude Code session (git proxy limitation).

https://claude.ai/code/session_01E9rf3QsqK1jCcMpd5RR9f1