draft: add ARKit52 NPC voice lip sync lane #319

Merged. JOY (JOY) merged 21 commits into dev from codex/arkit52-lipsync-root on May 30, 2026.

JOY (JOY) commented on May 29, 2026

Summary

  • Adds ARKit52/provider lip-sync plumbing for scoped NPC voice sessions.
  • Adds local ARKit52 smoke/helper scripts, a sidecar launcher, and blendshape reporting utilities.
  • Adds a synthetic Unity editor ARKit52 driver smoke hook that feeds provider-style frames into PrototypeFacialAnimationDriver and checks actual SkinnedMeshRenderer blendshape weights.
  • Adds visual prefab catalog safety checks and documents the current missing paid-source prefab state.
  • Keeps the lane presentation-only: voice and lip-sync payloads do not mutate authoritative gameplay state.
  • Shows focused NPC text before voice buffering, and stops stale NPC voice presentation when the player sends the next line so chat stays responsive.
  • Extends the ARKit52 smoke helper to verify secondspawn_voice_audio_chunk_get, so chunked Gemini audio retrieval is tested instead of only voice-session metadata.
  • Adds non-mutating lip-sync provider readiness and Unity lip-sync contract checkers so agents can verify local provider/testability state without touching production services.
  • Adds Nakama runtime coverage for Gemini voice gender pool selection and stable actor voice assignment.
  • Merged current origin/dev into this draft branch after resolving the voice lane doc conflict.
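
The stable actor voice assignment mentioned in the summary could be sketched roughly as follows. This is a minimal illustration only, not the Nakama runtime code from this PR: the function name pick_voice, the VOICE_POOLS table, and the hashing scheme are all assumptions. Only the voice names Algenib and Sulafat come from the smoke output in this PR; the other pool entries are placeholders.

```python
import hashlib

# Hypothetical pools. Only "Algenib" (male hint) and "Sulafat" (female hint)
# appear in this PR's smoke output; the other entries are placeholders.
VOICE_POOLS = {
    "male": ["Algenib", "placeholder-male-voice"],
    "female": ["Sulafat", "placeholder-female-voice"],
}


def pick_voice(actor_id: str, gender_hint: str) -> str:
    """Deterministically map an actor id to a voice in the hinted pool.

    Hashing the actor id (instead of choosing randomly) means the same
    actor gets the same voice on every request.
    """
    pool = VOICE_POOLS[gender_hint]
    digest = hashlib.sha256(actor_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(pool)
    return pool[index]
```

Because the choice depends only on the actor id and the hint, repeated calls for the same actor return the same voice, which is the property the "returned stable voice twice" smoke result checks for.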

Verification

  • dotnet build Unity\SecondSpawn.AI.csproj --nologo: passed, 0 warnings.
  • dotnet build Unity\SecondSpawn.UI.csproj --nologo: passed, 0 warnings.
  • dotnet build Unity\Assembly-CSharp-Editor.csproj --nologo: passed. Warnings are existing Photon/Fusion package warnings, not new SECOND SPAWN code.
  • npm.cmd run build: passed.
  • npm.cmd test: passed after build, including Gemini voice pool selection coverage.
  • git diff --check: passed.
  • python -m py_compile tools\lipsync\check_unity_lipsync_contract.py tools\lipsync\check_lipsync_provider_readiness.py tools\lipsync\run_arkit52_smoke.py tools\lipsync\wav2arkit_http_server.py: passed.
  • python tools\lipsync\check_unity_lipsync_contract.py: passed 13/13 checks for ARKit52 sidecar channel shape, Nakama chunk smoke assertions, Unity DTO shape, presenter lip-sync forwarding, facial driver blendshape mapping, synthetic editor driver smoke hook, focused/free-mode voice gating, and portrait ARKit status reporting.
  • python tools\lipsync\run_arkit52_smoke.py: passed, 99 frames, 52 channels, 72.11 ms wall in the isolated worktree.
  • python tools\lipsync\run_arkit52_smoke.py --include-nakama --actor-id npc-scrap-warden-0441 --text "Xin chao. Day la spike test ARKit nam muoi hai voi giong nhan vat nam.": passed, provider gemini_tts, transport nakama_audio_chunks, voice Algenib, male hint, 158 ARKit52 frames, 52 channels, 104 ms lip-sync latency, 6580.26 ms total RPC wall time, 3 audio chunks, 335360 base64 chars fetched.
  • Attempted Unity.exe -batchmode -nographics -quit -projectPath .claude/worktrees/arkit52-lipsync-root/Unity -executeMethod SecondSpawn.EditorTools.SecondSpawnFacialBlendshapeReportUtility.RunArkit52DriverSmoke: Unity did not reach the smoke method because Package Manager failed during startup with the error 'The "path" argument must be of type string. Received undefined.' This is recorded as a worktree batch runner blocker, not a failed driver smoke.
  • Earlier male/female voice smoke: npc-scrap-warden-0441 returned stable voice Algenib twice with male hint; npc-clinic-operator-0320 returned Sulafat with female hint. Both returned Gemini audio chunks plus ARKit52 frames.
  • python tools\lipsync\check_lipsync_provider_readiness.py --scratch-root D:\Projects\Second-Spawn\.tmp-import\lipsync-spike: passed. Only wav2arkit_cpu is testable now; SALSA/uLipSync/Convai/FaceSync are not installed in Unity, NeuroSync is blocked by broken torchaudio, and NVIDIA A2F SDK is blocked by CUDA 13.3 plus missing TENSORRT_ROOT_DIR.
  • python tools\unity\check_visual_prefab_catalog.py: passed and reported 50 entries, 29 missing generated prefabs, 50 missing source assets, 3 unresolved generated prefab source GUIDs.
  • Local review fallback: no provider keys are stored in Unity, Unity requests presentation tiers only through the Nakama/Fusion boundary, and the branch stays presentation-only.
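
The "N frames, 52 channels" results reported above amount to a shape check on provider output. A minimal sketch of such a check is shown below. The frame layout (a dict with a "weights" list) and the name validate_frames are assumptions for illustration and may not match what run_arkit52_smoke.py actually does. The [0, 1] range reflects how ARKit blendshape coefficients are normally expressed.

```python
ARKIT52_CHANNEL_COUNT = 52


def validate_frames(frames):
    """Return a list of problems found in the frames; empty means they pass.

    Assumed (hypothetical) frame layout: {"weights": [52 floats in 0..1]}.
    """
    problems = []
    if not frames:
        problems.append("no frames")
    for index, frame in enumerate(frames):
        weights = frame.get("weights", [])
        if len(weights) != ARKIT52_CHANNEL_COUNT:
            problems.append(
                f"frame {index}: expected {ARKIT52_CHANNEL_COUNT} channels, "
                f"got {len(weights)}"
            )
            continue
        if any(not 0.0 <= weight <= 1.0 for weight in weights):
            problems.append(f"frame {index}: weight outside [0, 1]")
    return problems
```

Returning a list of problems rather than raising on the first one lets a smoke script report every malformed frame in a single run.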

Still Missing Before Ready

  • Unity MCP console and Play Mode smoke could not run. Coplay-style tools returned telemetry but no_unity_session; official Unity MCP tools were visible, and the Editor log showed 20 official MCP tools discovered, but Unity_ManageEditor and Unity_ReadConsole timed out after 120 seconds from this agent session.
  • Current root Editor is still on dev; this branch is not merged into dev, so the feature is not yet applied to the root game workspace.
  • Final visual proof still requires one focused Ida Faber NPC in Play Mode with audible Gemini voice, honest portrait status, and visible ARKit-mapped mouth motion.

Related: #139, #288

JOY (JOY) force-pushed the codex/arkit52-lipsync-root branch 2 times, most recently from dd500da to 7777bd2, on May 29, 2026 at 19:42
JOY (JOY) force-pushed the codex/arkit52-lipsync-root branch from 3047fd0 to bb2ab0d on May 29, 2026 at 19:51
JOY (JOY) merged commit e8cdc56 into dev on May 30, 2026
2 checks passed