Real-time pose estimation, object detection, and 3D avatar rendering with WebSocket streaming.
- 2D Pose Estimation — YOLOv8-Pose real-time 2D keypoint detection
- 3D Pose Estimation — Switchable models (MediaPipe / YOLO+RTMPose) with avatar or skeleton rendering
- Object Detection — EfficientDet-Lite2 bounding boxes and labels
- Hand Gesture Recognition — Per-hand landmark tracking with gesture classification
- Live 3D Avatar — Mixamo-rigged avatar driven by FK quaternions from pose estimation
- Avatar Voice Control — Voice/text commands to control 3D avatar via SecondBrain AI (coming soon)
- Camera & Video Input — Live camera streams or video file upload
- Real-time Log Streaming — Backend logs streamed live to the frontend
- Auto-Deployment — GitHub Actions CI/CD to production and staging
- Python 3.12+
- Node.js 18+
- Modern web browser with camera access
git clone <repository-url>
cd pose-spatial-studio
# Backend
cd backend
pip install -r requirements.txt
# Frontend
cd ../frontend
npm install
# Terminal 1: Backend (port 49101)
cd backend
./run_server.sh
# Terminal 2: Frontend (port 8585)
cd frontend
./run_ui.sh
Open http://localhost:8585 in your browser.
- Select a function from the left sidebar (2D Pose, 3D Pose, Object Detection, Hand Gesture)
- Choose input source — camera or video file upload
- Click Start (the app requests camera permission if needed, then initializes the processing pipeline)
- Interact with results:
  - 2D views show annotated frames directly on the canvas
  - The 3D view supports orbit controls (rotate, pan, zoom) and toggling between Avatar and Skeleton rendering
  - 3D Pose supports model switching between MediaPipe and YOLO+RTMPose
- View logs in the right sidebar panel for real-time backend diagnostics
- Avatar Voice Control — coming soon (hidden easter egg: click 15 times to unlock)
┌──────────────┐        WebSocket         ┌──────────────────────┐
│   Browser    │ ◄──────────────────────► │ FastAPI + Socket.IO  │
│  (React/TS)  │                          │    (Python 3.13)     │
└──────┬───────┘                          └──────────┬───────────┘
       │                                             │
       │ 10 FPS JPEG frames                          │ Processor Pipeline
       │                                             │
       ▼                                             ▼
┌──────────────┐                          ┌──────────────────────┐
│ View2D/View3D│                          │ Image → Data →       │
│ Three.js     │     Annotated frames     │ Pose/Detection       │
│ Avatar/Skel  │ ◄──────────────────────  │ Processor            │
└──────────────┘                          └──────────────────────┘
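The diagram shows the browser sending JPEG frames at 10 FPS. As a rough illustration of how a sender could cap its rate when the camera captures faster than that, here is a minimal sketch in Python (chosen to match the backend). `FrameThrottle` and `should_send` are hypothetical names and are not taken from the project source.

```python
from typing import Optional


class FrameThrottle:
    """Decide whether a captured frame should be sent, given a target FPS.

    Hypothetical helper for illustration only.
    """

    def __init__(self, target_fps: float = 10.0) -> None:
        if target_fps <= 0:
            raise ValueError("target_fps must be positive")
        # Minimum spacing between two sent frames, in milliseconds.
        self.min_interval_ms = 1000.0 / target_fps
        self._last_sent_ms: Optional[float] = None

    def should_send(self, timestamp_ms: float) -> bool:
        """Return True if enough time has passed since the last sent frame."""
        if (
            self._last_sent_ms is None
            or timestamp_ms - self._last_sent_ms >= self.min_interval_ms
        ):
            self._last_sent_ms = timestamp_ms
            return True
        return False


# One second of capture from a 30 FPS camera, throttled to 10 FPS.
throttle = FrameThrottle(target_fps=10.0)
camera_timestamps = [i * 1000.0 / 30.0 for i in range(30)]
sent = [t for t in camera_timestamps if throttle.should_send(t)]
print(len(sent))  # number of frames that would actually be transmitted
```

Dropping frames at the sender keeps the WebSocket from backing up when processing is slower than capture.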
pose-spatial-studio/
├── backend/ # Python FastAPI server
│ ├── app.py # Server entry point
│ ├── config.py # Configuration + GPU detection
│ ├── core/ # WebSocket handler + log streaming
│ ├── processors/ # 8 processors (pose, detection, gesture)
│ ├── models/ # ML model files (MediaPipe, TCPFormer)
│ └── utils/ # Kinetic converter, filters, I/O
│
├── frontend/ # React TypeScript UI
│ ├── src/
│ │ ├── components/ # Controls, View2D, View3D, LogPanel, etc.
│ │ ├── three/ # AvatarRenderer, StickBallRenderer, VideoPlane
│ │ ├── stores/ # Zustand state management
│ │ ├── hooks/ # Camera devices, WebSocket, log stream
│ │ ├── services/ # Socket client, stream init, frame transmission
│ │ └── types/ # Function definitions, pose data types
│ └── public/avatars/ # Mixamo skeleton.glb
│
├── tests/ # Playwright E2E tests (production + staging)
├── .github/workflows/ # CI/CD: deploy frontend/backend to prod/staging
└── .claude/ # Project docs + Claude Code skills
See PROJECT_STRUCTURE.md for detailed documentation.
| Event (client → server) | Payload | Description |
|---|---|---|
| `initialize_stream` | `{ stream_id, processor_type, processor_config, source_type }` | Initialize processing pipeline |
| `process_frame` | `{ stream_id, frame (base64), timestamp_ms }` | Send frame for processing |
| `cleanup_processor` | `{ stream_id }` | Tear down processor |
| `switch_model` | `{ stream_id, processor_type }` | Switch 3D pose model |
| `subscribe_logs` | `{}` | Start receiving backend logs |
| `solve_ik` | `{ request_id, joints, root_position }` | Send joint coordinates for IK solving |
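To make the payload shapes concrete, the following sketch builds `initialize_stream` and `process_frame` payloads using only the field names listed in the table. The helper names and the example values (`"pose_3d"`, `"camera"`, the stream id) are assumptions for illustration; the values the server actually accepts are not documented here.

```python
import base64
import json
import time
from typing import Optional


def build_initialize_stream(stream_id: str, processor_type: str,
                            processor_config: dict, source_type: str) -> dict:
    """Payload for initialize_stream, using the documented field names."""
    return {
        "stream_id": stream_id,
        "processor_type": processor_type,
        "processor_config": processor_config,
        "source_type": source_type,
    }


def build_process_frame(stream_id: str, jpeg_bytes: bytes,
                        timestamp_ms: Optional[int] = None) -> dict:
    """Payload for process_frame: the JPEG bytes are sent base64-encoded."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    return {
        "stream_id": stream_id,
        "frame": base64.b64encode(jpeg_bytes).decode("ascii"),
        "timestamp_ms": timestamp_ms,
    }


# Example values are hypothetical; only the keys come from the table.
init_payload = build_initialize_stream("stream-1", "pose_3d", {}, "camera")
frame_payload = build_process_frame("stream-1", b"\xff\xd8\xff", timestamp_ms=1234)

# Both payloads are plain JSON-serializable dicts, ready to pass to a
# Socket.IO client as event data.
print(json.dumps(init_payload))
print(json.dumps(frame_payload))
```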
| Event (server → client) | Payload | Description |
|---|---|---|
| `stream_initialized` | `{ stream_id, status, message, processor_type }` | Pipeline ready |
| `pose_result` | `{ stream_id, frame (base64), pose_data, timestamp_ms }` | Processed frame + data |
| `stream_error` | `{ stream_id, message, active_streams?, max_streams? }` | Error with capacity info |
| `log_batch` | `[{ level, message, timestamp, logger }]` | Batched log entries |
| `fk_result` | `{ request_id, fk_data, root_position, error? }` | FK quaternion result |
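As an illustration of consuming `log_batch`, the sketch below filters and formats a batch of entries with the four documented fields. It assumes `level` uses the standard Python logging names and treats `timestamp` as an opaque string; both are assumptions, and `format_log_batch` is a hypothetical helper. The sample entries are made up.

```python
import logging

# Assumed mapping: level names follow Python's logging module.
_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def format_log_batch(entries: list, min_level: str = "INFO") -> list:
    """Return display lines for entries at or above min_level."""
    threshold = _LEVELS[min_level]
    lines = []
    for entry in entries:
        level = str(entry.get("level", "INFO")).upper()
        # Unknown level names are treated as INFO rather than dropped.
        if _LEVELS.get(level, logging.INFO) < threshold:
            continue
        lines.append(
            f"{entry.get('timestamp', '')} [{level}] "
            f"{entry.get('logger', '')}: {entry.get('message', '')}"
        )
    return lines


# Made-up sample batch, shaped like the documented payload.
batch = [
    {"level": "DEBUG", "message": "frame queued", "timestamp": "12:00:00", "logger": "core"},
    {"level": "ERROR", "message": "model load failed", "timestamp": "12:00:01", "logger": "processors"},
]
lines = format_log_batch(batch)
print("\n".join(lines))
```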
- `GET /health`: Health check with per-stream metrics
- `GET /`: Server info
- `GET /info`: Feature capabilities
- Production: robot.yingliu.site
- Staging: staging.robot.yingliu.site
Push to main deploys to production. Push to staging deploys to staging. Both trigger GitHub Actions workflows with Cloudflare Access authentication.
VM1 (Frontend)                VM2 (GPU Backend)
┌──────────────┐              ┌──────────────────────┐
│ Nginx        │      WS      │ Docker container     │
│ Cloudflare   │ ──────────►  │ FastAPI + Socket.IO  │
│ TLS          │              │ CUDA GPU             │
└──────────────┘              └──────────────────────┘
| Layer | Technologies |
|---|---|
| Backend | Python 3.13, FastAPI, Socket.IO, MediaPipe, rtmlib (RTMPose3D), Ultralytics (YOLOv8), PyTorch (TCPFormer), OpenCV |
| Frontend | React 19, TypeScript, Three.js, React Three Fiber, Drei, Zustand, Socket.IO Client, SecondBrain (guest chat API), Vite |
| Testing | Playwright (E2E, production + staging) |
| CI/CD | GitHub Actions, Cloudflare Tunnels, rsync |
| Infra | Nginx, Docker, NVIDIA CUDA, Cloudflare (TLS, WAF) |
import os

HOST = os.getenv("POSE_STUDIO_HOST", "0.0.0.0")      # Server host
PORT = int(os.getenv("POSE_STUDIO_PORT", 49101))     # Server port
POSE_WORKERS = min(os.cpu_count() or 1, 16)          # Thread pool size, capped at 16
MAX_CONCURRENT_STREAMS = 3                           # Server-wide limit

- `.env.local` → `VITE_BACKEND_URL=http://localhost:49101`
- `.env.production` → `VITE_BACKEND_URL=https://pose-backend.yingliu.site`
- `VITE_SECOND_BRAIN_URL` → SecondBrain guest chat API base URL
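The backend settings can also be read through a small loader that validates the environment values before the server starts. This is a sketch only: `ServerConfig` and `load_config` are hypothetical names, and the port range check is an addition that the project's `config.py` may not perform. The environment variable names and defaults are the ones listed above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServerConfig:
    host: str
    port: int
    pose_workers: int
    max_concurrent_streams: int


def load_config(env: Optional[Mapping[str, str]] = None) -> ServerConfig:
    """Build a ServerConfig from environment variables, with the documented defaults."""
    env = os.environ if env is None else env
    port = int(env.get("POSE_STUDIO_PORT", "49101"))
    if not 1 <= port <= 65535:
        raise ValueError(f"POSE_STUDIO_PORT out of range: {port}")
    return ServerConfig(
        host=env.get("POSE_STUDIO_HOST", "0.0.0.0"),
        port=port,
        # os.cpu_count() can return None, so fall back to a single worker.
        pose_workers=min(os.cpu_count() or 1, 16),
        max_concurrent_streams=3,
    )


default_cfg = load_config({})
custom_cfg = load_config({"POSE_STUDIO_HOST": "127.0.0.1", "POSE_STUDIO_PORT": "8000"})
print(default_cfg)
print(custom_cfg)
```

Passing a mapping instead of reading `os.environ` directly makes the loader easy to exercise in tests.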
cd tests
npx playwright test # All tests
npx playwright test --config playwright.staging.config.ts # Staging
npx playwright test --headed # Visible browser
npx playwright show-report # HTML report
- Camera not starting: Check browser permissions and make sure no other app is using the camera
- No pose results: Check the LogPanel (right sidebar) or run `tail -f logs/$(date +%Y-%m-%d).log`
- Stream limit reached: Max 3 concurrent streams (configurable via `MAX_CONCURRENT_STREAMS`)
- Low performance: Reduce FPS, lower JPEG quality, or decrease resolution
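To show how a server-wide stream limit can produce the capacity fields carried by `stream_error`, here is a minimal sketch. `StreamRegistry` and its methods are hypothetical and not the project's implementation; only the payload keys (`stream_id`, `message`, `active_streams`, `max_streams`) and the default limit of 3 come from this README.

```python
from typing import Optional


class StreamRegistry:
    """Track active stream ids and refuse new ones past a fixed limit."""

    def __init__(self, max_streams: int = 3) -> None:
        self.max_streams = max_streams
        self._active = set()

    def try_add(self, stream_id: str) -> Optional[dict]:
        """Register a stream. Return None on success, or a stream_error payload."""
        if stream_id in self._active:
            return None  # already registered; treat as a no-op
        if len(self._active) >= self.max_streams:
            return {
                "stream_id": stream_id,
                "message": "Stream limit reached",
                "active_streams": len(self._active),
                "max_streams": self.max_streams,
            }
        self._active.add(stream_id)
        return None

    def remove(self, stream_id: str) -> None:
        """Release a slot, for example when cleanup_processor is received."""
        self._active.discard(stream_id)


registry = StreamRegistry(max_streams=3)
results = [registry.try_add(s) for s in ("a", "b", "c", "d")]
print(results[-1])  # the fourth stream is rejected with capacity info
```

A client that receives such a payload can show how many streams are active and retry after another stream is cleaned up.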
- Create a feature branch: `git checkout -b feat/your-feature`
- Make changes following code conventions
- Run tests: `cd tests && npx playwright test`
- Commit with conventional messages: `feat: add new feature`
- Create a pull request to `staging`, then merge to `main`
MIT License
- MediaPipe — Pose estimation, object detection, gesture recognition
- rtmlib — RTMPose3D inference
- Three.js / React Three Fiber — 3D rendering
- FastAPI / Socket.IO — Backend framework
- Playwright — E2E testing