Maya is an advanced AI assistant designed to give a "brain" and personality to the Unitree G1 humanoid robot. It orchestrates voice interaction, vision, and motion to create a lifelike companion.
Core Goal: Build an ultra-low-latency voice pipeline that trims every avoidable millisecond, so that conversation with the user feels smooth and natural.
This project is intended for both non-technical users (who want to understand Maya's capabilities and interact with it) and developers (who want to extend its features and build on top of the Unitree SDK).
- Wake Word Detection: Listens for the name "Maya" to activate.
- Async Voice Pipeline: Full async pipeline linking Speech-to-Text (STT), Small Language Model (SLM), and Text-to-Speech (TTS).
- Emotional Intelligence: Uses the PAD (Pleasure-Arousal-Dominance) emotional state model built into the SLM to simulate realistic emotions and personality.
- Streaming Response: SLM output is piped directly to TTS for near-instant responses.
- Hardware Integration: Controls Unitree G1 LEDs to show states (Listening, Thinking, Speaking) and triggers arm motions synchronized with speech.
- Smooth Motion Interpolation: Implements smooth transitions between joint states to eliminate jerky movements, making Maya's physical responses feel fluid and lifelike.
- Remote Control: Supports a wireless remote stop button (key 64) for interrupting the robot in noisy environments.
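The smooth motion interpolation listed above can be illustrated with a short sketch. This is not the project's implementation: it assumes joint states are plain lists of angles in radians and uses a smoothstep easing curve, which is one common way to avoid abrupt starts and stops. The names `smoothstep` and `interpolate_joints` are hypothetical and do not come from unitree_sdk2py.

```python
from typing import List


def smoothstep(t: float) -> float:
    """Ease-in/ease-out curve: 0 at t=0, 1 at t=1, zero slope at both ends."""
    t = max(0.0, min(1.0, t))
    return t * t * (3.0 - 2.0 * t)


def interpolate_joints(start: List[float], goal: List[float], steps: int) -> List[List[float]]:
    """Return `steps` intermediate joint states, the last one equal to `goal`."""
    if len(start) != len(goal):
        raise ValueError("start and goal must have the same number of joints")
    if steps < 1:
        raise ValueError("steps must be at least 1")
    frames = []
    for i in range(1, steps + 1):
        alpha = smoothstep(i / steps)  # eased progress in [0, 1]
        frames.append([s + alpha * (g - s) for s, g in zip(start, goal)])
    return frames


if __name__ == "__main__":
    # Hypothetical two-joint move, split into four frames.
    for frame in interpolate_joints([0.0, 0.5], [1.0, -0.5], steps=4):
        print([round(q, 3) for q in frame])
```

In a real control loop each frame would be sent to the robot at a fixed rate; because the easing curve has zero slope at both ends, joint velocity starts and finishes near zero instead of jumping.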
- Core: Python 3.x with asyncio
- SLM: Ollama (HauhauCS/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive)
- TTS: ElevenLabs (Cloud)
- STT: Sherpa-ONNX (nvidia/parakeet-tdt_ctc-110m)
- Wake Word: openWakeWord
- Audio: SoundDevice
- Robot SDK: unitree_sdk2py
- main.py: The central orchestrator tying all services together.
- SLM/: Small Language Model service for generating responses.
- TTS/: Text-to-Speech service (currently using ElevenLabs).
- STT/: Speech-to-Text service for transcribing user input.
- WAKEWORD/: Wake word detection service listening for "Maya".
- INTERFACE/: Services for hardware interaction (LEDs, Motion).
- HIGH_LEVEL/: High-level control logic for the robot.
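To make the orchestration idea concrete, here is a minimal sketch of how streamed SLM output could be handed to TTS sentence by sentence using asyncio queues. It is an illustration under stated assumptions, not the contents of main.py: `fake_slm_stream`, `slm_stage`, `tts_stage`, and `run_turn` are hypothetical names, and the SLM and TTS are replaced by stand-ins so the example runs without Ollama or ElevenLabs.

```python
import asyncio
from typing import AsyncIterator, List


async def fake_slm_stream(prompt: str) -> AsyncIterator[str]:
    """Stand-in for a streaming SLM: yields a canned reply token by token."""
    for token in ["Hello", " there.", " I", " am", " Maya."]:
        await asyncio.sleep(0)  # yield control, as a real streaming call would
        yield token


async def slm_stage(prompt: str, tts_queue: asyncio.Queue) -> None:
    """Forward each finished sentence to TTS without waiting for the full reply."""
    buffer = ""
    async for token in fake_slm_stream(prompt):
        buffer += token
        if buffer.rstrip().endswith((".", "!", "?")):
            await tts_queue.put(buffer.strip())
            buffer = ""
    if buffer.strip():
        await tts_queue.put(buffer.strip())  # flush any trailing fragment
    await tts_queue.put(None)  # sentinel: no more text for this turn


async def tts_stage(tts_queue: asyncio.Queue, spoken: List[str]) -> None:
    """Stand-in for TTS: records each sentence it would synthesize."""
    while True:
        sentence = await tts_queue.get()
        if sentence is None:
            break
        spoken.append(sentence)


async def run_turn(prompt: str) -> List[str]:
    """Run the SLM and TTS stages concurrently for one conversational turn."""
    tts_queue: asyncio.Queue = asyncio.Queue()
    spoken: List[str] = []
    await asyncio.gather(slm_stage(prompt, tts_queue), tts_stage(tts_queue, spoken))
    return spoken


if __name__ == "__main__":
    print(asyncio.run(run_turn("Hi Maya")))
```

Splitting on sentence boundaries is only one possible chunking strategy; the point of the design is that TTS can begin speaking the first sentence while the SLM is still generating the rest.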
- Clone the repository to the robot's compute board or a connected local machine.
- Install dependencies:
pip install -r requirements.txt
- Configure Wake Word: Ensure your microphone device name is correctly set in WAKEWORD/config.json.
- Ollama: Ensure Ollama is installed and running with the appropriate model.
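Before starting the pipeline, it can help to confirm that Ollama is reachable. The sketch below is an optional check, not part of the repository: it assumes Ollama is listening on its default local address (http://localhost:11434) and queries the /api/tags endpoint, which lists locally available models. The function name `list_ollama_models` is hypothetical.

```python
import json
import urllib.error
import urllib.request
from typing import List, Optional


def list_ollama_models(base_url: str = "http://localhost:11434",
                       timeout: float = 2.0) -> Optional[List[str]]:
    """Return the names of local Ollama models, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a response that is not valid JSON.
        return None
    if not isinstance(payload, dict):
        return None
    return [m.get("name", "") for m in payload.get("models", [])]


if __name__ == "__main__":
    models = list_ollama_models()
    if models is None:
        print("Ollama is not reachable; start it before running main.py")
    else:
        print("Available models:", models)
```

If the model named in the tech stack does not appear in the output, it likely still needs to be pulled into Ollama before Maya can generate responses.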
To start the Maya voice pipeline, run:
python main.py
Once started, say "Maya" to interact with the robot!
- Local TTS: Add a local TTS system that matches the audio quality of the current cloud-based solution without relying on the cloud.
- Multilingual STT: Replace the current English-only STT model with a multilingual one.
- VLM Implementation: Integrate a Vision-Language Model (VLM) so Maya can "see" and understand its environment.
- Pure Autonomy: Work toward the robot operating fully autonomously.
- Enhanced Safety Measures: Implement stronger safety guardrails for physical movement and interaction.
[Specify License, e.g., MIT]
