Welcome to VisualTalk Junior, a high-performance, AI-driven voice companion specifically architected for early childhood engagement (ages 3–5). VisualTalk Junior is a child-friendly AI web application that creates a real-time voice conversation with young children based on a visual scene. The AI describes the image, asks simple questions, listens to the child’s responses, and rewards correct answers with interactive star feedback.
The project is designed to demonstrate natural human-AI interaction, real-time speech processing, and an engaging user experience for early childhood learning.
VisualTalk Junior creates an immersive linguistic environment where static imagery becomes a gateway to conversation. By leveraging cutting-edge large language models (LLMs) and native browser speech capabilities, it provides a safe, responsive, and educational experience that mimics the interaction style of a compassionate preschool teacher.
The repository follows a clean, decoupled full-stack architecture designed for seamless scalability and ease of deployment.
- `/api`: Serverless entry point for production. Contains the stateless, Lambda-style function that Vercel runs to handle requests to the AI service.
- `/backend`: Traditional Express-based server used for local development and testing. It mirrors the production API behavior.
- `/frontend`: The React core. A Vite-powered single-page application (SPA) that manages the user interface, state, and client-side voice processing.
- `vercel.json`: The routing layer that directs traffic between the frontend static assets and the serverless endpoints.
- `package.json`: The root manifest used to orchestrate builds across environments.
```
VisualTalk-Junior/
├── api/ # Production Serverless Backend
│ └── chat.js # Groq AI implementation
├── backend/ # Local Development Backend
│ ├── server.js # Entry point for local testing
│ └── .env # Sensitive environment variables
├── frontend/ # Desktop/Mobile User Interface
│ ├── src/
│ │ ├── components/ # Modular UI blocks (Image, Controls, Stars)
│ │ ├── services/ # Hardware abstraction (Speech API, Network)
│ │ └── assets/ # Visual design tokens and media
│ └── vite.config.js # Build optimization config
├── vercel.json # Cloud routing & rewrite rules
├── package.json # Root build scripts
└── .gitignore # Version control exclusions
```
The AI personality is strictly bounded by rules designed for 3–5-year-olds: it uses age-appropriate vocabulary, keeps sentences to 5–7 words, and maintains a warm "Teacher Persona."
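These rules would most likely be enforced through the system prompt sent with every request. The following sketch is hypothetical: the prompt wording and the names `buildSystemPrompt`, `sentenceWordCounts`, `sentencesWithinLimit`, `MIN_WORDS` and `MAX_WORDS` are illustrative assumptions, not the project's code. It also shows a cheap post-check that could flag a reply containing an over-long sentence (for example, to trigger a retry).

```typescript
// Hypothetical sketch: the persona rules as a system prompt plus a post-check.
// Prompt wording and all names here are illustrative assumptions.

const MIN_WORDS = 5; // lower end of the README's 5–7 word rule
const MAX_WORDS = 7; // upper end of the README's 5–7 word rule

// Builds the system prompt that would accompany every chat request.
function buildSystemPrompt(sceneDescription: string): string {
  return [
    "You are a warm, patient preschool teacher talking with a child aged 3 to 5.",
    "Use only simple, everyday words a young child knows.",
    `Keep each sentence between ${MIN_WORDS} and ${MAX_WORDS} words.`,
    "Never scold. If an answer is wrong, kindly give the right answer.",
    `The picture shows: ${sceneDescription}`,
  ].join("\n");
}

// Splits a reply into sentences on ".", "!" or "?" and counts words in each.
function sentenceWordCounts(reply: string): number[] {
  return reply
    .split(/[.!?]+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0)
    .map((s) => s.split(/\s+/).length);
}

// True when no sentence exceeds MAX_WORDS (short exclamations are allowed).
function sentencesWithinLimit(reply: string): boolean {
  return sentenceWordCounts(reply).every((n) => n <= MAX_WORDS);
}
```

Only the upper bound is checked, so short exclamations such as "Nice try!" still pass: the redirection example used by the Correction Engine yields sentence lengths of 2, 5 and 5 words and is accepted, while a 12-word sentence is rejected.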
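The system uses the browser's built-in Web Speech API for low-latency voice interaction:
- Recognition: The child's speech is transcribed by the browser, so no audio passes through the project's own backend. Where transcription actually happens depends on the browser; Chrome, for example, sends the audio to a cloud speech service.
- Synthesis: High-quality, female-leaning voices are selected so the AI sounds approachable.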
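Voice selection can be sketched as a small filter over the voice list. In the browser, the input would be `window.speechSynthesis.getVoices()`; the sketch below takes any objects with `name` and `lang` fields (the two `SpeechSynthesisVoice` properties it relies on) so it runs outside a browser. `pickVoice` and the list of female-voice name hints are hypothetical assumptions, not the project's code.

```typescript
// Hypothetical voice picker. VoiceLike mirrors the `name` and `lang`
// fields of the browser's SpeechSynthesisVoice.
interface VoiceLike {
  name: string; // e.g. "Microsoft Zira"
  lang: string; // BCP 47 tag, e.g. "en-US"
}

// Assumed name fragments that usually indicate a female-sounding voice.
const FEMALE_HINTS = ["female", "samantha", "zira", "victoria", "karen"];

function pickVoice<T extends VoiceLike>(voices: T[], langPrefix = "en"): T | undefined {
  // Keep only voices for the target language.
  const inLang = voices.filter((v) => v.lang.toLowerCase().startsWith(langPrefix));
  // Prefer a voice whose name suggests a female-leaning voice.
  const female = inLang.find((v) =>
    FEMALE_HINTS.some((hint) => v.name.toLowerCase().includes(hint)),
  );
  // Fall back to any voice in the language, then to the first voice at all.
  return female ?? inLang[0] ?? voices[0];
}
```

In the app this could be wired as `utterance.voice = pickVoice(speechSynthesis.getVoices()) ?? null;`. Some browsers load voices asynchronously, so the list may be empty until the `voiceschanged` event fires.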
Every interaction uses "scaffolding," a technique from early childhood education that builds a child's confidence step by step:
- The Hook: A warm, enthusiastic greeting.
- Contextualization: A simple summary of what the child is seeing.
- The Prompt: A single, focused question to encourage speech.
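One way to keep this three-part structure explicit is to have the model return each part separately and assemble them before speaking. The sketch below assumes that design; `ScaffoldedTurn` and `composeTurn` are hypothetical names. It enforces the "single, focused question" rule by counting question marks in the prompt.

```typescript
// Hypothetical sketch: assembling one scaffolded turn from its three parts.
interface ScaffoldedTurn {
  hook: string; // warm, enthusiastic greeting
  contextualization: string; // simple summary of what the child sees
  prompt: string; // exactly one focused question
}

function composeTurn(turn: ScaffoldedTurn): string {
  // Enforce the "single question" rule before the text is spoken.
  const questionCount = (turn.prompt.match(/\?/g) ?? []).length;
  if (questionCount !== 1) {
    throw new Error("The prompt must contain exactly one question.");
  }
  // Hook first, then context, then the question the child answers.
  return [turn.hook, turn.contextualization, turn.prompt].join(" ");
}
```

For example, a hook of "Hello, friend!", a summary of "I see a big red barn." and a prompt of "What animal lives there?" become one utterance ending in a single question.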
The application includes a specialized "Correction Engine." If a child gives an incorrect or unrelated answer, the system avoids negative feedback and instead redirects kindly: "Nice try! The dog is actually brown. What color do you see?"
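In the live app the model presumably judges answers itself, guided by its prompt. As a deterministic alternative, the sketch below matches the child's transcript against expected keywords, awards a star on a match, and otherwise builds a kind redirection in the style of the example above. `checkAnswer`, `Feedback`, the one-star reward and the reply templates are hypothetical.

```typescript
// Hypothetical sketch of kind redirection with star rewards.
interface Feedback {
  correct: boolean;
  stars: number; // stars to award for this answer
  reply: string; // what the AI says next
}

// Lowercases the transcript, drops punctuation, and splits it into words.
function normalize(text: string): string[] {
  return text.toLowerCase().replace(/[^a-z\s]/g, " ").split(/\s+/).filter(Boolean);
}

function checkAnswer(childAnswer: string, expected: string[], question: string): Feedback {
  const words = new Set(normalize(childAnswer));
  const correct = expected.some((e) => words.has(e.toLowerCase()));
  if (correct) {
    return { correct, stars: 1, reply: "Great job! You got it!" };
  }
  // Never scold: acknowledge the try, give the answer, re-ask gently.
  return { correct, stars: 0, reply: `Nice try! It is actually ${expected[0]}. ${question}` };
}
```

Matching works on single words only, so "It's brown!" counts as correct for `["brown"]`, while multi-word answers such as "light brown" would need a different check.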
The technology stack is selected for speed, cost-efficiency, and a "premium" feel.
```mermaid
graph TD
A[React SPA] -->|JSON/POST| B[Vercel Function]
B -->|Groq Protocol| C[Llama 3.3 70B]
C -->|Completion| B
B -->|Stream/JSON| A
A -->|Hardware Access| D[Microphone/Speakers]
```
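Groq exposes an OpenAI-compatible chat completions endpoint, so the Vercel function's call to Llama 3.3 70B can be sketched as a plain HTTPS request. The helper below only builds the request, which keeps it testable; `buildGroqRequest`, the `temperature` and `max_tokens` values, and the model id `llama-3.3-70b-versatile` are assumptions to verify against Groq's documentation.

```typescript
// Hypothetical sketch of the request the Vercel function could send to Groq.
const GROQ_URL = "https://api.groq.com/openai/v1/chat/completions";
const MODEL = "llama-3.3-70b-versatile"; // assumed Groq model id

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface GroqRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string; // JSON-encoded payload
}

function buildGroqRequest(apiKey: string, systemPrompt: string, history: ChatMessage[]): GroqRequest {
  const payload = {
    model: MODEL,
    // The persona prompt always goes first, followed by the conversation so far.
    messages: [{ role: "system", content: systemPrompt }, ...history],
    temperature: 0.7, // some variety while staying on topic (assumed value)
    max_tokens: 120, // short replies suit young children (assumed value)
  };
  return {
    url: GROQ_URL,
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`, // the key stays on the server, never in the SPA
      "Content-Type": "application/json",
    },
    body: JSON.stringify(payload),
  };
}
```

On the server the result would be sent with `fetch(r.url, { method: r.method, headers: r.headers, body: r.body })`, and the reply text read from `choices[0].message.content` in the JSON response.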
- Groq AI: Serves Llama 3.3 70B on Groq's LPU (Language Processing Unit) inference hardware, keeping response times short enough for a natural back-and-forth conversation.
- Tailwind CSS: A utility-first CSS framework used to build the glassmorphic, responsive layout.
- Framer Motion: Smooth entry/exit animations for stars and UI transitions.
VisualTalk Junior can be run locally in three stages: check the prerequisites, start the backend, and start the frontend.
- Runtime: Node.js v18.x or above.
- API Access: An active Groq API Key (Secret).
- Navigate to the backend directory: `cd backend`
- Install dependencies: `npm install`
- Configure the environment: create a `.env` file and add `GROQ_API_KEY=your_secret_key_here`
- Boot the server: `npm start` (runs on port 3000)
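A missing `GROQ_API_KEY` would otherwise only surface later as a failed request to Groq. A small startup check makes the problem obvious immediately; the sketch below is hypothetical (`requireEnv` is not part of the project), and loading `.env` with a package such as dotenv is an assumption about `server.js`.

```typescript
// Hypothetical sketch: fail fast at startup when the Groq key is missing.
// In server.js the `env` argument would be process.env, after .env has been
// loaded (for example with the dotenv package; an assumption).
function requireEnv(env: Record<string, string | undefined>, key: string): string {
  const value = env[key]?.trim();
  if (!value) {
    // Point the developer at the exact line this README asks them to add.
    throw new Error(`Missing ${key}. Add "${key}=your_secret_key_here" to backend/.env`);
  }
  return value;
}
```

Usage would be a single line near the top of the server, such as `const apiKey = requireEnv(process.env, "GROQ_API_KEY");`. The same check applies on Vercel, where the variable comes from the dashboard instead of `.env`.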
- Open a new terminal and navigate to the frontend directory: `cd frontend`
- Install dependencies: `npm install`
- Launch the development server: `npm run dev`
- Access the app at http://localhost:5173
This repository is optimized for deployment on the Vercel Hobby Plan.
- Click "New Project" in Vercel and import your GitHub repository.
To ensure the hybrid frontend/backend builds correctly, use these settings in the Vercel dashboard:
- Framework Preset: `Other` (or let Vercel auto-detect Vite)
- Root Directory: `./` (leave as default)
- Build Command: `npm run build`
- Output Directory: `frontend/dist`
- Install Command: `npm install`
Add the following secret in the Environment Variables section:
- Key: `GROQ_API_KEY`
- Value: your Groq API key
Click Deploy. Vercel will build your React app and automatically host the serverless function in the /api directory.
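With Vercel's Node.js runtime, a file in `/api` can export a handler with an Express-style `(req, res)` signature, where Vercel provides helpers such as `req.body` and `res.status().json()`. The sketch below suggests one possible shape for `api/chat.js`, written in TypeScript. The `{ messages }` request body, the `{ reply }` response, the status codes and all names are hypothetical; the model call is injected as `callModel` so the handler can be tested without a network.

```typescript
// Hypothetical sketch of api/chat.js. Req and Res are minimal stand-ins
// for Vercel's request/response types, and every name is an assumption.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}
interface Req {
  method?: string;
  body?: unknown; // Vercel parses JSON bodies into req.body
}
interface Res {
  status(code: number): Res;
  json(body: unknown): void;
}
// Injected model call, e.g. a fetch to Groq's chat completions endpoint.
type CallModel = (messages: ChatMessage[]) => Promise<string>;

// Checks that the client sent a well-formed list of chat turns.
function isMessageList(value: unknown): value is ChatMessage[] {
  return (
    Array.isArray(value) &&
    value.every(
      (m) => typeof m?.content === "string" && (m.role === "user" || m.role === "assistant"),
    )
  );
}

function createHandler(callModel: CallModel) {
  return async function handler(req: Req, res: Res): Promise<void> {
    if (req.method !== "POST") {
      res.status(405).json({ error: "Method not allowed" });
      return;
    }
    const messages = (req.body as { messages?: unknown } | undefined)?.messages;
    if (!isMessageList(messages)) {
      res.status(400).json({ error: "Expected a JSON body like { messages: [...] }" });
      return;
    }
    try {
      // The system prompt would be added inside callModel, on the server.
      const reply = await callModel(messages);
      res.status(200).json({ reply });
    } catch {
      // Return a generic error; upstream details (and the key) stay server-side.
      res.status(502).json({ error: "The AI service is unavailable." });
    }
  };
}
```

Accepting only `user` and `assistant` turns from the browser, and adding the system prompt on the server, means the SPA cannot change the persona rules.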
- Permissions: The browser will request Microphone access. Click "Allow."
- Engagement: Click the "START TALKING" button to begin the session.
- Response Cycle: Wait for the "Listening..." status to appear before the child speaks.
- Conclusion: Use the "I am done! 👋" button to end the conversation early.
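The usage steps above imply a simple turn-taking cycle. The state machine below is a hypothetical sketch of it: the state and event names are assumptions, and only the "START TALKING", "Listening..." and "I am done! 👋" labels come from the UI.

```typescript
// Hypothetical sketch of the session's turn-taking cycle.
type Status = "idle" | "speaking" | "listening" | "thinking" | "done";
type SessionEvent = "start" | "spoken" | "heard" | "replied" | "finish";

const TRANSITIONS: Record<Status, Partial<Record<SessionEvent, Status>>> = {
  idle: { start: "speaking" }, // "START TALKING" pressed
  speaking: { spoken: "listening" }, // AI finished talking; show "Listening..."
  listening: { heard: "thinking" }, // child's answer transcribed
  thinking: { replied: "speaking" }, // AI reply received from /api
  done: {},
};

function next(status: Status, event: SessionEvent): Status {
  // "I am done! 👋" ends the session from any active state.
  if (event === "finish" && status !== "done") return "done";
  // Events that do not apply in the current state are ignored.
  return TRANSITIONS[status][event] ?? status;
}
```

Opening the microphone only in the `listening` state presumably keeps recognition from picking up the AI's own voice, which is why the child should wait for "Listening..." before speaking.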
Distributed under the MIT License. Developed for the next generation of digital learning.