Speech Training Application - System Architecture Pipeline

Complete User Flow Diagram

┌─────────────────────────────────────────────────────────────────────────┐
│                          USER INTERFACE (React)                          │
│                         Hosted on Vercel                                 │
└────────────────────────────────┬────────────────────────────────────────┘
                                 │
                                 │ User visits app
                                 ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                      AUTHENTICATION LAYER                                │
│                         Supabase Auth                                    │
│  • Login/Register                                                        │
│  • JWT token management                                                  │
│  • Session handling                                                      │
└────────────────────────────────┬────────────────────────────────────────┘
                                 │
                                 │ Authenticated
                                 ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                         PRACTICE SESSION START                           │
│                                                                          │
│  Frontend displays:                                                      │
│  • Target sentence: "Think about the weather"                           │
│  • [Optional] Play TTS example (ElevenLabs)                             │
│  • Record button                                                         │
└────────────────────────────────┬────────────────────────────────────────┘
                                 │
                                 │ User clicks "Record"
                                 ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                      AUDIO CAPTURE (Frontend)                            │
│                                                                          │
│  Web Audio API + MediaRecorder API                                      │
│  • Start recording from microphone                                       │
│  • Real-time waveform visualization (WaveSurfer.js)                     │
│  • User speaks: "Fink about the wedder"                                 │
│  • Stop recording                                                        │
│  • Convert to WAV/MP3 format                                            │
└────────────────────────────────┬────────────────────────────────────────┘
                                 │
                                 │ POST /api/v1/practice/submit-audio
                                 │ Payload: { audio: File, reference_text: "..." }
                                 ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                      BACKEND API (FastAPI)                               │
│                     Hosted on Railway                                    │
│                                                                          │
│  1. Validate request (auth token, file size, format)                    │
│  2. Upload audio to storage                                             │
│  3. Initialize session tracking                                         │
└────────────────────────────────┬────────────────────────────────────────┘
                                 │
                                 │ Audio file ready
                                 ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                    AUDIO PREPROCESSING LAYER                             │
│                                                                          │
│  Python libraries: pydub, soundfile                                     │
│  • Normalize audio volume                                               │
│  • Remove leading/trailing silence                                      │
│  • Resample to 16 kHz (standard for speech)                             │
│  • Convert to required format                                           │
└────────────────────────────────┬────────────────────────────────────────┘
                                 │
                                 │ Preprocessed audio
                                 ▼
          ┌──────────────────────┴──────────────────────┐
          │                                              │
          ▼                                              ▼
┌──────────────────────────┐              ┌──────────────────────────────┐
│   WHISPER API (OpenAI)   │              │  AZURE SPEECH SERVICES       │
│                          │              │  Pronunciation Assessment    │
│  • Transcription         │              │                              │
│  • Word-level timestamps │              │  Input:                      │
│  • Confidence scores     │              │  • Audio file                │
│                          │              │  • Reference text:           │
│  Returns:                │              │    "Think about the weather" │
│  "Fink about the wedder" │              │                              │
└────────────┬─────────────┘              │  Returns:                    │
             │                            │  • Overall accuracy: 68/100  │
             │                            │  • Phoneme-level scores:     │
             │                            │    - θ → f (score: 31)       │
             │                            │    - ð → d (score: 38)       │
             │                            │  • Word-level breakdown      │
             │                            │  • Error types               │
             │                            └──────────────┬───────────────┘
             │                                           │
             │                                           │
             └─────────────────┬─────────────────────────┘
                               │
                               │ Both results combined
                               ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                    ERROR ANALYSIS ENGINE (Backend)                       │
│                         Python + Business Logic                          │
│                                                                          │
│  1. Compare Whisper transcription vs reference text                     │
│  2. Parse Azure phoneme scores                                          │
│  3. Identify error patterns:                                            │
│     • Phoneme substitutions (θ→f, ð→d)                                  │
│     • Omissions                                                          │
│     • Timing/duration issues                                             │
│     • Prosody problems                                                   │
│                                                                          │
│  4. Query user history from database:                                   │
│     • Past phoneme performance                                           │
│     • Improvement trends                                                 │
│     • Recurring error patterns                                           │
│                                                                          │
│  5. Classify impediment type:                                            │
│     • Th-fronting/stopping (θ→f, ð→d pattern)                           │
│     • Rhotacism (r→w)                                                    │
│     • Etc.                                                               │
└────────────────────────────────┬────────────────────────────────────────┘
                                 │
                                 │ Structured error report
                                 ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                    DATABASE UPDATE (Supabase)                            │
│                          PostgreSQL                                      │
│                                                                          │
│  INSERT INTO phoneme_performance:                                       │
│  • user_id, session_id, timestamp                                       │
│  • phoneme, accuracy_score, error_type                                  │
│  • word, position                                                        │
│                                                                          │
│  UPDATE user_progress:                                                  │
│  • sessions_completed++                                                  │
│  • overall_score_trend                                                   │
│  • current_difficulty_level                                              │
└────────────────────────────────┬────────────────────────────────────────┘
                                 │
                                 │ Data saved
                                 ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                  AI FEEDBACK GENERATION (Claude API)                     │
│                      Anthropic Claude Sonnet 4                           │
│                                                                          │
│  Input payload:                                                          │
│  {                                                                       │
│    "target_sentence": "Think about the weather",                        │
│    "user_transcription": "Fink about the wedder",                       │
│    "phoneme_errors": [                                                   │
│      {"phoneme": "θ", "actual": "f", "score": 31, "word": "think"},    │
│      {"phoneme": "ð", "actual": "d", "score": 38, "word": "weather"}   │
│    ],                                                                    │
│    "historical_patterns": {                                              │
│      "θ_substitution_rate": 0.87,                                       │
│      "sessions_completed": 12,                                           │
│      "improvement_trend": "slight"                                       │
│    },                                                                    │
│    "user_difficulty_level": 2                                            │
│  }                                                                       │
│                                                                          │
│  Claude analyzes and generates:                                         │
│  • Personalized, encouraging feedback                                    │
│  • 5 adaptive practice sentences (progressive difficulty)               │
│  • Specific articulation tips                                            │
│  • Difficulty adjustment recommendation                                  │
└────────────────────────────────┬────────────────────────────────────────┘
                                 │
                                 │ Claude response received
                                 ▼
┌─────────────────────────────────────────────────────────────────────────┐
│              TEXT-TO-SPEECH GENERATION (ElevenLabs)                      │
│                          Optional Step                                   │
│                                                                          │
│  If user needs to hear correct pronunciation:                           │
│  • Generate TTS for problem words: "think", "the", "weather"           │
│  • Generate TTS for next practice sentences                             │
│  • Use appropriate accent (en-US, en-GB, etc.)                          │
│  • Store audio URLs                                                      │
└────────────────────────────────┬────────────────────────────────────────┘
                                 │
                                 │ TTS audio ready
                                 ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                    RESPONSE ASSEMBLY (Backend)                           │
│                                                                          │
│  Compile complete response:                                             │
│  {                                                                       │
│    "session_id": "uuid",                                                │
│    "overall_score": 68,                                                 │
│    "accuracy_breakdown": {                                              │
│      "pronunciation": 65,                                               │
│      "fluency": 82,                                                     │
│      "completeness": 100                                                │
│    },                                                                    │
│    "errors": [                                                           │
│      {                                                                   │
│        "phoneme": "θ",                                                  │
│        "word": "think",                                                  │
│        "score": 31,                                                      │
│        "feedback": "Try placing tongue between teeth"                   │
│      }                                                                   │
│    ],                                                                    │
│    "ai_feedback": {                                                      │
│      "text": "Great effort! I noticed you're substituting...",         │
│      "encouragement": "You're making progress on fluency!"              │
│    },                                                                    │
│    "practice_sentences": [                                               │
│      "The cat is here.",                                                │
│      "I think that's right.",                                           │
│      "This thing is smooth.",                                           │
│      "Three brothers thought about it.",                                │
│      "The weather is thoroughly unpredictable."                         │
│    ],                                                                    │
│    "tts_urls": {                                                         │
│      "correct_example": "https://storage.../correct.mp3",              │
│      "practice_1": "https://storage.../p1.mp3"                         │
│    },                                                                    │
│    "visual_data": {                                                      │
│      "phoneme_chart": [...],                                            │
│      "waveform_data": [...],                                            │
│      "progress_history": [...]                                          │
│    }                                                                     │
│  }                                                                       │
└────────────────────────────────┬────────────────────────────────────────┘
                                 │
                                 │ Return response to frontend
                                 ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                    FRONTEND RESULTS DISPLAY                              │
│                                                                          │
│  User sees:                                                              │
│  ┌───────────────────────────────────────────────────────────┐         │
│  │  Your Score: 68/100                                       │         │
│  │  ⭐⭐⭐☆☆                                                  │         │
│  │                                                            │         │
│  │  Great effort! I noticed you're substituting 'th'         │         │
│  │  sounds with 'f' and 'd'. This is very common and         │         │
│  │  completely fixable with practice.                        │         │
│  │                                                            │         │
│  │  Problem Areas:                                            │         │
│  │  🔴 "think" - θ sound (31/100)                            │         │
│  │     [Play Correct] [Play Your Recording]                  │         │
│  │                                                            │         │
│  │  💡 Tip: Place your tongue between your teeth and         │         │
│  │  blow air gently...                                        │         │
│  │                                                            │         │
│  │  📊 Progress Chart [Recharts visualization]               │         │
│  │                                                            │         │
│  │  Next Practice Sentences:                                 │         │
│  │  1. "The cat is here." [▶️ Listen] [🎤 Record]           │         │
│  │  2. "I think that's right." [▶️ Listen] [🎤 Record]      │         │
│  │  ...                                                       │         │
│  └────────────────────────────────────────────────────────────┘        │
└─────────────────────────────────────────────────────────────────────────┘
                                 │
                                 │ User continues practice
                                 ▼
               [Loop back to Practice Session Start]
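
Step 1 of the Error Analysis Engine (comparing the Whisper transcription against the reference text) can be sketched as a word-level alignment. This is a minimal sketch, assuming a plain difflib alignment is adequate for short practice sentences. The names tokenize and find_word_mismatches are hypothetical and not taken from the repository, and the real engine would also use the Azure phoneme scores.

```python
import difflib
import re
from itertools import zip_longest


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def find_word_mismatches(reference: str, transcription: str) -> list[tuple[str, str]]:
    """Return (expected, heard) pairs for every word that differs.

    An omitted word is reported with "" as the heard word, and an
    extra word is reported with "" as the expected word.
    """
    ref_words = tokenize(reference)
    hyp_words = tokenize(transcription)
    matcher = difflib.SequenceMatcher(a=ref_words, b=hyp_words, autojunk=False)

    mismatches: list[tuple[str, str]] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # For replace/delete/insert, pair the words position by position
        # and pad the shorter side with an empty string.
        pairs = zip_longest(ref_words[i1:i2], hyp_words[j1:j2], fillvalue="")
        mismatches.extend(pairs)
    return mismatches


if __name__ == "__main__":
    print(find_word_mismatches("Think about the weather", "Fink about the wedder"))
```

For the example sentence in the diagram, this reports "think" heard as "fink" and "weather" heard as "wedder". Word pairs like these only say where the speaker deviated; which phoneme was substituted still has to come from the pronunciation assessment.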

Data Flow Diagram

┌──────────────┐
│    USER      │
└──────┬───────┘
       │
       │ 1. Speaks into microphone
       ▼
┌──────────────────────┐
│   Web Audio API      │
│   (Browser)          │
│   • Capture audio    │
│   • Real-time visual │
└──────┬───────────────┘
       │
       │ 2. Audio blob (WAV/MP3)
       ▼
┌──────────────────────┐
│   React Frontend     │
│   • FormData prep    │
│   • Loading states   │
└──────┬───────────────┘
       │
       │ 3. HTTP POST with audio file + metadata
       ▼
┌──────────────────────────────────────────────────────────┐
│              FastAPI Backend (Railway)                   │
│                                                           │
│  Rate Limiting → Auth Check → File Validation            │
└──────┬───────────────────────────────────────────────────┘
       │
       │ 4. Save audio to storage
       ▼
┌──────────────────────┐
│  Supabase Storage    │
│  • Store audio file  │
│  • Generate URL      │
└──────┬───────────────┘
       │
       │ 5. Audio URL
       ▼
┌──────────────────────────────────────────┐
│      Parallel API Calls                  │
│                                           │
│  ┌─────────────┐      ┌─────────────┐   │
│  │   Whisper   │      │   Azure     │   │
│  │   (STT)     │      │   (Assess)  │   │
│  └──────┬──────┘      └──────┬──────┘   │
│         │                    │           │
│         │                    │           │
└─────────┼────────────────────┼───────────┘
          │                    │
          │ 6. Transcription   │ 7. Phoneme scores
          │                    │
          └────────┬───────────┘
                   │
                   ▼
          ┌────────────────┐
          │ Error Analysis │
          │ (Python Logic) │
          └────────┬───────┘
                   │
                   │ 8. Error patterns identified
                   ▼
          ┌────────────────┐
          │   PostgreSQL   │
          │   (Supabase)   │
          │ • Save results │
          │ • Get history  │
          └────────┬───────┘
                   │
                   │ 9. Historical data + current errors
                   ▼
          ┌────────────────┐
          │   Claude API   │
          │ • Generate     │
          │   feedback     │
          │ • Create       │
          │   sentences    │
          └────────┬───────┘
                   │
                   │ 10. Personalized response
                   ▼
          ┌────────────────┐
          │  ElevenLabs    │
          │  (Optional)    │
          │ • TTS for      │
          │   examples     │
          └────────┬───────┘
                   │
                   │ 11. Audio URLs
                   ▼
          ┌────────────────┐
          │  JSON Response │
          │  Assembly      │
          └────────┬───────┘
                   │
                   │ 12. Complete response object
                   ▼
          ┌────────────────┐
          │ React Frontend │
          │ • Display UI   │
          │ • Update state │
          │ • Show results │
          └────────────────┘
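
The "Parallel API Calls" box (steps 6 and 7) can be sketched with asyncio. This is a sketch under stated assumptions: transcribe_stub and assess_stub are hypothetical placeholders that return the example values from the diagram instead of calling Whisper or Azure, and analyze_audio is an illustrative name, not a function from the repository.

```python
import asyncio


async def transcribe_stub(audio_url: str) -> str:
    """Placeholder for the speech-to-text call; returns the example transcription."""
    await asyncio.sleep(0.01)  # stands in for network latency
    return "Fink about the wedder"


async def assess_stub(audio_url: str, reference_text: str) -> dict:
    """Placeholder for the pronunciation assessment call."""
    await asyncio.sleep(0.01)  # stands in for network latency
    return {"overall_accuracy": 68, "reference_text": reference_text}


async def analyze_audio(audio_url: str, reference_text: str) -> dict:
    """Run both service calls concurrently and merge their results."""
    transcription, assessment = await asyncio.gather(
        transcribe_stub(audio_url),
        assess_stub(audio_url, reference_text),
    )
    return {"transcription": transcription, "assessment": assessment}


if __name__ == "__main__":
    # The URL is a made-up example value.
    print(asyncio.run(analyze_audio("https://example.invalid/audio.wav",
                                    "Think about the weather")))
```

Because both calls depend only on the stored audio, running them concurrently keeps the total wait close to the slower of the two rather than their sum.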

Storage & Database Flow

┌─────────────────────────────────────────────────────────┐
│                  SUPABASE (All-in-One)                  │
├─────────────────────────────────────────────────────────┤
│                                                          │
│  ┌──────────────────────┐  ┌─────────────────────┐    │
│  │   PostgreSQL DB      │  │   Storage Buckets   │    │
│  │                      │  │                     │    │
│  │  Tables:             │  │  Buckets:           │    │
│  │  • users             │  │  • audio-uploads/   │    │
│  │  • sessions          │  │  • tts-generated/   │    │
│  │  • phoneme_performance│ │                     │    │
│  │  • practice_sentences│  │  Auto-delete after  │    │
│  │  • user_progress     │  │  30 days (GDPR)     │    │
│  │  • impediment_profiles│ │                     │    │
│  └──────────┬───────────┘  └─────────┬───────────┘    │
│             │                        │                 │
│             ▼                        ▼                 │
│  ┌──────────────────────────────────────────────┐    │
│  │        Supabase Realtime (Optional)          │    │
│  │  • Live progress updates                     │    │
│  │  • Multi-device sync                         │    │
│  └──────────────────────────────────────────────┘    │
│                                                        │
│  ┌──────────────────────────────────────────────┐    │
│  │            Supabase Auth                      │    │
│  │  • JWT tokens                                 │    │
│  │  • Row-level security                         │    │
│  │  • OAuth providers                            │    │
│  └──────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────┘
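
The phoneme_performance table and the "Get history" query can be illustrated with a small, self-contained sketch. It uses an in-memory SQLite database as a stand-in for Supabase PostgreSQL so that it runs without any setup. The column names come from the diagram above, but the column types and the helper names (create_schema, record_phoneme, average_score_by_phoneme) are assumptions.

```python
import sqlite3
from datetime import datetime, timezone


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a phoneme_performance table with assumed column types."""
    conn.execute(
        """
        CREATE TABLE phoneme_performance (
            user_id        TEXT NOT NULL,
            session_id     TEXT NOT NULL,
            timestamp      TEXT NOT NULL,
            phoneme        TEXT NOT NULL,
            accuracy_score REAL NOT NULL,
            error_type     TEXT,
            word           TEXT,
            position       INTEGER
        )
        """
    )


def record_phoneme(conn, user_id, session_id, phoneme,
                   accuracy_score, error_type, word, position) -> None:
    """Insert one scored phoneme, stamped with the current UTC time."""
    conn.execute(
        "INSERT INTO phoneme_performance VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (user_id, session_id, datetime.now(timezone.utc).isoformat(),
         phoneme, accuracy_score, error_type, word, position),
    )


def average_score_by_phoneme(conn, user_id) -> dict:
    """Return {phoneme: mean accuracy score} across a user's history."""
    rows = conn.execute(
        "SELECT phoneme, AVG(accuracy_score) FROM phoneme_performance "
        "WHERE user_id = ? GROUP BY phoneme",
        (user_id,),
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    record_phoneme(conn, "u1", "s1", "θ", 31, "substitution", "think", 0)
    record_phoneme(conn, "u1", "s1", "ð", 38, "substitution", "weather", 3)
    print(average_score_by_phoneme(conn, "u1"))
```

A per-phoneme average like this is one simple way to derive the "past phoneme performance" that the error analysis step reads back. Trend and recurrence metrics would need the timestamp column as well.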

Complete Technology Stack Mapping

┌──────────────────────────────────────────────────────────────┐
│                     PRESENTATION LAYER                        │
│  React + TypeScript + Tailwind + shadcn/ui                   │
│  Hosted on: Vercel                                           │
└─────────────────────────┬────────────────────────────────────┘
                          │
                          │ HTTPS/WebSocket
                          ▼
┌──────────────────────────────────────────────────────────────┐
│                     APPLICATION LAYER                         │
│  FastAPI (Python 3.11+)                                      │
│  Hosted on: Railway/Render                                   │
└─────────────────────────┬────────────────────────────────────┘
                          │
          ┌───────────────┼───────────────┐
          │               │               │
          ▼               ▼               ▼
┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│   AI Layer   │  │  Data Layer  │  │ Storage Layer│
│              │  │              │  │              │
│ • Whisper    │  │ • PostgreSQL │  │ • Supabase   │
│ • Azure      │  │ • Redis      │  │   Storage    │
│ • Claude     │  │   (cache)    │  │ • Cloudflare │
│ • ElevenLabs │  │ • Supabase   │  │   R2         │
└──────────────┘  └──────────────┘  └──────────────┘
