feat: Add interview pipeline, TTS, and transcription endpoints by mlarsen-source · Pull Request #5 · mlarsen-source/Interview-Coach

mlarsen-source · 2026-06-01T07:50:06Z

Summary

Implements the core interview pipeline: POST /speech/tts streams Groq Orpheus audio for the interviewer voice, and POST /speech/transcribe sends audio to Groq Whisper for segment-level transcription, then scores each segment with the local wav2vec2 emotion model. Introduces InterviewClient, a full-session interview UI that replaces the home page placeholder, with interviewer persona selection, voice activity detection, automatic recording flow, and per-segment arousal/dominance/valence display. Moves AGENTS.md and CLAUDE.md from frontend/ to the repository root, expands AGENTS.md to project-wide scope, and adds docs/architecture.md with a full system data-flow diagram. Updates all setup documentation and agent workflows to reflect Groq as the speech provider and the one-time Orpheus terms-acceptance requirement.

What's new

| Surface | Status | Layer |
| --- | --- | --- |
| `POST /speech/tts` | new | Backend |
| `POST /speech/transcribe` | new (was empty TODO) | Backend |
| Interview UI (`/`) | new | Frontend |
| `/dev/transcribe` debug page | new | Frontend |
| Question bank + interviewer personas | new | Frontend data |
| `AGENTS.md` / `CLAUDE.md` | moved from `frontend/` to root, expanded | Config |
| `docs/architecture.md` | new | Documentation |
| `docs/agent-workflows/run-project.md` | new | Documentation |

File-by-file changes

Backend — API

backend/app.py: Added OPTIONS to allow_methods in CORS middleware so multipart file-upload preflight requests from the browser are not blocked.
backend/requirements.txt: Added groq>=0.9.0 as the Groq SDK dependency; updated the speech-to-text comment to reflect Groq Whisper replacing OpenAI.
backend/services/speech_to_text/router.py: Replaced the empty TODO stub with full implementations of POST /speech/tts (Groq Orpheus → WAV) and POST /speech/transcribe (Groq Whisper segmentation + per-segment wav2vec2 scoring); uses os.environ.get for the API key and raises HTTP 503 at request time when the key is absent so a missing key does not crash backend startup.

Frontend — Interview UI

frontend/app/InterviewClient.tsx: New client component implementing the full interview state machine — idle, playing (TTS), recording (MediaRecorder + Web Audio VAD), processing, and done — with interviewer selection, abort-signal-gated fetch calls to /speech/tts and /speech/transcribe, and per-session used-question tracking via a caller-owned Set ref.
frontend/app/InterviewClient.module.css: New module stylesheet for InterviewClient; all color values reference semantic tokens defined in globals.css.
frontend/app/page.tsx: Replaced the static placeholder home page with InterviewClient so the root route is the live interview UI.
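
For reviewers who want a mental model of the interview flow, here is a minimal sketch of the five states as a pure transition function. The state names come from the description above; the event names and the transition table are assumptions made for illustration and may not match what InterviewClient.tsx actually does.

```typescript
// Hypothetical sketch: state names are from the PR description; the events
// and the transition table are assumptions, not the real implementation.
type InterviewState = "idle" | "playing" | "recording" | "processing" | "done";

type InterviewEvent =
  | "START"            // user starts the session
  | "TTS_ENDED"        // interviewer audio finished playing
  | "SILENCE_DETECTED" // VAD decided the answer is over
  | "SEGMENTS_READY"   // /speech/transcribe returned scored segments
  | "NEXT_QUESTION"    // continue with another question
  | "RESET";           // abandon the session

const transitions: Record<
  InterviewState,
  Partial<Record<InterviewEvent, InterviewState>>
> = {
  idle: { START: "playing" },
  playing: { TTS_ENDED: "recording", RESET: "idle" },
  recording: { SILENCE_DETECTED: "processing", RESET: "idle" },
  processing: { SEGMENTS_READY: "done", RESET: "idle" },
  done: { NEXT_QUESTION: "playing", RESET: "idle" },
};

// Returns the next state, or the current state when the event is not valid here.
function nextState(state: InterviewState, event: InterviewEvent): InterviewState {
  return transitions[state][event] ?? state;
}
```

Keeping the transitions in a table means an out-of-order event (for example a late `TTS_ENDED` after a reset) is ignored instead of moving the UI into an inconsistent state.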

Frontend — Dev page

frontend/app/dev/transcribe/page.tsx: New route entry point for the transcription debug page at /dev/transcribe.
frontend/app/dev/transcribe/TranscribeDevClient.tsx: New dev-only client component that records audio in accumulated WebM chunks and sends each to POST /speech/transcribe, displaying sent-chunk status and returned segments side by side in real time.
frontend/app/dev/transcribe/TranscribeDevClient.module.css: New module stylesheet for the dev transcribe page; all color values reference semantic tokens.
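
One reading of "accumulated WebM chunks" is that each upload contains every chunk recorded so far, not only the newest one. That would make sense because MediaRecorder timeslice chunks after the first typically lack the WebM header and cannot be decoded on their own. The sketch below illustrates that idea with a hypothetical `ChunkAccumulator` helper; it uses `Uint8Array` in place of the `Blob` chunks a real recorder emits, and it is not taken from TranscribeDevClient.tsx.

```typescript
// Hypothetical helper, for illustration only. Uint8Array stands in for the
// Blob chunks that a MediaRecorder would deliver.
class ChunkAccumulator {
  private chunks: Uint8Array[] = [];

  // Stores the new chunk and returns one buffer holding all chunks so far,
  // which is what would be uploaded to POST /speech/transcribe.
  push(chunk: Uint8Array): Uint8Array {
    this.chunks.push(chunk);
    const total = this.chunks.reduce((n, c) => n + c.length, 0);
    const payload = new Uint8Array(total);
    let offset = 0;
    for (const c of this.chunks) {
      payload.set(c, offset); // copy each chunk in recording order
      offset += c.length;
    }
    return payload;
  }
}
```

The trade-off is that every request resends earlier audio, so payloads grow over the recording; for a dev-only debugging page that cost is probably acceptable.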

Frontend — Styles

frontend/app/globals.css: Added 18 semantic color tokens (--color-surface, --color-danger, --color-muted, etc.) so component module files reference tokens rather than hard-coded hex values.

Frontend — Data

frontend/lib/prompts/interviewers.ts: New module defining the six Groq Orpheus interviewer personas (name, title, voice name) and the default selection.
frontend/lib/prompts/questions.ts: New module with a question bank (1 intro, 10 technical, 10 behavioral) and pickRandomQuestion(usedIds), which accepts a caller-owned Set to avoid module-level mutable state.
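
To show why a caller-owned Set avoids module-level mutable state, here is a minimal sketch of how `pickRandomQuestion(usedIds)` could work. Only the function name and the Set parameter come from this PR; the `Question` shape, the sample questions, the `null` return on exhaustion, and the choice to add the picked id inside the function are all assumptions.

```typescript
// Hypothetical shapes: field names and sample questions are illustrative only.
type Category = "intro" | "technical" | "behavioral";

interface Question {
  id: string;
  category: Category;
  text: string;
}

const QUESTIONS: readonly Question[] = [
  { id: "intro-1", category: "intro", text: "Tell me about yourself." },
  { id: "tech-1", category: "technical", text: "Explain how a hash map works." },
  { id: "beh-1", category: "behavioral", text: "Describe a conflict you resolved." },
];

// The caller owns usedIds (for example in a React ref), so this module keeps
// no mutable state and separate sessions never share history.
function pickRandomQuestion(usedIds: Set<string>): Question | null {
  const remaining = QUESTIONS.filter((q) => !usedIds.has(q.id));
  if (remaining.length === 0) return null; // bank exhausted for this session
  const choice = remaining[Math.floor(Math.random() * remaining.length)];
  usedIds.add(choice.id); // record the pick in the caller's set
  return choice;
}
```

A component could hold `new Set<string>()` in a ref and pass it on every call; starting a new session then only requires creating a fresh Set.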

Agent configuration

frontend/AGENTS.md: Deleted — file moved to repository root and expanded to project-wide scope.
frontend/CLAUDE.md: Deleted — file moved to repository root.
AGENTS.md: New at repository root; expanded from frontend-only conventions to project-wide scope covering pipeline architecture, backend endpoints, Groq Orpheus setup requirement, run-project trigger phrases, and environment variable table.
CLAUDE.md: New at repository root; thin pointer to AGENTS.md, replacing the deleted frontend/CLAUDE.md.

Documentation

README.md: Updated setup steps (added numbered Groq Orpheus terms-acceptance step with symptom description), updated repo structure tree, and updated stack table to reflect Groq Whisper and Orpheus as the speech provider.
backend/.env.example: Added GROQ_API_KEY entry with console URL; reorganized comments to reflect Groq as the primary speech dependency.
backend/README.md: Added POST /speech/tts endpoint entry and full request/response documentation; updated POST /speech/transcribe status to done with curl example; added score-interpretation table.
docs/architecture.md: New document with full system diagram, step-by-step pipeline walkthrough for all seven stages, data schemas, key design decisions, environment variable table, and local dev URLs.
docs/agent-workflows/README.md: Updated references from frontend/AGENTS.md to AGENTS.md; added run-project.md to the workflow table.
docs/agent-workflows/code-quality-review.md: Updated frontend/AGENTS.md references to AGENTS.md.
docs/agent-workflows/css-and-component-standards-review.md: Updated frontend/AGENTS.md references to AGENTS.md.
docs/agent-workflows/feature-implementation-planning.md: Updated frontend/AGENTS.md references to AGENTS.md.
docs/agent-workflows/figma-design-to-code.md: Updated frontend/AGENTS.md references to AGENTS.md.
docs/agent-workflows/pre-merge-full-review.md: Updated frontend/AGENTS.md references to AGENTS.md.
docs/agent-workflows/run-project.md: New workflow document for agent-driven project startup; covers state assessment, prerequisite fixes, Groq Orpheus terms check, sequential backend/frontend launch, stop instructions, and error reference table.
docs/agent-workflows/test-suite-quality-review.md: Updated frontend/AGENTS.md references to AGENTS.md.

Environment variables

GROQ_API_KEY — add

feat: Add interview pipeline, TTS, transcription endpoints

ec2a10d

mlarsen-source self-assigned this Jun 1, 2026

mlarsen-source requested review from JacobWoodbury and dangrabo June 1, 2026 07:50

JacobWoodbury approved these changes Jun 4, 2026

mlarsen-source merged commit f8babc5 into main Jun 4, 2026
2 checks passed

mlarsen-source deleted the feat/interview-pipeline branch June 4, 2026 19:59
