A FastAPI-based microservice that generates multiple-choice quizzes from video input using speech-to-text and LLM processing.
This service processes video files through a four-stage pipeline:
- Video Processing: Extracts audio from uploaded video files
- Speech-to-Text: Transcribes audio using PhoWhisper-tiny model
- Text Refinement: Uses LLM to clean up transcription errors
- Quiz Generation: Creates multiple-choice questions based on the content
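The four stages above chain together, each consuming the previous stage's output. The sketch below shows that flow only; `run_pipeline` and its callable parameters are hypothetical names used for illustration, not the actual functions in this repository.

```python
from typing import Callable


def run_pipeline(
    video_path: str,
    extract_audio: Callable[[str], str],
    transcribe: Callable[[str], str],
    refine: Callable[[str], str],
    generate_quiz: Callable[[str], dict],
) -> dict:
    """Run the four stages in order, passing each result to the next."""
    audio_path = extract_audio(video_path)     # video file -> audio file path
    raw_transcript = transcribe(audio_path)    # audio file -> raw transcript
    if not raw_transcript.strip():
        # Assumption: an empty transcript is treated as an error.
        raise ValueError("transcript is empty; nothing to build a quiz from")
    clean_transcript = refine(raw_transcript)  # raw transcript -> cleaned text
    return generate_quiz(clean_transcript)     # cleaned text -> quiz dict
```

Passing the stages in as callables keeps the sketch independent of any particular STT or LLM library and makes each stage easy to stub in tests.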
Prerequisites:
- Python 3.12+
- UV package manager (or pip)
# Install dependencies
uv sync
# Or with pip
pip install -e .
# Start the FastAPI server
python main.py
# Or with uvicorn directly
uvicorn main:app --host 0.0.0.0 --port 8000 --reload
The service will be available at http://localhost:8000
Once running, visit:
- Interactive docs: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
/generate-quiz generates a multiple-choice quiz from an uploaded video file.
Request:
- Method: POST
- Content-Type: multipart/form-data
- Body: Video file (MP4 format recommended)
Response:
{
  "quizTitle": "Kiểm tra Nghe/Đọc Hiểu",
  "questions": [
    {
      "questionNumber": 1,
      "question": "What is the main topic discussed?",
      "options": [
        { "text": "Option A", "isCorrect": false },
        { "text": "Option B", "isCorrect": true },
        { "text": "Option C", "isCorrect": false },
        { "text": "Option D", "isCorrect": false }
      ]
    }
  ]
}
Health check endpoint.
Root endpoint with service information.
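A client consuming the quiz response shown earlier may want to check its shape before using it. The following is a minimal sketch; `validate_quiz` is a hypothetical helper, and the rule that each question has exactly one correct option is an assumption inferred from the sample response, not something the service documents.

```python
def validate_quiz(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the shape looks right."""
    problems = []
    if not isinstance(payload.get("quizTitle"), str):
        problems.append("quizTitle must be a string")
    questions = payload.get("questions")
    if not isinstance(questions, list) or not questions:
        problems.append("questions must be a non-empty list")
        return problems
    for q in questions:
        number = q.get("questionNumber")
        if not isinstance(q.get("question"), str):
            problems.append(f"question {number}: missing question text")
        options = q.get("options", [])
        correct = sum(1 for o in options if o.get("isCorrect") is True)
        # Assumption: exactly one option per question is marked correct.
        if correct != 1:
            problems.append(
                f"question {number}: expected 1 correct option, found {correct}"
            )
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller log everything that is wrong with a response in a single pass.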
Run the test script to validate the API:
python test_api.py
Make sure you have a test video file (test.mp4) in the project directory.
# Upload a video file
curl -X POST "http://localhost:8000/generate-quiz" \
-H "accept: application/json" \
-H "Content-Type: multipart/form-data" \
-F "video=@test.mp4;type=video/mp4"
The service is organized into modular components:
- main.py - FastAPI application and endpoint definitions
- stt.py - Speech-to-text functionality using PhoWhisper
- llm_utils.py - LLM operations for text refinement and quiz generation
- video_utils.py - Video processing and audio extraction
- test_api.py - Testing utilities
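The curl upload shown earlier can also be done from Python. The sketch below uses only the standard library and mirrors that command: the `video` form field and the `/generate-quiz` path come from the curl example, while `build_multipart` and `make_upload_request` are hypothetical helper names. It is not the contents of test_api.py.

```python
import urllib.request
import uuid


def build_multipart(
    field: str, filename: str, data: bytes, content_type: str
) -> tuple[bytes, str]:
    """Encode one file as a multipart/form-data body.

    Returns the body bytes and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def make_upload_request(url: str, filename: str, data: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST request that uploads an MP4 file."""
    body, header = build_multipart("video", filename, data, "video/mp4")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": header, "Accept": "application/json"},
        method="POST",
    )
```

To send the request, read the file's bytes, pass them to `make_upload_request("http://localhost:8000/generate-quiz", "test.mp4", data)`, and open the result with `urllib.request.urlopen`. Quiz generation can take a while, so a generous `timeout` argument is advisable.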
The service uses the OpenAI API for LLM operations. The API key is currently hardcoded; move it to an environment variable before deploying to production:
export OPENAI_API_KEY="your-api-key-here"
- Input: MP4 video files (other video formats supported by moviepy)
- Audio: Extracted as 16kHz WAV for optimal STT performance
- Output: JSON with structured quiz data
The service includes comprehensive error handling for:
- Invalid file types
- Video processing failures
- Speech-to-text errors
- LLM API failures
- Empty or unclear audio
All errors return appropriate HTTP status codes with descriptive messages.
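The exact status codes are not listed here, so the mapping below is an illustration only. The exception classes, the chosen codes, and `to_http_error` are all assumptions made for this sketch and may differ from what the service actually returns.

```python
from http import HTTPStatus


# Hypothetical exception types, one per failure category listed above.
class InvalidFileTypeError(Exception): ...
class VideoProcessingError(Exception): ...
class TranscriptionError(Exception): ...
class EmptyTranscriptError(Exception): ...
class LLMError(Exception): ...


# Assumed mapping; the real service may use different codes.
STATUS_BY_ERROR = {
    InvalidFileTypeError: HTTPStatus.UNSUPPORTED_MEDIA_TYPE,  # 415
    VideoProcessingError: HTTPStatus.UNPROCESSABLE_ENTITY,    # 422
    EmptyTranscriptError: HTTPStatus.UNPROCESSABLE_ENTITY,    # 422
    TranscriptionError: HTTPStatus.INTERNAL_SERVER_ERROR,     # 500
    LLMError: HTTPStatus.BAD_GATEWAY,                         # 502
}


def to_http_error(exc: Exception) -> tuple[int, str]:
    """Translate an exception into a (status code, message) pair."""
    for error_type, status in STATUS_BY_ERROR.items():
        if isinstance(exc, error_type):
            # Fall back to the standard phrase when the exception has no message.
            return int(status), str(exc) or status.phrase
    return int(HTTPStatus.INTERNAL_SERVER_ERROR), "Unexpected error"
```

In a FastAPI handler, the resulting pair could be passed to `HTTPException(status_code=..., detail=...)`.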
Performance considerations:
- Video files are processed in temporary storage and cleaned up automatically
- Audio is extracted at a 16 kHz sampling rate, which suits the STT model
- LLM calls may take several seconds depending on transcript length
- Consider implementing rate limiting for production use
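The automatic cleanup of temporary files mentioned above can be achieved with a context manager. The sketch below shows one such approach using `tempfile.TemporaryDirectory`; `process_upload` is a hypothetical name, and the service's own cleanup mechanism may differ.

```python
import os
import tempfile
from typing import Callable


def process_upload(
    data: bytes, handler: Callable[[str], object], suffix: str = ".mp4"
) -> object:
    """Write the upload to a temp directory, run handler on it, then clean up."""
    with tempfile.TemporaryDirectory() as workdir:
        video_path = os.path.join(workdir, "upload" + suffix)
        with open(video_path, "wb") as f:
            f.write(data)
        # The directory and everything in it (including any extracted audio
        # the handler writes next to the video) is removed when the block
        # exits, even if handler raises.
        return handler(video_path)
```

Using a directory rather than a single temporary file means intermediate artifacts such as the extracted WAV are removed together with the video.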
Planned enhancements:
- Support for multiple audio languages
- Customizable quiz difficulty levels
- Batch processing capabilities
- Audio-only input support
- Enhanced error reporting and logging
This project is provided as-is for educational and development purposes.