Caption is a full-stack application for transcribing audio and video files with word-level timestamps and confidence scores.
It consists of:
- Model Server: FastAPI backend (with ML models, e.g., Whisper) running in Docker.
- Client: React frontend for uploading files, live recording, and managing transcriptions.
Features:
- Upload audio/video files for transcription
- Live audio recording and transcription
- Word-level timestamps and confidence scores
- User authentication (Supabase)
- Stripe integration for credits/payments
- Background processing (optionally with Celery + Redis)
- Download SRT and JSON results
- Responsive UI
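To illustrate how word-level timestamps can map onto the downloadable SRT format, here is a minimal sketch. The input shape (a list of dicts with `word`, `start`, and `end` keys, times in seconds), the helper names, and the fixed words-per-cue grouping are all assumptions for illustration; the JSON the backend actually returns may be structured differently.

```python
def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, words_per_cue=7):
    """Group word entries into numbered SRT cues.

    `words` is assumed to be a list of dicts with "word", "start",
    and "end" keys (hypothetical shape, times in seconds).
    """
    cues = []
    for i in range(0, len(words), words_per_cue):
        chunk = words[i:i + words_per_cue]
        start = format_srt_time(chunk[0]["start"])   # first word opens the cue
        end = format_srt_time(chunk[-1]["end"])      # last word closes it
        text = " ".join(w["word"] for w in chunk)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    # Cues are separated by a blank line.
    return "\n".join(cues)
```

A fixed word count per cue keeps the sketch short; splitting on pauses or on a maximum cue duration would likely read better in practice.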
Caption/
│
├── model-service/ # FastAPI backend (Dockerized)
│ ├── api/ # API routers (audio, video, etc.)
│ ├── handlers/ # Processing logic (e.g., VideoProcessor)
│ ├── main.py # FastAPI app entrypoint
│ ├── requirements.txt # Python dependencies
│ ├── Dockerfile # Docker build file
│ └── ... # Other backend files
│
├── client/ # React frontend
│ ├── src/
│ │ ├── components/ # React components (upload, audiototext, record, etc.)
│ │ └── ... # Other frontend code
│ └── ... # React config, public, etc.
│
├── docker-compose.yml # (If using Celery/Redis)
└── README.md
- Docker
- Node.js (for client)
- Supabase project (for auth)
- (Optional) Redis and Celery for background tasks
cd model-service
docker build -t caption-backend .
docker run -d -p 8000:8000 --env-file .env --name caption-backend caption-backend
- The API will be available at http://localhost:8000/docs
Before running the container, create a .env file in model-service/ (docker run reads it via --env-file) with:
SUPABASE_URL=your_supabase_url
SUPABASE_KEY=your_supabase_key
STRIPE_SECRET_KEY=your_stripe_key
# (Add any other needed variables)
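A missing variable is easier to diagnose at startup than in the middle of a request. The sketch below checks for the three variables listed above; the function name `missing_env_vars` is hypothetical and not part of the Caption codebase, and the real backend may require more variables than these.

```python
import os

# Variable names taken from the .env listing above.
REQUIRED_VARS = ("SUPABASE_URL", "SUPABASE_KEY", "STRIPE_SECRET_KEY")


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty.

    `env` defaults to the process environment; passing a dict makes
    the check easy to exercise without touching os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
```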
- POST /analyze/audio/transcribe: Transcribe audio file
- POST /analyze/video/transcribe: Transcribe video file
- GET /analyze/audio/task-status/{task_id}: Check audio task status (if using Celery)
- GET /analyze/video/task-status/{task_id}: Check video task status (if using Celery)
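When background processing is enabled, a client would typically poll the task-status endpoint until the job finishes. The following is a rough sketch of that loop using only the standard library. The response shape (a JSON object with a `status` field holding Celery-style state names such as `SUCCESS` or `FAILURE`) is an assumption, as are the helper names; authentication headers, which the API may well require, are omitted.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8000"  # port from the docker run command


def task_status_url(kind, task_id, base_url=BASE_URL):
    """Build the task-status URL for an "audio" or "video" task."""
    if kind not in ("audio", "video"):
        raise ValueError(f"unknown task kind: {kind!r}")
    return f"{base_url}/analyze/{kind}/task-status/{task_id}"


def fetch_json(url):
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def wait_for_task(kind, task_id, fetch=fetch_json, interval=2.0,
                  max_polls=30, done_states=("SUCCESS", "FAILURE")):
    """Poll until the task reaches a finished state or polls run out.

    ASSUMPTION: the endpoint returns JSON with a "status" field whose
    finished values are listed in `done_states`.
    """
    url = task_status_url(kind, task_id)
    for _ in range(max_polls):
        payload = fetch(url)
        if payload.get("status") in done_states:
            return payload
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish after {max_polls} polls")
```

Passing `fetch` as a parameter keeps the polling logic separate from the HTTP call, so it can be swapped for a client that attaches an auth token.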
cd client
npm install
Create a .env file in client/ with:
VITE_SUPABASE_URL=your_supabase_url
VITE_SUPABASE_ANON_KEY=your_supabase_anon_key
VITE_EMAILJS_SERVICE_ID=your_emailjs_service_id
VITE_EMAILJS_TEMPLATE_ID=your_emailjs_template_id
VITE_EMAILJS_PUBLIC_KEY=your_emailjs_public_key
npm run dev
- The app will be available at http://localhost:5173 (or as shown in your terminal).
If you want to enable background task processing:
- Add celery and redis to requirements.txt
- Use the provided docker-compose.yml to start the backend, Celery worker, and Redis:
docker-compose up --build
- Go to the client app, sign in, and upload or record audio/video.
- The backend will process the file and return results.
- Download SRT or JSON as needed.
- Email: hello@dobedosoft.com
- Phone: +91 9726181166
- Address: DOBODO soft, MOTA VARACHA, Surat, India
- GitHub
MIT License