Whisper #2
Reviewer's Guide

Adds a new Streamlit-based Whisper speech-to-text application (with YouTube/audio upload support) plus containerization and deployment scaffolding for local, Docker Playground, and Streamlit Cloud deployments.

Sequence diagram for the Whisper transcription flow in the Streamlit app:

```mermaid
sequenceDiagram
    actor User
    participant Browser
    participant StreamlitApp
    participant WhisperModel
    participant YTDLP
    participant YouTube
    participant FileSystem
    User->>Browser: Open_app_url
    Browser->>StreamlitApp: HTTP_GET_app
    StreamlitApp-->>Browser: Render_UI(title_settings_inputs)
    User->>Browser: Enter_youtube_url_or_upload_audio
    User->>Browser: Click_Transcribe_button
    Browser->>StreamlitApp: Submit_form(text_input,audio,with_timestamps,language)
    StreamlitApp->>StreamlitApp: load_model_cached()
    StreamlitApp->>WhisperModel: Initialize_if_not_cached
    WhisperModel-->>StreamlitApp: Model_instance
    StreamlitApp->>FileSystem: Delete_existing_audio_m4a_if_exists
    alt Youtube_URL_provided
        StreamlitApp->>YTDLP: download_yt_audio(url)
        YTDLP->>YouTube: Fetch_video_audio_stream
        YouTube-->>YTDLP: Audio_stream
        YTDLP->>FileSystem: Write_audio_m4a
        YTDLP-->>StreamlitApp: Return_status
    else Uploaded_audio_file
        StreamlitApp->>FileSystem: Write_uploaded_bytes_to_audio_m4a
    end
    StreamlitApp->>FileSystem: Check_audio_m4a_exists
    alt Audio_exists
        StreamlitApp-->>Browser: Display_audio_player
        StreamlitApp->>WhisperModel: transcribe(audio_m4a,language,word_timestamps)
        WhisperModel-->>StreamlitApp: Predictions_dict
        StreamlitApp->>StreamlitApp: postprocess_transcription(predictions,include_timestamps)
        StreamlitApp-->>Browser: Show_transcription_in_expander
        StreamlitApp->>FileSystem: Delete_audio_m4a
    else Audio_missing
        StreamlitApp-->>Browser: Show_error_audio_generation_failed
    end
    User->>Browser: Click_Refresh_App_button
    Browser->>StreamlitApp: Refresh_request
    StreamlitApp->>FileSystem: Delete_audio_m4a_if_exists
    StreamlitApp->>WhisperModel: Reload_model_cached
```
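The YouTube branch of the flow downloads audio with yt-dlp. A minimal sketch of that step, assuming the app follows the standard yt-dlp embedding pattern (the option values and the `audio.m4a` output name mirror the diagram; `build_ydl_opts` is an illustrative helper, not a function from the PR):

```python
def build_ydl_opts(outfile="audio.m4a"):
    """Options for yt-dlp: prefer an m4a audio stream, written to a fixed path."""
    return {
        "format": "m4a/bestaudio/best",  # m4a if available, otherwise best audio
        "outtmpl": outfile,              # fixed output name the app later transcribes
        "noplaylist": True,              # one URL should yield exactly one file
    }

def download_yt_audio(yt_url, outfile="audio.m4a"):
    # Imported lazily so the option helper above is usable without yt-dlp installed.
    import yt_dlp
    with yt_dlp.YoutubeDL(build_ydl_opts(outfile)) as ydl:
        return ydl.download([yt_url])  # 0 on success
```

The fixed `outtmpl` matches the app's convention of always writing to `audio.m4a`, which is also why the flow deletes any stale copy before each run.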
Class-style diagram for the functional components in app.py:

```mermaid
classDiagram
    class AppModule {
        +load_model() torch_module
        +duration_check(info, incomplete) str
        +download_yt_audio(yt_url) None
        +postprocess_transcription(predictions, include_timestamps) str
        +main() None
    }
    class WhisperModelRuntime {
        +transcribe(audio_path, verbose, word_timestamps, language) dict
    }
    class YTDLWrapper {
        +download(yt_url) int
    }
    class StreamlitUI {
        +set_page_config()
        +title()
        +sidebar_settings()
        +text_input()
        +file_uploader()
        +button()
        +toast()
        +spinner()
        +audio()
        +expander()
        +write()
        +error()
        +info()
        +success()
    }
    class FileSystemHelper {
        +write_audio_file(bytes_data)
        +delete_audio_file()
        +exists_audio_file() bool
    }
    AppModule --> WhisperModelRuntime : uses
    AppModule --> YTDLWrapper : uses
    AppModule --> StreamlitUI : uses
    AppModule --> FileSystemHelper : uses
    WhisperModelRuntime <.. AppModule : model_instance
    YTDLWrapper <.. AppModule : youtube_download
    FileSystemHelper <.. AppModule : audio_m4a_management
```
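The FileSystemHelper role above is small enough to sketch directly. This is an illustrative stand-in (the real app inlines these operations around a hard-coded `audio.m4a` path rather than defining a class):

```python
from pathlib import Path

class FileSystemHelper:
    """Manages the single temporary audio file the app transcribes."""

    def __init__(self, path="audio.m4a"):
        self.path = Path(path)

    def write_audio_file(self, bytes_data):
        self.path.write_bytes(bytes_data)

    def delete_audio_file(self):
        # missing_ok avoids an error when the file was never created
        self.path.unlink(missing_ok=True)

    def exists_audio_file(self):
        return self.path.exists()
```

Making deletion idempotent matters here because the flow deletes the file both before a run (stale copy) and after a successful transcription.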
Flow diagram for the main transcription logic in app.py:

```mermaid
flowchart TD
    A["Start_app_main"] --> B["Render_title_and_sidebar_settings"]
    B --> C["Get_youtube_url_text_input"]
    C --> D["Get_audio_file_upload"]
    D --> E["User_clicks_Transcribe_button?"]
    E -->|No| F["Show_info_request_input"]
    E -->|Yes| G["Have_youtube_url_or_audio?"]
    G -->|No| F
    G -->|Yes| H["Delete_existing_audio_m4a_if_any"]
    H --> I["Show_running_toast"]
    I --> J{"Youtube_URL_provided?"}
    J -->|Yes| K["Call_download_yt_audio_with_yt_dlp"]
    J -->|No| L["Write_uploaded_audio_bytes_to_audio_m4a"]
    K --> M["Check_audio_m4a_exists"]
    L --> M
    M -->|No| N["Show_error_audio_generation_failed"]
    M -->|Yes| O["Display_audio_player_for_audio_m4a"]
    O --> P["Call_model_transcribe_with_language_and_word_timestamps"]
    P --> Q["Call_postprocess_transcription_with_timestamp_option"]
    Q --> R{"Transcription_non_empty?"}
    R -->|No| S["No_text_to_display"]
    R -->|Yes| T["Store_in_session_state_and_show_in_expander"]
    T --> U["Delete_audio_m4a"]
    S --> U
    U --> V["End_request_wait_for_next_interaction"]
    F --> V
    subgraph Refresh_flow
        W["User_clicks_Refresh_App_button"] --> X["Delete_audio_m4a_if_exists"]
        X --> Y["Reload_model_via_load_model_cache"]
    end
```
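The transcribe-then-postprocess path (nodes P and Q) reduces to a small function. Sketched here with the model passed in, so any object exposing Whisper's `transcribe(...)` keyword signature works; `run_transcription` is an illustrative name, not the PR's:

```python
def run_transcription(model, audio_path, language=None, include_timestamps=False):
    """Call the model, then flatten its predictions into display text."""
    predictions = model.transcribe(
        audio_path,
        language=language,
        word_timestamps=include_timestamps,
    )
    if not include_timestamps:
        return (predictions.get("text") or "").strip()
    lines = []
    for segment in predictions.get("segments", []):  # list default, not {}
        lines.append(
            f"[{segment['start']:.1f}s -> {segment['end']:.1f}s] {segment['text'].strip()}"
        )
    return "\n".join(lines)
```

Keeping the model as a parameter also makes the non-empty check at node R easy to unit-test with a stub instead of a real Whisper checkpoint.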
File-Level Changes
Hey - I've found 6 issues and left some high-level feedback:

- In `docker-compose.yml`, the `command` overrides the Dockerfile `ENTRYPOINT` and tries to execute `/app/app.py` directly; consider either removing `command` or invoking `streamlit run app.py --server.port 8501` so the container starts correctly.
- In `postprocess_transcription`, `predictions.get("segments", {})` should default to a list (e.g. `[]`) instead of a dict, and the timestamp formatting via `str(0) + str(timedelta(...)) + ",000"` is a bit opaque; using an explicit SRT-style formatter will be clearer and less error-prone.
- There are unused imports/dependencies (e.g. `pyperclip`, `torch` and the commented `torch` requirement) which could be removed to keep the runtime image smaller and the code easier to maintain.
## Individual Comments
### Comment 1
<location path="speech-to-text/app.py" line_range="71" />
<code_context>
+ if not include_timestamps:
+ return predictions.get("text")
+ result = []
+ for segment in predictions.get("segments", {}):
+ startTime = str(0) + str(timedelta(seconds=int(segment["start"]))) + ",000"
+ endTime = str(0) + str(timedelta(seconds=int(segment["end"]))) + ",000"
</code_context>
<issue_to_address>
**issue (bug_risk):** Use a list as the default for `segments` to avoid iterating over a dict's keys.
If `segments` is missing, `predictions.get("segments", {})` returns a dict, so the loop iterates over its keys (strings) and `segment["start"]` will raise at runtime. Using an empty list as the default keeps the type consistent with Whisper’s output and avoids this failure:
```python
for segment in predictions.get("segments", []):
...
```
</issue_to_address>
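An explicit formatter along the lines the review suggests might look like the sketch below. This is illustrative, not the PR's code; `srt_timestamp` is a hypothetical helper name:

```python
def srt_timestamp(seconds):
    """Render seconds as the SRT-style HH:MM:SS,mmm form."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def postprocess_transcription(predictions, include_timestamps):
    if not include_timestamps:
        return predictions.get("text", "")
    result = []
    for segment in predictions.get("segments", []):  # list default: safe when segments is absent
        result.append(
            f"{srt_timestamp(segment['start'])} --> {srt_timestamp(segment['end'])}\n"
            f"{segment['text'].strip()}"
        )
    return "\n\n".join(result)
```

Unlike the `str(0) + str(timedelta(...)) + ",000"` concatenation, this keeps real millisecond precision instead of always appending `,000`.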
### Comment 2
<location path="speech-to-text/docker-compose.yml" line_range="8-9" />
<code_context>
+ build:
+ context: .
+ dockerfile: Dockerfile
+ command: >
+ /app/app.py --server.port 8501
+ ports:
+ - "8501:8501"
</code_context>
<issue_to_address>
**issue (bug_risk):** The custom `command` conflicts with the Dockerfile ENTRYPOINT and likely breaks `streamlit` startup.
In the Dockerfile, `ENTRYPOINT` is `[
</issue_to_address>
### Comment 3
<location path="speech-to-text/docker-compose.yml" line_range="17-19" />
<code_context>
+ profiles:
+ - app
+
+networks:
+ app:
+ driver: bridge
\ No newline at end of file
</code_context>
<issue_to_address>
**suggestion (bug_risk):** The declared `app` network is not attached to the `streamlit` service.
As written, `streamlit` will still use the default network because it never references `app`. If this network is needed, attach it explicitly under the `streamlit` service:
```yaml
streamlit:
...
networks:
- app
networks:
app:
driver: bridge
```
If you don’t need a custom network, you can remove the `networks` section entirely.
</issue_to_address>
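Putting Comments 2 and 3 together, a compose-file sketch that drops the conflicting `command` and attaches the declared network might look like this (assuming the Dockerfile's `ENTRYPOINT` already runs `streamlit run app.py`; service and network names follow the diff context):

```yaml
services:
  streamlit:
    build:
      context: .
      dockerfile: Dockerfile
    # no `command:` override -- the Dockerfile ENTRYPOINT starts streamlit
    ports:
      - "8501:8501"
    profiles:
      - app
    networks:
      - app

networks:
  app:
    driver: bridge
```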
### Comment 4
<location path="speech-to-text/README.md" line_range="17" />
<code_context>
+
+## Code Structure / Services
+- `app` - Complete application code built in streamlit.
+- `packages` - List of linux dependencies required to deploy code on streamlit cloud.
+- `docker-compose` - Compose file which starts application.
+
</code_context>
<issue_to_address>
**suggestion (typo):** Capitalize proper nouns like "Linux" and brand names consistently.
For example, you could rephrase this bullet as: "List of Linux dependencies required to deploy the code on Streamlit Cloud."
Suggested implementation:
```
Application is deployed on Streamlit Cloud [here](https://transcribe-whisper.streamlit.app/).
```
```
- `app` - Complete application code built in Streamlit.
- `packages` - List of Linux dependencies required to deploy the code on Streamlit Cloud.
- `docker-compose` - Compose file which starts the application.
```
</issue_to_address>
### Comment 5
<location path="speech-to-text/README.md" line_range="24" />
<code_context>
+## Deployment
+- Local deployment
+ - Install Docker. Instructions available [here](https://docs.docker.com/engine/install/). Make sure docker is up and running before proceeding.
+ - Install Git. Instruction [here](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git).
+ - Clone repo and run compose
+ ```
</code_context>
<issue_to_address>
**nitpick (typo):** Use plural "Instructions" to match the linked content.
Change the text to: "Install Git. Instructions [here](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git)."
```suggestion
- Install Git. Instructions [here](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git).
```
</issue_to_address>
### Comment 6
<location path="speech-to-text/README.md" line_range="31" />
<code_context>
+ git switch whisper && cd ./speech-to-text
+ docker compose --profile app up
+ ```
+ - `--profile app` will start on `localhost:8501` and `localhost:8501` ports.
+
+- Docker Playground Cloud Deployment
</code_context>
<issue_to_address>
**question (typo):** Duplicated port number and plural "ports" may be confusing.
This bullet lists `localhost:8501` twice but refers to "ports". If only one port is exposed, list it once and use "port". If multiple ports are intended, update the second value to the correct port number.
```suggestion
- `--profile app` will start on `localhost:8501` port.
```
</issue_to_address>
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
Summary by Sourcery
Introduce a Streamlit-based speech-to-text application using OpenAI Whisper with optional YouTube audio input and Dockerized deployment.