
Whisper #2

Open
uditmanav17 wants to merge 23 commits into main from whisper

Conversation


uditmanav17 (Owner) commented Apr 23, 2026

Summary by Sourcery

Introduce a Streamlit-based speech-to-text application using OpenAI Whisper with optional YouTube audio input and Dockerized deployment.

New Features:

  • Add Streamlit UI to transcribe uploaded audio files or YouTube videos using the Whisper base model with optional timestamps and language selection.
  • Add SRT-like postprocessed transcription output with in-app viewing and basic error handling for invalid or long YouTube inputs.

Enhancements:

  • Provide Streamlit configuration for dark-themed UI and basic app settings.
  • Add Dockerfile and docker-compose setup to containerize and run the Streamlit transcription app locally or in cloud environments.

Build:

  • Define Python dependencies and system packages (including ffmpeg and Whisper-related libraries) required to run the transcription app.

Documentation:

  • Add README documenting app usage, deployment via Docker and Docker Playground, and planned future improvements.


sourcery-ai Bot commented Apr 23, 2026

Reviewer's Guide

Adds a new Streamlit-based Whisper speech-to-text application (with YouTube/audio upload support) plus containerization and deployment scaffolding for local, Docker Playground, and Streamlit Cloud deployments.

Sequence diagram for Whisper transcription flow in Streamlit app

sequenceDiagram
    actor User
    participant Browser
    participant StreamlitApp
    participant WhisperModel
    participant YTDLP
    participant YouTube
    participant FileSystem

    User->>Browser: Open_app_url
    Browser->>StreamlitApp: HTTP_GET_app
    StreamlitApp-->>Browser: Render_UI(title_settings_inputs)

    User->>Browser: Enter_youtube_url_or_upload_audio
    User->>Browser: Click_Transcribe_button
    Browser->>StreamlitApp: Submit_form(text_input,audio,with_timestamps,language)

    StreamlitApp->>StreamlitApp: load_model_cached()
    StreamlitApp->>WhisperModel: Initialize_if_not_cached
    WhisperModel-->>StreamlitApp: Model_instance

    StreamlitApp->>FileSystem: Delete_existing_audio_m4a_if_exists

    alt Youtube_URL_provided
        StreamlitApp->>YTDLP: download_yt_audio(url)
        YTDLP->>YouTube: Fetch_video_audio_stream
        YouTube-->>YTDLP: Audio_stream
        YTDLP->>FileSystem: Write_audio_m4a
        YTDLP-->>StreamlitApp: Return_status
    else Uploaded_audio_file
        StreamlitApp->>FileSystem: Write_uploaded_bytes_to_audio_m4a
    end

    StreamlitApp->>FileSystem: Check_audio_m4a_exists
    alt Audio_exists
        StreamlitApp-->>Browser: Display_audio_player
        StreamlitApp->>WhisperModel: transcribe(audio_m4a,language,word_timestamps)
        WhisperModel-->>StreamlitApp: Predictions_dict

        StreamlitApp->>StreamlitApp: postprocess_transcription(predictions,include_timestamps)
        StreamlitApp-->>Browser: Show_transcription_in_expander

        StreamlitApp->>FileSystem: Delete_audio_m4a
    else Audio_missing
        StreamlitApp-->>Browser: Show_error_audio_generation_failed
    end

    User->>Browser: Click_Refresh_App_button
    Browser->>StreamlitApp: Refresh_request
    StreamlitApp->>FileSystem: Delete_audio_m4a_if_exists
    StreamlitApp->>WhisperModel: Reload_model_cached

Class-style diagram for functional components in Whisper app.py

classDiagram
    class AppModule {
        +load_model() torch_module
        +duration_check(info, incomplete) str
        +download_yt_audio(yt_url) None
        +postprocess_transcription(predictions, include_timestamps) str
        +main() None
    }

    class WhisperModelRuntime {
        +transcribe(audio_path, verbose, word_timestamps, language) dict
    }

    class YTDLWrapper {
        +download(yt_url) int
    }

    class StreamlitUI {
        +set_page_config()
        +title()
        +sidebar_settings()
        +text_input()
        +file_uploader()
        +button()
        +toast()
        +spinner()
        +audio()
        +expander()
        +write()
        +error()
        +info()
        +success()
    }

    class FileSystemHelper {
        +write_audio_file(bytes_data)
        +delete_audio_file()
        +exists_audio_file() bool
    }

    AppModule --> WhisperModelRuntime : uses
    AppModule --> YTDLWrapper : uses
    AppModule --> StreamlitUI : uses
    AppModule --> FileSystemHelper : uses

    WhisperModelRuntime <.. AppModule : model_instance
    YTDLWrapper <.. AppModule : youtube_download
    FileSystemHelper <.. AppModule : audio_m4a_management

Flow diagram for main transcription logic in app.py

flowchart TD
    A["Start_app_main"] --> B["Render_title_and_sidebar_settings"]
    B --> C["Get_youtube_url_text_input"]
    C --> D["Get_audio_file_upload"]
    D --> E["User_clicks_Transcribe_button?"]

    E -->|No| F["Show_info_request_input"]
    E -->|Yes| G["Have_youtube_url_or_audio?"]

    G -->|No| F
    G -->|Yes| H["Delete_existing_audio_m4a_if_any"]

    H --> I["Show_running_toast"]
    I --> J{"Youtube_URL_provided?"}

    J -->|Yes| K["Call_download_yt_audio_with_yt_dlp"]
    J -->|No| L["Write_uploaded_audio_bytes_to_audio_m4a"]

    K --> M["Check_audio_m4a_exists"]
    L --> M

    M -->|No| N["Show_error_audio_generation_failed"]
    M -->|Yes| O["Display_audio_player_for_audio_m4a"]

    O --> P["Call_model_transcribe_with_language_and_word_timestamps"]
    P --> Q["Call_postprocess_transcription_with_timestamp_option"]

    Q --> R{"Transcription_non_empty?"}
    R -->|No| S["No_text_to_display"]
    R -->|Yes| T["Store_in_session_state_and_show_in_expander"]

    T --> U["Delete_audio_m4a"]
    S --> U

    U --> V["End_request_wait_for_next_interaction"]

    F --> V

    subgraph Refresh_flow
        W["User_clicks_Refresh_App_button"] --> X["Delete_audio_m4a_if_exists"]
        X --> Y["Reload_model_via_load_model_cache"]
    end

File-Level Changes

Change | Details | Files
Implement Streamlit Whisper transcription app supporting YouTube downloads and file uploads with optional timestamped output.
  • Configure Streamlit page settings and sidebar options for timestamp inclusion and language selection.
  • Add cached Whisper model loader using the base model variant.
  • Implement YouTube audio download helper using yt-dlp with a 10-minute duration filter and ffmpeg-based audio extraction.
  • Implement transcription post-processing to optionally render SRT-style timestamped segments.
  • Wire UI interactions to download or save audio, invoke Whisper transcription, display audio player and results, and handle error/toast messaging and refresh logic.
speech-to-text/app.py
Document the app’s purpose and provide Docker-based deployment instructions (local and Docker Playground/cloud).
  • Describe project objective and public Streamlit deployment URL.
  • Explain directory structure and components (app, packages, docker-compose).
  • Provide step-by-step local Docker and Docker Playground deployment commands.
  • List future improvement ideas such as SRT export and streaming transcription.
speech-to-text/README.md
Add Streamlit configuration and theming for the new app.
  • Define Streamlit server config placeholders and file watcher settings (mostly commented out).
  • Set a dark theme base and font family in config.
speech-to-text/.streamlit/config.toml
Containerize the app and define a docker-compose setup for local development.
  • Create a Python 3.11-slim based Dockerfile that installs ffmpeg and Python dependencies, copies app code, exposes port 8501, and runs Streamlit.
  • Create docker-compose service that builds the image, maps port 8501, mounts the source directory, and assigns an app profile and bridge network.
speech-to-text/Dockerfile
speech-to-text/docker-compose.yml
Declare application Python dependencies and ancillary project files.
  • Add requirements for streamlit, pyperclip, yt-dlp, and openai-whisper (with commented torch/ffmpeg-python).
  • Add empty or placeholder support files like packages.txt and .gitignore for the speech-to-text app.
speech-to-text/requirements.txt
packages.txt
speech-to-text/.gitignore
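As a rough illustration of the 10-minute duration filter described above (the function name `duration_check` is taken from the class diagram; yt-dlp accepts a `match_filter` callable that returns a rejection message to skip a video, or `None` to accept it — this is a sketch, not the PR's actual code):

```python
def duration_check(info, *, incomplete=False):
    """yt-dlp match_filter callable: reject videos longer than 10 minutes.

    Returns a rejection message (string) to skip the download,
    or None to let yt-dlp proceed.
    """
    duration = info.get("duration")
    if duration is not None and duration > 600:
        return "Video is longer than 10 minutes; skipping download."
    return None


# The filter would then be wired into the downloader options, e.g.:
# ydl_opts = {"match_filter": duration_check, "format": "m4a/bestaudio"}
```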

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help


sourcery-ai bot left a comment


Hey - I've found 6 issues and left some high-level feedback:

  • In docker-compose.yml, the command overrides the Dockerfile ENTRYPOINT and tries to execute /app/app.py directly; consider either removing command or invoking streamlit run app.py --server.port 8501 so the container starts correctly.
  • In postprocess_transcription, predictions.get("segments", {}) should default to a list (e.g. []) instead of a dict, and the timestamp formatting via str(0) + str(timedelta(...)) + ",000" is a bit opaque—using an explicit SRT-style formatter will be clearer and less error-prone.
  • There are unused imports/dependencies (e.g. pyperclip, torch and the commented torch requirement) which could be removed to keep the runtime image smaller and the code easier to maintain.
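Putting the first and third compose-related points together, a corrected service definition might look like the sketch below (service and port taken from the PR description; since the Dockerfile's exact ENTRYPOINT is not shown in this thread, this assumes the image has no conflicting ENTRYPOINT):

```yaml
services:
  streamlit:
    build:
      context: .
      dockerfile: Dockerfile
    # Invoke Streamlit explicitly rather than executing /app/app.py directly
    command: streamlit run app.py --server.port 8501
    ports:
      - "8501:8501"
    networks:
      - app   # attach the service to the declared network

networks:
  app:
    driver: bridge
```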
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `docker-compose.yml`, the `command` overrides the Dockerfile `ENTRYPOINT` and tries to execute `/app/app.py` directly; consider either removing `command` or invoking `streamlit run app.py --server.port 8501` so the container starts correctly.
- In `postprocess_transcription`, `predictions.get("segments", {})` should default to a list (e.g. `[]`) instead of a dict, and the timestamp formatting via `str(0) + str(timedelta(...)) + ",000"` is a bit opaque—using an explicit SRT-style formatter will be clearer and less error-prone.
- There are unused imports/dependencies (e.g. `pyperclip`, `torch` and the commented `torch` requirement) which could be removed to keep the runtime image smaller and the code easier to maintain.

## Individual Comments

### Comment 1
<location path="speech-to-text/app.py" line_range="71" />
<code_context>
+    if not include_timestamps:
+        return predictions.get("text")
+    result = []
+    for segment in predictions.get("segments", {}):
+        startTime = str(0) + str(timedelta(seconds=int(segment["start"]))) + ",000"
+        endTime = str(0) + str(timedelta(seconds=int(segment["end"]))) + ",000"
</code_context>
<issue_to_address>
**issue (bug_risk):** Use a list as the default for `segments` to avoid iterating over a dict's keys.

If `segments` is missing, `predictions.get("segments", {})` returns a dict, so the loop iterates over its keys (strings) and `segment["start"]` will raise at runtime. Using an empty list as the default keeps the type consistent with Whisper’s output and avoids this failure:

```python
for segment in predictions.get("segments", []):
    ...
```
</issue_to_address>
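For reference, an explicit SRT-style formatter (a hypothetical helper, not code from this PR) could replace the `str(0) + str(timedelta(...)) + ",000"` concatenation and preserve millisecond precision:

```python
def srt_timestamp(seconds: float) -> str:
    """Format a duration in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)   # ms per hour
    minutes, rem = divmod(rem, 60_000)         # ms per minute
    secs, ms = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"
```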

### Comment 2
<location path="speech-to-text/docker-compose.yml" line_range="8-9" />
<code_context>
+    build:
+      context: .
+      dockerfile: Dockerfile
+    command: >
+      /app/app.py --server.port 8501
+    ports:
+      - "8501:8501"
</code_context>
<issue_to_address>
**issue (bug_risk):** The custom `command` conflicts with the Dockerfile ENTRYPOINT and likely breaks `streamlit` startup.

In the Dockerfile, `ENTRYPOINT` is `[
</issue_to_address>

### Comment 3
<location path="speech-to-text/docker-compose.yml" line_range="17-19" />
<code_context>
+    profiles:
+      - app
+
+networks:
+  app:
+    driver: bridge
\ No newline at end of file
</code_context>
<issue_to_address>
**suggestion (bug_risk):** The declared `app` network is not attached to the `streamlit` service.

As written, `streamlit` will still use the default network because it never references `app`. If this network is needed, attach it explicitly under the `streamlit` service:

```yaml
  streamlit:
    ...
    networks:
      - app

networks:
  app:
    driver: bridge
```

If you don’t need a custom network, you can remove the `networks` section entirely.
</issue_to_address>

### Comment 4
<location path="speech-to-text/README.md" line_range="17" />
<code_context>
+
+## Code Structure / Services
+- `app` - Complete application code built in streamlit.
+- `packages` - List of linux dependencies required to deploy code on streamlit cloud.
+- `docker-compose` - Compose file which starts application.
+
</code_context>
<issue_to_address>
**suggestion (typo):** Capitalize proper nouns like "Linux" and brand names consistently.

For example, you could rephrase this bullet as: "List of Linux dependencies required to deploy the code on Streamlit Cloud."

Suggested implementation:

```
Application is deployed on Streamlit Cloud [here](https://transcribe-whisper.streamlit.app/).

```

```
- `app` - Complete application code built in Streamlit.
- `packages` - List of Linux dependencies required to deploy the code on Streamlit Cloud.
- `docker-compose` - Compose file which starts the application.

```
</issue_to_address>

### Comment 5
<location path="speech-to-text/README.md" line_range="24" />
<code_context>
+## Deployment
+- Local deployment
+    - Install Docker. Instructions available [here](https://docs.docker.com/engine/install/). Make sure docker is up and running before proceeding.
+    - Install Git. Instruction [here](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git).
+    - Clone repo and run compose
+    ```
</code_context>
<issue_to_address>
**nitpick (typo):** Use plural "Instructions" to match the linked content.

Change the text to: "Install Git. Instructions [here](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git)."

```suggestion
    - Install Git. Instructions [here](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git).
```
</issue_to_address>

### Comment 6
<location path="speech-to-text/README.md" line_range="31" />
<code_context>
+    git switch whisper && cd ./speech-to-text
+    docker compose --profile app up
+    ```
+    - `--profile app` will start on `localhost:8501` and `localhost:8501` ports.
+
+- Docker Playground Cloud Deployment
</code_context>
<issue_to_address>
**question (typo):** Duplicated port number and plural "ports" may be confusing.

This bullet lists `localhost:8501` twice but refers to "ports". If only one port is exposed, list it once and use "port". If multiple ports are intended, update the second value to the correct port number.

```suggestion
    - `--profile app` will start on `localhost:8501` port.
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

uditmanav17 and others added 2 commits April 23, 2026 23:19
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant