feat: add chunk splitting, S3 storage, and session transcript copy #288

Open
animalnots wants to merge 2 commits into OpenWhispr:main from animalnots:main

PR: Add Chunk Splitting, S3 Storage & Session Transcript Copy

Summary

This PR adds three features:

  • size-based recording chunk splitting, which avoids the data-loss risk of keeping a recording only in memory and provides incremental transcription,
  • S3-compatible temporary storage + presigned URL flow for large-file transcription,
  • full-session transcript aggregation with a "Copy Full Session" action in the UI.

Tested on Windows only.


What Changed

1. Feature: Size-Based Recording Chunk Splitting

Problem: Previously, recordings were kept entirely in memory until the user stopped recording. For long recordings (e.g., 2+ hours), this created:

  • Data loss risk — Browser crash or memory limit = entire recording lost
  • Large transcription delays — Users had to wait until the end to get any transcription results
  • Inefficient for large files — Single large blob difficult to upload/process

Solution: New chunk splitting feature that automatically splits recordings at a configurable size threshold (default: 24.5 MB):

  • Incremental disk saves — Each chunk is saved to disk immediately via new recordingStorage.js module
  • Incremental transcription — Chunks transcribe as they complete (live results during recording)
  • Size tracking — Added timeslice: 1000 to MediaRecorder.start() calls so ondataavailable fires every ~1s, enabling real-time cumulative size tracking
  • Configurable threshold — User can adjust chunk size in Settings → Recording

New files: src/helpers/recordingStorage.js
Modified: src/helpers/audioManager.js (new _startChunkSplitTimer, _splitRecordingChunk, _saveRecordingBackup methods)
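
The size-tracking idea can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `ChunkSizeTracker`, `add`, and `markSplit` are hypothetical names, and the 24.5 MB default is assumed to mean 24.5 × 1024 × 1024 bytes. In the real recorder, the standard `MediaRecorder.start(1000)` call makes `ondataavailable` fire about once per second, and each event's `data.size` would be fed into `add()`.

```javascript
// Hypothetical helper (not from the PR): accumulates the blob sizes reported
// by ondataavailable and signals when the current chunk should be split.
const DEFAULT_CHUNK_BYTES = 24.5 * 1024 * 1024; // assumption: 1 MB = 1024 * 1024 bytes

class ChunkSizeTracker {
  constructor(thresholdBytes = DEFAULT_CHUNK_BYTES) {
    this.thresholdBytes = thresholdBytes;
    this.currentBytes = 0; // bytes recorded since the last split
    this.completedChunks = 0; // chunks already handed off to storage
  }

  // Call with event.data.size from each ondataavailable event.
  // Returns true once the running total reaches the threshold.
  add(sizeBytes) {
    this.currentBytes += sizeBytes;
    return this.currentBytes >= this.thresholdBytes;
  }

  // Call after the finished chunk has been written to disk.
  markSplit() {
    this.completedChunks += 1;
    this.currentBytes = 0;
  }
}

// Simulated run: six one-second slices of 400 KB each, 1 MB threshold.
const tracker = new ChunkSizeTracker(1024 * 1024);
for (let second = 0; second < 6; second++) {
  if (tracker.add(400 * 1024)) {
    tracker.markSplit(); // the real code would save and transcribe the chunk here
  }
}
```

Because the check runs only when a slice arrives, a chunk can overshoot the threshold by up to one slice. That may be one reason for a default slightly below 25 MB, though the PR does not say so.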


2. Feature: S3-Compatible Cloud Storage for Large Files

Why: For large audio files (>25 MB), direct upload to transcription providers can fail with HTTP 413 errors. S3-compatible storage provides temporary hosting with presigned URLs that services like Groq can fetch directly.

Changes:

  • src/helpers/s3Storage.js — New S3StorageManager with support for any S3-compatible provider (AWS S3, Cloudflare R2, MinIO, Backblaze B2, etc.). Supports custom endpoint URL, region, forcePathStyle, presigned URLs, and a full connection test (write → read via presigned URL → delete).
  • main.js — Instantiates S3StorageManager and passes to IPC handlers.
  • src/helpers/ipcHandlers.js — S3 IPC handlers for config, test connection, upload, presigned URLs, and cleanup.
  • preload.js — S3 IPC channel exposures.
  • src/types/electron.ts — Type definitions for S3 config including endpointUrl, region, forcePathStyle, presigned URL on upload.
  • src/helpers/audioManager.js — S3 upload/cleanup methods, presigned URL passthrough for Groq large-file transcription.
  • src/components/SettingsPage.tsx — S3 settings UI with endpoint URL and region fields, plus a quick-start guide.
  • src/locales/en/translation.json — S3 storage i18n strings.

Reliability improvement: The connection test now verifies write + public read (via presigned URL) + delete, not just bucket access. This catches misconfigured CORS/permissions before the user starts recording.
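
The write → read → delete sequence might look something like the sketch below. The `client` interface (`put`, `getPresignedUrl`, `fetchUrl`, `remove`) is an assumption made for illustration and is not the actual `S3StorageManager` API. An in-memory fake stands in for a real bucket so the flow can be run without credentials.

```javascript
// Hypothetical sketch of a three-step connection test.
// `client` is an assumed interface, not the PR's S3StorageManager.
async function testConnection(client, key = "connection-test.txt") {
  const payload = "ok";
  const steps = { write: false, read: false, delete: false };
  try {
    await client.put(key, payload);
    steps.write = true;

    // Read back through a presigned URL, the same path a provider would use.
    const url = await client.getPresignedUrl(key);
    const body = await client.fetchUrl(url);
    steps.read = body === payload;
  } catch (err) {
    steps.error = err.message;
  } finally {
    // Always try to clean up the test object if it was written.
    if (steps.write) {
      try {
        await client.remove(key);
        steps.delete = true;
      } catch (err) {
        steps.error = steps.error || err.message;
      }
    }
  }
  return { ok: steps.write && steps.read && steps.delete, ...steps };
}

// In-memory stand-in for a bucket (illustration only).
function createFakeClient() {
  const objects = new Map();
  return {
    async put(key, body) { objects.set(key, body); },
    async getPresignedUrl(key) { return `memory://${key}?signature=fake`; },
    async fetchUrl(url) {
      const key = url.slice("memory://".length).split("?")[0];
      if (!objects.has(key)) throw new Error("not found");
      return objects.get(key);
    },
    async remove(key) { objects.delete(key); },
  };
}
```

Reporting each step separately lets a settings UI show which of the three steps failed, for example a bucket that accepts writes but rejects presigned reads.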

Presigned URL passthrough: For files exceeding 25 MB, the app now passes a presigned S3 URL to Groq's API (url parameter) instead of uploading the blob directly, avoiding HTTP 413 errors.
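
As a rough illustration of that decision, the sketch below picks between URL mode and direct upload. The function name, option names, and return values are hypothetical, and the 25 MB limit is assumed to be 25 × 1024 × 1024 bytes. The only behavior taken from this description is that files over 25 MB use a presigned URL when the Groq provider and S3 are both in use.

```javascript
// Hypothetical decision helper (names are illustrative, not from the PR).
const DIRECT_UPLOAD_LIMIT_BYTES = 25 * 1024 * 1024; // assumption: 1 MB = 1024 * 1024 bytes

function chooseTranscriptionInput(fileSizeBytes, { s3Enabled, provider }) {
  const tooLargeForDirectUpload = fileSizeBytes > DIRECT_UPLOAD_LIMIT_BYTES;
  // URL mode needs temporary hosting (S3) and a provider that accepts a URL.
  if (tooLargeForDirectUpload && s3Enabled && provider === "groq") {
    return "presigned-url";
  }
  return "direct-upload";
}

const largeWithS3 = chooseTranscriptionInput(26 * 1024 * 1024, { s3Enabled: true, provider: "groq" });
const smallWithS3 = chooseTranscriptionInput(10 * 1024 * 1024, { s3Enabled: true, provider: "groq" });
const largeWithoutS3 = chooseTranscriptionInput(26 * 1024 * 1024, { s3Enabled: false, provider: "groq" });
```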


3. Feature: Copy Full Session Transcript

Why: When chunk splitting is enabled, a long recording (e.g. 2 hours) produces many partial transcriptions. Users had no way to get the combined text for the entire session.

How it works:

  1. AudioManager tracks _sessionTranscripts[] — an ordered array of {partIndex, text} entries.
  2. Each chunk split and the final recording part push their transcription text into this array.
  3. When the final part's transcription completes and there are >1 parts, the combined text is:
    • Pasted into the active input field (e.g. Notepad++) as the full session text — not just the last part.
    • Broadcast via IPC (session-transcript-ready) to the ControlPanel window.
  4. The ControlPanel shows a dismissible banner with session timing (start → end, duration, character count) and a "Copy Full Session" button.
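
The aggregation in the steps above can be sketched as follows. The function names, the space separator, and the shape of the summary object are assumptions for illustration, and the actual `audioManager.js` code may differ. Sorting by `partIndex` is a defensive choice in case transcriptions finish out of order.

```javascript
// Hypothetical sketch: combine {partIndex, text} entries into one session text.
function combineSessionTranscripts(entries, separator = " ") {
  return [...entries] // copy so the caller's array is not reordered
    .sort((a, b) => a.partIndex - b.partIndex)
    .map((entry) => entry.text.trim())
    .filter((text) => text.length > 0) // skip parts that produced no text
    .join(separator);
}

// Hypothetical summary, similar in spirit to what the banner displays.
function summarizeSession(entries, startedAtMs, endedAtMs) {
  const text = combineSessionTranscripts(entries);
  return {
    partCount: entries.length,
    durationSeconds: Math.round((endedAtMs - startedAtMs) / 1000),
    characterCount: text.length,
    text,
  };
}

// Parts pushed out of order, as could happen if part 1 finishes first.
const parts = [
  { partIndex: 1, text: "world " },
  { partIndex: 0, text: "Hello" },
];
const summary = summarizeSession(parts, 0, 90000);
// summary.text is "Hello world"
```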

Files changed:

  • src/helpers/audioManager.js — Session tracking, combined text assembly, IPC broadcast with timing data.
  • src/hooks/useAudioRecording.js — Uses result.sessionText for paste at session end instead of just the final part's text.
  • src/helpers/ipcHandlers.js — New broadcast-session-transcript IPC handler.
  • preload.js — broadcastSessionTranscript and onSessionTranscriptReady channels.
  • src/types/electron.ts — Type definitions for session transcript IPC (includes timing fields).
  • src/components/ControlPanel.tsx — Session transcript banner with Layers icon, timing info, Copy button, and dismiss (X) button.
  • src/locales/en/translation.json — i18n strings for session banner (controlPanel.session.*).

Files Modified

| File | Change Type |
| --- | --- |
| src/helpers/audioManager.js | Chunk splitting + session tracking + S3 upload/cleanup |
| src/helpers/s3Storage.js | New file |
| src/helpers/recordingStorage.js | New file |
| src/helpers/ipcHandlers.js | S3 IPC handlers, session transcript broadcast |
| preload.js | S3 and session transcript IPC channels |
| src/types/electron.ts | S3 config types, session transcript types |
| main.js | S3StorageManager instantiation |
| src/components/SettingsPage.tsx | S3 settings UI with endpoint/region fields |
| src/components/ControlPanel.tsx | Session transcript banner + copy button |
| src/hooks/useAudioRecording.js | Combined session paste at session end |
| src/locales/en/translation.json | S3 + session transcript i18n strings |

Testing Notes

  • Platform: Windows 11 only. macOS/Linux not tested.
  • Chunk splitting: Set chunk size to 1 MB in Settings → Recording, record for a few minutes, verify files in recordings folder are ~1 MB each and transcriptions appear incrementally.
  • Session transcript: After a multi-part recording, verify the banner appears in ControlPanel with correct part count, timing, and that "Copy Full Session" copies the combined text.
  • S3 storage: Configure any S3-compatible provider, run "Test Connection" — should show write/read/delete steps passing.
  • Presigned URL: Record a file >25 MB with Groq provider and S3 enabled — should use URL mode instead of direct upload.

Breaking Changes

None.

## Features
- Add size-based recording chunk splitting
  - Previously, recordings were kept entirely in memory (data loss risk for long recordings)
  - New: automatically split recordings at configurable threshold (default 24.5MB)
  - Save each chunk to disk immediately via new recordingStorage module
  - Transcribe chunks as they complete (incremental results during long recordings)
  - Added timeslice to MediaRecorder for real-time size tracking
- Refactor R2-specific storage to generic S3-compatible implementation
  - Support any S3-compatible provider (AWS S3, MinIO, Backblaze B2, etc.)
  - Add custom endpoint URL and region configuration
  - Enhanced connection test: write → read via presigned URL → delete
  - Presigned URL passthrough for large files (>25MB) to avoid 413 errors

- Add "Copy Full Session" for multi-part recordings
  - Track and combine all partial transcripts from a session
  - Show dismissible banner with timing info (duration, start/end times)
  - Paste combined text at session end (not just last part)
  - Copy button for full session transcript

## Breaking Changes
- none

## Testing
- Tested on Windows only
- Cross-platform testing recommended before merge
- Added a new setting `passAgentNameToWhisper` (default false) to control whether the agent name is added to the custom dictionary.
- Updated `syncAgentNameToDictionary` to respect the new setting and dynamically add or remove the agent name from the dictionary when the setting is toggled.
- Added a UI toggle in the Settings page under Voice Agent configuration.
- Added localization strings for the new setting across all supported languages (en, es, fr, de, it, ja, pt, ru, zh-CN).
- Added debug logging in `audioManager.js` to track when the custom dictionary is appended to the transcription prompt for local Whisper, OpenWhispr Cloud, and Cloud API providers (like Groq/OpenAI).