Last updated: 2026-02-19
The app is an MVP that successfully:
- Records screen + system audio via ScreenCaptureKit
- Extracts audio from the recording (MOV -> M4A)
- Transcribes audio offline using WhisperKit
- Saves organized meeting folders containing transcript.md, video.mov, and audio.m4a
- Allows the user to pick a save location (persisted via security-scoped bookmarks)
- Shows a voice visualizer during recording
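The save-location persistence above relies on security-scoped bookmarks. A minimal sketch, assuming a sandboxed app and a hypothetical UserDefaults key (`saveFolderBookmark`):

```swift
import Foundation

// Hypothetical key; any UserDefaults key works.
let defaultsKey = "saveFolderBookmark"

// Persist a user-chosen folder across launches (sandboxed macOS app).
func persistSaveFolder(_ url: URL) throws {
    let data = try url.bookmarkData(options: .withSecurityScope,
                                    includingResourceValuesForKeys: nil,
                                    relativeTo: nil)
    UserDefaults.standard.set(data, forKey: defaultsKey)
}

// Resolve the bookmark on the next launch and begin scoped access.
func resolveSaveFolder() -> URL? {
    guard let data = UserDefaults.standard.data(forKey: defaultsKey) else { return nil }
    var isStale = false
    guard let url = try? URL(resolvingBookmarkData: data,
                             options: .withSecurityScope,
                             relativeTo: nil,
                             bookmarkDataIsStale: &isStale),
          url.startAccessingSecurityScopedResource() else { return nil }
    // Caller must balance with url.stopAccessingSecurityScopedResource().
    return url
}
```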
Known issues:
- Live transcription is disabled during recording due to audio conflicts with ScreenCaptureKit
- No onboarding/permission walkthrough screens (app relies on system dialogs)
- UI is functional but not yet aligned with the neo-brutalist design vision
- Camera preview view exists but is unused
- No audio-only recording mode (always records screen)
- If the WhisperKit model fails to download (first launch requires a network connection) or fails to load, no transcription is produced
- No meeting summaries
- Transcription doesn't label speakers
- No webcam capture of the user
Allow recording just the microphone audio without screen capture. This is useful for in-person meetings or phone calls where screen recording isn't needed. This also sidesteps the ScreenCaptureKit audio conflict, enabling live transcription in audio-only mode.
Fix the audio conflict between ScreenCaptureKit and AVAudioEngine. Strategy: use post-recording transcription for screen recording mode, and live transcription (SFSpeechRecognizer) for audio-only mode. This gives the user real-time feedback when screen capture isn't active.
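The audio-only live path described above can be sketched with AVAudioEngine feeding SFSpeechRecognizer. This assumes microphone and speech-recognition permissions are already granted; `LiveTranscriber` is a hypothetical type, not existing app code:

```swift
import AVFoundation
import Speech

// Sketch of live transcription for audio-only mode. With no
// ScreenCaptureKit session active, the input-node tap has no conflict.
final class LiveTranscriber {
    private let engine = AVAudioEngine()
    private let recognizer = SFSpeechRecognizer()
    private let request = SFSpeechAudioBufferRecognitionRequest()
    private var task: SFSpeechRecognitionTask?

    func start(onPartial: @escaping (String) -> Void) throws {
        request.shouldReportPartialResults = true
        let input = engine.inputNode
        let format = input.outputFormat(forBus: 0)
        // Feed microphone buffers into the recognition request.
        input.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, _ in
            self.request.append(buffer)
        }
        engine.prepare()
        try engine.start()
        task = recognizer?.recognitionTask(with: request) { result, _ in
            if let result {
                onPartial(result.bestTranscription.formattedString)
            }
        }
    }

    func stop() {
        engine.inputNode.removeTap(onBus: 0)
        engine.stop()
        request.endAudio()
        task?.cancel()
    }
}
```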
Build a first-launch walkthrough that guides users through granting:
- Screen recording permission
- Microphone access
- Speech recognition access
- Save folder selection
This should be a step-by-step flow as described in the CLAUDE.md overview. The app should detect which permissions are missing and only prompt for those.
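Detecting which permissions are missing can be sketched like this (the `PermissionStatus` type is an assumption for illustration):

```swift
import AVFoundation
import CoreGraphics
import Speech

// Sketch: check each permission so the walkthrough only shows the
// steps that are still missing.
struct PermissionStatus {
    let screenRecording: Bool
    let microphone: Bool
    let speechRecognition: Bool

    static func current() -> PermissionStatus {
        PermissionStatus(
            // True if screen capture access was already granted.
            screenRecording: CGPreflightScreenCaptureAccess(),
            microphone: AVCaptureDevice.authorizationStatus(for: .audio) == .authorized,
            speechRecognition: SFSpeechRecognizer.authorizationStatus() == .authorized
        )
    }
}
```

Save-folder selection has no system permission; the walkthrough can simply check whether a valid security-scoped bookmark resolves.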
Restyle the app to match the neo-brutalist design direction:
- Bold borders, high contrast, raw/blocky layout
- Monospace or display-weight typography
- Minimal color palette with accent colors
- Chunky buttons and clear visual hierarchy
- Remove the default macOS "soft" look
Add the ability to pause and resume a recording session rather than requiring a full stop and restart.
After recording stops and transcription completes, show the transcript in an editable text view so the user can correct errors before saving.
Let the user set a meeting title. As a fallback, auto-generate a name from the first few words of the transcript or the date/time.
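The fallback naming above is pure string logic. A minimal sketch (`maxWords` is an arbitrary choice, not from the source):

```swift
import Foundation

// Fallback meeting title: first few words of the transcript, else a
// date/time stamp.
func fallbackTitle(transcript: String, date: Date = Date(), maxWords: Int = 6) -> String {
    let words = transcript
        .components(separatedBy: .whitespacesAndNewlines)
        .filter { !$0.isEmpty }
        .prefix(maxWords)
    if !words.isEmpty {
        return words.joined(separator: " ")
    }
    let formatter = DateFormatter()
    formatter.dateFormat = "yyyy-MM-dd HH.mm"
    return "Meeting \(formatter.string(from: date))"
}
```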
Add a view that lists past recordings with the ability to open the meeting folder, re-read the transcript, or replay the audio/video.
Expose WhisperKit's multi-language support. Let the user choose a transcription language (or auto-detect) from the settings.
After transcription, generate a structured summary of the meeting: key topics, action items, decisions made. This should run on-device if possible (using a local model) to maintain the privacy-first approach.
Identify and label different speakers in the transcript (e.g., "Speaker 1:", "Speaker 2:"). WhisperKit may support this or a separate model can be used. Create a feature that lets the user label speakers by name.
Use EventKit to pull upcoming calendar events and auto-associate recordings with meetings. Pre-fill meeting titles from calendar event names.
Allow meeting folders to sync via iCloud Drive so recordings are backed up and accessible across devices.
Support exporting transcripts as PDF, plain text, or SRT (subtitle format for video).
Add a lightweight menu bar presence so the user can start/stop recordings without opening the full window.
Prepare for distribution: proper code signing, sandboxing review, App Store metadata, and privacy policy.
- Allow the user to batch-process transcripts
- Integrate with Obsidian
- Develop an iOS mobile application
Search bar in Library view that queries across all saved transcript.md files. Index transcripts using SQLite FTS5 or Core Data for sub-second results across thousands of recordings.
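The FTS5 option can be sketched with the system SQLite C API. Table and column names here are assumptions, not existing app code:

```swift
import SQLite3

// Transient destructor so SQLite copies bound strings.
let SQLITE_TRANSIENT = unsafeBitCast(-1, to: sqlite3_destructor_type.self)

// One row per recording; `path` is stored but not tokenized.
func createIndex(_ db: OpaquePointer?) {
    sqlite3_exec(db,
        "CREATE VIRTUAL TABLE IF NOT EXISTS transcripts USING fts5(title, body, path UNINDEXED);",
        nil, nil, nil)
}

// Returns matching folder paths, best matches first via bm25 ranking.
func search(_ db: OpaquePointer?, query: String) -> [String] {
    var stmt: OpaquePointer?
    var paths: [String] = []
    let sql = "SELECT path FROM transcripts WHERE transcripts MATCH ? ORDER BY bm25(transcripts);"
    guard sqlite3_prepare_v2(db, sql, -1, &stmt, nil) == SQLITE_OK else { return [] }
    defer { sqlite3_finalize(stmt) }
    sqlite3_bind_text(stmt, 1, query, -1, SQLITE_TRANSIENT)
    while sqlite3_step(stmt) == SQLITE_ROW {
        if let c = sqlite3_column_text(stmt, 0) {
            paths.append(String(cString: c))
        }
    }
    return paths
}
```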
One-click export from Library detail view: PDF (formatted with title, date, speaker labels), DOCX, SRT (subtitle format synced to audio timestamps for video review), and plain text.
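The SRT format above is simple to generate from timestamped segments. A sketch, where `Segment` is a hypothetical type to map WhisperKit's segment output onto:

```swift
import Foundation

// Hypothetical segment shape: start/end in seconds plus text.
struct Segment {
    let start: TimeInterval
    let end: TimeInterval
    let text: String
}

// SRT uses HH:MM:SS,mmm timestamps (comma before milliseconds).
func srtTimestamp(_ t: TimeInterval) -> String {
    let ms = Int((t * 1000).rounded())
    return String(format: "%02d:%02d:%02d,%03d",
                  ms / 3_600_000, (ms / 60_000) % 60, (ms / 1000) % 60, ms % 1000)
}

// Numbered cues separated by blank lines.
func srt(from segments: [Segment]) -> String {
    segments.enumerated().map { i, s in
        "\(i + 1)\n\(srtTimestamp(s.start)) --> \(srtTimestamp(s.end))\n\(s.text)\n"
    }.joined(separator: "\n")
}
```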
NSStatusItem presence for quick start/stop without opening the main window. Shows a recording timer in the menu bar and a mini popover with the live transcript snippet.
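The status item can be sketched as below; forwarding to the actual recorder is left out, and `MenuBarController` is a hypothetical name:

```swift
import AppKit

// Sketch of a menu bar presence with a start/stop action and a
// timer label while recording.
final class MenuBarController: NSObject {
    private let item = NSStatusBar.system.statusItem(withLength: NSStatusItem.variableLength)

    override init() {
        super.init()
        item.button?.image = NSImage(systemSymbolName: "record.circle",
                                     accessibilityDescription: "Record")
        let menu = NSMenu()
        let toggleItem = NSMenuItem(title: "Start/Stop Recording",
                                    action: #selector(toggle), keyEquivalent: "r")
        toggleItem.target = self
        menu.addItem(toggleItem)
        item.menu = menu
    }

    // Show elapsed time next to the icon while recording.
    func showTimer(_ elapsed: String) {
        item.button?.title = " \(elapsed)"
    }

    @objc private func toggle() {
        // Forward to the app's recorder here.
    }
}
```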
EventKit integration to pull today's calendar events. Pre-fill meeting title from the active event name. Show upcoming meetings in the recorder as one-tap shortcuts to name the recording.
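Fetching today's events for the pre-fill can be sketched as follows. This assumes calendar access has been granted (NSCalendarsUsageDescription plus the EKEventStore request-access flow):

```swift
import EventKit

// Sketch: titles of today's calendar events, for pre-filling the
// meeting title and the one-tap shortcuts.
func todaysEventTitles(store: EKEventStore = EKEventStore()) -> [String] {
    let cal = Calendar.current
    let start = cal.startOfDay(for: Date())
    guard let end = cal.date(byAdding: .day, value: 1, to: start) else { return [] }
    let predicate = store.predicateForEvents(withStart: start, end: end, calendars: nil)
    return store.events(matching: predicate).compactMap { $0.title }
}
```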
Per-session stats: speaker talk-time pie chart, word count, speaking pace (words/min), and paragraph-level sentiment (positive / neutral / concern). Shown in the Library detail panel.
Export transcript as Obsidian-compatible markdown: YAML frontmatter (date, speakers, tags), wiki-links for speaker names ([[Alex]]), and a backlink to the audio file path. One-click export to the user's configured Obsidian vault.
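The note layout above can be sketched as plain string assembly. Frontmatter field names follow the description; the function name is hypothetical:

```swift
import Foundation

// Sketch of an Obsidian-flavored markdown note: YAML frontmatter,
// wiki-links for speakers, and a pointer to the audio file.
func obsidianNote(title: String, date: String, speakers: [String],
                  audioPath: String, transcript: String) -> String {
    let links = speakers.map { "[[\($0)]]" }.joined(separator: ", ")
    return """
    ---
    date: \(date)
    speakers: [\(speakers.joined(separator: ", "))]
    tags: [meeting]
    ---

    # \(title)

    Speakers: \(links)
    Audio: \(audioPath)

    \(transcript)
    """
}
```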
Detect topic-shift boundaries in the transcript using an LLM and auto-insert chapter headers (## Chapter: Budget Discussion) with anchor links. Useful for long recordings.
Re-run WhisperKit on existing audio.m4a files in Library — useful after a model upgrade or when changing the transcription language. Background queue with progress indicator.
After each recording is saved, POST a JSON payload (title, date, transcript, summary) to a user-configured webhook URL. Enables automation to Slack, Notion, Airtable, Linear, and Zapier.
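The webhook call can be sketched with URLSession. Payload keys follow the description above; the URL would come from user settings:

```swift
import Foundation

// Sketch: POST the saved recording's metadata as JSON to a
// user-configured webhook URL.
func postWebhook(url: URL, title: String, date: String,
                 transcript: String, summary: String) async throws {
    let payload = ["title": title, "date": date,
                   "transcript": transcript, "summary": summary]
    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONSerialization.data(withJSONObject: payload)
    let (_, response) = try await URLSession.shared.data(for: request)
    guard (response as? HTTPURLResponse)?.statusCode == 200 else {
        throw URLError(.badServerResponse)
    }
}
```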
Full-text search index built in Rust (Swift-Rust FFI via SwiftRust or an XPC service) for sub-millisecond search across tens of thousands of meetings. Pre-planned for the Rust integration phase.