A set of Python classes for transcribing and summarizing audio and video files using the OpenAI APIs: Whisper-1 for transcription and GPT-3.5 turbo, a large language model (LLM), for summarization. I made these tools for our org, the UP Data Science Society (UP DSSoc), and I currently use them in our activities, specifically for taking notes during our meetings and mentorship sessions. This is also the first project in which I've extensively used ChatGPT for assistance in writing markdown and docstrings, and I intend to do the same in my next open-source projects. Due to GitHub's 100 MB limit, I've added the complete files in this GDrive link. This is a work in progress, which I hope will be continued incrementally through collaboration.
This repo contains a series of Jupyter Notebook tutorials on how I made these classes, starting with summarizing the answers that the panelists gave to my question at the Destination Lakehouse Pilipinas panel interview. I then provide use cases showing how the NoteTaker can summarize recorded lectures, such as our Data Ethics Mentorship Session at UP DSSoc and the Third Biyahenihan Research Forum (livestreamed here), in which we presented our research paper entitled Empowering Citizens to Build Better Bike Lanes Through Open Contracting.
- Install the dependencies before installing the NoteTaker.
pip install -r requirements.txt
- If there are any errors pertaining to failed to find libmagic, kindly refer to the installation section for more information.
- openai: provides access to the OpenAI Whisper-1 and GPT-3.5 APIs.
- tiktoken: BPE tokeniser for use with OpenAI's models.
- pydub: for working with audio files.
- python-magic-bin: Python interface to the libmagic file type identification library.
- Wave: module for working with WAV files.
- python-ffmpeg: wrapper around the FFmpeg command line multimedia framework.
git clone https://github.com/LanzLagman/API-LLM-NoteTaker.git
cd API-LLM-NoteTaker
- In your Jupyter Notebook, load your OpenAI API key saved in a .txt file.
import os
import openai
# Read the key and strip the trailing newline that text editors often add.
with open('api-key.txt', 'r') as file:
    api_key = file.read().strip()
os.environ["OPENAI_API_KEY"] = api_key
openai.api_key = os.getenv("OPENAI_API_KEY")
- Import NoteTaker
from OpenAI_NoteTaker import OpenAI_NoteTaker
- Prepare role_txt, the prompt in which you set the behavior of the NoteTaker.
role_txt = "You are a detail-oriented STEM student from the Philippines who wants to pursue a career as a data scientist who also specializes in science communication, which allows you to easily transcribe text to pure English."
- Initialize an instance of OpenAI_NoteTaker as QnA_NoteTaker, in which the input file is the .mp4 file of the Q&A portion of the recorded talk about Data Ethics.
QnA_NoteTaker = OpenAI_NoteTaker(input_dir='Data/Input/DSSoc Mentorship/Mentorship_Vid_Pt2.mp4')
- Summarize into 6 points using the take_notes() method, with convert2mp3=True to convert the input to an .mp3 file first and export_mp3_dir set to the destination of the .mp3 file, which will then be used for note-taking.
QnA_NoteTaker.take_notes(system_prompt=role_txt,
                         n_items=6,
                         convert2mp3=True,
                         export_mp3_dir='Data/Output/DSSoc Mentorship/Mentorship_Vid_Pt2.mp3',
                         show_notes=True)
- Output
Output .mp3 file saved to Data/Output/DSSoc Mentorship/Mentorship_Vid_Pt2.mp3
Conversion to mp3 successful.
Input file size: 2.3e+01 MB.
Duration: 1.5e+03 s
Input transcription tokens: 3894
NoteTaker's Summary in 6 points:
1. The speaker highlights the importance of ethical considerations in data science.
2. The speaker emphasizes the need for consent and privacy when collecting and analyzing data.
3. The ethical issues surrounding web scraping are discussed, and the importance of analyzing the purpose and content of web-scraped information is emphasized.
4. The speaker recommends the text "Raw Data is an Oxymoron" and the work of Luciano Floridi for those interested in learning more about data ethics.
5. The importance of developing domain-specific knowledge and interdisciplinary skills is stressed.
6. The speaker recommends learning the programming language R.
CPU times: total: 6.48 s
- Save raw transcription and summarized notes.
QnA_NoteTaker.save_notes(export_transcription_dir='Data/Output/DSSoc Mentorship/Mentorship_Vid_Pt2 [Transcribed]',
                         export_summary_dir='Data/Output/DSSoc Mentorship/Mentorship_Vid_Pt2 [Notes]')
- View total pricing.
QnA_NoteTaker.get_total_job_price()
- Output price
Transcript saved at: Data/Output/DSSoc Mentorship/Mentorship_Vid_Pt2 [Transcribed].txt
Summarized note saved at: Data/Output/DSSoc Mentorship/Mentorship_Vid_Pt2 [Notes].txt
Job Price Breakdown:
transcription_price: 0.15260 USD
summarization_price: 0.00816 USD
total_job_price: 0.16076 USD
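The breakdown above is consistent with per-minute pricing for transcription and per-token pricing for summarization. The sketch below reproduces the arithmetic under assumed rates of 0.006 USD per audio minute and 0.002 USD per 1,000 tokens. The rates, the 1526 s duration, and the 4080-token total are back-calculated from the printed figures rather than taken from the class, and estimate_job_price is a hypothetical name, not part of OpenAI_NoteTaker.

```python
# Assumed rates (not read from OpenAI_NoteTaker); adjust to current pricing.
WHISPER_USD_PER_MINUTE = 0.006
CHAT_USD_PER_1K_TOKENS = 0.002


def estimate_job_price(duration_s: float, total_tokens: int) -> dict:
    """Hypothetical helper: estimate transcription and summarization cost."""
    transcription = duration_s / 60 * WHISPER_USD_PER_MINUTE
    summarization = total_tokens / 1000 * CHAT_USD_PER_1K_TOKENS
    return {
        "transcription_price": round(transcription, 5),
        "summarization_price": round(summarization, 5),
        "total_job_price": round(transcription + summarization, 5),
    }


# Inputs back-calculated from the output above (assumptions, not logged values).
breakdown = estimate_job_price(duration_s=1526, total_tokens=4080)
for name, usd in breakdown.items():
    print(f"{name}: {usd:.5f} USD")
```

Under these assumptions, most of the cost comes from transcription, so the audio duration matters far more than the length of the summary prompt.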
- Add a separate class called OpenAI_Interrogator that creates a chatbot using GPT-3.5 turbo, which users can use to discuss the summarization output.
- Add an option for OpenAI_NoteTaker to split either the input audio/video file or the output transcription. This will be useful when discussions are lengthy, though manual splitting of input files is still recommended.
- Make an app out of this repository using Streamlit.
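The transcription-splitting idea could be prototyped roughly as follows. This is a minimal sketch, not the planned implementation: split_transcription is a hypothetical helper, and it budgets by word count as a rough stand-in for token count. A fuller version would probably count tokens with tiktoken, which is already listed among the dependencies.

```python
def split_transcription(text: str, max_words: int = 1500) -> list[str]:
    """Hypothetical helper: split text into chunks of at most max_words words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    # Step through the word list in strides of max_words and rejoin each slice.
    return [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]


chunks = split_transcription("one two three four five", max_words=2)
print(chunks)  # ['one two', 'three four', 'five']
```

Each chunk could then be summarized separately and the partial summaries combined, at the cost of losing some context across chunk boundaries.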