This project demonstrates the use of local AI to transcribe, diarize, and enrich transcripts created from videos.

It uses:

- pytube to download videos from YouTube
- moviepy to extract audio
- whisper-large-v3 for transcription
- Pyannote.audio for speaker diarization
- Phi-3 (microsoft/Phi-3-medium-4k-instruct-onnx-directml) to identify speaker names and finalize the transcript
You will need to download and install microsoft/Phi-3-medium-4k-instruct-onnx-directml and update the `model_path` below. If you do not have a GPU or are not using Windows, see the Phi-3 docs and set yourself up accordingly.
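Once the model is downloaded, running it locally looks roughly like the sketch below, which uses the onnxruntime-genai package (the usual runtime for the DirectML builds of Phi-3). The `model_path` value, the `generate` helper, and the exact generator calls are assumptions — the onnxruntime-genai API has shifted between versions, so check its docs against your installed release.

```python
def build_phi3_prompt(system: str, user: str) -> str:
    """Wrap a request in the Phi-3 instruct chat template."""
    return f"<|system|>\n{system}<|end|>\n<|user|>\n{user}<|end|>\n<|assistant|>\n"

def generate(model_path: str, prompt: str, max_length: int = 1024) -> str:
    """Sketch of one generation call; assumes onnxruntime-genai is installed."""
    # Imported lazily so the prompt helper works without the package present.
    import onnxruntime_genai as og
    model = og.Model(model_path)  # e.g. the directml-int4 folder you downloaded
    tokenizer = og.Tokenizer(model)
    params = og.GeneratorParams(model)
    params.set_search_options(max_length=max_length)
    generator = og.Generator(model, params)
    generator.append_tokens(tokenizer.encode(prompt))
    while not generator.is_done():
        generator.generate_next_token()
    return tokenizer.decode(generator.get_sequence(0))
```

The prompt template matters: Phi-3 instruct models expect the `<|system|>` / `<|user|>` / `<|assistant|>` markers, and leaving them out noticeably degrades output.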
Pyannote.audio is gated on Hugging Face and requires an account and an access token. See https://huggingface.co/pyannote/speaker-diarization-3.1 for instructions.
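With access granted, the diarization step is a short pipeline call. This is a minimal sketch, assuming pyannote.audio 3.1 is installed and `hf_token` is your Hugging Face access token; the `diarize` and `format_turn` helpers are illustrative names, not part of the library.

```python
def diarize(audio_path: str, hf_token: str) -> list[tuple[float, float, str]]:
    """Run the gated pyannote pipeline and flatten it to (start, end, speaker)."""
    from pyannote.audio import Pipeline  # imported lazily; requires pyannote.audio
    pipeline = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1", use_auth_token=hf_token)
    annotation = pipeline(audio_path)
    return [(turn.start, turn.end, speaker)
            for turn, _, speaker in annotation.itertracks(yield_label=True)]

def format_turn(start: float, end: float, speaker: str) -> str:
    """Render one speaker turn as a transcript-style line."""
    return f"[{start:.2f}-{end:.2f}] {speaker}"
```

Speaker labels come back as generic IDs like `SPEAKER_00`; mapping them to real names is what the Phi-3 step is for.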
## Notebooks
| Name | Description |
| --- | --- |
| DownloadFromYoutube.ipynb | Use pytube to download a video |
| ExtractAudioFromVideo.ipynb | Extract MP3 audio from an MP4 video with moviepy |
| TranscribeVideoWithWhisperLarge.ipynb | Create a transcript with whisper-large-v3 |
| DiarizeWithPyannote.ipynb | Diarize speakers with Pyannote |
| Phi3-ONNX.ipynb | Use Phi-3 with ONNX to identify speaker names and finalize the transcript |
| phi3test-transformers.ipynb | A test comparing transformers to ONNX (spoiler: ONNX is far faster) |
| VideoToFullTranscriptWithWhisperPyannoteAndPhi3-medium.ipynb | The complete process in a single notebook |
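The core of the end-to-end notebook is merging the two timelines: Whisper gives timed text segments, Pyannote gives timed speaker turns. A sketch of that merge, labeling each transcript segment with the speaker whose turn overlaps it most. The segment dicts (`start`/`end`/`text`) and `(start, end, speaker)` turn tuples are assumed shapes, not the notebook's exact types.

```python
def assign_speakers(segments, turns, default="UNKNOWN"):
    """Attach a speaker label to each transcript segment by maximum overlap."""
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = default, 0.0
        for start, end, speaker in turns:
            # Length of the intersection of the two time intervals.
            overlap = min(seg["end"], end) - max(seg["start"], start)
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        labeled.append({**seg, "speaker": best_speaker})
    return labeled
```

Maximum overlap is a simple heuristic that tolerates the small timestamp disagreements between Whisper and Pyannote; segments touching no turn at all keep the `default` label.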