This project demonstrates the use of local AI to transcribe videos, diarize the speakers, and enrich the resulting transcripts.

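It uses whisper-large-v3 for transcription, pyannote.audio for speaker diarization, and Phi-3 (via ONNX) to identify speaker names and finalize the transcript.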

You will need to download and install microsoft/Phi-3-medium-4k-instruct-onnx-directml and update the model_path in the notebooks below. If you do not have a GPU or are not using Windows, see the Phi-3 docs and set yourself up accordingly.
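
Since a wrong model_path tends to fail late and with an unhelpful error, it can help to check the folder before loading anything. The following is a minimal sketch, not part of the notebooks: the helper name `check_model_dir` is hypothetical, and it assumes only that the downloaded model folder contains at least one `.onnx` file.

```python
from pathlib import Path


def check_model_dir(model_path: str) -> list[str]:
    """Return the names of the .onnx files found directly in model_path.

    Hypothetical helper: it assumes only that the downloaded model folder
    holds at least one .onnx file, and raises FileNotFoundError otherwise.
    """
    model_dir = Path(model_path).expanduser()
    if not model_dir.is_dir():
        raise FileNotFoundError(f"model_path is not a directory: {model_dir}")
    # Collect the ONNX graph files that sit directly in the folder.
    onnx_files = sorted(p.name for p in model_dir.glob("*.onnx"))
    if not onnx_files:
        raise FileNotFoundError(f"no .onnx file found in {model_dir}")
    return onnx_files
```

Calling `check_model_dir(model_path)` at the top of a notebook turns a mistyped path into an immediate, readable error.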

The pyannote.audio diarization model is gated on Hugging Face and requires an account and an access token. See https://huggingface.co/pyannote/speaker-diarization-3.1 for instructions.
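
One way to keep the access token out of the notebooks is to read it from an environment variable. This is a sketch under assumptions: the helper `load_hf_token` is hypothetical, and `HF_TOKEN` is used because it is the variable name the Hugging Face tooling recognizes. The commented-out call follows the usage shown on the model card for version 3.1; the keyword argument may differ in other pyannote.audio releases.

```python
import os


def load_hf_token(env=None, var_name="HF_TOKEN"):
    """Return the Hugging Face token from the environment, or fail early.

    Hypothetical helper; `env` defaults to os.environ and can be replaced
    with a plain dict for testing.
    """
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"{var_name} is not set; create a token on Hugging Face and "
            "accept the model's access conditions first."
        )
    return token


# The token would then be handed to pyannote, for example (assumption,
# based on the 3.1 model card; check the keyword for your version):
#   from pyannote.audio import Pipeline
#   pipeline = Pipeline.from_pretrained(
#       "pyannote/speaker-diarization-3.1",
#       use_auth_token=load_hf_token(),
#   )
```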

Notebooks

| Name | Description |
| --- | --- |
| DownloadFromYoutube.ipynb | Use pytube to download a video |
| ExtractAudioFromVideo.ipynb | Extract mp3 from mp4 with moviepy |
| TranscribeVideoWithWhisperLarge.ipynb | Create a transcript with whisper-large-v3 |
| DiarizeWithPyannote.ipynb | Diarize with Pyannote |
| Phi3-ONNX.ipynb | Use Phi-3 with ONNX to identify names and finalize transcript |
| phi3test-transformers.ipynb | A test for comparing transformers to ONNX (spoiler: ONNX is waaaaay faster) |
| VideoToFullTranscriptWithWhisperPyannoteAndPhi3-medium.ipynb | Complete process in a single notebook |
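
The step that ties transcription and diarization together is attaching a speaker label to each transcribed segment. The sketch below shows one common approach, picking the speaker turn with the largest time overlap. It is an illustration, not necessarily what the notebooks do: the function names and the plain-tuple data shapes are assumptions, and real Whisper and pyannote outputs would need to be converted into them first.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns):
    """Label each transcript segment with the best-overlapping speaker.

    segments: list of (start, end, text) tuples from transcription.
    turns:    list of (start, end, speaker) tuples from diarization.
    Both shapes are assumptions made for this sketch.
    Returns a list of (speaker, text); segments that overlap no turn
    are labeled "UNKNOWN".
    """
    labeled = []
    for seg_start, seg_end, text in segments:
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for turn_start, turn_end, speaker in turns:
            shared = overlap(seg_start, seg_end, turn_start, turn_end)
            if shared > best_overlap:
                best_speaker, best_overlap = speaker, shared
        labeled.append((best_speaker, text))
    return labeled


# Made-up example data, for illustration only.
example_segments = [
    (0.0, 2.0, "Hello there."),
    (2.0, 5.0, "Hi, thanks for having me."),
]
example_turns = [
    (0.0, 2.2, "SPEAKER_00"),
    (2.2, 5.0, "SPEAKER_01"),
]
example_labeled = assign_speakers(example_segments, example_turns)
```

The generic labels (`SPEAKER_00`, `SPEAKER_01`) are what a language model such as Phi-3 can later be asked to replace with real names, based on cues in the text.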