This project demonstrates the use of local AI to transcribe, diarize, and enrich transcripts created from videos.

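It uses pytube to download videos, moviepy to extract audio, whisper-large-v3 for transcription, pyannote.audio for diarization, and Phi-3 with ONNX to identify names and finalize the transcript.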

You will need to download and install microsoft/Phi-3-medium-4k-instruct-onnx-directml and update the model_path below. If you do not have a GPU or are not using Windows, see the Phi-3 docs and set up your environment accordingly.
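Before running the Phi-3 notebooks, it can help to confirm that model_path points at a folder that actually contains an ONNX model. The following is a minimal sketch using only the standard library; the helper name `find_onnx_model_dir` is hypothetical and is not part of the notebooks, and the check assumes the downloaded model folder holds at least one `.onnx` file directly inside it.

```python
from pathlib import Path


def find_onnx_model_dir(model_path: str) -> Path:
    """Return model_path as a Path if it looks like an ONNX model folder.

    Hypothetical helper: it only checks that the folder exists and holds
    at least one .onnx file. It does not load or validate the model.
    """
    folder = Path(model_path).expanduser()
    if not folder.is_dir():
        raise FileNotFoundError(f"model_path is not a directory: {folder}")
    if not any(folder.glob("*.onnx")):
        raise FileNotFoundError(f"no .onnx file found in: {folder}")
    return folder
```

Calling `find_onnx_model_dir(model_path)` at the top of a notebook fails early with a readable message instead of a less obvious error during model loading.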

The pyannote.audio diarization model is gated on Hugging Face and requires an account and an access token. See https://huggingface.co/pyannote/speaker-diarization-3.1 for instructions.
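One way to keep the access token out of the notebooks is to read it from an environment variable. This is a sketch under stated assumptions: the variable name `HF_TOKEN` and both helper names are illustrative choices, not something the notebooks are known to use, and the `use_auth_token` argument matches pyannote.audio 3.x (other versions may name it differently).

```python
import os


def get_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Read the Hugging Face access token from an environment variable.

    The variable name is an assumption; use whatever name you exported.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set {env_var} to a Hugging Face access token that has been "
            "granted access to pyannote/speaker-diarization-3.1."
        )
    return token


def load_diarization_pipeline(token: str):
    """Load the gated diarization pipeline with the given token."""
    # Imported lazily so get_hf_token works without pyannote.audio installed.
    from pyannote.audio import Pipeline

    # Assumption: pyannote.audio 3.x, where the argument is use_auth_token.
    return Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1",
        use_auth_token=token,
    )
```

Typical use would be `pipeline = load_diarization_pipeline(get_hf_token())`, after accepting the model's terms on its Hugging Face page.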

Notebooks

Name	Description
DownloadFromYoutube.ipynb	Use pytube to download a video
ExtractAudioFromVideo.ipynb	Extract mp3 from mp4 with moviepy
TranscribeVideoWithWhisperLarge.ipynb	Create a transcript with whisper-large-v3
DiarizeWithPyannote.ipynb	Diarize with Pyannote
Phi3-ONNX.ipynb	Use Phi-3 with ONNX to identify names and finalize transcript
phi3test-transformers.ipynb	A test comparing transformers to ONNX (spoiler: ONNX is much faster)
VideoToFullTranscriptWithWhisperPyannoteAndPhi3-medium.ipynb	Complete process in a single notebook
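The transcription and diarization steps listed above produce two separate timelines: text segments with timestamps, and speaker turns with timestamps. A common way to combine them is to give each text segment the speaker whose turn overlaps it the most. The sketch below illustrates that idea in plain Python; it is not necessarily how the notebooks do it, and the `Segment` and `Turn` types, the function names, and the sample speaker labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A transcribed span of audio (times in seconds)."""
    start: float
    end: float
    text: str


@dataclass
class Turn:
    """A diarized speaker turn (times in seconds)."""
    start: float
    end: float
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns, unknown="UNKNOWN"):
    """Label each segment with the speaker whose turn overlaps it the most."""
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = unknown, 0.0
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > best_overlap:
                best_speaker, best_overlap = turn.speaker, shared
        labeled.append((best_speaker, seg.text))
    return labeled


def format_transcript(labeled) -> str:
    """Render (speaker, text) pairs as one 'SPEAKER: text' line each."""
    return "\n".join(f"{speaker}: {text}" for speaker, text in labeled)
```

Segments that overlap no turn keep the `UNKNOWN` label, which makes gaps in the diarization easy to spot. The resulting speaker-labeled text is the kind of input a language model step could then use to replace generic labels with real names.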
