Table of Contents

This project contains the companion code for GPT-4o versus Azure Document Intelligence and Azure Computer Vision OCR

Notebook Name Description
PdfToTextPages.ipynb Baseline, makes a text file for each page using pypdf to extract the text
PdfToPageImages.ipynb Given a folder of PDF files, converts each page of each PDF file into JPEG images using a resolution of 300 dpi, and saves them in a structured directory format.
DocIntelligencePipeline.ipynb C# Polyglot notebook with functions to convert an entire PDF to markdown and another that creates an OCR markdown file from each image created using PdfToPageImages.ipynb
turbo-2024-04-09.ipynb Azure Open AI using GPT-4 with vision to create a markdown file for each image created using PdfToPageImages.ipynb
v4omni.ipynb OpenAI using GPT-4o to create a markdown file for each image created using PdfToPageImages.ipynb
v4omni-image-plus-docIntelOcr.ipynb OpenAI using GPT-4o to create a markdown file for each image created using PdfToPageImages.ipynb grounded with OCR text created using DocIntelligencePipeline.ipynb
visionWithOcr.ipynb Azure Computer Vision GPT4-Vision OCR
visionWithOcrAndGrounding.ipynb Azure Computer Vision GPT4-Vision OCR and grounding