98% Accuracy Rate with ChatGPT-4o Vision OCR

Situation

A client approached us because they wanted to use AI to automatically extract data from paperwork. They had heard about ChatGPT-4o's OCR capabilities and wanted to see whether it could automate away the heavy paperwork processing their business was used to.

The challenge was that the paperwork was not standardized. Fields were spelled and formatted differently from one document to the next, and were often located in different places on each document. To complicate matters further, some of the scanned documents were not valid documents to process at all.

They wanted to extract 10 fields from each valid document.

Task

We were tasked with developing a tool that could automatically extract all of the specified fields from these documents. We were to attempt this first with ChatGPT-4o and aim for a 90% accuracy rate across all extractions. If we fell short of that rate, we were to try different techniques and technologies until we reached it.

Action

We first established a baseline for how well the ChatGPT-4o Vision API could extract the data from their paperwork on a first pass. We found that it had a baseline accuracy rate of between 80% and 90% across all documents on our first attempt. This indicated that it was a suitable base to build on, and that the rest of the work was handling the edge cases to bring the rate above 90%.
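To make the idea of a baseline accuracy rate concrete, here is a minimal sketch of how field-level accuracy could be scored against hand-labeled documents. It is an illustration only, not the scoring code from the project: the function names, the field names ("reference_number", "issue_date"), and the normalization rule (ignore case and extra whitespace) are all assumptions.

```python
from typing import Dict, List


def _normalize(value: str) -> str:
    """Collapse whitespace and ignore case so trivial differences do not count as errors."""
    return " ".join(value.split()).casefold()


def field_accuracy(
    predictions: List[Dict[str, str]],
    ground_truth: List[Dict[str, str]],
) -> float:
    """Return the fraction of labeled fields whose extracted value matches the label.

    Each list holds one dict per document, mapping field name to value.
    A field missing from a prediction counts as a miss.
    """
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must cover the same documents")

    correct = 0
    total = 0
    for predicted, expected in zip(predictions, ground_truth):
        for name, expected_value in expected.items():
            total += 1
            if _normalize(predicted.get(name, "")) == _normalize(expected_value):
                correct += 1
    return correct / total if total else 0.0


# Hypothetical example: two documents, two labeled fields each.
labels = [
    {"reference_number": "A-1042", "issue_date": "2024-03-01"},
    {"reference_number": "B-2210", "issue_date": "2024-04-15"},
]
extracted = [
    {"reference_number": "a-1042 ", "issue_date": "2024-03-01"},
    {"reference_number": "B-2210"},  # issue_date was not extracted
]
print(field_accuracy(extracted, labels))
```

Scoring per field rather than per document keeps a single missed field from marking a whole document as a failure, which makes it easier to see which fields need more prompt work.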

We then collected all of the different formats and spellings for each field and passed more descriptive prompts to ChatGPT. We also combined it with other OCR technologies and techniques, taking multiple passes over each document and comparing the results to capture the edge cases that the baseline Vision API could not.
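The two ideas in this step, mapping different spellings of a field to one name and comparing results from several passes, can be sketched roughly as follows. This is an assumed approach for illustration: the alias table, the field names, and the simple majority-vote rule are hypothetical, and each pass is represented as a plain dict rather than a real OCR or Vision API response.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical alias table: canonical field name -> spellings seen on documents.
FIELD_ALIASES: Dict[str, List[str]] = {
    "reference_number": ["reference number", "ref no", "ref #", "reference no."],
    "issue_date": ["issue date", "date issued", "date of issue"],
}


def canonical_field(label: str) -> Optional[str]:
    """Map a label as printed on the document to its canonical field name."""
    cleaned = " ".join(label.split()).casefold().rstrip(":")
    for name, aliases in FIELD_ALIASES.items():
        if cleaned == name or cleaned in aliases:
            return name
    return None


def merge_passes(
    passes: List[Dict[str, str]], min_votes: int = 2
) -> Dict[str, Optional[str]]:
    """Keep a value only if at least min_votes passes agree on it.

    Fields without enough agreement are set to None so they can be
    flagged for review instead of silently accepted.
    """
    votes: Dict[str, Counter] = {}
    for result in passes:
        for label, value in result.items():
            name = canonical_field(label)
            if name is None or not value.strip():
                continue
            votes.setdefault(name, Counter())[value.strip()] += 1

    merged: Dict[str, Optional[str]] = {}
    for name in FIELD_ALIASES:
        counts = votes.get(name)
        if not counts:
            merged[name] = None
            continue
        value, count = counts.most_common(1)[0]
        merged[name] = value if count >= min_votes else None
    return merged


# Three hypothetical passes over the same document, e.g. from different engines.
passes = [
    {"Ref No:": "A-1042", "Date Issued": "2024-03-01"},
    {"Reference Number": "A-1042", "issue date": "2024-03-07"},
    {"ref #": "A-1O42", "Date of Issue": "2024-03-01"},
]
print(merge_passes(passes))
```

Returning None when the passes disagree is a deliberate choice in this sketch: a blank that gets reviewed is usually cheaper than a wrong value that gets trusted.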

Results

Our final delivered product extracted data correctly 98% of the time across hundreds of tested fields and documents. We delivered the initial prototype within one month and spent the next month improving accuracy to get the rate as high as possible. The client was pleased with the results, and we are currently helping them onboard more users onto the tool.

Key Lessons

The ChatGPT-4o Vision API performs extremely well and can handle most use cases, even in quick prototypes.
Smart prompting, combined with other OCR technologies, can help improve accuracy.
Image quality is critical for higher extraction rates. We found that improving the initial image resolution, upscaling the image, and being more diligent about how the papers were scanned were all essential to improving the product's accuracy rate.
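As a rough illustration of the upscaling point, the sketch below computes the size a scan could be resized to before extraction. The 1500-pixel minimum for the shorter side and the 4x cap are assumed values chosen for the example, not figures from the project, and the resize itself would be done by an imaging library (for example Pillow's Image.resize).

```python
from typing import Tuple


def upscaled_size(
    width: int,
    height: int,
    min_short_side: int = 1500,  # assumed threshold, tune per document type
    max_scale: float = 4.0,      # assumed cap to avoid blowing up tiny scans
) -> Tuple[int, int]:
    """Return target (width, height) so the shorter side reaches min_short_side.

    Images that are already large enough are returned unchanged, and the
    aspect ratio is preserved (up to rounding) when scaling.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")

    short_side = min(width, height)
    if short_side >= min_short_side:
        return width, height

    scale = min(min_short_side / short_side, max_scale)
    return round(width * scale), round(height * scale)


# A letter-size page scanned at 100 DPI is roughly 850 x 1100 pixels.
print(upscaled_size(850, 1100))
```

Upscaling cannot restore detail the scanner never captured, so a check like this works best alongside better scanning practices rather than as a replacement for them.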