written by
Lotfi Kobrosly

An Overview of Optical Character Recognition (OCR) in 2021

Computer Vision, October 28, 2021

Introduction

Optical Character Recognition (OCR) has become a key technology for many companies. Their digital transition requires converting large numbers of images containing text into text documents, so a reliable OCR tool is crucial for information retrieval and communication.

Current OCR technologies perform well on documents in good condition (well-oriented, with enough light and contrast, no flaws in the image, a legible writing style and font size, etc.). Reality, however, is rarely that clean, and most of the challenges OCR faces arise when these conditions do not hold. Hence the need for tools that remain robust across the whole range of situations.

OCR is not always easy
Figure 1: On the left, an extract from a book where the text is perfectly aligned and characters appear in the same font and size. On the right, a street ad, with different colors and font sizes as well as shapes intersecting with some characters, which is more difficult for an OCR algorithm to detect and/or recognize.

Here at Sicara, our focus is to deliver reliable OCR-based solutions to our clients. In business cases where documents appear in close-to-ideal conditions, some solutions are more relevant than others, and this article focuses on those solutions. For less orthodox situations, you can refer to the article on OCR in the Wild on our blog.

What is OCR and how does it work?

OCR steps
Figure 2: The succession of steps of the OCR process.

OCR is the process by which a machine converts an image containing text (typed or handwritten) into a text document, generally regardless of the language or the format. It is performed in two steps: detecting the text, then recognizing it. When conditions are difficult (the challenges described above), some preliminary processing can make the task easier. The most common operations are:

  • Deskewing: re-aligning and rotating the document for a more standardized analysis
  • Despeckling: removing stray dots (speckle noise)
  • Converting to grayscale or binarization
  • Deblurring and applying filters
  • Line deletion for boxes and elements that do not constitute characters (e.g., tables, images, separating lines)
  • Line detection
  • Pre-isolating the text box (or cropping)
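
To make a few of these operations concrete, here is a minimal pure-Python sketch of grayscale conversion, binarization, and despeckling. It is illustrative only: the function names are hypothetical, Otsu's method is just one possible way to pick the binarization threshold (the article does not prescribe one), and the sketch assumes dark text on a light background. In practice an image-processing library would do this work far more efficiently.

```python
from typing import List, Tuple

Gray = List[List[int]]  # rows of 0-255 intensities


def to_grayscale(rgb: List[List[Tuple[int, int, int]]]) -> Gray:
    """Convert (r, g, b) pixels to luminance with the ITU-R BT.601 weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb]


def otsu_threshold(gray: Gray) -> int:
    """Pick the threshold that maximizes the between-class variance."""
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(i * count for i, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue          # no pixels at or below t yet
        w_fg = total - w_bg
        if w_fg == 0:
            break             # every pixel is already in the dark class
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        variance = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if variance > best_var:
            best_var, best_t = variance, t
    return best_t


def binarize(gray: Gray, threshold: int) -> Gray:
    """Mark dark pixels (<= threshold) as ink (1), everything else as 0."""
    return [[1 if value <= threshold else 0 for value in row] for row in gray]


def despeckle(binary: Gray) -> Gray:
    """Remove ink pixels that have no ink pixel among their 8 neighbours."""
    height, width = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    for y in range(height):
        for x in range(width):
            if binary[y][x] == 1:
                block = sum(
                    binary[ny][nx]
                    for ny in range(max(0, y - 1), min(height, y + 2))
                    for nx in range(max(0, x - 1), min(width, x + 2))
                )
                if block == 1:  # only the pixel itself is ink
                    out[y][x] = 0
    return out


# Toy 4x5 "page": a small dark stroke plus one isolated dark dot (bottom right).
page = [
    [250, 250, 250, 250, 250],
    [250,  20,  30, 250, 250],
    [250,  25, 250, 250, 250],
    [250, 250, 250, 250,  10],
]
threshold = otsu_threshold(page)
cleaned = despeckle(binarize(page, threshold))
```

On this toy page, the three-pixel stroke survives while the isolated dot in the corner is dropped as noise.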

First, we apply this preprocessing, and the result is an easier-to-digitize image. Second, text detection places bounding boxes around the sentences or words. Then comes the recognition of the text itself, which can occur either character by character or by whole words (the latter makes the algorithm language-specific, which can be useful for certain use cases).

Finally, an optional post-processing step can correct mistakes in the output of the OCR algorithm. For example, if a word does not belong to the dictionary, we can replace it with a close word that differs by only a small number of characters.
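
A minimal sketch of such a dictionary-based correction is shown below, using Python's standard `difflib` module. Note the substitution: `difflib` ranks candidates by a similarity ratio rather than by a strict count of changed characters, so it only approximates the idea described above. The vocabulary, the `cutoff` value, and the function names are assumptions made for illustration.

```python
import difflib
from typing import List


def correct_word(word: str, vocabulary: List[str], cutoff: float = 0.75) -> str:
    """Return the word unchanged if known, else its closest vocabulary entry."""
    if word.lower() in vocabulary:
        return word
    # get_close_matches returns at most n candidates scoring >= cutoff.
    matches = difflib.get_close_matches(word.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def correct_text(text: str, vocabulary: List[str], cutoff: float = 0.75) -> str:
    """Apply correct_word to every whitespace-separated token."""
    return " ".join(correct_word(token, vocabulary, cutoff) for token in text.split())


# Hypothetical domain vocabulary and a noisy OCR output.
vocabulary = ["invoice", "total", "amount", "date"]
corrected = correct_text("lnvoice tota1 amount xyz", vocabulary)
```

Tokens with no sufficiently similar vocabulary entry (here `xyz`) are left untouched, which avoids "correcting" names, numbers, or codes that legitimately fall outside the dictionary.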

What are the available OCR tools and how do we choose the most appropriate one?

Several OCR solutions are available, each with its strengths and specificities. They fall mainly into two groups: cloud-based APIs and downloadable software. Let's discuss some of them here:

Cloud-based APIs

When working on a project, cost becomes part of the equation and may restrict the choice of tool. It is essential to consider this factor, since the APIs presented in this section are paid services rather than open-source tools. This is especially relevant when the use case does not require capabilities or performance beyond what freely available tools offer.

Google Cloud Vision

Part of a complete package that is compatible with other Google services, this API offers an OCR service, among others. Given an image, it automatically returns the predicted text along with the bounding boxes surrounding it.

Note: Google Docs also offers a free OCR tool to convert PDF documents to text. However, it does not convert tables and footnotes.

Pros:

  • Set-up is easy
  • Generally better performance than other APIs

Cons:

  • Documentation not up-to-date
  • Requires installing several packages on the user’s local machine
  • Non-customizable features

Pricing:

  • $1.50 per 1,000 pages for 5 million pages or less
  • $0.60 per 1,000 pages for more than 5 million pages
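
To estimate a bill from tiers like these, the short sketch below reads the two prices above as $1.50 and $0.60 per 1,000 pages. It assumes the tiers are applied marginally (only the pages beyond 5 million get the lower rate) and ignores any free quota; both assumptions should be checked against the provider's current pricing page. The function name is hypothetical.

```python
from typing import List, Optional, Tuple

# (upper bound of the tier in pages, or None for "no limit", price per 1,000 pages)
Tier = Tuple[Optional[int], float]


def tiered_cost(pages: int, tiers: List[Tier]) -> float:
    """Cost in dollars when each tier's rate applies only to the pages inside it."""
    cost, lower = 0.0, 0
    for upper, price_per_1000 in tiers:
        top = pages if upper is None else min(pages, upper)
        if top > lower:
            cost += (top - lower) / 1000 * price_per_1000
        if upper is None or pages <= upper:
            break
        lower = upper
    return cost


# Tiers as listed above (assumed marginal, no free quota).
vision_tiers = [(5_000_000, 1.50), (None, 0.60)]
small_job = tiered_cost(100_000, vision_tiers)    # 100 blocks of 1,000 pages at $1.50
large_job = tiered_cost(6_000_000, vision_tiers)  # 5M pages at $1.50, then 1M at $0.60
```

The same function works for the other providers' price lists by passing a different list of tiers.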

AWS Textract

Here too, given an image, the console interface (backed by a Machine Learning algorithm) returns the bounding boxes and the text.

Pros:

  • Flexible pricing
  • Ease of use after set-up

Cons:

  • Relatively tedious to set up
  • Requires several steps (downloading packages and various files essentially)
  • Not suited for handwritten documents

Pricing:

  • $1.50 per 1,000 pages for 1 million pages or less
  • $0.60 per 1,000 pages for more than 1 million pages

Microsoft Azure Cognitive Services

To use this API, one needs to create an account on Cognitive Services, the Artificial Intelligence offering of Azure. Fortunately, the implementation part that comes next, integrating the API calls into the code, is rather easy. As with the previous tools, the output for a given input image consists of bounding boxes and the text they contain.

Pros:

  • Easy implementation after set-up
  • Over 100 languages available
  • Compatible with Docker usage

Cons:

  • Requires a credit card addition for the free trial (privacy issue)

Pricing:

  • $1 per 1,000 transactions for up to 1 million transactions
  • $0.65 per 1,000 transactions for 1 million to 10 million transactions
  • $0.60 per 1,000 transactions for 10 million to 100 million transactions
  • $0.40 per 1,000 transactions for more than 100 million transactions

IBM Datacap

This API has some attractive features. In particular, the scanning mechanism and the processing steps are rather simple. It also offers many customizable features, a strong OCR function, and compatibility with different platforms and devices. Yet it is worth mentioning that it is slow and that its UI support falls short of its competitors'.

Pros:

  • Simple scanning and processing mechanisms
  • Customizable features
  • Strong OCR function
  • Compatibility with different platforms and devices

Cons:

  • Slow processing
  • Insufficient support on the UI

Pricing: variable, depends on the use case (number of requests, bandwidth, etc.)


For further custom comparisons of the aforementioned tools, you can try them on a few documents on this comparison platform.

Downloadable solutions

ABBYY FineReader

ABBYY has been providing companies with OCR tools for a long time. Although it has released several software products for this task, we will only focus on FineReader here (the others may be previous versions or offer different features).

Pros:

  • Ergonomic interface
  • Keyboard-friendly correction feature
  • Buy-only-once software
  • Decent accuracy

Cons:

  • No merging of various documents
  • Outputs might require some post-processing

Pricing: $199 for the standard version for Windows and $129 for macOS.

Adobe Acrobat Pro DC

Adobe Acrobat has been offering an OCR service for quite some time, although this is not widely known. It ranks among the best PDF solutions overall. However, the OCR is only available as an additional feature of the Adobe Acrobat PDF reader.

Pros:

  • Supports multiple formats (inputs and outputs)
  • Ease of use
  • Compatible with Acrobat’s PDF handling features

Cons:

  • Heavy on the system and the storage
  • Does not come separately from the Acrobat PDF reader

Pricing: $15/month for the Standard Plan

Tesseract

Tesseract is by far the most popular open-source OCR library. Originally developed by Hewlett-Packard, it was later (and up to today) maintained by Google. The open-source library is available on GitHub.

Pros:

  • A wide range of languages
  • Various output formats
  • Long Short-Term Memory (LSTM) based models
  • Trainable

Cons:

  • Might not be suited for specific client use cases

Pricing: Free
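
As an illustration of the detect-then-recognize output described earlier, the sketch below calls Tesseract from Python through the third-party `pytesseract` wrapper. This setup is an assumption, not something the article prescribes: it requires the Tesseract binary, `pytesseract`, and Pillow to be installed, and the helper names and the confidence cutoff of 60 are hypothetical choices.

```python
from typing import Any, Dict, List


def words_with_boxes(data: Dict[str, list], min_conf: float = 60.0) -> List[Dict[str, Any]]:
    """Turn the column-style dict from pytesseract.image_to_data into word records.

    Entries with empty text (page, block, and line rows) or low confidence are skipped.
    """
    words = []
    for text, conf, left, top, width, height in zip(
        data["text"], data["conf"], data["left"],
        data["top"], data["width"], data["height"],
    ):
        if text.strip() and float(conf) >= min_conf:
            words.append({
                "text": text,
                "conf": float(conf),
                # Bounding box as (x_min, y_min, x_max, y_max) in pixels.
                "box": (left, top, left + width, top + height),
            })
    return words


def ocr_image(path: str, lang: str = "eng") -> List[Dict[str, Any]]:
    """Run Tesseract on an image file and return confident words with their boxes."""
    # Imported lazily so words_with_boxes stays usable without Tesseract installed.
    import pytesseract
    from PIL import Image

    data = pytesseract.image_to_data(
        Image.open(path), lang=lang, output_type=pytesseract.Output.DICT
    )
    return words_with_boxes(data)


# Hand-written sample in the same shape as Tesseract's output, for illustration.
sample = {
    "text": ["", "Hello", "wor1d"],
    "conf": ["-1", "96", "41"],
    "left": [0, 10, 60], "top": [0, 5, 5],
    "width": [100, 40, 45], "height": [30, 12, 12],
}
confident_words = words_with_boxes(sample)
```

Filtering on the per-word confidence is a simple way to flag regions that may need manual review or post-processing; the right cutoff depends on the documents at hand.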

SimpleOCR

SimpleOCR is freeware intended for individual use. It offers an SDK for developers and a large dictionary to which custom words can be added, as well as the ability to process several documents at the same time and a spell checker.

Pros:

  • Wide updatable dictionary (more than 120k words)
  • Ability to process many documents simultaneously

Cons:

  • No command-line interface in the free version
  • Cannot be deployed to several servers in the free version

Pricing: Free (paid versions also exist as a one-time payment, starting from $25)

Several other tools that are worth mentioning exist on the market, each with its strengths and weaknesses, such as Rossum, OmniPage, Klippa, Readiris, Docparser, Veryfi, and Hypatos.

Benchmarking various OCR technologies

We refer here to Nanonets’ blog, where the authors did a remarkably comprehensive job of comparing several OCR tools against a set of rather interesting criteria. These criteria capture some of the important aspects of evaluating an OCR tool. Their final comparison table is the following:

https://nanonets.com/blog/ocr-software-best-ocr-software/#is-there-any-free-ocr-software
Figure 3: Comparison table of OCR solutions, taken from Nanonets’ blog tackling the same subject.

Of course, the evaluation criteria chosen in this table put Nanonets’ own solution in the front seat (which is also an attempt to justify the gap in pricing between Nanonets and the rest). Additional elements are therefore needed to truly capture the differences between the OCR solutions. For example, aside from the practical matters to consider (pricing, ergonomics, etc.), one can look at the accuracy of the solution, which can be measured via various metrics. In practice, the right accuracy measure depends heavily on the use case, but some general metrics exist, e.g., edit distances (Levenshtein distance, the slightly different Damerau-Levenshtein distance, Jaro-Winkler distance), Dynamic Time Warping, Hamming distance, etc.
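
As a concrete example of one such metric, here is a minimal sketch of the Levenshtein distance, together with a character error rate that divides it by the length of the ground-truth text. The normalization is an assumption added for illustration (it is a common convention, but not one this article prescribes), and the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance between OCR output and ground truth, per reference character."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


distance = levenshtein("kitten", "sitting")
cer = character_error_rate("invoice total", "lnvoice tota1")
```

Here the OCR output differs from the 13-character reference by two substitutions, so the error rate is 2/13. A word-level variant can be obtained by applying the same distance to lists of tokens instead of strings.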

It is therefore crucial, when comparing OCR solutions, to set criteria that fit the situation at hand. The ones provided here can serve as general directions, but the choice should respond to case-specific needs and constraints.

Conclusion

In conclusion, it is quite easy nowadays to find a good OCR solution that can answer a project’s needs. Some solutions can be more relevant than others, depending on the use case. It is thus important to keep in mind the true objective of using OCR in a given project and derive adequate evaluation metrics from it.

References

S. Hochreiter and J. Schmidhuber, "Long Short-Term Memory," in Neural Computation, vol. 9, no. 8, pp. 1735-1780, 15 Nov. 1997, doi: 10.1162/neco.1997.9.8.1735.

R. Karpinski, D. Lohani, and A. Belaid, "Metrics for Complete Evaluation of OCR Performance," in IPCV’18 - The 22nd Int’l Conf on Image Processing, Computer Vision, & Pattern Recognition, Jul 2018, Las Vegas, United States. hal-01981731

V. I. Levenshtein, "Binary codes capable of correcting deletions, insertions, and reversals," in Soviet Physics Doklady, 1966, pp. 707-710.

Prithiv S. Best OCR Software of 2021. Nanonets blog. October 13th, 2021. https://nanonets.com/blog/ocr-software-best-ocr-software/#is-there-any-free-ocr-software

Adam Enfroy. 10 Best OCR Software of 2021 (Free and Paid Tools). April 26th, 2021. https://www.adamenfroy.com/best-ocr-software

David Nield, Jonas P. DeMuro, Brian Turner. Best OCR software of 2021: free and paid options. October 11th, 2021. https://www.techradar.com/best/best-ocr-software

Eden AI. Optical Character Recognition (OCR): Which solution to choose? April 16th, 2020. https://edenai.medium.com/optical-character-recognition-ocr-which-solution-to-choose-cd4f829c4e5
