Sicara and OCR

Sicara uses and implements OCR solutions

Sicara adapts to your business needs to extract the information you need from your documents. Depending on the document type (handwritten or printed), how critical OCR is to your project (central to the value, a step in dataset creation, ...) and your delivery speed and security constraints, we leverage:
- Specialized APIs (Microsoft Cognitive Services, Google Cloud Vision, ...)
- Custom solutions built on our mastery of Python libraries: Keras, PyTesseract and OpenCV


Optical Character Recognition (OCR) is a subdomain of Computer Vision, related to Pattern Recognition. This field of AI deals with converting physical documents into machine-readable text. Such documents can contain handwritten and/or printed text along with images. The main applications of OCR include document digitization, information extraction (reading official documents, license plates, ...) and dataset creation for AI training.


Timeline: 1st OCR (1870), 1st online OCR, release of Tesseract

Some Figures

Documents printed every year
Size of Scanned vs. Text Document

Early versions needed to be trained with images of each character, and worked on one font at a time. Advanced systems capable of producing a high degree of recognition accuracy for most fonts are now common, and with support for a variety of digital image file format inputs.

Source: Wikipedia


How does it work?

Main steps in an OCR process


1. Document Standardization (crop, rotate, format, ...)
2. Text Detection
3. Text Interpretation
4. Intelligent Text Cleaning
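The four steps above can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not a description of how any particular OCR engine works: the page is assumed to be already binarized into a grid of 0 (background) and 1 (ink), text detection is reduced to finding horizontal bands of inked rows, and cleaning is reduced to snapping near-miss words onto a known vocabulary. All function names (`crop_to_content`, `detect_text_lines`, `clean_text`, `run_pipeline`) are made up for this example, and step 3 is left to a `recognize` callable that a real project would back with an OCR engine.

```python
import re
from difflib import get_close_matches


def crop_to_content(page):
    """Step 1 (standardization): trim blank margins around the inked area."""
    rows = [i for i, row in enumerate(page) if any(row)]
    if not rows:
        return []  # blank page: nothing to keep
    cols = [j for j in range(len(page[0])) if any(row[j] for row in page)]
    return [row[cols[0]:cols[-1] + 1] for row in page[rows[0]:rows[-1] + 1]]


def detect_text_lines(page):
    """Step 2 (detection): group consecutive inked rows into (top, bottom) bands."""
    bands, start = [], None
    for i, row in enumerate(page):
        inked = any(row)
        if inked and start is None:
            start = i
        elif not inked and start is not None:
            bands.append((start, i - 1))
            start = None
    if start is not None:
        bands.append((start, len(page) - 1))
    return bands


def clean_text(raw, lexicon):
    """Step 4 (cleaning): normalize whitespace, snap near-miss words to a lexicon."""
    words = re.sub(r"\s+", " ", raw).strip().split(" ")
    cleaned = []
    for word in words:
        match = get_close_matches(word, lexicon, n=1, cutoff=0.8)
        cleaned.append(match[0] if match else word)
    return " ".join(cleaned)


def run_pipeline(page, recognize, lexicon):
    """Chain the four steps; `recognize` (step 3) is supplied by the caller."""
    page = crop_to_content(page)
    lines = []
    for top, bottom in detect_text_lines(page):
        raw = recognize(page[top:bottom + 1])  # Step 3: delegated to an OCR engine
        lines.append(clean_text(raw, lexicon))
    return lines
```

Keeping step 3 behind a callable means the same skeleton can sit in front of either a hosted API or a custom model, depending on the constraints of the project.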

Some Use Cases

Projects involving OCR


OCR use cases fall into two main categories. The first is digitization for storage and later use: indexing documents for search, or building datasets to feed Artificial Intelligence algorithms. The second replaces some tasks of a larger process with an OCR engine in order to improve productivity. Mail sorting illustrates this second category. A mail sorting center can dispatch thousands of packages every day. From deposit to delivery, each package goes through several routing steps, all based on the address written on it. Automating address recognition would improve both the speed and the quality of mail delivery. That is where OCR comes in, by interpreting written addresses.
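To make the mail sorting example concrete, here is a small sketch of what could happen once an OCR engine has turned an address into text. It is an illustration only: the 5-digit postal code format, the `ROUTES` table, the list of look-alike characters and the function names are all assumptions made for this example, not details of a real sorting system.

```python
import re

# Characters an OCR engine may confuse with digits (illustrative, not exhaustive).
DIGIT_LOOKALIKES = str.maketrans(
    {"O": "0", "o": "0", "I": "1", "l": "1", "S": "5", "B": "8"}
)

# Hypothetical routing table: postal-code prefix -> sorting destination.
ROUTES = {"75": "Paris hub", "69": "Lyon hub", "13": "Marseille hub"}


def extract_postal_code(address_text):
    """Return the last 5-character token that reads as a postal code, or None."""
    tokens = re.findall(r"\b[0-9OoIlSB]{5}\b", address_text)
    for token in reversed(tokens):
        # Require mostly real digits so ordinary words are not mistaken for codes.
        if sum(c.isdigit() for c in token) >= 3:
            return token.translate(DIGIT_LOOKALIKES)
    return None


def route(address_text, default="manual review"):
    """Pick a destination from the postal-code prefix, else fall back to a human."""
    code = extract_postal_code(address_text)
    if code is None:
        return default
    return ROUTES.get(code[:2], default)
```

For instance, `route("12 rue de Rivoli\n75OO1 Paris")` reads the misrecognized `75OO1` as `75001` and returns `"Paris hub"`, while an address with no recognizable code falls back to `"manual review"`.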

Our OCR Experts

We have a Team of Experienced Computer Vision Specialists


As part of Computer Vision, our specialty, we develop OCR solutions.


Centrale Paris


Mines Paris, PhD





ENSTA, Polytechnique

Related articles written by Sicara's Data Scientists (in English)


GAN with Keras: Application to Image Deblurring

A Generative Adversarial Networks tutorial applied to Image Deblurring with the Keras library.


Keras Tutorial: Content Based Image Retrieval Using a Denoising Autoencoder

How to find similar images using a Convolutional Denoising Autoencoder.


Set up TensorFlow with Docker + GPU in Minutes

Why Docker is the best platform to use TensorFlow with a GPU.