Sicara adapts to your business needs to deliver a solution that extracts information from your documents. Depending on the document type (handwritten or printed), how critical OCR is to your project (central to its value, a step in dataset creation, ...) and your delivery speed and security constraints, we leverage:
- Specialized APIs (Microsoft Cognitive Services, Google Cloud Vision, ...)
- Custom solutions built on our mastery of Python libraries: Keras, PyTesseract and OpenCV
Optical Character Recognition (OCR) is a subdomain of Computer Vision, related to Pattern Recognition. It consists of converting physical documents into machine-readable text. Such documents can contain handwritten and/or printed text along with images. The main applications of OCR include document digitization, information extraction (reading official documents, license plates, ...) and dataset creation for AI training.
Timeline milestones: first online OCR; release of Tesseract.
Early versions needed to be trained with images of each character and worked on one font at a time. Advanced systems that achieve a high degree of recognition accuracy for most fonts are now common, and they support a variety of digital image file formats as input.
OCR use cases fall into two main categories. The first is digitization for storage and later use: indexing documents for search, or building datasets to feed Artificial Intelligence algorithms. The second replaces some tasks of an end-to-end process with an OCR engine in order to improve productivity. Mail sorting illustrates this use. A mail sorting center can dispatch thousands of packages every day. From deposit to delivery, each package goes through several routing steps, all based on the address written on it. Automating address recognition would improve both the speed and the quality of mail delivery. That is where OCR comes into action, by interpreting the written addresses.
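As an illustration of the mail sorting case, here is a minimal sketch of one routing step. It is not Sicara's production pipeline: the `ROUTING_TABLE` values, the bin names, the `manual-review` fallback and the assumption of a 5-digit postal code on the address are all hypothetical. The `read_address` helper assumes that OpenCV, PyTesseract and the Tesseract binary are installed; the routing logic itself only needs the standard library.

```python
import re
from typing import Optional

# Hypothetical routing table: postal-code prefix -> sorting bin (illustrative values only).
ROUTING_TABLE = {"75": "bin-paris", "69": "bin-lyon", "13": "bin-marseille"}

# Assumption: addresses carry a 5-digit postal code.
POSTAL_CODE = re.compile(r"\b(\d{5})\b")


def read_address(image_path: str) -> str:
    """Run OCR on a photo of an address label and return the raw text."""
    # Imported lazily so the routing logic below works without these packages.
    import cv2
    import pytesseract

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    # Grayscale + Otsu binarization is a common preprocessing step before OCR.
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    return pytesseract.image_to_string(binary)


def extract_postal_code(text: str) -> Optional[str]:
    """Return the last 5-digit group found in the text, or None."""
    matches = POSTAL_CODE.findall(text)
    return matches[-1] if matches else None


def route(text: str) -> str:
    """Map recognized address text to a sorting bin, with a manual fallback."""
    code = extract_postal_code(text)
    if code is None:
        return "manual-review"
    return ROUTING_TABLE.get(code[:2], "manual-review")
```

In use, the text returned by `read_address("label.png")` (a hypothetical file name) would be passed to `route`. Sending unreadable or unknown addresses to manual review rather than guessing a bin keeps OCR errors from turning into misrouted packages.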
Related articles written by Sicara's data scientists (in English)
GAN with Keras: Application to Image Deblurring
A Generative Adversarial Networks tutorial applied to Image Deblurring with the Keras library.
Keras Tutorial: Content Based Image Retrieval Using a Denoising Autoencoder
How to find similar images using a convolutional denoising autoencoder.
Set up TensorFlow with Docker + GPU in Minutes
Why Docker is the best platform to use TensorFlow with a GPU.