
3 Deep Learning Approach To Text Recognition

3.6 Optical character recognition

OCR stands for optical character recognition. It is used to recognize text in multiple forms, such as handwritten text and printed or digital text against various backgrounds. Humans can easily understand the content of an image or document by looking at it, whereas computers cannot interpret that content in the same way. This is why OCR exists. The aim of OCR tools is to recognize printed or handwritten text in an image or document and encode it in computer-readable form, so that computerized systems can be automated. Such software is used to recognize text written in various natural languages and convert it into machine-readable form. The OCR process consists of several subprocesses that prepare the image so that the text can be extracted as accurately as possible. Firstly, the image is captured with a camera or scanner and saved in an image format such as JPEG or PNG, or as a PDF.

Secondly, the image or document passes through a pre-processing stage, where the contrast and brightness of the image are adjusted. Thirdly, localization begins: the image is divided into zones and the process focuses on the target area that contains the required text, which speeds up the subsequent extraction. Fourthly, the target area containing the text is broken down into lines, words, and characters, and detection and recognition algorithms are applied to compare, recognize, and identify the text and produce the final output (Filip, & Anuj, 2021).
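The pre-processing and segmentation steps described above can be sketched as follows. This is a minimal illustration, not the pipeline of any particular OCR tool: it assumes a grayscale image stored as a list of pixel rows, a linear contrast/brightness adjustment, a fixed binarization threshold, and projection profiles for splitting the text area into lines and glyph columns. All function names, the toy image, and the parameter values are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale pixels, 0 (black) to 255 (white)


def adjust_contrast_brightness(img: Image, alpha: float = 1.0, beta: float = 0.0) -> Image:
    """Linear point transform: new = alpha * old + beta, clipped to 0..255."""
    return [[max(0, min(255, int(round(alpha * p + beta)))) for p in row] for row in img]


def binarize(img: Image, threshold: int = 128) -> Image:
    """Mark dark pixels (ink) as 1 and light pixels (background) as 0."""
    return [[1 if p < threshold else 0 for p in row] for row in img]


def find_runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, end exclusive, of consecutive non-zero entries."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_lines(binary: Image) -> List[Tuple[int, int]]:
    """Text lines are runs of rows that contain at least one ink pixel."""
    return find_runs([sum(row) for row in binary])


def segment_columns(binary: Image, top: int, bottom: int) -> List[Tuple[int, int]]:
    """Within one line, glyphs are runs of columns that contain ink."""
    width = len(binary[0])
    profile = [sum(binary[r][c] for r in range(top, bottom)) for c in range(width)]
    return find_runs(profile)


# Toy 6x6 "scan": two text lines on a light background (hypothetical data).
page = [
    [200, 200, 200, 200, 200, 200],
    [200,  90, 200,  90,  90, 200],
    [200,  90, 200,  90,  90, 200],
    [200, 200, 200, 200, 200, 200],
    [200,  90,  90, 200, 200, 200],
    [200, 200, 200, 200, 200, 200],
]

enhanced = adjust_contrast_brightness(page, alpha=1.5, beta=-60)  # stretch contrast
binary = binarize(enhanced)
lines = segment_lines(binary)
print(lines)                                 # [(1, 3), (4, 5)]
print(segment_columns(binary, *lines[0]))    # [(1, 2), (3, 5)]
```

In a real system each column run would then be passed to a recognition model; the sketch stops at segmentation because the recognition algorithm depends on the chosen OCR tool.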

Figure 3.6: Architecture of Optical character recognition (Filip, & Anuj, 2021)

Figure 3.6 illustrates the OCR process: the input data, consisting of scanned documents, PDF documents, or simple images, are given to the OCR software, which processes them and extracts the text to store it in a database.
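The last stage of this process, storing the extracted text in a database, might look like the sketch below. It is only an illustration under stated assumptions: the source does not say which database is used, so SQLite is assumed here, and the table name ocr_documents, its columns, and the sample texts are hypothetical.

```python
import sqlite3
from typing import List


def store_ocr_result(conn: sqlite3.Connection, source_name: str, text: str) -> int:
    """Insert one extracted text and return the id of the new row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ocr_documents ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "source_name TEXT NOT NULL, "
        "extracted_text TEXT NOT NULL)"
    )
    cur = conn.execute(
        "INSERT INTO ocr_documents (source_name, extracted_text) VALUES (?, ?)",
        (source_name, text),
    )
    conn.commit()
    return cur.lastrowid


def search_text(conn: sqlite3.Connection, keyword: str) -> List[str]:
    """Return the source names of documents whose text contains the keyword."""
    rows = conn.execute(
        "SELECT source_name FROM ocr_documents "
        "WHERE extracted_text LIKE ? ORDER BY id",
        (f"%{keyword}%",),
    )
    return [row[0] for row in rows]


# In-memory database for the demo; a file path would be used in practice.
conn = sqlite3.connect(":memory:")
store_ocr_result(conn, "scan_001.png", "Invoice number 1234, total 50 EUR")
store_ocr_result(conn, "scan_002.pdf", "Driving licence, category B")
print(search_text(conn, "invoice"))
```

Once the text is stored this way, it becomes searchable and editable, which is the practical benefit of converting scanned documents into machine-readable form.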

3.6.1 Uses of optical character recognition

Nowadays, OCR is widely used in different areas to automate and digitalize systems, saving human effort and time. Previously, data had to be typed manually into the computer for digitalization and record keeping. With an OCR system, the data are captured and stored in digital form simply by scanning the documents: the system extracts the data from the scanned documents and converts them into an editable text document, with no extra manual work. OCR is used in many sectors; some of those where it is used for text recognition are listed below.

• Airports use OCR technology for passport identification.

• OCR is used for processing documents such as degree certificates, driving licenses, and identity documents.

• The banking sector uses OCR to extract customer information and details from deposit slips, invoices, and other documents.

• Smart parking management systems use OCR to recognize vehicle number plates in order to allocate parking spaces to different categories of vehicles, for example ambulances and VIPs (Joshi, et al., 2015).

• Shopping malls use OCR to recognize item prices through barcodes.

3.6.2 Types of OCR Software

Over the last decades, several OCR software packages have been used for reading, identifying, and recognizing text from different sources, especially images. Most of them are designed to recognize printed or handwritten text in scanned images or documents. Some are free and open source, such as Tesseract, Calamari, and Kraken, while others are paid services, such as the Google Cloud Vision API and ABBYY FineReader. Their results differ, but none achieves 100% accuracy, in part because accuracy depends on image quality such as resolution. The following OCR tools are useful for printed and handwritten text mining.

• Tesseract OCR.

• OCRopus.

• Calamari.

• Kraken.

• Microsoft Azure Computer Vision.

• Google Cloud Vision.

• ABBYY FineReader.

• Amazon Textract.

• SwiftOCR.

• Attention OCR.

Of the OCR software listed above, four are particularly common and popular: Tesseract OCR, Google Cloud Vision, ABBYY FineReader, and Amazon Textract. Table 3.9 compares the results and acceptance ratios of these four tools. The information comes from an experiment in which the tools were applied to various images containing printed as well as handwritten text.

Table 3.9: Acceptance criteria of OCR tools (Fabian, 2020)

The main takeaway from the experiment is as follows. For machine-written text in a well-scanned image, Tesseract OCR does a great job. For handwritten character recognition, Google Cloud Vision is the best choice. If the resolution of the document is poor and the data are tabular, ABBYY FineReader is the best option of the four (Fabian, 2020).
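The takeaway above can be written down as a small decision rule. The sketch below is only one possible encoding: the function name and its boolean parameters are hypothetical, the order in which the rules are checked is an assumption (the source does not rank them), and combinations the source does not cover return None rather than a guess.

```python
from typing import Optional


def suggest_ocr_tool(handwritten: bool, good_scan: bool, tabular: bool) -> Optional[str]:
    """Map document properties to the tool recommended in the takeaway."""
    if handwritten:
        # Handwritten character recognition (checked first by assumption).
        return "Google Cloud Vision"
    if not good_scan and tabular:
        # Poor resolution combined with tabular data.
        return "ABBYY FineReader"
    if good_scan:
        # Machine-written text in a well-scanned image.
        return "Tesseract OCR"
    # Not covered by the source, e.g. a poor scan of non-tabular printed text.
    return None


print(suggest_ocr_tool(handwritten=False, good_scan=True, tabular=False))
```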

3.6.2.1 Tesseract OCR

Tesseract OCR is a well-known and useful text recognition tool because it is open source and free to use. It was originally developed by HP between 1985 and 1995. Over time, many improvements have increased its popularity. It can currently recognize text in many languages, including French, English, Arabic, Dutch, and German, and its development has been sponsored by Google. Tesseract itself has no graphical interface; images are processed through its command-line program (Patel, et al., 2012).

In this experiment, the tool was applied to a simple test image containing text. The image was given to the OCR through the command line, which converted the text into output_file, an editable text file.

Figure 3.7: Experimental results of a simple image using Tesseract OCR

Figure 3.7 shows the results of the experiment: the image (test.png) is given to the OCR through the command line, which writes the results to an editable text file (output_file) containing the same text as the image (test.png).
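A command-line run of this kind can also be scripted. The sketch below is not the exact command used in the experiment, which is only visible in Figure 3.7; it assumes the standard form of the Tesseract command (image, output base name, and the -l language option) and that the tesseract program is installed and on the PATH. The helper names build_tesseract_command and run_tesseract are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def build_tesseract_command(image: str, output_base: str, lang: str = "eng") -> List[str]:
    """Build the command line: tesseract <image> <output base> -l <language>."""
    # Tesseract appends ".txt" to the output base name by itself.
    return ["tesseract", image, output_base, "-l", lang]


def run_tesseract(image: str, output_base: str, lang: str = "eng") -> Optional[str]:
    """Run Tesseract and return the recognized text, or None if it cannot run."""
    if shutil.which("tesseract") is None or not Path(image).is_file():
        return None  # Tesseract not installed, or the image does not exist
    subprocess.run(build_tesseract_command(image, output_base, lang), check=True)
    return Path(output_base + ".txt").read_text(encoding="utf-8")


# File names taken from the experiment described above.
print(" ".join(build_tesseract_command("test.png", "output_file")))
text = run_tesseract("test.png", "output_file")
if text is None:
    print("Tesseract or test.png not available; nothing was recognized.")
else:
    print(text)
```

Building the command as a list rather than a single string avoids shell quoting problems when file names contain spaces.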

In document Features Extraction of Tax Card by Using OCR Based DeepLearning Techniques (pages 39-42)