
26 May 2023

Artificially Intelligent Readers: An Adaptive Framework for Original Handwritten Numerical Digits Recognition with OCR Methods

1. Edward J. Bloustein School, Rutgers University, Piscataway, NJ 08854, USA
2. Department of Mathematics and Computer Science, University of Cagliari, 09124 Cagliari, Italy
3. School of Computer Science and Mathematics, Liverpool John Moores University, Liverpool L3 2AF, UK
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue Advances in Machine Learning and Intelligent Information Systems

Abstract

Advanced artificial intelligence (AI) techniques have led to significant developments in optical character recognition (OCR) technologies. OCR applications, which use AI techniques to transform images of typed text, handwritten text, or other forms of text into machine-encoded text, provide a fair degree of accuracy for general text. However, even after decades of intensive research, creating OCR with human-like abilities has remained elusive. One of the challenges has been that OCR models trained on general text do not perform well on localized or personalized handwritten text due to differences in the writing style of alphabets and digits. This study discusses the steps needed to create an adaptive framework for OCR models, with the intent of exploring a reasonable method to customize an OCR solution for a unique dataset of English-language numerical digits developed for this study. We develop a digit recognizer by training our model on the MNIST dataset with a convolutional neural network and contrast it with multiple models trained on combinations of the MNIST and custom digits. Using our methods, we observed results comparable with the baseline and provide recommendations for improving OCR accuracy for localized or personalized handwritten text. This study also provides an alternative perspective on generating data using conventional methods, which can serve as a gold standard for custom data augmentation to help address the challenges of scarce data and data imbalance.

1. Introduction

We are seeing a rapid increase in the breadth and depth of adaptive artificial intelligence (AI) solutions to improve performance in areas such as computer vision for character recognition, which is receiving significant industry attention [,]. Optical character recognition (OCR) is an AI method that mimics the human-intelligence capability of visual recognition for the computational identification of machine-printed and handwritten text and digits from a broad range of images containing text [,]. This includes converting images of typed, handwritten, or printed text into machine-encoded text, whether from a scanned document such as a PDF file, a picture of a piece of paper in PNG or JPEG format, or a scene photo with text in it, such as text on a coffee cup, text on the cover page of a book, or the number on a license plate. The number of researchers, academic centers and labs, and companies researching and developing computer vision and OCR solutions has risen significantly over the past several years []. Hence, an array of OCR solutions and tools is widely available. We discuss a few OCR applications in the context of our research below.
OCR is increasingly used for the automation and enhancement of workflows and business processes. Organizations use OCR to manage handwritten and printed text, identify document types, and parse data according to complex rules. OCR also has social and societal functional capabilities; for example, it can provide an assistive solution for those with vision impairment by converting text images into text-to-speech []. One of the significant challenges faced by hospitals is the loss of patient medical information. OCR technologies can support electronic medical records and health information capture. Additionally, OCR can be used to automate information extraction from test results for input into hospital information management systems (HIMS), supporting the effective representation of patients’ medical histories. OCR tools such as Scayne and Expensify are used in finance, accounting, and business process management for functions such as receipt recognition []. OCR is also used in loan processing, security and fraud detection, government sector services such as scanning legacy identification documents and automated license plate detection, and for supporting cultural and historical heritage initiatives. While fairly robust solutions are available for typed or printed machine-readable texts, automated handwriting recognition systems are yet to provide high levels of accuracy due to the challenges of variations in handwriting styles and the composition of visual structure. Resolving these challenges remains an open problem, and improving handwriting OCR can lead to very valuable insights for decision support through the inclusion of handwritten data in the evaluation of public sentiment and other textual analytics solutions [,].
It is, thus, evident that OCR is an important technology and AI capability. It is, therefore, important to improve the accuracy of OCR tools with handwritten text, especially for specific handwriting styles, such as the handwriting of a particular doctor or a handwriting style from a particular region. Our work contributes to the body of knowledge of OCR methods by introducing the idea of fine-tuning MNIST with training data from a custom dataset belonging to a particular style to improve the accuracy for that specific style of handwritten digits. The model, method, and process used for OCR with such custom personalized or localized data can be extended to any handwritten style. Our work can also provide an alternative mechanism for generating custom data and conversion to MNIST format using conventional methods and can serve as a standardized mechanism for data augmentation to address the challenges associated with scarce data and data imbalance. The main contributions of this paper are as follows:
  • We create a new handwritten custom dataset from a single contributor to ensure overall consistency of style.
  • We train and apply a convolutional neural network to MNIST data for validation and then extend the method with custom data.
  • Furthermore, we provide insights into the effects of training the model with variations in the proportions of custom data, with and without rotations.
  • We provide a framework for adaptive OCR applicable to handwritten documents from a single source or belonging to a particular style of writing.
  • Finally, we provide a comprehensive discussion of how the adaptive framework could be implemented and potential improvements with alternate technological solutions.
The rest of the paper is organized as follows. Section 2 presents the literature background of notable OCR methods. Section 3 states the problem and provides details of the datasets and preprocessing strategies used for the experiments. Section 4 presents the stages involved in OCR and the architecture of the employed deep learning (DL) approaches. Section 5 sums up the full set of experiments performed and presents the experimental outcomes. Section 6 presents the analyses of the obtained results and the limitations of the employed methods. Finally, Section 7 provides the concluding remarks and future research directions.

3. Problem Formulation, Datasets, and Preprocessing

This section explains the problem statement tackled, the experimental datasets used, the process involved in generating the Custom dataset, and the preprocessing strategies implemented to prepare the input data for our experiments.

3.1. Problem Formulation

In this work, we explore the idea of developing an adaptive OCR framework and report on our OCR experiments with custom data of a unique but consistent style from a single source, in addition to the baseline MNIST data. Thus, we contribute to addressing the challenge of improving the accuracy of OCR mechanisms with adaptive methods, achieved through localized training of models with the inclusion of custom data in the training process.

3.2. Datasets

For our work, we have used two datasets, namely the Modified National Institute of Standards and Technology database (MNIST) and our own generated dataset. For ease of understanding, we termed our dataset as the Custom dataset.
  • MNIST: MNIST is a benchmark dataset [] consisting of a training set of 60,000 handwritten digits and a test set of 10,000 digits (10 class labels), with each example represented as a 28 × 28 gray-scale image. Images are normalized to fit in a 20 × 20 pixel box while maintaining their aspect ratio and are centered in the 28 × 28 image. When flattened into tabular form, there is one column for each of the 28 × 28 pixels, giving 784 columns, plus one column for the label (see the loading sketch after this list). Pixel values are between 0 and 255, where 0 means black and 255 means white.
  • Custom: Custom dataset consists of 240 training digits, 40 testing digits, and 20 validation digits, ensuring that the model is trained on a diverse set of data and can generalize well to new data points.
    The Custom dataset is also made available on our GitHub repository [].
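As an illustration of the MNIST layout described above, the following is a minimal sketch, assuming the Keras copy of MNIST is used to fetch the data; it shows the 60,000/10,000 split and the 784-pixel-plus-label tabular layout.

```python
import numpy as np
from tensorflow.keras.datasets import mnist

# Load the benchmark split: 60,000 training and 10,000 test digits.
(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train.shape, x_test.shape)  # (60000, 28, 28) (10000, 28, 28)

# Flatten each 28 x 28 image into 784 pixel columns and append the label,
# giving 785 columns per row, as described above.
train_table = np.column_stack([x_train.reshape(len(x_train), -1), y_train])
print(train_table.shape)  # (60000, 785)

# Pixel values range from 0 to 255.
print(x_train.min(), x_train.max())  # 0 255
```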

3.3. Custom Dataset Collection

For data collection, we needed to create images of the digits, and the following steps were taken to accomplish this. Since writing digits on paper and taking photographs of them can be a challenging and time-consuming process, Microsoft Word version 10 was utilized to create images of the digits. The “draw with trackpad” option in the drawing tab of Microsoft Word was used for this purpose. Subsequently, screenshots of each digit were taken and grouped into batches of 10. Each batch contained digits from 0 to 9, with 10 digits per batch, and the name of each image was the gold label of the digit it represented.
  • Preparing the Custom Dataset: Our Custom dataset consists of 300 handwritten digits collected using the method described in the data collection section. Two users wrote these digits, and we divided the dataset into three parts, each containing 100 digits. We then performed a stratified split on each of the three datasets to create training and testing sets. Specifically, we divided each set into 80 training images and 20 testing images, ensuring each category had an equal number of labels. This stratified split helped to prevent any bias in the model. Next, we concatenated the training images, creating a set of 240 training digits, and concatenated the testing images, creating a set of 60 digits. We then performed a stratified fold on the testing images, splitting them into sets of 40 and 20. The 20 digits were used as a validation dataset.
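The stratified splitting described above could be performed, for example, with scikit-learn; the paper does not specify the tooling, so the following is a minimal sketch under that assumption, with placeholder arrays standing in for one 100-digit batch.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for one 100-digit batch: 10 examples of each class.
images = np.random.rand(100, 28, 28)
labels = np.repeat(np.arange(10), 10)

# Stratified 80/20 split so every digit class is equally represented
# in the training and testing portions of each batch.
x_train, x_test, y_train, y_test = train_test_split(
    images, labels, test_size=0.2, stratify=labels, random_state=42)

# Stand-in for the 60 pooled test digits from the three batches; these are
# split again, stratified, into 40 test and 20 validation digits.
x_test_pooled = np.concatenate([x_test, x_test, x_test])
y_test_pooled = np.concatenate([y_test, y_test, y_test])
x_test_final, x_val, y_test_final, y_val = train_test_split(
    x_test_pooled, y_test_pooled, test_size=20, stratify=y_test_pooled,
    random_state=42)
print(len(x_train), len(x_test_final), len(x_val))  # 80 40 20
```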

3.4. Preprocessing the Custom Dataset

To ensure consistency with the MNIST dataset, which has images of a fixed size of 28 × 28 pixels, we need to standardize our images to the same size. When digitizing a handwritten number, we followed several steps to ensure the accuracy and clarity of the resulting image. First, the handwritten number image is loaded and converted to grayscale using Pillow’s “L” mode. A white canvas of 28 × 28 pixels, the standard size for all MNIST images, is created. The width and height of the image are then determined, and the image is resized such that the greater dimension is 20 pixels, while the smaller dimension is scaled proportionally. To improve image quality, the antialiasing resampling filter of the Pillow library is applied, which smoothens jagged edges by averaging the colors of the pixels at the boundary and sharpening the result. Subsequently, the resized image is pasted onto the 28 × 28 pixel white canvas, with a 4-pixel offset from the top or side along the largest dimension to center the picture, while the smaller dimension is offset so that the scaled image sits centered within the 28-pixel canvas. After obtaining the pixel values of the new image, the values are inverted and normalized to a range of 0 to 1, where 1 represents black and 0 represents white. As a result of these steps, the resulting image of the handwritten number is clear and accurate, making it easier to analyze and process. The original image (Figure 1b) was transformed into the MNIST-style image shown in Figure 1c using the process described above. The MNIST image in Figure 1a and our transformed digit in Figure 1c are quite similar in appearance. However, our transformation process resulted in a sharper image compared to the original figure.
Figure 1. Preprocessing the Custom dataset. (a) MNIST image from the MNIST Dataset. (b) Original custom image. (c) Transformed into MNIST format.
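The following is a minimal Pillow-based sketch of the conversion described above; the function name, file path, and exact offset arithmetic are illustrative assumptions consistent with the description rather than the authors’ exact code.

```python
import numpy as np
from PIL import Image

def to_mnist_style(path):
    """Convert a screenshot of a handwritten digit into a normalized
    28 x 28 MNIST-style array (1.0 = black ink, 0.0 = white background)."""
    img = Image.open(path).convert("L")  # grayscale ("L") image
    w, h = img.size

    # Scale the larger dimension to 20 px, the other proportionally.
    if w > h:
        new_w, new_h = 20, max(1, round(20 * h / w))
    else:
        new_w, new_h = max(1, round(20 * w / h)), 20
    img = img.resize((new_w, new_h), Image.LANCZOS)  # antialiased resampling

    # Paste onto a 28 x 28 white canvas: a 4-px margin on the larger
    # dimension, the smaller dimension centered.
    canvas = Image.new("L", (28, 28), color=255)
    left = 4 if w > h else (28 - new_w) // 2
    top = (28 - new_h) // 2 if w > h else 4
    canvas.paste(img, (left, top))

    # Invert and normalize pixel values to the 0-1 range.
    pixels = np.array(canvas, dtype=np.float32)
    return (255.0 - pixels) / 255.0
```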

4. Materials and Methods

This section describes the computational resources used to conduct the experiments, the phases involved in performing OCR, and the architecture of the employed CNN-based DL model. The computational resources used to perform the experiments are listed in Table 2.
Table 2. Hardware resource specifications.

4.1. Major Phases of Performing an OCR

The OCR workflow combines multiple phases, as described by [].
  • Source data input—this phase involves using an external optical device, such as a scanner or camera, to generate images that contain relevant alphanumeric text or symbols.
  • Data preparation—covers various preprocessing operations, such as noise removal, to enhance source image quality and alignment of the image with the required standardized input format.
  • Partitioning—multiple characters in the image are split into individual items in this phase so that a recognition engine can process the partitioned characters appropriately.
  • Feature extraction—the partitioned characters are further processed to extract different features. The characters are then ‘recognized’ based on their alignment with these features.
  • Classification—based on partitioning and feature extraction, the target image’s features are mapped to several categories and classified to appropriate values.
  • Options—characters within source files can be classified using multiple pattern recognition techniques, drawing on both fundamental and statistical approaches.
  • Validation and improvement—upon the completion of initial classification exercises, it may be observed that the results are rarely perfect. This is especially true for handwritten content with multiple authors, multifaceted fonts, and intricate languages. Quantitative, conceptual, linguistic, and probabilistic remedial approaches and fine-tuning can be performed to improve the accuracy of OCR systems.

4.2. Architecture of CNN-Based DL Model

CNNs have become popular for image classification tasks due to their exceptional ability to extract features from images []. Our study utilized the MNIST dataset, which comprises 70,000 handwritten digits from 0 to 9 represented in a 28-by-28-pixel grayscale format. To further enhance the dataset, we added our data to the original MNIST dataset, resulting in an augmented dataset used to train the CNN. The trained model was then evaluated on our custom test data to assess its accuracy and generalizability. The process flowchart with Custom data is shown in Figure 2.
Figure 2. Process flowchart with Custom data.
For our experiments, we built a CNN Architecture, as shown in Figure 3, with the following parameters:
Figure 3. Model flowchart.
  • In our model, we stack multiple Conv2D layers to extract increasingly complex features from an input image.
  • Each Conv2D layer has parameters such as the number of filters, kernel size, padding, and activation function, which are tuned to optimize the performance of the network. The Conv2D layers are used to build feature maps from the data.
  • The kernel size of the filters is usually (3 × 3), and we use padding to ensure that the filters fit the input image. We also use the ReLU activation function to introduce nonlinearity in our model. Max-Pooling is used to reduce dimensionality. In Max-Pooling, the output value is just the maximum input value in each patch (for example, the maximum pixel in a span of three pixels).
  • The next step is to flatten the output from the last pooling layer because the fully-connected layer expects a 1D vector rather than a 2D feature map. To flatten means to convert a 2D matrix into a 1D vector.
  • A fully-connected hidden layer is added to perform classification. The fully-connected layers combine the features extracted from the convolutional layers to create a model. Between fully-connected layers, dropout layers are added to remove specific neurons’ contributions to the following layer while leaving the rest intact. Dropout layers are applied to reduce overfitting.
  • Finally, a softmax activation function is used to classify the outputs as the digits 0 through 9.
The model information is presented in Table 3.
Table 3. Model summary.
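To make the architecture described above concrete, the following is a minimal Keras sketch of a CNN with this general structure; the filter counts, dense-layer width, and dropout rate are illustrative assumptions, not the exact configuration reported in Table 3.

```python
from tensorflow.keras import layers, models

# Sketch of a stacked-Conv2D digit classifier; hyperparameters are assumed.
model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
    layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2)),          # reduce dimensionality
    layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Flatten(),                               # 2D feature maps -> 1D vector
    layers.Dense(128, activation="relu"),           # fully-connected hidden layer
    layers.Dropout(0.5),                            # reduce overfitting
    layers.Dense(10, activation="softmax"),         # digits 0-9
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```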

5. Experiments and Results

In this section, we describe the details of our experiments, the process, and the outcomes. For ease of understanding, we have grouped each experiment and its result. We have used the Accuracy and F-1 score [] metrics below to measure the performance of our employed classification models given by the following equations:
$\mathrm{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN}$

$F_1 = \dfrac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$
where TP, FP, TN, and FN are True positive, False positive, True Negative, and False Negative, respectively.
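For completeness, these metrics can be computed directly with scikit-learn, as in the brief sketch below; the labels are illustrative, and macro averaging of the per-class F1 scores is an assumption, since the averaging scheme is not stated.

```python
from sklearn.metrics import accuracy_score, f1_score

y_true = [3, 3, 7, 1, 2, 2, 9, 0]   # illustrative ground-truth labels
y_pred = [3, 5, 7, 1, 2, 2, 9, 0]   # illustrative predictions

acc = accuracy_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred, average="macro")  # per-class F1, averaged
print(f"Accuracy: {acc:.3f}, F1: {f1:.3f}")
```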

5.1. Experiment 1

In this experiment, we trained the CNN Model on 60,000 MNIST images and tested the model on 10,000 MNIST test data points. The output from this experiment is not shown in Table 4 because it was a standard experiment with MNIST data, and we only used it to validate our base process.
Table 4. Summary of experimental results with 40 Custom test digits.
Results: Overall, the model performed well and generalized well to different data points, with very few mispredicted labels, as shown in Figure 4a. Furthermore, Figure 4b depicts the loss and accuracy versus epoch graphs, demonstrating that the validation accuracy and validation loss were consistent with the training accuracy and training loss. The small gap between training and validation accuracy indicates that the model does not overfit and can generalize well. This model serves as our baseline for comparison with the other models. The model achieved an accuracy of 97.90% and a validation accuracy of 99.18%.
Figure 4. Experiment 1 outcome.

5.2. Experiment 2

In this experiment, the model was trained on a dataset of 60,000 images from the MNIST dataset and was then tested on our Custom test dataset consisting of 40 images. The MNIST test set, which consists of 10,000 images, was used as the validation dataset for this model.
Results: To evaluate the model’s performance on our Custom dataset, we used the same model as before and tested it on 40 images. The results showed that five of the 40 digits were misclassified, as seen in Figure 5. Notably, the digit “2” had the lowest accuracy of classification.
Figure 5. Confusion matrix of experiment 2.

5.3. Experiment 3

The model was trained on a combined dataset comprising 60,000 images from the MNIST dataset and 240 images from our Custom dataset. To prevent overfitting and ensure that the model generalizes well, we used the MNIST test set, which contains 10,000 images, along with 20 digits from our Custom validation dataset, as the validation dataset for this model. After the training phase, we evaluated the model’s performance on our Custom test dataset of 40 images.
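A hedged sketch of how such a combined training set can be assembled is shown below; the array names and the placeholder arrays standing in for the preprocessed custom digits are hypothetical.

```python
import numpy as np
from tensorflow.keras.datasets import mnist

(x_train, y_train), _ = mnist.load_data()

# Hypothetical stand-ins for the 240 custom digits, already converted into
# normalized 28 x 28 MNIST form (see Section 3.4).
x_custom = np.zeros((240, 28, 28), dtype=np.float32)
y_custom = np.tile(np.arange(10), 24)

# Combined training set: 60,000 MNIST digits + 240 custom digits.
x_combined = np.concatenate([x_train.astype(np.float32) / 255.0, x_custom], axis=0)
y_combined = np.concatenate([y_train, y_custom], axis=0)
print(x_combined.shape)  # (60240, 28, 28)
```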
Results: The evaluation of the model’s performance on our Custom test dataset revealed that two of the 40 digits were misclassified, as shown in Figure 6a. Notably, the digits “4” and “5” were the ones that were misclassified. The graph shown in Figure 6b indicates that there is only a tiny gap between training and validation accuracy and loss, indicating that the model has maintained its generalizability and can fine-tune itself on our Custom dataset, achieving a low number of misclassifications. The model achieved an accuracy of 97.07% on the combined dataset and a validation accuracy of 98.91%. These results suggest that the model performs well on both the MNIST and our Custom datasets, indicating that it has learned relevant common features across different datasets. These findings demonstrate the effectiveness of using a combined dataset and diverse validation data to improve the model’s performance on new data points.
Figure 6. Experiment 3 outcome.

5.4. Experiment 4

This model was trained on a combined dataset of 60,000 images from the MNIST dataset, 240 images from our Custom dataset, and 720 rotated digits. The rotated digits were generated by applying a random rotation angle between −15 and 15 degrees to our 240 digits. For validation, we used the MNIST test set, which contains 10,000 images, and added 20 digits from our Custom validation dataset. Following the training phase, we evaluated the model’s performance on our Custom test dataset of 40 images.
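The rotated copies could be generated as in the sketch below; the use of scipy.ndimage.rotate and three copies per digit are assumptions (consistent with 240 × 3 = 720), not the authors’ stated implementation.

```python
import numpy as np
from scipy.ndimage import rotate

rng = np.random.default_rng(0)

def augment_with_rotations(images, labels, copies_per_image=3):
    """Create rotated copies with a random angle in [-15, 15] degrees."""
    aug_images, aug_labels = [], []
    for img, lab in zip(images, labels):
        for _ in range(copies_per_image):
            angle = rng.uniform(-15.0, 15.0)
            rotated = rotate(img, angle, reshape=False, mode="constant", cval=0.0)
            aug_images.append(rotated)
            aug_labels.append(lab)
    return np.array(aug_images), np.array(aug_labels)

# Hypothetical stand-ins for the 240 preprocessed custom digits.
custom_images = np.zeros((240, 28, 28), dtype=np.float32)
custom_labels = np.tile(np.arange(10), 24)
rot_images, rot_labels = augment_with_rotations(custom_images, custom_labels, 3)
print(rot_images.shape)  # (720, 28, 28)
```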
Results: The evaluation of the model’s performance on our Custom test dataset revealed that two of the 40 digits were misclassified, as shown in Figure 7a. As seen in the graph presented in Figure 7b, there is a slightly increased gap between the training and validation accuracy, which was expected due to the addition of our large Custom dataset. Although the model’s generalizability decreased slightly, it still performed well on the Custom test dataset. The model achieved an accuracy of 96.30% on the combined dataset and a validation accuracy of 98.83%.
Figure 7. Experiment 4 outcome.

5.5. Experiment 5

This model was trained on a combined dataset of 60,000 images from the MNIST dataset, 240 images from our Custom dataset, and 9600 rotated digits. The rotated digits were generated by applying a random rotation angle between −15 and 15 degrees to our 240 digits. For validation, we used the MNIST test set, which contains 10,000 images, and added 20 digits from our Custom validation dataset. Following the training phase, we evaluated the model’s performance on our Custom test dataset of 40 images.
Results: The evaluation of the model’s performance on our Custom test dataset revealed that two of the 40 digits were misclassified, as shown in Figure 8a. As seen in the graph presented in Figure 8b, there is a drastically increased gap between the training and validation accuracy, which was expected due to the addition of our larger Custom dataset. Although the model’s generalizability decreased slightly, it still performed well on the Custom test dataset. The model achieved an accuracy of 85.30% on the combined dataset and a validation accuracy of 98.96%.
Figure 8. Experiment 5 outcome.

5.6. Experiment 6

The model used in this experiment is based on the transformer architecture and was first introduced in the paper by Li et al. (2021) []. It was trained on the IAM dataset [], and the specific model used in this study is a pre-trained model fine-tuned on the MNIST dataset accessible via []. On the MNIST dataset, the model achieved an accuracy of 99.52%. However, when tested on our dataset, it misclassified two digits, as seen in Figure 9.
Figure 9. Confusion matrix of experiment 6.
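For reference, TrOCR inference can be run with the Hugging Face transformers library as in the sketch below; the generic handwritten checkpoint and the input file name are stand-ins for the fine-tuned digit model referenced above, which is accessible via the cited repository.

```python
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

# Stand-in checkpoint; the experiment used a model fine-tuned on MNIST.
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten")

image = Image.open("digit.png").convert("RGB")   # hypothetical input image
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
prediction = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(prediction)
```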

5.7. Experimental Summary

We have provided the consolidated experimental outcomes of experiments 2 to 6 in Table 4.

6. Discussion and Limitations

Computer vision and digital image processing are crucial in multimedia, artificial intelligence, and robotics. Image analysis includes segmentation, feature extraction, and classification techniques. Human–computer interaction can make things easier for users, and optimal results with less computation time, along with multilingual character segmentation and recognition, are possible. A segmentation-free approach using a Deep Neural Network (DNN) is also possible in OCR, and such work may help bridge the knowledge gap between human–system and system–system interactions [].
We use a CNN to recognize an image containing a single character. Text of arbitrary length is a sequence of characters, and such problems are typically solved using recurrent neural networks; the LSTM [] is a popular form of RNN []. Modernization of the Tesseract tool involved code cleanup and the addition of a new LSTM model. The input image is processed in boxes, line by line, and fed into the LSTM model. Even after extensive training, Tesseract, while improved, still needs further work to handle handwritten text and unusual fonts. Additionally, rotated or skewed text may cause Tesseract to malfunction.
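For context, the following is a minimal pytesseract sketch of invoking Tesseract’s LSTM engine on a line image; the flags shown are common Tesseract options used here for illustration, not the specific configuration discussed above.

```python
from PIL import Image
import pytesseract

# --oem 1 selects the LSTM engine; --psm 7 treats the image as a single text line.
text = pytesseract.image_to_string(
    Image.open("line.png"),               # hypothetical input image
    config="--oem 1 --psm 7",
)
print(text)
```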
One major limitation we faced during the model training was the scarcity of data, as we only produced about 300 digits. This limited our ability to make significant improvements to the model, which was already trained on a much larger dataset of 60,000 digits. To address this limitation and make further progress, we would need to increase our dataset size to at least 10,000 digits.
Additionally, although we used a CNN model for our digit recognition task, we acknowledge that transformer models such as TrOCR have demonstrated comparable accuracy, as seen in the above table. With the potential to capture more global representations of images compared to CNN models, fine-tuning a TrOCR-based model could yield even better results. Furthermore, while our smart-OCR research concept is demonstrated using digits, the final production-level application will be expected to extend the concept to handwritten textual artifacts, full texts, and document corpora, including historical texts [,].

7. Conclusions and Future Work

In this paper, we demonstrated the viability of a simple schema to develop an OCR mechanism for creating an adaptive framework for custom digit recognition models, and its logical implication: the flexibility of OCR models to adapt to specific writing styles. Once developed further with state-of-the-art neural networks, such as transformers for computer vision, this approach can be applied to a variety of industry-level use cases. These could include solutions in healthcare, where individual medical practitioners have different writing styles, and in fraud detection, to match or distinguish handwriting styles with greater accuracy. Given the tremendous potential for adaptive OCR applications, it is advisable to move adaptive OCR research to the forefront. We are hopeful that such adaptive OCR solutions will become an important part of the rapidly advancing artificial intelligence ecosystems worldwide.

Currently, most NLP research and practice use machine-readable typed data and associated textual data distributions []. It would be very useful to develop OCR solutions for handwritten documents to create a seamless integration with NLP solutions, such as sentiment analysis and NLP-based socioeconomic modeling [,,]. OCR is a mature discipline with industry-level solutions for identifying and ‘reading’ images of machine-printed text. However, due to the high degree of variation, OCR for handwriting recognition needs additional work. Based on the early-stage success of TrOCR, we believe there is significant potential for improving OCR solutions for handwritten text with transformer-based applications. We intend to explore the potential of fine-tuning and limited-shot learning with pretrained transformer models to cater to user-specific digit recognition needs.

Furthermore, the additional use of AI methods and tools to mimic the human-intelligence capability of identifying text in varying colors, mixed sizes and styles, and other complex forms holds great promise. As an applied direction for future research, it is possible to use OCR methods to generate data for domains such as cultural heritage and preservation from images of historical texts. One of the most important research areas to build upon would be the capability of an OCR application to be ‘flexible’ with custom handwriting styles. Incorporating the on-demand flexibility of OCR models would be a powerful way to advance the effectiveness of OCR with Custom data and variations in font styles. We also aim to incorporate world knowledge in the form of triples to address the domain adaptation challenges in identifying subtle sentiments [].

Author Contributions

Conceptualization, P.H.J., V.K., J.S., S.S., A.M. and R.A.; methodology, P.H.J., V.K., J.S., S.S., A.M. and R.A.; software, P.H.J. and J.S.; formal analysis, P.H.J., V.K. and J.S.; investigation, P.H.J., V.K. and J.S.; resources, P.H.J., J.S. and A.M.; data curation, P.H.J., J.S. and A.M.; writing—original draft preparation, P.H.J., V.K. and J.S.; writing—review and editing, P.H.J., V.K., J.S., S.S., A.M. and R.A.; visualization, P.H.J., V.K. and J.S.; supervision, V.K. and J.S.; project administration, J.S.; funding acquisition, J.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the NJ State Policy Lab (https://policylab.rutgers.edu/ (accessed on 11 April 2023)) and the Public Informatics program at Bloustein School, Rutgers University (https://bloustein.rutgers.edu/graduate/public-informatics/mpi/ (accessed on 11 April 2023)).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The MNIST dataset is available at (https://archive-beta.ics.uci.edu/dataset/683/mnist+database+of+handwritten+digits (accessed on 11 April 2023)) and our Custom dataset is available at the Github repository (https://github.com/ay7n/OCR-RUCILDigits-4 (accessed on 11 April 2023)).

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AI: Artificial Intelligence
CNN: Convolutional Neural Network
DL: Deep Learning
ML: Machine Learning
OCR: Optical Character Recognition
LSTM: Long Short-Term Memory
NLP: Natural Language Processing
RNN: Recurrent Neural Network
NER: Named-Entity Recognition
ReLU: Rectified Linear Unit
LSTM-R: Long Short-Term Memory plus Relation-aware pointer network
HMM: Hidden Markov Models
DNN: Deep Neural Network

References

  1. Samuel, J.; Kashyap, R.; Samuel, Y.; Pelaez, A. Adaptive cognitive fit: Artificial intelligence augmented management of information facets and representations. Int. J. Inf. Manag. 2022, 65, 102505. [Google Scholar] [CrossRef]
  2. Thorat, C.; Bhat, A.; Sawant, P.; Bartakke, I.; Shirsath, S. A detailed review on text extraction using optical character recognition. In ICT Analysis and Applications; Springer: Singapore, 2022; pp. 719–728. [Google Scholar]
  3. Singh, S. Optical character recognition techniques: A survey. J. Emerg. Trends Comput. Inf. Sci. 2013, 4, 2009–2015. [Google Scholar]
  4. Samuel, J. A call for proactive policies for informatics and artificial intelligence technologies. Scholars Strategy Network, 19 December 2021. [Google Scholar]
  5. Srivastava, N.; Singh, S. Netra: Smart Hand Gloves Comprises Obstacle Detection, Object Identification & OCR Text to Speech Converter for Blinds. In Proceedings of the 2018 5th IEEE Uttar Pradesh Section International Conference on Electrical, Electronics and Computer Engineering (UPCON), Gorakhpur, India, 2–4 November 2018; pp. 1–4. [Google Scholar]
  6. Januszewski, A.; Kujawski, J.; Buchalska-Sugajska, N. Benefits of and obstacles to RPA implementation in accounting firms. Procedia Comput. Sci. 2021, 192, 4672–4680. [Google Scholar] [CrossRef]
  7. Samuel, J.; Rahman, M.M.; Ali, G.M.N.; Samuel, Y.; Pelaez, A.; Chong, P.H.J.; Yakubov, M. Feeling Positive About Reopening? New Normal Scenarios From COVID-19 US Reopen Sentiment Analytics. IEEE Access 2020, 8, 142173–142190. [Google Scholar] [CrossRef]
  8. Ali, G.M.N.; Rahman, M.M.; Hossain, M.A.; Rahman, M.S.; Paul, K.C.; Thill, J.C.; Samuel, J. Public perceptions of COVID-19 vaccines: Policy implications from US spatiotemporal sentiment analytics. Healthcare 2021, 9, 1110. [Google Scholar] [CrossRef]
  9. Manwatkar, P.M.; Yadav, S.H. Text recognition from images. In Proceedings of the 2015 International Conference on Innovations in Information, Embedded and Communication Systems (ICIIECS), Coimbatore, India, 19–20 March 2015; pp. 1–6. [Google Scholar]
  10. Srihari, S.N.; Shekhawat, A.; Lam, S.W. Optical character recognition (OCR). In Encyclopedia of Computer Science; Wiley: London, UK, 2003; pp. 1326–1333. [Google Scholar]
  11. Srihari, S.N.; Kuebert, E.J. Integration of hand-written address interpretation technology into the united states postal service remote computer reader system. In Proceedings of the Fourth International Conference on Document Analysis and Recognition, Ulm, Germany, 18–20 August 1997; Volume 2, pp. 892–896. [Google Scholar]
  12. Kornai, A. An experimental HMM-based postal ocr system. In Proceedings of the 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, Munich, Germany, 21–24 April 1997; Volume 4, pp. 3177–3180. [Google Scholar]
  13. Ifhaam, M.; Jayalal, S. Sinhala handwritten postal address recognition for postal sorting. In Proceedings of the 2019 International Research Conference on Smart Computing and Systems Engineering (SCSE), Colombo, Sri Lanka, 28 March 2019; pp. 134–141. [Google Scholar]
  14. Patel, M.; Reddy, S.L. An impact of grid based approach in offline handwritten Kannada word recognition. In Proceedings of the 2014 International Conference on Contemporary Computing and Informatics (IC3I), Mysore, India, 27–29 November 2014; pp. 630–633. [Google Scholar]
  15. Nagy, G. Disruptive developments in document recognition. Pattern Recognit. Lett. 2016, 79, 106–112. [Google Scholar] [CrossRef]
  16. Faizullah, S.; Ayub, M.S.; Hussain, S.; Khan, M.A. A Survey of OCR in Arabic Language: Applications, Techniques, and Challenges. Appl. Sci. 2023, 13, 4584. [Google Scholar] [CrossRef]
  17. Al-Hadhrami, A.A.; Allen, M.; Moffatt, C.; Jones, A.E. National characteristics and variation in Arabic handwriting. Forensic Sci. Int. 2015, 247, 89–96. [Google Scholar] [CrossRef]
  18. Bhagyasree, P.; James, A.; Saravanan, C. A proposed framework for recognition of handwritten cursive english characters using DAG-CNN. In Proceedings of the 2019 1st International Conference on Innovations in Information and Communication Technology (ICIICT), Chennai, India, 25–26 April 2019; pp. 1–4. [Google Scholar]
  19. Bhunia, A.K.; Khan, S.; Cholakkal, H.; Anwer, R.M.; Khan, F.S.; Shah, M. Handwriting transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 1086–1094. [Google Scholar]
  20. Van Strien, D.; Beelen, K.; Ardanuy, M.C.; Hosseini, K.; McGillivray, B.; Colavizza, G. Assessing the impact of OCR quality on downstream NLP tasks. In Proceedings of the 12th International Conference on Agents and Artificial Intelligence, Valletta, Malta, 22–24 February 2020. [Google Scholar]
  21. Wang, J.; Tang, J.; Yang, M.; Bai, X.; Luo, J. Improving OCR-based image captioning by incorporating geometrical relationship. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 1306–1315. [Google Scholar]
  22. Nguyen, T.T.H.; Jatowt, A.; Coustaty, M.; Doucet, A. Survey of post-OCR processing approaches. ACM Comput. Surv. (CSUR) 2021, 54, 1–37. [Google Scholar] [CrossRef]
  23. Vijayarani, S.; Sakila, A. Performance comparison of OCR tools. Int. J. UbiComp (IJU) 2015, 6, 19–30. [Google Scholar]
  24. Suissa, O.; Elmalech, A.; Zhitomirsky-Geffet, M. Optimizing the Neural Network Training for OCR Error Correction of Historical Hebrew Texts; iConference 2020 Proceedings; iSchools Inc.: Grandville, MI, USA, 2020; pp. 1–10. [Google Scholar]
  25. Hegghammer, T. OCR with Tesseract, Amazon Textract, and Google Document AI: A benchmarking experiment. J. Comput. Soc. Sci. 2022, 5, 861–882. [Google Scholar] [CrossRef]
  26. Smith, R. An overview of the Tesseract OCR engine. In Proceedings of the Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), Curitiba, Brazil, 23–26 September 2007; Volume 2, pp. 629–633. [Google Scholar]
  27. Ramiah, S.; Liong, T.Y.; Jayabalan, M. Detecting text based image with optical character recognition for English translation and speech using Android. In Proceedings of the 2015 IEEE Student Conference on Research and Development (SCOReD), Kuala Lumpur, Malaysia, 13–14 December 2015; pp. 272–277. [Google Scholar]
  28. Breuel, T.M. The OCRopus open source OCR system. Proc. Doc. Recognit. Retr. SPIE 2008, 6815, 120–134. [Google Scholar]
  29. Kittinaradorn, R. EasyOCR. 2020. Available online: https://github.com/JaidedAI/EasyOCR/tree/master (accessed on 11 April 2023).
  30. Awalgaonkar, N.; Bartakke, P.; Chaugule, R. Automatic license plate recognition system using ssd. In Proceedings of the 2021 International Symposium of Asian Control Association on Intelligent Robotics and Industrial Automation (IRIA), Goa, India, 20–22 September 2021; pp. 394–399. [Google Scholar]
  31. Troller, M. Practical OCR System Based on State of Art Neural Networks. Bachelor’s Thesis, Czech Technical University in Prague, Dejvice, Czech Republic, 2017. [Google Scholar]
  32. Alrasheed, N.; Prasanna, S.; Rowland, R.; Rao, P.; Grieco, V.; Wasserman, M. Evaluation of Deep Learning Techniques for Content Extraction in Spanish Colonial Notary Records. In Proceedings of the 3rd Workshop on Structuring and Understanding of Multimedia heritAge Contents, Virtual, 20 October 2021; pp. 23–30. [Google Scholar]
  33. Chen, Y.; Yang, J. Research on scene text recognition algorithm basedon improved CRNN. In Proceedings of the 2020 4th International Conference on Digital Signal Processing, Chengdu, China, 19–21 June 2020; pp. 107–111. [Google Scholar]
  34. Shi, B.; Bai, X.; Yao, C. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 2298–2304. [Google Scholar] [CrossRef]
  35. Baek, Y.; Lee, B.; Han, D.; Yun, S.; Lee, H. Character region awareness for text detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 9365–9374. [Google Scholar]
  36. Fenniak, M. PyPDF2·PyPI. Available online: https://pypi.org/project/PyPDF2/ (accessed on 4 April 2023).
  37. Kekare, A.; Jachak, A.; Gosavi, A.; Hanwate, P. Techniques for Detecting and Extracting Tabular Data from PDFs and Scanned Documents: A Survey. Tabula 2020, 7, 415–417. [Google Scholar]
  38. Mindee. docTR: Document Text Recognition. Available online: https://github.com/mindee/doctr (accessed on 4 April 2023).
  39. Batra, P.; Phalnikar, N.; Kurmi, D.; Tembhurne, J.; Sahare, P.; Diwan, T. OCR-MRD: Performance Analysis of Different Optical Character Recognition Engines for Medical Report Digitization. 2023. Available online: https://www.researchsquare.com/article/rs-2513255/v1 (accessed on 4 April 2023).
  40. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is All You Need. arXiv 2017, arXiv:1706.03762. [Google Scholar]
  41. Li, M.; Lv, T.; Cui, L.; Lu, Y.; Florencio, D.; Zhang, C.; Li, Z.; Wei, F. Trocr: Transformer-based optical character recognition with pre-trained models. arXiv 2021, arXiv:2109.10282. [Google Scholar]
  42. Smith, R.W. The Extraction and Recognition of Text from Multimedia Document Images. Ph.D. Thesis, University of Bristol, Bristol, UK, 1987. [Google Scholar]
  43. Ai, J. EasyOCR. Available online: https://github.com/JaidedAI/EasyOCR (accessed on 4 April 2023).
  44. Le Cun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
  45. Lab, R. OCR-RUCILDigits-4. 2022. Available online: https://github.com/ay7n/OCR-RUCILDigits-4 (accessed on 4 April 2023).
  46. Eikvil, L. Optical Character Recognition. 1993. Available online: http://home.nr.no/~eikvil/OCR.pdf (accessed on 4 April 2023).
  47. Agrawal, A.K.; Shrivas, A.; kumar Awasthi, V. A Robust model for handwritten digit recognition using machine and deep learning technique. In Proceedings of the 2021 2nd International Conference for Emerging Technology (INCET), Belagavi, India, 21–23 May 2021; pp. 1–4. [Google Scholar]
  48. Kumar, V.; Recupero, D.R.; Riboni, D.; Helaoui, R. Ensembling Classical Machine Learning and Deep Learning Approaches for Morbidity Identification From Clinical Notes. IEEE Access 2021, 9, 7107–7126. [Google Scholar] [CrossRef]
  49. Cheng, L.; Bing, L.; He, R.; Yu, Q.; Zhang, Y.; Si, L. IAM: A Comprehensive and Large-Scale Dataset for Integrated Argument Mining Tasks. arXiv 2022, arXiv:2203.12257. [Google Scholar]
  50. Aico. TROCR Digit. Available online: https://huggingface.co/spaces/aico/TrOCR-digit (accessed on 4 April 2023).
  51. Karthick, K.; Ravindrakumar, K.; Francis, R.; Ilankannan, S. Steps involved in text recognition and recent research in OCR: A study. Int. J. Recent Technol. Eng. 2019, 8, 2277–3878. [Google Scholar]
  52. Yu, Y.; Si, X.; Hu, C.; Zhang, J. A review of recurrent neural networks: LSTM cells and network architectures. Neural Comput. 2019, 31, 1235–1270. [Google Scholar] [CrossRef] [PubMed]
  53. Williams, G.; Baxter, R.; He, H.; Hawkins, S.; Gu, L. A comparative study of RNN for outlier detection in data mining. In Proceedings of the 2002 IEEE International Conference on Data Mining, Maebashi City, Japan, 9–12 December 2002; pp. 709–712. [Google Scholar]
  54. Mueller-Gastell, J.; Sena, M.; Tan, C.Z. A Multi-Digit OCR System for Historical Records (Computer Vision). Available online: http://cs230.stanford.edu/projects_spring_2020/reports/38792124.pdf (accessed on 4 April 2023).
  55. Goodfellow, I.J.; Bulatov, Y.; Ibarz, J.; Arnoud, S.; Shet, V. Multi-digit number recognition from street view imagery using deep convolutional neural networks. arXiv 2013, arXiv:1312.6082. [Google Scholar]
  56. Samuel, J.; Palle, R.; Soares, E.C. Textual Data Distributions: Kullback Leibler Textual Distributions Contrasts on GPT-2 Generated Texts, with Supervised, Unsupervised Learning on Vaccine & Market Topics & Sentiment. arXiv 2022, arXiv:2107.02025. [Google Scholar]
  57. Rahman, M.M.; Ali, G.M.N.; Li, X.J.; Samuel, J.; Paul, K.C.; Chong, P.H.; Yakubov, M. Socioeconomic factors analysis for COVID-19 US reopening sentiment with Twitter and census data. Heliyon 2021, 7, e06200. [Google Scholar] [CrossRef]
  58. Samuel, J.; Ali, G.G.M.N.; Rahman, M.M.; Esawi, E.; Samuel, Y. COVID-19 public sentiment insights and machine learning for tweets classification. Information 2020, 11, 314. [Google Scholar] [CrossRef]
  59. Bhandari, A.; Kumar, V.; Thien Huong, P.T.; Thanh, D.N. Sentiment analysis of COVID-19 tweets: Leveraging stacked word embedding representation for identifying distinct classes within a sentiment. In Artificial Intelligence in Data and Big Data Processing, Proceedings of ICABDE 2021, Ho Chi Minh City, Vietnam, 18–19 December 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 341–352. [Google Scholar]
  60. Kumar, V.; Reforgiato Recupero, D.; Helaoui, R.; Riboni, D. K-LM: Knowledge Augmenting in Language Models Within the Scholarly Domain. IEEE Access 2022, 10, 91802–91815. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
