In the last few decades, Arabic handwritten alphanumeric character recognition has become one of the challenging areas of research in the field of document image processing. While the recognition of handwritten Latin has been extensively investigated using various techniques, little work has been done on handwritten Arabic recognition, and none of the existing techniques are accurate enough for practical application. Recognizing an Arabic character or text is a complicated task due to the unlimited variation in human handwriting, the large variety of Arabic character shapes, the presence of ligatures between characters, and the overlapping of components. The main distinction between Arabic and Roman-based languages is that Arabic words, and the characters within them, are written from right to left, whereas English words are written from left to right. Nevertheless, the digits of an Arabic number are written from left to right [1].
The existing approaches to recognizing handwriting can be divided into two groups: the handcrafted approach and the unsupervised/supervised learning approach. For the first group, the most commonly used methods are scale-invariant feature transforms (SIFT) [2] and Gabor features [5]. The second group is known as the group of deep learning approaches. This approach has acquired a reputation for solving many computer vision problems, and its application to the field of handwriting recognition has been shown to provide significantly better results than traditional methods [7]. In the literature, several methods have been proposed for the recognition of Arabic handwritten digits, characters, and words. For example, Zaiz et al. [8] proposed a technique that uses a Support Vector Machine (SVM) classifier to recognize Arabic handwritten words in the IFN/ENIT database [9] based on two passes, horizontal and vertical. Then, a post-processor based on a Puzzle algorithm is applied to improve the recognition rate, especially for ambiguous characters. Amrouch et al. [10] compared features learned by a Convolutional Neural Network (CNN) with handcrafted features. The experiments were performed on the benchmark IFN/ENIT database. The authors concluded that the results obtained with the CNN features surpass those achieved using the handcrafted features. Maalej et al. [11] proposed a system for Online Arabic Handwriting Recognition using Deep Bidirectional Long Short-Term Memory (DBLSTM). The system was tested on the ADAB database [12]. The result achieved by the proposed system exceeds 99.98%.
Considering the recognition of Arabic handwritten digits, Al-Omari and Al-Jarrah [13] presented a recognition system for the online handwritten Arabic digits one to nine. The system skeletonizes the digits, and the geometrical features of the digits are extracted. Probabilistic neural networks (PNNs) are used for recognition. The developed system is translation, rotation, and scaling invariant. Abdelazeem [14] studied the performance of a different set of classifiers for Arabic digit recognition. Different features were used, and different combinations of features and classifiers were analyzed. Gradient features with an SVM gave the best results of 99.48% for the ADBase database. Parvez and Mahmoud [15] utilized a polygonal approximation of the contour of a character and a classifier based on turning functions for Arabic alphanumeric character recognition. The authors obtained 97.17% accuracy for the ADBase database of Arabic digits.
Many researchers have tried to use a deep learning approach for handwritten Arabic characters. In this vein, the work in [7] used a Deep Belief Network (DBN) to extract image features from raw data inputs, trained with a greedy layer-wise unsupervised learning algorithm, to recognize offline Arabic characters. The experiments achieved a recognition accuracy of 96.36% on the Handwritten Arabic Characters Database (HACDB) [16] with 66 class labels. Elleuch et al. [17] suggested an offline Arabic handwritten character recognition system that uses a Convolutional Neural Network (CNN) to extract feature information from the raw images and a Support Vector Machine (SVM) as the recognizer. This system was evaluated using the HACDB database, and the result was an accuracy of 94.17%. In [18], the authors presented two models for the recognition of handwritten Arabic characters. The first is based on deep learning. The second model focuses on handcrafted feature extraction. These two models were tested on the HACDB database; the result for the first was 91.36% and the result for the second was 87.60%. Lawgali [19] used the coefficients of the Discrete Cosine Transform (DCT) and the Discrete Wavelet Transform (DWT) with Neural Networks (NN) and the Hidden Markov Model (HMM) to classify the shapes of characters. The experiments were conducted on the HACDB database. The results demonstrated that the DCT yields the higher recognition rate (equal to 75.31%). In spite of these works, the use of deep architectures remains relatively scarce in Arabic handwritten character recognition compared to other languages. Deep Neural Networks (DNNs) have brought breakthroughs to Latin and Handwritten Chinese Character Recognition (HCCR) with great success. For example, HCCR-GoogLeNet is very deep yet slim, with a total of 19 layers. Experiments on the ICDAR 2013 offline HCCR competition database show that a single HCCR-GoogLeNet is superior regarding both accuracy and storage performance [20].
In 2014, Simonyan and Zisserman presented VGGNet [21]. VGGNet is a very deep architecture that has achieved a high classification accuracy on the massive ImageNet database [22]. Nevertheless, this network has a high number of parameters compared to GoogLeNet, which makes it computationally more expensive to evaluate and means that it requires a significant amount of memory for optimizing the learning parameters.
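To give a sense of scale for the parameter count mentioned above, the following is a minimal sketch that counts the weights and biases of the 16-layer VGGNet (configuration D in the VGGNet paper). It assumes 224 × 224 RGB inputs and 1000 output classes, as in the ImageNet setting; the text does not state which VGGNet configuration is meant, so this choice and the helper names are illustrative assumptions. It does not describe the Alphanumeric VGG net.

```python
# Parameter count for the 16-layer VGGNet (configuration D), assuming
# 224x224 RGB inputs and 1000 classes. Helper names are hypothetical.

def conv_params(c_in, c_out, k=3):
    """Weights plus biases of a k x k convolution layer."""
    return k * k * c_in * c_out + c_out

def fc_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

# Output channels of the 13 convolution layers, in order.
CONV_CHANNELS = [64, 64, 128, 128, 256, 256, 256,
                 512, 512, 512, 512, 512, 512]

def vgg16_param_count(num_classes=1000):
    conv_total, c_in = 0, 3
    for c_out in CONV_CHANNELS:
        conv_total += conv_params(c_in, c_out)
        c_in = c_out
    # Five 2x2 max-pooling stages reduce 224x224 to 7x7 feature maps.
    fc_total = (fc_params(7 * 7 * 512, 4096)
                + fc_params(4096, 4096)
                + fc_params(4096, num_classes))
    return conv_total, fc_total

conv_total, fc_total = vgg16_param_count()
print("convolutional:", conv_total)
print("fully connected:", fc_total)
print("total:", conv_total + fc_total)
```

Under these assumptions, most of the parameters sit in the fully connected layers, which is one reason the memory footprint is large.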
In this paper, inspired by the success of VGGNet, we propose Alphanumeric VGG net for Arabic handwritten alphanumeric recognition. Alphanumeric VGG net is straightforward to implement and is effective in improving classification performance. Moreover, it reduces the overall complexity of VGGNet while keeping the same good performance. We experiment using the Mean Square Error (MSE) function and the cross-entropy error (CEE) function. To prevent Alphanumeric VGG net from overfitting, two regularization methods are adopted, namely dropout and augmentation. We provide an in-depth performance evaluation of these two critical factors. Because no single alphanumeric benchmark database is available, we conducted our experiments on two different databases: the ADBase database (a database of Arabic handwritten digits from 0 to 9) and the HACDB database (a database of handwritten Arabic characters). After several experiments and parameter adjustments, we achieved two state-of-the-art recognition accuracies equal to 99.57% for the ADBase database and 97.32% for the HACDB database.
The remainder of the paper is organized as follows. Section 2 introduces Arabic handwriting characteristics and challenges. Section 3 presents the standard VGGNet and our proposed Alphanumeric VGG net architectures. Regularization methods are also addressed in this section. We demonstrate the results and analyze the architecture’s performance in Section 4. Finally, conclusions are drawn in Section 5.
4. Experimental Results and Performance Analysis
We use two databases for our experiments: a database of Arabic handwritten numerals and a database of Arabic handwritten characters. With this set of experiments, we examine the effects of error criteria, dropout, and data augmentation on the overall classifier performance.
4.1. Training Method
In this study, for each database, we perform an experiment with two main parts: the first part uses the CEE function, and the second part uses the MSE function. For CEE, we use the Adam optimizer to minimize the categorical cross-entropy. Adam is a first-order gradient-based algorithm for optimizing stochastic objective functions, with adaptive weight updates based on estimates of lower-order moments. The Adam optimizer has four parameters: the learning rate, the exponential decay rate for the moving average of the gradient (beta_1), the exponential decay rate for the moving average of the squared gradient (beta_2), and the smoothing term (epsilon). After related experiments, we left the parameters at their default values: a learning rate of 0.001, beta_1 equal to 0.9, beta_2 equal to 0.999, and epsilon equal to 1 × 10^−8. For MSE, we use the RMSprop optimizer to minimize the Mean Square Error, as we found through experiment that the network was hard to train with the Adam optimizer and the MSE function. RMSprop is a gradient-based optimization technique that normally has three parameters: the learning rate, the decay rate (alpha), and the smoothing term (epsilon). We left these parameters at their default values: a learning rate of 0.001, a decay rate of 0.9, and a smoothing term of 1 × 10^−8.
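The two error functions and the two update rules described above can be sketched for a single scalar weight as follows. This is a minimal illustration in plain Python, using the hyperparameter values quoted in the text; the function names and the state dictionaries are hypothetical, and the placement of epsilon outside the square root in the RMSprop step is an assumption, since libraries differ on this detail.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target_index):
    """Categorical cross-entropy (CEE) for a one-hot target."""
    return -math.log(probs[target_index])

def mean_square_error(probs, target_index):
    """Mean Square Error (MSE) against a one-hot target."""
    return sum((p - (1.0 if i == target_index else 0.0)) ** 2
               for i, p in enumerate(probs)) / len(probs)

def adam_step(w, g, state, lr=0.001, beta_1=0.9, beta_2=0.999, eps=1e-8):
    """One Adam update of weight w given gradient g."""
    state["t"] += 1
    state["m"] = beta_1 * state["m"] + (1 - beta_1) * g        # first moment
    state["v"] = beta_2 * state["v"] + (1 - beta_2) * g * g    # second moment
    m_hat = state["m"] / (1 - beta_1 ** state["t"])            # bias correction
    v_hat = state["v"] / (1 - beta_2 ** state["t"])
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)

def rmsprop_step(w, g, state, lr=0.001, alpha=0.9, eps=1e-8):
    """One RMSprop update of weight w given gradient g."""
    state["sq"] = alpha * state["sq"] + (1 - alpha) * g * g    # running mean of g^2
    return w - lr * g / (math.sqrt(state["sq"]) + eps)

probs = softmax([2.0, 1.0, 0.1])
print("CEE:", cross_entropy(probs, 0), "MSE:", mean_square_error(probs, 0))

w_adam = adam_step(1.0, 0.5, {"t": 0, "m": 0.0, "v": 0.0})
w_rms = rmsprop_step(1.0, 0.5, {"sq": 0.0})
print("after one Adam step:", w_adam)
print("after one RMSprop step:", w_rms)
```

On the first step, the bias-corrected Adam update has a magnitude close to the learning rate regardless of the gradient scale, whereas the first RMSprop step is larger because its running average starts at zero without bias correction.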
The dropout regularization for the first two fully connected layers was set to a dropout ratio of 0.5. The nonlinearity used was the Rectified Linear Unit (ReLU). Alphanumeric VGG net was trained for 100 epochs. The training procedure took 2 hours on a desktop personal computer (PC) with an Intel i7 3770 processor, an NVidia GTX780 graphics card, and 16 gigabytes of onboard random access memory (RAM). By comparison, VGGNet was trained on four NVidia Titan Black graphical processing units (GPUs) for two to three weeks.
The first two fully connected layers in the standard VGGNet are regularized with a dropout ratio of 0.5. Dropout sets the output of each hidden neuron to zero with a probability of 0.5. Neurons that are dropped out do not contribute to the forward pass and do not participate in backpropagation. During testing, we use all the neurons but multiply their outputs by 0.5. However, Alphanumeric VGG net still suffered from overfitting when dropout was applied only to the first two fully connected layers. To prevent this, the training was regularized by applying dropout to the first two fully connected layers and the two max-pooling layers, with the dropout ratio set to 0.5. The Alphanumeric VGG net with dropout regularization is depicted in Figure 5.
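The dropout behavior described above (zeroing units during training, scaling outputs at test time) can be sketched as follows. This is a minimal illustration on a flat list of activations; the function name `dropout_forward` and its parameters are hypothetical and not taken from the implementation used in the experiments.

```python
import random

def dropout_forward(activations, ratio=0.5, training=True, rng=random):
    """Standard (non-inverted) dropout on a list of activations.

    Training: each activation is set to zero with probability `ratio`.
    Testing: every activation is kept and scaled by (1 - ratio), which
    equals 0.5 for the ratio used in the text.
    """
    if training:
        return [0.0 if rng.random() < ratio else a for a in activations]
    return [a * (1.0 - ratio) for a in activations]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
train_out = dropout_forward([1.0] * 1000, training=True, rng=rng)
test_out = dropout_forward([2.0, 4.0], training=False)
print("units dropped during training:", train_out.count(0.0), "of 1000")
print("test-time outputs:", test_out)
```

Scaling at test time keeps the expected input to the next layer the same as during training, when on average half of the units are silent.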
Data augmentation: The HACDB database has 6600 images for 66 classes; without data augmentation, our network suffers from overfitting. With augmentation, we generate ten samples from each original character image, so the number of possible training samples is 66,000. The augmentation improves the results significantly.
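The ten-fold expansion can be sketched as follows. The text does not say which transformations were used, so the random pixel shifts below are purely an illustrative assumption, as are the function names `shift_image` and `augment`; only the counts (6600 originals, ten samples each) come from the text.

```python
import random

def shift_image(image, dx, dy, fill=0):
    """Translate a 2D list image by (dx, dy), padding with `fill`."""
    h, w = len(image), len(image[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = image[y][x]
    return out

def augment(image, copies=10, max_shift=2, seed=0):
    """Generate `copies` randomly shifted variants of one image.

    Random translation is an assumed transformation for illustration.
    """
    rng = random.Random(seed)
    return [shift_image(image,
                        rng.randint(-max_shift, max_shift),
                        rng.randint(-max_shift, max_shift))
            for _ in range(copies)]

# A toy 5x5 "character" with a single foreground pixel in the center.
toy = [[0] * 5 for _ in range(5)]
toy[2][2] = 1
samples = augment(toy)

NUM_ORIGINAL_IMAGES = 6600   # HACDB size stated in the text
SAMPLES_PER_IMAGE = 10       # augmentation factor stated in the text
print(len(samples), "samples per image")
print(NUM_ORIGINAL_IMAGES * SAMPLES_PER_IMAGE, "possible training samples")
```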
4.3. ADBase Database
The ADBase contains 70,000 digits written by 700 writers. Each writer wrote each digit (from “0” to “9”) ten times. The database is partitioned into two sets: a training set (which contains 60,000 digits: 6000 images for each class) and a test set (which contains 10,000 digits: 1000 images for each class). Sample images of digits (0–9) from the ADBase database are shown in Table 3. The ADBase database is available on the website [31]. Note that no data augmentation was applied to this database, as it includes enough samples to achieve a high result.
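As a quick consistency check on the figures above, the database size and the train/test partition can be recomputed from the stated counts. The constant names are illustrative only.

```python
# Counts stated in the text for the ADBase database.
WRITERS = 700
DIGIT_CLASSES = 10          # digits "0" to "9"
REPETITIONS_PER_DIGIT = 10  # each writer wrote each digit ten times

TRAIN_PER_CLASS = 6000
TEST_PER_CLASS = 1000

total_digits = WRITERS * DIGIT_CLASSES * REPETITIONS_PER_DIGIT
train_total = TRAIN_PER_CLASS * DIGIT_CLASSES
test_total = TEST_PER_CLASS * DIGIT_CLASSES

print("total digits:", total_digits)
print("training set:", train_total, "test set:", test_total)
```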
4.3.1. The Impact of CEE Function
In this subsection, we compare the results obtained without dropout and after adopting dropout regularization. The experiments were conducted using the CEE function. The experimental results are presented in Table 4. Without dropout, we achieved a classification accuracy equal to 98.75% on the validation set that does not hold on the test set. After adopting dropout, we achieved a classification accuracy equal to 99.57%. Figure 6a shows the improvement in the performance of Alphanumeric VGG net when using the dropout regularization method (the green and the orange lines).
4.3.2. The Impact of MSE Function
As shown in Table 4, without dropout, we achieved a classification accuracy equal to 98.83% on the validation set that does not hold on the test set. With dropout, we achieved a classification accuracy equal to 99.66%. The experiments were conducted using the MSE function. The CEE function was slightly better than MSE when we trained the model without dropout. However, using dropout, the MSE function showed better results. Figure 6b shows the improvement in the performance of Alphanumeric VGG net when using the dropout regularization method (the green and the orange lines).
4.3.3. Comparison with the State-of-the-Art
Unlike Latin digit recognition, the task of Arabic handwritten digit recognition suffers from the lack of a benchmarking database. To the best of our knowledge, this is the first work to incorporate a deep learning approach for recognizing ADBase database digits. Table 5 compares our best results with two previous works on the ADBase database. In particular, our best performance is noticeably higher than the result achieved using a Fuzzy Turning Function [8] and the result achieved using an SVM [7]. We achieved state-of-the-art results for the ADBase database.
4.4. HACDB Database
The HACDB database [16] (shown in Table 6) contains the 52 basic shapes of characters, 6 different styles for only certain characters, and 8 shapes of overlapping characters for a total of 66 shapes of Arabic characters written by 50 people. Each person wrote each character twice. The number of character shapes collected totaled 6600 shapes of unconstrained handwritten Arabic characters.
4.4.1. The Impact of MSE Function
The experimental results of using the MSE function are presented in Table 7. When neither dropout nor data augmentation is adopted, Alphanumeric VGG net had a classification accuracy of 90.61% on the validation set that does not hold on the test set. After adopting dropout, the classification accuracy improved slightly to 92.42%. We repeated the previous experiment, this time augmenting the original data 10-fold. We see that data augmentation increases the validation set accuracy dramatically, i.e., from 90.61% to 95.95%. Figure 7a shows the improvement in the performance (the blue line) of Alphanumeric VGG net. However, we obtained the first state-of-the-art result, which is equal to 96.58%, when adopting the dropout and data augmentation methods together. Figure 7a shows the improvement in the performance (the red line) for Alphanumeric VGG net using the MSE function.
4.4.2. The Impact of CEE Function
In this set of experiments, the CEE function outperformed the MSE function in all cases. The results are tabulated in Table 7, and the details are as follows:
Without dropout and data augmentation, Alphanumeric VGG net had a classification accuracy of 91.97% on the validation set that does not hold on the test set. After applying dropout, the classification accuracy improved slightly to 93.48%. Figure 7 shows the improvement in the performance of Alphanumeric VGG net when using the dropout regularization method (the green and the orange lines).
We repeated the previous experiment, this time with augmented data. We observe an improvement in classification accuracy from 91.97% to 96.42%. However, we obtained the second state-of-the-art result, which is equal to 97.32%, when we adopted data augmentation and dropout together. Figure 7b shows the overall improvement in the performance (the red line) of Alphanumeric VGG net when using the dropout and data augmentation methods together.
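The HACDB accuracies reported in Sections 4.4.1 and 4.4.2 can be collected in one place to make the individual contributions of dropout and data augmentation easier to compare. The sketch below only rearranges the numbers already quoted in the text; the dictionary layout and the `gain` helper are illustrative.

```python
# Accuracy (%) on HACDB, keyed by (error function, dropout, augmentation),
# taken from the figures quoted in the text.
HACDB_ACCURACY = {
    ("MSE", False, False): 90.61,
    ("MSE", True, False): 92.42,
    ("MSE", False, True): 95.95,
    ("MSE", True, True): 96.58,
    ("CEE", False, False): 91.97,
    ("CEE", True, False): 93.48,
    ("CEE", False, True): 96.42,
    ("CEE", True, True): 97.32,
}

def gain(loss, dropout, augmentation):
    """Improvement in percentage points over the unregularized baseline."""
    baseline = HACDB_ACCURACY[(loss, False, False)]
    return round(HACDB_ACCURACY[(loss, dropout, augmentation)] - baseline, 2)

for loss in ("MSE", "CEE"):
    print(loss,
          "dropout:", gain(loss, True, False),
          "augmentation:", gain(loss, False, True),
          "both:", gain(loss, True, True))

# CEE is ahead of MSE in every one of the four settings.
cee_always_better = all(
    HACDB_ACCURACY[("CEE", d, a)] > HACDB_ACCURACY[("MSE", d, a)]
    for d in (False, True) for a in (False, True)
)
print("CEE better in all settings:", cee_always_better)
```

For both error functions, augmentation alone contributes more than dropout alone, and combining the two gives the largest gain.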
4.4.3. Comparison with the State-of-the-Art
The existing Arabic handwritten character recognition methods can be categorized into two groups: handcrafted-based methods and deep learning-based methods. In this subsection, to evaluate the effect of our proposed deep model, we compare the performance of the model with those of the deep learning-based methods. Table 8 compares our two state-of-the-art results with the previous best works on the HACDB database. According to these results, our Alphanumeric VGG net outperforms other deep learning methods, such as the Deep Belief Network (DBN), and obtains the best published results on the HACDB database to date. Very recently, Elleuch et al. [7] obtained a good result on the HACDB database. However, this result is obtained by using ensemble methods, such as Convolutional Neural Network (CNN) and Support Vector Machine (SVM). In contrast, our model is straightforward and generic to apply, so it may also work well with the handwritten characters of other languages, such as Latin and Chinese.
In this paper, we proposed Alphanumeric VGG net for the Arabic handwritten alphanumeric (character/digit) recognition task. Alphanumeric VGG net is an optimized version of the very popular VGGNet. We show incremental improvements of the alphanumeric recognition comparable to approaches that use a Deep Belief Network (DBN) or a Recurrent Neural Network (RNN). Alphanumeric VGG net improved the classification accuracy and reduced the overall complexity of VGGNet by a factor of 8. With different network parameters, dropout, and augmentation, we improved the overall performance as follows:
Regarding the ADBase database: without dropout, we achieved classification accuracies equal to 98.83% using the MSE function and 98.75% using the CEE function on the validation set that does not hold on the test set. With dropout, we achieved classification accuracies equal to 99.57% and 99.66% using CEE and MSE, respectively. The MSE function achieved better results than the CEE function.
Regarding the HACDB database: without dropout and augmentation, we achieved 90.61% and 91.97% classification accuracy using MSE and CEE, respectively, on the validation set that does not hold on the test set. With dropout, the classification accuracy improved slightly to 92.42% and 93.48% using MSE and CEE, respectively. We repeated the previous experiment, this time augmenting the original data 10-fold. We see that data augmentation increases the validation set accuracy dramatically, i.e., from 90.61% to 95.95% using the MSE function and from 91.97% to 96.42% using CEE. However, we obtained the two state-of-the-art results when we applied data augmentation and dropout together: 96.58% using the MSE function and 97.32% using the CEE function.
Our best results might not be statistically significant compared to the previous state-of-the-art; however, our model is quite simple and generic to apply, so it may also work well with the isolated handwritten characters of other languages. As for future work, we plan to investigate the performance of other deep networks such as AlexNet, GoogLeNet, and ResNet on the two databases. Moreover, we will experiment with the applicability of VGGNet on the IFN/ENIT Arabic handwritten words database.