A Fusion‑Based Hybrid‑Feature Approach for Recognition of Unconstrained Offline Handwritten Hindi Characters

Abstract: Hindi is the official language of India and is used by a large population for several public services such as postal, banking, judiciary, and public surveys. Efficient management of these services needs language-based automation. The proposed model addresses the problem of handwritten Hindi character recognition using a machine learning approach. The pre-trained DCNN models InceptionV3-Net, VGG19-Net, and ResNet-50 were used for the extraction of salient features from the character images. A novel fusion approach is adopted in the proposed work: the DCNN-based features are fused with the handcrafted features obtained from the Bi-orthogonal discrete wavelet transform. The feature size was reduced by the Principal Component Analysis method. The hybrid features were examined with two popular classifiers, the Multi-Layer Perceptron (MLP) and the Support Vector Machine (SVM). The recognition cost was reduced by 84.37%. The model achieved significant scores of precision, recall, and F1-measure of 98.78%, 98.67%, and 98.69%, respectively, with an overall recognition accuracy of 98.73%.


Introduction
The demand for automation of language-based systems is high due to the vast field of associated applications. These include the digitalization and preservation of manuscripts of historic significance, computerized editing of handwritten documents, automatic processing of cheques in banks, recognition of postal addresses written on mail and parcels and their address-wise sorting through computer vision, translation of road-safety instructions written in the local language on roadside boards, computerized recognition of medical aids mentioned in handwritten prescriptions, and many more related applications. The machine-based recognition of handwritten scripts is much more difficult than that of printed ones due to the inherent, unconstrained variation in shape, size, skewness, and degree of connectedness between various characters. Countries like India, China, Saudi Arabia, and the United Arab Emirates are developing automation systems in country-specific languages to bring their advantages to the mass of the people, as large populations of these countries have not adopted English as their first language.
Many advancements have been reported for English-language-based automation systems due to their global acceptance. Extra attention is needed for systems based on languages like Hindi (Devnagari), Chinese, Urdu, Farsi, etc., as they are still in a developing stage. The motivations and research gaps addressed in this work are as follows:
1. To introduce a novel approach to handwritten Hindi character recognition that gains the benefits of features received from pre-trained DCNN models while reducing the computational load of the classifiers.
2. The majority of previously reported works are based solely on either handcrafted features or CNN-based features. No work has been reported yet in which both types of features are combined in a single feature-vector for the recognition of handwritten Hindi characters. Each type has its own advantages: CNN-based features are auto-generative, while handcrafted features are rich in customization.
3. The model performance for each character class of Hindi script has also not been covered in detail, e.g., character-wise correct and incorrect predictions, the share of false-positive and false-negative predictions among the total incorrect predictions, and a confusion matrix for all 36 classes of Hindi consonants.
4. The examination of the effectiveness of individual feature types and all their possible combinations is also a novel approach in relation to handwritten Hindi characters.

Contribution
The discussed limitations have been treated as an opportunity in the proposed work, and efforts have been made to address them. The main contributions are as follows:
1. The scheme exploited pre-trained DCNN models, namely InceptionV3-Net, VGG19-Net, and ResNet-50, for feature extraction from the handwritten character images, owing to their excellent performance in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) annual competition [34].
2. The model experimented with a fresh approach of feature fusion, in which the features received from pre-trained DCNN models were fused with features obtained from handcrafted methods. In the proposed scheme, the Bi-orthogonal Discrete Wavelet Transform (BDWT) was the natural choice for handcrafted features because of its properties of flexibility, separability, scalability, and transformability, together with the power of multi-resolution analysis; very limited work has been reported on the use of DWT for the recognition of handwritten Hindi characters [27,35]. The effective PCA method was implemented for dimensionality reduction of the feature-vectors while preserving most of the important details, which helped achieve a low computational cost.
3. The proposed scheme thoroughly investigated the performance of the model for each individual character class. Performance metrics such as precision, recall, and F1-measure were evaluated for the test samples of each character class to determine the number of correct and incorrect recognitions, and a confusion matrix was generated for precise character-wise result analysis.
4. The strength of the individual feature types present in the hybrid feature-vector, and all their possible combinations, were evaluated for recognition accuracy.
5. The proposed features were examined with two popular classifiers, namely MLP and SVM, to compare the performance of the proposed features over ANN-based and kernel-based approaches, respectively, for the given multiclass problem.
6. Various timings related to feature extraction and character recognition were estimated in the proposed work.
The rest of the paper is organized as follows: Section 2 presents the preliminaries; Section 3 describes the research method; Section 4 presents the results and discussion; Section 5 concludes the paper.

Preliminary
This section covers the relevant theoretical basis of the techniques implemented in the proposed work.

Transfer Learning
Transfer learning is a technique of conveying knowledge gained in one domain to another, related domain. With respect to DCNN models, transfer learning is a method in which the weights learned by a model on a certain dataset are reused for other, similar kinds of datasets. Such techniques make the advantages of deep learning-based models available with limited data and less training time for pattern recognition problems [36]. The transfer learning approach has been successfully implemented in several recent studies based on medical applications [37][38][39]. In the proposed work, the transfer learning approach was applied to the pre-trained DCNN models VGG19-Net, InceptionV3-Net, and ResNet-50 for feature extraction from the character images. The key properties and architectures of these models are discussed below.

VGG-19Net
The VGG19-Net has a total of 19 trainable layers, out of which 16 are convolutional layers and 3 are dense layers. The network has 5 Max-pooling layers, one after each convolutional block to handle feature redundancy. The input size is fixed as 224 × 224 × 3. The size of the feature-vector is 4096 [40].

Inception V3-Net
The InceptionV3-Net has three types of inception modules, which are essentially well-designed convolution modules. These modules are used for generating distinct features with a reduced number of parameters. Several convolutional layers and pooling layers are arranged in parallel in each inception module. The input size is fixed at 299 × 299 × 3. The size of the feature vector is 2048 [41].

ResNet-50
ResNet was the winner of the ILSVRC competition in 2015 [42]. ResNet-50 is a 50-layer deep network with 49 convolutional layers and 1 dense layer. It is similar to other networks in having convolution, pooling, activation, and dense layers in cascade, but with the presence of identity connections between the layers that skip one or more layers. The model assumes that residual learning is more effective than stacked learning. The input size is fixed at 224 × 224 × 3. The size of the feature vector is 2048. Table 1 summarizes the architectures of the pre-trained networks used in the proposed work.

Handcrafted Features: Bi-Orthogonal Discrete Wavelet Transform
For Multi-Resolution Analysis (MRA) [43] of an image, Bi-orthogonal wavelets are preferable over orthogonal ones due to their flexible nature of transformation: they can be invertible without necessarily being orthogonal. Bi-orthogonal wavelets have the inbuilt properties of symmetry, linear phase, compact support, and de-noising. The symmetry and linear phase ensure distortion-less wavelet coefficients; compact support guarantees the capture of minute details of a given pattern; and the potential for de-noising can improve recognition ability [44].
Bi-orthogonal wavelets can be used to develop symmetrical wavelet functions [45], which are useful in the effective edge-representation of images. Since edge representation is a significant aspect in the recognition of handwritten characters, the BDWT becomes the natural choice for handcrafted feature extraction in the proposed scheme.
The symmetry property requires an exact-reconstruction design of the Bi-orthogonal filter banks [46]. The design comprises analysis and synthesis filter banks. The scaling and transformation (wavelet) functions associated with the analysis filter bank are given by Equations (1) and (2):

φ(t) = √2 Σ_k L(k) φ(2t − k)    (1)
ψ(t) = √2 Σ_k H(k) φ(2t − k)    (2)

Here, L(k) and H(k) are the responses of the filters associated with the scaling and transformation functions. For the synthesis filter bank, the scaling and transformation functions are given by (3) and (4):

φ̃(t) = √2 Σ_k L̃(k) φ̃(2t − k)    (3)
ψ̃(t) = √2 Σ_k H̃(k) φ̃(2t − k)    (4)

Here, L̃(k) and H̃(k) are the responses of the filters associated with the scaling and transformation functions of the synthesis filter bank; (3) and (4) are the duals of (1) and (2) [47]. For exact reconstruction, the filter responses must satisfy the conditions given in (5)-(7) [48]:

Σ_k L(k) L̃(k + 2m) = δ(m)    (5)
H(k) = (−1)^k L̃(1 − k)    (6)
H̃(k) = (−1)^k L(1 − k)    (7)

Referring to Equations (1)-(7), all the equations are linear in nature, ensuring a simple design and low complexity.
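The exact-reconstruction property can be checked numerically with the PyWavelets library; the sketch below (with an arbitrary random signal) uses the `bior1.3` wavelet employed later in this work:

```python
import numpy as np
import pywt

# Decompose a signal with the analysis filter bank of the bior1.3 wavelet
x = np.random.rand(64)
cA, cD = pywt.dwt(x, 'bior1.3')   # approximation and detail coefficients

# Reconstruct with the synthesis (dual) filter bank
xr = pywt.idwt(cA, cD, 'bior1.3')

# Perfect reconstruction: the round trip recovers the signal exactly
print(np.allclose(x, xr))  # True
```

The round trip holds for any signal, illustrating that the analysis and synthesis banks satisfy the exact-reconstruction conditions even though the filters are not orthogonal.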

Principal Component Analysis
The PCA is an efficient method of minimizing data-redundancy; it can eliminate multiple collinearities present in the data. It offers flexibility in the selection of the desired number of reduced features in such a way that they carry most of the information of original datasets. The dimensionality reduction provides the benefits of fast processing, ease in visualizing and analyzing the data, and low computational cost of the model [49]. All these benefits make the PCA a genuine choice for the proposed work. The mathematics involved in finding principal components is as follows [13].
Suppose we have a dataset with N samples, as given in (8):

D = {d_1, d_2, …, d_N}    (8)

where each sample carries F features, as given by (9):

d_j = [f_1, f_2, …, f_F]    (9)

The covariance matrix for the given dataset is determined by (10):

C = (1/N) Σ_j (d_j − d̄)(d_j − d̄)^T    (10)

Here, d̄ is the mean of the data. With the help of the covariance matrix, the Eigen-values and Eigen-vectors are computed as given in (11):

C V = λ V    (11)

where C is the covariance matrix, V is an Eigen-vector, and λ is the corresponding Eigen-value.
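Equations (10) and (11) can be verified numerically; the sketch below, with a hypothetical random dataset, computes the covariance matrix and checks that each Eigen-pair satisfies C V = λ V:

```python
import numpy as np

# Hypothetical dataset: N = 200 samples, F = 6 features
rng = np.random.default_rng(42)
D = rng.normal(size=(200, 6))

# Covariance matrix, Eq. (10): mean outer product of mean-centred samples
d_bar = D.mean(axis=0)
C = (D - d_bar).T @ (D - d_bar) / D.shape[0]

# Eigen-decomposition, Eq. (11): C V = lambda V for every Eigen-pair
eig_vals, eig_vecs = np.linalg.eigh(C)
for lam, v in zip(eig_vals, eig_vecs.T):
    assert np.allclose(C @ v, lam * v)
```

The principal components are the Eigen-vectors associated with the largest Eigen-values; projecting the data onto them yields the reduced feature set.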

Multi-Layer Perceptron
It is a class of Feed Forward Neural Networks (FFNN) with three types of layers, namely Input, Hidden, and Output. Depending on the application, the network may have more than one hidden layer. The hidden units run on a nonlinear activation function and play an important role in solving problems of classification and regression. The network uses backpropagation algorithms for its learning. The MLP network has a tremendous ability to solve pattern recognition problems [50]. The network architecture with one hidden layer is shown in Figure 2. In the proposed work, the Rectified Linear Unit (ReLU) activation function was used in the hidden units of the MLP network, due to its low computational cost compared with the Sigmoid and tanh functions. For hidden unit k, the ReLU activation function is given by (12):

f(x_k) = max(0, x_k)    (12)

Here, x_k is the input to hidden unit k.
In the proposed work, the Adam optimizer was used because of its ability to deal with sparse gradient and non-stationary objectives. These abilities are adopted from AdaGrad and RMSProp optimizers. The Adam solver is used to update the weight parameters in the following manner [51].
In the first step, the gradient (g) is computed w.r.t. time instant t as:

g_t = ∇_θ J(θ_{t−1})    (13)

where θ represents the weight and bias parameters, so g correspondingly represents dw and db. In step 2, the first moment m_t and the second moment v_t are updated as:

m_t = β_1 m_{t−1} + (1 − β_1) g_t    (14)
v_t = β_2 v_{t−1} + (1 − β_2) g_t²    (15)

where β_1 and β_2 are exponential decay rates ranging between 0 and 1. In step 3, bias-correction is applied to m_t and v_t as:

m̂_t = m_t / (1 − β_1^t)    (16)
v̂_t = v_t / (1 − β_2^t)    (17)

In the final step, the parameter update is given by:

θ_t = θ_{t−1} − α m̂_t / (√v̂_t + ε)    (18)

These steps are repeated until final convergence.
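The Adam steps above can be sketched in a few lines of NumPy; the quadratic objective below is only a hypothetical stand-in used to show the update loop:

```python
import numpy as np

alpha, beta1, beta2, eps = 0.01, 0.9, 0.999, 1e-8
target = np.array([1.0, 2.0, 3.0])   # optimum of the toy objective
theta = np.zeros(3)                  # parameters (weights/biases)
m = np.zeros(3)                      # first moment
v = np.zeros(3)                      # second moment

for t in range(1, 1001):
    g = 2.0 * (theta - target)              # step 1: gradient of ||theta - target||^2
    m = beta1 * m + (1 - beta1) * g         # step 2: first-moment update
    v = beta2 * v + (1 - beta2) * g**2      #         second-moment update
    m_hat = m / (1 - beta1**t)              # step 3: bias correction
    v_hat = v / (1 - beta2**t)
    theta -= alpha * m_hat / (np.sqrt(v_hat) + eps)  # final step: parameter update

# theta converges towards the optimum of the toy objective
```

In practice the same loop runs over mini-batch gradients of the network loss rather than a closed-form gradient.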

Support Vector Machine
An SVM algorithm is used to find the best possible hyperplane in a K-dimensional feature space to separate the samples of different classes. The SVM classifier is popular because of its potential for finding the global minimum, and it can impressively control the over-fitting problem due to its feature-generalization ability [52]. SVM can also efficiently solve the classification of non-linearly separable data with the kernel approach. All these benefits make SVM a prime choice for the classification task in the proposed work. The optimum hyperplane derived by the SVM classifier is given by Equation (19):

f(x) = Σ_{i=1..N} α_i y_i (x · x_i) + b    (19)

where N represents the number of samples, α_i represents a Lagrange multiplier, y_i represents the labelled value of the i-th support vector, given as y_i ∈ {+1, −1}, and b represents the bias. To classify non-linearly separable data using a linear decision plane, the kernel approach can be used in the following manner [53].
With the help of a basis function φ, the given data are transformed to a higher dimension. Using Equation (19), the hyperplane in the higher dimension can be given as:

f(x) = Σ_{i=1..N} α_i y_i φ(x)·φ(x_i) + b    (20)

Equation (20) can be rewritten as:

f(x) = Σ_{i=1..N} α_i y_i K(x, x_i) + b    (21)

Here, K(x, x_i) is the kernel function, defined as:

K(x, x_i) = φ(x)·φ(x_i)    (22)

In the proposed work, the linear kernel was used because of its ability to make the SVM learn fast, its least risk of over-fitting, and its need for zero hyper-parameters [54]. It is given by:

K(x, x_i) = x · x_i    (23)
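Equations (19)-(23) can be illustrated with scikit-learn's SVC on hypothetical two-class data; the sketch also rebuilds the decision function from the support vectors to show that it matches Equation (21) with the linear kernel K(x, x_i) = x · x_i:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, (50, 2)),   # class 0 cluster
               rng.normal(+2.0, 1.0, (50, 2))])  # class 1 cluster
y = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel='linear').fit(X, y)

# Decision function rebuilt by hand: sum_i (alpha_i * y_i) K(x, x_i) + b,
# where dual_coef_ already stores alpha_i * y_i for the support vectors
x = X[0]
manual = (clf.dual_coef_ @ (clf.support_vectors_ @ x) + clf.intercept_)[0]
print(np.isclose(manual, clf.decision_function(X[:1])[0]))  # True
```

The same construction applies to any kernel by replacing the inner product with K(x, x_i).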

Research Method
The design of the proposed scheme is displayed in Figure 3. Each character image was supplied to the individual feature-extraction schemes. In the DWT-based scheme, the BDWT was applied to the given image to collect a feature-vector in the form of wavelet coefficients. The same image was given to the individual pre-trained DCNN models (VGG-19Net, ResNet-50, and InceptionV3-Net) to receive the respective feature-vectors. At the end of the feature-extraction stage, we received four different feature-vectors related to the given image, labelled F1, F2, F3, and F4. Next is the dimensionality-reduction stage, where the PCA method was used to reduce the size of the individual feature-vectors; they were reduced to equal sizes and marked as W, V, R, and I. The reduced feature-vectors were supplied to the feature-fusion stage, where hybrid feature-vectors were developed with all possible combinations of W, V, R, and I. At this stage, 15 new datasets (D1 to D15) were created by collecting the respective hybrid feature-vectors for all the images of the input dataset. The performance of the individual datasets was estimated with the MLP and SVM classifiers, and the respective results were collected and analyzed in the output stage.
The experimental framework was developed in a Python environment. Several open-source libraries were used for the task: OpenCV, Python Imaging Library, PyWavelets, Keras, NumPy, pandas, scikit-learn, seaborn, the Python time module, etc. The experiments were run on the Google Colaboratory platform, which was provisioned with a Tesla K80 GPU (2496 CUDA cores, 12 GB GDDR5 VRAM), a hyper-threaded Xeon processor, 12.6 GB RAM, and 33 GB of storage.

Dataset Pre-Processing
A dataset of Devnagari characters was prepared by collecting handwritten documents from a wide variety of individuals belonging to different age groups and professions. Individual characters were scanned and cropped manually. The whole dataset was prepared and made available in the public domain by Acharya, Pant, and Gyawali [55]. A dataset of 18,000 samples, covering all 36 consonants, was prepared for the proposed work. All the images were checked for a uniform size of 32 × 32 pixels and converted into grey-level format.

Feature Extraction
In the proposed scheme, multiple approaches were applied for the extraction of features from the character images. The approaches are described in the following sub-sections.

Discrete Wavelet Transform
The Bi-orthogonal Discrete Wavelet Transform (BDWT) was applied to individual images to obtain salient features in terms of wavelet coefficients. The two-dimensional coefficients, including approximation, horizontal, vertical, and diagonal details, were transformed into a feature-vector, as shown in Figure 4. This feature-vector was marked as F1. In the proposed work, the Bi-orthogonal-1.3 wavelet (Bior-1.3) was used up to the second-level decomposition of the character image. The kernel size of the Bior-1.3 wavelet is summarized in Table 2. Referring to Table 2, the level-1 decomposition produced a feature-vector of size 1296 (i.e., 4 × (18 × 18)), including all four types of coefficients (approximation, horizontal, vertical, and diagonal), while the feature-vector of size 484 (i.e., 4 × (11 × 11)) for the level-2 decomposition offered a promising reduction in feature dimensionality. This made the level-2 decomposition the logical choice for the proposed work.
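A sketch of this feature-extraction step with PyWavelets, on a hypothetical 32 × 32 grey-level image, reproduces the feature-vector sizes quoted above:

```python
import numpy as np
import pywt

img = np.random.rand(32, 32)   # stand-in for a character image

# Level-1 decomposition with Bior-1.3: four 18 x 18 sub-bands -> 1296 values
cA1, (cH1, cV1, cD1) = pywt.dwt2(img, 'bior1.3')
level1 = np.concatenate([c.ravel() for c in (cA1, cH1, cV1, cD1)])
print(level1.size)  # 1296

# Level-2 decomposition of the approximation: four 11 x 11 sub-bands -> 484 values (F1)
cA2, (cH2, cV2, cD2) = pywt.dwt2(cA1, 'bior1.3')
F1 = np.concatenate([c.ravel() for c in (cA2, cH2, cV2, cD2)])
print(F1.size)  # 484
```

The sub-band sizes follow from PyWavelets' default symmetric padding with the 6-tap Bior-1.3 filters.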

Pre-Trained Deep Convolutional Network
In the proposed scheme, three pre-trained DCNN models, namely VGG-19Net, ResNet-50, and InceptionV3-Net were individually used for feature extraction from character images. The trainable layers of individual models were frozen. The input images were resized as per the need of the models (refer to Table 1). The feature-vector for each image was collected at the final global average pooling layer of the respective models. The corresponding feature-vectors are shown in Figure 5 and marked as F2, F3, and F4.
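This step can be sketched with Keras, shown here for ResNet-50; note two assumptions in the sketch: `weights='imagenet'` would load the pre-trained ILSVRC weights used in the paper, while `weights=None` is used below only to keep the sketch self-contained, and the grey-level image is assumed to be resized and replicated to three channels beforehand:

```python
import numpy as np
from tensorflow.keras.applications import ResNet50

# Frozen feature extractor: no top dense layers, global average pooling at the output.
# The proposed work used weights='imagenet'; weights=None avoids the download here.
extractor = ResNet50(weights=None, include_top=False, pooling='avg',
                     input_shape=(224, 224, 3))
extractor.trainable = False

# A 32 x 32 grey-level character image would be resized to 224 x 224 and
# replicated to 3 channels before this call.
batch = np.random.rand(1, 224, 224, 3).astype('float32')
F3 = extractor.predict(batch, verbose=0)
print(F3.shape)  # (1, 2048)
```

Swapping `ResNet50` for `VGG19` or `InceptionV3` (with the input sizes of Table 1) yields the F2 and F4 vectors in the same way.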

Feature-Vector Size Reduction
The Principal Component Analysis method was used for reducing the size of the various feature-vectors. A trial-and-error method was adopted to decide the number of PCA components. Initially, 20 PCA components were estimated from the individual feature-vectors (i.e., from F1, F2, F3, and F4), and the respective principal components were fused into a single feature-vector. A sub-dataset of 3600 such fused feature-vectors (i.e., 100 samples from each character class) was prepared for the purpose and used to train and test the proposed classifiers. The process was repeated with the sub-dataset by increasing the number of PCA components up to 60 in steps of 10. It was observed that the recognition accuracy improved considerably when increasing the PCA components in the range of 20 to 40, but no significant improvements were noticed in the range of 40 to 60. This made 40 PCA components the optimum choice. In the proposed work, the individual feature-vectors (F1, F2, F3, and F4) were each reduced to a size of 40, and the resultant vectors were marked as W, V, R, and I, respectively.
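A sketch of this reduction with scikit-learn, using hypothetical random feature matrices of the sizes reported earlier:

```python
import numpy as np
from sklearn.decomposition import PCA

n_samples = 3600                      # 100 samples per each of the 36 classes
F1 = np.random.rand(n_samples, 484)   # BDWT features
F3 = np.random.rand(n_samples, 2048)  # e.g., ResNet-50 features

# Reduce each feature-vector to 40 principal components
W = PCA(n_components=40).fit_transform(F1)
R = PCA(n_components=40).fit_transform(F3)
print(W.shape, R.shape)  # (3600, 40) (3600, 40)
```

The same call with F2 and F4 yields V and I, giving four equal-sized 40-dimensional vectors per image.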

Fusion of Features
Several hybrid feature-vectors were developed with all the possible combinations of the reduced feature-vectors W, V, R, and I. Eleven combinations were based on the fusion of two or more feature types, while four combinations involved no fusion (refer to Table 3). The respective hybrid feature-vectors were derived for all the character images of the input dataset; at this stage, 15 new datasets were produced, as summarized in Table 3. For example, D13 is the fusion of W, V, and I (size 120); D14 is the fusion of V, R, and I (size 120); and D15, the hybrid type fusing all four feature types W, V, R, and I, has size 160.

Figure 6. The format of the hybrid feature-vectors; (1) to (4) are feature-vectors without fusion and relate to the new datasets D1 to D4, respectively; (5) to (15) are hybrid feature-vectors with fusion of two or more feature types and belong to the new datasets D5 to D15, respectively.
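The fusion itself is a concatenation of the reduced vectors; a sketch with hypothetical feature matrices, using the dataset labels of Table 3 (D14 and D15 as stated above, and D8 as discussed later in the results):

```python
import numpy as np

n = 18000                                               # images in the input dataset
W, V, R, I = (np.random.rand(n, 40) for _ in range(4))  # reduced feature-vectors

D8  = np.concatenate([V, R], axis=1)        # fusion of two types   -> size 80
D14 = np.concatenate([V, R, I], axis=1)     # fusion of three types -> size 120
D15 = np.concatenate([W, V, R, I], axis=1)  # fusion of all four    -> size 160
print(D8.shape[1], D14.shape[1], D15.shape[1])  # 80 120 160
```

Each row of D15 is the 160-dimensional hybrid feature-vector supplied to the classifiers.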

Classification
The newly created datasets (D1 to D15) were examined with the MLP and SVM classifiers. Both classifiers were trained and tested on each dataset individually. The k-fold cross-validation scheme was adopted for the classification task to obtain generalized results and to avoid the over-fitting problems that might be caused by a moderate-sized dataset. In this scheme, each sample of the dataset is used once as a test sample and k − 1 times as a training sample. The general rule and empirical experience suggest the most preferable values of k to be 5 or 10 [56]; we selected k = 5, i.e., a 5-fold cross-validation scheme, in the present work. In the Results and Discussions section, the mean value of the results received from the five folds is reported by default. The major specifications of both classifiers used in the proposed scheme are summarized in Tables 4 and 5. The number of hidden units was selected by experimenting with values in the range of 30 to 100 for the different input sizes mentioned in Table 4; based on the optimal results obtained, the hidden units were set to 36, 58, 74, and 85 for the respective input sizes.
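The classification stage can be sketched with scikit-learn; hypothetical random data stands in for a fused dataset, and the classifier settings mirror those stated in the text (ReLU activation and the Adam solver for the MLP, a linear kernel for the SVM, 5-fold cross-validation):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(360, 160))    # stand-in for D15 feature-vectors
y = np.repeat(np.arange(36), 10)   # 36 character classes, 10 samples each

mlp = MLPClassifier(hidden_layer_sizes=(85,), activation='relu',
                    solver='adam', max_iter=200)
svm = SVC(kernel='linear', probability=True, tol=0.001)

# 5-fold cross-validation; the mean over the folds is reported
mlp_acc = cross_val_score(mlp, X, y, cv=5).mean()
svm_acc = cross_val_score(svm, X, y, cv=5).mean()
```

On random data the accuracies are near chance level; on the real fused features they correspond to the scores reported in the Results section.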

Performance Metrics
The results were compiled in terms of recognition accuracy, precision, recall, and F1-measure. These terms were determined with the help of the True Positive (TP), True Negative (TN), False Negative (FN), and False Positive (FP) predictions made by the classifiers. The effectiveness of the proposed feature-extraction techniques was visualized with the help of the Kernel Density Estimation (KDE) algorithm.

The mathematical formulations used for the estimation of the mentioned metrics are given in Equations (24)-(27):

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (24)
Precision = TP / (TP + FP)    (25)
Recall = TP / (TP + FN)    (26)
F1-measure = 2 × (Precision × Recall) / (Precision + Recall)    (27)
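These metrics correspond directly to scikit-learn's implementations; a tiny worked example with hypothetical labels:

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2, 2]

acc  = accuracy_score(y_true, y_pred)                     # Eq. (24): 4 of 6 correct
prec = precision_score(y_true, y_pred, average='macro')   # Eq. (25), class-averaged
rec  = recall_score(y_true, y_pred, average='macro')      # Eq. (26), class-averaged
f1   = f1_score(y_true, y_pred, average='macro')          # Eq. (27), class-averaged
print(round(acc, 3))  # 0.667
```

With `average='macro'`, the per-class scores of Equations (25)-(27) are computed first and then averaged, matching the character-wise reporting used in the Results section.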

Results and Discussions
The various results of the proposed scheme are compiled in this section with a detailed discussion on critical observations.

Results
The results of the proposed scheme are summarized in Table 6 in terms of the overall recognition accuracy achieved by the model for all 15 datasets. It can be observed from Table 6 that both classifiers produced the highest recognition accuracy for Dataset 15, which was developed by fusing the features received from Wavelets, VGG-19Net, InceptionV3-Net, and ResNet-50. The best results for the individual hybrid-feature-vector categories (refer to Table 3) are boldfaced in Table 6 for the purpose of critical discussion. Figure 7 shows a comparative analysis of the results achieved by the two classifiers for the 15 datasets, as mentioned in Table 6. The individual character-wise results in terms of precision, recall, and F1 score are compiled in Tables 7 and 9-12. For readability, the values were rounded off to two decimal places. For each character class in Tables 9 and 12, the cell with the highest score was shaded for further analysis.
It has been observed from Tables 7 and 9-12 that the proposed scheme achieved optimum results from Dataset 15 (i.e., D15). The overall recognition accuracy achieved with Dataset 15 was 98.73% and 98.18% for the MLP and SVM classifiers, respectively. The mean values of precision, recall, and F1 score for Dataset 15 were 98.78%, 98.67%, and 98.69% on the MLP classifier, and 98.11%, 98.22%, and 98.25% on the SVM classifier. Dataset 15 has been considered for the various comparative analyses, the generation of confusion matrices, and the visualization of feature separation in the upcoming sections. Figure 8 shows a character-wise comparative analysis of the two classifiers for Dataset 15 in terms of precision, recall, and F1 score.

Table 9. Character-wise F1 score produced by the MLP classifier for the proposed datasets.

Table 12. Character-wise F1 score produced by the SVM classifier for the proposed datasets.

To determine the correct and incorrect predictions for individual character classes, confusion matrices were generated by the two classifiers for Dataset 15; these are shown in Figure 9.

Feature Visualization
The Kernel Density Estimate (KDE) is a useful technique to visualize the distribution of observations in a given dataset. It is equivalent to a histogram, with the difference of using a continuous probability density curve in one or more dimensions. The proposed scheme employed the KDE plot to visualize the separation of the features related to the different character classes. For this purpose, the features were transformed from multiple dimensions to two dimensions with the help of the PCA method. Figure 10a was plotted with the features received directly from the images of the handwritten character dataset, while Figure 10b was plotted with the fusion-based hybrid features of Dataset 15. A clear distinction can be made between the two. In Figure 10a, the separation of the various character classes in the feature space was so poor that some of the classes were completely overlapped by others and were either poorly visible or not visible at all; examples are classes 4 (ङ), 5 (च), 13 (ढ), 14 (ण), 15 (त), and 23 (भ). On the contrary, Figure 10b shows a promising separation between all the character classes, except 28 (व) and 32 (ह). Figure 10 justifies the effectiveness of the features collected from the fusion-based approach.

Estimation of Computational Time
Various timings were estimated in relation to feature extraction and classification for the top-performing dataset (i.e., Dataset 15). The mean timings per character w.r.t. feature extraction and dimensionality reduction are compiled in Table 13. Referring to Table 13, the net time required was 4.27 ms for the construction of a hybrid feature-vector from an input image. The timings related to character classification through a feature-vector are summarized in Table 14.

The following sub-sections cover a detailed discussion on the strength of the features related to the various datasets mentioned in Table 6.
Referring to Table 6, it can be observed that in the individual category (Datasets 1 to 4), the features received from VGG-19Net (Dataset 2) and ResNet-50 (Dataset 3) produced comparatively good results over the rest of the features on the MLP and SVM classifiers, respectively. The features collected from VGG-19Net produced a recognition accuracy of 86.85% on the MLP classifier, and the features received from ResNet-50 produced 82.66% recognition accuracy on the SVM classifier; these scores were the highest in the category.
In the hybrid category consisting of two types of features (Datasets 5 to 10), the combination of features received from VGG-19Net and ResNet-50 (Dataset 8) outperformed the rest of the combinations. This combination produced the best results over both the classifiers in this category; the MLP and SVM classifiers produced 95.37% and 93.92% recognition accuracy, respectively, with Dataset 8.
Dataset 15 has a feature-vector of size 160, which is 84.37% smaller than that of the raw character image (i.e., 32 × 32 = 1024 values). This, in turn, reduced the recognition cost of the classifiers significantly.

Classifier-Wise Discussion
Referring to Table 6, Datasets 11, 12, 13, 14, and 15 responded well to both classifiers, and the results were almost comparable. It can be observed that all the datasets produced slightly higher results on the MLP classifier than on the SVM classifier. This shows the slight upper hand of the non-linear functional approach of the MLP network over the linear separation approach of the SVM in solving the proposed multi-class classification problem.

Character-Wise Discussion
Since Dataset 15 was the best performer, we considered it for the character-wise analysis of the model.
It is interesting to know that both the classifiers responded in an excellent manner to the characters क (Ka), च (Cha), झ (Jha), ठ (Thha), ण (Adna), and फ (Pha). This gives a clear indication of uniqueness in the shape of these characters.
The MLP classifier scored satisfactorily but had a comparatively low F1 score (refer to Table 9) for the character class य (Ya). The same is true for the SVM classifier (refer to Table 12) for the character classes घ (Gha), छ (Chha), द (Da), ध (Dhha), and य (Ya). This might be due to the resemblance of their shapes to those of some other characters in the complete dataset.
For precise result-analysis of the proposed model, the Results section incorporated individual character-class-wise scores of various performance metrics for the proposed datasets and the proposed classifiers (refer to Tables 7 and 9-12). To relate different values of precision, recall, and F1 score with their practical aspect, let us take one example.
Referring to Table 7, for Dataset 15, the precision value of character य is 0.96, i.e., 96%, which shows that 96% of the total predictions made for the character were correct; for the remaining 4% of predictions, other characters were falsely classified as character य (Ya). This can also be verified from the confusion matrix given in Figure 9a: on the "Predicted value" axis, a total of 503 predictions can be observed in the column of character य (class no. 25), out of which 483 were correctly predicted as य; other characters, namely ग (class no. 2), घ (class no. 3), च (class no. 5), ड (class no. 12), थ (class no. 16), प (class no. 20), म (class no. 24), व (class no. 28), and स (class no. 31), were falsely classified as य 3, 4, 1, 1, 6, 2, 1, 1, and 1 times, respectively, i.e., a total of 20 false-positive predictions. It is interesting to note that most of the false predictions came from character थ (class no. 16), whose shape resembles that of the character under test, य (Ya). The higher the precision value, the lower the number of false-positive predictions.
Referring to Table 8, for Dataset 15, the recall value of character य is 0.97, i.e., 97%, which shows that 97% of the total test samples of the character were correctly predicted; for the remaining 3% of samples, the character य was falsely classified as some other character. It can be verified from the confusion matrix given in Figure 9a, on the "Actual value" axis, that a total of 500 samples can be observed in the row of character य (class no. 25), out of which 483 samples were correctly classified as य; some of the samples of character य were falsely classified as ग (class no. 2), घ (class no. 3), ण (class no. 14), थ (class no. 16), प (class no. 20), व (class no. 28), and त्र (class no. 34) 2, 5, 1, 5, 1, 2, and 1 times, respectively, i.e., a total of 17 false-negative predictions. The higher the recall value, the lower the number of false-negative predictions.
Referring to Table 9, for Dataset 15, the F1 score value of character य is 0.96 i.e., 96%, which represents a single score for corresponding precision and recall values (refer to Equation (27)). A higher value of the F1 score ensures lower false-positive and false-negative predictions, which is desirable.
Referring to Tables 9 and 12, the cells with the highest F1 score were highlighted for each character class to identify the most efficient dataset for recognizing the given character. For example, Table 12 shows that Datasets 11, 12, 14, and 15 responded excellently with the SVM classifier in the recognition of character ख.
Similarly, we can analyse the precision, recall, and F1 score values produced by the MLP (Tables 7-9, respectively) and SVM (Tables 10-12, respectively) classifiers for the various character classes associated with the different datasets.
The average values of precision, recall, and F1 score were estimated on the basis of all 15 datasets. The top-10 and bottom-10 entries w.r.t. the F1 score (refer to Tables 9 and 12) are summarized in Tables 15 and 16; the characters are arranged in descending order of their mean F1 score. The list of the bottom-10 characters recognized by the two classifiers is the same. This list contains a number of resembling character shapes, such as घ (Gha)-ध (Dha), ब (Ba)-व (Wa), and थ (Tha)-य (Ya)-स (Sa), which might be the reason for their comparatively low mean scores.
In feature visualization, although the KDE plots in Figure 10 were drawn with only two dimensions instead of the total 160 dimensions of Dataset 15, they successfully visualized the usefulness of the proposed hybrid-features scheme.

Conclusions
The proposed scheme provided a comprehensive analysis of the results obtained from individual and hybrid approaches of feature extraction using Wavelets, VGG-19Net, ResNet-50, and InceptionV3-Net. The features were examined with two classifiers, namely MLP and SVM. The scheme delivers the benefits of deep convolutional neural networks at a low cost of recognition. The feature-vector size was successfully reduced by 84.37% with respect to the character image. The scheme managed to score a maximum of 98.73% recognition accuracy with a mean character-recognition time of 10.65 ms. Table 17 summarizes some excellent work accomplished earlier in the field of handwritten Hindi character recognition; it includes state-of-the-art models related to feature-based classification. It is reasonable to mention here that no standard dataset of handwritten Hindi script is available in the public domain, restricting the comparison of related models on a common platform [3]. However, one can gain some idea of the performance of the proposed work by comparing it with existing methods such as [58]; in the proposed scheme, the classification error was limited to only 1.27%. The presented work will be helpful for the further development of recognition models related to handwritten Hindi characters.
In the future, the proposed scheme could be extended by including more advanced pre-trained deep convolutional networks, such as InceptionV4-Net, ResNeXt, and the Squeeze-and-Excitation network (SE-Net), for feature extraction. Other classification algorithms like KNN, RNN, LSTM, Random Forest, and Naïve Bayes could also be analyzed for the extended version.