An AI Based Approach for Medicinal Plant Identification Using Deep CNN Based on Global Average Pooling

Abstract: Medicinal plants have long been studied because of their importance for preserving human health. However, identifying medicinal plants is time-consuming and tedious, and it requires an experienced specialist. Hence, a vision-based system can support researchers and ordinary people in recognising herb plants quickly and accurately. This study therefore proposes an intelligent vision-based system that identifies herb plants automatically using a Convolutional Neural Network (CNN). The proposed Deep Learning (DL) model consists of a CNN block for feature extraction and a classifier block for classifying the extracted features. The classifier block includes a Global Average Pooling (GAP) layer, a dense layer, a dropout layer, and a softmax layer. The solution was tested on three levels of image definition (64 × 64, 128 × 128, and 256 × 256 pixels) for leaf recognition of five different medicinal plants. The vision-based system achieved more than 99.3% accuracy for all the image definitions. Hence, the proposed method effectively identifies medicinal plants in real time and is capable of replacing traditional methods.


Introduction
Medicinal plants have been used in traditional medicine practices for a long time because of their nutrients and medicinal properties [1]. Due to their bioactive compounds, such as phenolics, carotenoids, anthocyanins, and other bioactive components, they are known for their antioxidant, anti-allergic, anti-inflammatory, and antibacterial properties [2]. Different species of plants are recognized as having medicinal properties; they can be trees, shrubs, or herbs. The distribution of each species depends on the environmental conditions to which it has adapted over time. According to the statistics, about 14-28% of all plants have medicinal uses [3]. Furthermore, about 3-5% of patients in developed countries, over 80% of the rural population in developing countries, and about 85% of the population in the Southern Sahara use medicinal plants to treat their diseases because of their properties [4]. Moreover, in developed countries, some people have turned to traditional medicines prepared from medicinal plants to treat and control illnesses and diseases after considering the harmfulness and side effects of chemical drugs [5,6]. In addition to their medicinal uses, these plants can be used as food, beverages, and even in the form of cosmetics [7,8]. Unfortunately, many counterfeit, low-quality, damaged, or poorly preserved medicinal plants are being manufactured and distributed worldwide, which may harm their users [9].
Botanists have long identified various species of medicinal plants using traditional and experience-based methods. Nevertheless, visually and manually distinguishing medicinal plants from other similar plants can be extremely challenging and time-consuming for inexperienced people [10][11][12].
Plants are generally classified based on their various organs, including roots, flowers, and leaves. The leaves, one of the most important organs of plants, differ largely among species and varieties in color, shape, and texture. However, in some cases, identifying medicinal plants remains challenging because of the apparent similarities of their leaves. Furthermore, leaf color cannot be considered a suitable feature for plant classification because of its similarity across species and, most importantly, its variability during the growing period.
Several real-time vision systems have recently been developed using machine vision applications and computational methods to identify medicinal plants [13][14][15][16]. Deep learning (DL) techniques have largely been applied since they can simultaneously handle feature extraction and image selection. As a result, DL has gained great popularity in various agriculture automation applications over the last few years, where object detection and image classification are required [17][18][19][20].
The Convolutional Neural Network (CNN) is one of the most successful DL methods and has been associated with excellent efficiency in image segmentation and pattern recognition [21]. CNNs use trained layers and have been adopted in several studies for plant identification and classification [22][23][24][25] and plant disease identification [26][27][28][29]. A CNN consists of a hierarchy of self-learned features, in which low-level features such as colors, corners, and edges are learned by the first layers and high-level features such as textures and objects within the image are learned by the deep layers. Automatic feature learning has reduced the sensitivity of CNNs to environmental variables such as light changes. These models combine the learning of extracted features with the classification, which are both crucial steps in the image processing pipeline. Therefore, unlike conventional machine learning algorithms, the manual (hand-crafted) extraction of features is not required.
Nasiri et al. [30] showed that a DL algorithm applied to image recognition in the visible range (400-700 nm) could discriminate grape leaves among 6 different cultivars with 99% accuracy. Paulson and Ravishankar [31] identified 64 types of medicinal plants from digital images using three DL models, CNN, VGG16, and VGG19, achieving accuracy rates of 95.7, 97.8, and 97.6%, respectively. Grinblat et al. [32] used a 3-layer CNN model to identify three different plants based on their leaf vein patterns and attained a recognition accuracy of 92%. Hu et al. [33] developed a Multi-Scale Fusion CNN (MSF-CNN) model to classify plant leaves by integrating multi-scale features with CNNs. MSF-CNNs consist of multiple learning branches operating at different scales. They performed experiments using two public datasets of plant leaves, MalayaKew (MK) and LeafSnap. Their results demonstrated that the proposed MSF-CNN method is more effective in recognizing plant leaves than state-of-the-art methods.
It is still difficult to distinguish medicinal herbs from other plants, even with vision-based systems that have significantly improved their ability to extract complex features and select the most important ones via conventional Machine Learning (ML). Therefore, the main objective of the current study was to develop a real-time automatic vision system for identifying medicinal plants using a proposed DL algorithm coupled with a machine vision technique. More specifically, this research aimed to improve the automatic identification of medicinal plants, as their popularity is growing and they are increasingly being used for artisanal and industrial purposes.

Sampling Protocol
The leaves of five medicinal plants, Lemon Balm (Melissa officinalis L.), Stevia (Stevia rebaudiana Bertoni), Peppermint (Mentha balsamea Willd.), Bael (Aegle marmelos) and Tulsi (Ocimum sanctum L.) were collected in the Northern area of Iran, in the cities of Salmas (38  04 ′ 03 ′′ E). The characteristics of these plants are summarized in Appendix A. One hundred and fifty leaves for each medicinal plant (a total of 750 samples) were used for the investigation. After collection, the leaves were immediately sealed in zipped plastic bags and transferred to the laboratory for the subsequent investigation.
The leaves' images were captured in an imaging box. The imaging box consists of a camera, lighting box, and computer and is equipped with a ring of LED lamps that emit low-intensity infrared light (450 lm) with adjustable light intensity on 3 levels (low, medium, and high). Refer to Azadnia and Kheiralipour [13] for more details on the imaging box. The distance for imaging the leaf samples was set at 250 mm. The images were captured with a smartphone (Galaxy A8, SAMSUNG Corporation, Suwon, Korea, 16 MP camera). The mobile camera settings were adjusted to achieve high-quality images (f/1.2, 9.8 mm). For each leaf sample, an RGB image of 3456 × 4608 pixels was taken and stored as a JPEG file on a personal computer.

Image Pre-Processing
Plant leaf images were pre-processed to remove backgrounds using code written in Python (version 3.10). The images were automatically loaded by the program and processed. The OpenCV (cv2) library, the Python Imaging Library (PIL), and the NumPy library were used to build the image-processing code. The optimal T value (the threshold value for removing the background) was selected and applied to separate the leaves from the backgrounds. This study used Otsu's method to identify the optimal threshold value [34].
The pre-processing operation for background removal was performed via the following steps, while the results of the steps are shown in Figure 1:
Step 1: The acquired images were re-sized.
Step 2: The images were thresholded using Otsu's method.
Step 3: Empty pixels inside the leaf regions were suppressed via a dilation operation (morphological processing).
Step 4: The binary mask made in the previous step was reversed.
Step 5: The reversed mask was replaced by the corresponding pixels of the original plant images.
The speed and accuracy of image processing through the DL network are affected mainly by the pixel dimensions of the images [35]. To be effective, automatic vision systems must be fast and accurate. For this reason, the images were re-sized to three definition levels, 256 × 256, 128 × 128, and 64 × 64 pixels, to enhance the speed and investigate the classification accuracy at the different image definitions. Generally, it is not necessary to pre-process images to train deep learning algorithms. We explored whether pre-processing the images could reduce the computational time and increase the recognition accuracy.
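The background-removal steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' code (which relied on OpenCV and PIL): the function names `otsu_threshold`, `dilate3x3`, and `remove_background`, the 3 × 3 dilation window, the assumption that the leaf is darker than the background, and the synthetic 8 × 8 test image are all illustrative assumptions.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Return the Otsu threshold T of an 8-bit grayscale image.

    T maximizes the between-class variance of the two classes
    {pixels <= T} and {pixels > T}.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)
    w0 = np.cumsum(prob)                  # weight of the class "<= t"
    w1 = 1.0 - w0                         # weight of the class "> t"
    cum_mean = np.cumsum(prob * levels)
    total_mean = cum_mean[-1]
    valid = (w0 > 0) & (w1 > 0)           # avoid division by zero
    between = np.zeros(256)
    between[valid] = (total_mean * w0[valid] - cum_mean[valid]) ** 2 / (
        w0[valid] * w1[valid]
    )
    return int(np.argmax(between))


def dilate3x3(mask: np.ndarray) -> np.ndarray:
    """Binary dilation with a 3 x 3 window (fills one-pixel holes)."""
    h, w = mask.shape
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    for dy in range(3):
        for dx in range(3):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def remove_background(rgb: np.ndarray, gray: np.ndarray):
    """Threshold, dilate, invert the mask, and blank the background."""
    t = otsu_threshold(gray)              # Step 2: Otsu threshold
    leaf = dilate3x3(gray <= t)           # Step 3: assumes a dark leaf
    background = ~leaf                    # Step 4: reversed mask
    cleaned = rgb.copy()
    cleaned[background] = 0               # Step 5: keep only leaf pixels
    return cleaned, t


# Synthetic example: a dark 4 x 4 "leaf" with one bright hole on a bright background.
gray = np.full((8, 8), 200, dtype=np.uint8)
gray[2:6, 2:6] = 50
gray[3, 3] = 200                          # hole inside the leaf
rgb = np.stack([gray] * 3, axis=-1)
cleaned, t = remove_background(rgb, gray)
```

Note that a plain dilation also grows the leaf mask by one pixel at its border; whether that matters depends on the images, and the source does not state the structuring element that was used.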

Data Augmentation (DA)
Data Augmentation (DA) is a technique that effectively increases the amount of data available for network training, with the aim of enhancing the accuracy of models and reducing over-fitting [36]. For the investigation, in addition to the original images, five DA transformations were adopted, as reported in Figure 2: four rotations (45°, 90°, 135°, and 180°) and a color manipulation. From the total number of data (13,500 images), 80% (10,800 images) and 20% (2700 images) were randomly selected for training and testing the proposed network, respectively (Table 1).
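As a small sketch of the random 80/20 split and of the right-angle rotations, the NumPy code below reproduces the 10,800/2700 partition of 13,500 images. The function names, the fixed seed, and the use of `np.rot90` are assumptions made for illustration; the 45° and 135° rotations and the color manipulation require interpolation or color-space operations and are not shown.

```python
import numpy as np


def split_indices(n_samples: int, train_fraction: float = 0.8, seed: int = 0):
    """Randomly split sample indices into train and test subsets."""
    rng = np.random.default_rng(seed)     # fixed seed: an assumption
    order = rng.permutation(n_samples)
    n_train = int(round(n_samples * train_fraction))
    return order[:n_train], order[n_train:]


def rotate_right_angles(image: np.ndarray) -> dict:
    """Return the 90 and 180 degree rotations of an H x W x C image."""
    return {90: np.rot90(image, k=1), 180: np.rot90(image, k=2)}


train_idx, test_idx = split_indices(13_500)
print(len(train_idx), len(test_idx))      # 10800 2700
```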

Architecture of the Proposed CNN Model
A CNN is a DL architecture containing a collection of non-linear transformation functions. These networks are an advanced version of artificial neural networks (ANNs) that can perform image processing operations such as object identification, segmentation, and classification. CNN models are made up of several layers that produce an output from an input. Convolutional, Pooling, and Fully Connected layers are among the most common layers of a CNN implemented to process RGB images. Convolutional layers use the local correlation of information in the images to extract the features: in other words, they are applied to the input data to extract low-level features such as edges, corners and blobs, and generate a feature map by using a set of filters, i.e., the so-called kernels [37]. Pooling layers select features from the preceding feature map by sub-sampling, which keeps the model stable under operations such as rotation and scaling. These two layers are employed to reduce the network parameters and training time and prevent overfitting [38]. The two most common types of pooling layers are maximum and average pooling layers. The Fully Connected layers are applied after the convolutional and pooling layers. These layers include neurons, biases, and weights. In the Fully Connected layers, each neuron is connected to the neurons of the preceding layer to convert the integrated multidimensional features into one-dimensional features for classification and identification operations [39].
Figure 3 summarizes the architecture of the model adopted for image recognition. It consists of 5 convolutional blocks and a classifier block. In the 5 convolutional blocks, the output of each block is the input of the next one. In each convolutional block, 2 convolutional layers are used to extract the important features, including shape, color, and texture. The convolutional layers used 3 × 3 kernels, with Stride and Padding equal to 1 pixel. A ReLU activation function was applied after each convolutional layer. Each convolutional layer is followed by a batch normalization layer, which makes deeper CNNs easier to train and thus reduces the number of training iterations required. After the batch normalization layers, max-pooling layers of 2 × 2 with Stride 2 were utilized to reduce the dimensions of the feature map. Finally, a dropout layer with a rate of 0.1 was used in each block to prevent overfitting. The output of the convolutional blocks is the input of the following classifier block. The architecture of the classifier is described in Figure 4. It contains 4 layers: Global Average Pooling (GAP), dense (ReLU), dropout, and softmax layers.
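To make the effect of the five blocks on the feature-map size concrete, the sketch below traces the spatial dimension through the network for the three input definitions. It assumes the reading of the description above in which each convolution uses a 3 × 3 kernel with stride 1 and padding 1 (so it preserves the spatial size) and each block ends with a 2 × 2 max-pooling of stride 2; the function names are hypothetical, and the number of filters per block is not given in the text, so channels are not tracked.

```python
def conv_out(size: int, kernel: int = 3, stride: int = 1, padding: int = 1) -> int:
    """Output size of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size: int, window: int = 2, stride: int = 2) -> int:
    """Output size of a pooling layer along one spatial dimension."""
    return (size - window) // stride + 1


def trace_spatial_size(input_size: int, n_blocks: int = 5, convs_per_block: int = 2):
    """Return the spatial size after the input and after each block."""
    sizes = [input_size]
    size = input_size
    for _ in range(n_blocks):
        for _ in range(convs_per_block):
            size = conv_out(size)         # 3 x 3, stride 1, padding 1
        size = pool_out(size)             # 2 x 2 max-pooling, stride 2
        sizes.append(size)
    return sizes


for s in (64, 128, 256):
    print(s, trace_spatial_size(s))
# 64 [64, 32, 16, 8, 4, 2]
# 128 [128, 64, 32, 16, 8, 4]
# 256 [256, 128, 64, 32, 16, 8]
```

Under these assumptions the last feature map is 2 × 2, 4 × 4, or 8 × 8 depending on the input definition, which is one reason a pooling step that collapses the spatial dimensions is convenient before the classifier.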
In this study, we adopted the GAP layer instead of a Fully Connected layer, as reported in similar experiments [40]. The main advantage of the GAP layer is that it helps prevent over-fitting by reducing the number of parameters and the complexity of the network computation. The main difference between GAP and fully connected layers is that the output of the flatten layer must be given to the fully connected layer to obtain the required features (Figure 5). However, the output of the GAP layer already has appropriate dimensions, so the fully connected layer is not required. As a result, when fully connected layers are used, many parameters are added to the model, resulting in a more complex model and a higher probability of over-fitting. Conversely, the GAP layer has no parameters to be optimized; therefore, the model has fewer parameters and a low degree of complexity.
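The parameter saving can be illustrated with a short calculation. The sketch below compares the number of trainable parameters of a dense layer fed by a flattened feature map with that of the same dense layer fed by a GAP output. The feature-map size (8 × 8 × 512) and the 128 dense units are hypothetical values chosen for illustration; the source does not report the number of channels of the last convolutional block.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights plus biases of a fully connected (dense) layer."""
    return n_inputs * n_units + n_units


def head_params(height: int, width: int, channels: int, n_units: int, use_gap: bool) -> int:
    """Parameters of the first dense layer after flatten or after GAP."""
    # GAP itself has no trainable parameters; it only averages each channel.
    n_inputs = channels if use_gap else height * width * channels
    return dense_params(n_inputs, n_units)


# Hypothetical final feature map: 8 x 8 x 512, followed by 128 dense units.
flatten_count = head_params(8, 8, 512, 128, use_gap=False)
gap_count = head_params(8, 8, 512, 128, use_gap=True)
print(flatten_count, gap_count)           # 4194432 65664
```

With these illustrative numbers the GAP variant needs about 64 times fewer parameters in the first dense layer, and the count no longer depends on the spatial size of the input image.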
Feature extraction of the GAP layer from the images is summarized in Figure 6. The GAP layer reduces the spatial dimension of a tensor (N × M × D) to a tensor with the dimension of 1 × 1 × D. The ReLU function in the classifier block performs the mathematical operation of Equation (1) on the input data, while the softmax function calculates the normalized probability for each neuron based on Equation (2) [41]:
$$f(x) = \max(0, x) \tag{1}$$
$$\mathrm{softmax}(a_i) = \frac{e^{a_i}}{\sum_{j} e^{a_j}} \tag{2}$$
where $a_i$ is the input of the softmax function for node i (Class i) and the sum runs over all output nodes j.
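As a minimal numerical illustration of the operations described above, the NumPy sketch below applies global average pooling to a small N × M × D tensor and evaluates the standard ReLU and softmax functions. The function names and the toy inputs are illustrative assumptions, and the max-subtraction inside `softmax` is a common numerical-stability device rather than something stated in the source.

```python
import numpy as np


def global_average_pool(x: np.ndarray) -> np.ndarray:
    """Reduce an (N, M, D) tensor to (1, 1, D) by averaging each channel."""
    return x.mean(axis=(0, 1), keepdims=True)


def relu(x: np.ndarray) -> np.ndarray:
    """Element-wise max(0, x)."""
    return np.maximum(0.0, x)


def softmax(a: np.ndarray) -> np.ndarray:
    """Normalized class probabilities exp(a_i) / sum_j exp(a_j)."""
    shifted = a - np.max(a)               # does not change the result
    e = np.exp(shifted)
    return e / e.sum()


features = np.arange(12, dtype=float).reshape(2, 2, 3)   # toy 2 x 2 x 3 tensor
pooled = global_average_pool(features)
print(pooled.shape, pooled.ravel())       # (1, 1, 3) [4.5 5.5 6.5]

probs = softmax(np.array([1.0, 2.0, 3.0]))
print(probs.sum(), probs.argmax())
```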
Figure 7 summarizes the classification model adopted to classify the leaf images. It includes a GAP layer, 3 dense layers with 64, 128, and 256 neurons, a dropout layer, and a softmax classifier. Finally, the classifier block was tested with the three different dense layers to select the model with the best performance.


Performance Metrics
A confusion matrix was utilized to evaluate the predictive performance of the model on the test data after the classification. The confusion matrix is a table layout that allows for visualizing the performance of a supervised algorithm. Thus, the performance metrics, including accuracy, precision, sensitivity, specificity, and Area-Under-the-Curve (AUC), were measured based on the confusion matrix elements.
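The following sketch shows how the per-class accuracy, precision, sensitivity, and specificity can be derived from a multi-class confusion matrix using the usual one-vs-rest definitions. The 3 × 3 matrix is invented for illustration and is not the study's data; the convention that rows are true classes and columns are predicted classes is an assumption, and AUC is left out of this sketch.

```python
import numpy as np


def per_class_metrics(cm) -> dict:
    """One-vs-rest metrics; rows = true classes, columns = predictions."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp              # missed samples of each class
    fp = cm.sum(axis=0) - tp              # samples wrongly assigned to it
    tn = total - tp - fn - fp
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def overall_accuracy(cm) -> float:
    """Fraction of all samples on the diagonal."""
    cm = np.asarray(cm, dtype=float)
    return float(np.trace(cm) / cm.sum())


# Hypothetical confusion matrix for three classes (not the study's data).
example_cm = [[10, 0, 0],
              [1, 8, 1],
              [0, 2, 8]]
metrics = per_class_metrics(example_cm)
print(overall_accuracy(example_cm))
print({name: values.round(3) for name, values in metrics.items()})
```

Because the per-class accuracy also counts true negatives, its average over classes can be higher than the overall accuracy computed from the diagonal alone.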

Results
The most common way to visually assess the model's performance is to extract the features using a visual response filter [42]. The proposed CNN model extracts the low-level features such as edges, blobs and corners from the images by the initial layers, while high-level features such as textures and objects are identified by the deep layers. Figure 8 summarizes how the proposed CNN model extracts low-level features from the images.
The result of the feature extraction and the subsequent classification of the medicinal plant images by the CNN adopted for the study is presented in the confusion matrices reported in Figure 9. The proposed CNN model correctly classified the images of the medicinal plants with overall accuracy rates of 99.66, 99.32, and 99.45% when the input images were 64 × 64, 128 × 128, and 256 × 256 pixels, respectively. According to these results, the 64 × 64 pixel images achieved the best performance, with an overall accuracy of 99.66%. In the confusion matrix for 64 × 64 pixels, the Peppermint class had one misclassified image, which was confused with the Tulsi class. The Bael and Stevia classes had three misclassified images. The Tulsi class had two misclassified images, which were confused with the Peppermint and Stevia classes. This could be attributed to the similarities in shape features among them. In particular, the Lemon Balm leaves were classified correctly by the CNN model at all three image sizes, without any confusion (100% accuracy). However, the high similarity among the leaves of the medicinal plants caused the proposed model to misclassify several images of the other 4 species involved in the study. Therefore, although the medicinal plant images were similar, the model predicted nearly all classifications correctly.
The incorrect adjustment of parameters, such as the learning rate, causes the model to over-fit and fail to reach convergence.
Table 2 summarizes the accuracy, precision, sensitivity, specificity, and AUC for the three image sizes, computed from the confusion matrices of the proposed CNN. All of them are at 100% for the Lemon Balm images. The highest average per-class accuracy (99.8%) was achieved for the image size of 64 × 64 pixels. Moreover, the average accuracy for both the 128 × 128 and 256 × 256 pixel images was 99.7%. The results also indicate that the highest and lowest precision values were obtained for the image sizes of 64 × 64 and 128 × 128 pixels, with 99.6 and 99.2%, respectively. According to Table 2, the highest average per-class AUC was obtained for images of 64 × 64 pixels, with a value of 99.7%. The values of this parameter for images with the sizes of 128 × 128 and 256 × 256 pixels were slightly lower at 99.5 and 99.6%, respectively. However, the specificity values were 99.8% for the three sizes of images.

Discussion
According to the results, the proposed CNN algorithm has been proven to be effective in identifying medicinal plants with similar leaves compared to models developed by previous studies for a similar purpose.
Amuthalingeswaran et al. [44] implemented a DL model to identify medicinal plants. Their model was trained on 800 images of 4 types of medicinal plants and could identify medicinal plants in the field with 85% accuracy. Anubha Pearline et al. [45] identified different plants using DL methods and conventional ML algorithms. With the conventional learning methods, plant classification was performed by extracting texture, shape, and color features using the LBP and Haralick algorithms, the Hu moments, and various color channels, respectively. The extracted features were then classified by Linear Discriminant Analysis (LDA), Logistic Regression (LR), K-Nearest Neighbor (KNN), Classification and Regression Tree (CART), and Random Forest (RF). The best identification performance, 82.38%, was obtained with the RF algorithm. Furthermore, the VGG16 network identified plants with 97.14% accuracy. Zhu et al. [46] proposed a two-way attention model based on the DL network to identify plants from the Flower 102 dataset. The recognition accuracy of their model was 97.2%. Muneer and Fati [47] recognized Malaysian herbs using an automated classification system. They extracted the shape and texture features of plant images and classified them using a DL model. The accuracy, precision, and sensitivity achieved were 98, 93, and 85%, respectively. Moreover, Reddy et al. [48] proposed an optimized CNN model consisting of four convolutional layers, followed by two fully connected layers and a softmax layer. Their suggested CNN model was focused on the color images of leaves. The accuracy, precision, and sensitivity obtained were 97.6, 93.4, and 95.2%, respectively. Finally, Zhang et al. [49] proposed a 7-layer CNN approach to classify 32 plant species. They employed the DA technique to enlarge the image dataset, achieving a 94.7% accuracy rate in plant identification.
The results of the study highlight that, unlike conventional ML methods, the proposed DL model can automatically extract important features from medicinal plant leaves and classify them with an accuracy above 99%. Hence, there is no need to extract the features manually. Furthermore, experimental results indicated that our approach outperformed previous studies for identifying medicinal plants in terms of precision, sensitivity, specificity, and AUC, with values of 99.6, 99.6, 99.8, and 99.7%, respectively.
Nonetheless, using primary CNN models has been associated with challenges, such as high computational cost, complexity, and long running times. To overcome these problems, we adopted two strategies: (1) increasing the number of images and reducing their sizes during image processing operations and (2) replacing the fully connected layer with the GAP layer, which significantly decreased the number of parameters and the model complexity. These solutions increased the model's accuracy and computational speed and prevented the occurrence of over-fitting. In addition, it is noteworthy that the prediction time for the collected medicinal plant leaves database using the proposed model was 40.81 s, which is lower than the time, 44.1 s, for the same operation reported in the study conducted by Roopashree and Anitha [50].
According to the results we obtained from the study, the method we adopted can be used to design an automated vision-based recognition system to identify various herb plants with high accuracy and speed. Such an application can improve the public's interest in and use of medicinal herbs, supporting the demand for healthier and safer food. Furthermore, the model developed by the study will be tested in future studies, with possible adjustments, to identify less common medicinal plants.

Conclusions
Identifying medicinal plants separately from other non-edible plants is essential in botany and the food industries. The traditional methods of identifying medicinal plants are time-consuming, complex, and require experienced and trained people. The automatic real-time vision-based system developed to identify broadly used medicinal herbs with similar leaves has given positive results. The proposed method includes an improved CNN consisting of convolutional and classifier blocks. The classifier had a Global Average Pooling (GAP) layer, a dense layer, a dropout layer, and a softmax layer. Compared with previous studies' results, this solution reduces the number of parameters and increases the speed and accuracy of the model. The proposed CNN model identified the medicinal plant images at three levels of image definition, 64 × 64, 128 × 128, and 256 × 256 pixels, with overall accuracy rates of 99.66, 99.32, and 99.45%, respectively. Therefore, combining image processing and the proposed CNN algorithm is an efficient alternative to traditional methods. Future work will be carried out to improve the model's performance in the classification of further species of medicinal plants and to confirm the effectiveness of the solution we developed on a larger scale. The model will be adopted to develop an intelligent mobile-based application for the real-time identification of medicinal plants.
The current study aimed to improve the automatic identification of medicinal plants due to their growing popularity and increased requests for artisanal and industrial uses and applications.Therefore, the proposed DL algorithm and image processing technique can have a special place in plant science and even industrial markets for identifying and classifying varied medicinal plants separately from other non-edible plants.

Mentha balsamea Willd. Peppermint
The Mentha balsamea plant is a multipurpose herb plant from the Lamiaceae family. This plant is used in Iran as an antiviral, invigorating, stimulant and antifungal agent [56]. The extracted oil from Mentha balsamea is also used as a medicine to treat cancer, colds, sore throats, nausea, toothaches and muscle soreness [57].
Aegle marmelos is a fruit with purely medicinal properties. The leaves of this fruit are feverish and expectorant and are used for bleeding, diarrhea and intestinal disorders. It is also used to treat urinary problems, regulate heart rate and treat stomach ache [58,59].

Figure 1. Image pre-processing steps to remove the background from an original image (see text for 1-5 descriptions).

Figure 2. The DA method using image rotation and color manipulation.

Figure 3. The architecture of the CNN model for image recognition.

Figure 4. A block diagram of the proposed CNN model.

Figure 5. Differences in the performance of GAP vs. fully connected layers in imaging classification.

Figure 7. A summary of the proposed model for the identification of medicinal plant species.

Figure 8. The first activation layer of each channel related to the test images.

Figure 10 indicates the CNN model's prediction accuracy and loss function for the train and test leaf images with three different image sizes. The accuracy and loss values are obtained for each epoch. In each epoch, the model passes through the whole training set and updates its weights. The loss values indicate how well the proposed model performs after each optimization iteration. In other words, the loss sums the errors made on the images and is minimized to find the model's best weights. The downward trend of the loss in the training and testing diagrams shows that the proposed model has excellent performance in classifying the medicinal plants' images. Convergence of the proposed CNN model was obtained after 100 epochs; in other similar models, with hundreds of classes and millions of images, it could be close to 1000 epochs [43].
to the Lamiaceae family. It grows mostly in warm regions and is used as an antiseptic, antiemetic, anti-flatulence, sedative, treatment of gastrointestinal disorders, rheumatism and skin disorders [60,61].

Table 1. Data splitting for training and testing in the proposed CNN.

Table 2. The performance results of medicinal plant classification obtained from the proposed CNN model.