The Classification of Medicinal Plant Leaves Based on Multispectral and Texture Feature Using Machine Learning Approach

Abstract: This study proposes a machine learning based classification of medicinal plant leaves. A dataset covering six varieties of medicinal plant leaves was collected from the Department of Agriculture, The Islamia University of Bahawalpur, Pakistan. These plants are commonly named in English as (herbal) Tulsi, Peppermint, Bael, Lemon balm, Catnip, and Stevia, and scientifically named in Latin as Ocimum sanctum, Mentha balsamea, Aegle marmelos, Melissa officinalis, Nepeta cataria, and Stevia rebaudiana, respectively. The multispectral and digital image datasets were collected via a computer vision laboratory setup. In the preprocessing step, we first crop the leaf region and transform it into gray-level format. Secondly, we perform seed intensity-based edge/line detection using a Sobel filter and draw five regions of observation. A fused dataset of 65 features is extracted, combining texture, run-length matrix, and multispectral features. For feature optimization, we employ a chi-square feature selection approach and select 14 optimized features. Finally, five machine learning classifiers, namely multi-layer perceptron, LogitBoost, bagging, random forest, and simple logistic, are deployed on the optimized medicinal plant leaves dataset, and the multi-layer perceptron classifier shows the highest accuracy among them, 99.01%. The per-class accuracies of the multi-layer perceptron classifier on the six medicinal plant leaves are 99.10% for Tulsi, 99.80% for Peppermint, 98.40% for Bael, 99.90% for Lemon balm, 98.40% for Catnip, and 99.20% for Stevia.


Introduction
Living things on earth depend on the oxygen produced by plants. There are many different types of plants, all of which play an important role in maintaining the earth's biodiversity by providing air and water to humans [1]. Medicinal plants are plants used in the treatment and prevention of certain diseases and conditions that affect humans [2]. There are many different types of herbal remedies, and they can vary from place to place while showing a similar pattern of "size" and "shape" [3]. These plants have excellent medicinal properties from roots to leaves. The leaves of some herbs such as Karpooravalli (Coleus amboinicus), Podina (Mentha arvensis), Neem (Azadirachta indica), Thudhuvalai [...] images and extracted 57 multi-features datasets, then optimized these features into 15 features via the ML approach. For classification, they employed various classifiers and observed that MLP gave the highest performance, 98.14%, on a region of interest of size (256 × 256).

Contribution
The main aim of this study is to propose a framework for the classification of medicinal plant leaves based on multispectral and texture features using an ML approach. This study contains six steps, which are given below:
• Collect the multispectral and digital image datasets via a computer vision laboratory setup.
• Crop exactly the leaf region and transform it into gray-level format at (800 × 800) resolution.
• Employ seed intensity-based edge/line detection using a Sobel filter.
• Draw 5 regions of observation on each image and extract fused features from the dataset.
• Optimize the fused features dataset using a chi-square feature selection approach.
• Apply machine learning based classifiers to classify the medicinal plant leaves.

Materials and Methods
We recall that the medicinal plant leaves were collected from the Department of Agriculture, The Islamia University of Bahawalpur, Pakistan, located at 29°23'44" N and 71°41'1" E [21]. The dataset holds six types of medicinal plant leaves, commonly named in English as (herbal) Tulsi, Peppermint, Bael, Lemon balm, Catnip, and Stevia, and scientifically named in Latin as Ocimum sanctum, Mentha balsamea, Aegle marmelos, Melissa officinalis, Nepeta cataria, and Stevia rebaudiana, respectively. These plants are shown in Figure 1.
Agronomy 2020, 10, x FOR PEER REVIEW
In the experimentation, we used 50 fresh leaves of medicinal plants for each variety. First, 100 digital images (50 front and 50 back) were taken for each variety, so a digital image dataset of 100 × 6 = 600 color images with pixel dimensions 1280 × 1024, where the size of an individual pixel is 0.26 mm, was developed for the further experiments. The multispectral dataset was collected via a multispectral radiometer (MSR5) at a height of 4 feet. It records five bands in the range of 460 nm to 1560 nm, known as "Red" (R), "Blue" (B), "Green" (G), "Near Infrared" (NIR), and "Shortwave Infrared" (SWIR). In this regard, a total of 1200 samples were acquired, 600 digital image samples and 600 multispectral samples, for the six types of medicinal plant leaves using the computer vision laboratory setup explained in Figure 2.
The collected dataset is noise free owing to the computer vision laboratory setup. For image pre-processing, the medicinal leaves digital image dataset was examined in OpenCV, a computer vision library [22]. The Sobel filter was then employed for edge/line detection, as shown in Figure 3.
After that, we crop exactly the leaf region at (800 × 800) resolution, transform all the digital color images into 8-bit gray-level format, and draw five regions of observation (ROOs) on each sample image. The ROOs are taken in two steps: in the first step the ROO size is (220 × 220) and in the second step it is (280 × 280), which yields separate datasets for experimentation. A total of 1000 (5 × 200) ROOs were generated for each medicinal plant leaf variety, as represented in Figure 4. In this manner, a total of 6000 (6 × 1000) ROOs were generated across the 6 varieties of medicinal plant leaves.
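The cropping of five ROOs from an 800 × 800 gray-level image can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the text does not say where the five regions are placed, so the layout used here (one central region plus one centred in each quadrant, which overlap for these sizes) and the names `roo_boxes` and `crop` are assumptions made for the example. The last lines only restate the dataset arithmetic given in the text.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


def roo_boxes(image_size: int = 800, roo_size: int = 280) -> List[Box]:
    """Return five square ROO boxes: one central, one centred in each quadrant.

    ASSUMPTION: this layout is hypothetical; the paper does not state where
    the five regions of observation are drawn on the leaf.
    """
    if roo_size > image_size:
        raise ValueError("ROO must fit inside the image")
    half = roo_size // 2
    q1, mid, q3 = image_size // 4, image_size // 2, 3 * image_size // 4
    centres = [(mid, mid), (q1, q1), (q1, q3), (q3, q1), (q3, q3)]
    boxes = []
    for cy, cx in centres:
        # Clamp the corner so the whole box stays inside the image.
        top = min(max(cy - half, 0), image_size - roo_size)
        left = min(max(cx - half, 0), image_size - roo_size)
        boxes.append((top, left, top + roo_size, left + roo_size))
    return boxes


def crop(image: List[List[int]], box: Box) -> List[List[int]]:
    """Cut one ROO out of a gray-level image stored as a list of rows."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


# Dataset arithmetic as stated in the text.
ROOS_PER_SAMPLE = 5
SAMPLES_PER_VARIETY = 200   # the "200" in "1000 (5 x 200)"
VARIETIES = 6
roos_per_variety = ROOS_PER_SAMPLE * SAMPLES_PER_VARIETY  # 1000
total_roos = roos_per_variety * VARIETIES                 # 6000
```

Calling `roo_boxes(800, 220)` and `roo_boxes(800, 280)` gives the two ROO sizes used in the two experimental steps.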

Proposed Methodology
The proposed methodology for medicinal plant leaves classification is described below. In the first step, all the images acquired in the dataset were examined in OpenCV, a computer vision software library [22]. Then, the Sobel filter was employed for edge/line detection. This process is based on the seed intensity (pixel threshold value) of the connected pixels of an image; if the value is greater than six, the region is marked as a region of observation (ROO). The graphical representation of the proposed methodology for the classification of medicinal plant leaves based on fused features using machine learning techniques is shown in Figure 5.
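The edge/line detection step can be illustrated with a small, dependency-free Sobel sketch. It is not the authors' OpenCV code: it applies the standard 3 × 3 Sobel kernels and then thresholds the gradient magnitude. Applying the threshold of six to the gradient magnitude is an assumption made for illustration, since the text does not specify exactly which quantity is compared with the threshold; the function names are likewise hypothetical.

```python
import math
from typing import List

Image = List[List[float]]

# Standard 3 x 3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(gray: Image) -> Image:
    """Gradient magnitude of a gray-level image; border pixels are left at 0."""
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0.0
            for dy in range(3):
                for dx in range(3):
                    pixel = gray[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * pixel
                    gy += SOBEL_Y[dy][dx] * pixel
            out[y][x] = math.sqrt(gx * gx + gy * gy)
    return out


def edge_mask(gray: Image, threshold: float = 6.0) -> List[List[bool]]:
    """Mark pixels whose gradient magnitude exceeds the threshold.

    ASSUMPTION: thresholding the magnitude at 6 is an illustrative reading
    of the 'greater than six' rule in the text.
    """
    return [[m > threshold for m in row] for row in sobel_magnitude(gray)]
```

On an image with a vertical step from 0 to 10, the mask is true only along the two columns adjacent to the step and false in the flat areas.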

Fused Features Extraction
OpenCV, a computer vision software library [22], was used for the fused feature extraction process, which covers texture, spectral, and gray-level run-length matrix features. A total of 65 fused features were extracted from each ROO, grouped as 40 texture features, 5 multispectral features, and 20 run-length matrix features, as described below. The extracted dataset has a large feature vector space (FVS) of size 390,000 (6000 × 65) for the classification of medicinal plant leaf varieties.


Texture Feature
The texture features are based on the gray-level (GL) co-occurrence matrix [23][24][25], which is calculated along 4 directions (0, 45, 90, and 135 degrees) and over the distance between seeds. In this study, we used 5 average features known as energy (ξ), inertia (τ), entropy (ψ), inverse difference (IDE), and correlation (ϕ). Each of the five is defined over the co-occurrence matrix, where u and v are the spatial coordinates and ρ_uv is the gray-level value: first the energy, then the correlation, the entropy, the IDE, and finally the inertia.
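As an illustration of these five descriptors, the following sketch builds a normalised co-occurrence matrix for each of the four directions and averages the features over them. The formulas in the comments are the common textbook forms and are an assumption here, since the paper's own normalisation (in particular for the inverse difference) may differ; the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def glcm(image: List[List[int]], levels: int, offset: Tuple[int, int]) -> Matrix:
    """Normalised, symmetric co-occurrence matrix for one offset (dy, dx)."""
    dy, dx = offset
    h, w = len(image), len(image[0])
    counts = [[0.0] * levels for _ in range(levels)]
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                a, b = image[y][x], image[y2][x2]
                counts[a][b] += 1
                counts[b][a] += 1  # count each pair in both orders
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("offset leaves no pixel pairs inside the image")
    return [[c / total for c in row] for row in counts]


def texture_features(p: Matrix) -> Dict[str, float]:
    """Energy, entropy, inertia, inverse difference, correlation (assumed forms)."""
    idx = range(len(p))
    cells = [(u, v, p[u][v]) for u in idx for v in idx]
    mu_u = sum(u * q for u, _, q in cells)
    mu_v = sum(v * q for _, v, q in cells)
    var_u = sum((u - mu_u) ** 2 * q for u, _, q in cells)
    var_v = sum((v - mu_v) ** 2 * q for _, v, q in cells)
    spread = math.sqrt(var_u * var_v)
    covariance = sum((u - mu_u) * (v - mu_v) * q for u, v, q in cells)
    return {
        "energy": sum(q * q for _, _, q in cells),                    # sum p^2
        "entropy": -sum(q * math.log2(q) for _, _, q in cells if q > 0),
        "inertia": sum((u - v) ** 2 * q for u, v, q in cells),        # contrast
        "inverse_difference": sum(q / (1 + abs(u - v)) for u, v, q in cells),
        "correlation": covariance / spread if spread > 0 else 0.0,
    }


def averaged_texture_features(image: List[List[int]], levels: int,
                              distance: int = 1) -> Dict[str, float]:
    """Average the five features over the 0, 45, 90 and 135 degree directions."""
    d = distance
    offsets = [(0, d), (-d, d), (-d, 0), (-d, -d)]
    per_direction = [texture_features(glcm(image, levels, o)) for o in offsets]
    return {name: sum(f[name] for f in per_direction) / len(per_direction)
            for name in per_direction[0]}
```

For the two-row image `[[0, 0], [1, 1]]` and the horizontal offset, the matrix is diagonal, so the inertia is 0 and the correlation is 1; a checkerboard gives the opposite extreme.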

Spectral Features
The frequency domain features, known as spectral features, are used in texture analysis. These features are calculated as the power of different areas (A), also known as rings [26], of the frequency-domain representation η(u, v) of the image.
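For orientation, ring power is commonly written as the sum of squared magnitudes of the frequency-domain coefficients inside a ring. The expression below is that standard form, given as an assumption rather than as the paper's exact equation; the ring radii r_1 and r_2 are symbols introduced here for illustration.

```latex
P(A) = \sum_{(u,v) \in A} \left| \eta(u,v) \right|^{2},
\qquad
A = \left\{ (u,v) : r_1 \le \sqrt{u^{2} + v^{2}} < r_2 \right\}
```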

Gray Level Run-Length Matrix (GLRLM)
Galloway [27,28] introduced the Gray Level Run Length Matrix (GLRLM). A gray-level run, also known as a run length, can be described as a linear set of consecutive pixels with the same gray level in a particular direction. The basics of this approach are recalled below. Let η_p be the number of seeds in the image, η_r be the number of discrete run lengths in the image, ψ(v_1, v_2|θ) be the run-length matrix for an arbitrary direction θ, η_r(θ) be the number of runs in the image along angle θ, and η_g be the number of discrete intensity values in the image. The features defined on this matrix are the short run emphasis, long run emphasis, gray level non-uniformity, run percentage, low gray level run emphasis, high gray level run emphasis, short run low gray level emphasis, long run low gray level emphasis, and run length variance.
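For reference, the run-length features named above are commonly written as follows, using the symbols just introduced, with v_1 indexing gray levels and v_2 indexing run lengths. These are the widely used forms and are given here as an assumption; the normalisation in the paper's own equations may differ. The abbreviations and the auxiliary quantities p and μ are introduced only for this block.

```latex
\begin{align*}
\mathrm{SRE}   &= \frac{1}{\eta_r(\theta)} \sum_{v_1=1}^{\eta_g} \sum_{v_2=1}^{\eta_r} \frac{\psi(v_1, v_2 \mid \theta)}{v_2^{2}} &
\mathrm{LRE}   &= \frac{1}{\eta_r(\theta)} \sum_{v_1=1}^{\eta_g} \sum_{v_2=1}^{\eta_r} \psi(v_1, v_2 \mid \theta)\, v_2^{2} \\
\mathrm{GLN}   &= \frac{1}{\eta_r(\theta)} \sum_{v_1=1}^{\eta_g} \Bigl( \sum_{v_2=1}^{\eta_r} \psi(v_1, v_2 \mid \theta) \Bigr)^{2} &
\mathrm{RP}    &= \frac{\eta_r(\theta)}{\eta_p} \\
\mathrm{LGRE}  &= \frac{1}{\eta_r(\theta)} \sum_{v_1=1}^{\eta_g} \sum_{v_2=1}^{\eta_r} \frac{\psi(v_1, v_2 \mid \theta)}{v_1^{2}} &
\mathrm{HGRE}  &= \frac{1}{\eta_r(\theta)} \sum_{v_1=1}^{\eta_g} \sum_{v_2=1}^{\eta_r} \psi(v_1, v_2 \mid \theta)\, v_1^{2} \\
\mathrm{SRLGE} &= \frac{1}{\eta_r(\theta)} \sum_{v_1=1}^{\eta_g} \sum_{v_2=1}^{\eta_r} \frac{\psi(v_1, v_2 \mid \theta)}{v_1^{2}\, v_2^{2}} &
\mathrm{LRLGE} &= \frac{1}{\eta_r(\theta)} \sum_{v_1=1}^{\eta_g} \sum_{v_2=1}^{\eta_r} \frac{\psi(v_1, v_2 \mid \theta)\, v_2^{2}}{v_1^{2}} \\
\mathrm{RV}    &= \sum_{v_1=1}^{\eta_g} \sum_{v_2=1}^{\eta_r} p(v_1, v_2 \mid \theta)\, (v_2 - \mu)^{2}, &
p &= \frac{\psi}{\eta_r(\theta)}, \quad
\mu = \sum_{v_1=1}^{\eta_g} \sum_{v_2=1}^{\eta_r} p(v_1, v_2 \mid \theta)\, v_2
\end{align*}
```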

Feature Selection
The feature selection (FS) process is the most important part of ML based classification. It aims to keep the most relevant features and to remove the extra features that carry no importance for the classification [29]. In this study, a total of 65 fused features were extracted from each ROO, giving a large FVS of size 390,000 (6000 × 65) for the classification of medicinal plant leaf varieties, which takes too much time to classify. Feature selection should identify the minimum number of columns/features from the data source that are significant in building a model [30]. Our goal in this research is to achieve better accuracy in less time. Without feature selection, the Multi-Layer Perceptron (MLP) classifier gives 98.81% accuracy when the ROO size is 280 × 280, but it takes a long time (4.83 seconds) because of the large number of features. With the selected features, we obtain higher accuracy in less time. There are many ML based feature selection approaches; for example, the PCA technique provides excellent results on linearly separable datasets and is also used for feature selection [31]. PCA is an unsupervised approach [32], however, whereas the medicinal plant leaves varieties dataset is labeled, and the PCA results were not as promising on the labeled data. To solve this problem, an ML based supervised feature selection technique, namely the chi-square attribute evaluator with the ranker search method [33], was used to select optimized features from the large FVS. This approach performed better than PCA and was able to obtain the sub-dataset with the optimal characteristics for this large dataset. The chi-square attribute evaluator with the ranker search method is used in ML to test the independence of two discrete properties [34]. In FS, we specifically check whether the presence of a particular term and the presence of a particular class are independent.
Formally, given a document η, we estimate the following quantity for each term i and class j and rank the terms according to their scores:

$$\chi^2 = \sum_{\gamma \in \{0,1\}} \sum_{l \in \{0,1\}} \frac{\left(N_{\gamma l} - E_{\gamma l}\right)^2}{E_{\gamma l}}$$

where N is the observed frequency and E is the expected frequency; γ takes the value 1 if the document contains the term i and 0 otherwise, and l takes the value 1 if the document is in class j and 0 otherwise. The chi-square feature selection technique deployed on the medicinal plant leaves dataset reduces the FVS and gives 14 optimized features, with an FVS size of 84,000 (6000 × 14), for medicinal plant leaves classification. Figure 6 shows the three-dimensional (3D) representation of the optimized features dataset within six classes using PCA. MDF1, MDF2, and MDF3 are three different dimensions (like x, y, z) of the most discriminant features. The fused optimized features for the classification of the medicinal plant leaves dataset are shown in Table 1.
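The ranking idea can be sketched in a few lines of plain Python: each feature is discretised, its chi-square statistic against the class label is computed from the contingency table of observed and expected counts, and the features are sorted by score. This is a simplified illustration, not the evaluator used in the study; the equal-width binning and the names `discretise`, `chi_square_score`, and `rank_features` are assumptions made for the example.

```python
from typing import Dict, List, Sequence


def discretise(values: Sequence[float], bins: int = 2) -> List[int]:
    """Equal-width binning (ASSUMPTION: a simplification for illustration)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def chi_square_score(feature_bins: Sequence[int], labels: Sequence[int]) -> float:
    """Sum of (N - E)^2 / E over the feature-bin by class contingency table."""
    n = len(labels)
    observed: Dict[tuple, int] = {}
    bin_totals: Dict[int, int] = {}
    class_totals: Dict[int, int] = {}
    for b, c in zip(feature_bins, labels):
        observed[(b, c)] = observed.get((b, c), 0) + 1
        bin_totals[b] = bin_totals.get(b, 0) + 1
        class_totals[c] = class_totals.get(c, 0) + 1
    score = 0.0
    for b, bin_count in bin_totals.items():
        for c, class_count in class_totals.items():
            expected = bin_count * class_count / n  # count under independence
            score += (observed.get((b, c), 0) - expected) ** 2 / expected
    return score


def rank_features(columns: Dict[str, Sequence[float]], labels: Sequence[int],
                  top_k: int, bins: int = 2) -> List[str]:
    """Return the names of the top_k features with the highest chi-square score."""
    scores = {name: chi_square_score(discretise(col, bins), labels)
              for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

With `top_k=14` and the 65 fused feature columns as input, this would reduce a 6000 × 65 table to 6000 × 14, matching the sizes quoted in the text.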

Classification
Five machine learning classifiers, namely multi-layer perceptron (MLP), LogitBoost (LB), Bagging (B), Random Forest (RF), and Simple Logistic (SL), are deployed on the medicinal plant leaves dataset. It is observed that the MLP performed well compared with the other implemented classifiers [35]. The mathematical foundations of the MLP are given below. The products of the inputs and weights are summed together with the bias by the summation function (δ_n), defined by

$$\delta_n = \sum_{i=1}^{k} \eta_{ij} I_i + \theta_i$$

where I_i is the input variable i, k is the number of inputs, η_ij is the weight, and θ_i is the bias term. The activation function of the MLP is then applied to δ_n to obtain the output of neuron j. The MLP framework for medicinal plant leaves classification, with all regulation parameters, is shown in Figure 7. The deployed MLP classifier with all parameters is defined in Table 2. It depends on threshold point values which are selected manually. In this study, the experiments were performed with different values and, after a large number of tests, the selected values gave fully satisfactory results. If we increase or decrease these values, the accuracy is degraded.
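The weighted sum, activation, and neuron output described above can be sketched as a small forward pass. The logistic sigmoid used here is an assumption, since the activation formula itself is not recoverable from the text; the function names and the one-hidden-layer shape are likewise chosen only for illustration and do not reproduce the parameters of Table 2.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Logistic activation (ASSUMPTION), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def neuron_output(inputs: Sequence[float], weights: Sequence[float],
                  bias: float) -> float:
    """Weighted sum of the inputs plus bias, passed through the activation."""
    delta = sum(w * i for w, i in zip(weights, inputs)) + bias
    return sigmoid(delta)


def layer_forward(inputs: Sequence[float], weight_rows: Sequence[Sequence[float]],
                  biases: Sequence[float]) -> List[float]:
    """Outputs of all neurons in one layer; one weight row per neuron."""
    return [neuron_output(inputs, w, b) for w, b in zip(weight_rows, biases)]


def mlp_predict(features: Sequence[float],
                hidden_w: Sequence[Sequence[float]], hidden_b: Sequence[float],
                out_w: Sequence[Sequence[float]], out_b: Sequence[float]) -> int:
    """Index of the most activated output neuron for one feature vector."""
    hidden = layer_forward(features, hidden_w, hidden_b)
    outputs = layer_forward(hidden, out_w, out_b)
    return max(range(len(outputs)), key=outputs.__getitem__)
```

In the setting of this study the input vector would hold the 14 selected features and the output layer would have six neurons, one per plant variety.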

Results and Discussion
Five ML classifiers, namely multi-layer perceptron (MLP), LogitBoost (LB), Bagging (B) with REPTree, Random Forest (RF), and Simple Logistic (SL), were deployed on the fused optimized features dataset for the classification of medicinal plant leaves. The dataset holds six types of medicinal plant leaves, named Tulsi, Peppermint, Bael, Lemon Balm, Catnip, and Stevia. The classification based on fused features is performed using a 10-fold cross-validation data splitting approach. Different testing parameters, such as "Receiver Operating Characteristic" (ROC), "Kappa Statistics", "False Positive" (FP), "Recall" (R), "True Positive" (TP), and "F-Measure", are observed [27]. First, an experiment was performed with ROO size (220 × 220) for the classification of medicinal plant leaves, and accuracies of 95.87%, 95.04%, 94.21%, 93.38%, and 92.56% were observed using MLP, LB, B, RF, and SL, respectively, as shown in Table 3.

Table 3. Classification of medicinal plant leaves using five ML classifiers with ROO size of (220 × 220).

It is observed that the MLP performs more efficiently than the other employed classifiers when the ROO size is 220 × 220, as shown in Figure 8. To improve the classification result, the proposed approach was then employed on the medicinal plant leaves dataset with ROO size (280 × 280). We observe very promising results of 99.01%, 98.01%, 97.02%, 96.03%, and 95.04% using MLP, LB, B, RF, and SL, respectively, as shown in Table 4. It is observed that MLP outperforms the other classifiers when the ROO size is 280 × 280, as shown in Figure 9. The distinct classification accuracies of the six medicinal plant leaves, named Tulsi, Peppermint, Bael, Lemon balm, Catnip, and Stevia, were 99.10%, 99.80%, 98.40%, 99.90%, 99.40%, and 99.20%, respectively, at ROO size 280 × 280, as shown in Table 5 and Figure 10.

We started our experimentation with ROOs of size 220 × 220. After that, we gradually increased the ROO size to achieve better accuracy. Finally, at ROO size 280 × 280, we observe promising accuracy because it covers the maximum useful information. A further increase in the ROO size caused a decrease in accuracy due to speckle noise. Lastly, a comparative analysis of the classification of medicinal plant leaves with ROO sizes 220 × 220 and 280 × 280 is shown in Figure 11.

The proposed methodology is comparatively more reliable and efficient than those described previously [13,14,16-20]. Furthermore, it is consistent, satisfactory, and better than the existing medicinal plant leaves classification approaches. A comparative analysis of the proposed methodology with existing works is shown in Table 6.
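Some of the evaluation quantities used in this section, namely overall accuracy, the kappa statistic, per-class recall, and a 10-fold split, can be computed from a confusion matrix as sketched below. This is a generic illustration rather than the tooling used in the study; the function names are hypothetical, the fold assignment is a simple unstratified interleaving chosen for brevity, and any confusion matrix passed in for experimentation is made-up data, not a result from the paper.

```python
from typing import Iterator, List, Sequence, Tuple

Confusion = Sequence[Sequence[int]]  # rows: true class, columns: predicted class


def accuracy(cm: Confusion) -> float:
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(map(sum, cm))
    return sum(cm[i][i] for i in range(len(cm))) / total


def cohen_kappa(cm: Confusion) -> float:
    """Agreement beyond chance: (p_o - p_e) / (1 - p_e).

    Undefined (division by zero) when chance agreement p_e equals 1.
    """
    n = sum(map(sum, cm))
    k = len(cm)
    p_observed = sum(cm[i][i] for i in range(k)) / n
    p_expected = sum(sum(cm[i]) * sum(row[i] for row in cm)
                     for i in range(k)) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


def per_class_recall(cm: Confusion) -> List[float]:
    """True-positive rate of each class (diagonal over row total)."""
    return [cm[i][i] / sum(cm[i]) for i in range(len(cm))]


def k_fold_indices(n_samples: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; every sample is tested exactly once."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test
```

For example, the illustrative matrix `[[45, 5], [10, 40]]` has an accuracy of 0.85 and a kappa of 0.7, because half of the agreement would be expected by chance.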

Conclusions
In this study, we develop a machine learning (ML) based classification of medicinal plant leaves using a multispectral and texture dataset. The main objectives are to collect a refined and standardized dataset, perform edge/line detection, extract fused features, optimize the extracted features by selecting the most valuable ones, and select efficient ML classifiers. The fused (multispectral + texture) feature dataset holds six types of medicinal leaves, named Tulsi, Peppermint, Bael, Lemon Balm, Catnip, and Stevia, collected via a computer vision laboratory setup. Owing to the complex laboratory setup, the collected dataset is very refined and standardized. The chi-square feature selection approach provides the 14 most valuable features, which help to obtain better classification results. A total of five AI-based classifiers are considered, namely multi-layer perceptron (MLP), LogitBoost (LB), Bagging (B), Random Forest (RF), and Simple Logistic (SL). First, an experiment is performed with ROO size (220 × 220) on the medicinal plant leaves dataset, yielding accuracies of 95.87%, 95.04%, 94.21%, 93.38%, and 92.56% for MLP, LB, B, RF, and SL, respectively. Secondly, the same approach is employed on the medicinal plant leaves dataset with ROO size (280 × 280), giving very promising results of 99.01%, 98.01%, 97.02%, 96.03%, and 95.04%, respectively. In addition, we observe that the MLP classifier performed well compared with the other implemented AI-based classifiers. This study opens a new horizon in the field of medicinal plant leaves classification. It can also help pharmacists to recognize the correct medicinal plant and will support the process of making medicine.

Limitation and Future Works
This study is limited to six medicinal plant leaves, while there are millions of types of medicinal plants/herbs in the world. This is a pixel-based approach; in the future, we may wish to use an object-based approach. In future work, the proposed approach can be deployed on other medicinal plant leaves, and it can be improved using hyperspectral and 3D digital image datasets.

Data Availability Statement: Data are available on request due to restrictions, e.g., privacy or ethical. The data presented in this study are available on request from the corresponding author. The data are not publicly available because the dataset is self-generated.