Bamboo Plant Classification Using Deep Transfer Learning with a Majority Multiclass Voting Algorithm

Abstract: Bamboos, also known as non-timber forest products (NTFPs) and belonging to the family Poaceae and subfamily Bambusoideae, have a wide range of flowering cycles, from 3 to 120 years; hence, it is difficult to identify species. Here, the focus is on supervised machine learning (ML) and deep learning (DL) as a potential automated approach for the identification and classification of commercial bamboo species, with the help of the majority multiclass voting (MajMulVot) algorithm. We created an image dataset of 2000 bamboo instances, followed by a texture dataset prepared using local binary patterns (LBP) and gray-level co-occurrence matrix (GLCM)-based methods. First, we deployed five ML models for the texture datasets, where the support vector machine (SVM) shows an accuracy rate of 82.27%. We next deployed five DL-based convolutional neural network (CNN) models for bamboo classification, namely AlexNet, VGG16, ResNet18, VGG19, and GoogleNet, using the transfer learning (TL) approach, where VGG16 prevails, with an accuracy rate of 88.75%. Further, a MajMulVot-based ensemble approach was introduced to improve the classification accuracy of all ML- and DL-based models. The ML-MajMulVot enhanced the accuracy for the texture dataset to 86.96%. In the same way, DL-MajMulVot increased the accuracy to 92.8%. We performed a comparative analysis of all classification models with and without K-fold cross-validation and MajMulVot methods. The proposed findings indicate that even difficult-to-identify species may be identified accurately with adequate image datasets. The suggested technology can also be incorporated into a mobile app to offer farmers effective agricultural methods.


Introduction
The automated identification of plant species is an essential field in ecology to maintain a healthy atmosphere and is becoming a powerful and frequently used method to keep track of species' presence over a vast taxonomic range [1,2]. In the case of any plant, the identification keys are primarily based on floral characteristics. Although much research has been done on automated plant identification in general, most studies have used plant leaves and flowers for classification [3]. Since bamboo has an exceptionally long reproductive cycle, ranging from 3 to 120 years, it is challenging to identify a bamboo plant based on its reproductive structure. As a result, the emphasis in bamboo identification has shifted from reproductive to vegetative features [4,5]. Notably, bamboo species belong to the Poaceae family, one of the most prominent plant groups in the world. Bamboo is a large gramineous plant with woody stalks up to 40 m in height and up to 20 cm in diameter, distributed across the globe except in extremely cold areas, such as the Atlantic and some European regions. Bamboo has about 1707 species from 128 genera, out of which many fall under commercial categories [6-8]. It is one of the fastest-growing timber plants on Earth, growing up to 1 m per day; one species, Bambusa oldhamii, grows nearly 4 feet per day [9]. Bamboo forests across the globe cover approximately 36 million hectares (MHA), of which India contains 16 MHA, followed by China with 6.5 MHA. These two countries have a vast range of bamboos, with more than 1000 species [10,11]. Bamboo comprises two main tribes, Arundinarieae and Bambuseae; of the two, Bambuseae covers most of the world [12,13]. With a few exceptions, each species of Bambuseae has a distinctive set of features that allows for the quick and simple identification of any given specimen as belonging to this bamboo family. However, the lack of blooming and the consistent visual appearance of species in the bamboo group pose significant challenges for image-based plant species identification [14]. The variations between species and even genera might be difficult to see without a close examination, particularly if there is no flower present. In the case of bamboo, it is too difficult to identify or recognize its species accurately because most bamboo species grow in groups or bunches. Every bamboo species has different features that are used for identification [10,15]. As a result, determining the bamboo species and correctly identifying them is becoming an increasingly challenging area of research. There are several difficulties, including distortion, noise, and segmentation faults. Earlier researchers and biologists were confined to identifying bamboo in physical environments only, using the culm sheath and leaf [16]. The majority of research focuses on ground-based objects. Despite this, there has been tremendous growth in the need to identify and recognize commercial plant species. The world's population is increasing rapidly, so there should be equal growth in the agricultural field, which can provide food and other essential items to this growing population. According to the Food and Agriculture Organization (FAO), about 40% of the population, i.e., approximately 2.5 billion people, is economically dependent on bamboo, and it also provides shelter to one billion people in the form of traditional bamboo houses [17-19]. In addition, the international trade value of bamboo is estimated to be 68.8 billion US dollars, and approximately 80% of the world's population depends on NTFPs for their day-to-day needs. Many experts claim it will replace wood and plastic in the near future [10,20]. Bamboo has unique and excellent physical, chemical, and mechanical properties, for which it is very well known in commercial applications. These superior properties make it very useful, and there are more than 1500 documented applications across different sectors, such as food [21], pulp and paper [22], construction and building materials [23], engineering and industry [24], fuel, raw materials, etc. [25]. It plays a crucial role in the social, economic, and environmental development of nations, and its identification has been a transdisciplinary priority in both botanical taxonomy and computer vision, with researchers collaborating across disciplines. The automated and accurate identification of commercial bamboo species helps a wide range of stakeholders, including farmers, foresters, pharmacologists, taxonomists, biologists, technical workers of environmental agencies, and ordinary individuals [21,22,24,25]. All these groups use their expert skills and accumulated knowledge for accurate species identification or classification. Nonetheless, the manual identification and recognition procedure is often laborious, tedious, and time-consuming [26]. Therefore, an automated system is needed that can identify bamboo species quickly and without human involvement, relieving all stakeholders in the bamboo industry. Hence, a significant number of researchers have investigated the automated categorization of plants according to their different characteristics in images [27-29].
Consequently, this has fueled interest in developing automated systems for the identification of different bamboo plant species. A completely automated system for the recognition of commercial plants using computer vision and deep learning (DL) techniques is presented here. The automated identification of plants has improved considerably in the last few years, particularly thanks to recent advances in ML algorithms such as random forest, logistic regression, convolutional neural networks (CNN), ResNet, and other deep learning techniques [30,31]. We anticipate that a CNN trained on image datasets will obtain greater absolute accuracy. In this research article, we describe and propose a comprehensive and automated bamboo species categorization and classification system, especially for commercial bamboo, using its morphological and texture attributes. We use pre-trained CNN models in a transfer learning (TL) setting [29,32] to train our DL model, which classifies ten commercial bamboo species. The methodology used for classification is based on ML and DL with a majority multiclass voting (MajMulVot)-based ensemble algorithm.
The most significant findings and contributions of this study are as follows:

• To the best of our knowledge, a morphological and image database of bamboo is not available from any other source. Thus, we describe the creation of a novel morphological and image dataset for ten commercial species, compiled by visiting different sites around Maharashtra where bamboo grows.
• We examine the possibility of distinguishing bamboo species without the appearance of blossoms.
• We seek to find the most precise combination of morphological attributes that play a vital role in determining commercial value and can be used for species classification [36,37].
• We analyze a variety of CNN architectures to extract features and compare their accuracy with the help of the ensemble MajMulVot algorithm. It can be applied not only to our dataset but also to a benchmark open-source rice image dataset [2,39-41].
• To improve the effectiveness of multiple models, we present an ensemble approach based on majority multiclass voting.
The overall architecture of the proposed automated commercial bamboo species classification system is described in Figure 1. The remainder of the paper is organized as follows. Section 2 presents the theoretical background and related work; Section 3 describes the methodology and outlines the procedures of the proposed approach. Our results are presented and discussed in Sections 4 and 5, respectively. Finally, Section 6 concludes this paper.
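Although the full MajMulVot algorithm is developed later in the paper, its core idea, combining the class predictions of several base models by a hard majority vote, can be sketched as follows (the models and labels here are purely illustrative, not the authors' exact procedure):

```python
# Hypothetical sketch of hard majority voting in the spirit of MajMulVot;
# the paper's exact algorithm is not reproduced here. Integer class
# predictions from several base models are combined per sample.
import numpy as np

def majority_vote(predictions: np.ndarray) -> np.ndarray:
    """predictions: (n_models, n_samples) array of integer class labels.
    Returns the most frequent label per sample (ties go to the lowest label)."""
    n_models, n_samples = predictions.shape
    voted = np.empty(n_samples, dtype=predictions.dtype)
    for j in range(n_samples):
        counts = np.bincount(predictions[:, j])
        voted[j] = np.argmax(counts)  # argmax breaks ties toward the lowest label
    return voted

# Three illustrative models voting over four samples (labels 0..2):
preds = np.array([
    [0, 1, 2, 1],   # model A
    [0, 1, 1, 1],   # model B
    [2, 1, 2, 0],   # model C
])
print(majority_vote(preds))  # [0 1 2 1]
```

In the paper, this style of voting is reported separately for the ML models (ML-MajMulVot) and the DL models (DL-MajMulVot).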

Background and Related Work
Machine learning (ML) and deep learning (DL) methods are subsets of AI techniques. The fundamental distinction between ML and DL approaches is feature selection. In general, ML approaches require handcrafted features (i.e., features created by human experts or retrieved using various techniques before training), while DL methods automatically extract these features using convolution operations during training. Machine learning may be further segmented into two types: supervised and unsupervised. Supervised learning is a method in which the supplied dataset is labeled, i.e., the expected output is given, while unsupervised learning is a method in which the given dataset is unlabeled [28,42].
In the recent literature, a large number of DL algorithms using CNNs, transfer learning, and computer vision-based methods have been proposed for the classification of images; however, they are used mainly for biological image data [43-45]. In contrast, some DL algorithms using CNNs and transfer learning have been proposed for the plant domain, but these methods either work only on plant leaves, which have distinctive structures, or have limitations in real-time processing and scalability for plants with similar leaf structures. Numerous methods have been developed to classify plants automatically based on their leaves [38,46].
All ML models require features before training starts; these are therefore called handcrafted features. Such features are extracted from leaves or other portions of plants and categorized as shape features, morphological features, color features, and texture features [36]. Table 1 gives details of the different feature extraction techniques used by various ML models for leaf-based plant classification. The unconstrained categorization and identification of bamboo are challenging in locations where there are changes in luminance, background confusion among various bamboo species and other plants, and varying local weather conditions. Another obstacle to correct species classification is the similarity of bamboo species in terms of culm body, culm texture, and culm color. There have been numerous attempts to classify bamboo, some using machine learning and deep learning.
Experiments have been carried out to assess fine-grained categorization among a collection of visually extremely similar plant species. The constructed classifier had a 90% accuracy rate in distinguishing between 30 distinct Chenopodiaceae species [51,52]. Purwandari et al. [26] proposed an ML-based expert system for bamboo identification using the K-nearest neighbors (KNN) algorithm. The proposed method used 34 characteristics of the shoot, culm (height, diameter, and internode length), sheaths (length, breadth, and hair color), branches, and leaves (length and width). The model, named Case-Based Reasoning (CBR), has four main steps: retrieve, reuse, revise, and retain. In another work, Aggarwal et al. [37] presented an image- and morphology-based optimized random forest classification method to recognize Indian oxygen plants using SVM, RF, and multilayer perceptron (MLP) classifiers. Tong et al. [53] performed bamboo shoot species classification using near-infrared reflectance (NIR) spectroscopy and three ML algorithms, which was less expensive and more time-efficient. Three classification models, i.e., SVM, RF, and partial least squares discriminant analysis (PLS-DA), were used. The SVM classifier demonstrated the most significant classification accuracy, reaching 95% in some cases [53].
In contrast to traditional machine learning techniques, convolutional neural networks (CNN) can swiftly infer more advanced characteristics from given data. Using DL methods, the model itself extracts the appropriate characteristics [40,54]. This approach has been shown to outperform ML approaches for image classification, in addition to relieving the burden of feature selection [55]. Convolution is a well-known technique in the fields of computer vision and signal processing. Professionals in computer vision often use the convolution operation to accomplish noise reduction and edge identification. CNNs are a particular type of artificial neural network (ANN) that has become important in a variety of applications in the fields of AI and DL. CNNs and their derivatives have produced promising results in classifying handwritten digits and recognizing objects and faces. The CNN has thus expanded its reach and improved its capacity to provide distinctive solutions to a variety of problems, including plant categorization in agricultural fields. At the moment, a significant number of researchers are working on the automated analysis of plant images [41,56]. Researchers in the fields of AI and ML are focused on developing more sophisticated models that can learn new information and extract previously unknown properties. This line of work began with the very first models trained as deep neural networks. Models such as AlexNet, VGGNet, GoogleNet, and ResNet, as well as their variants, are quite popular and widely used in the field of deep learning. They have achieved tremendous success in a variety of categories as a direct consequence of the enormous advancements made in visual identification and detection [57,58].
Alimboyong et al. [59] utilized a convolutional neural network and data augmentation techniques such as flipping, scaling, resizing, and rotating to identify 12 plant species based on photos of plant seedlings. Zhang et al. [60] created an automated classification system for urban tree species using AlexNet, VGG16, and ResNet50 with the help of RGB optical images. Sun et al. [61] proposed a 26-layer ResNet model for the recognition of plant species from low-resolution images captured by mobile phones in natural scenes. Currently, the application of plant classification is most prominent in characterization, taxonomic, and systematic investigations of plants. In addition, several studies have been published in the last decade on plant classification and species identification (Figure 2).
[Figure 2. Publications per year (2012-2022) on plant classification overall, plant classification using traditional methods, and plant classification using machine learning.]

Materials and Methods
This section discusses how we prepared the datasets, their description, and the methodology of the proposed work. In this work, three bamboo datasets were exploited, as described in Section 3.2. First the morphological, then the texture, and finally the image dataset were obtained, and unwanted attributes were removed using suitable methods.

Data Collection
Since a morphological and image database of these ten commercial bamboo species is not available from any other source, one was created by collecting samples from two different locations: the Bamboo Garden Amravati and the fly ash disposal area of Koradi Thermal Power Plant, Nagpur, Maharashtra, India. Most of the bamboo species in the study area are used commercially in many countries; the selected species were Bambusa (bambos, tulda, vulgaris, wamin), Dendrocalamus (giganteus, membranaceus), Gigantochloa atroviolacea, Melocanna baccifera, Sasa fortune, and Thyrsostachys oliveri. The images were collected by a camera in the natural environment; each species had 200 samples, so a total of 2000 samples were used to classify the species and determine the variations. Regarding the physical attributes of all species, we collected morphological data with the help of questionnaires, and we call this dataset DS1, whereas the captured image dataset is called IDCBS10. These questionnaires included questions related to bamboo and its physical attributes, which were later used for classification purposes, for example: What color and texture variations are used for species differentiation?

Data Description
Bamboo domain expertise involves many different morphological attributes or features used to identify the species, and these are recorded as dataset 1 (DS1). Meanwhile, dataset 2 (DS2) is the texture feature dataset, created by applying various feature extraction strategies, such as the LBP and GLCM methods, to the image dataset of commercial bamboo species (IDCBS10).
• Shoot: shoot color and hair color.
• Culm (pole): height (m), diameter (cm), internode length (cm), color, texture, wall thickness (mm).
• Branch: number of branches.
• Sheaths: three types of sheath: the culm sheath, the rhizome sheath, and the leaf sheath.
• Culm sheath: length (cm), breadth (cm), hair color.
• Leaves: length (cm), width (cm).

Experts use all these attributes to classify bamboo species in traditional methods. Within each species, the value of every attribute varies, but common values are often observed, and these can be used as mean values. Table 2 describes these attributes for all ten commercial species, with values given as lower-to-upper ranges. Figure 3 shows the mean distribution of the ten commercial bamboo species according to their morphological attributes. The variations are due to soil quality, local climate conditions, cultivation techniques, applied fertilizer, age, etc. The advantage of dataset DS1 is that it is in a structured form, so, when applying any ML model, feature selection is easy using the correlation approach. However, the disadvantage is that DS1 was prepared with handcrafted features and by taking the opinions of many domain experts. Thus, multiple factors, including human error, may influence the data, causing ML models to overfit on DS1.

Dataset 2 (DS2):
This includes texture features. This dataset was obtained by applying the LBP and GLCM [33-35] texture feature extraction techniques to the image dataset. It includes many attributes, such as entropy, energy, contrast, homogeneity, mean, standard deviation, correlation coefficient, etc. All these attributes differ across the 200 samples of each species, so a total of 2000 samples with 131 attributes are included. The advantage of dataset DS2 is that it is in a structured form, so, when applying any ML model, feature selection is easy using the correlation approach. However, the disadvantage of DS2 is that it contains 131 attributes, so training an ML model could require more time if all of them are used.

Dataset 3 (IDCBS10): This dataset includes 2000 images of ten commercial bamboo species. The photographs were taken in the summer and winter of 2022 in an area with moderate temperatures, low wind speeds, and varying lighting conditions. Background complexity plays a significant role in image categorization. The IDCBS10 collection includes three distinct forms of image information: single bamboo culm images, bamboo photos with simple backgrounds, and bamboo images with complicated backgrounds. From a computer vision perspective, an image is based on pixels, and each pixel acts as one attribute; the size of an image and its depth are significant for classification. IDCBS10 has a total of 2000 images, and some sample images from each category are shown in Figure 4. The data were taken in natural settings using digital cameras. The camera was fitted with a prime lens with an equivalent focal length of 28 mm and an RGB sensor with a resolution of 3120 by 4208 pixels. The bamboo image dataset IDCBS10 is in an unstructured form, so a disadvantage of this dataset is that we cannot directly apply any ML model; the unstructured image dataset must first be preprocessed and converted into a structured one. However, an advantage of IDCBS10 is that it is a real bamboo image dataset, so any DL model trained on it will be robust and not biased by the human error we have seen in DS1.
In the case of a bamboo image, color plays an important role; e.g., Bambusa vulgaris is yellow, and Gigantochloa atroviolacea is black, while most other species show different shades of green. The texture of the culm is also used for identification. As the data are in image form, some other image attributes also impact our model, such as the color, size, pixel range, number of samples, augmented data, etc. The sample details of the datasets are summarized in Table 3.

Data Preparation
This experiment was performed on three types of data. DS1 contained morphological data, so we had to check whether the data in all samples were valid. Some attributes were irrelevant, such as the flowering cycle: because it ranges from 3 to 120 years, it cannot be considered a relevant attribute. DS2 contained various texture features extracted with techniques such as LBP and GLCM using the MATLAB software (https://ww2.mathworks.cn/en/products/matlab.html, accessed on 7 October 2023); again, some preprocessing was required on this created dataset. IDCBS10 was an image dataset, and different preprocessing techniques were applied for data preparation, such as conversion, resizing, data augmentation, and some computer vision techniques. To apply the ML models to DS2, many features were extracted as per previous studies and expert suggestions. As a result, the following well-known features were chosen for this study: contrast, correlation, energy, homogeneity, entropy, minor axis length, major axis length, orientation, eccentricity, mean, variance, standard deviation, skewness, kurtosis, minimum value, maximum value, autocorrelation mean, autocorrelation standard deviation, autocovariance mean, autocovariance standard deviation, median absolute deviation, root mean square, interquartile range, etc. These shape and texture features were extracted from GLCM matrices followed by LBP [62,63]. On the other hand, DL was applied to IDCBS10, for which the feature extraction technique was different: the convolutional operation was used throughout the training process to automatically extract the features for the DL (CNN) models [40,64,65].
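The texture features themselves were extracted in MATLAB; as a rough illustration only (not the authors' pipeline), a gray-level co-occurrence matrix and three of the GLCM statistics named above can be computed with NumPy as follows:

```python
# Minimal NumPy sketch of a gray-level co-occurrence matrix (GLCM) for
# horizontally adjacent pixels, with contrast, energy, and homogeneity
# derived from it. Illustrative only; the paper used MATLAB routines.
import numpy as np

def glcm_features(img: np.ndarray, levels: int = 8):
    """img: 2-D array of integer gray levels in [0, levels).
    Returns (contrast, energy, homogeneity) for offset (0, 1)."""
    glcm = np.zeros((levels, levels), dtype=np.float64)
    for row in img:
        for a, b in zip(row[:-1], row[1:]):   # horizontal neighbor pairs
            glcm[a, b] += 1
    glcm /= glcm.sum()                        # normalize to joint probabilities
    i, j = np.indices(glcm.shape)
    contrast = np.sum(glcm * (i - j) ** 2)
    energy = np.sum(glcm ** 2)
    homogeneity = np.sum(glcm / (1.0 + np.abs(i - j)))
    return contrast, energy, homogeneity

# A perfectly uniform patch has zero contrast and maximal energy/homogeneity:
flat = np.zeros((4, 4), dtype=int)
c, e, h = glcm_features(flat)
print(c, e, h)  # 0.0 1.0 1.0
```

scikit-image's `graycomatrix`/`graycoprops` provide production-quality versions of the same statistics.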

Feature Selection
Feature selection is a process used to identify an optimal subset of characteristics, which increases the classification performance while reducing the complexity and the amount of computation needed. In the current study, there were two kinds of models: DL selected the features automatically, but, in the case of ML, a methodology is required to choose the features that increase the performance of the classification model [66]. In general, having a large number of input features can result in poor performance because the feature space grows. Moreover, it was noticed that our dataset included correlated characteristics. As a result, two distinct methods of feature selection were tested in order to choose the superior features, which ultimately increased the accuracy of the ML model and reduced the loss.
(A) Using domain expertise: In a traditional classification system, domain experts use selected features for identification because, in the case of bamboo, many features cannot be recorded or used for classification. For example, the flowering cycle ranges from 3 to 120 years, so it is not easy for a human being to record such a feature. Likewise, many other morphological features are not valid for identification; therefore, we used the recorded attributes suggested by experts in the bamboo domain [67,68]. (B) Using correlation: The statistical method known as correlation analysis determines the degree to which a linear connection exists between two or more characteristics [69].
The correlation method was employed in this research for two main reasons: first, it is simple to execute; second, it is a good fit for our datasets, DS1 and DS2. Given two variables X and Y with n values each, we can calculate the correlation coefficient r using (1):

r = Cov(X, Y) / (σ_X σ_Y)    (1)

where Cov(X, Y) is the covariance and σ_X and σ_Y are the standard deviations of X and Y, respectively. Equivalently, the r-value can be determined using (2):

r = Σ_{i=1}^{n} (X_i − X̄)(Y_i − Ȳ) / √( Σ_{i=1}^{n} (X_i − X̄)² · Σ_{i=1}^{n} (Y_i − Ȳ)² )    (2)

where n denotes the number of samples, X_i and Y_i are the ith data values, and X̄ and Ȳ are the mean values of X and Y, respectively. The r value ranges from −1.0 to +1.0 and conveys two things: (1) the strength of the relationship, i.e., how closely the two sets are linked (the higher the absolute value, the stronger the relationship), and (2) its direction, i.e., whether an increase in one variable is accompanied by an increase or a decrease in the other. When both move in the same direction, the association is a positive correlation; when they move in opposite directions, it is a negative correlation. A value of 0 indicates no relationship; values approaching +1 indicate a strong positive connection, and values around −1 indicate a strong negative correlation [69,70]. The three different forms of correlation are represented in Figure 5. The r values indicating the degree of correlation between the characteristics and the labels were evaluated in this study. During experimentation, we found that the features of our dataset had a significant degree of association with one another.
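A minimal sketch of how correlation-based pruning might look in practice (the threshold of 0.9 and the feature names are illustrative assumptions, not values from the paper):

```python
# Correlation-based feature pruning: features whose pairwise Pearson |r|
# exceeds a threshold are treated as redundant and one of each pair is
# dropped. Threshold and names are illustrative only.
import numpy as np

def drop_correlated(X: np.ndarray, names, threshold: float = 0.9):
    """X: (n_samples, n_features). Returns the names of retained features."""
    r = np.corrcoef(X, rowvar=False)          # pairwise Pearson correlations
    keep = []
    for j in range(X.shape[1]):
        # keep column j only if it is not highly correlated with a kept one
        if all(abs(r[j, k]) < threshold for k in keep):
            keep.append(j)
    return [names[j] for j in keep]

rng = np.random.default_rng(0)
a = rng.normal(size=100)
b = a * 2.0 + 0.01 * rng.normal(size=100)     # nearly a duplicate of `a`
c = rng.normal(size=100)                      # independent feature
X = np.column_stack([a, b, c])
print(drop_correlated(X, ["contrast", "energy", "entropy"]))
# ['contrast', 'entropy']  -- 'energy' is dropped as redundant with 'contrast'
```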

Normalization and Standardization:
In the ML model, each feature in our study belonged to a different scale. Hence, we applied both scaling techniques to minimize the accuracy loss and decrease the computing cost, as the data values varied across multiple orders of magnitude. There are two scaling methods. One is min-max normalization, which scales the values of each feature in the feature vectors to between 0 and 1. The other is standardization, a different approach to scaling in which the values are centered around the mean (µ) with unit standard deviation (σ) [71]. The feature vectors were scaled and normalized for this analysis using (3) and (4):

X_norm = (X − X_min) / (X_max − X_min)    (3)

X_std = (X − µ_X) / σ_X    (4)

where X_min and X_max are the minimum and maximum values of feature X in normalization, and µ_X and σ_X are the mean and standard deviation used for the standard scaler.
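Equations (3) and (4) can be expressed directly in code; this is a generic sketch rather than the authors' implementation:

```python
# Min-max normalization (Equation (3)) and z-score standardization
# (Equation (4)), applied per feature vector.
import numpy as np

def min_max(x: np.ndarray) -> np.ndarray:
    return (x - x.min()) / (x.max() - x.min())          # Equation (3)

def standardize(x: np.ndarray) -> np.ndarray:
    return (x - x.mean()) / x.std()                     # Equation (4)

x = np.array([2.0, 4.0, 6.0, 8.0])
print(min_max(x))                                       # scales to [0, 1]
z = standardize(x)
print(round(float(z.mean()), 10), round(float(z.std()), 10))  # 0.0 1.0
```

scikit-learn's `MinMaxScaler` and `StandardScaler` implement the same transforms for whole feature matrices.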
Data Augmentation: The consensus is that DL methods achieve good accuracy on massive datasets (i.e., with samples numbering in the millions). However, if the dataset is modest or relatively small (i.e., fewer than a thousand samples), there is a possibility of overfitting. Data augmentation is often carried out when there are insufficient data for a DL model. During this procedure, the data are artificially enhanced with image alteration or enhancement operations such as flipping, rotation, cropping, shifting, scaling, blurring, noise addition, and so on [72,73]. Data augmentation was crucial here, as the amount of existing data was modest (i.e., several hundred observations). As a result, we chose several augmentation methods to enrich our data.
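As an illustration, a few of the label-preserving operations listed above can be implemented with plain NumPy (real pipelines usually rely on framework utilities such as torchvision transforms; the operations and parameters below are illustrative):

```python
# Toy augmentation sketch: flips, a 90-degree rotation, and Gaussian noise
# applied to one image array. Parameters are illustrative only.
import numpy as np

def augment(img: np.ndarray, rng: np.random.Generator):
    """Return a few label-preserving variants of one image array."""
    return {
        "hflip": img[:, ::-1],                          # horizontal flip
        "vflip": img[::-1, :],                          # vertical flip
        "rot90": np.rot90(img),                         # 90-degree rotation
        "noisy": img + rng.normal(0, 0.01, img.shape),  # Gaussian noise
    }

img = np.arange(12, dtype=float).reshape(3, 4)
variants = augment(img, np.random.default_rng(0))
print(sorted(variants))            # ['hflip', 'noisy', 'rot90', 'vflip']
print(variants["rot90"].shape)     # (4, 3)
```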

Methodology
The techniques in the realm of artificial intelligence include both DL and ML. They are often used for plant image analysis because they can automatically identify useful patterns from the provided examples. The selection of features is an essential step in plant imaging; candidate features include pixel values, edge width, and the variance of pixel values across a region. A complex problem and crucial component of ML approaches is the selection of suitable features from the image. The performance of the system suffers significantly from improper feature selection, and there is a probability that ML approaches may choose incorrect features as a result of human perception and mistakes. When using DL methods, particularly CNNs, it is feasible to automatically extract useful information from whole images that the human eye cannot perceive [74,75]. As a consequence, the DL method lowers the likelihood of making an incorrect choice regarding the features that should be used. Therefore, this research employed the aforementioned hypothesis and compared DL to ML to demonstrate its validity.
The architecture and functional mechanism of the bamboo plant classification system with k-fold cross-validation are illustrated in Figure 6. The methodology known as K5-CV, or five-fold cross-validation [76], was used for these studies. As seen in Figure 6, this procedure involved randomly splitting each dataset into five subsets, with 80% of the input data samples utilized for training and the remaining 20% for testing. Samples of the different species were present in every training and test fold. The classification performance of the different learning models is measured using a confusion matrix. This matrix allows correlations between the classifier's efficiency and the tests' outcomes to be more easily identified. The confusion matrix reveals the proper and improper categorization of positive samples and the proper and improper categorization of negative samples. The bamboo dataset in this study consisted of ten bamboo classes C_1, C_2, ..., C_10, and the classification output was in the form of a confusion matrix, which consisted of predictions in terms of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN), reflecting the expected positive and negative classes of all input datasets. The representation of the confusion matrix is given in Table 4. These counts served as the fundamental criteria in assessing the system's performance by calculating the accuracy (ACC), precision (PR), recall (RC), F1-score (F1S), sensitivity (SEN), and specificity (SPC); all of these were measured as percentages. Some other measures, such as the positive predictive value (PPV), negative predictive value (NPV), and area under the curve (AUC) of the receiver operating characteristic (ROC), are also helpful for performance evaluation [39,77]. The parameters are expressed mathematically in Table 5.
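The K5-CV split described above can be sketched as follows (a simplified, unstratified version; scikit-learn's `KFold` or `StratifiedKFold` would normally be used):

```python
# Five-fold split: each fold holds 20% of the samples for testing and the
# remaining 80% for training, and every sample appears in a test fold once.
import numpy as np

def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0):
    """Yield (train_idx, test_idx) pairs covering all samples once as test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test

# With 2000 samples and k=5, every test fold has 400 samples (20%):
sizes = [len(test) for _, test in k_fold_indices(2000, k=5)]
print(sizes)  # [400, 400, 400, 400, 400]
```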

Precision (PR)
Precision is the ratio of true positives to the total predicted positives in the data: the proportion of positive predictions that are correct.

Recall (RC)
Recall is the ratio of true positives to the total (actual) positives in the data: the proportion of positive cases correctly predicted.

Specificity (SPC)
Specificity is the ratio of true negatives to the total negatives in the data: the proportion of negative cases correctly predicted.

F1-Score (F1S)
The F1-score is the harmonic mean of precision and recall, where an F1-score of 1 is the best (perfect precision and recall) and a value of 0 is the worst.

Negative Predictive Value (NPV)
The NPV is the proportion of true negatives in the total number of negative predictions.
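These definitions can be collected into a small helper (our own sketch; the function name and the example counts are illustrative, not taken from the paper):

```python
def metrics(tp, fp, tn, fn):
    """Compute the metrics of Table 5 (as percentages) from one class's
    confusion-matrix counts. A minimal sketch; names are ours."""
    acc = 100 * (tp + tn) / (tp + fp + tn + fn)
    pr  = 100 * tp / (tp + fp)        # precision / PPV
    rc  = 100 * tp / (tp + fn)        # recall / sensitivity
    spc = 100 * tn / (tn + fp)        # specificity
    npv = 100 * tn / (tn + fn)        # negative predictive value
    f1  = 2 * pr * rc / (pr + rc)     # harmonic mean of PR and RC
    return acc, pr, rc, spc, npv, f1

# e.g. 90 TP, 10 FP, 880 TN, 20 FN for one class
acc, pr, rc, spc, npv, f1 = metrics(90, 10, 880, 20)
```

For a multiclass problem, these per-class values are typically averaged across the ten bamboo classes.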

Convolutional Neural Network
Deep learning is a branch of AI that enables computers to learn from data. Deep learning systems transform the input data by applying linear and nonlinear transformations several times to create abstract and discriminative representations. DL is based on techniques for training neural networks, often on extremely large datasets that, in many instances, have no labels attached. When neural networks are used, the input is filtered through several hidden layers and nodes; these nodes process the input and send the results to the next tier. More layers in the network lead to a richer representation of the data. This technique trains features and classifiers simultaneously. The initial layers of the network perform feature extraction and contain filter banks, nonlinear transformations, and pooling layers, while the upper layers function as fully connected layers. This approach is often used by object recognition systems for feature extraction and categorization [29,41].
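The convolution and pooling operations that these layers perform can be illustrated numerically (a minimal sketch with toy sizes; the function names are ours):

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D cross-correlation with stride 1 (the 'convolution' of CNN layers)."""
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling: keep the largest value in each size×size patch."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h*size, :w*size].reshape(h, size, w, size).max(axis=(1, 3))

img = np.arange(25, dtype=float).reshape(5, 5)
feat = conv2d(img, np.ones((2, 2)))   # 5x5 input -> 4x4 feature map
pooled = max_pool(feat)               # 4x4 -> 2x2 after pooling
```

Pooling downsamples the feature map, which is why stacking such layers progressively condenses the image into discriminative features.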
Convolutional neural networks (CNNs), recurrent neural networks (RNNs), and Q-learning are components of deep learning. The most common type of neural network is the CNN, or ConvNet. It has a deep feed-forward layer architecture and a capacity to generalize more effectively than other neural networks with fully connected layers, and it can distinguish abstract characteristics and recognize objects efficiently. A CNN has three primary layers: convolutional, pooling, and fully connected (FC) (Figure 7). The first two, the convolution and pooling layers, are responsible for fundamental and local feature extraction, while the third (FC) transfers the retrieved characteristics to the final classification array. Figure 8 describes the step-by-step procedures of the convolution and max-pooling operations. A CNN can extract information about objects based on their color, shape, and texture when the datasets include complex images of objects with different backgrounds [32,39,78,79]. Constructing a fully functional CNN model from scratch is a difficult and time-consuming endeavor, requiring careful consideration of layer construction and parameter choices; these tasks are non-intuitive, and a great deal of trial and error is needed to make a CNN efficient [80,81]. However, scientists and researchers have spent years developing effective CNN models. CNNs such as AlexNet, VGGNet, GoogleNet, and ResNet have shown performance superior to other non-deep learning techniques in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). These models can categorize over 1000 objects, including aircraft, dogs, fish, plants, etc., and have been trained on over a million images; their categorization accuracy has reached a level comparable to human intellect. Because of their numerous benefits and desirable qualities, we opted to employ pre-trained models (CNNs) in our research [82]. The properties of the above-mentioned pre-trained CNN models, such as their salient features,
number of layers, activation function, total parameter size, minimized error rate, and input size, are summarized in Table 6. AlexNet: AlexNet is a well-known deep convolutional neural network (CNN) utilized in many image identification and classification systems. It was first proposed by Alex Krizhevsky et al. [83] in the ILSVRC 2012 contest. Its architecture comprises eight main learned layers: the first five are convolutional layers and the remaining three are fully connected layers. It also contains supporting layers: three max-pooling layers, two local response normalization layers, and a final softmax layer. Every convolutional layer has convolutional filters and a nonlinear activation function, the Rectified Linear Unit (ReLU). AlexNet also addresses the issue of overfitting by using data augmentation and dropout techniques. The AlexNet CNN achieved top-1 and top-5 error rates of 37.5% and 17.0%, better than prior state-of-the-art approaches [83].
VGGNet: The Visual Geometry Group (VGG) network is a standard deep CNN architecture with numerous layers. It was proposed by Simonyan et al. (2014) and took second place in the ILSVRC 2014 contest, with top-1 and top-5 error rates of 23.7% and 6.8%, respectively. There are two widely used variants of this model, 'VGG16' and 'VGG19', with 16 and 19 layers, respectively. All of VGG's hidden layers use ReLU (a significant advancement over AlexNet that reduces the training time). Stacks of small convolutional kernels are a key element of this design [84].
GoogleNet: GoogleNet is a 22-layer deep network that won the ILSVRC 2014 contest. The top-5 error of this model reaches 6.67%, which is close to the human image interpretation level [85]. The design was influenced by LeNet and was developed around a specialized module known as 'Inception' [54]. This module reduces the number of intermediate parameters by employing tiny (1 × 1) convolutional filters, which reduces the model's computation time.
ResNet: ResNet is used as a backbone for many computer vision tasks. It won the ILSVRC 2015 contest, achieving a 3.57% error on the ImageNet test set [86]. This model is available in various versions, such as 'ResNet18', 'ResNet50', and 'ResNet101', with 18, 50, and 101 layers, respectively. The network is built from a mechanism known as skip connections, in which the input of a block bypasses one or more layers and is added directly to the block's output, so that the layers learn a residual rather than a full transformation. In image recognition, for example, the first layers might detect straight lines, intermediate layers rough textures, and later layers specific objects.
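The skip connection can be summarized as out = ReLU(F(x) + x); below is a toy numerical sketch (with small dense matrices standing in for the convolutional layers, which is our simplification):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    """out = ReLU(F(x) + x): the block learns a residual F(x) on top of the
    identity shortcut, so the input can skip the intermediate layers."""
    fx = relu(x @ W1) @ W2          # F(x): two toy 'layers'
    return relu(fx + x)             # identity shortcut added before activation

x = np.array([1.0, -2.0, 3.0])
# With zero weights F(x) = 0, the block reduces to ReLU(x): the shortcut
# guarantees that the identity mapping is trivial to represent, which is
# what makes very deep networks trainable.
out = residual_block(x, np.zeros((3, 3)), np.zeros((3, 3)))
```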

Transfer Learning
Transfer learning (TL) is an approach in which a model uses prior knowledge that it has already acquired to speed up the learning of a new task. This type of training accelerates the performance of a DL model using prior knowledge [32,87]. Traditional ML approaches suffer from data distribution asymmetry between training and test samples, which may be mitigated using TL. Kaya et al. [32] performed a comparative analysis of a CNN with TL using pre-trained models (AlexNet and VGG16) and obtained a better result for the CNN with TL than when training the CNN model from scratch [32]. Similarly, Duong-Trung et al. [88] investigated the combination of TL and deep learning for the classification of medicinal plants, achieving a classification accuracy of 98.7% within reasonable limits of training time [88].
Similarly, Ghazi et al. [57] used deep convolutional neural networks (DCNNs) to analyze the many factors that contribute to the performance of the different CNN architectures used for plant classification with the help of TL [57]. Several effective and widely used DL architectures, namely AlexNet, VGGNet, and GoogLeNet, were employed for this purpose. The system proposed using TL obtained second rank in PlantCLEF 2016.
The entire transfer learning strategy is shown in Figure 9. In this technique, all CNN layers except the final few FC layers retain their pre-trained weights, and these final layers are adjusted to match the number of bamboo species to be classified by the model. We employed popular pre-trained CNNs in our experiment, including AlexNet, VGG16, ResNet18, VGG19, and GoogleNet, which have previously been trained on millions of real-world images, including crafts, plants, mammals, vehicles, etc. Although the models were pre-trained on natural images, they have previously been applied to the categorization of numerous crops and medicinal plants [56,57,89] and have shown a remarkable level of performance.

Voting-Based Ensemble Learning
Studies have shown that several pre-trained CNN models, independently of the number of layers or the values of their parameters, predict widely varying classification outcomes on publicly available datasets [32,39,90]. Whenever multiple models are available to solve a classification problem, 'ensemble' algorithms are most frequently used to enhance the performance of this multimodel paradigm. Such domains include plant classification [91], citrus pest classification [92], maritime vessel classification [93], etc. The ensemble method has three different approaches to learning: bagging, boosting, and voting. One of the simplest and most straightforward ensemble learning methods is vote-based ensemble learning, in which the outputs of many models are integrated using either hard or soft voting [69,94]. In hard voting, the aggregator picks the class predicted most often among the base models. The class with the most votes Nc is selected, and the class label ŷ is determined by majority (plurality) voting over the base classifiers' predictions [95], as indicated in (5).
On the other hand, soft voting uses a probability-based approach. In this scheme, we predict the class labels based on the projected probabilities p of the classifiers Ci, as seen in (6).
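Since Equations (5) and (6) are only referenced here, the two schemes can be sketched from their standard definitions (our illustration, with hypothetical model outputs):

```python
import numpy as np

def hard_vote(preds):
    """Equation (5), standard form: pick the class label predicted
    most often by the base models (plurality vote)."""
    return int(np.bincount(preds).argmax())

def soft_vote(probs):
    """Equation (6), standard form: average the per-model class-probability
    vectors, then take the class with the highest mean probability."""
    return int(np.mean(probs, axis=0).argmax())

# Three hypothetical classifiers, three classes.
h = hard_vote(np.array([2, 2, 0]))                 # two of three models say class 2
s = soft_vote(np.array([[0.6, 0.3, 0.1],
                        [0.2, 0.5, 0.3],
                        [0.3, 0.4, 0.3]]))         # class 1 has the highest mean
```

Note the two schemes can disagree: soft voting weighs how confident each model is, not just its top choice.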
The majority multiclass voting (MajMulVot) algorithm takes 'n' classifier models (C1, C2, C3, ..., Cn) and a dataset DS with multiclass labels as arguments. The given dataset (DS) is split into training (TR) and test (TS) sets according to the k-fold cross-validation criterion. Each training set is used to train all classifiers, producing trained classifier models TC1, TC2, TC3, ..., TCn. The i-th trained classifier model (TCi) predicts probabilities Pi(L1), Pi(L2), Pi(L3), ..., Pi(Lm) for the multiclass labels L1, L2, L3, ..., Lm, respectively, for a sample (S). The voting mechanism is built on these estimated probabilities: if the highest predicted probability of model TCi is Pi(L1), then the vote count of L1 is incremented, VCL1(i) = 1; likewise for L2, and so on. The total vote shares of all models for class labels L1, L2, ..., Lm are then calculated separately in variables TVCL1, TVCL2, ..., TVCLm, respectively. Finally, the total vote shares are compared label-wise: if TVCL1 > TVCL2, TVCL3, ..., TVCLm, then the label predicted by the MajMulVot algorithm is L1; otherwise it is L2 or L3, ..., or Lm, according to the TVCL of the respective label (Figure 10). This process is repeated for all samples of the test set (TS). The effectiveness of the MajMulVot algorithm was assessed by comparing the ground truths of all test samples with their predicted labels.
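The procedure can be sketched as follows (our own minimal implementation of the description above; variable names follow the text, but the probability values are hypothetical):

```python
import numpy as np

def majmulvot(prob_matrices):
    """A sketch of MajMulVot: prob_matrices is a list of n arrays, one per
    trained classifier TC_i, each of shape (samples, m) holding
    P_i(L_1)..P_i(L_m) for every test sample. Each model casts one vote per
    sample for its most probable label; the label with the largest total
    vote share (TVCL) becomes the ensemble prediction."""
    votes = np.stack([p.argmax(axis=1) for p in prob_matrices])  # (n, samples)
    m = prob_matrices[0].shape[1]
    preds = []
    for s in range(votes.shape[1]):
        tvcl = np.bincount(votes[:, s], minlength=m)  # total votes per label
        preds.append(int(tvcl.argmax()))
    return preds

# Three hypothetical trained models, two test samples, three labels.
p1 = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
p2 = np.array([[0.5, 0.4, 0.1], [0.2, 0.3, 0.5]])
p3 = np.array([[0.3, 0.6, 0.1], [0.1, 0.7, 0.2]])
preds = majmulvot([p1, p2, p3])
```

For the first sample, models 1 and 2 vote for L1 and model 3 for L2, so the ensemble predicts L1; for the second sample, two of three models vote for L2.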

ML/DL Models for Bamboo Benchmark Classification
In this study, the DL-based approach was benchmarked against the ML-based approach. To evaluate the merits of the ML approach in comparison with the DL method, five popular ML models were selected: logistic regression (LR), decision trees (DT), random forest (RF), support vector machine (SVM), and K-nearest neighbors (KNN). These models were trained and tested on the morphological (DS1) and texture (DS2) datasets. The proposed ensemble MajMulVot method was applied to all five ML and benchmark DL models to increase the accuracy and performance of all base models. The ML- and DL-based MajMulVot methods can be expressed as ML-MajMulVot (LR, DT, RF, SVM, KNN) and DL-MajMulVot (AlexNet, VGG16, ResNet18, VGG19, GoogleNet), respectively.
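A sketch of this five-model protocol with scikit-learn defaults is shown below (synthetic features stand in for DS1/DS2, which are not public, so the accuracies are illustrative only):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the texture features described in the text.
X, y = make_classification(n_samples=500, n_features=20, n_classes=5,
                           n_informative=10, random_state=0)

# The five base models, with default hyperparameters as in the protocol.
models = {
    "LR":  LogisticRegression(max_iter=1000),
    "DT":  DecisionTreeClassifier(random_state=0),
    "RF":  RandomForestClassifier(random_state=0),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
}

# Mean five-fold cross-validation accuracy per base model.
scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
```

The per-model scores obtained this way are the inputs that the MajMulVot ensemble then aggregates.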

Results
This section describes a total of five experimental protocols. The first, detailed in Section 4.1, shows the comparative performance of all five ML models on DS1 with and without k-fold cross-validation (K5-CV), along with MajMulVot. The second (Section 4.2) is the comparative performance analysis of all five ML models on DS2 with and without K5-CV, along with MajMulVot. The third (Section 4.3) describes and compares the performance of all five DL models using K5-CV and MajMulVot on the IDCBS10 dataset. The fourth, in Section 4.4, considers the comparative performance evaluation of all DL models on benchmark datasets with and without K5-CV and MajMulVot. Lastly, the protocol in Section 4.5 is the comparative analysis of the best ML model for DS2 and the best DL model for DS3 and DS4. The most promising results of the DL- and ML-based models are emphasized in the tables using bold font.

Performance Analysis of Five ML Models on Morphological Dataset DS1
We use the bamboo plant dataset DS1 to evaluate the performance of the machine learning classifiers in terms of classification accuracy, F1-score, sensitivity, specificity, PPV, and NPV. In this protocol, we use the five ML classifiers without modifying their default structure, i.e., with their default hyperparameters. These classifiers, LR, DT, RF, SVM, and KNN, are implemented individually on each dataset. We use a balanced dataset so that each model's result is free from bias, and, for training and testing, the dataset is randomly divided in a proportion of 80:20 without overlap. The ML-KNN model gives the best result on DS1 without cross-validation: ACC: 99.12, F1S: 99, SEN: 99.02, SPC: 99.48, PPV: 99, and NPV: 99.38. All five ML models are then applied to DS1 again, with the data divided using the k-fold cross-validation (K5-CV) technique for training and testing, and the overall performance of all models is enhanced (Table 7). The best-performing model is the ensemble ML-MajMulVot, with ACC: 99.49, F1S: 99.24, SEN: 99.15, SPC: 99.97, PPV: 99.25, and NPV: 99.91. Table 7 shows the different performance metrics for all ML-based models on DS1; these metrics are derived from the confusion matrices (TP, TN, FP, and FN) of the DS1 dataset.

Performance Analysis of All Five ML Models on Texture Dataset DS2
In this experimental protocol, we first apply all five ML models to the texture dataset DS2 without using the cross-validation technique and then use the same ML models with the K5-CV approach. Since the DS2 dataset has a larger number of attributes, we use a correlation-based feature selection technique and obtain a better result. The ML-SVM model gives the best result on DS2 without K5-CV: ACC: 82.72, F1S: 88.79, SEN: 90.05, SPC: 97.67, PPV: 81.04, and NPV: 91.74. All five ML models are then applied to DS2 with the data divided using the k-fold cross-validation (K5-CV) technique for training and testing, and the overall performance of all models is enhanced (Table 8). In addition, we used MajMulVot, which improves the performance of all five ML models; these results are depicted in Table 8. The ML-MajMulVot model performs well on the DS2 dataset, with increased performance compared to the base ML models: ACC: 86.96, F1S: 85.93, SEN: 84.05, SPC: 95.08, PPV: 87.54, and NPV: 91.74.

Performance of All Five DL Models on IDCBS10 with and without (K5-CV) and MajMulVot
In this experimental protocol, we use the bamboo plant image dataset IDCBS10 to evaluate the training time, accuracy, and performance of all five CNN models with respect to their layers. First, we compare the average training times required for the five DL models on the bamboo image datasets against their respective layer counts. The VGG19 network, consisting of 19 layers, has the longest average training time, at 14 min and 19 s. In contrast, the GoogleNet network, with 22 layers, has the shortest average training time, at 8 min and 31 s. The impact of deep layers on the training duration of the five convolutional neural networks (CNNs) is shown in Figure 11 using a line chart. The figure demonstrates varying training times as the number of layers increases; the training duration of a CNN model depends on the quantity of parameters inside the pre-trained network rather than on the layer count alone.
Second, we evaluate the performance of all five CNN classifiers in terms of classification accuracy, F1-score, sensitivity, specificity, PPV, and NPV. In this protocol, we first apply all five CNN models (AlexNet, VGG16, ResNet18, VGG19, GoogleNet) to dataset IDCBS10 without K5-CV and then with K5-CV and MajMulVot. The comparative results of both approaches are described in Table 9. In both approaches, VGG16 performs well compared to the other DL models. In the first approach, i.e., without cross-validation, the VGG16 model performs best among all DL models: ACC: 88.75, F1S: 91.66, SEN: 92.8, SPC: 93.05, PPV: 91.25, and NPV: 86.84. When we apply K5-CV, the accuracy of all DL models is enhanced. The DL-MajMulVot model performs well on the IDCBS10 bamboo dataset, giving the highest accuracy of 92.8% as well as scaling up all performance metrics (ACC: 92.8, F1S: 93.75, SEN: 100, SPC: 90.47, PPV: 923.23, NPV: 100). Furthermore, this experimental methodology examines the influence of the CNN layers on the accuracy. Figure 12 illustrates the correlation between the mean accuracy of each convolutional neural network (CNN) model and its number of layers. Among the CNN models, VGG16, a 16-layer deep model, exhibits the best average accuracy of 89.50%, while the lowest average accuracy of 83.30% is recorded by the 22-layer deep network GoogleNet. The proposed DL-MajMulVot approach gives the highest accuracy of 92.8%. The relationship between accuracy and layer count for the five CNNs is shown in Figure 12 using a line chart, which demonstrates an inconsistent relationship between accuracy and the increase in layers.
As we used transfer learning to build all five CNN models, we kept the weights of the initial layers fixed and fine-tuned the final classification layers. We attempted to create a balanced dataset by taking 200 images from each class. As shown in the training vs. validation loss diagram (Figure 13), data augmentation techniques such as rotation and shifts were applied to prevent overfitting. The hyperparameters were as follows: training size 1600, validation size 400, test size 400, batch size 16, learning rate 0.001, epochs 50, step size 3, and gamma 0.01.

Performance of All Five DL Models on DS4
In this experimental protocol, we use the benchmark rice image dataset of [96] as DS4 to evaluate the performance of all five CNN classifiers in terms of classification accuracy, F1-score, sensitivity, specificity, PPV, and NPV. We first apply all five CNN models (AlexNet, VGG16, ResNet18, VGG19, GoogleNet) to dataset DS4 without K5-CV and then with K5-CV and MajMulVot. The comparative results of both approaches are described in Table 10. The accuracy of all DL models increases by 0.5 to 1% using K5-CV, and, with the help of DL-MajMulVot, the performance metrics are enhanced as follows: ACC: 99.5, F1S: 100, SEN: 100, SPC: 100, PPV: 100, and NPV: 100.

Comparative Analysis and Model Validation
As discussed above, a DL model extracts features via convolutional operations during the training phase itself, but building an ML model requires structured data, i.e., all handcrafted features must be ready before training. Thus, for a comparative analysis of all five ML and DL models, it is necessary to prepare the data in a suitable format, especially for the ML models. Therefore, in this experimental protocol, we first convert the benchmark image dataset (the DS4 rice image dataset) into a structured dataset so that all five ML models can be applied to it (Table 11). Next, we perform a comparative analysis of all five ML models on DS1, DS2, and the benchmark dataset DS4. Similarly, we apply all five DL models to the image datasets IDCBS10 and DS4. These tasks are performed with and without 5-fold CV and MajMulVot. The results differ between the two approaches, but one common outcome is that applying 5-fold CV and MajMulVot increases the classification performance, regardless of whether the ML or DL method is used. Table 12 describes the comparative analyses of all ML and DL models on DS1, DS2, IDCBS10, and DS4 with and without 5-fold CV and MajMulVot (Figure 14). The accuracy of the ML models on DS1 is the highest among all datasets, because DS1 was prepared using handcrafted features taken from many public sources and domain expertise (Table 12). Therefore, owing to the number of sources and human error, there is a greater likelihood that these data are biased, and the ML models on DS1 show overfitting. In contrast, the other datasets, DS2, DS3, and DS4, contain data generated from images; these data are less prone to such bias, and the models do not experience the overfitting problem.
In this study, we have seen that the DL models give better results than the ML models on both datasets, whether bamboo (IDCBS10) or rice (DS4). The enhancements in accuracy and the other performance metrics of the DL-VGG16 model compared to ML-SVM are as follows: ACC: 6.53%, F1S: 2.87%, SEN: 2.25%, SPC: 4.62%, PPV: 10.21%, and NPV: 15.1%. Similarly, ML with 5-fold CV and MajMulVot also shows enhanced performance compared to that without MajMulVot. Finally, Table 12 shows that the proposed DL-based MajMulVot algorithm works well on both the bamboo and benchmark rice datasets and significantly optimizes the performance of the five DL models.

Discussion
Although bamboo classification is best performed by domain experts or traditional methods using real morphological attributes [96], this requires more time and expert knowledge in the bamboo field. In addition, the classification of species belonging to the same genus and the varying environmental conditions are major challenges in morphological classification. To overcome this, some researchers use DNA barcoding and machine learning techniques for bamboo species classification [97], but this has the limitation of requiring a huge public dataset. Therefore, in this research paper, we present a novel method for bamboo classification using different real bamboo datasets that were created by visiting many locations where bamboo is grown. In earlier research, it was observed, first, that only the leaf and flower were used for plant classification, whereas, for bamboo, there is not a single public dataset for either the leaf or the culm. Second, it was found that the performance of several CNN architectures was inconsistent with respect to the number of deep layers and the parameters used. With this motivation, we first created a bamboo database for ten commercial species, each with 200 images. Second, we devised MajMulVot, an ensemble approach that uses the combined expertise of many models to improve the classification accuracy by a large margin. The proposed MajMulVot approach was applied to five deep learning models with different architectures, namely AlexNet, VGG16, ResNet18, VGG19, and GoogleNet; in the same way, we applied the MajMulVot algorithm to five machine learning models, namely LR, SVM, KNN, DT, and RF. We also adopted a 5-fold CV protocol, applied to both the ML and DL models, for training purposes. Compared to the existing DL models, the DL-based MajMulVot method demonstrated a notable improvement in mean performance on the bamboo image datasets: ACC: 4.05%, F1S: 2.09%, SEN: 7.2%, SPC: 2.58%, PPV: 1.98%, and NPV: 13.16%. Similarly,
ML-MajMulVot also enhanced the mean performance on the two datasets as follows: DS1 (ACC: 99.59, F1S: 99.24, SEN: 99.45, SPC: 99.97, PPV: 99.95, and NPV: 99.91) and DS2 (ACC: 86.96, F1S: 85.93, SEN: 84.05, SPC: 95.08, PPV: 87.54, and NPV: 91.74). Inconsistencies in accuracy were detected in relation to the number of layers in the DL models; this suggests that a large number of layers in a CNN model is not always needed for outstanding performance. Moreover, the DL models were compared to the ML models in terms of their overall performance using four different datasets, of which DS1, DS2, and IDCBS10 were bamboo datasets and one was a benchmark dataset. Our experimental results show that the proposed DL-based MajMulVot algorithm performs well on both bamboo and rice image datasets [98].

Challenges for Dataset Creation
Creating a dataset for bamboo is a very challenging task because, to create a morphological dataset, we had to communicate with different stakeholders, such as farmers, bamboo experts, artisans, and many other people who work in the various bamboo industries. We used a questionnaire that was very helpful in creating this dataset. We also had to visit different locations where varieties of bamboo are grown in massive amounts. Site selection was thus a major problem for us, because the bamboo sites were far from the research institute. In addition, we had to collect images of bamboo under different environmental conditions so that the developed AI model would be robust; we thus visited these sites many times in different seasons and weather conditions.

ML/DL Methods Challenges and Solutions
In the ML method, feature creation is the major challenge. For this, we used LBP, GLCM, and many other techniques to create the features used for training and testing. Further, data quality is also essential and can be affected by factors such as noisy and incorrect data, overfitting, and underfitting. For this, we created non-biased, balanced data that could be further cleaned via exploratory data analysis (EDA). On the other hand, a primary benefit of CNNs is automatic feature selection. Because the learned features are largely invariant to shifts, rotations, and translations, they are robust against the variations seen within the same class of data. Therefore, a CNN was chosen as the best classifier model for the bamboo classification system. Since DL works well on large datasets, it is desirable to adopt it; for this, we created a large image dataset so that the classification model would obtain a good result.
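As an illustration of the texture-feature step, a basic LBP descriptor can be sketched in a few lines (our simplification; the study may use library implementations with rotation-invariant variants and additional GLCM features):

```python
import numpy as np

def lbp_image(img):
    """Basic 8-neighbour local binary pattern: each interior pixel is encoded
    as an 8-bit number by comparing its 8 neighbours to the centre
    (bit = 1 where neighbour >= centre)."""
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]  # clockwise from top-left
    h, w = img.shape
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offs):
        neigh = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= ((neigh >= img[1:h - 1, 1:w - 1]).astype(np.uint8) << bit)
    return codes

def lbp_histogram(img, bins=256):
    """Texture feature vector: normalized histogram of the LBP codes."""
    codes = lbp_image(img)
    hist = np.bincount(codes.ravel(), minlength=bins).astype(float)
    return hist / hist.sum()

feat = lbp_histogram(np.random.default_rng(0).integers(0, 256, (64, 64)))
```

The resulting 256-bin histogram is one row of the structured texture dataset that the ML classifiers are trained on.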
Furthermore, choosing the layers and their parameter values for a CNN architecture is very difficult. Instead of constructing a CNN from scratch and selecting parameters such as layers, filters, pooling techniques, and nonlinearity functions, opting for an existing pre-trained CNN network can save a significant amount of time and effort. According to several researchers, another benefit of pre-trained models is that the varied domains of their training datasets improve the CNN's capacity for generalization. Therefore, with the help of the transfer learning concept, we chose five pre-trained CNN models, AlexNet, VGGNet16, ResNet18, VGGNet19, and GoogleNet, for the implementation of our automated bamboo species classification system. Another problem in DL is overfitting, which happens mostly when the training dataset is limited; to overcome this problem, we used the most popular technique, data augmentation.
In many research articles, a CNN is described as a black box because it has complex layers of architecture and lacks theoretical support and internal comprehension. CNNs primarily present two problems: first, it is not apparent which weights or patterns are learned by the convolutional kernels; second, the architecture-related parameters are uncertain, including the layer depth, kernel size, and pooling approach. As a result, not much work has been done to fully comprehend CNNs, although this is currently a popular area of investigation. Despite these restrictions, CNNs are employed in many real-world applications, such as self-driving vehicles, medical image processing, etc.

Strengths, Weaknesses, and Future Studies
As discussed earlier, no public dataset was available for bamboo; this was overcome by creating our own real bamboo dataset. Second, relatively few single or multiple ML/DL models have been studied for bamboo classification. In this work, we adopted five DL- and five ML-based models with the MajMulVot algorithm to improve the classification performance on the bamboo dataset. In addition, the DL technique was compared to the ML technique, with the DL technique demonstrating remarkable performance. This illustrates that automated feature creation and selection (DL) is better than handcrafted feature-based approaches (ML), a finding that may save effort and time in the feature creation and selection process. Furthermore, the final DL model with MajMulVot gave the best result when we compared and validated it using the UC Merced Land Use Dataset. This illustrates that the MajMulVot approach outperforms the others when multiple models are available.
MajMulVot is independent of the number of layers and the model architecture; however, if a time-efficient model that enhances the overall performance is required, then it is necessary to select base models with lightweight parameters. Furthermore, we applied the proposed model to a dataset of ten species with only 2000 samples; our model would obtain more accurate results if datasets with more species and more sampled images were available.

Benchmarking the Proposed Methodology with an Existing State-of-the-Art Method
We compared the best results of the proposed DL-MajMulVot algorithm to those of various prior works in this area. Section 2 of the literature review discusses the studies being compared. Our proposed DL-based MajMulVot approach significantly outperformed the other available techniques.

Conclusions
In this study, we first found that while manual recognition is the traditional method of bamboo classification, it is suitable only for experts in the bamboo domain. Second, this process is often laborious and time-consuming. Therefore, to perform this task in an automated manner, we created two bamboo datasets: the first morphological and the second image-based. We applied ML and DL models in combination with the majority multiclass voting (MajMulVot) ensemble algorithm to these datasets. The ML classification models performed well on the bamboo morphological dataset DS1 compared to all the other ML and DL models on the other bamboo datasets, such as the texture (DS2) and image (IDCBS10) datasets. ML gave the highest accuracy of 99.95% with K5-CV and 99.12% without K5-CV on DS1. For the DL models, we used the transfer learning approach to improve the robustness of the final model. The proposed ML and DL methods with the MajMulVot approach enhanced the bamboo classification performance. The average accuracy of the ML-MajMulVot models on DS1 was improved by 0.37%, 1.04%, 0.16%, 0.77%, and 0.62% and on DS2 by 0.35%, 0.44%, 0.11%, 0.36%, and 1.84%. Similarly, the average accuracy of the DL-MajMulVot models was improved on DS3 by 0.75%, 0.75%, 1.01%, 0.3%, and 0.8%. Furthermore, the proposed DL-based MajMulVot algorithm was validated on the rice image dataset [98], where improvements of 1%, 0.5%, 0.5%, 1%, and 1.5% in accuracy were seen against AlexNet, VGG16, ResNet18, VGG19, and GoogleNet, respectively. The proposed DL-based MajMulVot model performed well on both datasets, i.e., the bamboo image dataset and the rice image dataset, significantly improving the classification accuracy of the five DL models. This shows that the DL-MajMulVot method is better at classifying bamboo than the ML method.

Figure 2 .
Figure 2. Publication status of plant classification with respect to traditional and machine learning methods.

Figure 4 .
Figure 4. Ten bamboo species that were collected from a natural scene by a digital camera.

Figure 5 .
Figure 5. Visualization of five types of correlation.

Figure 6 .
Figure 6. The model architecture of the bamboo classification system with five-fold cross-validation.

Figure 8 .
Figure 8. Schematic illustration of convolution and max-pooling operation.

Figure 10 .
Figure 10. The working mechanism of the MajMulVot algorithm.

Figure 11 .
Figure 11. The effect of CNN layers on training time.

Figure 12 .
Figure 12. The effect of CNN layers on accuracy.

Figure 14 .
Figure 14. Performance comparison of ML/DL models with and without the MajMulVot algorithm.

Table 1 .
Different feature extraction techniques used by ML models.

Table 2 .
Description of all morphological attributes with a possible range of values.
Figure 3. Bamboo species distribution based on its morphological attributes.

Table 3 .
Description of the bamboo images' attributes.

Table 5 .
The statistical expressions of the performance criteria of the model.

Table 7 .
Performance of all five ML models on DS1 with/without K5-CV and MajMulVot.

Table 8 .
Performance of all five ML models on DS2 with/without K5-CV and MajMulVot.

Table 10 .
Performance of all five DL models on benchmark dataset (DS4) with/without 5-fold CV and MajMulVot.

Table 11 .
Performance of all five ML models on DS4 with/without K5-CV and MajMulVot.