Wild Chrysanthemums Core Collection: Studies on Leaf Identification

Abstract: Wild chrysanthemum germplasm collections vary mainly in leaf form, flower color, aroma, and secondary compounds. Leaf identification of wild chrysanthemums is critical for farm owners, breeders, and researchers, both during and outside the flowering period. However, the few existing chrysanthemum identification studies mostly concern flower color recognition. This study contributes a leaf classification method that rapidly recognizes the varieties of wild chrysanthemums through a support vector machine (SVM). The principal contributions of this article are: (1) a collection method and a verified chrysanthemum leaf dataset; (2) an adjusted SVM model that handles the complex backgrounds of smartphone pictures by using color and shape features, with classification results that improve on the original process. As our study shows, the proposed method is viable for real smartphone pictures and can help further investigation of chrysanthemum identification.


Introduction
Chrysanthemum (Chrysanthemum sp.), which belongs to the Asteraceae family, is a floricultural crop with high economic value, and is second only to roses in the floral trade market [1]. Wild chrysanthemums are attractive because of their flower color, leaf shape and type, and their secondary compounds, which are the main characteristics for distinguishing floral crops in horticultural studies [1,2]. Chrysanthemum boreale, a diploid species, displays small yellow flowers and shows a variety of morphological characteristics in natural growth in Korea [2]. Wild C. boreale is present in numerous habitats, including Gangwon-do, Gyeonggi-do, Gyeongsangbuk-do, Gyeongsangnam-do, and Jeollabuk-do in Korea [2]. C. indicum consists of diploid, tetraploid, or hexaploid populations, presents small yellow ray florets, and is known to be used in treatments of hypertension, inflammation, and respiratory disorders [2]. C. indicum, which has been studied for its flowering time, yields, and bioactive compounds, shows a range of leaf shapes in wild habitats [2]. In Korea, wild C. indicum is found in Incheon, Jeollabuk-do, Gyeongsangbuk-do, Chungcheongbuk-do, and Jeju-do. C. makinoi, a diploid wild species, has been mentioned as one of the progenitors of the cultivated hexaploid chrysanthemum varieties that are presently produced worldwide [2]. Wild C. makinoi grows in the sedimentary rock region of Gangwon-do and Daegu in Korea. C. zawadskii, which has white ray florets or white-purple flowers, is the major and most popular variety of wild chrysanthemum in Korea [2]. The ploidy level of C. zawadskii ranges from diploid to octoploid. In Korea, C. zawadskii is found in Gangwon-do, Gyeonggi-do, Gyeongsangbuk-do, and Gyeongsangnam-do. Aster spathulifolius is a perennial herb of the Asteraceae family that grows wild in the seaside regions of Korea [2]. It has vivid white or light blue ray florets and thick leaves. A. spathulifolius is diploid. In Korea, it is found in Ulleungdo, Busan, and Jeju-do.
Chrysanthemum classification systems have been organized by the International Union for the Protection of New Varieties of Plants (UPOV). However, assessing morphological characteristics is time-consuming, and only a few chrysanthemum classification systems have been investigated. Leaf shape, which varies among wild chrysanthemums, is one of the main classification characteristics [3]. However, manual phenotype identification has some drawbacks: first, the common methods are not automatic, because floral experts or owners must inspect plants by hand in the wild; second, chrysanthemums have many leaf shapes, and the count of individual shapes assigned to the same type is error-prone because leaf shape is judged by the human eye. Therefore, proper leaf shape identification tools still require time, investigation, and application.
Machine learning (ML) allows computers to learn various features in order to perform a given task automatically [4,5]. Previously, various automatic recognition systems were built by applying different ML algorithms [6], using Support Vector Machines (SVM) and spectral vegetation data [7]. A framework combining image processing methods and ML was applied to classify five different plant leaf diseases [8]. These automatic recognition techniques reported identification accuracies ranging from 83% to 94%. Computer-based image analysis can be implemented to extract morphological features, such as those of leaves, for botanical identification [9]. Solving taxonomic problems has traditionally required extensively trained specialists who use visual identification as the primary method. One study used 40 leaves from 30 trees, as well as shrub species from 19 different families, to compare scanner and mobile phone pictures based on color, shape, and texture [10]. The devices were compared using three ML algorithms (adaptive boosting (AdaBoost), random forest, and SVM) and an artificial neural network model (deep learning) [10]. Computer vision identified species efficiently (accuracy higher than 93%), with similar results achieved for both mobile phones and scanners [10]. SVM, random forest, and deep learning outperformed AdaBoost [10]. Smartphone-assisted disease diagnosis has also been applied to identify crop diseases with deep-learning methods. Based on a public dataset of 54,306 images of diseased and healthy plant leaves collected under controlled conditions, a deep convolutional neural network was trained to identify 14 crop species and 26 diseases (or absence thereof). The trained model obtained an accuracy of 99.35% on a held-out test set. Training deep-learning models on increasingly large and publicly available image datasets presents a clear path toward smartphone-assisted crop disease diagnosis on a massive global scale [11].
In this study, we classify five chrysanthemum species by identifying five leaf shapes using an SVM tool. We hope that this tool is useful for the identification of wild chrysanthemum species within flower recognition. We emphasize that the shape-recognition tool for chrysanthemum leaves might be applied to chrysanthemum image identification, rapidly providing predicted data on the cultivar characteristics that correspond to the cultivar image system. Moreover, tools for input data are needed to upgrade real-time identification in the next iteration. In addition, it could be used to recognize individual species to improve germplasm material for chrysanthemum breeding. Figure 1 describes the five main processes applied to the leaf dataset: (1) data collection, (2) data partitioning, (3) feature extraction, (4) feature engineering, and (5) prediction model.

Plant Material and Data Collection
We collected four wild chrysanthemums: C. boreale, C. indicum, C. makinoi, and the common C. zawadskii (Figure 2). A. spathulifolius was used as a wild type belonging to the Asteraceae family (Figure 2). The dataset was collected in a greenhouse using an LG Q52 smartphone with a main quad camera: 48 MP, f/1.8 (wide), 1/2.0", 0.8 µm, PDAF; 5 MP, f/2.2, 115° (ultrawide), 1/5.0", 1.12 µm; 2 MP, f/2.4 (macro); and 2 MP, f/2.4 (depth). The dataset contains a total of 1317 images taken under the same shooting conditions at 9 am in a greenhouse at the Chrysanthemum Research Institute of Sejong University, Korea.


Data Partitioning
A total of 1317 images of 5 leaf types were collected at the end of the data collection process. The dataset used in this study is described in Table 1. The original dataset, which contains 1068 images of the 5 examined species, is divided into a training set (80%) and a validation set (20%). Finally, a testing set of an additional 249 images was collected to evaluate model performance.
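The 80/20 partition described above can be sketched as follows. This is only an illustration: the function name, the random seed, and the use of a plain shuffled split (rather than, for example, a split stratified by species) are assumptions, since the text does not say how the split was drawn.

```python
import random


def split_indices(n_images, train_fraction=0.8, seed=0):
    """Shuffle image indices and split them into training and validation lists.

    The seed and the unstratified shuffle are assumptions for illustration.
    """
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)
    n_train = int(n_images * train_fraction)
    return indices[:n_train], indices[n_train:]


# 1068 is the size of the original dataset reported in the text.
train_idx, val_idx = split_indices(1068)
print(len(train_idx), len(val_idx))
```

Under these assumptions the 1068 images are divided into 854 training and 214 validation indices; the 249 testing images are kept entirely separate.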

Feature Extraction
Before feature extraction, several preprocessing steps are applied to reduce noise and improve the extraction process. First, a Gaussian blur is applied to the original images to smooth them. Otsu's method is then applied to threshold the blurred images automatically [12]. A morphological closing operation is then performed to fill the small holes that may appear after thresholding. Finally, the shape boundary is extracted using the contours.
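To make the thresholding step concrete, the following is a minimal pure-Python sketch of Otsu's method, which picks the gray level that maximizes the between-class variance of the two resulting pixel groups. It is not the implementation used in the study; the function name and the flat list of 8-bit intensities are assumptions, and the blur, closing, and contour steps are omitted.

```python
def otsu_threshold(pixels):
    """Return the 8-bit gray level that maximizes between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with intensity <= t
    sum0 = 0  # summed intensity of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # no pixels in the lower class yet
        if w1 == 0:
            break     # no pixels left for the upper class
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


# Toy data (not from the study): a dark group and a bright group of pixels.
toy_pixels = [10] * 50 + [200] * 50
threshold = otsu_threshold(toy_pixels)
mask = [1 if p > threshold else 0 for p in toy_pixels]
```

Whether the leaf ends up as the 1-valued or the 0-valued region of the mask depends on whether it is brighter or darker than the background, which the text does not specify.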
By investigating the dataset, we conclude that it is challenging to perform leaf classification using only the raw image, because some types of leaves (C. indicum and C. makinoi) appear identical to the human eye at first glance. Therefore, additional features, such as shape, color, and texture, need to be extracted from the original images to support the classification process. In total, 17 features are extracted for the training process. Each feature is described below.



Shape Features
Shape features are important indicators for differentiating between species. This section extracts various shape-related features from the extracted contours: physiological width, physiological length, area, perimeter, aspect ratio, rectangularity, and circularity. These shape features are explained in Table 2. Even though leaves from two different species can have the same physical width or length, other shape features, such as perimeter and area, can also be used to differentiate them.

Table 2. Description and explanation for shape features extracted from the dataset.

Name | Explanation
Physiological width (w) | Physiological width of the leaf.
Physiological length | Physiological length of the leaf.
Area | Area of the leaf.
Perimeter (P) | Total length of the leaf's boundary curve.
Aspect ratio | The ratio of the leaf's width to its length.
Rectangularity | The variation of width and length with respect to the area.
Circularity | Measures the similarity of the leaf to a perfect circle; P² denotes the square of the perimeter.
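As an illustration of how these seven shape features could be derived from a contour, here is a small pure-Python sketch. Several choices are assumptions rather than the study's definitions: the contour is taken to be a closed polygon of (x, y) points, width and length are approximated by the axis-aligned bounding box, rectangularity is taken as (width × length) / area, and circularity as 4πA / P², which are common forms of these descriptors.

```python
import math


def shape_features(contour):
    """Compute shape descriptors from a closed contour of (x, y) points."""
    xs = [x for x, _ in contour]
    ys = [y for _, y in contour]
    # Assumption: bounding-box extents stand in for physiological width/length.
    width = max(xs) - min(xs)
    length = max(ys) - min(ys)

    signed_area = 0.0
    perimeter = 0.0
    n = len(contour)
    for i in range(n):
        x0, y0 = contour[i]
        x1, y1 = contour[(i + 1) % n]  # wrap around to close the polygon
        signed_area += x0 * y1 - x1 * y0          # shoelace formula term
        perimeter += math.hypot(x1 - x0, y1 - y0)  # edge length
    area = abs(signed_area) / 2.0

    return {
        "width": width,
        "length": length,
        "area": area,
        "perimeter": perimeter,
        "aspect_ratio": width / length,
        "rectangularity": (width * length) / area,            # assumed form
        "circularity": 4 * math.pi * area / perimeter ** 2,   # assumed form
    }


# Toy contour (not from the study): a 4 x 2 rectangle.
features = shape_features([(0, 0), (4, 0), (4, 2), (0, 2)])
```

With this circularity form, a perfect circle scores 1 and more elongated or lobed outlines score lower, which is one way to separate deeply lobed leaves from rounder ones.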


Color Features
Color is an important feature that can distinguish objects with very similar geometric properties with different colors and vice versa. Therefore, this section extracts various color features from the most common RGB color space that contains red, green, and blue components with a range of values from 0 to 255. Initially, individual red, green, and blue color components were separated from the original image. Each component's mean and standard deviation are then computed and used as the color features.
At the end of this process, a total of six color features are extracted: mean red, mean green, mean blue, standard deviation red, standard deviation green, and standard deviation blue.
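The six color features can be sketched in a few lines. This is a minimal illustration, assuming the leaf pixels are available as (R, G, B) tuples and that the population standard deviation is intended; the text does not state which variant was used or whether background pixels were excluded.

```python
from statistics import mean, pstdev


def color_features(pixels):
    """Return per-channel mean and standard deviation for (r, g, b) pixels."""
    reds, greens, blues = zip(*pixels)
    return {
        "mean_red": mean(reds),
        "mean_green": mean(greens),
        "mean_blue": mean(blues),
        # Assumption: population standard deviation (divide by N).
        "std_red": pstdev(reds),
        "std_green": pstdev(greens),
        "std_blue": pstdev(blues),
    }


# Toy pixels (not from the study).
toy = color_features([(0, 100, 50), (200, 100, 150)])
```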

Texture Features
The Zernike and Haralick features [13] were first extracted using the Mahotas image processing library to compute the texture features used in this study. Texture features, which describe the surface characteristics and appearance of an object in an image, are crucial for many computer vision tasks. Texture features can be extracted using various approaches, such as structural, statistical, and model-based techniques [14]. The most common method is the Gray Level Co-occurrence Matrix (GLCM), which offers 13 statistical measures of the spatial relationships of pixels in an image [15]. Various important textural features can be computed from the GLCM to expose details about the image content. Table 3 describes four textural features that are essential for leaf classification in this study.

Table 3. Description and explanation for texture features extracted from the dataset.

Name | Explanation
Contrast | Measures the intensity or gray-level variations between the reference pixel and its neighbor.
Correlation | Measures the gray-tone linear dependencies in the image.
Homogeneity | Measures the local homogeneity of an image.
Entropy | Measures the randomness of the gray-level distribution in the image.
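For readers unfamiliar with the GLCM, the sketch below builds a normalized, symmetric co-occurrence matrix for one pixel offset and derives four statistics corresponding to the rows of Table 3 (contrast, gray-tone linear dependency, local homogeneity, and entropy). It is a pure-Python illustration and not the Mahotas routine used in the study; the function names, the single horizontal offset, and the specific formulas (the usual Haralick-style definitions) are assumptions.

```python
import math


def glcm(image, levels, dx=1, dy=0):
    """Normalized symmetric co-occurrence matrix for the offset (dx, dy)."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count the pair in both directions
    total = sum(map(sum, counts))
    return [[value / total for value in row] for row in counts]


def texture_features(p):
    """Contrast, correlation, homogeneity, and entropy of a symmetric GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    mu = sum(i * v for i, _, v in cells)                # marginal mean
    var = sum((i - mu) ** 2 * v for i, _, v in cells)   # marginal variance
    covariance = sum((i - mu) * (j - mu) * v for i, j, v in cells)
    return {
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "correlation": covariance / var if var > 0 else 0.0,
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
    }


# Toy 2 x 2 image with two gray levels (not from the study).
toy_texture = texture_features(glcm([[0, 0], [1, 1]], levels=2))
```

In this toy image every horizontal neighbor pair has identical gray levels, so contrast is zero while homogeneity and correlation take their maximum value of 1.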

Feature Engineering
Seventeen individual features were collected for each image in the dataset after the feature extraction process. However, the value ranges of the features differed greatly. Therefore, StandardScaler, a common feature engineering technique, was applied to standardize each extracted feature to a mean of 0 and a standard deviation of 1.
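The standardization step amounts to subtracting each feature's mean and dividing by its standard deviation, z = (x − μ) / σ. The following pure-Python sketch shows the arithmetic on a tiny made-up feature table; it does not call the StandardScaler class itself, and the handling of constant columns is an assumption.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Scale each feature column to zero mean and unit standard deviation."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Population standard deviation; a constant column is divided by 1
    # instead of 0 (an assumption made to avoid division by zero).
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Two made-up samples with features on very different scales.
scaled = standardize([[1.0, 100.0], [3.0, 300.0]])
```

In practice the means and standard deviations would be estimated on the training set only and then reused to transform the validation and testing sets.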

Model Description
A support vector machine (SVM) is a supervised learning algorithm mostly used for classification, which has been shown to fit training data well and to classify unseen data accurately [16]. An SVM takes training samples and produces an optimal hyperplane that separates them into different classes and can then be used to classify new data points. In two dimensions, the hyperplane is a simple line.
As explained in the previous section, 17 features are extracted for each image, which increases the complexity and leads to a nonlinearity problem. Therefore, the kernel trick is introduced to the SVM to handle the nonlinear dataset by mapping it to a higher-dimensional space. Commonly available kernel functions include linear, polynomial, radial basis function (RBF), and sigmoid. This study implements the RBF kernel because it is localized and has a finite response along the complete x-axis [17]. The equation for the RBF kernel is given below.

$$K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^2\right)$$

where $\lVert x_i - x_j \rVert^2$ is the squared Euclidean distance between $x_i$ and $x_j$, and $\gamma$ represents the gamma parameter.
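As a small worked example of the RBF kernel, the sketch below evaluates it for two feature vectors: the squared Euclidean distance between them is multiplied by the gamma parameter, negated, and exponentiated. The function name and the toy vectors are illustrative assumptions.

```python
import math


def rbf_kernel(x_i, x_j, gamma):
    """RBF kernel value exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-gamma * squared_distance)


same = rbf_kernel([0.2, 1.5], [0.2, 1.5], gamma=0.5)  # identical vectors
near = rbf_kernel([1.0, 0.0], [0.0, 0.0], gamma=1.0)  # distance 1 apart
far = rbf_kernel([3.0, 0.0], [0.0, 0.0], gamma=1.0)   # distance 3 apart
```

Identical vectors give a kernel value of 1, and the value decays toward 0 as the vectors move apart; a larger gamma makes that decay faster, which is why gamma is tuned together with the other hyperparameters.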

Experimental Results and Discussion
The SVM model has some hyperparameters, which must be determined before training because a set of optimal parameters can significantly improve the model's performance.
In order to determine the optimal hyperparameter values, a grid search technique is first implemented on all possible hyperparameter combinations. After that, each set of hyperparameters is used to train the SVM model in order to find the one that helps the model obtain the highest performance. Finally, a 5-fold cross-validation, which randomly splits the training data into five non-overlapping subsets of equal size, is implemented to further improve the model's robustness. For each iteration, 4 subsets are utilized for training and the remaining subset is applied to evaluate the model. The final output is the mean value of the five folds. Table 4 illustrates the value ranges for each hyperparameter and the optimized parameter value after performing the grid search approach.
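The grid search with 5-fold cross-validation can be sketched as below. This is a schematic under stated assumptions: the candidate values for `C` and `gamma` are hypothetical (the actual ranges are those of Table 4), and `toy_score` is a placeholder standing in for "train an RBF SVM on the training folds and measure accuracy on the held-out fold".

```python
import itertools
import math
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Randomly split sample indices into k non-overlapping folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]  # fold sizes differ by at most 1


def grid_search(param_grid, n_samples, score_fn, k=5):
    """Return the parameter set with the highest mean score over k folds."""
    folds = k_fold_indices(n_samples, k)
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = []
        for held_out in range(k):
            val_idx = folds[held_out]
            train_idx = [i for f, fold in enumerate(folds)
                         if f != held_out for i in fold]
            scores.append(score_fn(params, train_idx, val_idx))
        mean_score = sum(scores) / k  # final output is the mean over folds
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


def toy_score(params, train_idx, val_idx):
    """Placeholder score peaking at C=10, gamma=0.1 (not a real SVM)."""
    return -abs(math.log10(params["C"]) - 1) - abs(math.log10(params["gamma"]) + 1)


# Hypothetical candidate values, for illustration only.
hypothetical_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]}
best_params, best_score = grid_search(hypothetical_grid, 50, toy_score)
```

Replacing `toy_score` with a function that fits a classifier on `train_idx` and returns its accuracy on `val_idx` gives the procedure described in the text.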

For each coefficient value, the grey field of the validation score is formed based on the lowest and highest scores of the 10 folds.

Testing Results
In this section, the model performance is tested on a manually collected testing set that was not used during the training process. Figure 4 visualizes the confusion matrix between the true and predicted labels for each leaf type. Overall, the model showed a high classification accuracy of over 80% on A. spathulifolius, C. boreale, C. indicum, and C. zawadskii. However, it performed poorly on the C. makinoi class, with an accuracy of 57%. It incorrectly predicted 15 C. makinoi images as A. spathulifolius and 7 C. makinoi images as C. boreale, which may be due to the similarities in shape and color between the C. makinoi leaf and those of the C. boreale and A. spathulifolius leaves.
We then visualize the model performance for two different scenarios, the frontside and the backside of the leaves (Figure 5). The results indicate that the model performed well on A. spathulifolius in both scenarios, most likely because its leaf shape differs from that of the other types. Conversely, the model predicted the backside of C. zawadskii and C. boreale significantly better than the frontside, with confidence rates of 90.3% and 62.6%, respectively. For C. makinoi, the backside shows higher confidence than the frontside, at 69%. Finally, for C. indicum, the model showed a higher confidence of 90% on the frontside compared to 63.4% on the backside.
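The per-class accuracies read from a confusion matrix, as reported earlier in this section, are the fraction of images of each true class that received the correct label. A minimal sketch follows; the label lists are made up for illustration and are not the study's results.

```python
def confusion_counts(true_labels, predicted_labels):
    """Count (true, predicted) label pairs."""
    counts = {}
    for t, p in zip(true_labels, predicted_labels):
        counts[(t, p)] = counts.get((t, p), 0) + 1
    return counts


def per_class_accuracy(true_labels, predicted_labels):
    """Fraction of each true class that was predicted correctly."""
    counts = confusion_counts(true_labels, predicted_labels)
    totals = {}
    for (t, _), n in counts.items():
        totals[t] = totals.get(t, 0) + n
    return {t: counts.get((t, t), 0) / n for t, n in totals.items()}


# Made-up labels for two hypothetical classes (not data from the study).
toy_true = ["class_a", "class_a", "class_b", "class_b"]
toy_pred = ["class_a", "class_b", "class_b", "class_b"]
toy_accuracy = per_class_accuracy(toy_true, toy_pred)
```

The off-diagonal entries of `confusion_counts` show which classes are confused with which, the kind of information used above to explain the C. makinoi errors.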
In roses, a Convolutional Neural Network (CNN) was used to identify and classify flower characteristics [18]. In flower identification work, ten species (Anthurium, Bougainvillea, Dianthus, Euphorbia, Ixora, Jatropha, Petunia, Phlox, Periwinkle, and Tecoma) were classified by flower characteristics using the Faster Region-based Convolutional Neural Network (Faster R-CNN) and the Single Shot Detector (SSD) [19]. Furthermore, a CNN application has been used to classify 43 different plant flowers in smartphone pictures, with up to 90% accuracy [20].
The featured plant leaf extraction, such as gas analysis, uses healthy and dead leaves to identify the plant leaves using a CNN. With the Hue, Saturation, Value (HSV) model, the accuracy of the proposed model is almost 98% [34]. A smartphone application was developed to identify four kinds of herbs, fruits, and vegetable plants available in Sri Lanka using leaf features such as shape, texture, and color [35]. Five machine learning algorithms (SVM, Multilayer Perceptron, Random Forest, K-Nearest Neighbors, and Decision Tree) achieved accuracies of 85.82%, 82.88%, 80.85%, 75.45%, and 64.39%, respectively [35]. According to these results, the SVM and Multilayer Perceptron algorithms exhibited satisfactory performance [35].
However, chrysanthemum leaf identification has not received enough attention from the research community. In our study, a suitable tool was applied to wild chrysanthemum identification, and our results show that it performs adequately.

Conclusions
In this study, we classify five wild chrysanthemum leaf-shape types using a manually collected smartphone dataset. The confidence percentage for A. spathulifolius is the highest in both scenarios. For the remaining four wild chrysanthemums, differences between the frontside and backside are used to identify the leaves. Overall, we find that our model is suitable for wild chrysanthemum leaf identification. However, this is the first study of chrysanthemum leaf identification, and we plan to extend the proposed model to identify over 200 wild chrysanthemum individuals. In the future, we hope more attention will be given to the development of leaf identification systems on smartphone devices, because these devices are useful and easily accessible to farm owners, breeders, and researchers.