Variety Identification of Single Rice Seed Using Hyperspectral Imaging Combined with Convolutional Neural Network

The feasibility of using hyperspectral imaging with a convolutional neural network (CNN) to identify rice seed varieties was studied. Hyperspectral images of 4 rice seed varieties at two different spectral ranges (380–1030 nm and 874–1734 nm) were acquired. The spectral data at the ranges of 441–948 nm (Spectral range 1) and 975–1646 nm (Spectral range 2) were extracted. K nearest neighbors (KNN), support vector machine (SVM) and CNN models were built using different numbers of training samples (100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1500, 2000, 2500 and 3000). KNN, SVM and CNN models in Spectral range 2 performed slightly better than those in Spectral range 1. Model performance improved as the number of training samples increased, but the improvements were not significant once the number of training samples was large. The CNN models performed better than the corresponding KNN and SVM models in most cases, which indicated the effectiveness of using CNN to analyze spectral data. The results of this study showed that CNN could be adopted in spectral data analysis with promising results. More varieties of rice need to be studied in future research to extend the use of CNNs in spectral data analysis.


Introduction
Rice is one of the most common food crops planted in China and some other countries. Rice seeds are harvested as food for consumers and as seed for the following sowing season. Because rice is planted over vast areas across villages, towns, countries and continents, different varieties of rice seeds have been developed to adapt to different growth environments (climate, soil, water, etc.) and to improve nutrition and flavor. With the development of breeding techniques, more varieties of rice seeds are being brought to market, and the purity of rice seeds is critical for planters and consumers.
Different varieties of rice seed vary in physical and chemical characteristics, growth performance and stress tolerance. Rice seed varieties can be identified by inspecting appearance characteristics such as size, color, shape and texture, or by determining quality attributes such as protein, starch and aroma. Traditional methods for rice variety identification, such as High Performance Liquid Chromatography (HPLC) and Gas Chromatography-Mass Spectrometry (GC-MS), are either time-consuming or expensive, and are generally applied only to a small number of sampled seeds [1,2]. Thus, a rapid method to identify rice seed varieties is needed.
Hyperspectral imaging is a technique integrating visible/near-infrared spectroscopy and imaging to acquire spectral and spatial information of the samples simultaneously. A hyperspectral image

Hyperspectral Imaging Acquisition
Hyperspectral image acquisition was conducted on a visible/near-infrared hyperspectral imaging system covering the spectral range of 380-1030 nm and a near-infrared hyperspectral imaging system covering the spectral range of 874-1734 nm.
The two systems were integrated in one platform. In the platform, two 150 W tungsten halogen lamps (3900 Lightsource, Illumination Technologies Inc., Elbridge, NY, USA) were used for illumination of the two systems; a conveyer belt driven by a stepper motor (Isuzu Optics Corp., Taiwan, China) was used to move the samples for line scanning. The two systems were controlled by two data acquisition and preprocessing software packages (Spectral Image-V10E and Spectral-Image-Xenics 17E, Isuzu Optics Corp., Taiwan, China).
To acquire clear and undistorted hyperspectral images, the moving speed of the conveyer belt, the exposure time of the camera and the height between the lens of the camera and the samples for the visible/near-infrared hyperspectral imaging system were adjusted to 2.3 mm/s, 0.04 s and 22.2 cm, respectively. The corresponding settings for the near-infrared hyperspectral imaging system were adjusted to 18.8 mm/s, 3 ms and 20 cm, respectively.

Hyperspectral Image Correction
The acquired hyperspectral images should be corrected to reduce the influence of dark current [16]. Raw hyperspectral images can be corrected by the following equation:
$$I_c = \frac{I_{raw} - I_{dark}}{I_{white} - I_{dark}}$$
where $I_c$ is the corrected image, $I_{raw}$ is the raw image, $I_{dark}$ is the dark reference image acquired by covering the camera lens completely with its opaque cap, and $I_{white}$ is the white reference image acquired using a white Teflon tile with nearly 100% reflectance.
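As an illustration of the correction step, the following minimal sketch applies the standard dark/white reference correction to one pixel's spectrum. The function name, the zero-division guard and the sample numbers are hypothetical and are not taken from the study.

```python
def correct_reflectance(raw, dark, white, eps=1e-12):
    """Return (raw - dark) / (white - dark) for each band of one pixel."""
    corrected = []
    for r, d, w in zip(raw, dark, white):
        denom = w - d
        # Assumption: bands where white and dark references coincide are set to 0
        corrected.append((r - d) / denom if abs(denom) > eps else 0.0)
    return corrected


# Made-up digital numbers for three bands of a single pixel
raw = [520.0, 1310.0, 2080.0]
dark = [100.0, 110.0, 80.0]
white = [4100.0, 4110.0, 4080.0]
print(correct_reflectance(raw, dark, white))
```

In practice the same arithmetic would be applied to every pixel and band of the image, typically with an array library rather than Python lists.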

Spectra Extraction and Preprocessing
Spectral information was extracted from the hyperspectral images after image correction. Owing to limitations of the hyperspectral imaging systems, the reflectance values of the first and last several bands are noisy and unreliable. Thus, only the reliable part of each spectrum was used for further analysis. The spectral range of 441-948 nm was selected for the visible/near-infrared hyperspectral images and the spectral range of 975-1646 nm was selected for the near-infrared hyperspectral images. Wavelet transform (WT) is an efficient method for spectral preprocessing [17]. In this study, WT using Daubechies 8 as the basis function with a decomposition level of 3 was applied as a preprocessing step to obtain smoothed spectra. After preprocessing of the spectral dimension, image segmentation was applied to the spatial dimensions. Because the reflectance values of the black background were generally close to zero, the pixels of rice seeds could be easily segmented by thresholding. Each group of connected foreground pixels was then regarded as one rice seed. Finally, for each rice seed, all pixel-wise spectra were averaged into one spectrum for further analysis.
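The segmentation and averaging steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the threshold is applied to a single band, 4-connectivity is used to group pixels, and all names and values are hypothetical.

```python
from collections import deque


def segment_seeds(mask):
    """Label 4-connected foreground regions of a 2D boolean mask (labels start at 1)."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and labels[r][c] == 0:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill of one seed
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def mean_spectra(cube, threshold, band=0):
    """cube[r][c] is the spectrum of one pixel; return one mean spectrum per seed."""
    # Assumption: background is detected by thresholding a single band
    mask = [[px[band] > threshold for px in row] for row in cube]
    labels, count = segment_seeds(mask)
    n_bands = len(cube[0][0])
    sums = [[0.0] * n_bands for _ in range(count)]
    sizes = [0] * count
    for r, row in enumerate(cube):
        for c, px in enumerate(row):
            k = labels[r][c]
            if k:
                sizes[k - 1] += 1
                for b, value in enumerate(px):
                    sums[k - 1][b] += value
    return [[s / n for s in total] for total, n in zip(sums, sizes)]


# Toy 3 x 4 image with two bands: one two-pixel seed and one single-pixel seed
bg = [0.01, 0.02]
cube = [
    [[0.4, 0.6], bg, bg, bg],
    [[0.6, 0.8], bg, bg, bg],
    [bg, bg, bg, [0.5, 0.5]],
]
print(mean_spectra(cube, threshold=0.1))
```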

Support Vector Machine
Support vector machine (SVM) is a supervised pattern recognition method [18]. Because it handles both linear and nonlinear data efficiently, SVM has been widely used in spectral data analysis. SVM maps the original data into a higher dimensional space and constructs a hyperplane or a set of hyperplanes to maximize the distance between the nearest samples of different classes. The selection of an appropriate kernel function is essential and affects the performance of SVM. The radial basis function (RBF) is a widely used kernel function that can deal with nonlinear data efficiently. To conduct SVM using the RBF kernel, the penalty coefficient C and the kernel parameter g should be determined to obtain optimal performance, and a grid-search procedure is generally used to determine them. In this study, a grid-search procedure was applied to optimize C and g, and the search range of both C and g was from $2^{-8}$ to $2^{8}$.
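To make the search space concrete, the sketch below enumerates a candidate grid for C and g and evaluates the RBF kernel for a pair of spectra. The exponent step of 1 is an assumption (the text gives only the range), and the helper name `rbf_kernel` is hypothetical.

```python
import math
from itertools import product


def rbf_kernel(x, y, g):
    """RBF kernel K(x, y) = exp(-g * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-g * squared_distance)


# Assumption: exponents -8, -7, ..., 8 (step 1), giving 17 candidates per parameter
exponents = range(-8, 9)
grid = [(2.0 ** c, 2.0 ** g) for c, g in product(exponents, exponents)]

print(len(grid), grid[0], grid[-1])
print(rbf_kernel([0.2, 0.4], [0.2, 0.4], g=2.0 ** -3))
```

Each (C, g) pair in `grid` would then be scored with an SVM implementation of choice, for example by cross-validation on the training set, and the best-scoring pair retained.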

K-Nearest Neighbor
K-nearest neighbor (KNN) is a widely used pattern recognition method [19]. It calculates the distances between an unknown sample and the samples in the predefined training set. The number of nearest samples (k) to the unknown sample is manually defined. The category of the unknown sample is determined by the categories of its k nearest samples in the training set. The determination of k is crucial for KNN. In this study, k was optimized by comparing KNN models using k from 3 to 20 with a step of 1.
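The selection of k can be sketched as below. This is a minimal illustration that assumes Euclidean distance, majority voting and a separate validation set for comparing values of k; the text does not specify these details, and the toy data are made up.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Predict the label of `query` by majority vote among its k nearest samples."""
    neighbours = sorted(
        ((math.dist(x, query), label) for x, label in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


def select_k(train_x, train_y, val_x, val_y, candidates=range(3, 21)):
    """Return the candidate k with the highest validation accuracy, and that accuracy."""
    best_k, best_acc = None, -1.0
    for k in candidates:
        hits = sum(knn_predict(train_x, train_y, q, k) == y for q, y in zip(val_x, val_y))
        acc = hits / len(val_y)
        if acc > best_acc:  # the first k reaching the best accuracy is kept
            best_k, best_acc = k, acc
    return best_k, best_acc


# Toy two-class data: 12 points per class, well separated
train_x = [(0.1 * i, 0.0) for i in range(12)] + [(5.0 + 0.1 * i, 5.0) for i in range(12)]
train_y = [0] * 12 + [1] * 12
val_x, val_y = [(0.5, 0.2), (5.5, 4.8)], [0, 1]
print(select_k(train_x, train_y, val_x, val_y))
```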

Convolutional Neural Network
Convolutional Neural Network Architecture
The architecture of our convolutional neural network (CNN) is shown in Figure 2. We adapted the design of VGGNet [20] for one-dimensional spectral inputs. Patterns in spectral curves share some characteristics with image patterns; for example, peaks and minima in a spectral curve are analogous to edges in an image. VGGNet was chosen for its high performance in image classification tasks and because its modular design makes it easy to modify and extend.
In Figure 2, an example input of 200 spectral bands is used to show the output size of each block. There are five main blocks in the architecture. The first four blocks are convolutional blocks (Conv Blocks), each of which consists of two consecutive convolutional layers followed by a max pooling layer. As the blocks go deeper, the number of convolutional filters doubles from block to block (starting from 16 and ending with 128). All convolutional layers use a kernel size of 3, a stride of 1 and a padding of 1. A convolutional layer has local connections to its input and can be trained to learn local patterns. By chaining convolutional layers together, deeper layers are connected to a larger part of the raw input. Thus, different layers "see" the raw input and learn features at different levels. The last block contains a fully connected layer (FC Block), which is used to learn combinations of the features extracted by the convolutional layers. Finally, a dropout layer and a dense layer were added as the output layer [21].
The original VGGNet architecture used the rectified linear unit (ReLU) as its activation function. The exponential linear unit (ELU) was reported to speed up learning and outperform ReLU [22]. In our experiment, ELU activation showed better performance than ReLU activation with batch normalization [23]. Thus, ELU is utilized in the architecture. The ELU function is defined as
$$f(x) = \begin{cases} x, & x > 0 \\ \alpha\,(e^{x} - 1), & x \le 0 \end{cases}$$
The output of the CNN is followed by a softmax function to produce values in the range [0, 1] as classification confidence scores. A classification loss is then calculated by comparing the confidence scores and the true labels of the samples. The softmax function and the loss function are defined as follows,
$$\mathrm{Softmax}(z_{ij}) = \frac{e^{z_{ij}}}{\sum_{k=1}^{K} e^{z_{ik}}}$$
$$\mathrm{Loss} = -\sum_{i} \sum_{j=1}^{K} y_{ij} \log \mathrm{Softmax}(z_{ij})$$
where z denotes the output of the CNN, i denotes a sample, j denotes a class, K denotes the total number of classes, and $y_{ij}$ equals 1 if sample i belongs to class j and 0 otherwise.
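The following sketch traces the feature-map size through the four Conv Blocks for the 200-band example and evaluates ELU and the softmax cross-entropy for a single sample. It is illustrative only: the pooling size of 2 with floor division is an assumption (the text does not state it), the loss is written for one sample, and all function names are hypothetical.

```python
import math


def trace_shapes(n_bands, filters=(16, 32, 64, 128), pool=2):
    """Return (channels, length) after each Conv Block for a 1D input of n_bands."""
    length, shapes = n_bands, []
    for n_filters in filters:
        for _ in range(2):
            # Kernel 3, stride 1, padding 1: the length is preserved
            length = (length + 2 * 1 - 3) // 1 + 1
        # Assumption: non-overlapping max pooling with window `pool`
        length //= pool
        shapes.append((n_filters, length))
    return shapes


def elu(x, alpha=1.0):
    """Exponential linear unit: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * math.expm1(x)


def softmax(z):
    """Numerically stable softmax over the class scores of one sample."""
    shift = max(z)
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(z, true_class):
    """Softmax cross-entropy of one sample given the index of its true class."""
    return -math.log(softmax(z)[true_class])


print(trace_shapes(200))
print(elu(-1.0), softmax([1.0, 2.0, 3.0, 4.0]), cross_entropy([0.0, 0.0, 0.0, 0.0], 2))
```

With four equally scored classes the loss equals log(4), the value expected from an uninformed classifier on a four-variety problem.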

CNN Training
Before training, the spectra were normalized by subtracting the mean and dividing by the standard deviation. The mean and the standard deviation of the training data were saved and reused to preprocess the test data. The weights of the CNN were initialized using the strategy introduced in [24]. Training was carried out by optimizing the softmax cross-entropy loss using the Adam algorithm [25]. During training, the learning rate (η) was gradually reduced according to a decay schedule in which η_0 denotes the initial learning rate, t denotes the number of epochs, and k controls the speed of learning rate reduction. A grid search was applied to find the best combination of hyperparameters. Finally, the batch size was set to 256, the α in ELU was set to 1.0 and the dropout ratio was set to 0.5. The CNN was trained for 200 epochs with η_0 = 0.0005 and k = 0.045.
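The normalization step can be sketched as follows. The example assumes that the mean and standard deviation are computed per spectral band and that the population standard deviation is used; the text does not specify either choice, and the function names and numbers are hypothetical.

```python
from statistics import mean, pstdev


def fit_scaler(train_spectra):
    """Compute the per-band mean and standard deviation from the training spectra only."""
    bands = list(zip(*train_spectra))
    means = [mean(band) for band in bands]
    # Assumption: a constant band (zero spread) is left unscaled
    stds = [pstdev(band) or 1.0 for band in bands]
    return means, stds


def apply_scaler(spectra, means, stds):
    """Normalize spectra with statistics saved from the training data."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in spectra]


train = [[0.2, 0.4], [0.4, 0.8], [0.6, 1.2]]
test = [[0.4, 0.8]]
means, stds = fit_scaler(train)
print(apply_scaler(train, means, stds))
print(apply_scaler(test, means, stds))  # test data reuse the training statistics
```

Reusing the training statistics for the test set keeps information about the test data out of the preprocessing step.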

Spectral Profiles
After spectra extraction and spectral preprocessing, rice seeds were randomly divided into a training set with 3000 samples of each variety and a hold-out test set with 2664, 2394, 1933, and 1916 samples of Xiushui 134, Zhejing 99, Zhongjiazao 17 and Zhongzao 39, respectively. To train models with different sizes of training set, subsets of 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1500, 2000, 2500 and 3000 samples of each variety were selected from the full training set. The test performance of all models was evaluated on the same hold-out test set.
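One way to draw the training subsets is sketched below. It assumes that the subsets are nested (each larger subset contains the smaller ones) and drawn at random per variety; the text does not say how the subsets were selected, so both choices, along with the function name and seed, are hypothetical.

```python
import random

VARIETIES = ["Xiushui 134", "Zhejing 99", "Zhongjiazao 17", "Zhongzao 39"]
SIZES = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1500, 2000, 2500, 3000]


def nested_subsets(indices_by_variety, sizes, seed=0):
    """Return {size: [(variety, sample_index), ...]} with `size` samples per variety."""
    rng = random.Random(seed)
    subsets = {n: [] for n in sizes}
    for variety, indices in indices_by_variety.items():
        shuffled = list(indices)
        rng.shuffle(shuffled)
        for n in sizes:
            # Taking a prefix of one shuffle makes the subsets nested
            subsets[n].extend((variety, i) for i in shuffled[:n])
    return subsets


full_training_set = {name: range(3000) for name in VARIETIES}
subsets = nested_subsets(full_training_set, SIZES)
print({n: len(samples) for n, samples in subsets.items()})
```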
The spectral ranges of 441-948 nm (Spectral range 1) for the visible/near-infrared system and 975-1646 nm (Spectral range 2) for the near-infrared system were analyzed. The average spectra of the 4 varieties of rice seeds in the two spectral ranges are shown in Figure 3a,b, and the corresponding average spectra preprocessed by second derivative (polynomial order: 2, smoothing points: 7) are presented in Figure 3c,d.
In the two spectral ranges, only slight differences existed among the average spectra of different varieties. As seen from the second derivative spectra in the Spectral range 1, more obvious differences could be observed at the wavelengths of 451, 456, 463, 478, 678, 698 and 716 nm. These wavelengths were mainly related to the color variations of the rice seeds. In the Spectral range 2, the wavelengths at 1136, 1177, 1251, 1325, 1386, 1440, 1494 and 1619 nm showed greater differences. The wavelengths at 1136, 1177 and 1251 nm are attributed to the second overtone of C-H stretching mode [26]. The wavelength at 1386 nm (around 1387 nm) might be attributed to the first overtone of stretching and anti-symmetric O-H bond [27]. The wavelengths at 1440 and 1494 nm are attributed to water [26,28]. The wavelength at 1619 nm (around 1620 nm) might be attributed to first overtone of C-H group absorption [29]. However, for both preprocessed and unpreprocessed average spectra, the spectral differences among different seed varieties were not significant.

Classification Results of Different Models
Discriminant models were built using KNN, SVM and CNN. To evaluate the effectiveness of these models, training sets with different numbers of samples were used. Training sets with 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1500, 2000, 2500 and 3000 samples selected from the total training samples of each rice seed variety were used to build discriminant models, and the same test set (2664, 2394, 1933, and 1916 rice seeds of Xiushui 134, Zhejing 99, Zhongjiazao 17 and Zhongzao 39) was used for every training set size. The overall results are presented in Figure 4.
As shown in Figure 4, the classification accuracy of the training and test sets of the different models increased with the number of samples in the training set. Some fluctuations could be observed in the classification accuracy of the training set. Once the number of samples reached a critical value, the rate of increase in classification accuracy became small. The critical values differed between models. The models using 3000 training samples performed best, since a large training set is expected to contain a more extensive set of feature combinations and feature values than a small training set.
It can be seen in Figure 5 that the KNN models performed much worse than the other two models, with test accuracies lower than 60% in both Spectral range 1 and Spectral range 2. For the KNN, SVM and CNN models, the classification accuracy of the training and test sets increased with the number of training samples, although some fluctuations existed in the classification accuracy of the training sets (especially for the KNN and SVM models). KNN models using different numbers of training samples obtained worse results than the corresponding SVM and CNN models. For SVM and CNN models, the classification results were close when the same number of samples was used. As the number of training samples increased, the CNN models outperformed the corresponding SVM models. The CNN model using 3000 samples in Spectral range 2 obtained the best results, with classification accuracies of 89.6% and 87.0% for the training and test sets, respectively. For all methods, models trained with reflectance values in Spectral range 2 performed better than models trained with reflectance values in Spectral range 1.
As seen from Figure 3, no typical spectral peaks or minima could be found in the average spectra in Spectral range 1, while obvious spectral peaks or minima were found in the average spectra in Spectral range 2. The differences in second derivative spectra in Spectral range 1 were mainly attributed to the color variations of the rice husk, and the differences in second derivative spectra in Spectral range 2 were mainly attributed to the chemical composition of the rice seeds. This might be the reason why models in Spectral range 1 performed slightly worse than those in Spectral range 2. Tables 1 and 2 show the confusion matrices of the SVM and CNN models using 3000 training samples of each rice seed variety. The performances of the KNN models were much worse than those of the SVM and CNN models, so their confusion matrices are not presented. As shown in Tables 1 and 2, rice seeds of variety 1 (Xiushui 134) and variety 2 (Zhejing 99) were more likely to be misclassified as each other, and rice seeds of variety 3 (Zhongjiazao 17) and variety 4 (Zhongzao 39) showed a similar pattern. This might be caused by the difference in rice types: varieties 1 and 2 are japonica rice, while varieties 3 and 4 are indica rice. Compared with SVM, CNN produced fewer misclassified samples in each variety. In both spectral ranges, CNN outperformed SVM in overall accuracy and per-class accuracy. The results indicated that CNN was effective for spectral data analysis and could improve the general classification accuracy.
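The overall and per-class accuracies discussed above can be derived from a confusion matrix as in the following sketch. The labels and predictions are made-up toy values chosen only to show the computation; they are not results from Tables 1 and 2, and the function names are hypothetical.

```python
def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


def per_class_accuracy(matrix):
    """Fraction of each true class that was predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


def overall_accuracy(matrix):
    """Fraction of all samples on the diagonal."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


# Toy example: classes 0 and 1 are confused with each other, as are classes 2 and 3
true_labels = [0, 0, 0, 1, 1, 1, 2, 2, 3, 3]
predicted = [0, 1, 0, 1, 1, 0, 2, 3, 3, 3]
cm = confusion_matrix(true_labels, predicted, n_classes=4)
for row in cm:
    print(row)
print(per_class_accuracy(cm), overall_accuracy(cm))
```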

Discussion
Machine learning methods play essential roles in interpreting spectral data for different spectroscopy and spectral imaging techniques. Deep learning is currently a very active area of AI research, and convolutional neural networks (CNNs) are among the most popular deep learning models. Deep learning methods have generally been used to analyze two-dimensional images. In this study, CNN was applied to one-dimensional spectra, and the CNN models performed well, outperforming the KNN and SVM models. The results of this study indicate that CNN can handle spectral data effectively, which provides a new alternative for spectral data analysis. Acquarelli et al. (2017) proposed a CNN architecture for vibrational spectroscopic data analysis, and partial least squares-discriminant analysis (PLS-DA), logistic regression and KNN were used for comparison. The CNN model yielded satisfactory results: it performed best in most cases, although not in all [14]. Liu et al. (2017) proposed a CNN architecture to classify preprocessed and unpreprocessed Raman spectra. The CNN models obtained the best performance when compared with KNN, SVM, gradient boosting, random forest and correlation [15]. These studies, together with the results in this manuscript, indicate that CNN can be used to deal with one-dimensional spectral data.
As can be seen from the confusion matrices, models performed better in distinguishing two rice varieties of different types (e.g., rice 1 and rice 3) than in distinguishing two rice varieties of the same type (e.g., rice 1 and rice 2). Moreover, models of Spectral range 1 performed better than models of Spectral range 2 in classifying rice varieties 3 and 4 (both indica rice), while models of Spectral range 2 performed better than models of Spectral range 1 in classifying rice varieties 1 and 2 (both japonica rice). This indicates that the models could potentially be improved by combining the two spectral ranges. However, hyperspectral images of the two spectral ranges were scanned by two different hyperspectral cameras. To combine the two spectral ranges, the rice seeds in the two corresponding hyperspectral images would have to be matched, which might be time-consuming in practical use. In this study, only 4 varieties of rice seeds were used. Since CNNs have achieved high performance in the classification of thousands of image classes, classification of more rice seed varieties using a CNN might also be possible. The cost is that a large number of samples would have to be collected for each rice seed variety. More varieties of rice seeds and even more crop species should be studied in future research.
The influence of the number of training samples was also studied. Generally, the performance of machine learning methods increases with the number of training samples, and a lack of training samples limits the test performance of trained models. While test performance increases with the number of training samples, the improvement might not be significant beyond a certain point because of the redundancy of information contained in the training samples. Moreover, the sample collection procedure can be time-consuming. Thus, it is important to balance the performance and the cost of models. As shown in Figure 5, overall, model performance improved with the number of training samples for KNN, SVM and CNN. As the number of training samples increased, the CNN models became superior to the corresponding SVM models. Deep learning methods can learn features automatically, and more samples allow more potential feature combinations to be explored. For models trained with fewer than 1500 samples, performance improved significantly with the number of training samples. For models trained with more than 1500 samples, performance improvements were not significant. In practical applications, models should be built to identify many more rice varieties. To achieve high model performance at a reasonable sample collection cost, it is better to keep a hold-out test set and then gradually collect samples for training until the test accuracy no longer changes significantly.
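The suggested collection strategy can be sketched as a simple stopping rule. The threshold used to decide that a change is no longer significant (0.5 percentage points here) and the accuracy values are hypothetical; they only illustrate the procedure and are not results from this study.

```python
def stop_size(sizes, test_accuracies, min_gain=0.5):
    """Return the smallest training size after which the gain drops below min_gain.

    sizes: increasing numbers of training samples per variety.
    test_accuracies: hold-out test accuracy (%) measured at each size.
    """
    for i in range(1, len(sizes)):
        if test_accuracies[i] - test_accuracies[i - 1] < min_gain:
            return sizes[i - 1]
    # The accuracy was still improving at the largest size tried
    return sizes[-1]


# Made-up learning curve for illustration
sizes = [500, 1000, 1500, 2000, 2500]
accuracies = [70.0, 78.0, 83.0, 83.3, 83.4]
print(stop_size(sizes, accuracies))
```

A more careful version would repeat each measurement several times and compare the gain with the run-to-run variation before stopping.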

Conclusions
In this study, hyperspectral imaging in two different spectral ranges was used to identify rice seed varieties. The performance of three machine learning methods, namely KNN, SVM and CNN, was tested. Models built with reflectance values in Spectral range 2 performed better than models built with reflectance values in Spectral range 1. The influence of the number of training samples was also studied. As the size of the training set increased, the CNN models outperformed the other two models. The results showed that, with the help of fast sample collection using hyperspectral imaging, CNN could be an effective method for spectral data classification. In future research, more rice varieties should be studied to extend the use of CNNs in spectral data analysis.