Article

Locality Preserved Selective Projection Learning for Rice Variety Identification Based on Leaf Hyperspectral Characteristics

1
College of Information and Intelligence, Hunan Agricultural University, Changsha 410128, China
2
Hunan Provincial Engineering and Technology Research Center for Rural and Agricultural Informatization, Hunan Agricultural University, Changsha 410128, China
3
Hunan Agricultural Equipment Research Institute, Hunan Academy of Agricultural Sciences, Changsha 410011, China
4
College of Agronomy, Hunan Agricultural University, Changsha 410128, China
*
Author to whom correspondence should be addressed.
Agronomy 2023, 13(9), 2401; https://doi.org/10.3390/agronomy13092401
Submission received: 13 August 2023 / Revised: 12 September 2023 / Accepted: 15 September 2023 / Published: 17 September 2023
(This article belongs to the Section Precision and Digital Agriculture)

Abstract

Rice occupies an important position in China and worldwide. With the wide application of rice hybridization technology, the mixing of individual varieties has become an increasingly prominent problem, so rice variety identification is important for agricultural production, phenotype collection, and scientific breeding. Traditional identification methods are highly subjective and time-consuming. To address this issue, we propose a novel locality preserved selective projection learning (LPSPL) method for non-destructive rice variety identification based on leaf hyperspectral characteristics. The proposed LPSPL method selects the most discriminative spectral features from the leaf hyperspectral characteristics of rice, which helps to distinguish different rice varieties. In the experiments, a support vector machine (SVM) is adopted to conduct rice variety identification based on the selected spectral features. The experimental results show that the proposed method achieves identification rates of 96% for early rice and 98% for late rice, which are superior to those of some state-of-the-art methods.

1. Introduction

Nearly half of the world’s population feeds on rice, which is also one of the most important staple foods in East Asia and Southeast Asia, feeding 21% of the world’s population on 7% of the arable land [1]. With the increase in population and the decrease in arable land resources, the pressure on food security is growing. The quality and yield of rice are directly related to the rice variety. At the same time, with the wide application of hybrid rice technology, the problem of mixing among varieties has become more prominent. Therefore, rapid identification of rice varieties has become an important link in both agricultural production and rice variety breeding [2]. However, how to identify rice varieties efficiently and accurately remains a topic requiring further research. The traditional field planting and morphological observation methods are heavily influenced by subjectivity and require a long experimental period.
In recent years, several variety identification methods have been proposed to make up for the shortcomings of traditional methods. Wang et al. proposed a single nucleotide polymorphism (SNP) molecular identification system for grape varieties, whose identification accuracy reached 95.69% on 348 grape accessions [3]. Meng et al. applied simple sequence repeat (SSR) markers to the identification of sweet potato varieties and successfully constructed an identification system for 203 sweet potato varieties [4]. Kuang et al. developed SNP markers for the identification of upland cotton varieties [5]. Zhang et al. used gas chromatography–mass spectrometry (GC-MS) to screen the characteristics of fragrant soybeans, on which a soybean variety-identification model was established [6]. Although the physicochemical detection, molecular marker, and electrophoresis methods achieve higher accuracy, they are time-consuming and require considerable expertise [7].
Hyperspectral technology has been widely used in agriculture, food, and other fields [8]. Because crops differ in growth stage, growing environment, and genotype, the cell structure, pigment content, and water content of their leaves change accordingly, which in turn affects the absorption, reflection, and transmission of light and eventually leads to different spectral curve characteristics [9]. Based on these spectral differences, different crop varieties can be identified. Hu et al. analyzed and processed leaf hyperspectral data for rice variety identification [10]. Wu et al. distinguished oat seed varieties based on hyperspectral imaging (HSI), reaching an accuracy of 99.19% [11]. Feng et al. used hyperspectral images of raisins to distinguish different varieties and grades [12]. Jin et al. used near-infrared hyperspectral technology combined with machine learning and deep learning methods to recognize five common rice varieties [13].
Although hyperspectral data provide more comprehensive information, they suffer from problems such as large data volume and redundant information [14]. While the bands are rich in information, there is often a strong correlation between them, especially between adjacent bands, which leads to information redundancy. The redundant information in hyperspectral data can be removed by reducing the dimensionality, and the processing efficiency of hyperspectral data improves markedly after dimensionality reduction [15]. Therefore, reducing the data dimensionality is one of the key issues that must be addressed when using hyperspectral technology for crop variety identification. There are two common ways to reduce dimensionality: feature extraction [16] and feature selection [17,18].
Several feature extraction methods have been used to deal with hyperspectral data, such as linear discriminant analysis (LDA) and principal component analysis (PCA). Ji et al. used LDA to extract features from hyperspectral images of potato and realized lossless classification of potato defects with a support vector machine [19]. Zhang et al. used neighborhood component analysis (NCA) to extract feature wavelengths from hyperspectral data, which were combined with a classifier to classify rice seeds with different degrees of frost damage [20]. Wang et al. used PCA for feature extraction from hyperspectral images and then combined it with a back propagation neural network (BPNN) for rice variety identification [21]. However, PCA and LDA are classical linear dimensionality reduction methods, which are not good at processing nonlinear high-dimensional hyperspectral data.
To address this issue, manifold learning methods capable of capturing the nonlinear structure of data have been proposed and widely applied in hyperspectral data analysis. Locality preserving projection (LPP) is one of the most popular manifold learning methods; its goal is to search for a projection that maps the high-dimensional data into a low-dimensional subspace while maintaining the neighboring relationships between samples in the raw data space. Unfortunately, LPP is developed on the $L_2$-norm and is sensitive to noise, which may amplify the influence of outliers. To improve the robustness of LPP, Pang et al. [22] proposed LPP-$L_1$, which replaces the $L_2$-norm with the $L_1$-norm as the measure criterion. However, the constraints remain unchanged, and solving this model is still difficult and time-consuming. Apart from the above problems, LPP and its existing extensions lack a constraint on the projection, which limits the projection’s interpretability. LPP only projects the original data into a lower-dimensional subspace and does not reflect which features are critical for forming the low-dimensional features. All in all, the physical meaning of the extracted features is not preserved, and thus they lack interpretability [23].
Feature selection, another kind of dimensionality reduction method, selects a feature subset from the original features and thus preserves the original physical meaning of the features. Zhang et al. used the random frog (RF) algorithm to perform feature selection on the infrared hyperspectral data of rice, combined with a machine learning model to identify rice grains damaged by rice weevil [24]. Zhu et al. adopted the competitive adaptive reweighted sampling (CARS) method to select features from hyperspectral data and combined it with a classifier to identify soybean seeds [25]. He et al. used the successive projections algorithm (SPA) to select the best wavelengths in the hyperspectral data of rice seeds and combined it with a classifier to construct a model for classifying the vigor levels of rice seeds [26]. Shao et al. used the Boruta algorithm for band selection on corn hyperspectral images and then identified maize varieties using the random forest algorithm [27]. However, most previous feature selection methods attempt to select the important features by weight ranking without considering the intrinsic manifold structure of the hyperspectral data.
Recently, researchers have paid more attention to methods based on the $L_{2,1}$-norm [28]. The advantage of these methods is that they perform feature extraction and feature selection jointly. To that end, this paper proposes locality preserved selective projection learning (LPSPL), which combines the manifold learning model LPP with the $L_{2,1}$-norm to learn a selective projection for rice variety identification.

2. Materials

2.1. Experimental Site

The hyperspectral data of the rice leaves were obtained from Liuyang City, Hunan Province, China. Liuyang has a subtropical monsoon climate, and the city’s mean annual precipitation is 1400–1800 mm. The soil indicators for growing the early and the late rice are shown in Table 1.

2.2. Experimental Design

The experiment was conducted in a randomized block design with three replications. The mechanical transplanting method was used for both the early and the late rice; the transplant spacing was 12 cm × 25 cm for the early rice and 14 cm × 25 cm for the late rice. Each plot covered 20 square meters, and the ridges between plots were covered with plastic film down to their base to prevent water and fertilizer from spilling over, thus ensuring the independence of each plot.

2.3. Data Acquisition

We collected the hyperspectral data of the rice leaves using the Field Spectrometer 3 and a leaf clamp from ASD (Longmont, CO, USA). The wavelength range of the Field Spectrometer 3 is 350–2500 nm. Measurements were made around noon under sufficient light: the Field Spectrometer 3 was carried behind the operator, and the leaf clamp was held by hand so that it stayed perpendicular to the rice leaf. The collected data were transferred to the computer in real time and saved in ASD format to a preset path, after which the ViewSpecPro software (Version 6.2.0) was used to convert the measured values and export the results to a txt file. At the heading stage, 20 representative leaves from plants with consistent growth and free of diseases and pests were selected from fixed points in each experimental plot; the same selection strategy was applied at the full-heading and grain-filling stages. The spectral values of the top, middle, and bottom of each leaf were measured, and their average was taken as the spectral data of that leaf. The flowchart of the data acquisition is shown in Figure 1.

3. Methods

3.1. Locality Preserved Selective Projection Learning (LPSPL)

As a representative method of manifold learning, LPP has been widely applied to the dimensionality reduction of hyperspectral data. The idea of LPP is to learn a projection matrix that preserves the original neighboring relationships among sample points in the low-dimensional subspace. The main advantage of LPP is that it can capture the local manifold structure of hyperspectral data for projection learning. However, the learned projection is generally sensitive to noise, since the model is developed on the $L_2$-norm. More unfortunately, the learned projection matrix cannot reflect which features are critical for forming the lower-dimensional features. In other words, LPP maps the original data into a low-dimensional subspace with weak interpretability.
In fact, the leaf hyperspectral data described in Section 2 were collected from rice grown under different environments and thus inevitably contain noise. Using LPP for leaf hyperspectral data processing is therefore still unsatisfactory. Moreover, the low-dimensional features extracted by LPP from rice leaf hyperspectral data are less interpretable and cannot help to discover the bands important for rice variety identification. To address these issues, this paper develops a projection learning method that jointly performs feature extraction and feature selection for the robust dimensionality reduction of rice leaf hyperspectral data, resulting in better rice variety identification performance. Specifically, we construct a novel locality preserved selective projection learning (LPSPL) method by integrating LPP with the $L_{2,1}$-norm. In detail, the objective function is expressed as
$$\min_{W} \sum_{i,j}^{N} \left\| W^{\top} x_i - W^{\top} x_j \right\|_2^2 S_{ij} + \lambda \left\| W \right\|_{2,1} \quad \text{s.t.} \quad W^{\top} X D X^{\top} W = I \tag{1}$$
where $W \in \mathbb{R}^{m \times k}$ is the projection matrix, $X = [x_1, \ldots, x_N] \in \mathbb{R}^{m \times N}$ denotes the observation matrix of the rice leaf hyperspectral data, each column $x_i \in \mathbb{R}^{m}$ is an original hyperspectral feature vector of a rice leaf, $N$ is the number of samples, $m$ is the dimension of the samples, $S_{ij}$ is an element of the weight matrix representing the similarity between the $i$-th and $j$-th samples, $D$ is a diagonal matrix with $D_{ii} = \sum_j S_{ij}$, and $\lambda$ is a balance parameter. Remarkably, in the LPSPL model, the $L_{2,1}$-norm term encourages the rows of the projection matrix $W$ to be sparse, which makes the projection matrix more interpretable. Specifically, the zero rows of the projection matrix correspond to unimportant or redundant features, and the non-zero rows carry the importance weights of the features that are helpful for rice variety identification.
In order to solve the above model, some simple mathematical derivations are needed. Firstly, we define a diagonal matrix $G$ as

$$G_{ii} = \frac{\lambda}{2 \left\| w^{i} \right\|_2} \tag{2}$$
where $w^{i}$ denotes the $i$-th row of the matrix $W$. Then, the minimization of $\lambda \left\| W \right\|_{2,1}$ is converted into the following equivalent trace minimization problem:

$$\min_{W} \lambda \left\| W \right\|_{2,1} = \min_{W} \operatorname{tr}\left( W^{\top} G W \right) \tag{3}$$
Referring to Equation (9) in [29], the first term of Equation (1) can be transformed into the following form:

$$\frac{1}{2} \sum_{i,j}^{N} \left\| W^{\top} x_i - W^{\top} x_j \right\|_2^2 S_{ij} = \operatorname{tr}\left( W^{\top} X L X^{\top} W \right) \tag{4}$$

where $L = D - S$ is the graph Laplacian matrix.
According to (3) and (4), the optimization problem (1) can be converted into the following equivalent problem:

$$W^{*} = \arg\min_{W} \operatorname{tr}\left( W^{\top} X L X^{\top} W \right) + \operatorname{tr}\left( W^{\top} G W \right) = \arg\min_{W} \operatorname{tr}\left( W^{\top} \left( X L X^{\top} + G \right) W \right) \tag{5}$$
Thus, the optimal solution of (5) can be obtained by solving the standard eigen-decomposition problem:

$$\left( X L X^{\top} + G \right) w = \alpha w \tag{6}$$
where $\alpha$ is an eigenvalue and $w$ is the corresponding eigenvector. The optimal solution $W$ consists of the eigenvectors corresponding to the $k$ smallest non-zero eigenvalues. The details of solving the LPSPL model are given in Algorithm 1.
Algorithm 1: LPSPL method
Input: Rice hyperspectral data $X \in \mathbb{R}^{m \times N}$, the number of iterations $T$
Output: Low-dimensional features $\tilde{X}$ and projection $W$
Step 1: Initialize the projection $W$ to the identity matrix.
Step 2: For $t = 1{:}T$
  Step 2.1: Calculate and update the diagonal elements of $G$ using (2).
  Step 2.2: Compute $X L X^{\top} + G$ and obtain the projection $W$ using (6).
  Step 2.3: Check whether the objective function has converged. If it has, break out of the loop; otherwise, return to Step 2.1.
End
Step 3: Project the samples onto the low-dimensional subspace, $\tilde{X} = W^{\top} X$, for classification.
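To make Algorithm 1 concrete, the following is a minimal NumPy sketch of the LPSPL iteration. It is illustrative rather than the authors' implementation: the similarity matrix S is assumed to be given (e.g., built from a k-nearest-neighbor graph), λ is placed in the numerator of Eq. (2), and the plain eigen-decomposition of Eq. (6) is solved in place of the constrained problem of Eq. (1). The function name `lpspl` and its default parameters are hypothetical.

```python
import numpy as np

def lpspl(X, S, lam=0.1, k=30, n_iter=300, tol=1e-6):
    """Sketch of Algorithm 1 (LPSPL). X is (m, N) with columns as samples;
    S is an (N, N) similarity matrix; lam is the balance parameter λ."""
    m, _ = X.shape
    D = np.diag(S.sum(axis=1))            # degree matrix, D_ii = Σ_j S_ij
    L = D - S                             # graph Laplacian L = D − S
    XLX = X @ L @ X.T
    W = np.eye(m)[:, :k]                  # Step 1: initialise the projection
    prev_obj = np.inf
    for _ in range(n_iter):               # Step 2
        # Step 2.1: reweighting matrix G from current row norms of W (Eq. 2)
        row_norms = np.maximum(np.linalg.norm(W, axis=1), 1e-12)
        G = np.diag(lam / (2.0 * row_norms))
        # Step 2.2: eigenvectors of XLX^T + G for the k smallest eigenvalues (Eq. 6)
        vals, vecs = np.linalg.eigh(XLX + G)
        W = vecs[:, :k]
        # Step 2.3: stop once the objective no longer changes
        obj = np.trace(W.T @ (XLX + G) @ W)
        if abs(prev_obj - obj) < tol:
            break
        prev_obj = obj
    return W, W.T @ X                     # Step 3: low-dimensional features
```

Selecting the discriminative bands then amounts to ranking the rows of the returned W by their L2 norms and keeping the largest ones.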

3.2. LPSPL for Rice Variety Identification

The illustration of rice variety identification based on leaf hyperspectral data via LPSPL is presented in Figure 2.
The input leaf hyperspectral data contain m bands, where m = 2151 in this paper. Firstly, the original leaf hyperspectral data are normalized into the range [0, 1] by min-max normalization. The purpose of normalization is to eliminate the scale differences among the dimensions of the data, which lays the foundation for the subsequent processing. Secondly, the proposed LPSPL method is adopted to extract features from the normalized hyperspectral data of rice leaves. As such, the top k (k ≪ m) most important features are selected from the raw leaf hyperspectral characteristics, and their linear combination is computed to obtain the final low-dimensional features. Specifically, the final dimension k was set to 30 in our experiments. With the resulting features, classical classifiers (such as k-nearest neighbors (KNN), SVM, and so on) can be employed to accomplish rice variety identification. Without loss of generality, this study utilized the SVM classifier to identify the rice varieties based on the low-dimensional features obtained from the leaf hyperspectral data via LPSPL.
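The min-max normalization step can be sketched as follows. This is a hypothetical helper, not the authors' code, and it assumes samples are stored as rows and bands as columns:

```python
import numpy as np

def minmax_normalize(X):
    """Scale each band (column) of X to [0, 1]; X is (N, m), rows = samples."""
    mn = X.min(axis=0)
    rng = np.ptp(X, axis=0)          # per-band range: max - min
    rng[rng == 0] = 1.0              # guard: a constant band maps to 0
    return (X - mn) / rng
```

Each band is rescaled independently, so bands with very different reflectance magnitudes contribute on an equal footing to the subsequent projection learning.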

3.3. Support Vector Machine Classifier

The SVM classifier was proposed to nonlinearly map an input vector into a high-dimensional space and construct a linear decision plane there, whose special properties ensure the generalization ability of the model [30]. It not only shows unique advantages in solving small-sample, nonlinear, and high-dimensional pattern identification problems, but also largely overcomes traditional problems such as the curse of dimensionality and over-fitting [31]. SVM first obtains the optimal solution by solving a dual problem equivalent to the original problem. Then, the model is extended to nonlinear classification problems by introducing a kernel function:
$$\max_{\alpha} L(z, b, \alpha) = \max_{\alpha} \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j k(x_i, x_j) \quad \text{s.t.} \quad \sum_{i=1}^{n} \alpha_i y_i = 0, \; 0 \le \alpha_i \le C, \; i = 1, 2, \ldots, n \tag{7}$$
In Equation (7), z is the normal vector that determines the direction of the hyperplane, b is the displacement term which represents the distance between the origin and the hyperplane, and α is a Lagrangian multiplier. The decision function is given as
$$f(x) = \sum_{i=1}^{n} \alpha_i y_i k(x_i, x) + b \tag{8}$$
where $k(\cdot, \cdot)$ is a kernel function. In particular, the widely used radial basis function (RBF) kernel is adopted in this paper.
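For illustration, the decision function of Eq. (8) with the RBF kernel can be written directly in NumPy. This sketch assumes the dual variables α, the labels y, the support vectors, and the bias b have already been obtained by a solver; the function names and the γ parameter are hypothetical:

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return np.exp(-gamma * np.sum((np.asarray(x) - np.asarray(y)) ** 2))

def svm_decision(x, support_X, alpha, y, b, gamma=1.0):
    """Decision function of Eq. (8): f(x) = Σ_i alpha_i y_i k(x_i, x) + b."""
    return sum(a * yi * rbf_kernel(xi, x, gamma)
               for a, yi, xi in zip(alpha, y, support_X)) + b
```

In practice, a library implementation (e.g., an RBF-kernel SVC) would estimate α and b from the training data; the sign of f(x) then gives the predicted class in the binary case.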

3.4. Control Methods and Evaluation Indicators

3.4.1. Control Methods

In order to verify the effectiveness of the proposed LPSPL method for rice variety identification, we chose three popular dimensionality reduction methods, i.e., PCA, LPP, and t-distributed stochastic neighbor embedding (t-SNE), for experimental comparison. Meanwhile, the original data without dimensionality reduction were taken as the baseline. To quantitatively evaluate the performance of the involved dimensionality reduction methods, the overall accuracy (OA), f1-score, and kappa coefficient were calculated for comparative verification.
The main goal of PCA is to construct new linearly independent features and maximize the diversity of the retained principal components simultaneously. In other words, PCA tries to exploit the global structure information for recombining the original many correlated features into a new set of uncorrelated comprehensive indicators to replace the original ones. In general, the dimension of the new features is much smaller than the dimension of the original ones.
LPP is a typical manifold learning method, which is developed on the assumption that examples in small local neighborhoods have similar properties. LPP achieves the dimensionality reduction of high-dimensional data in two steps. A weighted adjacency graph is first constructed to capture the locality structure information. Then, a minimization problem is optimized to learn a projection matrix that can effectively preserve the locality information captured in the adjacency graph during the process of mapping the high-dimensional data into a low-dimensional subspace.
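The first of LPP's two steps, the weighted adjacency graph, can be sketched as follows. This is an illustrative NumPy snippet, not the authors' code; the heat-kernel bandwidth `sigma` and neighborhood size `n_neighbors` are hypothetical parameters:

```python
import numpy as np

def knn_similarity(X, n_neighbors=5, sigma=1.0):
    """Heat-kernel weights on a k-NN graph; X is (N, m), rows = samples."""
    N = X.shape[0]
    # pairwise squared Euclidean distances
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    S = np.zeros((N, N))
    for i in range(N):
        nn = np.argsort(d2[i])[1:n_neighbors + 1]   # skip the sample itself
        S[i, nn] = np.exp(-d2[i, nn] / (2 * sigma ** 2))
    return np.maximum(S, S.T)                        # symmetrise the graph
```

A matrix built this way can also serve as the similarity matrix S required by the LPSPL objective in Section 3.1.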
The t-SNE is an unsupervised and nonlinear dimension-reduction method. It is capable of capturing much of the local structure of the high-dimensional data very well, while also revealing the global structure [32].

3.4.2. Evaluation Indicators

In this paper, three evaluation indicators are introduced to quantitatively evaluate the performance of the involved methods. The first one is the OA, whose definition is
$$OA = \frac{\sum_{i=1}^{N} I\left( \hat{y}_i = y_i \right)}{N}$$
Here, y i ^ is the predicted label obtained from the model, y i is the true label of the sample, and N is the total number of samples in the dataset.
The second one is the f1-score, whose definition is
$$f1\text{-}score = \frac{2 \times precision \times recall}{precision + recall}$$
Here, recall indicates how many of the positive examples in the total sample are correctly predicted, and its formula is $recall = \frac{TP}{TP + FN}$. Precision indicates how many of the samples predicted to be positive are truly positive, and its formula is $precision = \frac{TP}{TP + FP}$. The f1-score combines precision and recall; the larger its value, the more robust the model. Table 2 shows the meanings of TP, FP, FN, and TN.
The third indicator is the kappa coefficient, which judges whether the model predictions are consistent with the actual classification results and is calculated from the confusion matrix. Its formula is $Kappa = \frac{p_o - p_e}{1 - p_e}$, where $p_o$ is the overall accuracy and $p_e$ is the expected agreement by chance.
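The three indicators can be computed from predictions in a few lines of NumPy. This is a sketch with hypothetical helper names; `f1_binary` assumes a binary 0/1 labelling, whereas the paper averages over the ten varieties:

```python
import numpy as np

def overall_accuracy(y_true, y_pred):
    """OA: fraction of samples whose predicted label matches the true label."""
    return np.mean(np.asarray(y_true) == np.asarray(y_pred))

def f1_binary(y_true, y_pred):
    """f1-score for a binary 0/1 labelling."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def kappa(y_true, y_pred):
    """Cohen's kappa: agreement corrected for chance."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes = np.unique(np.concatenate([y_true, y_pred]))
    po = np.mean(y_true == y_pred)                     # observed agreement
    pe = sum(np.mean(y_true == c) * np.mean(y_pred == c) for c in classes)
    return (po - pe) / (1 - pe)
```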

4. Experimental Results

In this part, extensive experiments were conducted to validate the LPSPL method. The classification accuracy of the proposed method is compared with that of the t-SNE, the PCA, the LPP, and the baseline under different numbers of bands and different training ratios, and then a comprehensive evaluation is performed with the OA, the f1-score, and the kappa coefficient.

4.1. Datasets

The early rice dataset used for the experiment has 10 rice varieties, each containing 60 samples, and each sample has 2151 hyperspectral characteristics. The number of samples in the late rice dataset is the same as in the early rice dataset. The full name of each variety is shown in Figure 3. For convenience, the short names Z1, Z2, …, Z10 refer to early rice varieties 1 to 10, and W1, W2, …, W10 refer to late rice varieties 1 to 10; these short names are used throughout Section 4.

4.2. Experiment Setup

The data were divided into early rice samples and late rice samples. Each rice variety was split into training and prediction sets at ratios of 3:7, 5:5, 7:3, and 9:1; that is, 30%, 50%, 70%, and 90% of the data were used for training and the rest for testing. The training set was used to fit the rice variety identification model, and the prediction set was used to verify its accuracy. SVM was used as the classifier in all experiments. The optimal λ value in the LPSPL method was selected from the range {0.0001, 0.01, 0.1, 1, 10, 100, 1000}; experiments showed that the optimal λ was 0.001 for the early rice data and 0.1 for the late rice data. Moreover, each method was run 10 times independently, and the average values were taken as the experimental results.
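Because each variety is split at the same ratio, the division amounts to a stratified split. The following is a hypothetical helper illustrating this, not the authors' code:

```python
import numpy as np

def stratified_split(y, train_ratio, rng):
    """Split sample indices so each variety keeps the same training proportion."""
    train_idx, test_idx = [], []
    for c in np.unique(y):
        idx = rng.permutation(np.where(y == c)[0])   # shuffle within the class
        cut = int(round(train_ratio * len(idx)))
        train_idx.extend(idx[:cut])
        test_idx.extend(idx[cut:])
    return np.array(train_idx), np.array(test_idx)
```

Re-running the split with a fresh random state for each of the 10 independent runs and averaging the resulting scores matches the evaluation protocol described above.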

4.3. Feature Selection Based on LPSPL

For the hyperspectral data of rice leaves in the range of 350 nm–2500 nm, we used the LPSPL method and set the maximum number of iterations to 300. The late rice data finally converged at 229 iterations, and the early rice data finally converged at 152 iterations. The number of hyperspectral bands to be selected was set to 30, and then the 30 most important bands were selected by the LPSPL algorithm. The selected band indexes are shown in Table 3.

4.4. Comparison of the Classification Accuracy with Other Methods

The same training datasets were used to train the five models, which were then evaluated on the test datasets. Figure 4a shows the recognition accuracies of early rice varieties obtained by the five models at different training ratios, where the recognition accuracies of the LPSPL method are 46.24%, 64%, 80.89%, and 96% at training ratios of 30%, 50%, 70%, and 90%, respectively. Figure 4b shows the recognition accuracies of the five models for late rice varieties at the four training ratios, where the recognition accuracies of the LPSPL method are 56.48%, 71.47%, 85.33%, and 98.00% under 30%, 50%, 70%, and 90% training ratios, respectively.
Table 4 shows the accuracies of each early rice variety, which were obtained from all models under 30% and 50% training ratios. Table 5 shows the variety identification rate of different methods under 70% and 90% training ratios. It can be seen that the OA acquired by LPSPL is the highest among all training ratios.
The classification accuracy of the early rice varieties obtained by the five models at different numbers of features are shown in Figure 5. Here, Figure 5a represents the accuracy of the early rice dataset at the training ratio of 30%, Figure 5b represents the accuracy at the 50% training ratio, Figure 5c represents the accuracy at the 70% training ratio, and Figure 5d represents the accuracy at the 90% training ratio.
From the classification accuracy under different feature dimensions for the early rice dataset, it can be found that the t-SNE model has a certain advantage in a low dimension. But with the increase in the dimension, the OA achieved by the t-SNE model has no advantage, whereas the LPSPL model shows a great advantage. When the training ratio is 30%, the LPSPL model has the highest accuracy of 51.81% with the number of features being 15. Under the training ratio of 50%, the LPSPL model has the highest accuracy of 68.07% for the feature number of 15. While for the training ratio of 70%, the LPSPL model has the highest accuracy of 83.56% for the feature number of 15. For the training percentage of 90%, the LPSPL model has the highest accuracy of 97.67% under the feature number of 18.
Table 6 represents the identification rates for each class of the late rice varieties obtained by the five models under training ratios of 30% and 50%. Table 7 shows the identification rates of each type of late rice variety obtained by the five models at 70% and 90% training ratios, in which at the 90% training ratio, except for the two varieties of taoyou412 and changliangyou1408, the identification rates of the other eight varieties reach 100% by using the LPSPL method.
Figure 6 shows the identification accuracies of late rice variety obtained by the five models at different numbers of features, ranging from 1 to 30. Here, Figure 6a represents the identification accuracy of the late rice dataset at the training ratio of 30%, Figure 6b represents the identification accuracy at the training ratio of 50%, Figure 6c represents the identification accuracy at the training ratio of 70%, and Figure 6d represents the identification accuracy at the training ratio of 90%. From Figure 6, it is found that the LPSPL has a relatively obvious advantage in most of the dimensions. When the proportion of training set is 30% and the number of features is 25, the model LPSPL can reach the highest accuracy of 57.81%. When the training percentage is 50% and the number of features is 20, the accuracy of the LPSPL model is highest at 72.67%. When the training percentage is 70% and the number of features is 30, the accuracy of the LPSPL model is highest at 87.33%. When the training ratio is 90% and the number of features is 30, the accuracy of the LPSPL model is up to 98%.
According to the experimental results, the LPSPL method has the highest OA at all training ratios and feature numbers. For the 10 varieties of early rice, the classification accuracy yielded by LPSPL is 96% at a training ratio of 90%, which is 3.67% higher than the baseline, 2% higher than the LPP method, 3% higher than the PCA method, and 1.67% higher than the t-SNE method. For the 10 varieties of late rice at a training ratio of 90%, our classification accuracy is 98%, an improvement of 4.67% over the baseline, 1.33% over the LPP method, 2% over the PCA method, and 3% over the t-SNE method. These results preliminarily indicate that the LPSPL method can better identify rice varieties.

4.5. Comparison of the f1-Score with Other Methods

From Figure 7, it can be seen that the LPSPL method achieves the highest f1-score for both early and late rice at different training ratios. Figure 7a,b show the f1-score of early rice and late rice under different training ratios, respectively. Among the early rice dataset, the f1-score obtained by the LPSPL method reaches 0.96 when the training ratio is 0.9. The f1-score obtained by the LPSPL method improves the most at a training set ratio of 0.7, and improves by 0.078 compared to the baseline. In the late rice dataset, the f1-score obtained by the LPSPL method is 0.98 when the training ratio is 0.9, and improves by 0.09 compared to the baseline.

4.6. Comparison of the Kappa Coefficients with Other Methods

The kappa coefficients of early and late rice at different training set proportions are shown in Table 8, from which we can see that the LPSPL method achieves the highest value under three different training ratios. The highest value of the kappa coefficient is 0.9556 for the early rice and 0.9778 for the late rice when the training ratio is 0.9.

5. Discussion

In this study, machine learning algorithms were used to analyze and process the hyperspectral data of rice leaves to identify varieties. The rice datasets were collected at three growth stages in order to verify the model’s ability to cope with complex data. The current sample size is relatively small; a larger sample size would make the results more convincing, so more data will be collected in future experiments.
On the basis of our previous work [10], the LPSPL algorithm was proposed. The algorithm performs dimensionality reduction while selecting bands, so the physical meaning of the hyperspectral characteristics is presented more clearly than with the LPP algorithm. Benefitting from the special properties of the $L_{2,1}$-norm, a row-sparse projection matrix is generated to select the features of the hyperspectral data. In this row-sparse matrix, the zero rows correspond to redundant or noisy spectral bands, and the non-zero rows indicate the important spectral bands that can be used to distinguish different rice varieties. In the experiments, SVM was used as the classifier; other classifiers were not examined in depth, so a next step is to explore the influence of different classifiers on variety identification accuracy.
Using the LPSPL method, 30 key bands were selected from the 2151 hyperspectral bands. According to the experimental results, the selected bands were relatively concentrated in the blue and red light ranges. Generally, in the 400–520 nm (blue light) range, chlorophyll and carotenoids absorb strongly, which has a large impact on photosynthesis; in the 610–720 nm (red light) range, chlorophyll absorption is also high, markedly affecting photosynthesis and the photoperiodic effect. This suggests a tentative inference: differences in photosynthetic intensity may be one reason for the differences between the spectral curves.
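If the 2151 bands span 350–2500 nm at 1 nm intervals (a common field-spectroradiometer configuration that would yield exactly 2151 bands; this sampling is an assumption here, since the paper reports band indices rather than wavelengths), the indices in Table 3 can be mapped to wavelengths to check whether they fall in the blue and red regions:

```python
def index_to_wavelength(idx, start_nm=350, step_nm=1):
    # assumed sampling: 2151 bands covering 350-2500 nm at 1 nm steps
    return start_nm + idx * step_nm

# a few early rice indices from Table 3
for idx in (120, 135, 321, 329):
    print(idx, index_to_wavelength(idx))  # 120→470, 135→485, 321→671, 329→679
```

Under this assumption, indices 120 and 135 land at 470 nm and 485 nm (blue), and indices 321 and 329 at 671 nm and 679 nm (red), consistent with the discussion above.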
In this study, early rice and late rice were recognized separately because they grow at different times, so identifying them jointly would have little practical value. Nevertheless, we also combined the early and late rice datasets into a single recognition test with a total of 20 label values; the LPSPL algorithm combined with SVM still recognized the varieties well.
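The overall pipeline (band selection followed by an SVM classifier) can be sketched with scikit-learn. In this sketch a generic ANOVA-based selector stands in for LPSPL, and the spectra and 20 variety labels are random placeholders, so the resulting accuracy is not meaningful; only the structure mirrors the study:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# stand-in data: 200 "leaf spectra" with 2151 bands, 20 varieties x 10 samples
X = rng.normal(size=(200, 2151))
y = np.repeat(np.arange(20), 10)

pipe = make_pipeline(
    SelectKBest(f_classif, k=30),  # placeholder for LPSPL's 30-band selection
    StandardScaler(),
    SVC(kernel="rbf", C=10),
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, train_size=0.9, random_state=0, stratify=y
)
pipe.fit(X_tr, y_tr)
print(round(pipe.score(X_te, y_te), 2))
```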

6. Conclusions

In view of the high dimensionality and rich information of hyperspectral data, we designed an efficient, convenient, and non-destructive machine learning method to identify rice varieties. The LPSPL method extracts features more effectively than the other machine learning methods compared here, and combined with the SVM classifier, it yields excellent results: the OA, F1-score, and kappa coefficient are all significantly improved over the baseline and the other models, which validates the effectiveness of the proposed method. Specifically, the recognition accuracy of LPSPL reached 96% for the 10 early rice varieties and 98% for the 10 late rice varieties, improvements of more than 2% and 1.33%, respectively, over the LPP method. In conclusion, these results show that the LPSPL method is efficient and accurate, providing a new method and a reference for identifying varieties of rice and other crops.

Author Contributions

Conceptualization, C.-F.L.; Methodology, Y.-J.D.; Validation, Y.-J.D. and J.-L.L.; Formal analysis, X.-H.Z.; Investigation, C.-F.L.; Data curation, Z.-D.W. and T.H.; Writing-original draft, C.-F.L. and Z.-D.W.; Writing-review & editing, Y.-J.D. and J.-L.L.; Visualization, T.H.; Funding acquisition, X.-H.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Hunan Provincial Key Research and Development Program under Grant 2023NK2011 and the Meizhou Tobacco Science Research Project under Grant No. 202204.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgments

This work was supported in part by the Hunan Provincial Natural Science Foundation of China under Grant 2022JJ40189, in part by the Hunan Provincial Key Research and Development Program under Grants 2023NK2011 and 2020NK2033, in part by Scientific Research Fund of Hunan Provincial Education Department under Grant 22B0181, and in part by the Meizhou Tobacco Science Research Project under Grant No. 202204. The authors would like to thank Jian Peng and Xiang Luo for the language editing. Additionally, the authors thank the anonymous referees for their valuable comments.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Li, J. Rice breeding: Never off the table. Natl. Sci. Rev. 2016, 3, 275. [Google Scholar] [CrossRef]
  2. Wu, J.Z.; Yang, L.; Sun, L.J. Research on rapid and non-destructive identification of rice varieties based on THz-ATR technology. J. Chin. Cereals Oils Assoc. 2022, 37, 246–251. [Google Scholar]
  3. Wang, F.Q.; Fan, X.C.; Zhang, Y. Establishment and application of an SNP molecular identification system for grape cultivars. J. Integr. Agric. 2022, 21, 1044–1057. [Google Scholar] [CrossRef]
  4. Meng, Y.S.; Ning, Z.H.; Hui, L.I. SSR fingerprinting of 203 sweetpotato (Ipomoea batatas (L.) Lam.) varieties. J. Integr. Agric. 2018, 17, 86–93. [Google Scholar] [CrossRef]
  5. Kuang, M.; Wei, S.J.; Wang, Y.Q. Development of a core set of SNP markers for the identification of upland cotton cultivars in China. J. Integr. Agric. 2016, 15, 954–962. [Google Scholar] [CrossRef]
  6. Zhang, Y.F.; Zhang, C.Y.; Zhang, B. Establishment and application of an accurate identification method for fragrant soybeans. J. Integr. Agric. 2021, 20, 1193–1203. [Google Scholar] [CrossRef]
  7. Tian, R.C.; Lu, J.W. Application of spectroscopic techniques in the identification of rice varieties. Food Sci. Technol. Econ. 2019, 44, 73–76. [Google Scholar]
  8. Zhao, Q.; Zhang, Z.; Huang, Y. TPE-RBF-SVM model for soybean categories recognition in selected hyperspectral bands based on Extreme Gradient Boosting Feature Importance Values. Agriculture 2022, 12, 1452. [Google Scholar] [CrossRef]
  9. Lu, J.; Tian, R.; Wen, S.; Guan, C. Selection of agronomic parameters and construction of prediction models for oleic acid contents in rapeseed using hyperspectral data. Agronomy 2023, 13, 2233. [Google Scholar] [CrossRef]
  10. Hu, T.; Chen, Y.N.; Li, D.; Long, C.F.; Wen, Z.D.; Hu, R.; Chen, G.H. Rice variety identification based on the leaf hyperspectral feature via LPP-SVM. Int. J. Pattern Recognit. Artif. Intell. 2022, 36, 2350001. [Google Scholar] [CrossRef]
  11. Wu, N.; Zhang, Y.; Na, R. Variety identification of oat seeds using hyperspectral imaging: Investigating the representation ability of deep convolutional neural network. RSC Adv. 2019, 9, 12635–12644. [Google Scholar] [CrossRef] [PubMed]
  12. Feng, L.; Zhu, S.; Zhang, C.; Bao, Y.; Gao, P.; He, Y. Variety identification of raisins using near-infrared hyperspectral imaging. Molecules 2018, 23, 2907. [Google Scholar] [CrossRef] [PubMed]
  13. Jin, B.; Zhang, C.; Jia, L. Identification of rice seed varieties based on near-infrared hyperspectral imaging technology combined with deep learning. ACS Omega 2022, 7, 4735–4749. [Google Scholar] [CrossRef]
  14. Wang, Q.; Li, Q.; Li, X.L. Hyperspectral band selection via adaptive subspace partition strategy. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 4940–4950. [Google Scholar] [CrossRef]
  15. Wu, X.; Xu, X.Y.; Liu, J.H. Supervised feature selection with orthogonal regression and feature weighting. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 1831–1838. [Google Scholar] [CrossRef]
  16. Nie, F.P.; Xiang, S.M.; Liu, Y. Orthogonal vs. uncorrelated least squares discriminant analysis for feature extraction. Pattern Recognit. Lett. 2012, 33, 485–491. [Google Scholar] [CrossRef]
  17. Zhu, G.K.; Huang, Y.C.; Lei, J.S. Unsupervised hyperspectral band selection by dominant set extraction. IEEE Trans. Geosci. Remote Sens. 2015, 54, 227–239. [Google Scholar] [CrossRef]
  18. Wang, J.; Zhou, J.; Huang, W.Q. Attend in bands: Hyperspectral band weighting and selection for image classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 4712–4727. [Google Scholar] [CrossRef]
  19. Ji, Y.; Sun, L.; Li, Y. Non-destructive classification of defective potatoes based on hyperspectral imaging and support vector machine. Infrared Phys. Technol. 2019, 99, 71–79. [Google Scholar] [CrossRef]
  20. Zhang, L.; Sun, H.; Rao, Z. Hyperspectral imaging technology combined with deep forest model to identify frost-damaged rice seeds. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2020, 229, 117973. [Google Scholar] [CrossRef]
  21. Wang, L.; Liu, D.; Pu, H. Use of hyperspectral imaging to discriminate the variety and quality of rice. Food Anal. Methods 2015, 8, 515–523. [Google Scholar] [CrossRef]
  22. Pang, Y.; Yuan, Y. Outlier-resisting graph embedding. Neurocomputing 2010, 73, 968–974. [Google Scholar] [CrossRef]
  23. Liu, N.; Lai, Z.H.; Li, X.C. Locality preserving robust regression for jointly sparse subspace learning. IEEE Trans. Circuits Syst. Video Technol. 2021, 31, 2274–2287. [Google Scholar] [CrossRef]
  24. Zhang, L.; Sun, H.; Li, H. Identification of rice-weevil (Sitophilus oryzae L.) damaged wheat kernels using multi-angle NIR hyperspectral data. J. Cereal Sci. 2021, 101, 103313. [Google Scholar] [CrossRef]
  25. Zhu, S.; Chao, M.; Zhang, J. Identification of soybean seed varieties based on hyperspectral imaging technology. Sensors 2019, 19, 5225. [Google Scholar] [CrossRef]
  26. He, X.; Feng, X.; Sun, D. Rapid and nondestructive measurement of rice seed vitality of different years using near-infrared hyperspectral imaging. Molecules 2019, 24, 2227. [Google Scholar] [CrossRef]
  27. Shao, Q.; Chen, Y.H.; Yang, S.T. Hyperspectral image identification of maize varieties based on random forest algorithm. Geogr. Geogr. Inf. Sci. 2019, 35, 34–39. [Google Scholar]
  28. Liu, Y.; Gao, Q.X.; Gao, X.B.; Shao, L. L2,1-norm discriminant manifold learning. IEEE Access 2018, 6, 40723–40734. [Google Scholar]
  29. Tang, C.; Liu, X.W. Feature selective projection with low-rank embedding and dual Laplacian regularization. IEEE Trans. Knowl. Data Eng. 2020, 32, 1747–1760. [Google Scholar]
  30. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar]
  31. Gu, Y.X.; Ding, S.F. Advances in support vector machine research. Comput. Sci. 2011, 38, 14–17. [Google Scholar]
  32. Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605. [Google Scholar]
Figure 1. Data acquisition process.
Figure 2. Flow chart of rice variety identification based on leaf hyperspectral characteristics via LPSPL.
Figure 3. The names of the rice varieties.
Figure 4. Overall classification accuracy of early rice (a) and late rice (b) datasets in different methods at different training ratios.
Figure 5. The early rice classification accuracy of each method with different dimensions obtained under 30% (a), 50% (b), 70% (c), and 90% (d) training ratios, respectively.
Figure 6. The late rice classification accuracy of each method with different dimensions obtained under 30% (a), 50% (b), 70% (c), and 90% (d) training ratios, respectively.
Figure 7. F1-score of the early rice dataset (a) and the late rice dataset (b).
Table 1. Soil indexes for planting early rice and late rice.

Indicator | Early Rice | Late Rice
available nitrogen | 83.24 mg/kg | 90.83 mg/kg
available phosphorus | 41.29 mg/kg | 44.42 mg/kg
effective potassium | 90 mg/kg | 110 mg/kg
pH value | 4.8 | 5.4
organic matter | 23.43 g/kg | 25.22 g/kg
total nitrogen | 10.67 g/kg | 10.33 g/kg
total phosphorus | 1.40 g/kg | 1.42 g/kg
total potassium | 0.56 g/kg | 0.59 g/kg
Table 2. Confusion matrix of positive and negative samples.

Truth \ Predicted | Positive Example | Negative Example
positive example | TP | FN
negative example | FP | TN
Table 3. Index of the selected bands by the LPSPL method.

Early rice dataset: 120, 121, 123, 124, 135, 136, 137, 138, 139, 140, 221, 223, 224, 227, 229, 268, 273, 275, 276, 274, 277, 278, 321, 322, 323, 325, 324, 326, 329, 328.
Late rice dataset: 97, 98, 113, 114, 179, 182, 183, 191, 192, 195, 202, 246, 247, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 262, 261, 266, 317, 319, 320.
Table 4. Variety identification rate (%) of different methods under 30% and 50% training ratios in early rice.

Training set proportion 30%:
Variety | Baseline | PCA | LPP | t-SNE | LPSPL
Z1 | 36.19 | 34.29 | 38.10 | 38.10 | 37.14
Z2 | 41.90 | 42.86 | 37.14 | 48.57 | 44.76
Z3 | 37.14 | 43.81 | 47.62 | 42.38 | 43.81
Z4 | 36.19 | 39.05 | 38.10 | 41.90 | 35.24
Z5 | 50.48 | 44.76 | 49.52 | 45.71 | 42.86
Z6 | 39.05 | 44.76 | 32.86 | 26.19 | 46.67
Z7 | 48.57 | 44.29 | 34.29 | 33.81 | 44.29
Z8 | 45.71 | 47.62 | 40.00 | 47.62 | 39.05
Z9 | 53.33 | 57.14 | 54.29 | 49.52 | 63.81
Z10 | 51.43 | 50.48 | 60.95 | 50.47 | 64.76
OA | 44.00 | 44.90 | 43.29 | 42.43 | 46.24

Training set proportion 50%:
Variety | Baseline | PCA | LPP | t-SNE | LPSPL
Z1 | 48.00 | 56.00 | 54.67 | 55.33 | 65.33
Z2 | 61.33 | 72.00 | 50.67 | 56.00 | 58.67
Z3 | 60.00 | 61.33 | 69.33 | 60.00 | 68.00
Z4 | 53.33 | 54.67 | 60.00 | 48.00 | 58.67
Z5 | 65.33 | 60.00 | 68.00 | 56.00 | 66.67
Z6 | 47.33 | 62.00 | 50.67 | 48.67 | 56.00
Z7 | 51.33 | 58.00 | 50.67 | 59.33 | 49.33
Z8 | 66.67 | 56.00 | 54.67 | 64.00 | 68.00
Z9 | 62.67 | 70.67 | 65.33 | 57.33 | 73.33
Z10 | 74.67 | 68.00 | 72.00 | 73.33 | 76.00
OA | 59.07 | 61.87 | 59.60 | 57.80 | 64.00
Table 5. Variety identification rate (%) of different methods under 70% and 90% training ratios in early rice.

Training set proportion 70%:
Variety | Baseline | PCA | LPP | t-SNE | LPSPL
Z1 | 66.67 | 71.11 | 75.56 | 70.00 | 84.44
Z2 | 77.78 | 73.33 | 84.44 | 75.56 | 84.44
Z3 | 66.67 | 68.89 | 71.11 | 64.44 | 77.78
Z4 | 75.56 | 77.78 | 71.11 | 77.78 | 84.44
Z5 | 73.33 | 82.22 | 75.56 | 75.56 | 71.11
Z6 | 73.33 | 73.33 | 62.22 | 70.00 | 77.78
Z7 | 65.56 | 77.78 | 70.00 | 78.89 | 73.33
Z8 | 68.89 | 68.89 | 77.78 | 88.89 | 71.11
Z9 | 75.56 | 88.89 | 80.00 | 77.78 | 93.33
Z10 | 86.67 | 88.89 | 73.33 | 82.22 | 91.11
OA | 73.00 | 78.78 | 74.11 | 76.11 | 80.89

Training set proportion 90%:
Variety | Baseline | PCA | LPP | t-SNE | LPSPL
Z1 | 93.33 | 100.00 | 80.00 | 83.33 | 100.00
Z2 | 93.33 | 80.00 | 93.33 | 86.67 | 100.00
Z3 | 100.00 | 93.33 | 100.00 | 93.33 | 100.00
Z4 | 93.33 | 86.67 | 93.33 | 100.00 | 86.67
Z5 | 86.67 | 100.00 | 86.67 | 100.00 | 100.00
Z6 | 96.67 | 83.33 | 96.67 | 90.00 | 96.67
Z7 | 86.67 | 86.67 | 90.00 | 96.67 | 83.33
Z8 | 73.33 | 100.00 | 100.00 | 93.33 | 100.00
Z9 | 100.00 | 100.00 | 100.00 | 100.00 | 93.33
Z10 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00
OA | 92.33 | 93.00 | 94.00 | 94.33 | 96.00
Table 6. Variety identification rates (%) of different methods at 30% and 50% training ratios for late rice.

Training set proportion 30%:
Variety | Baseline | PCA | LPP | t-SNE | LPSPL
W1 | 46.67 | 49.52 | 55.24 | 51.43 | 50.48
W2 | 50.48 | 55.24 | 54.29 | 41.91 | 66.67
W3 | 40.00 | 40.95 | 33.33 | 41.91 | 45.71
W4 | 41.90 | 55.24 | 50.48 | 49.52 | 54.29
W5 | 47.62 | 47.62 | 52.38 | 43.81 | 52.38
W6 | 48.57 | 39.05 | 35.24 | 41.91 | 40.00
W7 | 66.67 | 56.19 | 58.10 | 63.81 | 67.62
W8 | 40.95 | 41.90 | 40.00 | 48.57 | 47.62
W9 | 57.14 | 65.71 | 64.76 | 50.47 | 78.10
W10 | 48.57 | 49.52 | 60.95 | 69.52 | 61.90
OA | 48.86 | 50.10 | 50.48 | 50.29 | 56.48

Training set proportion 50%:
Variety | Baseline | PCA | LPP | t-SNE | LPSPL
W1 | 68.00 | 66.67 | 64.00 | 64.00 | 76.00
W2 | 61.33 | 72.00 | 71.33 | 60.00 | 77.33
W3 | 61.33 | 57.33 | 58.67 | 56.00 | 53.33
W4 | 64.00 | 68.00 | 72.00 | 72.00 | 78.67
W5 | 61.33 | 60.00 | 65.33 | 53.33 | 66.67
W6 | 50.67 | 61.33 | 58.67 | 62.67 | 60.00
W7 | 84.00 | 76.00 | 76.00 | 78.67 | 80.00
W8 | 65.33 | 62.67 | 66.67 | 64.00 | 70.67
W9 | 65.33 | 64.00 | 76.00 | 69.33 | 85.33
W10 | 57.33 | 78.67 | 73.33 | 77.33 | 66.67
OA | 63.87 | 66.67 | 68.20 | 65.73 | 71.47
Table 7. Variety identification rates (%) of different methods at 70% and 90% training ratios for late rice.

Training set proportion 70%:
Variety | Baseline | PCA | LPP | t-SNE | LPSPL
W1 | 77.78 | 73.33 | 82.22 | 86.67 | 88.89
W2 | 84.44 | 82.22 | 80.00 | 84.44 | 88.89
W3 | 75.56 | 84.44 | 77.78 | 80.00 | 75.56
W4 | 64.44 | 88.89 | 84.44 | 86.67 | 86.67
W5 | 84.44 | 75.56 | 80.00 | 71.11 | 88.89
W6 | 57.78 | 77.78 | 77.78 | 71.11 | 75.56
W7 | 80.00 | 80.00 | 80.00 | 88.89 | 88.89
W8 | 73.33 | 73.33 | 77.78 | 75.56 | 86.67
W9 | 84.44 | 86.67 | 93.33 | 86.67 | 97.78
W10 | 80.00 | 93.33 | 84.44 | 66.67 | 75.56
OA | 76.22 | 81.56 | 81.78 | 79.78 | 85.33

Training set proportion 90%:
Variety | Baseline | PCA | LPP | t-SNE | LPSPL
W1 | 100.00 | 86.67 | 100.00 | 93.33 | 86.67
W2 | 100.00 | 100.00 | 100.00 | 93.33 | 100.00
W3 | 100.00 | 93.33 | 100.00 | 93.33 | 100.00
W4 | 86.67 | 100.00 | 93.33 | 100.00 | 100.00
W5 | 100.00 | 93.33 | 100.00 | 93.33 | 100.00
W6 | 86.67 | 100.00 | 93.33 | 83.33 | 100.00
W7 | 93.33 | 93.33 | 93.33 | 100.00 | 100.00
W8 | 93.33 | 93.33 | 93.33 | 100.00 | 100.00
W9 | 80.00 | 100.00 | 100.00 | 100.00 | 100.00
W10 | 93.33 | 100.00 | 93.33 | 93.33 | 93.33
OA | 93.33 | 96.00 | 96.67 | 95.00 | 98.00
Table 8. Kappa coefficients under the early and late rice datasets.

Early rice training set proportion:
Method | 0.3 | 0.5 | 0.7 | 0.9
Baseline | 0.3378 | 0.5452 | 0.7000 | 0.9148
PCA | 0.3879 | 0.5763 | 0.7457 | 0.9222
LPP | 0.3699 | 0.5511 | 0.7124 | 0.9333
t-SNE | 0.4369 | 0.5831 | 0.7637 | 0.9395
LPSPL | 0.4027 | 0.6000 | 0.7877 | 0.9556

Late rice training set proportion:
Method | 0.3 | 0.5 | 0.7 | 0.9
Baseline | 0.4318 | 0.5985 | 0.7358 | 0.9259
PCA | 0.4455 | 0.6297 | 0.7951 | 0.9556
LPP | 0.4498 | 0.6467 | 0.7996 | 0.9630
t-SNE | 0.5223 | 0.6602 | 0.7948 | 0.9447
LPSPL | 0.5164 | 0.6830 | 0.8371 | 0.9778

Long, C.-F.; Wen, Z.-D.; Deng, Y.-J.; Hu, T.; Liu, J.-L.; Zhu, X.-H. Locality Preserved Selective Projection Learning for Rice Variety Identification Based on Leaf Hyperspectral Characteristics. Agronomy 2023, 13, 2401. https://doi.org/10.3390/agronomy13092401
