Prediction of Oleic Acid Content of Rapeseed Using Hyperspectral Technique

To detect the oleic acid content of rapeseed quickly and accurately, we propose in this paper a back-propagation (BP) neural network model that predicts oleic acid content from rapeseed hyperspectral information. Four types of spectral features are investigated, namely multifractal indexes, sensitive bands, trilateral parameters, and spectral indexes. Both single-variable and multivariable model inputs are considered. The results show that combined features give higher precision and better stability than a single parameter. Notably, combined features that include multifractal parameters significantly improve model performance. With the combined feature {MF-h(0), SB-DR574, SPI-NDSI(R575, R576)} as the model input, the BP neural network model achieves the highest precision, with a coefficient of determination (R²) of 0.8753, a root mean square error (RMSE) of 1.0301, and a relative error (RE) of 1.047%. This result provides a reference for the rapid detection of rapeseed's oleic acid content.


Introduction
With the increase in rapeseed oil production worldwide, more and more attention is being paid to improving rapeseed quality and cultivating high-oleic rapeseed. The higher the oleic acid content, the higher the expected nutritional value and the longer the expected shelf life. Oleic acid is an indispensable nutrient in animal food and plays a pivotal role in the metabolism of humans and animals. In addition, high-oleic rapeseed oil can effectively prevent human cardiovascular disease. Because of its high economic and nutritional value, the evaluation and prediction of high-oleic rapeseed have become a hot research area in recent years [1].
The traditional determination of oleic acid content relies on gas chromatography, which is time-consuming and labor-intensive. Its most prominent disadvantage is that it destroys the seeds, which then can no longer be used for sowing and reproduction. For this reason, the method is not suitable for analyzing and screening large quantities of rare and precious breeding materials [2]. Therefore, methods that can quickly and accurately determine the fatty acid content of rapeseed are critical for improving rapeseed fatty acid composition.
The rapid development of hyperspectral technology has created the conditions for solving this problem. Scientists apply hyperspectral technology to measure seed spectral information for crop growth diagnosis. Because it is fast and non-destructive, it has in recent years become an important new approach for determining oleic acid content, and it has attracted many researchers to study hyperspectral technology for crop diagnosis [3][4][5][6]. Because hyperspectral imaging combines imaging and spectroscopy, it can obtain image information and biochemical information about the research object at the same time, and it gradually

Materials
The Selected Materials
Two rapeseed varieties (Xiangyou 708 and Xiangyou 710) were used as the research objects. The field experiment was conducted at the Yunyuan Base of Hunan Agricultural University (113°4′ E, 28°10′ N) in 2018 and 2019. The field soil is black loam under a rice–rapeseed rotation.

Data Collection
The SOC710 portable hyperspectral imager (wavelength range 400–1000 nm, spectral resolution 4.6875 nm), produced by Surface Optics Corporation (11555 Rancho Bernardo Road, San Diego, CA 92127, USA), is used for spectral measurement. The optical dark box (task cabin) is placed in a darkroom. The box has a 50 × 60 cm base and a height of 100 cm, with a movable base, and its interior has a diffuse reflection coating. Four surround-type 70 W halogen light sources with a 45° incident angle are used, together with a cooling device, in accordance with the requirements of the SOC710 hyperspectral imaging spectrometer. The instrument is mounted vertically, directly above the target, at an exposure distance of 300 mm, and is preheated for 15 min before spectral information is measured.
Appl. Sci. 2021, 11, 5726 3 of 12
We put the rapeseeds in 16 circular dishes. In every dish, 5 non-overlapping rectangular areas (see Figure 1) are randomly selected as regions of interest (ROIs), giving 80 ROIs in total. In each ROI, the spectral reflectance is measured five times at random positions and the measurements are averaged to give the final spectral reflectance of that ROI. In this manner, 80 spectral reflectance curves are obtained and labelled 1–80. Table 1 shows the statistical characteristics of rapeseed oleic acid content.
In addition, all of the rapeseeds in each ROI are ground to measure their fatty acid content, from which the rapeseed oleic acid data are obtained, using an Agilent 7890 Inductively Coupled Plasma Mass Spectrometer in the Oil Research Institute of Hunan Agricultural University, where the indoor temperature is 16 °C and the relative humidity is 40%. From the above process, we obtained 80 samples of spectral reflectance and oleic acid data, of which 64 were used as training samples and the remaining 16 as test samples.
Divide the profile $\{x_t\}_{t=1}^{N}$ into $N_s \equiv \lfloor N/s \rfloor$ non-overlapping segments of equal length $s$. Because $N$ is not always an integer multiple of the scale $s$, a short part of the profile at its end may be left over. To avoid losing this information, the same division is repeated starting from the opposite end of the profile, so that a total of $2N_s$ segments is obtained. The $v$-th segment is denoted $[l_v + 1, l_v + s]$, where $l_v = (v - 1)s$ for $v = 1, 2, \ldots, N_s$ and $l_v = N - (v - N_s)s$ for $v = N_s + 1, N_s + 2, \ldots, 2N_s$.
Next, in each segment $v$, the local trend $y_v(k)$ is determined by polynomial fitting, and the detrended series in the $v$-th segment, denoted $y_s(k)$, is
$$y_s(k) = x_{l_v + k} - y_v(k), \quad k = 1, 2, \ldots, s. \qquad (1)$$
In this work, a first-order polynomial is used to fit the trend. We then calculate the variance of the detrended series in each segment $v$:
$$F^2(s, v) = \frac{1}{s} \sum_{k=1}^{s} y_s^2(k). \qquad (2)$$
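The $q$-order fluctuation function ($q \neq 0$) is then obtained by averaging over all segments:
$$F_q(s) = \left\{ \frac{1}{2N_s} \sum_{v=1}^{2N_s} \left[ F^2(s, v) \right]^{q/2} \right\}^{1/q}, \qquad (3)$$
and, for $q = 0$, L'Hôpital's rule gives
$$F_0(s) = \exp\left\{ \frac{1}{4N_s} \sum_{v=1}^{2N_s} \ln\left[ F^2(s, v) \right] \right\}. \qquad (4)$$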
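A minimal Python sketch of Equations (1)–(4) at a single scale follows, assuming first-order (linear) detrending as stated above and that `x` already holds the profile $\{x_t\}$. The function name `fluctuation_function` is hypothetical, and NumPy's `polyfit`/`polyval` stand in for whatever fitting routine the authors used.

```python
import numpy as np


def fluctuation_function(x, s, q):
    """q-order fluctuation function F_q(s) of the profile x at scale s.

    Uses first-order (linear) detrending and 2*N_s segments taken from both ends.
    """
    x = np.asarray(x, dtype=float)
    N = len(x)
    Ns = N // s
    k = np.arange(1, s + 1)
    # 0-based start offsets l_v: forward pass, then backward pass from the end
    starts = [v * s for v in range(Ns)] + [N - (v + 1) * s for v in range(Ns)]
    F2 = np.empty(2 * Ns)
    for i, lv in enumerate(starts):
        seg = x[lv:lv + s]
        trend = np.polyval(np.polyfit(k, seg, 1), k)   # local linear trend y_v(k)
        F2[i] = np.mean((seg - trend) ** 2)            # Eqs. (1)-(2): F^2(s, v)
    if q == 0:
        return float(np.exp(np.sum(np.log(F2)) / (4 * Ns)))  # Eq. (4)
    return float(np.mean(F2 ** (q / 2)) ** (1 / q))           # Eq. (3)
```

Because $[F^2(s,v)]^{q/2} = F(s,v)^q$, $F_q(s)$ is a power mean of the segment fluctuations: it is non-decreasing in $q$ and tends to the geometric mean $F_0(s)$ as $q \to 0$, which gives a quick sanity check.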
Finally, the scale $s$ is varied and Equations (1)–(4) are repeated to calculate the corresponding $q$-th order fluctuation function $F_q(s)$. If the spectrum has a fractal nature, $F_q(s)$ and $s$ follow a power-law scaling relation:
$$F_q(s) \sim s^{h(q)}. \qquad (5)$$
The index $h(q)$ is obtained as the slope of a linear fit of $F_q(s)$ against $s$ in a double-log plot. $h(q)$ is called the generalized Hurst exponent and describes the long-term correlation of the original spectrum. Generally, $h(q) > 0.5$ indicates that the spectral reflectance series $\{x_i\}_{i=1}^{N}$ is persistent, whereas $h(q) < 0.5$ indicates anti-persistence. Multifractality is present when $h(q)$ depends on $q$. To measure the degree of multifractality, $\Delta h$ is defined as
$$\Delta h = h_{\max}(q) - h_{\min}(q), \qquad (6)$$
where $h_{\max}(q)$ and $h_{\min}(q)$ are the maximum and minimum of $h(q)$ over the considered values of $q$, respectively. In this work, we take $q \in [-3, 3]$. The larger $\Delta h$ is, the stronger the multifractality. In standard multifractal analysis (MFA), the mass exponent $\tau(q)$, which also expresses the multifractal nature, is related to $h(q)$ by
$$\tau(q) = q\,h(q) - D_f, \qquad (7)$$
where $D_f$ is the topological dimension of the object; for a spectral reflectance series, $D_f = 1$. When $\tau(q)$ is a nonlinear function of $q$, the object is multifractal.
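Given $F_q(s)$ at several scales, $h(q)$ and $\Delta h$ can be estimated as sketched below. This is an illustration rather than the authors' code: the names `generalized_hurst` and `delta_h` are ours, and an ordinary least-squares line in the double-log plot is assumed as the "linear fitting".

```python
import numpy as np


def generalized_hurst(scales, Fq_by_q):
    """Estimate h(q) as the slope of log F_q(s) against log s.

    scales: the scales s; Fq_by_q: maps each q to its F_q(s) values at those scales.
    """
    log_s = np.log(np.asarray(scales, dtype=float))
    h = {}
    for q, Fq in Fq_by_q.items():
        slope, _intercept = np.polyfit(log_s, np.log(np.asarray(Fq, dtype=float)), 1)
        h[q] = float(slope)
    return h


def delta_h(h):
    """Degree of multifractality: h_max(q) - h_min(q) over the considered q."""
    return max(h.values()) - min(h.values())


scales = np.array([8, 16, 32, 64])
F = {q: 0.5 * scales ** (0.6 + 0.05 * q) for q in range(-3, 4)}
h = generalized_hurst(scales, F)
```

For this synthetic power law, $F_q(s) = 0.5\,s^{0.6 + 0.05q}$ with $q = -3, \ldots, 3$, the fit recovers $h(0) = 0.6$ and $\Delta h = 0.3$.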
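Via the Legendre transform, the Lipschitz–Hölder exponent $\alpha(q)$ and the multifractal spectrum $f(\alpha)$ are determined as follows:
$$\alpha(q) = \frac{d\tau(q)}{dq}, \qquad (8)$$
$$f(\alpha) = q\,\alpha(q) - \tau(q). \qquad (9)$$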
In practice, $\Delta\alpha = \alpha_{\max} - \alpha_{\min}$ is the width of the multifractal spectrum. The larger $\Delta\alpha$ is, the more uneven the reflectance distribution and the greater its fluctuation. In this work, the above 12 multifractal parameters are considered as candidate features, namely $h(\pm 3)$, $h(\pm 2)$, $h(\pm 1)$, $h(0)$, $\Delta h$, $\alpha_{\max}$, $\alpha_{\min}$, and $\Delta\alpha$.
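The Legendre-transform quantities and the multifractal parameters listed above can be computed from $h(q)$ as sketched below. The sketch assumes $\tau(q) = q\,h(q) - D_f$ with $D_f = 1$, $\alpha = d\tau/dq$, and $f(\alpha) = q\alpha - \tau(q)$. It approximates the derivative by finite differences (`numpy.gradient` with second-order edges), which is our choice. The function name `multifractal_features` is hypothetical.

```python
import numpy as np


def multifractal_features(qs, h):
    """Compute tau(q), alpha(q), f(alpha) and the summary parameters from h(q).

    qs: increasing, evenly spaced q values (here -3..3); h: matching h(q) values.
    """
    qs = np.asarray(qs, dtype=float)
    h = np.asarray(h, dtype=float)
    tau = qs * h - 1.0                            # mass exponent, D_f = 1
    alpha = np.gradient(tau, qs, edge_order=2)    # alpha = d tau / dq
    f_alpha = qs * alpha - tau                    # multifractal spectrum f(alpha)
    feats = {f"h({q:g})": float(v) for q, v in zip(qs, h)}
    feats.update({
        "delta_h": float(h.max() - h.min()),
        "alpha_max": float(alpha.max()),
        "alpha_min": float(alpha.min()),
        "delta_alpha": float(alpha.max() - alpha.min()),
    })
    return feats, alpha, f_alpha


qs = np.arange(-3, 4)
feats, alpha, f_alpha = multifractal_features(qs, 0.6 + 0.05 * qs)
```

Using a finite-difference derivative avoids fitting an analytic form to $\tau(q)$. With only seven $q$ values, the resulting $\alpha$ range should be read as approximate for real spectra.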

BP Neural Networks
The BP (back propagation) neural network was proposed by Rumelhart and McClelland in 1986 [19]. It is a multilayer feedforward neural network trained with the error back-propagation algorithm. Its structure consists of an input layer, one or more hidden layers, and an output layer. Each neuron is connected to all neurons in the adjacent layers, whereas neurons within the same layer are not connected. This structure allows the BP neural network to handle complex nonlinear mappings. In supervised learning, the mean squared difference between the actual output and the expected output is treated as an error signal and propagated backward through the network, and the weights of each layer are adjusted during this propagation. The process ends when the error falls below the target value [20]. In this work, we use a BP neural network to construct the oleic acid content prediction model.
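To illustrate the back-propagation mechanism described above, here is a minimal NumPy sketch of a one-hidden-layer network trained on the mean squared error. It makes several assumptions: it uses a tanh hidden layer, a linear output, and plain gradient descent, whereas the paper trains with the Levenberg–Marquardt-based Trainlm method. The class name `BPNetwork` is ours.

```python
import numpy as np


class BPNetwork:
    """One-hidden-layer feedforward network trained by error back-propagation.

    Sketch only: tanh hidden units, linear output, plain gradient descent on the
    mean squared error (not the Levenberg-Marquardt method used by Trainlm).
    """

    def __init__(self, n_in, n_hidden, n_out=1, seed=0):
        rng = np.random.default_rng(seed)  # random initial weights
        self.W1 = rng.normal(0.0, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def _forward(self, X):
        H = np.tanh(X @ self.W1 + self.b1)    # hidden layer activations
        return H, H @ self.W2 + self.b2       # linear output layer

    def predict(self, X):
        return self._forward(np.asarray(X, dtype=float))[1].ravel()

    def fit(self, X, y, lr=0.05, max_epochs=1000, goal=0.01):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float).reshape(-1, 1)
        n = len(X)
        mse = float("inf")
        for _ in range(max_epochs):
            H, out = self._forward(X)
            err = out - y
            mse = float(np.mean(err ** 2))
            if mse < goal:                    # target error reached
                break
            # Propagate the error signal backwards, layer by layer
            d_out = 2.0 * err / n                          # dMSE / d(output)
            d_hid = (d_out @ self.W2.T) * (1.0 - H ** 2)   # through the tanh layer
            self.W2 -= lr * (H.T @ d_out)
            self.b2 -= lr * d_out.sum(axis=0)
            self.W1 -= lr * (X.T @ d_hid)
            self.b1 -= lr * d_hid.sum(axis=0)
        return mse
```

Training stops after `max_epochs` iterations or once the MSE falls below `goal`, mirroring the iteration limit and target error described in the text.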
For the 80 rapeseed samples, the hyperspectral features form the input layer of the BP neural network and the oleic acid content of rapeseed forms the output layer. The number of hidden layer nodes $p$ is chosen within the range given by the empirical formula
$$p = \sqrt{k + m} + a, \qquad (10)$$
where $k$ is the number of input layer units, $p$ is the number of hidden layer nodes, $m$ is the number of output layer units, and $a$ is a constant between 1 and 10. Using a trial-and-error method with repeated training (100 training runs for each setting), the optimal number of hidden nodes is found to be 9.
The maximum number of iterations is set to 1000 and the learning accuracy (training goal) to 0.01. Then 64 samples are randomly selected as the training set and the remaining 16 samples form the test set. Using the Trainlm training method [21] with cross-validation, the BP network prediction model is constructed and optimized to select the best model parameters.
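The hidden-node search range and the random 64/16 split can be sketched as below. This assumes that Equation (10) is the common empirical rule $p = \sqrt{k + m} + a$ with $a = 1, \ldots, 10$. It also assumes that non-integer values are rounded to the nearest integer. Both function names are hypothetical.

```python
import math
import random


def hidden_node_range(k, m=1):
    """Candidate hidden-node counts from p = sqrt(k + m) + a, a = 1..10 (rounded)."""
    base = math.sqrt(k + m)
    return sorted({round(base + a) for a in range(1, 11)})


def random_split(n_samples=80, n_train=64, seed=None):
    """Randomly pick n_train sample indices for training; the rest form the test set."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return sorted(idx[:n_train]), sorted(idx[n_train:])


print(hidden_node_range(3))  # → [3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
```

With $k = 3$ inputs and $m = 1$ output, the candidate range is 3 to 12 nodes, which contains the optimal value of 9 reported above.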

Evaluate Indicator
To evaluate model performance, three indicators are employed in this work: the coefficient of determination (R²), the root mean square error (RMSE), and the relative error (RE). In their definitions, $Y_i$ is the observed value, $\hat{Y}_i$ is the predicted value, $\bar{Y}$ is the mean observed value, $\bar{Y}_p$ is the mean predicted value, and $n$ is the total number of samples. The three indicators describe the model's explanatory ability, its absolute error, and its relative error, respectively.
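The sketch below computes the three indicators under commonly used definitions, which are assumptions on our part. R² is taken as the squared Pearson correlation between observed and predicted values, consistent with the use of both $\bar{Y}$ and $\bar{Y}_p$. RMSE is computed in the usual way, and RE is taken as the mean absolute relative error in percent. The function names are ours.

```python
import numpy as np


def r_squared(y_obs, y_pred):
    """Assumed form: squared Pearson correlation between observed and predicted values."""
    y, p = np.asarray(y_obs, dtype=float), np.asarray(y_pred, dtype=float)
    cov = np.sum((y - y.mean()) * (p - p.mean()))
    return float(cov ** 2 / (np.sum((y - y.mean()) ** 2) * np.sum((p - p.mean()) ** 2)))


def rmse(y_obs, y_pred):
    """Root mean square error."""
    y, p = np.asarray(y_obs, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y - p) ** 2)))


def relative_error(y_obs, y_pred):
    """Assumed form: mean absolute relative error, in percent."""
    y, p = np.asarray(y_obs, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y - p) / np.abs(y)) * 100.0)
```

For observed values (1, 2, 3) and predictions (1, 3, 2), this gives R² = 0.25, RMSE ≈ 0.816, and RE ≈ 27.8%.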

Feature Selection
The red edge is one of the most significant characteristics of the green plant spectrum. The red-edge position is the wavelength at which the first-derivative spectrum reaches its maximum in the red-edge range (680~760 nm). The red-edge amplitude is that maximum value, and the red-edge area is the integral of the first-derivative spectrum over this range. Similarly, the blue-edge (490~530 nm) and yellow-edge (560~640 nm) parameters are also regarded as important features of green plants. Together these are called the trilateral parameters (TriP) [22].
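The trilateral parameters can be extracted from a reflectance spectrum as sketched below. Following the text's "similarly", the sketch uses the maximum of the first derivative for all three edges. Some studies instead locate the yellow edge at the minimum of the first derivative, so treat this as an assumption. The derivative is taken with `numpy.gradient` and the area with the trapezoidal rule. The function and dictionary names are hypothetical.

```python
import numpy as np

# Edge ranges (nm) as given in the text
EDGE_RANGES = {"blue": (490, 530), "yellow": (560, 640), "red": (680, 760)}


def edge_parameters(wavelengths, reflectance, lo, hi):
    """Position, amplitude and area of one edge from the first-derivative spectrum."""
    wl = np.asarray(wavelengths, dtype=float)
    r = np.asarray(reflectance, dtype=float)
    d1 = np.gradient(r, wl)                        # first-derivative spectrum
    mask = (wl >= lo) & (wl <= hi)
    w, d = wl[mask], d1[mask]
    i = int(np.argmax(d))                          # maximum first derivative
    area = float(np.sum(0.5 * (d[1:] + d[:-1]) * np.diff(w)))  # trapezoidal integral
    return {"position": float(w[i]), "amplitude": float(d[i]), "area": area}


def trilateral_parameters(wavelengths, reflectance):
    """Blue-, yellow- and red-edge parameters of one spectrum."""
    return {name: edge_parameters(wavelengths, reflectance, lo, hi)
            for name, (lo, hi) in EDGE_RANGES.items()}


wl = np.arange(400, 1001, 1.0)
refl = 0.1 + 0.4 / (1 + np.exp(-(wl - 720) / 10))   # synthetic red edge at 720 nm
params = trilateral_parameters(wl, refl)
```

On this synthetic spectrum, the red-edge position comes out at 720 nm. The red-edge area approximately equals the reflectance rise between 680 and 760 nm.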
A spectral index is a linear or nonlinear combination of spectral reflectance at specific bands that carries a certain meaning for the object. The ratio spectral index (RSI), normalized difference spectral index (NDSI), and difference spectral index (DSI) are three of the most widely used spectral indexes and are considered here:
$$\mathrm{RSI} = \frac{R_{\lambda_1}}{R_{\lambda_2}}, \qquad (14)$$
$$\mathrm{NDSI} = \frac{R_{\lambda_1} - R_{\lambda_2}}{R_{\lambda_1} + R_{\lambda_2}}, \qquad (15)$$
$$\mathrm{DSI} = R_{\lambda_1} - R_{\lambda_2}, \qquad (16)$$
where $R_{\lambda_1}$ and $R_{\lambda_2}$ denote the reflectance at wavelengths $\lambda_1$ and $\lambda_2$, respectively. The hyperspectral parameters used in this paper are summarized in Table 2.
The multifractal features capture the global singularity and correlation of the hyperspectral reflectance, which may reflect the essential characteristics of the spectral reflectance of rapeseed samples with different oleic acid contents. The spectral indexes express the combined characteristics of the reflectance at different bands. The trilateral parameters focus on changes in the hyperspectral characteristics at specific locations. The sensitive bands are the bands whose hyperspectral reflectance is most significantly correlated with oleic acid content. In the following, we use these four types of parameters as features to predict the oleic acid content of rapeseed.
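The three spectral indexes translate directly into code, assuming the standard definitions RSI = R_λ1 / R_λ2, NDSI = (R_λ1 − R_λ2) / (R_λ1 + R_λ2), and DSI = R_λ1 − R_λ2. The NumPy sketch below (function names are ours) also evaluates an index for every band pair at once via broadcasting. This is convenient when screening all $(\lambda_1, \lambda_2)$ combinations.

```python
import numpy as np


def rsi(r1, r2):
    """Ratio spectral index: R_l1 / R_l2."""
    return r1 / r2


def ndsi(r1, r2):
    """Normalized difference spectral index: (R_l1 - R_l2) / (R_l1 + R_l2)."""
    return (r1 - r2) / (r1 + r2)


def dsi(r1, r2):
    """Difference spectral index: R_l1 - R_l2."""
    return r1 - r2


def all_pairs(index_fn, R):
    """Evaluate an index for every band pair: out[n, i, j] = index(R[n, i], R[n, j])."""
    R = np.asarray(R, dtype=float)                 # shape (n_samples, n_bands)
    return index_fn(R[:, :, None], R[:, None, :])
```

For a reflectance matrix of shape (samples, bands), `all_pairs(ndsi, R)` has shape (samples, bands, bands). Its diagonal is zero because NDSI of a band with itself vanishes.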

Correlation Analysis of Spectral Parameters and Oleic Acid Content
To select the best model inputs from the four types of hyperspectral parameters mentioned above, we conducted a correlation analysis between the oleic acid content of rapeseed and all the parameters. The results are reported in Figure 2. As shown in the subplots, among the multifractal features, h(0) has the highest correlation coefficient (0.7898), outperforming the other generalized Hurst exponents. Among the trilateral parameters, the highest correlation coefficient (0.7751) comes from the yellow-edge area. In Figure 2c, the original reflectance (blue line) and the first-derivative reflectance (red line) behave differently; the maximum correlation coefficient is 0.782, at 574 nm. The correlation coefficients between rapeseed oleic acid content and the three spectral indexes RSI, NDSI, and DSI are shown in Figure 2d–f, respectively. By comparison, NDSI gives the best result, with a correlation coefficient of 0.7950, and the corresponding optimal spectral index is NDSI(R575, R576).

Figure 2. (d–f) Correlation coefficients between the oleic acid content and three spectral indexes, namely the ratio spectral index (RSI), normalized difference spectral index (NDSI), and difference spectral index (DSI).
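The search for the optimal band pair can be sketched as an exhaustive loop over all pairs $\lambda_1 < \lambda_2$. The loop computes the Pearson correlation between NDSI and oleic acid content for each pair. This is an assumed procedure for illustration, and the function name `best_ndsi_pair` is ours. Swapping the two bands only flips the sign of NDSI, so each unordered pair is checked once. The same loop works for RSI or DSI by changing the index line.

```python
import numpy as np


def best_ndsi_pair(R, y, wavelengths):
    """Exhaustively search band pairs for the NDSI most correlated with y.

    R: (n_samples, n_bands) reflectance; y: (n_samples,) oleic acid content.
    Returns (lambda_1, lambda_2, Pearson r) for the pair with the largest |r|.
    """
    R = np.asarray(R, dtype=float)
    y = np.asarray(y, dtype=float)
    yc = y - y.mean()
    best = (None, None, 0.0)
    n_bands = R.shape[1]
    for i in range(n_bands):
        for j in range(i + 1, n_bands):
            idx = (R[:, i] - R[:, j]) / (R[:, i] + R[:, j])   # NDSI(R_i, R_j)
            ic = idx - idx.mean()
            denom = np.sqrt(np.sum(ic ** 2) * np.sum(yc ** 2))
            if denom == 0:
                continue                                      # constant index, skip
            r = float(np.sum(ic * yc) / denom)                # Pearson correlation
            if abs(r) > abs(best[2]):
                best = (wavelengths[i], wavelengths[j], r)
    return best
```

On synthetic data in which oleic acid content depends on NDSI(R575, R576), the search returns (575, 576) with r close to 1.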
Based on the above correlation analysis, we select the two parameters with the highest correlation in each type of hyperspectral feature, as listed in Table 3. The correlation coefficients of all eight selected parameters exceed 0.7 and pass the significance test at the 0.01 level. Table 4 shows the statistics of the eight selected spectral features. In the following, we use these eight characteristic parameters to establish an estimation model for oleic acid content with a BP neural network.

Estimation Model of Oleic Acid Content
The four types of spectral features above, MF-♦, SB-♦, TriP-♦, and SPI-♦, are combined as the input layer of the BP neural network model, while the oleic acid content of rapeseed is used as the output layer. The symbol '♦' in MF-♦, SB-♦, TriP-♦, and SPI-♦ denotes h(0) and h(1), R818 and DR574, SDy and SDr, and DSI(R572, R574) and NDSI(R575, R576), respectively.
According to the number of characteristic parameters in the combination, univariate, bivariate, ternary, and quaternary models are constructed, with at most one feature of each type included in a combination. This yields 8 univariate, 24 bivariate, 32 ternary, and 16 quaternary combinations. We then use R², RMSE, and RE to evaluate the performance of the BP neural network model. Because the result of the BP algorithm depends on the random initial weights, the modelling process is repeated 100 times and the results are averaged for comparison. Figure 3 shows the average R² over all possible combinations for the univariate, bivariate, ternary, and quaternary models. R² increases, and the error bar decreases, as the number of parameters in the combination increases. Table 5 lists the best model performance among the four kinds of combinations for the training set and the test set, respectively. Notably, the performance obtained with multivariate combinations is significantly better than that obtained with a single variable.
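The combination counts quoted above can be checked by enumerating all input sets that contain at most one feature of each type, as in the Python sketch below. The dictionary mirrors the eight selected features, and the function name is ours.

```python
from itertools import combinations, product

# The two selected features of each type, as listed above
FEATURES = {
    "MF": ["h(0)", "h(1)"],
    "SB": ["R818", "DR574"],
    "TriP": ["SDy", "SDr"],
    "SPI": ["DSI(R572, R574)", "NDSI(R575, R576)"],
}


def feature_combinations(n_vars):
    """All input sets of n_vars features with at most one feature per type."""
    combos = []
    for types in combinations(FEATURES, n_vars):              # choose which types
        for picks in product(*(FEATURES[t] for t in types)):  # one feature per type
            combos.append(tuple(f"{t}-{p}" for t, p in zip(types, picks)))
    return combos


print([len(feature_combinations(n)) for n in range(1, 5)])  # → [8, 24, 32, 16]
```

The same enumeration shows that 12 of the 24 bivariate combinations contain an MF parameter.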
According to Table 5, the model performance obtained with ternary and quaternary combinations is significantly better than that of univariate models and slightly better than that of bivariate combinations.
In addition, as mentioned in Section 3.1, the multifractal features describe the global characteristics of the hyperspectral reflectance, which may improve model performance in predicting rapeseed's oleic acid content. To investigate this, we compared the results obtained with feature combinations that include and exclude the Hurst exponent, for the univariate, bivariate, and ternary cases. For example, among the bivariate combinations, 12 include an MF parameter (h(0) or h(1)) and the other 12 exclude them. The results averaged over all possible combinations are listed in Table 6. They show that, as a single input, the Hurst exponent is not as good as the traditional spectral parameters. In multivariate models, however, combinations that include the Hurst exponent perform significantly better. This finding suggests that multifractal features are an important supplement to traditional spectral characteristics when constructing a model to evaluate rapeseed's oleic acid content.
Finally, we test the stability of the model by varying the number of training samples. Using the parameter combinations listed in Table 5, 48–72 samples are randomly selected as the training set, and the three indicators R², RMSE, and RE are calculated; the results are shown in Figure 6.
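The stability test, together with averaging over repeated random runs, can be organized as the loop sketched below. The model is passed in as a function, so a BP network could be plugged in. For a self-contained example, we use ordinary least squares as a stand-in, which is not the authors' model. The step of 4 samples between 48 and 72 is also our assumption, and only RMSE is averaged here for brevity.

```python
import numpy as np


def stability_curve(X, y, train_sizes, fit_predict, repeats=100, seed=0):
    """Mean test-set RMSE over repeated random splits, for each training-set size.

    fit_predict(X_train, y_train, X_test) must return predictions for X_test,
    so any model (e.g. a BP network) can be plugged in.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    results = {}
    for n_train in train_sizes:
        scores = []
        for _ in range(repeats):                    # repeat and average
            idx = rng.permutation(len(y))
            tr, te = idx[:n_train], idx[n_train:]
            pred = fit_predict(X[tr], y[tr], X[te])
            scores.append(np.sqrt(np.mean((y[te] - pred) ** 2)))
        results[n_train] = float(np.mean(scores))
    return results


def linear_fit_predict(X_train, y_train, X_test):
    """Stand-in model (least squares with intercept), NOT the paper's BP network."""
    A = np.column_stack([X_train, np.ones(len(X_train))])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([X_test, np.ones(len(X_test))]) @ coef


rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (80, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 3 + rng.normal(0, 0.1, 80)
curve = stability_curve(X, y, range(48, 73, 4), linear_fit_predict, repeats=20)
```

A flat curve across training-set sizes is the pattern the text interprets as a stable model.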
The three indicators show no significant change as the number of training samples increases, which implies that the selected feature combinations yield stable model results.

Conclusions
Hyperspectral technology is fast, non-destructive, and highly efficient, and it can therefore play an important role in crop nutrition diagnosis. In this paper, we use the hyperspectral characteristics of seeds to construct an inversion model of rapeseed's oleic acid content. This model provides a useful reference for estimating oleic acid content non-destructively. Based on rapeseed hyperspectral data, four types of spectral features are considered: multifractal parameters, trilateral parameters, spectral indices, and sensitive bands. To select optimal characteristic parameters, we first choose two features of each type according to correlation analysis. Then, using the selected features as univariate (one feature) and multivariate (at least two features) model inputs, we establish a BP neural network model to predict oleic acid content.
The results show that multivariate inputs can greatly improve the model's accuracy and stability. Notably, combined features that include multifractal parameters yield better model performance. The best model performance comes from the combined parameters {MF-h(0), SB-DR574, SPI-NDSI(R575, R576)}. Stability tests show that the model is robust.

Data Availability Statement:
The raw/processed data required to reproduce these findings cannot be shared at this time as the data also forms part of an ongoing study.

Conflicts of Interest:
The authors declare no conflicts of interest.
