
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

Two sensitive wavelength (SW) selection methods combined with visible/near infrared (Vis/NIR) spectroscopy were investigated to determine the levels of some trace elements (Fe, Zn) in rice leaf. A total of 90 samples were prepared for the calibration (n = 70) and validation (n = 20) sets. Calibration models were developed using SWs selected by latent variable analysis (LVA) and independent component analysis (ICA), and nonlinear regression models were built with a least squares-support vector machine (LS-SVM). Among the nonlinear models, the six SWs selected by ICA gave the best model, ICA-LS-SVM, which outperformed LV-LS-SVM. The coefficients of determination (R^{2}), root mean square error of prediction (RMSEP) and bias by ICA-LS-SVM were 0.6189, 20.6510 ppm and −12.1549 ppm, respectively, for Fe, and 0.6731, 5.5919 ppm and 1.5232 ppm, respectively, for Zn. The overall results indicated that ICA was a powerful way to select SWs, and that Vis/NIR spectroscopy combined with ICA-LS-SVM was very efficient for the accurate determination of trace elements in rice leaf.

Recently, variable selection or uninformative variable elimination has attracted more and more attention for the development of multi-component calibrations using spectroscopic techniques. The recently developed methods for variable selection include generalized simulated annealing [

Various calibration methods have been used to relate near-infrared spectra (NIRS) with measured properties of materials. Principal components regression (PCR), partial least squares (PLS), multiple linear regression (MLR) and artificial neural networks (ANN) are the most used multivariate calibration techniques for NIRS [

The least-squares support vector machine (LS-SVM) can handle the linear and nonlinear relationships between the spectra and response chemical constituents [

The objectives of this study were (1) to investigate the feasibility of using Vis/NIRS to predict trace elements such as Fe and Zn in rice leaf; and (2) to compare the performance of the variable selection methods (PCA, LVA and ICA) and of the newly proposed ICA-LS-SVM model in predicting the trace elements in rice.

The experimental samples in this study came from 15 basins of rice, which were planted in conditioned soil at three nitrogen levels: 0, 120, and 240 kg/ha. To guard against accidental damage to the basins or samples, a duplicate set of basins was prepared, giving 30 basins in total, 10 for each nitrogen level. Each basin had an inner diameter of 30 cm and a height of 45 cm, and contained 10 kg of soil and four rice plants. The basins were set into a slotted field, backfilled with the surrounding soil, and arranged in a line from north to south. The soil used in this experiment was taken from the 20 to 40 cm depth of the experimental field.

Three leaf samples from each of the 15 basins were selected for spectral measurement. Samples were also selected from the other 15 replicate basins, so a total of 90 samples were obtained. The measurements were made at the booting stage. Reflectance measurements of all 90 leaf samples were made using a portable spectroradiometer (FieldSpec Vis/NIR, Analytical Spectral Device, Boulder, CO, USA) with a sensitivity range from 325 to 1,075 nm. The instrument uses a sensitive 512-element photodiode array with a resolution of 3.5 nm. The scan number for each spectrum was set to 10 at the same position, and three reflection spectra were taken for each sample, so a total of 30 scans per sample were stored for later analysis. To obtain relative reflectance, the white reference (a white panel purchased with the spectroradiometer) was collected before scanning the samples until a clean 100% reference line was obtained. All leaves were randomly divided into two sets: a calibration set (n = 70) and a validation set (n = 20). To compare the performance of the different calibration models, the samples in the calibration and validation sets were kept the same for all models.
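The random division into calibration and validation sets can be sketched as follows. This is only a minimal illustration, not the procedure used in the study: the function name `split_samples` and the seed value are assumptions, chosen so that the same partition can be reused for every model.

```python
import random

def split_samples(n_samples=90, n_calibration=70, seed=0):
    """Randomly split sample indices into calibration and validation sets.

    The fixed seed (an assumed value) makes the split reproducible, so that
    every model is calibrated and validated on the same samples.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    calibration = sorted(indices[:n_calibration])
    validation = sorted(indices[n_calibration:])
    return calibration, validation

calibration_idx, validation_idx = split_samples()
print(len(calibration_idx), len(validation_idx))  # 70 20
```

Storing the index lists once and passing them to each model keeps the comparison between models fair.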

In the study, we used the national standard method to measure the trace elements Fe and Zn [_{3}, HClO_{4}, and distilled water were diluted and adjusted to the required concentration solution. Rice leaf samples were finely ground and then passed through a 20 mesh sieve to obtain very fine particles. An air-dried, ground and sieved sample (2.0 g) was placed in an Erlenmeyer flask and the extracting solution (20 mL) was added. Then it was placed on a magnetic stirrer and the mixture was stirred for 20 minutes. The resulting solution was filtered through a filter paper into a 50 mL polypropylene vial and diluted to 50 mL with the extracting solution. After that, a Perkin-Elmer Analyst™800 atomic absorption spectrometer (PerkinElmer, Inc., Shelton, CT, USA) was used to measure the signal strength of the elements Fe and Zn in each Erlenmeyer flask, and the results were shown using the software package of the instrument. After calculation, the Fe content was from 39.951 ppm to 134.254 ppm, and Zn content was from 9.085 ppm to 49.927 ppm in all 90 samples.

Due to the potential system imperfections, obvious scattering noises could be observed at the beginning and end of the spectral data. Thus, the first and last 75 wavelength data points were eliminated to improve the measurement accuracy,

Reducing the number of inputs to the LS-SVM can reduce training time, and it can also reduce repetition and redundancy in the input spectral data. PCA is a data reduction method that constructs new uncorrelated variables, known as principal components (PCs). The PCs account for as much of the variability of the original variables as possible and are then used as the inputs of the model. In addition, PCA can also suppress noise and random errors in the original data. The equation of PCA could be described as follows:

In the development of the PLS model, calibration models were built between the spectra and the content of the trace elements (Fe and Zn), and full cross-validation was used to evaluate the quality of the calibration models and to prevent over-fitting. Latent variables (LVs) can be used to reduce the dimensionality of the data, and the optimal number of LVs was determined by the lowest value of the predicted residual error sum of squares (PRESS). The prediction performance was evaluated by the coefficient of determination (R^{2}), the root mean square error of calibration (RMSEC) or prediction (RMSEP), and the bias. The ideal model should have a higher R^{2} value and lower RMSEC, RMSEP and absolute bias. The RMSEP and bias could be calculated via:
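$$\mathrm{RMSEP} = \sqrt{\frac{1}{n_{p}}\sum_{i=1}^{n_{p}}\left(\hat{y}_{i}-y_{i}\right)^{2}}, \qquad \mathrm{bias} = \frac{1}{n_{p}}\sum_{i=1}^{n_{p}}\left(\hat{y}_{i}-y_{i}\right)$$

where ŷ_{i} is the predicted value of the i-th sample, y_{i} is its measured value, and n_{p} is the number of samples in the prediction set.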
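The evaluation statistics can be computed directly from paired measured and predicted values. The sketch below rests on two assumptions that the text does not spell out: bias is taken as the mean of predicted minus measured values, and R^{2} as the squared Pearson correlation between the two. The function name and the sample values are hypothetical.

```python
import math

def prediction_metrics(measured, predicted):
    """Return (R2, RMSEP, bias) for paired measured and predicted values."""
    n = len(measured)
    residuals = [p - m for p, m in zip(predicted, measured)]
    rmsep = math.sqrt(sum(r * r for r in residuals) / n)
    bias = sum(residuals) / n
    # R2 is taken here as the squared Pearson correlation (an assumption).
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r2 = cov * cov / (var_m * var_p)
    return r2, rmsep, bias

# Hypothetical concentrations in ppm, for illustration only.
measured = [40.0, 60.0, 80.0, 100.0]
predicted = [42.0, 62.0, 82.0, 102.0]
r2, rmsep, bias = prediction_metrics(measured, predicted)
```

In this toy case every prediction is 2 ppm too high, so the correlation is perfect while RMSEP and bias both equal 2 ppm. This shows why bias is reported alongside R^{2}.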

Independent component analysis (ICA) is a well-established statistical signal processing technique that aims to decompose a set of multivariate signals into a basis of statistically independent components with minimal loss of information. The independent components (ICs) are latent variables, meaning that they cannot be directly observed, and they must have non-Gaussian distributions. The noise-free ICA model could be expressed by the following equation:
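In the notation commonly used for noise-free ICA (the symbols below are assumed for illustration and are not taken from this article), the observed data are modelled as a linear mixture of independent sources, and the sources are estimated with an unmixing matrix:

```latex
\mathbf{X} = \mathbf{A}\,\mathbf{S}, \qquad \hat{\mathbf{S}} = \mathbf{W}\,\mathbf{X}
```

Here X holds the observed signals (the spectra), S the statistically independent components, A the unknown mixing matrix, and W the estimated unmixing matrix, which approximates the inverse of A when A is square and invertible.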

There are lots of algorithms for performing ICA [

Least squares-support vector machine can work with linear or non-linear regression or multivariate function estimation in a relatively fast way [

$$y(x) = \sum_{i=1}^{N} \alpha_{i} K(x, x_{i}) + b$$

where K(x, x_{i}) is the kernel function, x_{i} is the input vector, α_{i} are the Lagrange multipliers, called support values, and b is the bias term.

In the model development using LS-SVM with a radial basis function (RBF) kernel, the optimal combination of the gam (γ) and sig^{2} (σ^{2}) parameters was the one giving the smallest root mean square error of cross validation (RMSECV). In this study, gam (γ) was optimized in the range 2^{−1}–2^{10} and sig^{2} (σ^{2}) in the range 2–2^{15}, with adequate increments. These ranges were chosen from previous studies in which the magnitude of the parameters was optimized. The grid search had two steps: the first was a crude search with a large step size, and the second was a refined search with a small step size. The free LS-SVM toolbox (LS-SVM v 1.5, Suykens, Leuven, Belgium) was used with MATLAB 7.0 to develop the calibration models.
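The two-step search over gam (γ) and sig^{2} (σ^{2}) can be illustrated with a small NumPy sketch of LS-SVM regression. This is not the LS-SVM toolbox code. The kernel scaling exp(−‖x − x_i‖²/σ²), the 5-fold cross-validation, the coarse exponent step of 2 and the fine step of 0.25 are all assumptions made for the example, and every function name is hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, sig2):
    """RBF kernel exp(-||a - b||^2 / sig2); this scaling is an assumed convention."""
    d2 = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / sig2)

def lssvm_fit(X, y, gam, sig2):
    """Solve the LS-SVM linear system for the bias b and support values alpha."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sig2) + np.eye(n) / gam
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]

def lssvm_predict(X_train, b, alpha, sig2, X_new):
    """Evaluate y(x) = sum_i alpha_i K(x, x_i) + b for new inputs."""
    return rbf_kernel(X_new, X_train, sig2) @ alpha + b

def rmsecv(X, y, gam, sig2, k=5):
    """Root mean square error of k-fold cross-validation (k is assumed)."""
    idx = np.arange(len(y))
    sq_err = 0.0
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        b, alpha = lssvm_fit(X[train], y[train], gam, sig2)
        pred = lssvm_predict(X[train], b, alpha, sig2, X[fold])
        sq_err += np.sum((pred - y[fold]) ** 2)
    return float(np.sqrt(sq_err / len(y)))

def grid_search(X, y, gam_exps, sig2_exps):
    """Return (RMSECV, gam exponent, sig2 exponent) with the smallest RMSECV."""
    return min((rmsecv(X, y, 2.0 ** g, 2.0 ** s), g, s)
               for g in gam_exps for s in sig2_exps)

def two_step_search(X, y, gam_range=(-1, 10), sig2_range=(1, 15)):
    # Step 1: crude search over the full ranges with a large exponent step.
    coarse = grid_search(X, y,
                         np.arange(gam_range[0], gam_range[1] + 1, 2.0),
                         np.arange(sig2_range[0], sig2_range[1] + 1, 2.0))
    # Step 2: refined search around the crude optimum with a small step,
    # clipped so that it stays inside the stated ranges.
    _, g0, s0 = coarse
    fine_g = np.clip(g0 + np.linspace(-1, 1, 9), *gam_range)
    fine_s = np.clip(s0 + np.linspace(-1, 1, 9), *sig2_range)
    fine = grid_search(X, y, fine_g, fine_s)
    return coarse, fine
```

Searching over exponents of 2 rather than raw values keeps the grid evenly spaced on a logarithmic scale, which suits parameters that span several orders of magnitude. Because the fine grid contains the crude optimum, the second step can only match or improve the RMSECV found in the first.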

The lack of trace elements such as Fe, S, Mg, Mn may reduce the chlorophyll content of plant leaf, and will affect the solar radiation absorption by the leaf, so the changes of plant nutritional elements such as nitrogen, water content, and trace elements may directly result in the spectral reflectance changes [

Calibration models were built between the spectra and the content of the trace elements (Fe and Zn). Different numbers of LVs were applied to build the calibration models, and no outliers were detected in the calibration set during the development of the PLS models. The models were used to predict the remaining 20 samples, and the best performance was achieved with six LVs for Fe and five LVs for Zn. The R^{2}, RMSEP and bias were 0.3820, 26.1431 ppm and −9.3674 ppm for Fe, and 0.5800, 6.9637 ppm and 2.2320 ppm for Zn, respectively.
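Choosing the number of LVs by the lowest PRESS can be sketched with a compact single-response PLS (PLS1) fitted by the NIPALS algorithm. This is an illustrative stand-in, not the software used in the study. It assumes that "full cross-validation" means leave-one-out, and the function names are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_lv):
    """Fit PLS1 by NIPALS; return (regression coefficients, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean           # centred copies, deflated below
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)                 # weight vector of this LV
        t = Xr @ w                             # scores of this LV
        tt = t @ t
        p = Xr.T @ t / tt                      # X loadings
        c = yr @ t / tt                        # y loading
        Xr = Xr - np.outer(t, p)               # deflate X
        yr = yr - c * t                        # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)
    return beta, x_mean, y_mean

def pls1_predict(model, X_new):
    beta, x_mean, y_mean = model
    return (X_new - x_mean) @ beta + y_mean

def press_by_lv(X, y, max_lv):
    """Leave-one-out PRESS for models with 1..max_lv latent variables."""
    n = len(y)
    press = np.zeros(max_lv)
    for i in range(n):
        keep = np.arange(n) != i
        for k in range(1, max_lv + 1):
            model = pls1_fit(X[keep], y[keep], k)
            pred = pls1_predict(model, X[i:i + 1])[0]
            press[k - 1] += (pred - y[i]) ** 2
    return press

def best_n_lv(X, y, max_lv):
    """Number of LVs giving the lowest PRESS."""
    return int(np.argmin(press_by_lv(X, y, max_lv))) + 1
```

Refitting the model for every left-out sample is affordable for a calibration set of 70 spectra, and it keeps each prediction independent of the sample being predicted.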

PCs obtained from PCA were used as inputs to the LS-SVM models to improve the training speed and reduce the training error of the Vis/NIR model, because the training time increases with the square of the number of training samples and linearly with the number of variables. Based on the aforementioned analysis of the performance of the PCA models, the PCs from the Vis/NIR region were used as new eigenvectors to enhance the features of the spectra and reduce the dimensionality of the spectral data matrix. Several PCs were extracted from the spectra of the 90 samples.
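Extracting PC scores for use as model inputs can be sketched with a singular value decomposition of the mean-centred spectra. The function name `pca_scores` and the 0.95 threshold on cumulative explained variance are assumptions for illustration, not settings reported by the authors.

```python
import numpy as np

def pca_scores(spectra, min_cumulative=0.95):
    """Return PC scores, loadings and cumulative explained variance.

    spectra: array of shape (n_samples, n_wavelengths).
    min_cumulative: assumed threshold on the cumulative explained variance.
    """
    centred = spectra - spectra.mean(axis=0)
    U, s, Vt = np.linalg.svd(centred, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    cumulative = np.cumsum(explained)
    # Smallest number of PCs whose cumulative contribution reaches the threshold.
    n_pc = min(int(np.searchsorted(cumulative, min_cumulative)) + 1, len(s))
    scores = U[:, :n_pc] * s[:n_pc]            # uncorrelated new variables
    return scores, Vt[:n_pc], cumulative[:n_pc]
```

The score matrix has one row per sample and one column per retained PC, so it can replace the full spectrum as the input of a regression model.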

Before the LS-SVM calibration model was built, three choices were crucial: the optimal input feature subset, a proper kernel function, and the optimal kernel parameters. Firstly, the six PCs obtained from PCA were used as the input data set, and their cumulative contribution reached 95.2%. Secondly, the radial basis function was chosen because it could handle the nonlinear relationships between the spectra and the target attributes. Finally, the two important parameters gam (γ) and sig^{2} (σ^{2}) were optimized for the RBF kernel function, as described in the multivariate analysis above.

The performance of the Vis/NIR models was evaluated using the 20 samples in the validation set. The R^{2}, RMSEP and bias for the validation set were 0.4012, 23.9920 ppm and −7.8789 ppm for Fe, and 0.6109, 6.5308 ppm and 2.0571 ppm for Zn, respectively.

Latent variables obtained from PLS were used as inputs to the LS-SVM models to improve the training speed and reduce the training error of the Vis/NIR model. Based on the aforementioned analysis of the performance of the PLS models, the LVs from the Vis/NIR region were used as new eigenvectors to enhance the features of the spectra and reduce the dimensionality of the spectral data matrix. Several LVs were extracted from the spectra of the 90 samples.

The performance of the Vis/NIR models was evaluated using the 20 samples in the validation set. Comparing the results for the calibration and validation sets, the best performance was achieved with six LVs for Fe and five LVs for Zn. The R^{2}, RMSEP and bias for the validation set were 0.4070, 23.3845 ppm and −7.4975 ppm for Fe, and 0.6067, 6.4869 ppm and 2.2336 ppm for Zn, respectively.

Independent component analysis was applied for the selection of sensitive wavelengths (SWs), which could reflect the main features of the raw absorbance spectra. FastICA was applied to the preprocessed spectral data, and the main absorbance peaks and valleys were indicated by the spectra of the ICs. The SWs were selected according to the weights of the first four ICs: the wavelengths with the highest weights in each IC were selected as the SWs. The R^{2}, RMSEP and bias were 0.6189, 20.6510 ppm and −12.1549 ppm for Fe, and 0.6731, 5.5919 ppm and 1.5232 ppm for Zn, respectively.
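The selection rule, taking the wavelengths with the highest weights in each IC, can be sketched as below. The sketch starts from an IC weight matrix that is assumed to have been computed already (for example by FastICA). Ranking by absolute weight, the parameter `n_per_ic`, the function name and the 400–1000 nm grid are illustrative assumptions rather than details given in the text.

```python
import numpy as np

def select_sensitive_wavelengths(wavelengths, ic_weights, n_per_ic=1):
    """Pick, for each IC, the wavelengths with the largest absolute weights.

    wavelengths: array of shape (n_wavelengths,), in nm.
    ic_weights: array of shape (n_ics, n_wavelengths), one row per IC.
    n_per_ic: number of wavelengths kept per IC (hypothetical parameter).
    """
    order = np.argsort(-np.abs(ic_weights), axis=1)[:, :n_per_ic]
    return np.unique(wavelengths[order.ravel()])

# Hypothetical example: two ICs with one dominant peak or valley each.
wavelengths = np.arange(400, 1001)            # assumed 1 nm grid
ic_weights = np.zeros((2, wavelengths.size))
ic_weights[0, 150] = 2.0                      # peak at 550 nm
ic_weights[1, 280] = -3.0                     # valley at 680 nm
selected = select_sensitive_wavelengths(wavelengths, ic_weights)
print(selected)  # [550 680]
```

Using the absolute weight treats peaks and valleys of an IC alike, and `np.unique` removes any wavelength that is picked by more than one IC.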

Ma ^{2} value of 0.623 [

Comparing the PLS, PCA-LS-SVM, LV-LS-SVM and ICA-LS-SVM models, the nonlinear PCA-LS-SVM, LV-LS-SVM and ICA-LS-SVM models all performed better than the linear PLS model. The best model for the prediction of trace elements in rice was ICA-LS-SVM. The RMSEP and R^{2} values of the four models are tabulated for comparison.

The ICA-LS-SVM models performed better, possibly because the LS-SVM models took the nonlinear information in the spectral data into account, which improved the prediction precision. The ICs from ICA were obtained using high-order statistics, which impose a much stronger condition than orthogonality, so the SWs selected from the ICs were more effective. This could be very helpful for the development of portable instruments or for real-time monitoring of rice trace elements.

Vis/NIR spectroscopy was successfully utilized for the determination of some trace elements (Fe, Zn) in rice. A new combination, ICA-LS-SVM, was proposed and compared with the nonlinear LV-LS-SVM and PCA-LS-SVM models and with the linear PLS models. The ICA-LS-SVM model turned out to be the best for the prediction of trace elements in rice, and was better than the nonlinear LV-LS-SVM model. The R^{2}, RMSEP and bias by ICA-LS-SVM were 0.6189, 20.6510 ppm and −12.1549 ppm for Fe, and 0.6731, 5.5919 ppm and 1.5232 ppm for Zn, respectively. The overall results demonstrated that ICA was a powerful tool for variable selection, and that the newly proposed ICA-LS-SVM method could be applied as an alternative fast and accurate method for the determination of trace elements in rice.

This work was supported by the 863 National High Technology Research and Development Program of China (2011AA100705), Zhejiang Provincial Natural Science Foundation of China (Z3090295), and China Postdoctoral Science Foundation (2011M501009).


The statistic values of Fe and Zn contents in calibration and validation sets.

Element | Data set | No. of samples | Range (ppm) | Mean (ppm) | Standard deviation (ppm) |
---|---|---|---|---|---|
Fe | Calibration | 70 | 39.951–134.254 | 79.245 | 21.945 |
Fe | Validation | 20 | 40.011–133.992 | 80.649 | 24.003 |
Fe | All samples | 90 | 39.951–134.254 | 80.935 | 23.669 |
Zn | Calibration | 70 | 9.085–49.927 | 23.710 | 9.221 |
Zn | Validation | 20 | 9.103–49.849 | 23.134 | 9.518 |
Zn | All samples | 90 | 9.085–49.927 | 23.408 | 9.928 |

The parameters of RMSEP and R^{2} in the four models.

Model | Element | LVs/PCs/SWs | R^{2} | RMSEP (ppm) | Bias (ppm) |
---|---|---|---|---|---|
PLS | Fe | 6 | 0.3820 | 26.1431 | −9.3674 |
PLS | Zn | 5 | 0.5800 | 6.9637 | 2.2320 |
PCA-LS-SVM | Fe | 6 | 0.4012 | 23.9920 | −7.8789 |
PCA-LS-SVM | Zn | 6 | 0.6109 | 6.5308 | 2.0571 |
LV-LS-SVM | Fe | 6 | 0.4070 | 23.3845 | −7.4975 |
LV-LS-SVM | Zn | 5 | 0.6067 | 6.4869 | 2.2336 |
ICA-LS-SVM | Fe | 6 | 0.6189 | 20.6510 | −12.1549 |
ICA-LS-SVM | Zn | 6 | 0.6731 | 5.5919 | 1.5232 |