In the past few decades, near-infrared (NIR) spectroscopy has been widely used in various fields because it is fast and does not damage the sample. These fields include pharmaceutical [1], biomedical [4], petrochemical [5], agricultural [6], and food [8] applications. In NIR analysis, the most frequently used multivariate calibration techniques are partial least squares (PLS) regression [11] and principal component regression (PCR) [13]. However, an established calibration model often becomes outdated or unsuitable for new samples because of differences between measuring instruments and measuring environments, as well as variability in the materials being measured. Here, new samples are any samples not included in the calibration model, such as samples collected at different times or measured with different instruments. Frequent recalibration is undesirable because establishing a calibration model consumes a large amount of time and resources. A more practical option is calibration transfer.
Numerous calibration transfer methods have been proposed in the literature. In general, they can be divided into two types: methods that require transfer standards and methods that do not (non-standard methods). Transfer-standard methods require the same standard samples to be measured on both the master instrument and the slave instrument. They can be further divided into four types according to the stage at which the adjustment occurs.
The first type corrects the slave spectra. Using the standard samples, a transfer matrix is estimated that makes the slave spectra as close as possible to the corresponding master spectra. The most widely used methods are direct standardization (DS) and piecewise direct standardization (PDS) [15]. In PDS, for each wavelength of the master spectra, a transfer relationship is established between the master response at that wavelength and the slave responses within a sliding window around it; the resulting local models are assembled into a banded transfer matrix that is used to correct the slave spectra.
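The windowed construction of the banded PDS transfer matrix can be sketched as follows. This is a minimal illustration, not the formulation of [15]: the function name `pds_transfer_matrix`, the `half_width` parameter, the use of plain least squares for each local model (without an offset term), and the synthetic spectra are all assumptions made for the example.

```python
import numpy as np

def pds_transfer_matrix(master, slave, half_width=2):
    """Sketch of piecewise direct standardization (hypothetical helper).

    master, slave: (n_samples, n_wavelengths) spectra of the same standard
    samples measured on the two instruments.
    Returns a banded matrix F such that slave @ F approximates master.
    """
    n_samples, n_wavelengths = master.shape
    F = np.zeros((n_wavelengths, n_wavelengths))
    for j in range(n_wavelengths):
        # Sliding window on the slave spectra around master wavelength j.
        lo = max(0, j - half_width)
        hi = min(n_wavelengths, j + half_width + 1)
        window = slave[:, lo:hi]
        # Local model: regress the master response at j on the slave window.
        # Assumption: ordinary least squares, no intercept.
        coef, *_ = np.linalg.lstsq(window, master[:, j], rcond=None)
        F[lo:hi, j] = coef
    return F

# Synthetic standards: the slave is a shifted, scaled, offset copy of the master.
rng = np.random.default_rng(0)
master = rng.random((12, 30)).cumsum(axis=1)
slave = 0.9 * np.roll(master, 1, axis=1) + 0.02
F = pds_transfer_matrix(master, slave, half_width=2)
corrected = slave @ F
```

Only entries within `half_width` of the diagonal of `F` are non-zero, which is what gives the transfer matrix its band shape. In practice the local regressions are often regularized, which this sketch omits.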
The second type corrects the master spectra and the slave spectra simultaneously. A commonly used example is calibration transfer by generalized least squares weighting (GLSW) [17]. GLSW uses the difference between the standard sets of the master instrument and the slave instrument to build a weight matrix, and then uses this matrix to down-weight the spectral features to be suppressed. A detailed description of the weight matrix is provided in [17] and [18].
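As a rough illustration of how such a weight matrix can be built from the difference between the two standard sets, consider the sketch below. It assumes one commonly documented form of the filter, in which the covariance of the difference spectra is eigendecomposed and each eigen-direction is scaled by 1/sqrt(eigenvalue/alpha + 1). The exact definition used in [17] and [18] may differ, and the name `glsw_filter` and the parameter `alpha` are illustrative.

```python
import numpy as np

def glsw_filter(master_std, slave_std, alpha=0.01):
    """Sketch of a GLS weighting filter (hypothetical helper).

    master_std, slave_std: (n_samples, n_wavelengths) standard spectra.
    Returns a symmetric matrix G; spectra are filtered as X_centered @ G.
    """
    # Difference between the mean-centred standard sets.
    diff = (master_std - master_std.mean(axis=0)) - (slave_std - slave_std.mean(axis=0))
    cov = diff.T @ diff
    eigvals, eigvecs = np.linalg.eigh(cov)
    eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative values
    # Directions with large inter-instrument variance get small weights;
    # directions with no such variance keep a weight of 1.
    weights = 1.0 / np.sqrt(eigvals / alpha + 1.0)
    return (eigvecs * weights) @ eigvecs.T

rng = np.random.default_rng(0)
master_std = rng.random((6, 20))
slave_std = master_std + 0.05 * rng.standard_normal((6, 20))
G = glsw_filter(master_std, slave_std, alpha=0.01)
```

Under this assumed form, a smaller `alpha` suppresses the inter-instrument directions more strongly. The same `G` would be applied to the centred spectra of both instruments before the calibration model is built.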
The third type corrects the predicted values. The main example is the slope and bias correction (SBC) method [19], which assumes a linear relationship between the response variable and the values predicted for the slave spectra by the master model. This relationship is usually estimated by ordinary least squares and is then used to correct the predicted values.
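The slope and bias correction amounts to a univariate least-squares fit, which can be sketched as below. The function names and the numerical values are illustrative assumptions, not data from this work.

```python
import numpy as np

def fit_slope_bias(y_pred_slave, y_ref):
    """Fit y_ref ~ slope * y_pred_slave + bias by ordinary least squares."""
    design = np.column_stack([y_pred_slave, np.ones_like(y_pred_slave)])
    (slope, bias), *_ = np.linalg.lstsq(design, y_ref, rcond=None)
    return slope, bias

def apply_slope_bias(y_pred, slope, bias):
    """Correct predictions obtained from slave spectra with the master model."""
    return slope * y_pred + bias

# Illustrative values: master-model predictions on slave spectra are
# systematically scaled and shifted relative to the reference values.
y_ref = np.array([10.0, 12.0, 15.0, 11.0, 14.0])
y_pred_slave = 0.8 * y_ref + 1.5
slope, bias = fit_slope_bias(y_pred_slave, y_ref)
y_corrected = apply_slope_bias(y_pred_slave, slope, bias)
```

Unlike methods that correct the spectra, this approach needs the reference values of the response variable for the standard samples.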
The fourth type is the projection method. An example is calibration transfer based on canonical correlation analysis (CCACT) [20], which uses CCA to find the canonical variables that are maximally correlated between the standard sets of the master instrument and the slave instrument, and then models the transfer relationship between the two sets of canonical variables.
In practical applications, it is often difficult or even impossible to measure the same samples on two instruments, for example because of the location of the instruments or the limited stability of the samples. In such cases, a method that does not require the same standard samples to be measured, that is, a non-standard method, is needed. These methods fall mainly into two types.
One is signal preprocessing, which removes baseline offsets and linearly sloped baselines by simple mathematical operations such as the first and second derivatives. Common methods include multiplicative signal correction (MSC) [21], finite impulse response (FIR) filtering [22], generalized moving window MSC (MW-MSC) [21], and orthogonal signal correction (OSC) [23], where FIR and MW-MSC are variants of MSC. However, these simple preprocessing methods cannot handle complex differences between the master spectra and the slave spectra.
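To make the effect of these preprocessing operations concrete, the following sketch applies MSC and finite differences to synthetic spectra that differ only by a gain, an offset, or a linear baseline. The function names, the choice of reference spectrum, and the test signal are assumptions for illustration.

```python
import numpy as np

def msc(spectra, reference=None):
    """Multiplicative signal correction (sketch).

    Each spectrum is regressed on a reference spectrum (by default the
    mean spectrum) and corrected as (x - offset) / slope.
    """
    ref = spectra.mean(axis=0) if reference is None else reference
    corrected = np.empty_like(spectra, dtype=float)
    for i, x in enumerate(spectra):
        slope, offset = np.polyfit(ref, x, deg=1)
        corrected[i] = (x - offset) / slope
    return corrected

def difference(spectra, order=1):
    """Finite-difference approximation of the derivative along wavelength."""
    return np.diff(spectra, n=order, axis=-1)

# Synthetic spectra: one underlying shape with per-sample gain and offset.
base = np.sin(np.linspace(0.0, 3.0, 50)) + 2.0
gains = np.array([0.8, 1.0, 1.3])
offsets = np.array([0.1, -0.2, 0.05])
spectra = gains[:, None] * base + offsets[:, None]
msc_corrected = msc(spectra, reference=base)

# A constant offset vanishes in the first difference,
# a linear baseline vanishes in the second difference.
ramp = 0.01 * np.arange(base.size)
first = difference(base + 3.0, order=1)
second = difference(base + ramp, order=2)
```

Both operations only remove additive and multiplicative effects of this simple kind, which is why they cannot account for more complex differences between instruments.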
The other is the projection method, which includes transfer component analysis (TCA) [25] and kernel principal component analysis (KPCA) [26]. TCA projects the master spectra and the slave spectra into a common feature space in which their distributions are as similar as possible while the key properties of the spectra are retained. Both TCA and KPCA can be used with different kernels, so they can cope with nonlinear and more complex changes in the spectra.
In this paper, a novel projection method is proposed: a feature transfer model based on the PLS subspace (PLSCT). PLSCT first establishes a PLS model on the calibration set of the master instrument, which defines a low-dimensional PLS subspace, that is, a feature space constructed from the spectral feature vectors. The PLS model is then used to extract the predicted features of the master spectra and the pseudo predicted features of the slave spectra; in other words, all spectra of the master and slave instruments are projected into this PLS subspace. Finally, ordinary least squares is used to estimate the relationship between the two sets of features in this common PLS subspace, and this relationship is used to construct the feature transfer model.
Note that the pseudo predicted features of the slave spectra are obtained with the PLS model established on the master instrument rather than with a PLS model of the slave instrument. Moreover, PLSCT does not need the response variable corresponding to the standard set. In addition, compared with PDS, PLSCT corrects the features of the spectra rather than the spectra themselves. In contrast to CCACT, PLSCT uses PLS to capture the covariance between the spectra and the response variable, instead of using CCA to find the correlation between the master spectra and the slave spectra.
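The PLSCT steps outlined above can be sketched in a few lines of NumPy. This is one plausible reading of the description, not the reference implementation: the NIPALS PLS1 routine, the intercept in the least-squares feature mapping, all function names, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS1 by NIPALS on the master calibration set (assumed algorithm)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)
        t = Xr @ w
        tt = t @ t
        p = Xr.T @ t / tt
        c = yr @ t / tt
        Xr = Xr - np.outer(t, p)  # deflate the spectra
        yr = yr - c * t           # deflate the response
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    R = W @ np.linalg.inv(P.T @ W)  # maps centred spectra to scores
    return {"x_mean": x_mean, "y_mean": y_mean, "R": R, "q": q}

def pls_scores(model, X):
    """Project spectra into the master PLS subspace."""
    return (X - model["x_mean"]) @ model["R"]

def plsct_fit(model, master_std, slave_std):
    """Least-squares map from slave pseudo features to master features."""
    T_m = pls_scores(model, master_std)
    T_s = pls_scores(model, slave_std)
    design = np.column_stack([np.ones(len(T_s)), T_s])  # assumed intercept
    M, *_ = np.linalg.lstsq(design, T_m, rcond=None)
    return M

def plsct_predict(model, M, slave_X):
    """Transfer the slave features, then predict with the master model."""
    T_s = pls_scores(model, slave_X)
    T_corr = np.column_stack([np.ones(len(T_s)), T_s]) @ M
    return model["y_mean"] + T_corr @ model["q"]

# Synthetic data: the slave is a scaled and offset version of the master.
rng = np.random.default_rng(0)
conc = rng.random((40, 3))
pure = rng.random((3, 30))
X_master = conc @ pure + 0.001 * rng.standard_normal((40, 30))
X_slave = 1.1 * X_master + 0.05 + 0.001 * rng.standard_normal((40, 30))
y = conc[:, 0]

model = pls1_fit(X_master, y, n_components=3)
std, new = slice(0, 15), slice(15, 40)
M = plsct_fit(model, X_master[std], X_slave[std])  # no y values needed here
y_uncorrected = model["y_mean"] + pls_scores(model, X_slave[new]) @ model["q"]
y_corrected = plsct_predict(model, M, X_slave[new])
```

In this sketch `plsct_fit` uses only the spectra of the standard samples, consistent with the statement that the response variable of the standard set is not required. The correction acts on the low-dimensional scores rather than on the full spectra.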
To validate the performance of the PLSCT model, we not only compare its prediction results against those of the SBC, PDS, CCACT, GLSW, and MSC methods, but also apply the Wilcoxon signed-rank test [27] to determine whether the improvement of PLSCT over the other models is statistically significant. The experiments were conducted on three real near-infrared datasets. Based on all the experimental results, we conclude that PLSCT can significantly reduce the prediction error.