When light hits the surface of a fruit, it can be absorbed, scattered or re-emitted. The amount of each is determined by the physical properties and chemical constituents of the fruit, and thus by its ripeness. Visible and Near InfraRed (VNIR) reflectance spectroscopy measures reflected light between 380 nm and 2500 nm. The reflectance depends largely on light absorption within the fruit sample and relates to almost all the major organic compounds. An example of the changes in spectra during ripening is given in Figure 2, which shows marked changes in the values between 400–700 nm and again at longer wavelengths. VNIR spectroscopy has been widely applied as a fast, non-destructive measurement method for multiple quality attributes [126]. Moreover, a portable device has been developed and used in the field [127]. The recorded spectra can be analyzed and related to different ripeness stages by using spectral indices. Alternatively, the whole wavelength scan, or values at key selected wavelengths, are used in regression models to correlate with specific fruit qualities that are associated with ripeness.
3.2. Full or Selected Wavelengths
Spectral indices can describe the change in peel pigment concentration during ripening and provide values comparable with those of the colorimetric method [37], but peel colour is not the only criterion for ripeness assessment. Correlations between the spectra, using either the full or selected wavelengths, and internal quality attributes related to ripening, such as firmness and SSC, have therefore also been investigated.
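Many spectral indices are simple arithmetic on a few bands. As a minimal sketch, assuming a normalized difference of reflectance at 780 nm and 670 nm (the latter close to the red absorption maximum of chlorophyll), the code below computes such an index for two synthetic peel spectra. The band pair, the function names and the spectra are illustrative assumptions, not the indices used in [37].

```python
import numpy as np


def band_value(wavelengths, reflectance, target_nm):
    """Reflectance at the sampled wavelength closest to target_nm (1-D or 2-D input)."""
    idx = int(np.argmin(np.abs(wavelengths - target_nm)))
    return reflectance[..., idx]


def normalized_difference_index(wavelengths, reflectance, nm_a=780.0, nm_b=670.0):
    """NDI = (R_a - R_b) / (R_a + R_b); larger when the band at nm_b absorbs strongly."""
    ra = band_value(wavelengths, reflectance, nm_a)
    rb = band_value(wavelengths, reflectance, nm_b)
    return (ra - rb) / (ra + rb)


# Synthetic peel spectra: a Gaussian absorption dip centred at 670 nm whose depth
# stands in for chlorophyll content (deep dip = green fruit, shallow dip = riper fruit).
wl = np.arange(400.0, 1001.0, 2.0)


def synthetic_spectrum(depth):
    return 0.6 - depth * np.exp(-((wl - 670.0) / 25.0) ** 2)


spectra = np.vstack([synthetic_spectrum(0.5), synthetic_spectrum(0.1)])
ndi = normalized_difference_index(wl, spectra)
print(ndi.round(3))  # → [0.714 0.091]
```

The index falls as the 670 nm dip becomes shallower, which is how a pigment-related index can follow chlorophyll loss in the peel; in practice such an index would be calibrated against reference measurements such as colorimetry.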
When the full wavelength range is used, PLS is the most commonly used regression model for predicting fruit quality. PLS extracts from the predictors a set of orthogonal factors, called latent variables, that have the greatest predictive power [134]. Another common regression model is Principal Component Regression (PCR), which uses Multiple Linear Regression (MLR) to regress the response on the principal component scores extracted from the predictors [135]; it has been applied in several studies of fruit quality assessment [67,135,136]. Compared with PLS, PCR has the drawback that the principal components are obtained without considering the dependent variable.
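The difference between PCR and PLS can be made concrete with a small NumPy sketch for a single response variable. PCR takes its components from the spectra alone, while PLS (here the NIPALS form of PLS1) chooses each latent variable to maximize covariance with the response. The function names and the synthetic data, a large nuisance variation plus a small variation that carries the response, are assumptions for illustration, not taken from the cited studies.

```python
import numpy as np


def pcr_fit(X, y, n_components):
    """Principal component regression: PCA on centred X, then MLR on the PC scores."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                  # loadings chosen from X alone; y is ignored
    T = Xc @ P                               # principal component scores
    q, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)
    b = P @ q                                # coefficients per wavelength
    return b, y_mean - x_mean @ b


def pls1_fit(X, y, n_components):
    """PLS regression for a single response (PLS1) using the NIPALS algorithm."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                        # direction of maximum covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                           # latent-variable scores
        tt = t @ t
        p = Xk.T @ t / tt                    # X loadings
        q = (yk @ t) / tt                    # y loading
        Xk = Xk - np.outer(t, p)             # deflate X and y before the next factor
        yk = yk - q * t
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.column_stack(W), np.column_stack(P), np.array(Q)
    b = W @ np.linalg.solve(P.T @ W, Q)      # b = W (P'W)^-1 q
    return b, y_mean - x_mean @ b


def r2(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)


# Synthetic "spectra": a large nuisance variation (e.g. scattering) plus a small
# variation that actually carries the response.
rng = np.random.default_rng(0)
n, p = 60, 50
grid = np.linspace(0.0, 1.0, p)
nuisance = rng.normal(0.0, 3.0, n)
analyte = rng.normal(0.0, 1.0, n)
X = (np.outer(nuisance, np.sin(np.pi * grid))
     + np.outer(analyte, np.cos(3 * np.pi * grid))
     + rng.normal(0.0, 0.05, (n, p)))
y = 2.0 * analyte + rng.normal(0.0, 0.1, n)

results = {}
for name, fit in (("PCR", pcr_fit), ("PLS", pls1_fit)):
    b, b0 = fit(X, y, n_components=1)
    results[name] = r2(y, X @ b + b0)
    print(name, round(results[name], 3))
```

With one component, the PCR score follows the large nuisance variation, so its R² stays low, whereas the first PLS factor is already steered towards the response. In practice the number of components is usually chosen by cross-validation, which this sketch omits.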
Variability in the physical properties of samples and in hardware performance can introduce undesired effects into the extracted spectra, including light scattering, path length variations and random noise. These effects reduce the accuracy and robustness of prediction models [137]. To mitigate them, a number of studies have applied pre-processing techniques to the spectra before modelling [136].
The Savitzky–Golay (SG) filter is the most frequently used digital smoothing filter [42,49,65,91,138,139]; it uses linear least squares to fit a low-degree polynomial to successive windows of adjacent data points [140]. However, SG smoothing has had contrasting effects on the performance of multivariate statistical models [141]. For example, Jha et al. compared different pre-processing techniques and found that smoothing produced no improvement over the other techniques for assessing the firmness of mangos [49]. In contrast, Herrera et al. showed that SG filters using a second-order polynomial performed better than scattering correction methods for the prediction of wine grape Brix [137].
Standard Normal Variate (SNV) [72,78,142,143] and Multiplicative Scatter Correction (MSC) [42,67,137,139,144,145] are the two most frequently used techniques for scattering correction. MSC removes scattering effects caused by the non-uniform distance travelled by light by linearly regressing each spectrum against a reference spectrum, usually the mean spectrum, and correcting it with the fitted offset and slope [67]. SNV instead centres each spectrum on its own mean and scales it by its own standard deviation. Previous research has shown the similarity between SNV and MSC; for example, Ma et al. obtained the same correlation coefficients for PLS models of peach sugar content built with SNV and with MSC [146]. Unlike MSC, however, SNV can be applied to an individual spectrum without requiring a reference [147]. In some studies, SNV was combined with de-trending to correct baseline shifts in the spectra [72,143].
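The four pre-processing steps discussed above (SG smoothing, SNV, MSC and de-trending) can be sketched in NumPy as follows, assuming the spectra are stored as a 2-D array with one row per sample and one column per wavelength. The function names, the default 11-point SG window, the edge padding and the second-order de-trending polynomial are assumptions for illustration; `scipy.signal.savgol_filter` provides a library implementation of the SG filter.

```python
import math

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def savgol(X, window=11, polyorder=2, deriv=0, delta=1.0):
    """Savitzky-Golay filter applied along each row of X (samples x wavelengths).

    A degree-`polyorder` polynomial is fitted by least squares to every window of
    `window` points; its value (or `deriv`-th derivative) at the window centre
    replaces the centre point. Edges are padded by repeating the end values, a
    simplification: scipy.signal.savgol_filter fits the edge windows instead.
    """
    if window % 2 == 0 or window <= polyorder:
        raise ValueError("window must be odd and larger than polyorder")
    m = window // 2
    offsets = np.arange(-m, m + 1)
    A = np.vander(offsets, polyorder + 1, increasing=True)  # A[i, j] = offset_i ** j
    # Row `deriv` of pinv(A) maps the window values to the fitted coefficient c_deriv.
    h = math.factorial(deriv) * np.linalg.pinv(A)[deriv] / delta ** deriv
    padded = np.pad(X, ((0, 0), (m, m)), mode="edge")
    return sliding_window_view(padded, window, axis=1) @ h


def snv(X):
    """Standard Normal Variate: centre and scale each spectrum by its own mean and SD."""
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, ddof=1, keepdims=True)


def msc(X, reference=None):
    """Multiplicative Scatter Correction against a reference (default: mean spectrum)."""
    ref = X.mean(axis=0) if reference is None else reference
    corrected = np.empty(X.shape)
    for i, spectrum in enumerate(X):
        slope, intercept = np.polyfit(ref, spectrum, 1)  # spectrum ~ intercept + slope * ref
        corrected[i] = (spectrum - intercept) / slope
    return corrected


def detrend(X, wavelengths, order=2):
    """Subtract a least-squares polynomial baseline (in wavelength) from each spectrum."""
    coeffs = np.polyfit(wavelengths, X.T, order)            # one fit per spectrum
    baseline = np.vander(wavelengths, order + 1) @ coeffs   # (n_wavelengths, n_samples)
    return X - baseline.T


# Synthetic spectra: one absorption band distorted by per-sample offsets and scaling,
# the kind of scatter effect that SNV and MSC are meant to remove.
wl = np.linspace(400.0, 1000.0, 301)
band = np.exp(-((wl - 680.0) / 30.0) ** 2)
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 0.2, (5, 1)) + rng.uniform(0.8, 1.2, (5, 1)) * band

X_snv = snv(X)                                        # every row becomes the same standardized band
X_msc = msc(X)                                        # every row becomes the mean spectrum
X_sg = savgol(X + rng.normal(0.0, 0.01, X.shape))     # smoothed noisy copies
X_snvdt = detrend(snv(X), wl)                         # SNV followed by de-trending
```

For spectra that differ only by an additive offset and a multiplicative factor, both SNV and MSC collapse all rows onto a single shape, which illustrates why the two corrections often give similar models.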
Spectral derivatives are useful pre-processing techniques for enhancing subtle differences and reducing the effect of specular reflection [79,137]; a first derivative removes a constant baseline offset, and a second derivative also removes a linear baseline slope. Guo et al. found that a PLS regression model performed better with the first derivative of the spectra than with SNV, MSC or the second derivative for predicting the SSC of strawberries [139]. A similar conclusion was drawn for the prediction of Total Soluble Solids (TSS) content in strawberries [73]. The second derivative has also been used in the prediction of the chlorophyll content of apples [23], the SSC of kiwifruit, strawberries, cherries and peaches [71,79,148,149], and the firmness of apricots [78]. Carlini et al. compared the second derivative, MSC and SNV, and found that the second derivative gave the best performance for the prediction of SSC in cherries [71]. However, pre-processing is not always beneficial. Clément et al. applied all the above-mentioned pre-processing techniques to the prediction of tomato ripeness, but none of them improved the PLS model, which was attributed to the low noise level of the spectra [61]. Likewise, Jaiswal et al. obtained the best PLS predictions of the TSS and dry matter (DM) content of bananas with no pre-processing of the spectra [52].
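The baseline-removing effect of derivatives can be checked numerically. In this minimal sketch, derivatives are taken with NumPy finite differences (`np.gradient`) for brevity, whereas in practice they are often computed with an SG filter; the grid, the synthetic band and the baseline terms are hypothetical.

```python
import numpy as np


def first_derivative(X, wavelengths):
    """Finite-difference first derivative of each spectrum (one spectrum per row)."""
    return np.gradient(X, wavelengths, axis=1)


def second_derivative(X, wavelengths):
    """Second derivative obtained by differentiating twice."""
    return first_derivative(first_derivative(X, wavelengths), wavelengths)


wl = np.linspace(400.0, 1000.0, 301)                 # nm, hypothetical sampling grid
band = np.exp(-((wl - 680.0) / 20.0) ** 2)          # synthetic absorption band
X = np.vstack([
    band,                                           # reference spectrum
    band + 0.3,                                     # constant baseline offset
    band + 0.3 + 0.001 * (wl - 400.0),              # offset plus linear slope
])
d1 = first_derivative(X, wl)
d2 = second_derivative(X, wl)
```

The first derivative of the offset spectrum matches that of the reference, but the sloped spectrum still differs by the constant slope; after the second derivative, all three agree. Differentiation also amplifies high-frequency noise, which is one reason derivatives are commonly combined with smoothing.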
In some studies, only a small number of selected wavelengths were modelled with MLR, which reduces multicollinearity among the variables. The key wavelengths can be identified manually or automatically. Manual selection of key wavelengths has been used for the prediction of Brix in mangos [150] and of SSC in grapes, limes and star fruit [84]. Guthrie et al. used the correlation coefficient between the second-derivative spectra and Brix at each wavelength as the criterion for selecting wavelengths for an MLR model. This method provided better predictions than when first-derivative spectra were used [90] and had previously been used to predict Brix levels in peaches [151].
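Correlation-based wavelength selection followed by MLR can be sketched as below. Guthrie et al. worked on second-derivative spectra; here raw synthetic spectra are used for brevity. The `min_gap` rule, which stops neighbouring (and therefore collinear) wavelengths from being chosen together, is an added assumption rather than part of the cited method, and all names are hypothetical.

```python
import numpy as np


def correlation_spectrum(X, y):
    """Pearson correlation between each wavelength (column of X) and the response y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))


def select_by_correlation(X, y, k, min_gap=20):
    """Pick the k wavelengths with the largest |r|, at least `min_gap` indices apart."""
    r = np.abs(correlation_spectrum(X, y))
    chosen = []
    for j in np.argsort(r)[::-1]:              # strongest correlation first
        if all(abs(j - c) >= min_gap for c in chosen):
            chosen.append(int(j))
        if len(chosen) == k:
            break
    return sorted(chosen)


def mlr_fit(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bk]."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


rng = np.random.default_rng(1)
n, p = 80, 200
c1, c2 = rng.normal(size=n), rng.normal(size=n)
X = rng.normal(0.0, 0.1, (n, p))
X[:, 35:46] += c1[:, None]                      # band related to constituent 1
X[:, 115:126] += c2[:, None]                    # band related to constituent 2
y = c1 + c2 + rng.normal(0.0, 0.1, n)           # e.g. Brix

wavelengths = select_by_correlation(X, y, k=2)
coef = mlr_fit(X[:, wavelengths], y)
print(wavelengths, coef.round(2))
```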
Automated wavelength selection methods, such as stepwise selection, have also been used to improve the predictive power of models. In forward stepwise selection, the first wavelength selected is the one with the highest correlation to the dependent variable. Further wavelengths are then added one at a time, each chosen to strengthen the correlation most, until none of the remaining wavelengths makes a significant contribution. This method was used for SSC prediction in peaches [152], melons and pineapples [153]. The genetic algorithm (GA), which mimics natural selection by evolving candidate sets of wavelengths through selection and random mutation, with prediction accuracy as the measure of fitness, is another efficient automated method for identifying key wavelengths and was successfully applied to the SSC prediction of apples [154].
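Forward stepwise selection as described above can be sketched with a partial F-test as the significance rule. The fixed F-to-enter threshold of 4.0 (roughly p ≈ 0.05 for one added variable), the `max_vars` cap, the function names and the synthetic data are assumptions for illustration; a GA would replace this greedy loop with a population of candidate wavelength subsets and is not shown.

```python
import numpy as np


def rss(X, y, cols):
    """Residual sum of squares of an MLR (with intercept) on the given columns of X."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def forward_stepwise(X, y, f_to_enter=4.0, max_vars=10):
    """Forward stepwise wavelength selection with a partial F-test stopping rule."""
    n, p = X.shape
    chosen = []
    current = rss(X, y, chosen)                  # intercept-only model
    while len(chosen) < min(max_vars, p):
        dof = n - len(chosen) - 2                # residual d.f. once one more wavelength is added
        if dof <= 0:
            break
        candidates = [j for j in range(p) if j not in chosen]
        scores = [rss(X, y, chosen + [j]) for j in candidates]
        best = int(np.argmin(scores))            # largest drop in RSS = largest gain in correlation
        new = scores[best]
        f = np.inf if new == 0.0 else (current - new) / (new / dof)
        if f < f_to_enter:
            break                                # no remaining wavelength is significant
        chosen.append(candidates[best])
        current = new
    return chosen


rng = np.random.default_rng(2)
n, p = 60, 100
c1, c2 = rng.normal(size=n), rng.normal(size=n)
X = rng.normal(0.0, 0.1, (n, p))
X[:, 20:31] += c1[:, None]      # strongly related band
X[:, 70:81] += c2[:, None]      # weakly related band
y = 1.5 * c1 + 0.5 * c2 + rng.normal(0.0, 0.1, n)

selected = forward_stepwise(X, y, max_vars=5)
print(selected)
```

Because the first step minimizes the residual sum of squares with a single predictor, it picks the wavelength with the highest absolute correlation, matching the description above. Since many candidates are tested at every step, the nominal significance level is optimistic, which is one reason selected wavelengths should be checked on independent validation data.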
A comparison of PLS and MLR for assessing the maturity of mangos found that MLR gave poorer correlations and overfitted the data, as shown by the large gap between the correlation coefficients of the calibration and validation models [155]. A similar phenomenon was observed for the prediction of TSS and DM in bananas [52]. However, when the selected key wavelengths captured most of the variance of the whole spectrum while having low collinearity, MLR could outperform PLS, as in the firmness prediction of mangos [50]. The two modelling methods were also compared for the prediction of Brix in mangos and, interestingly, both gave high and comparable correlation coefficients [150]. It therefore remains unclear which model provides better predictions, as the performance of MLR depends largely on the wavelength selection.
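The overfitting diagnostic mentioned above, a large gap between calibration and validation correlation coefficients, can be reproduced with a toy example. In this sketch, which assumes synthetic data where only three wavelengths carry signal, an MLR on those three wavelengths is compared with an MLR on 36 wavelengths fitted to only 40 calibration samples; the sample sizes and wavelength counts are arbitrary choices.

```python
import numpy as np


def fit_predict(X_cal, y_cal, X_new):
    """Fit an MLR with intercept on the calibration set and predict X_new."""
    A = np.column_stack([np.ones(len(y_cal)), X_cal])
    coef, *_ = np.linalg.lstsq(A, y_cal, rcond=None)
    return np.column_stack([np.ones(len(X_new)), X_new]) @ coef


def corr(a, b):
    return float(np.corrcoef(a, b)[0, 1])


rng = np.random.default_rng(3)
n_cal, n_val, p = 40, 40, 40
c = rng.normal(size=n_cal + n_val)
X = rng.normal(0.0, 0.1, (n_cal + n_val, p))
X[:, :3] += c[:, None]                 # only the first 3 wavelengths carry signal
y = c + rng.normal(0.0, 0.3, n_cal + n_val)
X_cal, X_val, y_cal, y_val = X[:n_cal], X[n_cal:], y[:n_cal], y[n_cal:]

results = {}
for n_wl in (3, 36):                   # few vs. many wavelengths in the MLR
    cols = slice(0, n_wl)
    r_cal = corr(y_cal, fit_predict(X_cal[:, cols], y_cal, X_cal[:, cols]))
    r_val = corr(y_val, fit_predict(X_cal[:, cols], y_cal, X_val[:, cols]))
    results[n_wl] = (r_cal, r_val)
    print(f"{n_wl:2d} wavelengths: r_cal={r_cal:.2f}  r_val={r_val:.2f}  gap={r_cal - r_val:.2f}")
```

Adding wavelengths can never lower the calibration correlation of a least-squares fit, so a high calibration r alone says little; the validation r, and hence the gap, is what reveals overfitting.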
Spectroscopic predictions have consistently correlated better with SSC than with firmness. Park et al. showed that firmness is more difficult to predict than SSC because it is not determined by a single analyte or a limited group of related chemicals [156]. For both SSC and firmness, model performance is cultivar dependent, and calibration models trained on mixed cultivars produce lower correlations than those trained on a single cultivar. Only a limited number of studies have focused on in-field assessment, and their predictions were less accurate than those based on indoor measurements [27,45,48].
Spectroscopic methods use longer wavelengths than colorimeters and visible imaging, but, like colorimeters, they are unlikely to be adopted as high-throughput ripeness assessment tools because of their low spatial resolution. The accuracy of internal quality measurements is also affected by sample temperature, which must be compensated for with an additional calibration model [48]. Spectroscopy has been used to assess the quality of a wide variety of fruits, and portable commercial spectrometers have been developed [48,78,127,157], but most studies have focused on indoor, post-harvest assessment of fruit maturity. Models developed from spectra acquired indoors and on the tree have performed inconsistently. For apples, predictions of firmness and SSC both on the tree and during storage showed that the on-tree PLS model gave the best correlation coefficients for both attributes [27,158]. For nectarines, however, the on-tree model performed worse than the post-harvest one [45]. Consequently, on-tree ripeness assessment requires prediction models built from spectra acquired in the field, together with an understanding of how environmental factors affect spectral quality.