3.1. Overview of Spectral Features and Statistics of Reference Analysis
Figure 4 shows the mean raw spectra of intact tomatoes in the spectral range of 1000–1550 nm, together with the resulting second-derivative preprocessed spectral profiles, which are similar to those in previous studies [34]. The near-infrared (NIR) region is sensitive to the concentrations of organic materials, which involves the response of the molecular bonds C–H, O–H, and N–H [35]. The compounds underlying MC, pH, and SSC contain C–H, O–H, C–O, and C–C bonds. Thus, it is possible to use this region for the determination of MC, pH, and SSC in tomatoes. However, their absorption peaks overlap in several parts of the spectral region, so the spectral profiles of tomatoes are fairly flat over the whole region, with only a few broad peaks. There are peaks at 1100–1200 nm and, to a lesser extent, at 1350–1500 nm, which may be associated with the second overtone of the C–H bond and the first stretching overtone of the O–H bond in H2O, respectively [35]. However, these peaks usually lie in wide spectral bands, so the key wavelengths that are helpful for predicting the MC, pH, and SSC of tomatoes cannot be identified directly. The proposed spectral preprocessing was therefore first applied to the raw spectral data, and the preprocessed spectra were used to develop the PLS regression model.
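The derivative preprocessing mentioned above can be sketched as follows. This is a minimal NumPy illustration of Savitzky–Golay (S–G) differentiation, not the exact procedure used in this study: the window length (11 points), polynomial order (2), 5 nm wavelength spacing, and the synthetic spectrum are all assumptions made for the example. In practice a library routine such as `scipy.signal.savgol_filter` would normally be used instead.

```python
import math
import numpy as np

def savgol_weights(window_length, polyorder, deriv=0, delta=1.0):
    """Weights of a centered Savitzky-Golay filter for a given derivative order."""
    if window_length % 2 == 0 or window_length <= polyorder:
        raise ValueError("window_length must be odd and larger than polyorder")
    if deriv > polyorder:
        raise ValueError("deriv must not exceed polyorder")
    half = window_length // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    # Columns are offsets**0, offsets**1, ..., offsets**polyorder.
    design = np.vander(offsets, polyorder + 1, increasing=True)
    # Row `deriv` of the pseudo-inverse yields the fitted coefficient of x**deriv;
    # multiplying by deriv! gives the deriv-th derivative at the window center.
    weights = np.linalg.pinv(design)[deriv] * math.factorial(deriv)
    return weights / delta ** deriv

def savgol_derivative(spectrum, window_length=11, polyorder=2, deriv=1, delta=1.0):
    """Apply the filter; the output is shorter by window_length - 1 points."""
    weights = savgol_weights(window_length, polyorder, deriv, delta)
    # np.convolve flips its kernel, so reverse the weights to get a correlation.
    return np.convolve(np.asarray(spectrum, dtype=float), weights[::-1], mode="valid")

# Synthetic spectrum (assumed): one broad band on a flat baseline, 5 nm sampling.
wavelengths = np.arange(1000.0, 1555.0, 5.0)
raw = 0.4 + 0.1 * np.exp(-((wavelengths - 1150.0) / 40.0) ** 2)
first_derivative = savgol_derivative(raw, deriv=1, delta=5.0)
second_derivative = savgol_derivative(raw, deriv=2, delta=5.0)
```

Derivatives remove constant and sloping baselines and narrow broad bands, which is why they are commonly tried when peaks overlap as described above. The edge points are simply dropped in this sketch.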
An overview of the MC, pH, and SSC distributions of tomatoes in the calibration and prediction sets is presented in Table 1. The statistics include the number of samples, range, mean, and standard deviation (SD). In this study, 95 samples were divided into calibration and prediction sets of 60 and 35 samples, respectively. The calibration set ranged from 91% to 95.9% for MC, 3.9 to 4.4 for pH, and 2.7% to 5.5% Brix for SSC, and the prediction set ranged from 91.2% to 94.4% for MC, 3.9 to 4.3 for pH, and 3.4% to 4.9% Brix for SSC. The range of the calibration set is wider than that of the prediction set, which helps in developing a good model because the prediction samples fall within the calibrated range.
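As an illustration of how such summary statistics can be produced, the sketch below splits 95 values into 60 calibration and 35 prediction samples and reports the number of samples, range, mean, and SD for each set. The random split, the seed, and the synthetic MC values are assumptions for the example; the text does not state how samples were assigned to the two sets.

```python
import random
import statistics

def summarize(values):
    """Number of samples, range, mean, and sample SD of a list of values."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample SD (n - 1 denominator)
    }

def split_calibration_prediction(samples, n_calibration, seed=0):
    """Shuffle a copy of the samples and cut it into two sets (assumed random split)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_calibration], shuffled[n_calibration:]

# Synthetic MC values in percent (illustrative only, not the measured data).
rng = random.Random(1)
mc_values = [rng.uniform(91.0, 95.9) for _ in range(95)]

calibration, prediction = split_calibration_prediction(mc_values, 60)
cal_stats = summarize(calibration)
pred_stats = summarize(prediction)
print(cal_stats)
print(pred_stats)
```

Comparing the `min` and `max` entries of the two summaries is a quick way to check that the prediction set lies within the calibration range.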
3.2. PLS Regression Models
Using the PLS regression method, calibration and prediction models were developed for the various preprocessed spectra (Table 2). Among these calibration models, the model based on S–G first-derivative preprocessed spectra was best for MC and pH, and the model based on smoothing-preprocessed spectra was best for SSC in intact tomatoes, judged by a high correlation coefficient, a minimal difference between RMSEC and RMSEP, and a minimal number of latent variables. The PLS regression prediction results for MC, pH, and SSC are presented in the scatter plots in Figure 5. In all plots, the ordinate and abscissa represent the predicted and measured values, respectively, of the corresponding parameter. The calibration correlation between the spectra and the MC of tomatoes was high, with rcal from 0.81 to 0.88 and RMSEC from 0.44 to 0.54 (see Table 2). When the calibrated model was applied to the prediction set (35 samples), the results were acceptable, with rpred = 0.81 and RMSEP = 0.63% using the S–G first-derivative preprocessed spectra (see Figure 5a). This model performed better than that reported for MC in intact tomatoes using HSI and artificial neural networks in the range of 400–1000 nm, which had a correlation coefficient (rpred) of 0.773 [4]. For an online application, a small number of variables is important for keeping the calibration model simple. In this study, the PLS model appeared to be acceptable, since six factors (LVs) were used in the calibration model (see Table 2).
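To make the quantities reported here concrete (correlation coefficient, RMSEC, RMSEP, number of LVs), the following is a minimal single-response PLS sketch in NumPy using the NIPALS algorithm. It is an illustration under stated assumptions, not the software used in this study: the function names, the synthetic spectra, and the first-60/last-35 split are hypothetical. A library implementation such as scikit-learn's `PLSRegression` would be the usual choice in practice.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit single-response PLS by NIPALS; y_hat = X @ coef + intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean          # mean-centered working copies
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                        # weight vector (covariance direction)
        w /= np.linalg.norm(w)
        t = Xk @ w                           # scores of this latent variable
        tt = t @ t
        p = Xk.T @ t / tt                    # X loadings
        qa = (yk @ t) / tt                   # y loading
        Xk = Xk - np.outer(t, p)             # deflate X and y
        yk = yk - qa * t
        W.append(w)
        P.append(p)
        q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # regression (beta) coefficients
    intercept = y_mean - x_mean @ coef
    return coef, intercept

def correlation_and_rmse(measured, predicted):
    """Pearson correlation coefficient and root mean square error."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    r = np.corrcoef(measured, predicted)[0, 1]
    rmse = np.sqrt(np.mean((measured - predicted) ** 2))
    return r, rmse

# Synthetic data (assumed): 95 spectra at 111 wavelengths driven by 3 hidden factors.
rng = np.random.default_rng(0)
scores = rng.normal(size=(95, 3))
X = scores @ rng.normal(size=(3, 111)) + 0.05 * rng.normal(size=(95, 111))
y = 93.0 + scores @ np.array([0.8, -0.5, 0.3]) + 0.1 * rng.normal(size=95)

X_cal, y_cal, X_pred, y_pred = X[:60], y[:60], X[60:], y[60:]
coef, intercept = pls1_fit(X_cal, y_cal, n_components=6)
r_cal, rmsec = correlation_and_rmse(y_cal, X_cal @ coef + intercept)
r_pred, rmsep = correlation_and_rmse(y_pred, X_pred @ coef + intercept)
print(f"rcal={r_cal:.2f} RMSEC={rmsec:.2f} rpred={r_pred:.2f} RMSEP={rmsep:.2f}")
```

RMSEC is the error on the samples used to fit the model and RMSEP the error on held-out samples, so a small gap between the two, with few LVs, is the selection criterion described above.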
In the case of pH, the calibration set gave rcal from 0.32 to 0.76 and RMSEC from 0.06 to 0.09 across the preprocessing methods (see Table 2). When the model was used to predict the samples, the best results were obtained with the S–G first derivative, with rpred = 0.69 and RMSEP = 0.06 (see Figure 5b). The PLS model appeared to be acceptable because only two factors (latent variables, LVs) were used in the calibration model (see Table 2). The pH of intact tomatoes in the prediction set ranged only from 3.9 to 4.3; this lack of variation in the data was considered to have influenced the regression results. Although another study of pH prediction in strawberry using HSI [10] showed disparate findings, with a standard error of prediction (SEP) of 0.129, the models developed here for predicting pH displayed adequate predictive capacity for an online application.
For SSC in intact tomatoes, the calibration correlation between the spectra and the SSC was adequately high, at 0.64–0.82, with the RMSEC ranging from 0.24% to 0.36% Brix (see Table 2). When the model was used to predict the samples, the prediction results were also satisfactory, with a correlation coefficient (rpred) of 0.74 between the measured and predicted values and an RMSEP of 0.33% Brix using smoothing-preprocessed spectra (see Figure 5c). The PLS regression model appeared to be robust, since only five factors (LVs) were used in the calibration model (see Table 2). Our results are comparable with the findings of Li et al. [35], who obtained a correlation coefficient (r) of 0.88 and an RMSEP of 0.35% Brix for the prediction of SSC in pear using HSI with a spectral range of 930–2548 nm. A paired t-test at the 95% confidence level showed no significant differences between the measured values of MC, pH, and SSC and those predicted by HSI. These results demonstrate that a calibration model for predicting the internal qualities of intact tomatoes using HSI has been successfully developed and validated.
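The paired t-test mentioned above compares each measured value with its HSI prediction for the same sample. A minimal sketch is given below. The synthetic data are illustrative only, and the critical value 2.032 is the standard tabulated two-sided value for 34 degrees of freedom at the 95% level, which assumes the test was run on the 35 prediction samples.

```python
import math
import random
import statistics

def paired_t_statistic(measured, predicted):
    """Return (t, degrees of freedom) for a paired t-test on two equal-length lists."""
    if len(measured) != len(predicted) or len(measured) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    diffs = [m - p for m, p in zip(measured, predicted)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)        # sample SD of the paired differences
    if sd_diff == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1

# Tabulated two-sided critical value, alpha = 0.05, 34 degrees of freedom
# (assumes 35 pairs, i.e. the size of the prediction set).
T_CRITICAL_34_DF = 2.032

# Synthetic measured and predicted SSC values (illustrative only).
rng = random.Random(0)
measured = [rng.uniform(3.4, 4.9) for _ in range(35)]
predicted = [m + rng.gauss(0.0, 0.3) for m in measured]

t_value, dof = paired_t_statistic(measured, predicted)
significant = abs(t_value) > T_CRITICAL_34_DF
print(f"t = {t_value:.3f}, df = {dof}, significant difference: {significant}")
```

A value of |t| below the critical value means the mean difference between measured and predicted values cannot be distinguished from zero at that confidence level; it does not by itself measure how close individual predictions are.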
In the above PLS regression results, the contributions of individual wavelengths to the MC, pH, and SSC of tomatoes were not considered. This is because the PLS regression method first applies a linear transform to the data from all individual wavelengths [34]. As a result, it was difficult to determine how individual wavelengths were directly related to the predicted MC, pH, and SSC of tomatoes. However, it would be helpful to examine how MC, pH, and SSC in tomatoes are related to individual wavelengths, so that a better understanding of the correlated spectra might be achieved.
3.3. Chemical Images of MC, pH and SSC in Intact Tomatoes
Figure 6 shows a sequence of representative processed images, illustrating the application of hyperspectral image processing with single-band and threshold methods for the prediction of MC, pH, and SSC in intact tomatoes. The 1082 nm waveband image was used as a representative image for visualization because it showed higher contrast than the other bands. The background regions of the non-fluorescent black cup were eliminated from the image using a threshold value of 0.1. The resultant image separates the main area of the tomato from the background. It shows that the HSI technique can acquire multiple samples at a time, with a complete spectrum for every pixel in each sample. It also allows the different chemical constituents in each sample to be visualized based on their spectral signatures, because regions with similar spectral properties have similar chemical composition.
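The single-band thresholding step can be sketched as below. This is an illustration only: it assumes reflectance scaled to the range 0 to 1, that pixels brighter than the threshold belong to the tomato (the black cup being dark), and a hypercube stored as a (rows, columns, bands) array. The function names and the synthetic cube are hypothetical.

```python
import numpy as np

def tomato_mask(band_image, threshold=0.1):
    """Boolean mask that is True where the single-band image exceeds the threshold."""
    return np.asarray(band_image) > threshold

def masked_spectra(cube, mask):
    """Collect the spectra of the masked pixels as an (n_pixels, n_bands) array."""
    return cube[mask]

# Synthetic hypercube (assumed layout: rows x cols x bands) with a bright disc
# on a dark background, standing in for a tomato in a black cup.
rows, cols, bands = 40, 40, 111
yy, xx = np.mgrid[0:rows, 0:cols]
disc = (yy - 20) ** 2 + (xx - 20) ** 2 <= 12 ** 2
cube = np.full((rows, cols, bands), 0.03)
cube[disc] = 0.45

band_index = 16                      # hypothetical index of the 1082 nm band
mask = tomato_mask(cube[:, :, band_index], threshold=0.1)
spectra = masked_spectra(cube, mask)
print(mask.sum(), "tomato pixels,", spectra.shape[1], "bands per pixel")
```

Choosing the band with the highest contrast between fruit and background makes a single fixed threshold more reliable than it would be on a low-contrast band.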
Figure 7 shows the chemical images (prediction maps) of the MC, pH, and SSC of the intact tomato. The images were constructed by multiplying the beta coefficients (regression coefficients) obtained from the best preprocessed PLS regression model by the spectrum of each pixel in the image. The power of these distribution maps lies in the rapid and easy access they afford to the spatial distribution of MC, pH, and SSC in the tomato and to their relative concentrations.
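The pixel-wise construction of such a map can be sketched as a dot product between each pixel spectrum and the regression coefficients. The example below is a hedged illustration: the cube layout, the coefficient vector, the intercept term, and the use of NaN for background pixels are assumptions, and the pixel spectra would need the same preprocessing as the spectra used for calibration.

```python
import numpy as np

def chemical_image(cube, coef, intercept=0.0, mask=None):
    """Predict one value per pixel from a (rows, cols, bands) cube.

    coef has one regression coefficient per band; background pixels
    (mask == False) are set to NaN so they can be left blank when plotting.
    """
    rows, cols, bands = cube.shape
    pixels = cube.reshape(-1, bands)             # one spectrum per row
    predicted = pixels @ coef + intercept        # beta coefficients times spectra
    image = predicted.reshape(rows, cols)
    if mask is not None:
        image = np.where(mask, image, np.nan)
    return image

# Synthetic inputs (illustrative only): random spectra and coefficients.
rng = np.random.default_rng(0)
cube = rng.uniform(0.2, 0.6, size=(30, 30, 111))
coef = rng.normal(scale=0.01, size=111)          # hypothetical PLS coefficients
mask = np.zeros((30, 30), dtype=bool)
mask[5:25, 5:25] = True                          # hypothetical tomato region

mc_map = chemical_image(cube, coef, intercept=93.0, mask=mask)
print(np.nanmin(mc_map), np.nanmax(mc_map))
```

The resulting array can be displayed with any colormap to obtain a distribution map; the mean over the masked pixels gives a whole-fruit estimate.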
By including all the pixels, this approach has the advantage of displaying more detailed and accurate information. The differences in MC, pH, and SSC within the same sample were easily visualized in the concentration maps. The MC showed a uniform distribution across the fruit, while the pH was almost doubled in certain areas of the tomato. The pH map shows variation (red-yellow-blue color variation), with lower pH (more blue) in the central areas of the fruit than in the peripheral areas. For SSC, lower values were distributed uniformly in the periphery, with higher values towards the central parts of the tomato. These within-fruit differences in internal qualities such as MC, pH, and SSC may be caused by differences in the sunlight exposure of the fruit surface during cultivation.