A Comparative Study of PLSR and SVM-R with Various Preprocessing Techniques for the Quantitative Determination of Soluble Solids Content of Hardy Kiwi Fruit by a Portable Vis/NIR Spectrometer

Linear partial least squares regression and non-linear support vector machine regression, with various preprocessing techniques and their combinations, were used to determine the soluble solids content of hardy kiwi fruits with a handheld, portable near-infrared spectrometer. Fruits of four species, namely Autumn sense (A), Chungsan (C), Daesung (D), and Green ball (Gb), were collected from five areas of South Korea: Gwangyang (G), Muju (M), Suwon (S), Wonju (W), and Yeongwol (Y). Datasets for calibration and prediction were prepared for each area, each species, and all samples combined. Half of each dataset was used for calibration and the rest for model validation. With the partial least squares regression (PLSR) method and different preprocessing techniques, the best prediction correlation coefficient ranged between 0.67 and 0.75 for the area datasets and between 0.61 and 0.77 for the species datasets, and was 0.68 for the combined dataset. With the support vector machine regression (SVM-R) algorithm, the best prediction correlation coefficient ranged between 0.68 and 0.80 for the area datasets and between 0.62 and 0.79 for the species datasets, and was 0.74 for the combined dataset. In most cases, the SVM-R algorithm produced its best results with Autoscale preprocessing, the exceptions being area G and species Gb, whereas the PLSR calibration and prediction models differed considerably between preprocessing techniques. Therefore, the SVM-R method was superior to the PLSR method in predicting the soluble solids content of hardy kiwi fruits, and non-linear models may be a better alternative for monitoring the soluble solids content of fruits. The findings of this research can be used as a reference for predicting the soluble solids content of hardy kiwi fruits, as well as the harvesting time, with better prediction models.


Introduction
Fruits are manually or mechanically sorted based on size, color, shape, and surface defects, while internal quality parameters including soluble solids content (SSC), acidity (pH), vitamins, and phytochemicals are generally determined by destructive approaches [1]. In kiwi fruit production, the determination of internal quality attributes is of paramount concern for supplying quality fruit to consumers. In recent years, non-destructive technology has gained much attention for measuring food quality because it is easier to operate, faster, and more reliable than traditional methods [2]. Visible/near-infrared (Vis/NIR) spectroscopy is one of the promising nondestructive spaces efficiently, to manage noisy patterns and multi-modal class distributions, and their restriction on a subset of training data to suit a non-linear function [32].
Studies carried out on hardy kiwi fruit with NIRS have relied on heavy, complex, and lab-based NIR systems. Hence, a portable, user-friendly, handheld NIR spectrophotometer capable of in situ measurements would help kiwi growers check the pre-harvest maturity level and internal fruit attributes in a non-destructive way [33]. Based on the above discussion, this research was designed to compare two analysis procedures, PLSR and SVM-R, with various preprocessing techniques for the prediction of the SSC of hardy kiwi fruit from NIR spectra acquired using a portable handheld spectrometer.

Sample Preparation
The hardy kiwi fruit (Actinidia arguta) used in this study was collected from ten farms in five regions of South Korea. Table 1 shows the location of each farm. Samples were collected from five areas (Gwangyang, Muju, Suwon, Wonju, and Yeongwol) and comprised four species, namely Autumn sense, Chungsan, Daesung, and Green ball. All samples were subjected to a post-ripening process under the same environmental conditions of temperature (20 °C) and relative humidity (65%) for three days. Datasets were prepared by area, by species, and in combination: each area dataset contains the samples of one area, each species dataset contains the samples of one species from different farms, and the combined dataset contains all samples together. The resulting area, species, and combined datasets are presented in Table 2. Each dataset was then divided equally into two groups: the first group was used to develop the calibration models and the other for prediction and model validation.
A number of factors, such as geographical origin, temperature, season, cultivar, and fruit maturity, affect the cellular and optical structure of the fruit as well as its chemical composition and internal quality parameters, such as SSC, moisture, and acidity [34]. Therefore, to establish a more reliable and robust model for the prediction of SSC, the effects of biological and physical changes in the samples on the NIR spectrum should be considered. In this study, one approach was to develop models using individual area data. An individual area (origin) model gives more reliable predictions if the predicted samples were collected from the same area [35]. Furthermore, a prediction model built for a single variety or species is more appropriate than one built for multiple species [36]. On the other hand, a dataset covering multiple species or varieties gives a more accurate and robust prediction model for the determination of harvesting time or fruit maturity [37]. Taking all these considerations into account, datasets and models were developed for each area, each species, and all samples combined.
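The equal division into calibration and validation groups can be illustrated with a minimal Python sketch. The text does not state how samples were assigned to each half, so the seeded random shuffle, the function name `split_half`, and the dataset names and sample IDs below are all assumptions made for illustration.

```python
import random

def split_half(sample_ids, seed=0):
    """Shuffle sample IDs reproducibly and split them into two halves.

    When the count is odd, the second half receives the extra sample.
    The random assignment is an assumption, not the study's procedure.
    """
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    mid = len(ids) // 2
    return ids[:mid], ids[mid:]

# Hypothetical datasets: name -> sample IDs (values are made up).
datasets = {
    "area_M": list(range(10)),
    "species_A": list(range(100, 108)),
}

# One calibration/validation pair per dataset.
splits = {name: split_half(ids) for name, ids in datasets.items()}
for name, (calibration, validation) in splits.items():
    print(name, len(calibration), len(validation))
```

Splitting each dataset separately, rather than splitting the combined pool once, keeps the calibration and validation halves balanced within every area and species.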

Spectra Acquisition
In this experiment, spectra were acquired in interactance mode; each sample was scanned using a portable, handheld, low-precision near-infrared spectrophotometer (F-750 Produce Quality Meter, Felix Instruments, Inc., 1554 NE 3rd Avenue, Camas, WA, USA). The Vis/NIR spectrophotometer scans the intact fruit four times to produce one averaged absorbance spectrum, log (1/R) with R the reflectance, between the wavelengths of 310 and 1100 nm at a data resolution of 3 nm and a spectral resolution ranging from 8 to 13 nm. A xenon tungsten light source illuminates the fruit through a sampling window of 30 mm diameter. The device is equipped with a Zeiss MMS1 VIS-NIR spectrometer (Carl-Zeiss-Promenade 10, 07745 Jena, Germany) placed just behind the sampling window to collect the light transmitted through the skin and flesh of the fruit. The absorbance spectra were then converted to second derivative spectra using the F750 Data Viewer software (Felix Instruments, Inc., 1554 NE 3rd Avenue, Camas, WA, USA), and the spectra from 729 to 975 nm were analyzed, while the regions from 310 to 728 nm and beyond 975 nm were eliminated because of the noise in these wavelength regions.
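Restricting each spectrum to the 729-975 nm window can be sketched as follows. This is a minimal illustration, not the F750 Data Viewer procedure; the function name `select_window`, the 3 nm grid starting at 300 nm, and the flat dummy spectrum are assumptions.

```python
def select_window(wavelengths, spectrum, low=729, high=975):
    """Keep only the points whose wavelength lies in [low, high] nm."""
    kept = [(w, a) for w, a in zip(wavelengths, spectrum) if low <= w <= high]
    return [w for w, _ in kept], [a for _, a in kept]

# Hypothetical 3 nm wavelength grid and a flat dummy spectrum.
wavelengths = list(range(300, 1101, 3))
spectrum = [0.0] * len(wavelengths)

window_wl, window_spec = select_window(wavelengths, spectrum)
print(window_wl[0], window_wl[-1], len(window_wl))
```

On this assumed grid the window holds 83 variables, which is the number of predictors each model would see per sample.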

Reference Measurement
The SSC of hardy kiwi, the primary internal quality attribute, was measured destructively after spectra acquisition using a portable digital refractometer (SCM-1000, HM Digital Inc., 2629 Manhattan Beach Blvd., Redondo Beach, CA 90278, USA) with a measurement range of 0-55 %Brix.

Chemometrics
The most important chemical constituent of fruits and vegetables is water, which absorbs near-infrared radiation [38], and the near-infrared spectrum consists of a large set of overtones and combination bands that are convoluted because of the complicated chemical composition of typical fruits and vegetables [20]. In addition, spectra can be convoluted by complicated wavelength-dependent scattering effects, tissue heterogeneities, instrumental noise, ambient effects, and other sources of variability [39]. Therefore, chemometrics is a suitable way to extract as much information as possible from the NIRS data [26].

Spectral Data Preprocessing
The spectral data obtained from the spectrometer contain background information and noise in addition to sample information, so it is essential to preprocess the spectra before modeling to obtain reliable, accurate, and stable calibration models [39]. Spectral preprocessing techniques are required to remove irrelevant information, including noise, uncertainties, variability, interactions, and unrecognized features. In this study, the original spectra were first converted to second derivative (SD) spectra; several preprocessing methods, namely normalization by multiplicative scatter correction (MSC), standard normal variate transformation (SNV), orthogonal signal correction (OSC), Autoscale, and combinations of these techniques, were then applied to the second derivative spectra to build calibration and prediction models. Second derivative preprocessing was used to remove background, increase spectral resolution, and remove sloping and additive baselines. Normalization removes multiplicative spectral effects by transforming each spectral vector to unit length. More precisely, MSC removes ideal linear scattering and its effects, SNV removes the deviations caused by particle size and scattering, and OSC removes unnecessary signals before calibration and significantly reduces the number of principal components (PCs), which strengthens the prediction ability of the model [14]. Autoscale removes the influence of variables measured on different scales [40], which makes the data more suitable for analysis. The same preprocessing techniques were used for both partial least squares regression (PLSR) and support vector machine regression (SVM-R) so that the resulting models could be compared. The preprocessing techniques used in this study are shown in Figure 1.
Foods 2020, 9, 1078
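Three of the preprocessing steps named in this section (SNV, MSC, and Autoscale) can be written compactly. The following is a minimal pure-Python sketch under common textbook definitions, not the PLS Toolbox implementation used in the study; using the mean spectrum as the MSC reference and the sample standard deviation (n - 1) are assumptions, and the example spectra are made up.

```python
import math

def snv(spectrum):
    """Standard normal variate: center and scale one spectrum by its own mean and std."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in spectrum) / (n - 1))
    return [(x - mean) / std for x in spectrum]

def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum (assumed reference)."""
    n_vars = len(spectra[0])
    ref = [sum(s[j] for s in spectra) / len(spectra) for j in range(n_vars)]
    ref_mean = sum(ref) / n_vars
    ref_ss = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = sum(s) / n_vars
        # Least-squares fit of s = intercept + slope * ref
        slope = sum((r - ref_mean) * (x - s_mean) for r, x in zip(ref, s)) / ref_ss
        intercept = s_mean - slope * ref_mean
        corrected.append([(x - intercept) / slope for x in s])
    return corrected

def autoscale(spectra):
    """Center each variable (column) and scale it to unit standard deviation."""
    n = len(spectra)
    cols = list(zip(*spectra))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / (n - 1))
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in spectra]

# Made-up spectra (rows = samples, columns = wavelengths), for illustration only.
spectra = [[1.0, 2.0, 4.0], [2.0, 4.1, 7.9], [0.5, 1.2, 2.1]]
snv_spectra = [snv(s) for s in spectra]
msc_spectra = msc(spectra)
scaled_spectra = autoscale(spectra)
```

SNV works on each spectrum independently, whereas MSC and Autoscale depend on the whole calibration set, so in practice their reference spectrum and column statistics would be estimated on the calibration samples and reused for the validation samples.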

Figure 1 (panels d-h). (d) spectra by SD + OSC; (e) spectra by the combination of SD, standard normal variate (SNV), and orthogonal signal correction (OSC); (f) spectra by the combination of SD, multiplicative scatter correction (MSC), and SNV; (g) spectra by the combination of SD, MSC, and OSC; and (h) spectra by the combination of SD and Autoscale, from hardy kiwi samples.

Principal Component Analysis (PCA) and Partial Least Squares (PLS) Analysis
Before PLS analysis, PCA was used as a quantitative method to show the statistical variance among the different datasets. PCA loading values were obtained to select the impactful wavelengths within the selected window. The PCA was carried out considering only a part of the dataset. Latent variables (LVs) are the factors that PLS extracts from the highly correlated and collinear original spectra according to their relevance to the predicted Y variable. The number of LVs is critical when building a PLS model and is generally optimized by cross-validation of the calibration samples. Therefore, the optimal number of LVs was chosen using 10-fold contiguous block cross-validation as the number at which the root mean square error of cross-validation (RMSECV) reached a minimum. The PLS analysis was performed using MATLAB 2018b (MathWorks, Natick, MA, USA) with PLS Toolbox (Eigenvector Research Inc., 196 Hyacinth Road, Manson, WA 98831, USA).
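The LV selection described above has two parts: cutting the calibration samples into contiguous blocks and picking the LV count with the lowest RMSECV. The sketch below illustrates both in plain Python; it is not the PLS Toolbox code, the function names are hypothetical, and the RMSECV values are made up rather than taken from this study.

```python
def contiguous_blocks(n_samples, n_splits=10):
    """Return (start, stop) index pairs cutting 0..n_samples into contiguous blocks.

    Each block serves once as the held-out set; the remaining samples train the model.
    """
    base, extra = divmod(n_samples, n_splits)
    blocks, start = [], 0
    for i in range(n_splits):
        stop = start + base + (1 if i < extra else 0)
        blocks.append((start, stop))
        start = stop
    return blocks

def pick_n_lv(rmsecv_by_lv):
    """Return the number of latent variables with the smallest RMSECV."""
    return min(rmsecv_by_lv, key=rmsecv_by_lv.get)

# Made-up RMSECV values per LV count, for illustration only.
rmsecv_by_lv = {1: 2.9, 2: 2.5, 3: 2.2, 4: 2.1, 5: 2.15, 6: 2.3}

blocks = contiguous_blocks(25, n_splits=10)
best_lv = pick_n_lv(rmsecv_by_lv)
print(blocks)
print(best_lv)
```

Because contiguous blocks follow the sample order, the outcome depends on how the samples are arranged in the data matrix, which is worth keeping in mind when samples are stored grouped by farm or species.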
The model performance was evaluated using the following parameters: the correlation coefficient (R²) of calibration and prediction, the root mean square errors of calibration and prediction (RMSEC and RMSEP; see Equations (1) and (2), respectively), bias, and the ratio of prediction to deviation (RPD; see Equation (3)). The RPD indicates the accuracy of the prediction model; for NIR analysis, model performance can be classified as high, average, or low for RPD values of >2.0, 1.4-2.0, and <1.4, respectively, where a higher RPD indicates higher prediction accuracy [41,42].
where m represents the number of validation samples, f is the number of free variables, y_i is the measured sugar content of sample i, and ŷ_i is the predicted sugar content of sample i.
where SD is the standard deviation of the measured sugar content of the validation set.
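The prediction statistics named above can be computed as in the following minimal sketch, assuming the usual definitions: RMSEP as the root of the mean squared difference between predicted and measured values, and RPD as SD divided by RMSEP. The sign convention for bias (predicted minus measured) and the use of the sample standard deviation are assumptions, the degrees-of-freedom term f of the calibration error is not reproduced here, and the example values are made up.

```python
import math

def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    return math.sqrt(sum((p - m) ** 2 for m, p in zip(measured, predicted)) / len(measured))

def bias(measured, predicted):
    """Mean of (predicted - measured); the sign convention is an assumption."""
    return sum(p - m for m, p in zip(measured, predicted)) / len(measured)

def rpd(measured, predicted):
    """Ratio of the standard deviation of measured values to the RMSE."""
    n = len(measured)
    mean = sum(measured) / n
    sd = math.sqrt(sum((m - mean) ** 2 for m in measured) / (n - 1))
    return sd / rmse(measured, predicted)

def rpd_class(value):
    """Classify an RPD value using the thresholds given in the text."""
    if value > 2.0:
        return "high"
    if value >= 1.4:
        return "average"
    return "low"

# Made-up SSC values (%Brix), for illustration only.
measured = [10.0, 12.0, 14.0, 16.0, 18.0]
predicted = [11.0, 12.0, 13.0, 17.0, 18.0]

print(rmse(measured, predicted), bias(measured, predicted), rpd(measured, predicted))
print(rpd_class(rpd(measured, predicted)))
```

Applied to values reported later in the text, `rpd_class(2.1411)` falls in the high class and `rpd_class(1.8210)` in the average class.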

Support Vector Machine (SVM) Analysis
The SVM is a strong non-linear multivariate algorithm and establishes a hyperplane or set of hyperplanes in a high dimensional space for classification, and the same principle is used for the regression [43]. Three standard kernel functions, namely polynomial, radial basis, and sigmoid kernel function are used to build the SVM model, where radial basis function (RBF) is mostly used because of its capability of handling linear and non-linear relationships between the class labels and spectra data. Also, RBF can reduce computing intricacy of the training set, thereby providing an excellent performance under general smoothness assumptions [44][45][46].
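The RBF kernel discussed above, together with the ε-insensitive loss that ε-SVR minimizes, can be sketched in a few lines. This is an illustrative sketch under the standard definitions, not the PLS Toolbox implementation; the default values γ = 0.01 and ε = 0.1 follow the settings reported in this section, while the function names and example vectors are hypothetical.

```python
import math

def rbf_kernel(x, z, gamma=0.01):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def epsilon_insensitive_loss(measured, predicted, epsilon=0.1):
    """Loss is zero inside the epsilon tube and grows linearly outside it."""
    return max(0.0, abs(measured - predicted) - epsilon)

# Made-up two-variable "spectra", for illustration only.
x = [0.0, 0.0]
z = [3.0, 4.0]

print(rbf_kernel(x, x))   # identical spectra give the maximum similarity
print(rbf_kernel(x, z))   # similarity decays with squared distance
print(epsilon_insensitive_loss(10.0, 10.05))
print(epsilon_insensitive_loss(10.0, 10.5))
```

With a small γ the kernel decays slowly with distance, so the fitted function is smooth; errors smaller than ε do not contribute to the loss at all, which makes the regression tolerant of small reference-measurement noise.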
In SVM-R, the cost value was used to optimize the SVM model, and the optimum SVM-R model was the one with the minimum cost value. In this study, SVM-R was applied using the ε-SVR (epsilon-support vector regression) algorithm with the following parameters: radial basis function (RBF) as kernel type, γ = 0.01, and ε = 0.1. MATLAB 2018b (MathWorks, Natick, MA, USA) with PLS Toolbox (Eigenvector Research Inc., 196 Hyacinth Road, Manson, WA 98831, USA) was used for the analysis of all datasets [47].

Characteristics of Spectral Profiles
Figure 2a shows the second derivative spectra of hardy kiwi samples. The region covers the Vis/NIR spectral range from 300 nm to 1100 nm. According to Figure 2a, the optimal spectral range for model calibration of kiwi fruits was 729-975 nm, and this finding is in accordance with previous studies carried out to determine internal quality attributes of other fruits [48,49]. The spectral profiles of all hardy kiwi samples exhibited a common pattern over the whole NIR region, which indicates similar components in all samples across areas and species. The attributes that most affect the NIR spectra are chemical bonds, such as C-H, N-H, O-H, and C-O, which are subjected to stretching or bending caused by the vibrational energy change during irradiation by NIR light [50]. The wavelength regions outside the selected window were noisy and uninformative due to the absorbance of chlorophyll and other pigments in the visible range of 400-700 nm [49]. As shown in Figure 2b, the absorbance spectra were converted to second derivative spectra and narrowed to the wavelength region (729-975 nm) that could be used for the analysis of carbohydrate, sugar, and water absorbance bands in the NIR [51].

Statistical Properties
In this study, a total of 3128 samples were obtained from five areas of South Korea, and each dataset (by area, by species, and all together) was divided equally into a calibration set and a validation set. Tables 3-5 show the summary statistics of the measured SSC for the calibration and validation sets of the area, species, and combined datasets, respectively. The SSC of the calibration and validation sets ranged between 5.0 and 27.2%, which indicates that both sets covered a sufficiently broad range.

Principal Component Analysis (PCA)
The PCA method was used to investigate the variability of the dataset and the spectral differences among the five area datasets. Before PCA computation, the second derivative data were pre-treated with the combination of MSC and SNV because this combination was the best suited of the eight preprocessing techniques. Figure 3a,b illustrates the PCA score plots of PC1 vs. PC2 vs. PC3 for the original and pre-treated spectral data. For the second derivative spectra, PC1, PC2, and PC3 explained 69.4%, 27.1%, and 1.4% of the variance, respectively; PC1 is the component that explains the greatest variability in the dataset, and PC2 and PC3 explain successively smaller shares of the remaining variability. After the pre-treatment, PC1, PC2, and PC3 explained 48.7%, 13.4%, and 10.9%, respectively. Data from all five areas are evenly scattered in the score space, which indicates that fruits collected from different areas have similar characteristics.
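The variance captured by the first three components can be accumulated directly from the percentages reported above. The short sketch below only sums those reported values; the function name `cumulative_percent` is hypothetical.

```python
def cumulative_percent(explained):
    """Running total of explained variance (in percent), rounded to one decimal."""
    total, out = 0.0, []
    for value in explained:
        total += value
        out.append(round(total, 1))
    return out

# Explained variance (%) of PC1-PC3 as reported in the text.
second_derivative = [69.4, 27.1, 1.4]
pretreated = [48.7, 13.4, 10.9]

print(cumulative_percent(second_derivative))
print(cumulative_percent(pretreated))
```

The three components together account for 97.9% of the variance in the second derivative spectra but 73.0% after the MSC + SNV pre-treatment, so the pre-treated data spread their variance over more components.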

PCA Loadings
To identify the wavelengths responsible for the data grouping in the PCA score plot, the pretreated spectra and the first loading were examined. In Figure 4, important wavelengths can be observed at 738, 819, 837, 843, 918, and 951 nm, which correspond to the third overtones of the C-H bond in water, carbohydrates, starch, and sucrose; the peaks at 738, 837, and 951 nm are the main, distinguishing ones [52]. This wavelength range is reliable for SSC prediction, which is supported by the instrument manual's information on SSC and dry matter prediction of kiwi fruits.

PLS Models for the Prediction of SSC-Area Dataset
PLS models were developed using the preprocessed spectra of each area. Table 6 shows the best calibration and validation results among the eight preprocessing techniques for the spectral sampling range of 729-975 nm. Results for the areas with the other preprocessing techniques using PLSR are provided as Supplementary Material (see Table S1).
Table 6. Best results of each area with second derivative data and their preprocessing at a spectral sampling range of 729-975 nm using partial least square regression (PLSR).
The best predicted R² values for the G, M, S, W, and Y areas were 0.72, 0.75, 0.72, 0.74, and 0.67, respectively, obtained with different preprocessing techniques and different numbers of latent variables (LVs). Samples from the Y area gave relatively unsatisfactory results with every preprocessing technique, owing to the negative impact of a number of samples on the models. Although the same preprocessing techniques were applied to all areas, MSC + SNV was the best technique for the M and Y areas, with the highest prediction correlation coefficient R² of 0.75 and the lowest RMSEP of 2.0051. For the G area, no preprocessing improved predictability beyond the original second derivative, which gave the highest predicted R² of 0.72 and the lowest RMSEP of 1.7136. For the S and W areas, SNV and MSC + OSC were the most suitable. Preprocessing was applied to improve model performance by increasing the correlation coefficient and reducing the RMSEP, and in some cases it also reduced the number of LVs. The SNV + OSC and MSC preprocessing significantly reduced the number of LVs, irrespective of the calibration and prediction performance. Thus, the choice of preprocessing technique affected the prediction performance of the PLS regression models. The best results among the preprocessing techniques are shown in Figure 5 as scatter plots of measured and predicted SSC for each area.
Table 7 lists the best SSC calibration and prediction results for each species dataset analyzed using PLSR with the same preprocessing techniques and spectral sampling range. Species results with the other preprocessing techniques using PLSR are provided as Supplementary Material (see Table S2). Table 8 shows the results for the combined dataset obtained with the same procedures. As Table 7 shows, the predicted R² values for species A, C, D, and Gb were 0.77, 0.67, 0.69, and 0.61, with the best preprocessing being Autoscale, MSC + SNV, Autoscale, and second derivative, respectively. The PLSR results for the species were unsatisfactory except for species A, whose dataset is large and mixes farms from different areas. For species A, Autoscale preprocessing with 11 LVs resulted in a high RPD of 2.1411 and the lowest RMSEP of 1.8775. For species D, PLSR was unable to produce calibration and prediction models with MSC + OSC preprocessing. Species Gb gave a high calibration correlation coefficient but did not perform satisfactorily in prediction because it had few samples. The combined dataset includes all data together and was expected to perform worse than the area and species datasets because of the adverse influence of some data from each area or species. In this case, MSC + OSC was the best of all the preprocessing techniques, with 7 LVs, a calibration correlation coefficient of 0.66, a prediction correlation coefficient of 0.68, an RMSEP of 1.8286, and an RPD of 1.8210. The scatter plot of the best measured and predicted SSC is shown in Figure 6.

SVM-R Models for the Prediction of SSC-Area Dataset
SVM-R was also applied to the same data to compare the SSC predictability of the two analysis methods. The best predicted results of the SVM-R analysis with the same preprocessing techniques are presented in Table 9. Results for the areas with the other preprocessing techniques using SVM-R can be found in the Supplementary Material (see Table S3). The predicted R² values for the G, M, S, W, and Y areas were 0.68, 0.80, 0.79, 0.75, and 0.69, respectively, with the lowest RMSEP and highest RPD obtained from Autoscale preprocessing. The SVM fits non-linear models well and results in better calibration and prediction of soluble solids, with higher R², lower RMSEP, and a higher ratio of prediction to deviation (RPD). Area M followed the same trend of good prediction ability as with PLSR, with calibration and prediction correlation coefficients of 0.82 and 0.80, respectively; for this area, the lowest RMSEP was 1.8421 and the highest RPD was 2.2691. Area Y again showed a lower prediction correlation of 0.69 but a higher RPD of 2.3883; the lower prediction ability for this area points to an adverse influence of some of its samples on the models. The data of areas S and W were satisfactory in every case with Autoscale preprocessing. Notably, the second derivative data alone and MSC preprocessing on second derivative data gave very poor results for each area, and the SVM-R analysis yielded better calibration and prediction outcomes only with Autoscale preprocessing, except for area G. The scatter plot of measured and predicted SSC is shown in Figure 7.
Table 9. Best results of each area with second derivative data and their preprocessing at a spectral sampling range of 729-975 nm using support vector machine regression (SVM-R).
Table 10 presents the best predicted results of the SVM-R analysis for the species datasets, and results with the other preprocessing techniques are presented in the Supplementary Material (see Table S4). Table 11 presents the results for the combined dataset with all the preprocessing techniques. For the species datasets, the best prediction correlation coefficient was obtained with Autoscale preprocessing for species A, C, and D, with values of 0.79, 0.73, and 0.71, respectively. Species Gb showed a poor prediction correlation of 0.62; for this species, MSC + SNV preprocessing gave the highest predictability, the lowest RMSEP, and the highest RPD among the preprocessing techniques. The second derivative alone and MSC preprocessing on second derivative data yielded poor results, following the same trend as for the area datasets. For the combined dataset, Autoscale preprocessing gave satisfactory results, with a high correlation coefficient of 0.74, the lowest RMSEP of 1.6867, and an RPD of 1.9742. The scatter plot of measured and predicted SSC is shown in Figure 8.

Comparison of PLSR and SVM-R Analysis
In this study, PLSR and SVMR analysis were performed on second derivative spectra data obtained from original/absorbance spectra data. Various pre-treatments (MSC, SNV, OSC, and Autoscale) and their combinations (SNV + OSC, MSC + SNV, and MSC + OSC) were applied to the second derivative data to obtain the best predictive model for the SSC quality analysis of hardy kiwi. Regarding the comparison between the PLSR and SVM-R, the models were developed on the various dataset obtained from the area, species, and combine.
For the area datasets, the best PLSR prediction correlation coefficients ranged from 0.67 to 0.75, whereas SVM-R performed better, with coefficients between 0.68 and 0.80. With both methods, area M showed the highest prediction correlation coefficient together with the lowest RMSEP and the highest RPD. The RPD values of the optimum models fell between 1.5 and 2.5, a range considered suitable for screening purposes [53]. Furthermore, the RPD values of the optimum SVM-R models were higher than those of the optimum PLSR models, which indicates that the SVM-R regression method is more reliable than PLSR. Notably, Autoscale preprocessing gave more satisfactory results than any other preprocessing technique in the SVM-R analysis, whereas the optimum PLSR results depended on different preprocessing techniques. For comparison, Escribano et al. [48] obtained slightly higher regression coefficients of 0.72 to 0.89 for the prediction of SSC in sweet cherries. Furthermore, a study conducted on peppers using a portable Vis/NIR spectrometer reported a prediction correlation coefficient of 0.68, an SEP of 1.24, and an RPD of 1.78 for SSC prediction [53].
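The three figures of merit used in this comparison can be computed as in the following minimal sketch. It assumes the common definitions (Pearson correlation between measured and predicted values, RMSEP as the root mean squared prediction error, and RPD as the sample standard deviation of the measured values divided by RMSEP); the function names and the SSC values are hypothetical and are not data from this study.

```python
from math import sqrt
from statistics import mean, stdev

def rmsep(measured, predicted):
    """Root mean square error of prediction."""
    return sqrt(mean([(m - p) ** 2 for m, p in zip(measured, predicted)]))

def pearson_r(measured, predicted):
    """Pearson correlation coefficient between measured and predicted values."""
    mm, mp = mean(measured), mean(predicted)
    cov = sum((m - mm) * (p - mp) for m, p in zip(measured, predicted))
    ss_m = sum((m - mm) ** 2 for m in measured)
    ss_p = sum((p - mp) ** 2 for p in predicted)
    return cov / sqrt(ss_m * ss_p)

def rpd(measured, predicted):
    """Ratio of performance to deviation: SD of the reference values / RMSEP.
    The sample SD (n - 1 denominator) is an assumption of this sketch."""
    return stdev(measured) / rmsep(measured, predicted)

def suitable_for_screening(rpd_value):
    """RPD between 1.5 and 2.5, the screening range cited in the text [53]."""
    return 1.5 <= rpd_value <= 2.5

# Hypothetical measured and predicted SSC values for a small prediction set.
measured = [8.0, 10.0, 12.0, 14.0, 16.0]
predicted = [10.0, 8.0, 14.0, 12.0, 18.0]

print(f"r = {pearson_r(measured, predicted):.2f}, "
      f"RMSEP = {rmsep(measured, predicted):.2f}, "
      f"RPD = {rpd(measured, predicted):.2f}, "
      f"screening: {suitable_for_screening(rpd(measured, predicted))}")
```

Because RPD divides the spread of the reference values by the prediction error, two models evaluated on the same prediction set rank identically by RMSEP and by RPD.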
Among the species, species A was superior to the others with Autoscale preprocessing for both methods, giving the best prediction correlation coefficients of 0.77 (PLSR) and 0.79 (SVM-R) and the highest RPD values of 2.1411 and 2.1896, respectively. This result also indicates the higher reliability of the SVM-R method for species A. For species C, SVM-R followed the same pattern for the prediction of SSC: Autoscale preprocessing gave the best prediction, with a correlation coefficient of 0.73 and the highest RPD. In contrast, PLSR gave its highest prediction R² and RPD of 0.67 and 1.7917, respectively, with MSC + SNV preprocessing. Like species A, species D showed satisfactory prediction capability with the lowest RMSEP and the highest RPD for Autoscale preprocessing by the SVM-R method. In contrast, the prediction correlation coefficient of PLSR was below 0.70, and MSC + OSC preprocessing failed to produce a prediction. Feng et al. studied Actinidia chinensis "Hort16A" for the prediction of SSC and observed a prediction R² of 0.72 and an RMSEP of 1.28 [54]. Similarly, Kim et al. [33] reported a prediction R² of 0.72 for the SSC prediction of baby kiwi fruit.
The last species, Gb, showed unsatisfactory results with both PLSR and SVM-R analysis (R² of 0.61 and 0.62, respectively) across the preprocessing techniques. This might be due to the inadequate number of samples involved in calibration and prediction and to the adverse influence of some samples on the model. External factors, such as temperature, peel, humidity, and product morphology, can also degrade model performance by introducing large errors during data collection [53]. For the combined dataset, SVM-R gave more satisfactory results than PLSR. The PLSR analysis resulted in a prediction correlation coefficient of 0.68, with the lowest RMSEP of 1.8286 and the highest RPD of 1.8210, using MSC + OSC preprocessing. Notably, Autoscale preprocessing in the SVM-R analysis increased the prediction correlation coefficient to 0.74, with a minimum RMSEP of 1.6867 and the highest RPD of 1.9742. In a study using d'Anjou and Bartlett pear, the predicted R² of the validation dataset for SSC prediction was in the range of 0.651-0.844 [49]. In future work, the standard error of laboratory (SEL) of the reference method will be calculated to determine the SEP/SEL ratio, which evaluates the predictive ability of the equations and the precision level of the models for accurate routine use.
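As a consistency check on the combined-dataset values reported above, the following worked calculation assumes the common definition of RPD as the standard deviation (SD) of the reference SSC values in the prediction set divided by RMSEP. That definition is an assumption here; the SD itself is not quoted in this section and is only inferred.

```latex
\begin{align*}
\mathrm{RPD} &= \frac{\mathrm{SD}}{\mathrm{RMSEP}}
  \quad\Longrightarrow\quad
  \mathrm{SD} = \mathrm{RPD} \times \mathrm{RMSEP} \\[4pt]
\text{PLSR (MSC + OSC):}\quad & 1.8210 \times 1.8286 \approx 3.330 \\
\text{SVM-R (Autoscale):}\quad & 1.9742 \times 1.6867 \approx 3.330 \\[4pt]
\text{Relative RMSEP reduction:}\quad &
  \frac{1.8286 - 1.6867}{1.8286} \approx 0.078
\end{align*}
```

Both models imply the same SD of about 3.33, as expected when they are evaluated on the same prediction set, and under this reading SVM-R lowers the RMSEP of the combined dataset by roughly 8% relative to PLSR.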
With PLSR, the second derivative data without further preprocessing gave the best prediction performance for area G and species Gb among the area and species results. In contrast, the SVM-R results for second derivative data without further preprocessing did not follow the same pattern as PLSR. Furthermore, Autoscale preprocessing was acceptable in most cases for the SVM-R analysis.

Conclusions
The study was conducted to evaluate the application of a handheld portable Vis/NIR spectrometer for SSC prediction using different preprocessing techniques with two analysis methods. Hardy kiwi fruits of four different species were collected from five different areas of South Korea. It has been shown that handheld Vis/NIRS is a promising tool for predicting the internal quality attributes of hardy kiwi fruits. Furthermore, it was found that Vis/NIRS combined with chemometrics can lead to suitable models when an appropriate preprocessing technique and analysis method are chosen.
PLSR and SVM-R models were compared for each dataset. Different preprocessing techniques were best for the different area, species, and combined dataset models developed using PLSR, whereas Autoscale preprocessing was best in most cases for the models developed using SVM-R analysis. Moreover, SVM-R analysis with Autoscale preprocessing resulted in better prediction ability, with a higher correlation coefficient, lower RMSEP, and higher RPD, except for the Gwangyang (G) area. The PLSR method is one of the most common and widely used techniques in chemometrics, but relatively low prediction ability was obtained for each group of datasets, possibly because the large sample sets contained a few adverse and influential samples. The results revealed that SVM-R analysis with Autoscale preprocessing can be used to validate the model and to predict the SSC of hardy kiwi fruit. The SSC prediction of hardy kiwi fruits using NIR data showed that a handheld portable spectrometer, such as the F-750 produce quality meter, performs well for the analysis of internal quality parameters of fruits. However, the RBF-SVM assumes that there is an analytical function that describes the hyperplane. In the future, another regression model, such as a random forest model, that makes no assumption regarding the hyperplane will be presented. Furthermore, models can be developed using Model builder software to check the off-the-shelf capability of the instrument to predict other internal quality attributes of hardy kiwi fruits.
Supplementary Materials: The following are available online at http://www.mdpi.com/2304-8158/9/8/1078/s1, Table S1: PLSR results (excluding best) of each area with second derivative data and their preprocessing techniques applied at a spectral sampling range of 729-975 nm, Table S2: PLSR results (excluding best) of each species with second derivative data and their preprocessing techniques applied at a spectral sampling range of 729-975 nm, Table S3: SVM-R results (excluding best) of each area with second derivative data and their preprocessing techniques applied at a spectral sampling range of 729-975 nm, Table S4: SVM-R results (excluding best) of each species with second derivative data and their preprocessing techniques applied at a spectral sampling range of 729-975 nm.