Toward Achieving Rapid Estimation of Vitamin C in Citrus Peels by NIR Spectra Coupled with a Linear Algorithm

Citrus peels are rich in bioactive compounds such as vitamin C, and extraction of vitamin C is a good strategy for citrus peel recycling. It is essential to evaluate the levels of vitamin C in citrus peels before reuse. In this study, a near-infrared (NIR)-based method was proposed to quantify the vitamin C content of citrus peels rapidly. The spectra of 249 citrus peels in the 912–1667 nm range were acquired, preprocessed, and then related to measured vitamin C values using the linear partial least squares (PLS) algorithm. The results indicated that normalization correction (NC) was more suitable for spectral preprocessing, and the NC-PLS model built with full NC spectra (375 wavelengths) showed better performance in predicting vitamin C. To accelerate the prediction process, wavelength selection was conducted, and 15 optimal wavelengths were selected from the NC spectra using the stepwise regression (SR) method to predict vitamin C with the multiple linear regression (MLR) algorithm. The SR-NC-MLR model had the best predictive ability, with a correlation coefficient (rP) of 0.949 and a root mean square error (RMSEP) of 14.814 mg/100 g in the prediction set, comparable to the NC-PLS model in predicting vitamin C. External validation was implemented using 40 independent citrus peel samples to validate the suitability of the SR-NC-MLR model, obtaining a good correlation (R2 = 0.9558) between predicted and measured vitamin C contents. In conclusion, it was reasonable and feasible to achieve rapid estimation of vitamin C in citrus peels using NIR spectra coupled with the MLR algorithm.


Introduction
Citrus fruit, including oranges, tangerines, mandarins, clementines, grapefruit, pomelos, lemons, limes, and other minor varieties, originated in Southeast Asia, have been cultivated for the last 4000 years [1], and are among the most popular fruits in the world. According to statistics from the Food and Agriculture Organization (FAO), citrus fruit are planted in more than 140 countries, with an annual output of about 100 million tons (Asia and the Americas account for over 70%), playing an important role in the world's fruit trade and economy. In Asia, China is the largest producer and exporter of citrus fruit, with annual yields of over 40 million tons, exceeding 20% of the world's total fruit production.
Citrus fruit are mainly used to produce fresh juices or citrus-based drinks, or are eaten fresh. Large amounts of byproduct waste, such as citrus peels, are generated during industrial processing or after consumption [2], which can cause environmental pollution.
Several studies have shown that citrus peels are rich in various nutrients and bioactive ingredients, including carbohydrates, vitamin C, carotene, polyphenols, flavonoids, essential oils, and pectin [3], which make citrus peels a good material for recycling.
In this study, we propose a NIR-based method for the rapid assessment of vitamin C levels in citrus peels. It is also the first effort to investigate the potential of NIR technology for monitoring vitamin C in citrus peels. This study will provide novel technical support to facilitate the use of citrus peels and reduce waste.

Statistical Values of Vitamin C
A total of 249 citrus peel samples from 50 different varieties were prepared and their vitamin C contents were measured. The measured values were sorted in ascending order; one of every three values was assigned to the prediction set, and the remaining values were used for model calibration and cross-validation. The statistical details are shown in Table 1. The measured vitamin C values were found to follow a normal distribution, which is also a requirement of the subsequent F-test and t-test. The specific data distribution is shown in Figure 1.
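The sorted one-in-three split described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the source does not say which of every three sorted values goes to the prediction set, so taking the third is an assumption, and the input values are placeholders rather than measured data.

```python
def split_every_third(values):
    """Sort ascending; every third value goes to the prediction set.

    Assumption: the 3rd, 6th, 9th, ... sorted values are held out.
    """
    ordered = sorted(values)
    prediction = ordered[2::3]
    calibration = [v for i, v in enumerate(ordered) if i % 3 != 2]
    return calibration, prediction


# 249 placeholder values standing in for measured vitamin C contents
placeholder_values = [float(v) for v in range(249)]
calibration, prediction = split_every_third(placeholder_values)
print(len(calibration), len(prediction))
```

With 249 samples this yields 166 calibration values and 83 prediction values, matching the set sizes used for modeling in this study. Because the values are sorted first, both sets span the same range of vitamin C contents.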



Spectral Profiles of Samples
The extracted mean raw and preprocessed spectral profiles of citrus peel samples in the range of 912 to 1667 nm are shown in Figure 2. Three absorption peaks at around 980 nm, 1200 nm, and 1450 nm were observed and attributed to the second overtone of O-H stretching (water absorption), the second overtone of C-H stretching (fat absorption), and the first overtone of O-H stretching (water absorption), respectively [29].
The positions of the absorption peaks (i.e., their vertical distance from the x-axis) differed among the 249 curves in Figure 2a, probably because of the differences in physicochemical components among the citrus peel samples. After spectral preprocessing, changes in absorption peak position were also observed in each plot, which may be due to the elimination of interference information from the raw spectra, such as electrical noise, light scattering, and baseline drift.
Although no specific absorption peak of vitamin C was found, the relevant useful NIR spectral information can be mined and extracted by applying appropriate chemometrics and related to the vitamin C contents through modeling, achieving quantitative prediction of vitamin C in citrus peels.
Molecules 2023, 28, x FOR PEER REVIEW


Predicting Vitamin C Using Full Wavelength
The full-band spectra (raw and preprocessed) within the range of 912 to 1667 nm (375 wavelengths) were related to the measured vitamin C values by the partial least squares (PLS) algorithm. The nine resulting PLS models performed differently in predicting the vitamin C of citrus peel samples, with correlation coefficients of prediction (rP) of 0.877-0.974 and root mean square errors of prediction (RMSEP) of 10.671-23.916 mg/100 g (Table 2).
The PLS models built with the eight preprocessed spectra differed in predictive ability from the RAW-PLS model built with raw spectra. Among them, the NC-PLS model built with NC spectra showed the best performance in predicting the vitamin C content of citrus peel samples (rP = 0.956, RMSEP = 10.671 mg/100 g, residual predictive deviation (RPD) = 4.189), although it involved the largest number of latent variables (LV). This indicated that NC was more suitable for preprocessing the spectra of 912-1667 nm, probably because it eliminated adverse effects caused by outlier samples. Moreover, the NC-PLS model performed better than the RAW-PLS model in terms of correlation coefficients (r), root mean square errors (RMSEs), RPD, and the absolute value of the difference between RMSEP and the root mean square error of calibration (RMSEC) (ΔE), indicating that spectral pretreatment by the NC method indeed improved the predictive accuracy and precision of the RAW-PLS model. In general, spectral preprocessing was necessary to predict the vitamin C of citrus peels, and an appropriate preprocessing method such as NC was required.
There are some reports on NIR technology for predicting vitamin C concentration in other fruit such as tomatoes (range: 1295-2611 nm, rP = 0.81, RMSEP = 4.09 mg/100 g), oranges (range: 4000-10,000 cm−1, rP = 0.71, RMSEP = 94.9 mg/L), acerola (range: 800-2500 nm, rP = 0.99, RMSEP = 166.27 mg/100 g), and apple (range: 4000-10,000 cm−1, rP = 0.917, RMSEP = 4.8 mg/100 g) [30]. These results differ from ours, probably because of the different spectral ranges and samples involved in modeling. This is the first time NIR has been used to predict vitamin C levels in citrus peels, and satisfactory results were obtained. Further wavelength selection and model optimization were performed based on the NC spectra.

Optimal Wavelengths Selected by Four Different Methods
The optimal wavelengths were selected from the NC spectra by the regression coefficients (RC), stepwise regression (SR), successive projections algorithm (SPA), and competitive adaptive reweighted sampling (CARS) methods, and the results are shown in Table 3. After wavelength selection, the number of wavelengths decreased from 375 to 5-22, a reduction of over 94%. Most of the selected optimal wavelengths were located in the three regions of 912-1030 nm, 1161-1255 nm, and 1576-1667 nm (Figure 3), indicating that more spectral information related to vitamin C prediction existed in these regions.
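As a quick check of the "over 94%" figure, the reduction can be computed directly from the wavelength counts given in the text. The helper name below is illustrative only.

```python
FULL_WAVELENGTHS = 375  # full NC spectrum, as stated in the text


def reduction_percent(n_selected, n_full=FULL_WAVELENGTHS):
    """Percentage of wavelengths removed by selection."""
    return 100.0 * (n_full - n_selected) / n_full


# 5 and 22 are the extremes reported; 15 is the SR selection
for n in (5, 15, 22):
    print(n, round(reduction_percent(n), 2))
```

Even the largest selected subset (22 wavelengths) removes about 94.1% of the variables, which is the lower bound quoted above.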

Predicting Vitamin C Using Optimal Wavelengths
Based on the selected optimal wavelengths, the NC-PLS model was optimized and four optimized PLS models (RC-NC-PLS, SR-NC-PLS, SPA-NC-PLS, and CARS-NC-PLS) were established. Their performance in predicting the vitamin C of citrus peel samples is shown in Table 4.

Among these, the SR-NC-PLS model built with 15 optimal wavelengths selected from the NC spectra by the SR method showed good predictive ability, with higher values of rP (0.936) and RPD (3.482) as well as lower values of RMSEP (16.689 mg/100 g) and ΔE (1.257 mg/100 g) than the other three optimized PLS models, probably because of the different optimal wavelengths involved in model construction. The results also indicated that the SR method was the best option for selecting optimal wavelengths.
Multiple linear regression (MLR) can also be used to predict a target parameter when the number of wavelengths is smaller than the number of samples [31]. MLR is a classic linear algorithm that models the linear relationship between one dependent variable and two or more independent variables [32]. In this study, based on the same selected optimal wavelengths, four MLR models (RC-NC-MLR, SR-NC-MLR, SPA-NC-MLR, and CARS-NC-MLR) were developed and assessed in terms of r and RMSE values. The SR-NC-MLR model had the best performance in predicting vitamin C among the four MLR models, with the largest values of rP (0.949) and RPD (4.260) as well as the smallest values of RMSEP (14.814 mg/100 g) and ΔE (1.384 mg/100 g), better than the SR-NC-PLS model.
In addition, a comparative analysis showed that the SR-NC-MLR model was comparable to the NC-PLS model in predicting the vitamin C of citrus peel samples, indicating that the optimization of the NC-PLS model was successful. This also meant that the selected 15 optimal wavelengths and the full 375 wavelengths contributed similarly to the vitamin C prediction.
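An MLR model on a small set of selected wavelengths amounts to ordinary least squares with an intercept, as the sketch below shows. It assumes NumPy is available and uses synthetic, noise-free data with the same shapes as this study (166 calibration and 83 prediction samples, 15 wavelengths). The function names are illustrative, and none of the numbers are the authors' data.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns (intercept, coefs)."""
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[0], beta[1:]


def predict_mlr(intercept, coefs, X):
    """Apply the fitted linear model to new spectra."""
    return intercept + X @ coefs


# Synthetic stand-in data: 15 "wavelengths", noise-free linear response
rng = np.random.default_rng(0)
true_coefs = rng.normal(size=15)
X_cal = rng.normal(size=(166, 15))
y_cal = 50.0 + X_cal @ true_coefs
X_pred = rng.normal(size=(83, 15))
y_pred_true = 50.0 + X_pred @ true_coefs

b0, b = fit_mlr(X_cal, y_cal)
y_hat = predict_mlr(b0, b, X_pred)
print(float(np.max(np.abs(y_hat - y_pred_true))))
```

The fit is well-posed here because 15 variables are far fewer than 166 calibration samples, which is the condition stated above for using MLR. With the full 375 wavelengths the design matrix would have more columns than rows, and a latent-variable method such as PLS is the usual choice.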

F-Test and T-Test Analysis
As shown in Table 5, the F-test gave an F value (1.004) smaller than the one-tailed critical F value for the SR-NC-MLR model, indicating no significant difference between the variances of the measured and predicted vitamin C contents in citrus peels. The t-test likewise gave a t value smaller than the two-tailed critical t value, revealing no significant difference between the mean values of the measured and predicted vitamin C contents.
In short, the soundness and predictive validity of the SR-NC-MLR model were verified using the F-test and t-test analysis. In other words, it was reasonable and feasible to apply the SR-NC-MLR model to predict vitamin C contents of citrus peels.

Independent External Validation of Best Optimized Model
Forty citrus peels were used as independent samples to externally validate the validity and suitability of the SR-NC-MLR model in predicting vitamin C contents, and the results are shown in Figure 4. A good correlation (R2 = 0.9558) was found between the predicted and the measured values of vitamin C contents, indicating the good predictive performance of the SR-NC-MLR model.
Table 5 (t-test rows): df = 164; t Stat = 0.0517; P (T <= t) one-tailed = 0.479; t (one-tailed critical value) = 1.654; P (T <= t) two-tailed = 0.959; t (two-tailed critical value) = 1.975.



Spectral Collection and Preprocessing
A portable NIR spectroscopy device (Isuzu Optics Corp., Zhubei, Taiwan) was used to collect spectral information of citrus peels in reflectance mode. The device mainly consists of four parts: a spectrograph (covering a spectral range of 900-1700 nm, 1 mm InGaAs detector), a ring-shaped halogen lamp (20 W), a glass plate (diameter, 60 mm; height, 10 mm), and spectral analysis software (NIRez 2.0 Rice, Isuzu Optics Corp., Taiwan). The device was operated with an exposure time of 0.63 ms and a scan number of 5. Before the spectral collection of samples, the device was calibrated by scanning a white tile bar with reflectance of 99.99% and then turning off the light source to ensure 0.00% reflectance.
Before each test, several citrus fruit were taken out and one piece of peel measuring 20 mm × 20 mm (length × width) was cut from each fruit and placed on the glass plate of the NIR device. Each citrus peel was scanned five times to obtain the average spectrum. In total, 249 spectra of citrus peel samples were prepared for further analysis. Only the spectra in the range of 912-1667 nm (375 wavelengths) were considered and analyzed, because obvious noise existed in the two regions of 900-912 nm and 1667-1700 nm.
Spectral collection is negatively influenced by several factors such as sample status, light scattering, stray light, baseline drift, instrument response, and the surrounding environment [33]. Therefore, spectral preprocessing is necessary to minimize or even eliminate these undesirable effects, improving the signal-to-noise ratio of the spectra and the predictive ability of the subsequently constructed model. In this study, eight preprocessing methods, namely Savitzky-Golay smoothing (SGS), normalization correction (NC), multiplicative scatter correction (MSC), first derivative (1st Der), second derivative (2nd Der), baseline correction (BC), standard normal variate (SNV), and mean centering (MCT), were each applied to the collected raw spectra.
SGS smooths data by local polynomial fitting based on the least squares method, retaining useful information in the signal and eliminating random noise [34]. NC is used to eliminate the influence of changes in optical path or sample dilution on spectra [35]. MSC can eliminate noise caused by specular reflection and sample non-uniformity, as well as spectral baseline drift and non-repeatability [36]. Derivation is an effective preprocessing method used to eliminate baseline drift and improve spectral resolution. The 1st Der and 2nd Der remove the constant baseline and the linear (first-order) baseline, respectively [37]. BC can effectively correct drifts originating from electronic offset, dark current, and readout noise [38]. SNV is applied to reduce the influence of uneven particle size and non-specific scattering at the particle surface [39]. MCT subtracts the mean spectrum of the calibration set from each sample spectrum to increase the difference between sample spectra, thus improving the robustness and prediction ability of the model [40].
All spectral preprocessing was completed using the software Unscrambler 10.3X (CAMO, Oslo, Norway).
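Three of the transforms described above (SNV, MSC, and mean centering) can be sketched with NumPy in their common textbook forms. This is not the Unscrambler implementation, whose exact options are not given in the source. The function names, the use of the sample standard deviation in SNV, and the synthetic spectra are all assumptions made for illustration.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row)."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference spectrum.

    Each row is regressed on the reference (row = offset + slope * ref)
    and corrected as (row - offset) / slope.
    """
    ref = spectra.mean(axis=0) if reference is None else reference
    corrected = np.empty_like(spectra, dtype=float)
    for i, row in enumerate(spectra):
        slope, offset = np.polyfit(ref, row, 1)
        corrected[i] = (row - offset) / slope
    return corrected


def mean_center(spectra, calibration_mean=None):
    """Subtract the mean spectrum (of the calibration set) from each row."""
    mean = spectra.mean(axis=0) if calibration_mean is None else calibration_mean
    return spectra - mean


# Synthetic spectra: one base curve with different offsets and scalings
base = np.sin(np.linspace(0.0, 3.0, 375)) + 1.0
offsets = np.array([0.0, 0.1, 0.2])[:, None]
scales = np.array([1.0, 1.2, 0.8])[:, None]
raw = offsets + scales * base[None, :]

snv_out = snv(raw)
msc_out = msc(raw)
mct_out = mean_center(raw)
```

In this synthetic case the rows differ only by an additive offset and a multiplicative factor, so both SNV and MSC map them onto a single curve. That is the scatter-removal behavior these methods are meant to provide.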

Measurement of Vitamin C
The vitamin C contents of the citrus peel samples were determined using the 2,6-dichlorophenolindophenol titration method (AOAC Method 967.21) [41]. Each sample was measured three times and the averaged values were used. In this study, the NIR-based method was developed and compared with this official chemical method, with the expectation that it could potentially substitute for the official method in vitamin C determination in the future.

Quantitative Relationship Establishment between Spectra and Vitamin C
The quantitative relationships between the raw or preprocessed NIR spectra and the measured vitamin C values were established by applying the linear PLS regression algorithm. PLS is widely used to build the fundamental relationship between two matrices (X and Y) by finding the multidimensional directions in the X space that explain the greatest variance in the Y space. It is particularly suitable when the X matrix (predictors) has more variables than observations, as well as when there is multicollinearity in X [42]. PLS combines the advantages of principal component analysis (PCA), canonical correlation analysis (CCA), and multiple linear regression (MLR) analysis, and achieves prediction by extracting a group of uncorrelated latent variables (LV) [43]. PLS model performance is related to the number of LV, and a good PLS model usually has a small number of LV [44].
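For a single response variable, the latent-variable extraction described above can be sketched with the NIPALS form of PLS1. This is a textbook variant written with NumPy, not the Unscrambler routine used in the study. The function name and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def fit_pls1(X, y, n_components):
    """PLS1 via NIPALS; returns (intercept, coefs) for y ~ intercept + X @ coefs."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean          # centered working copies
    weights, loadings, y_loadings = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr                        # direction of max covariance with y
        w = w / np.linalg.norm(w)
        t = Xr @ w                           # scores (one latent variable)
        tt = t @ t
        p = Xr.T @ t / tt                    # X loadings
        q = (yr @ t) / tt                    # y loading
        Xr = Xr - np.outer(t, p)             # deflate X
        yr = yr - q * t                      # deflate y
        weights.append(w)
        loadings.append(p)
        y_loadings.append(q)
    W = np.array(weights).T
    P = np.array(loadings).T
    q_vec = np.array(y_loadings)
    coefs = W @ np.linalg.solve(P.T @ W, q_vec)
    intercept = y_mean - x_mean @ coefs
    return intercept, coefs


# Synthetic check: noise-free linear data with 5 variables
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))
y = 2.0 + X @ np.array([1.0, -0.5, 0.3, 0.0, 2.0])
intercept, coefs = fit_pls1(X, y, n_components=5)
```

When the number of latent variables equals the rank of X, PLS1 reproduces the ordinary least-squares solution, which is why the synthetic data are fitted exactly here. In practice a smaller number of LV is chosen, typically by cross-validation.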
The predictive performance of a PLS model is evaluated mainly using r and RMSE in the calibration set (rC and RMSEC), cross-validation set (rCV and RMSECV), and prediction set (rP and RMSEP) [45]. Cross-validation is implemented by leaving one sample out of the calibration set in turn and then rebuilding a model with the remaining samples to predict the excluded sample, i.e., leave-one-out cross-validation [46]. Generally, a PLS model with good performance has higher values of r and lower values of RMSE. Two other parameters, RPD and ΔE, are also used to assess PLS model quality. RPD is the ratio of the standard deviation of the measured values to the RMSEP in the prediction set. ΔE, the absolute value of the difference between RMSEP and RMSEC, indicates model robustness. A good PLS model is usually accompanied by a greater value of RPD and a smaller value of ΔE [47].
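The evaluation parameters defined above can be written out as short functions. This is a sketch under common conventions: RMSE is computed with a divisor of n and the standard deviation in RPD with n - 1, neither of which is specified in the source. The sample numbers are invented for illustration.

```python
import math
from statistics import mean, stdev


def pearson_r(measured, predicted):
    """Correlation coefficient between measured and predicted values."""
    m_bar, p_bar = mean(measured), mean(predicted)
    cov = sum((m - m_bar) * (p - p_bar) for m, p in zip(measured, predicted))
    ss_m = sum((m - m_bar) ** 2 for m in measured)
    ss_p = sum((p - p_bar) ** 2 for p in predicted)
    return cov / math.sqrt(ss_m * ss_p)


def rmse(measured, predicted):
    """Root mean square error (divisor n; an assumption)."""
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rpd(measured, predicted):
    """Standard deviation of measured values divided by RMSE."""
    return stdev(measured) / rmse(measured, predicted)


def delta_e(rmse_p, rmse_c):
    """Absolute difference between prediction and calibration RMSE."""
    return abs(rmse_p - rmse_c)


# Invented example values (mg/100 g), not data from the study
measured = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
print(pearson_r(measured, predicted), rmse(measured, predicted), rpd(measured, predicted))
```

Because RPD divides the spread of the reference values by the prediction error, it rewards models whose error is small relative to the natural variation of the samples, while ΔE flags a gap between calibration and prediction error.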
In this study, the PLS algorithm was run on a matrix with the 375 wavelengths as X variables and the vitamin C values of the 166 calibration samples as the Y variable, and the remaining 83 samples were used for prediction, in order to explore the intrinsic relation between the spectra and vitamin C, i.e., PLS modeling. The established PLS model was evaluated in terms of rC, rCV, rP, RMSEC, RMSECV, RMSEP, RPD, and ΔE.
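The leave-one-out cross-validation used to obtain rCV and RMSECV can be sketched generically. The loop below accepts any fit and predict pair. The one-variable straight-line model is only a hypothetical stand-in for the PLS model, and the data are invented.

```python
import math


def loo_predictions(X, y, fit, predict):
    """Leave each sample out in turn, refit, and predict the excluded sample."""
    preds = []
    for i in range(len(y)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        model = fit(X_train, y_train)
        preds.append(predict(model, X[i]))
    return preds


# Hypothetical stand-in model: least-squares line on the first variable
def fit_line(X, y):
    xs = [row[0] for row in X]
    x_bar, y_bar = sum(xs) / len(xs), sum(y) / len(y)
    sxy = sum((x - x_bar) * (v - y_bar) for x, v in zip(xs, y))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def predict_line(model, row):
    slope, intercept = model
    return slope * row[0] + intercept


# Invented data, not from the study
X_demo = [[1.0], [2.0], [3.0], [4.0], [5.0]]
y_demo = [2.1, 3.9, 6.2, 7.8, 10.1]
cv_pred = loo_predictions(X_demo, y_demo, fit_line, predict_line)
rmse_cv = math.sqrt(sum((a - b) ** 2 for a, b in zip(y_demo, cv_pred)) / len(y_demo))
print(round(rmse_cv, 3))
```

Every cross-validated prediction comes from a model that never saw that sample, so RMSECV gives a less optimistic error estimate than RMSEC.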

Optimal Wavelength Selection and Model Optimization
NIR spectral analysis generally produces a large amount of spectral data, which inevitably contains variables carrying noise, uninformative wavelengths, or even interference information, reducing predictive efficiency [48]. Selecting the wavelengths that hold useful information is therefore necessary and has become a key step in NIR data analysis; it can greatly reduce computation, accelerate prediction, improve model prediction accuracy, and effectively prevent overfitting [49].
Four methods (RC, SR, SPA, and CARS) were each used to select the optimal wavelengths for further model optimization. In the RC procedure, the wavelengths with large absolute values of regression coefficients in the developed PLS model were selected as optimal wavelengths [50]. In the SR program, the optimal wavelengths were automatically selected by repeating forward addition and backward elimination of spectral variables, terminating when the residual sum of squares reached its minimum as spectral variables were added [51]. In the SPA process, candidate subsets were selected by projection, evaluated by the predicted residual error sum of squares (PRESS), and reduced by eliminating variables through an F-test criterion; the subset with the fewest wavelengths and the lowest PRESS value was taken as the optimal wavelengths [52]. In the CARS method, the important wavelengths were picked out by assessing the importance of each wavelength through the absolute value of its regression coefficient, following the principle of survival of the fittest [53].
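To illustrate the idea behind stepwise selection, the sketch below implements a simplified, forward-only variant with NumPy. It is not the SR program used in the study: it omits the backward-elimination step and any F-test entry or removal criterion. Its stopping rule (gain below a fraction of the total sum of squares), its function names, and its synthetic data are all assumptions.

```python
import numpy as np


def rss(X, y, cols):
    """Residual sum of squares of an intercept + selected-columns OLS fit."""
    design = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


def forward_select(X, y, max_vars, min_gain_frac=0.01):
    """Greedy forward selection (simplified; no backward elimination)."""
    selected = []
    total = rss(X, y, [])                    # intercept-only RSS
    current = total
    while len(selected) < max_vars:
        candidates = [c for c in range(X.shape[1]) if c not in selected]
        if not candidates:
            break
        scores = {c: rss(X, y, selected + [c]) for c in candidates}
        best = min(scores, key=scores.get)
        # Stop when the best candidate barely reduces the RSS
        if current - scores[best] < min_gain_frac * total:
            break
        selected.append(best)
        current = scores[best]
    return selected


# Synthetic data: only columns 4 and 11 carry signal
rng = np.random.default_rng(2)
X_demo = rng.normal(size=(60, 20))
y_demo = 3.0 * X_demo[:, 4] - 2.0 * X_demo[:, 11] + 0.1 * rng.normal(size=60)
chosen = forward_select(X_demo, y_demo, max_vars=15)
print(sorted(chosen))
```

A full stepwise procedure would also re-test previously added variables after each addition and drop those that no longer contribute. That step is what distinguishes it from the forward-only loop shown here.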
Using the same modeling process, the optimized PLS model was developed from a new matrix containing the selected optimal wavelengths as X variables and the 166 calibration vitamin C values as the Y variable. The remaining 83 samples were used for prediction. The optimized PLS model was evaluated using the same parameters mentioned above.
The RC process and model optimization were performed in the software Unscrambler 10.3X (CAMO, Oslo, Norway). The SR, SPA, and CARS programs were executed in Matlab R2018a (The Mathworks, Inc., Natick, MA, USA).

Statistical Two-Sample Analysis
Two-sample F-test and t-test analyses were conducted to verify the suitability of the established model for predicting the vitamin C of citrus peels, ensuring model soundness and predictive reliability. The F-test, also called the homogeneity test of variance, is a test in which the statistic follows an F-distribution under the null hypothesis [54]. The t-test, also called Student's t-test, is applicable when three conditions hold simultaneously: the two sets of samples come from normal populations, the two sets are independent, and their variances are homogeneous (i.e., they pass the F-test) [55]. The F-test was applied to test whether there is a significant difference between the variances of the measured and predicted vitamin C values. The t-test was conducted to examine whether there is a significant difference between the mean values of the measured and predicted vitamin C values. The F-test was completed before the t-test.
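The two test statistics can be computed as in the sketch below, which uses only the Python standard library. Several choices are assumptions rather than details from the source: the F statistic is formed as the larger variance over the smaller, the t statistic uses the pooled equal-variance form, and the data are synthetic. Critical values are not computed. The comparison value of 1.975 is the two-tailed critical t reported in the source for df = 164.

```python
import math
from statistics import mean, variance


def f_statistic(a, b):
    """Ratio of sample variances, larger over smaller (so F >= 1)."""
    va, vb = variance(a), variance(b)
    return max(va, vb) / min(va, vb)


def pooled_t_statistic(a, b):
    """Two-sample t statistic assuming equal variances; returns (t, df)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1.0 / na + 1.0 / nb))
    return t, df


# Synthetic stand-in for 83 measured and 83 predicted values
measured = [100.0 + 3.0 * i for i in range(83)]
predicted = [m + (2.0 if i % 2 else -2.0) for i, m in enumerate(measured)]

f_value = f_statistic(measured, predicted)
t_value, df = pooled_t_statistic(measured, predicted)

T_CRIT_TWO_TAILED = 1.975  # reported in the source for df = 164
print(round(f_value, 3), round(t_value, 4), df, abs(t_value) < T_CRIT_TWO_TAILED)
```

Two sets of 83 values give df = 164, the value reported in the source. An absolute t below the critical value means the null hypothesis of equal means is not rejected.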

External Validation of Model
To further evaluate the applicability and validity of the established calibration model, external validation using a set of independent samples is necessary. To ensure stable and effective prediction of vitamin C, 40 independent citrus peel samples were randomly collected from freshly harvested citrus fruit and used to externally validate the best optimized model. The external validation was executed in the software Unscrambler 10.3X (CAMO, Oslo, Norway).

Conclusions
This study aimed to investigate the potential of NIR (912-1667 nm) combined with linear algorithms to determine vitamin C contents in citrus peels. The NC method was more appropriate for preprocessing the raw NIR spectra, and the NC-PLS model constructed with full NC spectra showed better performance in predicting vitamin C contents (rP = 0.956, RMSEP = 13.798 mg/100 g). Fifteen optimal wavelengths (915, 927, 960, 965, 1016, 1028, 1094, 1109, 1397, 1576, 1623, 1642, 1648, 1662, and 1664 nm) were further selected from the NC spectra by the SR method and applied to optimize the full-band NC-PLS model through the MLR algorithm. The resulting SR-NC-MLR model predicted vitamin C contents about as well as the NC-PLS model, with rP of 0.949 and RMSEP of 14.814 mg/100 g. The reasonability and feasibility of the SR-NC-MLR model were verified by means of F-test and t-test analysis. The validity and suitability of the SR-NC-MLR model were further verified using 40 independent citrus peel samples. It is concluded that the developed NIR-based method is simple and efficient, and can be used for rapid determination of vitamin C content in citrus peels to facilitate peel recycling.