Improving Prediction of Peroxide Value of Edible Oils Using Regularized Regression Models

We present four prediction techniques, combined with multiple data pre-processing methods, applied to a wide range of oil types and oil peroxide values (PV) and incorporating natural aging for peroxide creation. Samples were assayed using a standard starch titration, AOCS Method Cd 8-53, which served as the verified reference method for PV determination. Near-infrared (NIR) spectra were collected from each sample at two optical pathlengths (OPLs), 2 and 24 mm, then fused into a third distinct set. All three sets were used to build partial least squares (PLS), ridge, LASSO, and elastic net regression models. While no individual regression model was established as the best, global models for each regression type and pre-processing method show good agreement across all regression types when performed under their optimal scenarios. Furthermore, boxcar averaging with small spectral windows improves prediction accuracy for edible oil PVs. The best-performing models for each regression type were: PLS regression, 25 point boxcar window, fused OPL spectra, RMSEP = 2.50; ridge regression, 5 point boxcar window, 24 mm OPL, RMSEP = 2.20; LASSO, raw spectra, 24 mm OPL, RMSEP = 1.80; and elastic net, 10 point boxcar window, 24 mm OPL, RMSEP = 1.91 (all in mEq O2/kg). The results show promising advancements toward a full global model for PV determination of edible oils.


Introduction
The peroxide value (PV) of an edible oil is an indicator of freshness as viewed through oxidative degradation. Chemically, PV is a measurement of the primary oxidation of unsaturated fats in oils by molecular oxygen into hydroperoxides and peroxides [1]. This measurement is often presented in milliequivalents O2/kg (mEq O2/kg) of oil. Full auto-oxidation of oils further converts the created peroxides and hydroperoxides into alcohols, aldehydes and ketones, which are directly responsible for the rancidity of the oil [2,3]. Fresh oils have peroxide values below approximately 10 mEq O2/kg, while oils that have spoiled and become rancid present peroxide values above 30 mEq O2/kg [2]. Furthermore, peroxide values as high as 100 mEq O2/kg have been linked to cases of food poisoning [4]. The American Oil Chemists' Society (AOCS) Official Method Cd 8-53 [5] and Commission Regulation (EEC) No 2568/91 of 11 July 1991 [6] established standard iodometric titrations for the determination of edible oil PV in an effort to maintain product quality control. The titration endpoint, and hence sample PV, is determined by either colorimetric or electrochemical means. However, this standard method requires toxic chemicals, is labor intensive and time consuming, and requires specialized equipment such as a chemical fume hood; for these reasons, rapid PV analysis in the field is not practical.
In this study, a wide variety of naturally aged edible oils were measured using NIR spectroscopy in sample cells with optical pathlengths (OPL) of 2 and 24 mm. The differing OPLs were required because there was considerable variation in the signal-to-noise ratio of spectral features measured in the NIR. Spectral regions of lower signal intensity (i.e., lower absorption) benefited from the 24 mm OPL, whereas the shorter 2 mm OPL mitigated saturation of highly absorbing bands. Four multivariate analysis methods, partial least squares (PLS) regression, ridge regression, LASSO regression and elastic net regression, were used to predict the PV. Additionally, the 2 and 24 mm OPL NIR spectra were fused to utilize the optimal spectral information from each pathlength.

Results
The PV range of the calibration set was 5.6-80 mEq O2/kg; the validation set range was 12 to 52 mEq O2/kg. Optimized predictive performance for all methods investigated ranged from 1.80 to 2.50 mEq O2/kg PV (Table 1). Models were optimized for the degree of boxcar smoothing and the OPL of the spectra employed. The best models involved some 1-norm regularization, either LASSO or elastic net. The results from each regression technique are discussed in detail below. In all regression techniques, the 2 mm OPL data was outperformed by the 24 mm OPL data; however, the fused OPL data did marginally improve prediction errors when models were built with PLS. Example regression biplots can be found in Figures A1-A4 in Appendix A.

PLS
All investigated PLS models, spanning four boxcar averaging windows and three OPL combinations, were optimized with six to nine latent variables (LV) based on RMSECV. The upper end of this LV range agrees with the original analyses of this data set by Ottaway [24]. However, fewer LVs were needed when more aggressive boxcar averaging was employed. We hypothesize that this trend is due to noise reduction and the elimination of chance correlations within lower signal-to-noise (S/N) wavelengths. PLS RMSEP values range from 2.50 to 4.80 mEq O2/kg PV (Table 2), showing a slight performance improvement in certain situations when minimal signal-averaging pre-processing is applied. The best-performing subset of the PLS regression models utilized a boxcar window size of 25 data points.
Overall, the 24 mm OPL and fused OPL data sets outperform the 2 mm OPL data sets in prediction; this indicates that the majority of the useful predictive bands come from the 24 mm data. A simple data fusion pre-processing step provides equivalent or decreased prediction error in all tested cases for PLS regression, with the exception of a boxcar window size of 10, which shows no change or a slight increase. The 24 mm and fused OPL data sets perform within 5% of each other in all tested cases, but approximately 50% better than the 2 mm OPL set.

Ridge Regression
Ridge regression models returned an RMSEP range of 2.20-4.14 mEq O2/kg PV (Table 3). The optimized ridge regression models demonstrated comparable fits to the calibration set and the test set. When performing ridge regression on the 2 mm OPL data set with no boxcar averaging, bias is introduced by one high-influence point. This high-influence point also affects the fused OPL data set, although its influence is significantly reduced in the pathlength-fused, boxcar-averaged sets. Under identical treatment conditions, the 24 mm OPL regression shows significantly less, if any, bias. Overall, the best prediction results come from the 24 mm OPL data with the 5 point boxcar window. Spectral fusion provides marginally decreased prediction errors when no boxcar averaging is employed; however, once any level of boxcar averaging is used, the 24 mm OPL outperforms the fused OPL. In all boxcar-averaged data sets, spectral fusion is outperformed by the individual 24 mm OPL regressions by a minimum of 10%; in the analysis of the raw spectral information, however, spectral fusion outperforms the others by at least 3.5%. Interestingly, the raw spectral information (that is, no boxcar filtering) provides the most consistent ridge prediction errors across the three data sets, with a range of 0.41 mEq O2/kg PV (no boxcar) compared to ranges of 1.94 mEq O2/kg PV (5 point boxcar), 1.74 mEq O2/kg PV (10 point boxcar), and 0.96 mEq O2/kg PV (25 point boxcar). Furthermore, the 2 mm OPL data set shows approximately a 20% decrease in prediction error when utilizing a 25 point boxcar window, while the 24 mm and fused OPL data sets show the same or greater decrease in RMSEP when utilizing a boxcar window of only 5 data points.

LASSO
For LASSO, the 24 mm OPL data set is the only case in which boxcar averaging shows a definitive increase in prediction error. However, the best LASSO models did demonstrate better RMSEP than the best PLS and ridge regression models, and RMSEP comparable with the elastic net predictions.

Elastic Net
Elastic net regression shows prediction errors ranging from 1.91 to 3.83 mEq O2/kg PV (Table 5), with the exception of the raw, fused spectral information (no boxcar average pre-processing) and the 5 point boxcar averaged, fused spectral information. In these excluded cases, the elastic net prediction model fails, resulting in RMSEPs of 13.08 and 12.11 mEq O2/kg PV, respectively, and they are ignored for the immediate discussion of results. In the non-boxcar-averaged pre-processing trials, elastic net regression shows the smallest degree of bias of all four regression models tested. The elastic net model parameters leaned heavily toward a LASSO optimization with a slight contribution from ridge in the penalty function. In many cases, the elastic net and LASSO regressions produce almost identical RMSE values.

Discussion
It is noteworthy that the median RMSEP for the optimized models (Table 1) is only 2.06 mEq O2/kg PV given that the samples span 18 different classes of edible oils, multiple brands within each class, and occasionally multiple years within a brand. When considering all models, excluding the raw and 5 point boxcar averaged fused elastic net cases, the models perform very similarly on average, with elastic net outperforming the others by a small margin: RMSEP averages of 3.2 ± 0.94 mEq O2/kg PV for PLS, 3.2 ± 0.72 for ridge, 3.08 ± 0.85 for LASSO, and 2.99 ± 0.80 for elastic net. By comparison, many published studies report comparable RMSEP while employing only a single brand of one oil type. The relatively narrow spread of RMSEP across the six optimized models, within ±10% of the median, provides confidence that the best models are not spurious in nature; that is to say, one would lose confidence in a particular model were it to greatly outperform other models with similar structures and pre-processing protocols.
The regression methods investigated rely on different strategies to optimize the bias-variance trade-off. The methods that rely heavily on L-1 regularization (LASSO and elastic net) outperformed the models that rely on L-2 optimizations (PLS and ridge), both for the six optimized models in Table 1 and as a trend across models for different combinations of OPL selection and boxcar smoothing. Elastic net does not, by default, rely heavily on L-1 regularization, but the mixing term in the optimized models showed that the best models were much closer to LASSO models than ridge models. The functional difference between LASSO and PLS/ridge models is that LASSO drives uninformative regression coefficients to zero. By contrast, PLS/ridge models minimize, but do not nullify, these coefficients. Hence, PLS/ridge leave a greater possibility for random errors to propagate through the calibration process. However, in some instances LASSO methods could eliminate needed variables, resulting in increased model bias.
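The contrast described here between L-1 and L-2 behavior is easy to demonstrate: on synthetic data with a few informative variables, LASSO zeroes out most coefficients while ridge only shrinks them. A minimal sketch with hypothetical data and penalty strengths (not values from this study):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 50))
beta = np.zeros(50)
beta[:3] = [4.0, -3.0, 2.0]          # only three informative variables
y = X @ beta + rng.normal(scale=0.5, size=100)

lasso = Lasso(alpha=0.1).fit(X, y)   # L-1 penalty: sparse coefficients
ridge = Ridge(alpha=1.0).fit(X, y)   # L-2 penalty: shrunken, all nonzero

n_zero_lasso = int(np.sum(lasso.coef_ == 0))
n_zero_ridge = int(np.sum(ridge.coef_ == 0))
```

With data like this, LASSO nullifies most of the uninformative coefficients while retaining the three informative ones; ridge leaves every coefficient nonzero, so random errors in every channel can still propagate into predictions.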
As elastic net does not rely solely on L-1 regularization, the regression coefficients do not reach 0 and are not removed from the calculation as quickly, meaning elastic net can be more computationally taxing than ridge and LASSO. An example of this occurred in the raw and 5 point boxcar, fused spectral information regression calculations, where no reasonable model could be achieved. For these specific cases, the large number of variables (spectral space) necessitated a limited set of tuning parameters. Therefore, a local minimum may have been found by chance when the tuning parameters aligned so that it showed a lower apparent RMSE during tuning. However, adding a greater number of tuning variables could lead to extensive calculation times beyond those feasible for realistic use. Interestingly, in this study, the raw and 5 point boxcar, fused spectral information sets failed in all attempts, even when run with parameters that performed well in other models.
The different performances of the LASSO and PLS/ridge methods can be viewed through the implied distributions of true regression coefficients for the calibration problem. PLS/ridge assume a normal distribution of regression coefficients while LASSO assumes a Laplace distribution [27]. While it is impossible to know the true distribution of regression coefficients with certainty for any calibration problem, LASSO generally tends to outperform ridge regression when a few variables have significantly larger effects than many others [27], while ridge is often superior when the data comprise many effects of equal importance across the observations [28]. One could imagine that, even after narrowing the region of interest to center on the vibrational modes, there are still many 'baseline' observations with minimal predictive ability in the employed data. Although a LASSO model did provide the best RMSEP, multiple elastic net models performed nearly as well as the best LASSO model, and the robustly consistent performance of elastic net across multiple OPLs and boxcar windows may make elastic net the preferred choice for NIR determination of PV in edible oils. The one instance where elastic net did not perform well was the analysis of the fused data with no, or limited, boxcar averaging. Theoretically, the fused data should perform comparably, at worst, to the best of the constituent data streams; however, the RMSEC, RMSECV and RMSEP were all significantly worse. This observation could stem from computational issues with the data set; many noisy variables in a wide, collinear matrix hinder the algorithm from approaching the true optimal solution. Analyses of the fused, non-boxcar-averaged data by elastic net required extremely long calculation times, in the realm of 10 h. Applying a 10 point boxcar smoother greatly improved both the computation time and the model performance.
Simple fusion of the two OPL data streams improved only the PLS model prediction performance, and the fused PLS predictions do not significantly outperform the 24 mm OPL predictions. For the case of edible oils, a low-wavenumber NIR spectral range of 4450-6200 cm−1 is elucidated from the 2 mm OPL cell, and a mid-spectral range of 6300-11,300 cm−1 from the 24 mm OPL cell. The information in each region is unique to the OPL it was measured from, and together they can prove greater than the sum of the parts. However, as the spectral fusion analysis relies on both input OPLs, poor-quality spectral collection, low spectral resolution, or high noise in either OPL will greatly impact the performance of the final model. In this case, the 2 mm OPL spectral set derives its prediction information from small changes in a low-intensity spectral feature that are redundant with information captured in the 24 mm OPL, specifically the CH 2nd overtone (Figure 1) [24].
Boxcar averaging produces a minimal prediction improvement for all cases except the 24 mm OPL PLS and 24 mm OPL LASSO regressions. In general, the 5 point and 10 point boxcar windows outperformed no boxcar averaging or a 25 point boxcar window, although not unanimously. However, the 2 mm OPL unanimously performs best with a large boxcar window. For the smaller boxcar windows, baseline noise reduction occurs in larger magnitude and proportion than the loss of information contained in that same spectrum. With a 25 point boxcar, vibrational bands begin to blur together, hindering the ability to differentiate PV-related changes in the spectra from other sources of variance, such as oil type-related changes.
Nevertheless, a 25 point boxcar window retains sufficient spectral resolution to provide adequate regression predictions. Furthermore, the 2 mm OPL data set benefits from the loss of spectral resolution, as it helps to alleviate the spectral variation within that set. Applying a minimal boxcar average to the data has the added advantage of greatly reducing the computational time required to construct and optimize each model. Overall, the effect of spectral fusion and pre-processing demonstrates the value of good data collection before complex statistical analysis.

Edible Oils
All edible oil samples were purchased from consumer supermarkets in the Newark, Delaware region between 2012 and 2016. This multi-year range allowed for a wide range of PVs. All samples were stored at ambient conditions in their original containers until aliquots were collected for measurement. The starting set consisted of 99 measured oil samples, with PVs ranging from 1.7 to 80 mEq O2/kg, spanning 19 unique oil classes. Four outliers were identified and removed for a final 95 oil samples spanning 18 unique classes. Classes were assigned based upon manufacturer labeling, and therefore present the possibility of mislabeling, fraud or blend confusion. For example, oil blends of vegetable oil; vegetable and canola oil; and canola, sunflower and soybean oil were all classified uniquely, even though vegetable oil is most commonly a blend of soy and canola oils. Single measurements of both the AOCS iodometric titration PV and the NIR spectrum were used for later analysis.

Peroxide Value Measurement
Peroxide values were determined using the AOCS Cd 8-53 method by Eurofins Scientific Inc., Nutrition Analysis Center, 2200 Rittenhouse Street, Suite 150, Des Moines, IA 50321. All edible oil PVs were measured as close in time to the spectral analysis as possible to minimize additional aging of the oil samples.

Spectroscopic Oil Measurement
Edible oil NIR spectra were measured in the same time frame and are representative of the same oil PVs. Both the 2 and 24 mm OPL NIR spectroscopic measurements were performed at Lawrence Livermore National Laboratory using a Bruker Vertex 70 (Billerica, MA, USA) fitted with a room temperature InGaAs detector. The 24 mm NIR spectra were collected directly through the side wall of the glass scintillation storage vial, while the 2 mm OPL spectra were collected in a cuvette (Starna Cells, Inc., Atascadero, CA, USA, Spectrosil 1-Q-2). The recorded spectra were an average of 64 scans at 2 cm−1 resolution, with a spectral range of 3799-14,998 cm−1 for both OPL cells. All NIR spectra were referenced to an air blank, and no sample preparation was used. Both the exterior of the glass vial and the glass sample cell were wiped clean with a cloth before insertion into the instrument sample chamber. Usable region selection was dependent on two factors: all spectral features with an absorbance value greater than two were removed, and all baseline regions with little variance were cut. For the 2 mm OPL, the region from 4450 to 6200 cm−1 was chosen; for the 24 mm OPL, the region from 6300 to 11,300 cm−1 (Figure 2).

Pre-Processing
Outlier removal was performed in two steps. First, two titrated samples were identified as statistical PV outliers and removed. These had determined PVs of 129.0 and 155.0 mEq O2/kg, well beyond the range of the other 97 samples, and were identified as values lying more than 3 standard deviations from the mean in either direction. Two more titrated oil samples were identified as spectral outliers using Cook's distance. Cook's distance was found by performing a PLS regression and fitting a linear model to the regression; any sample more than 3 standard deviations from the mean Cook's distance was removed.
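The two-step outlier screen described above can be sketched as follows. This is a minimal illustration with synthetic data; a plain OLS fit on a hypothetical one-dimensional predictor stands in for the PLS-based fit used in the study, and all names and values are hypothetical:

```python
import numpy as np

def three_sigma_mask(values):
    """True for samples within 3 standard deviations of the mean."""
    v = np.asarray(values, dtype=float)
    return np.abs(v - v.mean()) <= 3 * v.std()

def cooks_distance(X, y):
    """Cook's distance for each sample under an OLS fit of y on X."""
    X1 = np.column_stack([np.ones(len(y)), X])    # add intercept column
    H = X1 @ np.linalg.pinv(X1.T @ X1) @ X1.T     # hat (projection) matrix
    h = np.diag(H)
    resid = y - H @ y
    p = X1.shape[1]
    mse = resid @ resid / (len(y) - p)
    return resid ** 2 / (p * mse) * h / (1 - h) ** 2

# Step 1: remove titrated PVs far beyond the rest (3-sigma rule).
rng = np.random.default_rng(0)
pv = np.append(rng.normal(30.0, 5.0, 50), 155.0)   # one gross PV outlier
keep = three_sigma_mask(pv)

# Step 2: flag spectral outliers via Cook's distance on a fitted regression.
x = np.arange(20.0)
y = 2.0 * x + 1.0
y[10] += 30.0                                      # inject a high-influence point
d = cooks_distance(x[:, None], y)
```

Samples failing either screen (`keep` false, or `d` beyond 3 standard deviations of its mean) would be dropped before the calibration/validation split.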
The oil samples were then randomly split into calibration and validation sets. The validation set comprised approximately 20% of the total samples, specifically 17 of 95, and contained proportional olive oil and non-olive oil samples: 8 olive oil and 9 non-olive oil. All samples were assigned a specific oil type as well as a secondary classification of olive oil or non-olive oil; blends containing olive oil were considered part of the olive oil class. The final, analyzed set contained 95 unique oil samples covering 18 unique oil classes (Table 6).
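A stratified random split like the one described can be sketched with scikit-learn. The olive/non-olive labels below are hypothetical placeholders chosen only to mirror the reported 95-sample set and 17-sample validation set, not the study's actual class counts:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical secondary labels: True if the sample contains olive oil.
is_olive = np.array([True] * 44 + [False] * 51)    # 95 samples total
sample_ids = np.arange(95)

# 17-sample validation set, stratified on the olive/non-olive label.
cal_ids, val_ids = train_test_split(
    sample_ids, test_size=17, stratify=is_olive, random_state=0)
```

Stratifying on the secondary label keeps the olive/non-olive proportions of the validation set close to those of the full sample set.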
All subsequent data pre-processing steps were performed individually on each set of data, with the pre-processing parameters determined from the calibration subset and subsequently applied to the validation subset. The two previously selected spectral regions were later combined to create a simple, fused data set used for the same final regression model calculations. Fusing the different OPL measurements incorporates the unique aspects of each NIR spectrum into the same regression, providing the greatest possible information, and therefore variance, to regress against.
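Low-level fusion of the two retained spectral regions amounts to concatenating the matrices sample-by-sample. A minimal sketch with hypothetical channel counts (the actual number of retained channels per region is not stated in this excerpt):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical channel counts for the retained regions
# (4450-6200 cm-1 at 2 mm; 6300-11,300 cm-1 at 24 mm).
X_2mm = rng.normal(size=(95, 350))
X_24mm = rng.normal(size=(95, 900))

# Simple low-level fusion: concatenate the regions for each sample.
X_fused = np.hstack([X_2mm, X_24mm])
```

The fused matrix carries every retained channel from both pathlengths, so any regression built on it can draw on the unique information of each OPL.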
Each of the three data sets was tested using baseline correction techniques to determine which was most beneficial for creating regression prediction models for PV. Based on previous work by Ottaway et al. [24], 1st- and 2nd-order Savitzky-Golay smoothing was performed with window sizes between 5 and 50; however, it showed no benefit to the final prediction results. For these data sets, autoscaling provides adequate reduction of the random baseline variance within each spectral set. Small-window spectral boxcar averaging has been shown in previous work to further reduce the intrinsic noise in each observed data point when applied to systems with adequate spectral resolution [29]. Spectral boxcar averaging increases prediction efficiency by reducing the effect of random signal intensity changes, such as noise, while simultaneously increasing the effect of true chemical signals (Figure 3). Boxcar averaging has the added benefit of decreasing computational requirements, allowing simpler systems to perform the regressions required for proper PV prediction.

Table 6. Class identification as indicated by oil manufacturer, and number of samples in each class. Classes with 0 samples indicate classes from previous work (Ottaway et al.) [24] with all samples removed as outliers, or not re-titrated in 2019.
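The boxcar averaging and calibration-anchored autoscaling steps above can be sketched as follows. A non-overlapping window average is assumed (consistent with the reduction in variable count the text describes); these are illustrative helpers, not the study's code:

```python
import numpy as np

def boxcar(spectra, window):
    """Non-overlapping boxcar average along the wavenumber axis."""
    n_samples, n_points = spectra.shape
    n_bins = n_points // window
    trimmed = spectra[:, :n_bins * window]        # drop leftover channels
    return trimmed.reshape(n_samples, n_bins, window).mean(axis=2)

def autoscale(X_cal, X_val):
    """Autoscale both sets using the calibration mean and std only."""
    mu, sd = X_cal.mean(axis=0), X_cal.std(axis=0)
    return (X_cal - mu) / sd, (X_val - mu) / sd
```

Because the calibration statistics are applied to the validation set, no information leaks from the validation spectra into the pre-processing, matching the parameter-transfer scheme described above; the boxcar step also shrinks the variable count by the window factor, which is where the computational saving comes from.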

Multivariate Analysis
PLS regression has previously been studied in depth for predicting the PV of edible oil samples. The bilinear PLS model is better able than univariate methods to address data sets with multicollinearity and with more variables than samples. PLS is therefore well suited to spectroscopy, as the spectral-space variables (X) are multicollinear while the response (Y) is often only a scalar for each spectrum. The generalized PLS regression model decomposes the data (X) and response (Y) matrices to maximize the covariance of the X and Y scores matrices (T and U, respectively), with included error terms (E and F, respectively). This algorithmic method optimizes the regression coefficients used to create a least squares regression model.
To our knowledge, ridge, LASSO and elastic net have not previously been used to make these predictions on oil samples. Ridge, LASSO and elastic net are all based on the same extension of the ordinary least squares (OLS) optimization equation [30]: the basic OLS optimization is modified with a penalty factor that biases the regression coefficients used for model construction (Equations (3)-(5)).
All three regression techniques share the same expansion of the multilinear model but approach the regression coefficient penalization factor in different ways. Ridge regression imposes an L-2 regularization penalty, which varies the regression coefficients by a squared factor of the penalty value [31], where λ is the tuning parameter and β the regression coefficients. As a result, the regression coefficients never reach 0, meaning the model must account for all of them in the calculation. LASSO regression prioritizes a simpler model and therefore uses L-1 regularization, which reduces some of the regression coefficients to 0 and eliminates them from the calculation [32]; this is accomplished by using an absolute value in the penalty term. Elastic net combines both L-1 and L-2 regularization to obtain the benefits of both techniques [33]. In elastic net regression, the alpha hyperparameter decides between the ridge and LASSO contributions, with each extreme (1 or 0) corresponding to a pure regularized regression. The two regularizations must be applied simultaneously, as sequential regularization of the data results in greater bias and overfitting, and therefore a worse prediction model than using either method individually. Ridge, LASSO and elastic net were chosen specifically for their enhanced ability to handle high multicollinearity within data sets. All three OLS-based regression models approach the least squares model from the same algorithmic method, with related optimizations; however, their optimization method differs greatly from that used in PLS.
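The penalized objectives referenced as Equations (3)-(5) are not reproduced in this excerpt; a reconstruction in standard form, using the λ and α notation of the text and presuming (3)-(5) denote ridge, LASSO, and elastic net respectively, is:

```latex
\hat{\beta}_{\text{ridge}} = \underset{\beta}{\arg\min}\;
  \lVert \mathbf{y} - \mathbf{X}\beta \rVert_2^2
  + \lambda \sum_{j} \beta_j^2 \tag{3}

\hat{\beta}_{\text{LASSO}} = \underset{\beta}{\arg\min}\;
  \lVert \mathbf{y} - \mathbf{X}\beta \rVert_2^2
  + \lambda \sum_{j} \lvert \beta_j \rvert \tag{4}

\hat{\beta}_{\text{enet}} = \underset{\beta}{\arg\min}\;
  \lVert \mathbf{y} - \mathbf{X}\beta \rVert_2^2
  + \lambda \Big[ \alpha \sum_{j} \lvert \beta_j \rvert
  + (1-\alpha) \sum_{j} \beta_j^2 \Big] \tag{5}
```

Setting α = 1 in the elastic net objective recovers the LASSO penalty and α = 0 recovers the ridge penalty, matching the text's description of the extremes.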
The root mean squared error of cross-validation (RMSECV) and percent variance explained from the PLS regression were used to determine the number of LVs for modeling. The RMSECV and root mean squared error of calibration (RMSEC) were found using the calibration set; the validation set was then projected into the model to calculate the RMSEP as the prediction error from the true measurements. Next, ridge, LASSO and elastic net regression were performed on each calibration set in two steps. The process of developing a prediction model was identical for ridge and LASSO, and only slightly different for elastic net. First, for all three regression types, a separate tuning step was performed to identify the best-fit parameter(s), hereafter identified as lambda for ridge and fraction for LASSO. The optimal lambda or fraction value was chosen via a plot of RMSE vs. log(lambda/fraction). For all calibration models, leave-one-out cross-validation was chosen due to the limited number of samples. For elastic net regression, which varies both parameters simultaneously, the same tuning method was used; however, the proper value was chosen from the parameter set providing the lowest RMSE vs. log(lambda and fraction) without obvious overfitting. An identical process was followed for the boxcar averaged data sets. In all cases, the RMSECV was calculated from leave-one-out cross-validation and the RMSEC was calculated from the calibration set's fit to the model. Lastly, the validation set was predicted with the regression model, and the RMSEP was calculated against the true titration PVs.
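The tune-then-validate procedure above maps directly onto scikit-learn's cross-validated estimators with leave-one-out cross-validation. A minimal sketch on synthetic data; the parameter grids, mixing ratios, and dimensions are hypothetical, not the study's settings:

```python
import numpy as np
from sklearn.linear_model import RidgeCV, LassoCV, ElasticNetCV
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 80))                       # wide, collinear-style data
y = X[:, :4] @ np.array([3.0, -2.0, 1.5, 1.0]) + rng.normal(scale=0.3, size=30)

loo = LeaveOneOut()                                 # LOO CV for small sample counts
alphas = np.logspace(-3, 2, 20)                     # hypothetical penalty grid

# Tuning step: each CV estimator scans its grid and keeps the best penalty.
ridge = RidgeCV(alphas=alphas, cv=loo).fit(X, y)
lasso = LassoCV(alphas=alphas, cv=loo, max_iter=5000).fit(X, y)
enet = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9, 0.99],
                    alphas=alphas, cv=loo, max_iter=5000).fit(X, y)
```

After tuning, a held-out validation set would be predicted with each fitted model and the RMSEP computed as `np.sqrt(np.mean((y_val - model.predict(X_val)) ** 2))` against the reference titration values.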

Conclusions
Our results suggest that, for edible oil PV prediction, PLS regression presents reasonable but substandard prediction results compared to the other tested methods. Overall, elastic net regression provides the most consistent and, often, the most precise PV predictions from NIR spectroscopy, apart from some of the spectral fusion sets. The main benefit of elastic net regression is the inclusion of both the ridge and LASSO penalization parameters, allowing for a more precise estimation of the regression coefficient penalty factor. Considering two penalization parameters produces prediction models that are less affected by the pre-processing method than the other presented models. While PLS, ridge, and LASSO individually provide reasonable predictions of PV, and can be improved through differing pre-processing methods, equivalent elastic net regression predictions can be obtained without these methods. Conversely, because of the more accurate estimation of the regression penalty parameters, the same pre-processing methods provide significantly less benefit than in other, simpler regression methods. Furthermore, these regression predictions do not depend on a narrow PV range or on separate models for individual oil classes. This study shows that prior knowledge of the sample in question is not required for the accurate and precise prediction of PV. Most importantly, inclusion of varied oil classes and/or an increased PV range has little effect on PV prediction. This study provides excellent prediction of edible oil PVs, in agreement with previously published literature, without the need for class segregation or PV range limitation.
Depending on the regression model in question, spectral fusion can provide significant benefit for PLS, moderate improvement for ridge, or seemingly no prediction improvement but possible consistency improvement for LASSO and elastic net. However, the largest benefit of spectral fusion is the overall prediction consistency provided by considering both OPLs simultaneously. The total prediction range of the spectral fusion sets, across all regression models, is 1.91 to 3.67 mEq O2/kg PV, a total range of 1.76 mEq O2/kg PV.