Rapid Detection of Fatty Acids in Edible Oils Using Vis-NIR Reflectance Spectroscopy with Multivariate Methods

The composition and content of fatty acids are critical indicators of the quality of edible oils. This study aimed to establish a rapid method for assessing edible oil quality based on quantitative analysis of palmitic acid, stearic acid, arachidic acid, and behenic acid. Seven kinds of oils were measured to obtain Vis-NIR spectra. Multivariate methods combined with pretreatment methods were adopted to establish quantitative analysis models for the four fatty acids. The support vector machine (SVM) model with standard normal variate (SNV) pretreatment showed the best predictive performance for the four fatty acids. For palmitic acid, the coefficient of determination of prediction ($R^2_P$) was 0.9504 and the root mean square error of prediction ($\mathrm{RMSE}_P$) was 0.8181. For stearic acid, $R^2_P$ and $\mathrm{RMSE}_P$ were 0.9636 and 0.2965. For arachidic acid, $R^2_P$ and $\mathrm{RMSE}_P$ were 0.9576 and 0.0577. For behenic acid, $R^2_P$ and $\mathrm{RMSE}_P$ were 0.9521 and 0.1486. Furthermore, the effective wavelengths selected by the successive projections algorithm (SPA) were useful for establishing simplified prediction models. The results demonstrate that Vis-NIR spectroscopy combined with multivariate methods can provide a rapid and accurate approach for detecting fatty acids in edible oils.


Introduction
The consumption of edible oils has been increasing with population growth. Edible oils have good taste and health properties and provide many beneficial substances, including fatty acids, energy, and essential trace elements [1]. The quality of edible oil is closely related to public health and food safety, and the choice of edible oils affects the nutritional balance of the human body [2]. However, edible oils are frequently subject to adulteration, contamination, deterioration, and re-use during production [3,4], which greatly affects their edibility. Because the external features of edible oils are easily altered by physical and chemical means, it is difficult to judge their quality based only on color, smell, and taste [5,6]. Every kind of pure edible oil has a relatively stable fatty acid composition, and all of the above quality problems cause significant changes in the fatty acid content of the oil. Therefore, the quantitative analysis of fatty acids is an effective way to assess the quality of edible oils.
Several studies in recent years have sought to measure fatty acid content precisely for the quality assessment of oil products [7]. Gas chromatography has been the preferred analytical method for the determination of fatty acid methyl esters (FAMEs) [8-10]. High-performance liquid chromatography has also been used for the analysis of fatty acids in biological samples [11]. Nuclear magnetic resonance can be used to measure fatty acids in edible oils [12]. In addition, fluorescence spectroscopy has been studied as a detection method in oil quality testing [13]. Except for fluorescence spectroscopy, these detection methods involve time-consuming experimental preparation, such as extraction, derivatization, and chromatography steps. Moreover, they use hazardous chemicals and therefore require skilled analytical technicians. Fluorescence spectroscopy is a non-destructive method for discriminating edible oils without pretreatment steps. However, it depends on bulky and expensive instruments, which severely limits its application in the edible oil industry. All of these factors hamper the widespread application of these methods. Therefore, a simple, rapid, and economical method for detecting the fatty acids of edible oils is in high demand. In recent years, near-infrared spectroscopy has been widely applied to oil detection. Near-infrared spectroscopy is an optical detection technique that measures the interaction of infrared radiation with analytes by absorption or reflection. The characteristics of molecular vibration and rotation of analytes can be obtained by inspecting the absorption spectra. For example, the rapid identification of edible oil and swill-cooked dirty oil was studied by employing near-infrared spectroscopy and the sparse representation classification method [14]. Jiang et al. applied a near-infrared spectroscopy system to determine acid values during edible oil storage [15].
However, near-infrared spectroscopy has generally been effective for qualitative analysis, while quantitative analysis based on it was not ideal in some studies of edible oils [16]. Wavelengths in the near-infrared range mainly reflect the molecular vibration characteristics of objects. The visible range is an important supplement for distinguishing objects based on their basic physical forms. Therefore, the integrated analysis of the visible and near-infrared spectra can provide more comprehensive spectral information [17]. The visible and near-infrared (Vis-NIR) part of the electromagnetic spectrum covers both the visible (350-780 nm) and near-infrared (780-2500 nm) ranges. However, the application of Vis-NIR reflectance spectroscopy to food composition detection is still limited by the difficulty of constructing accurate data analysis models.
In this study, we present an efficient and non-destructive method for detecting the fatty acids of edible oils using Vis-NIR reflectance spectroscopy. Accurate computational models were constructed from the Vis-NIR spectra, which were obtained with a hyperspectral spectroradiometer system to give comprehensive characteristic information on edible oils. Palmitic acid (C16:0), stearic acid (C18:0), arachidic acid (C20:0), and behenic acid (C22:0) were chosen as the experimental objects. Palmitic acid is a major saturated long-chain fatty acid, accounting for 4.6-20.0% of edible oils [18]. Stearic acid is a saturated fatty acid with health benefits. In contrast to other fatty acids in edible oils, it does not increase the plasma levels of low-density lipoprotein [19]. Arachidic acid is a saturated fatty acid with a 20-carbon chain. It can act as a chemical messenger released by the muscle that controls the physiological response to exercise [20]. Behenic acid is a saturated fatty acid that can raise cholesterol levels in humans [21]. The four fatty acids are commonly found in various edible vegetable oils. They have specific physiological functions in the human body and relatively stable molecular structures that resist temperature and other environmental factors. Therefore, the contents of the four fatty acids can be used as important references for the quality evaluation of edible oils. The four fatty acid contents were measured by gas chromatography-mass spectrometry (GC-MS) as standard references for model training and validation. Multivariate methods, including partial least squares regression (PLSR), support vector machine (SVM), and random forest (RF), were applied with multiple spectral pretreatments to establish prediction models for the contents of the four fatty acids.
In addition, the successive projections algorithm (SPA), variable importance of projection (VIP) and principal component analysis (PCA) were used for selecting the effective wavelengths to simplify the prediction models. The accurate quantitative analysis of the four fatty acids could provide a critical reference for quality assessment of edible oils.

Oil Samples
In this study, seven kinds of commonly consumed edible oils were randomly collected from local Walmart stores (Hefei, China). The final sample set comprised 93 oils: 15 brands of sesame oil, 15 brands of soybean oil, 11 brands of corn oil, 11 brands of sunflower oil, 13 brands of rapeseed oil, 15 brands of peanut oil, and 13 brands of olive oil. All oils were first stirred thoroughly before sampling. One 20 mL sample was taken from each brand of edible oil. The 93 oil samples were first used for the Vis-NIR reflectance spectroscopy experiments. After the spectral measurement, the four fatty acid contents in each oil sample were analyzed by GC-MS.

Measurement of Vis-NIR Reflectance Spectra of Oil Samples
The Vis-NIR spectra of the 93 oil samples were measured with a field-portable spectroradiometer (PSR-3500, Spectral Evolutions, Lawrence, MA, USA), a spectral remote sensing instrument that provides fast and stable measurement of Vis-NIR spectra. The instrument is equipped with three array detectors (one 512-element Si detector and two 256-element InGaAs detectors) for accurate measurement of the oil samples. The measurement range of the Vis-NIR spectra was 350-2500 nm. The spectral acquisition lens was set 3 cm from the oil samples, and an optical fiber with an 8-degree field of view was used to measure the spectra. Each oil sample was placed in a separate 25 mL glass beaker, as illustrated in Figure 1. For each oil sample, ten Vis-NIR reflectance spectra were recorded at a resolution of 1 nm. To ensure the randomness of the spectra, the 93 oil samples were measured in ten different batches: all the oil samples were measured once in an independent batch, and the process was repeated 10 times. Hence, a total of 930 Vis-NIR reflectance spectra were obtained for the subsequent data analysis. Each spectrum contained 2151 data points covering the 350-2500 nm range. A black box system with a fixed light source was constructed for spectral acquisition to exclude interference from external light. To eliminate some of the disturbance factors of the black box system, a whiteboard calibration was applied during spectral acquisition [22]. The final Vis-NIR spectra were calibrated according to Equation (1):

$$R_{mi} = \frac{DN_{mi}}{DN_{ri}} \times R_{ri} \tag{1}$$

where $R_{mi}$ is the corrected reflectance of the oil sample and $R_{ri}$ is the reflectance of the whiteboard. $DN_{mi}$ and $DN_{ri}$ are the original values for the oil sample and the whiteboard, respectively.
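The whiteboard calibration can be illustrated with a short sketch. It assumes the usual ratio form of white-reference correction (raw sample reading divided by raw whiteboard reading, scaled by the known whiteboard reflectance); the function name and the numerical readings are hypothetical and serve only as an illustration.

```python
import numpy as np

def calibrate_reflectance(dn_sample, dn_white, r_white):
    """Convert raw sample readings to reflectance using a white reference.

    dn_sample: raw readings of the oil sample, one value per wavelength
    dn_white:  raw readings of the whiteboard at the same wavelengths
    r_white:   known reflectance of the whiteboard at those wavelengths
    """
    dn_sample = np.asarray(dn_sample, dtype=float)
    dn_white = np.asarray(dn_white, dtype=float)
    r_white = np.asarray(r_white, dtype=float)
    if np.any(dn_white == 0):
        raise ValueError("white-reference reading must be non-zero")
    return dn_sample / dn_white * r_white

# Hypothetical readings at three wavelengths (illustrative values only).
dn_oil = [1200.0, 1500.0, 900.0]
dn_board = [2400.0, 3000.0, 3000.0]
board_reflectance = [0.99, 0.99, 0.99]
corrected = calibrate_reflectance(dn_oil, dn_board, board_reflectance)
```

Applied to a full measurement, the same call would operate on arrays of 2151 values, one per wavelength between 350 and 2500 nm.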

Measurement of Four Fatty Acid Contents in Oil Samples
The composition of various fatty acids is an important indicator of quality for edible oils. Therefore, the quantitative analysis of fatty acids is often used for assessing oil quality and discrimination of edible oil adulteration. In this paper, four fatty acids (i.e., palmitic acid, stearic acid, arachidic acid, and behenic acid) were chosen as the objects of study.
The reference values of the four fatty acids in the different edible oils were determined by GC-MS. The fatty acids in the edible oil samples were quantified with a GCMS-QP2010 SE (Shimadzu Corporation, Japan) equipped with a DB-5MS gas chromatographic column (30 m × 0.25 mm × 0.25 µm). Helium of 99.99% purity at a constant flow rate of 1 mL/min was used as the carrier gas. The fatty acid contents of the edible oils were analyzed after derivatization to their methyl ester products. The oil samples were processed together with the four corresponding methyl esters (methyl palmitate, methyl stearate, methyl arachidate, and methyl behenate) at different concentrations for thermal separation, because the boiling points of the methyl-esterified derivatives span a wider range than those of the original fatty acids. The detailed experimental process of methyl esterification is provided in the Supplementary Materials (Figures S1 and S2 and Table S1). After methyl esterification, the oil samples were measured by GC-MS to quantify the content of palmitic acid, stearic acid, arachidic acid, and behenic acid.

Pretreatment of Vis-NIR Reflectance Spectra
Although precautions (the black box environment and the whiteboard calibration) were taken in the spectral measurement system, the raw spectra inevitably suffer from system noise and disturbances from the measuring environment, including light scattering, temperature, and baseline drift. Multiplicative scatter correction (MSC) [23], standard normal variate (SNV) [24], Savitzky-Golay (SG) smoothing [25], and wavelet transform (WT) [26] were applied to correct the Vis-NIR reflectance spectra. MSC and SNV are used to eliminate the effects of surface scattering and optical path variation in spectral data. SG smoothing is an effective algorithm for improving the spectral signal-to-noise ratio. WT is commonly used for spectrum filtering and noise reduction. In addition to these four pretreatment algorithms, the raw spectra without any pretreatment were also used for model establishment, in order to evaluate the effectiveness of the different pretreatment methods for Vis-NIR reflectance spectra.
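Three of the pretreatments named above can be sketched in a few lines of NumPy. This is a minimal illustration under common textbook definitions, not the implementation used in the study: SNV standardizes each spectrum by its own mean and standard deviation, MSC regresses each spectrum on a reference (here the mean spectrum, an assumption) and removes the fitted offset and slope, and SG smoothing fits a local polynomial in a sliding window. The window length and polynomial order are illustrative defaults; WT is omitted.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row)."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def msc(spectra, reference=None):
    """Scatter correction of each spectrum against a reference spectrum."""
    spectra = np.asarray(spectra, dtype=float)
    if reference is None:
        ref = spectra.mean(axis=0)  # assumed reference: the mean spectrum
    else:
        ref = np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, row in enumerate(spectra):
        # Fit row = intercept + slope * ref, then undo offset and slope.
        slope, intercept = np.polyfit(ref, row, 1)
        corrected[i] = (row - intercept) / slope
    return corrected

def savgol_smooth(spectrum, window=11, order=2):
    """Savitzky-Golay smoothing of one spectrum with edge padding."""
    if window % 2 == 0 or order >= window:
        raise ValueError("window must be odd and larger than the polynomial order")
    half = window // 2
    offsets = np.arange(-half, half + 1)
    design = np.vander(offsets, order + 1, increasing=True)
    # First row of the pseudo-inverse gives the weights that return the
    # fitted polynomial value at the window center.
    weights = np.linalg.pinv(design)[0]
    padded = np.pad(np.asarray(spectrum, dtype=float), half, mode="edge")
    return np.convolve(padded, weights[::-1], mode="valid")
```

For a matrix of 930 spectra with 2151 wavelengths each, `snv` and `msc` take the whole matrix, while `savgol_smooth` is applied to one spectrum at a time.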

Selection of Effective Wavelengths of Vis-NIR Reflectance Spectra
The Vis-NIR reflectance spectra with full wavelengths form a high-dimensional data matrix that contains a large amount of redundant and irrelevant spectral information. Effective wavelength selection is therefore important for establishing simplified and stable prediction models. In this paper, the successive projections algorithm (SPA) [27], variable importance in projection (VIP) [28], and principal component analysis (PCA) [29] were used for spectral feature extraction and wavelength selection.

• Successive projections algorithm (SPA): SPA is a variable-selection technique that selects variables with minimal redundant information and collinearity from the spectral information. It is a forward selection method that calculates the projection of each wavelength on the other unselected wavelengths and introduces the wavelengths with maximum projection into the combination of wavelengths.
• Variable importance in projection (VIP): VIP is an analytical technique for estimating the effect of individual variables in a system. The VIP score is a parameter used to evaluate the importance of the independent variable to the dependent variable in the model. An independent variable with a higher score is considered to have a significant influence on the dependent variable. Variables with low scores are discarded to ensure the validity of the model.
• Principal component analysis (PCA): PCA is a statistical analysis method that can reduce and simplify the original data. The spectra cover a wide range of bands, with a certain correlation between different bands. The generated principal components are comprehensive indices formed by linear combinations of the original features (i.e., different wavelengths in this study), which can eliminate the correlation in the original data. The loading vectors of PCA can be used to select the important wavelength regions: the higher the loading values, the more important the corresponding wavelengths. The wavelength points with the largest absolute values in the top loading vectors were selected as the effective wavelengths.
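The projection step of SPA can be made concrete with a small sketch. It follows the commonly described form of the algorithm, in which the remaining wavelength columns are projected onto the subspace orthogonal to the most recently selected column and the column with the largest remaining norm is chosen next. The function name, the fixed starting column, and the fixed number of wavelengths are assumptions made for illustration; a complete SPA would also choose the starting wavelength and the number of wavelengths by validation error.

```python
import numpy as np

def spa_select(X, n_select, start=0):
    """Return indices of n_select columns of X chosen by successive projections.

    X: array of shape (n_samples, n_wavelengths)
    start: index of the initial wavelength (fixed here for simplicity)
    """
    Xp = np.asarray(X, dtype=float)
    Xp = Xp - Xp.mean(axis=0)  # work on a column-centered copy
    selected = [start]
    for _ in range(n_select - 1):
        v = Xp[:, selected[-1]]
        # Remove from every column its component along the last selected one.
        Xp = Xp - np.outer(v, v @ Xp) / (v @ v)
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0  # never pick a column twice
        selected.append(int(np.argmax(norms)))
    return selected
```

Because a column that is nearly collinear with an already selected one has almost no norm left after projection, it is passed over in favor of wavelengths that carry new information.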

Models Establishment
The regression models for predicting fatty acid contents were developed with PLSR, SVM, and RF. PLSR is a linear regression method for relating high-dimensional regressors to one or several response variables [30]. A linear regression model is constructed from a small number of latent variables, which are projections of the explanatory variables and response variables into a new space. The latent variables are chosen to maximize the covariance between the new explanatory variables and the new response variables. PLSR performs well in the prediction analysis of spectral data and has been widely used in chemometrics. The main idea of SVM is to map the inputs from a low-dimensional feature space to a high-dimensional feature space [31]. SVM applies a kernel function to construct an optimal hyperplane for separating the samples in the high-dimensional feature space, with good theoretical properties in generalization and convergence. Although SVM was proposed for classification, it can also be used for regression analysis and performs very well in spectral data analysis [32]. RF is a non-linear ensemble method for prediction analysis. A series of simple decision trees is generated using a randomization strategy, and the predictions of all trees are combined into the final result. For regression, the result is the mean of the predictions of the generated decision trees, whereas for classification it is determined by aggregating their votes [33,34].
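The latent-variable idea behind PLSR can be sketched for a single response with the NIPALS-style PLS1 recursion. This is an illustrative NumPy sketch under textbook assumptions, not the MATLAB implementation used in the study; the function names and the returned dictionary are hypothetical. SVM and RF are not shown.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model and return its regression vector."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                # direction of maximum covariance with y
        w = w / np.linalg.norm(w)
        t = Xk @ w                   # scores of the new latent variable
        tt = t @ t
        p = Xk.T @ t / tt            # X loadings
        c = yk @ t / tt              # y loading
        Xk = Xk - np.outer(t, p)     # deflate X
        yk = yk - c * t              # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return {"coef": coef, "x_mean": x_mean, "y_mean": y_mean}

def pls1_predict(model, X):
    """Predict the response for new spectra with a fitted PLS1 model."""
    X = np.asarray(X, dtype=float)
    return (X - model["x_mean"]) @ model["coef"] + model["y_mean"]
```

In this sketch `n_components` plays the role of the number of latent variables; using fewer components than wavelengths is what gives PLSR its dimension-reducing effect on spectral data.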

Performance Evaluation
To assess the prediction performance of the three regression models (SVM, RF, and PLSR), the spectra of the edible oil samples were divided into a calibration set and a prediction set. For each kind of oil, 70% of the spectra were randomly assigned to the calibration set and the rest were taken as the prediction set. The calibration set was used to train the models and the prediction set to evaluate their performance. Model performance was quantitatively evaluated using the coefficients of determination of the calibration set ($R^2_C$) and prediction set ($R^2_P$), and the root mean square errors of the calibration set ($\mathrm{RMSE}_C$) and prediction set ($\mathrm{RMSE}_P$), as defined in Equations (2) and (3):

$$R^2 = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}} \tag{2}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - y_{p,i}\right)^2} \tag{3}$$

where $\mathrm{RSS} = \sum_{i=1}^{N}(y_i - y_{p,i})^2$ is the residual sum of squares and $\mathrm{TSS} = \sum_{i=1}^{N}(y_i - \bar{y})^2$ is the total sum of squares. $y_i$ is the reference value of the $i$-th experimental oil sample, $y_{p,i}$ is the predicted value for that oil sample, and $\bar{y}$ is the mean of the reference values. $N$ is the number of oil samples.
The SVM, RF, and PLSR models were implemented in MATLAB (Version R2019b, MathWorks, Inc., Natick, MA, USA). The computer used for the experiments had an Intel Core i7-8700 CPU with a main frequency of 3.7 GHz.
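The split and the two evaluation measures can be sketched as follows. The code assumes the standard definitions of the coefficient of determination (one minus the ratio of residual to total sum of squares) and of the root mean square error; the function names, the random seed, and the rounding rule for the 70% share are illustrative assumptions.

```python
import numpy as np

def r2_score(y_ref, y_pred):
    """Coefficient of determination: 1 - RSS / TSS."""
    y_ref = np.asarray(y_ref, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rss = np.sum((y_ref - y_pred) ** 2)
    tss = np.sum((y_ref - y_ref.mean()) ** 2)
    return 1.0 - rss / tss

def rmse(y_ref, y_pred):
    """Root mean square error between reference and predicted values."""
    y_ref = np.asarray(y_ref, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_ref - y_pred) ** 2)))

def split_by_kind(kinds, calibration_fraction=0.7, seed=0):
    """Randomly assign a fixed share of each oil kind to the calibration set.

    kinds: one label per spectrum (for example the oil kind)
    Returns sorted index arrays (calibration, prediction).
    """
    kinds = np.asarray(kinds)
    rng = np.random.default_rng(seed)
    cal, pred = [], []
    for kind in np.unique(kinds):
        idx = rng.permutation(np.flatnonzero(kinds == kind))
        n_cal = int(round(calibration_fraction * idx.size))
        cal.extend(idx[:n_cal])
        pred.extend(idx[n_cal:])
    return np.sort(np.array(cal, dtype=int)), np.sort(np.array(pred, dtype=int))
```

Calling `r2_score` and `rmse` on the calibration indices gives the calibration-set measures, and calling them on the prediction indices gives the prediction-set measures.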

Vis-NIR Reflectance Spectra of Different Edible Oils
The spectra of the 93 samples, covering seven kinds of edible oils, were measured with the constructed hyperspectral spectroradiometer system. Representative spectra of the seven kinds of edible oils are shown in Figure 2. Based on their spectral properties in the visible region, the seven kinds of edible oils can be classified into three categories. The spectra of soybean oil, corn oil, sunflower oil, and peanut oil showed similar trends in the visible region, with a gentle absorption peak around 510 nm. The spectra of rapeseed oil and olive oil showed multiple absorption peaks (585 nm and 631 nm) in the visible region, which differed markedly from those of the above four edible oils. The spectra of sesame oil had no absorption peak in the visible region, unlike all the other edible oils. On the whole, the spectra of the seven kinds of edible oils showed very similar overall trends in the NIR range (780-2500 nm). However, the spectra of different edible oils vary in reflectance and shape in some specific narrow regions. The spectral peak at 800 nm is related to the O-H first stretching overtone. The spectral peaks at 856 nm and 1098 nm belong to the C-H third overtone [35], and the spectral peak at 1586 nm corresponds to the second overtone of N-H [36]. The peak at around 1320 nm is related to C-H combinations, and the peak at 980 nm is related to the second overtone of O-H bending [37,38].

Quantitative Determination of Fatty Acids by GC-MS
The reference values of the fatty acids in edible oils need to be accurately obtained for model establishment and model validation. In this study, the oil sample of each brand was measured by GC-MS. All the oil samples were methyl-esterified to obtain the four FAMEs. The complete data on the content of the four fatty acids measured by GC-MS are provided in the Supplementary Materials (Table S2). The quantitative results for all 93 brands in the seven kinds of oils were analyzed as follows.
First, the statistical analysis of the quantitative results of the four fatty acids is shown in Table 1 and Figure 3. Palmitic acid was detected at high contents in all kinds of edible oils except rapeseed oil. Corn oil was rich in palmitic acid (mean = 16.67%, standard deviation (sd) = 0.68) compared to the other oils. Rapeseed oil had the lowest content of palmitic acid (mean = 4.93%, sd = 0.57). The stearic acid contents in all oil samples were lower than the palmitic acid contents. The average content of stearic acid in the seven kinds of edible oils ranged from 1.71% to 5.97%. The standard deviations of the stearic acid content in the seven kinds of edible oils were less than 0.65, which indicates that the stearic acid content varied very little within each kind of oil. The contents of arachidic acid and behenic acid were relatively low in all seven kinds of oils. Peanut oil was richer in arachidic acid and behenic acid than the other oils. The variation of arachidic acid and behenic acid in peanut oil was also larger than in the other edible oils (as shown in Figure 3), with standard deviations of 0.25 and 0.41, respectively. Moreover, the results of the Wilcoxon test of the four fatty acids between pairs of edible oils are shown in the Supplementary Materials (Figure S3). In addition, the quantitative data of the four fatty acids were analyzed by PCA to give an intuitive visualization of the distribution of the 93 oil samples. For the PCA, the quantitative data of the four fatty acids in the 93 oil samples were normalized to mean 0 and variance 1. The top two principal components were extracted to visualize the distribution of the different oil samples, as shown in Figure 4. Three kinds of edible oils (rapeseed oil, peanut oil, and sunflower oil) each formed a completely independent population.
The 13 brands of rapeseed oil were highly consistent and formed a compact category. Although the different brands of peanut oil showed a degree of dispersion, they were clearly distinguishable from the other oils. The sunflower oils also formed a clearly independent category. Three kinds of oils (corn oil, olive oil, and sesame oil) were located in their own separate groups. However, the dispersion of the 15 brands of soybean oil was relatively large, and they overlapped with three oil groups (corn oil, olive oil, and sesame oil). Overall, each kind of oil showed distinct categorical features. This analysis demonstrated that the quantitative analysis of the four fatty acids is effective for distinguishing different kinds of edible oils.
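The PCA used for this visualization can be sketched with a singular value decomposition. The sketch standardizes each fatty acid to mean 0 and variance 1 and returns the scores on the top two principal components; the function name is hypothetical, and the input in the usage lines is random placeholder data with the same shape as the study's table (93 samples by 4 fatty acids), not the measured contents.

```python
import numpy as np

def pca_scores(data, n_components=2):
    """Standardize the columns of data and project onto the top components.

    Returns (scores, loadings, explained_variance_ratio).
    """
    data = np.asarray(data, dtype=float)
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return scores, vt[:n_components], explained[:n_components]

# Placeholder data only: 93 samples x 4 fatty acids drawn at random.
rng = np.random.default_rng(0)
placeholder = rng.normal(size=(93, 4))
scores, loadings, explained = pca_scores(placeholder)
```

Plotting the two columns of `scores` against each other, colored by oil kind, would give a scatter plot of the type described for Figure 4.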

Prediction of Fatty Acid Contents with Full Wavelengths Reflectance Spectra
PLSR, SVM, and RF, coupled with multiple pretreatments of the oil spectra, were used to develop prediction models for the quantitative analysis of fatty acid content in the different edible oils. Each regression method was combined with the four pretreatment algorithms (SNV, MSC, SG smoothing, and WT) and with the raw spectra to train models on the calibration set. The two pretreatment methods that yielded the best performance for each regression model are shown in Table 2, and the parameter settings of the multivariate analysis models are provided in Table S4. As seen from Table 2, the SVM model generally performed better than the PLSR and RF models for the four fatty acids. Therefore, the SVM regression model was suitable for the quantitative analysis of fatty acid content. For palmitic acid, the best prediction result was obtained by the SVM model with MSC pretreatment, which had $R^2_C$ of 0.9972, $\mathrm{RMSE}_C$ of 0.1950, $R^2_P$ of 0.9510, and $\mathrm{RMSE}_P$ of 0.8136. The corresponding scatter plots of prediction performance on the calibration set and prediction set are shown in Figure 5a,b, respectively. The second-best model, SVM with SNV pretreatment, performed very similarly to SVM with MSC pretreatment. For stearic acid, the SVM model combined with SNV pretreatment gave the best prediction, with $R^2_C$ and $\mathrm{RMSE}_C$ of 0.9993 and 0.0404 on the calibration set, and $R^2_P$ and $\mathrm{RMSE}_P$ of 0.9636 and 0.2965 on the prediction set. The prediction performance is shown in Figure 5c,d. For arachidic acid, the highest prediction accuracy was obtained using the SVM model with SNV pretreatment, which had $R^2_C$ = 0.9948 and $\mathrm{RMSE}_C$ = 0.0204 (as shown in Figure 5e), and $R^2_P$ = 0.9576 and $\mathrm{RMSE}_P$ = 0.0577 (as shown in Figure 5f).
In addition, the SVM model with SNV pretreatment provided the optimal prediction result for behenic acid, with $R^2_C$ = 0.9992 and $\mathrm{RMSE}_C$ = 0.0187 (as shown in Figure 5g), and $R^2_P$ = 0.9521 and $\mathrm{RMSE}_P$ = 0.1486 (as shown in Figure 5h). The complete prediction results of all the pretreatment methods are provided in the Supplementary Materials (Table S3). Overall, the regression models constructed with SNV pretreatment generally performed better than those with MSC, WT, or SG smoothing pretreatment, or with the raw spectra. Therefore, SNV pretreatment was adopted as the standard method for preprocessing the original spectral data in the subsequent analysis. It is worth noting that the constructed PLSR models used a large number of latent variables (LVs). In our experiments, we tested different numbers of LVs in PLSR and chose the model with the best prediction performance. However, a large number of LVs could lead to overfitting of PLSR models, so the number of LVs in PLSR should be chosen carefully in each study. The prediction results for the four fatty acids obtained by Vis-NIR reflectance spectroscopy were in good agreement with the quantitative results obtained by GC-MS, while requiring far less measuring time. Thus, this study presents an effective approach for the rapid detection of fatty acids in edible oils. The four fatty acid contents in edible oils were accurately quantified by applying multivariate data processing and analysis methods to Vis-NIR reflectance spectra.
Abbreviations: $R^2_C$: coefficient of determination in calibration set; $\mathrm{RMSE}_C$: root mean square error in calibration set; $R^2_P$: coefficient of determination in prediction set; $\mathrm{RMSE}_P$: root mean square error in prediction set; PLSR: partial least squares regression; RF: random forest; SVM: support vector machine; SNV: standard normal variate; MSC: multiplicative scatter correction; WT: wavelet transform.

Prediction of Fatty Acid Contents with Effective Wavelengths
The high-dimensional Vis-NIR reflectance spectra contain a substantial proportion of redundant and irrelevant information. Extracting effective wavelengths could stabilize the prediction model and improve computational efficiency. To eliminate the redundant information in the spectra and simplify the model, with a view to developing a real-time detection instrument for predicting fatty acid contents in edible oils, the SPA, VIP, and PCA algorithms were used to extract the main information and reduce the dimension of the spectra. Based on the analytical results for the full wavelengths, SNV pretreatment was applied to preprocess the spectral data. The processed spectra were used to develop regression models with RF, SVM, and PLSR. The best models with the SPA, VIP, and PCA algorithms for each of the fatty acids are shown in Table 3, and the parameter settings of the multivariate analysis models are provided in Table S5. The effective wavelengths selected by SPA, VIP, and PCA are shown in Figure 6. The VIP algorithm picked out only a few continuous wavelengths in the visible band in all four fatty acid experiments and gave very poor prediction results (as shown in Table 3), which demonstrated that the visible wavelengths alone were not enough to distinguish the components in edible oils. PCA tended to select wavelengths at the peaks and troughs of the spectra. Similarly, the effective wavelengths selected by PCA did not produce good prediction results for any of the four fatty acids (as shown in Table 3). This indicated that the peaks and troughs of the spectra were not good indicators for discriminating the components of edible oil based on Vis-NIR reflectance spectroscopy. In contrast, the SPA algorithm was able to select a finite number of discrete wavelengths, spanning both the visible and NIR bands, as the effective wavelengths.
Compared with the effective wavelengths selected by VIP and PCA, those selected by SPA yielded regression models with significantly improved prediction of the content of all four fatty acids in edible oils.

Figure 5. The results of palmitic acid on calibration set (a) and prediction set (b) using full wavelengths in the optimal regression model; the prediction results of stearic acid on calibration set (c) and prediction set (d) using full wavelengths in the optimal regression model; the prediction results of arachidic acid on calibration set (e) and prediction set (f) using full wavelengths in the optimal regression model; the prediction results of behenic acid on calibration set (g) and prediction set (h) using full wavelengths in the optimal regression model. ($R^2_C$: coefficient of determination in calibration set; $R^2_P$: coefficient of determination in prediction set; $\mathrm{RMSE}_C$: root mean square error in calibration set; $\mathrm{RMSE}_P$: root mean square error in prediction set).

For behenic acid, the $R^2_C$ and $\mathrm{RMSE}_C$ were 0.9868 and 0.0732, respectively. The $R^2_P$ and $\mathrm{RMSE}_P$ were 0.9214 and 0.1630, respectively. The prediction results are shown in Figure 7g,h. In summary, the prediction results of the simplified regression model constructed with the effective wavelengths showed that the characteristic wavelength selection method SPA was effective for constructing a useful model for the rapid quantitative analysis of the four fatty acids in edible oils. This result is in accordance with previous research findings [39]. In addition, the simplified regression models can also be used to design a fast and portable Vis-NIR reflectance spectroscopy system for detecting the fatty acid contents in edible oils.

Figure 7. The results of palmitic acid on calibration set (a) and prediction set (b) using the selected wavelengths in the optimal regression model; the results of stearic acid on calibration set (c) and prediction set (d) using the selected wavelengths in the optimal regression model; the results of arachidic acid on calibration set (e) and prediction set (f) using the selected wavelengths in the optimal regression model; the results of behenic acid on calibration set (g) and prediction set (h) using the selected wavelengths in the optimal regression model. ($R^2_C$: coefficient of determination in calibration set; $R^2_P$: coefficient of determination in prediction set; $\mathrm{RMSE}_C$: root mean square error in calibration set; $\mathrm{RMSE}_P$: root mean square error in prediction set).

Conclusions
In this study, an efficient and non-destructive method using Vis-NIR spectroscopy was presented for quantifying fatty acids in edible oils. GC-MS was used to determine the content of palmitic acid, stearic acid, arachidic acid, and behenic acid in 93 brands of edible oils as the reference values. The Vis-NIR spectra of the 93 oil samples from different brands were measured with a constructed hyperspectral spectroradiometer system for model establishment and prediction. Overall, the prediction results showed that the SVM regression model with SNV pretreatment on full wavelengths had the best predictive effect on the four fatty acids. With SVM and SNV pretreatment, the model for palmitic acid had $R^2_P$ = 0.9504 and $\mathrm{RMSE}_P$ = 0.8181, the best model for stearic acid had $R^2_P$ = 0.9636 and $\mathrm{RMSE}_P$ = 0.2965, the best model for arachidic acid had $R^2_P$ = 0.9576 and $\mathrm{RMSE}_P$ = 0.0577, and the best model for behenic acid had $R^2_P$ = 0.9521 and $\mathrm{RMSE}_P$ = 0.1486. In addition, three algorithms (SPA, VIP, and PCA) were evaluated systematically for their effectiveness in constructing a simplified but stable model by selecting the effective wavelengths. The VIP algorithm was only able to find continuous wavelengths in the visible region. The PCA algorithm was also not ideal for screening effective wavelengths in the Vis-NIR spectra of edible oils. In contrast, the SPA algorithm extracted effective wavelengths distributed across the visible and NIR bands. The regression model constructed with the effective wavelengths screened by SPA performed well in the prediction of all four fatty acids. Most of the results based on the effective wavelengths showed some degradation compared with the results from full wavelengths. Even so, the loss of prediction accuracy was small and tolerable.
Accordingly, Vis-NIR spectroscopy with multivariate methods enabled the accurate and rapid detection of fatty acid contents in edible oils. The simplified regression models constructed with the effective wavelengths are beneficial for faster and more convenient detection of fatty acids in edible oils. However, the potential reason why the visible spectrum contributes to the prediction of fatty acid contents was not discussed in this paper and should be studied further in the future.

Supplementary Materials:
The following are available at https://www.mdpi.com/article/10.3390/bios11080261/s1, Figure S1: The workflow of the quantitative analysis for fatty acids in oil samples by GC-MS, Figure S2: The representative chromatograms of four fatty acids composition of sesame oil, Figure S3: The statistical tests of four fatty acids in different edible oils, Table S1: Qualitative analysis of four FAMEs by GC-MS, Table S2: The quantitative results of four fatty acids in 93 brands of edible oils by GC-MS, Table S3: Prediction results of four fatty acids in edible oils obtained using full wavelengths, Table S4: Parameter setting of multivariate analysis methods using the full wavelengths, Table S5: Parameter setting of multivariate analysis methods using the effective wavelengths.