3.1. Changes in Phenolic Compounds Content by HPLC Across Various Olive Oil Cultivars, Hydroxytyrosol Supplementation, and Deep-Frying
As shown in Table 3, changes in phenolic compounds were studied across nine EVOO cultivars (Picual, Cornicabra, Empeltre, Arbequina, Hojiblanca, Manzanilla Cacereña, Royuela/Arróniz, Koroneiki, and Arbosana). Three blends were also included: one EVOO mixed with refined olive oil (1° acidity), one pomace olive oil mixed with EVOO, and one virgin olive oil mixed with refined olive oil (0.4° acidity). Hydroxytyrosol supplementation and deep-frying treatments were also included.
Hydroxytyrosol content varied clearly among the cultivars: Royuela showed the highest level (21.43 mg/kg), while Manzanilla showed the lowest (7.28 mg/kg). For tyrosol, Hojiblanca had the highest value (14.86 mg/kg), whereas Royuela showed the lowest (5.87 mg/kg).
As presented in Table 2, the results of the full factorial experimental design demonstrate clear quantitative differences in phenolic compound concentrations among the differently treated samples. Deep-frying significantly affected the levels of specific phenols, with a general trend of degradation observed. Moreover, distinct EVOO cultivars showed notable variations in their phenolic profiles. For example, original samples from the Royuela, Arbosana, and Empeltre cultivars showed high total phenolic contents of 400.63, 393.00, and 337.00 mg/kg, respectively, whereas refined samples such as pomace (orujo) oil and 0.4° olive oil contained significantly lower levels, at 3.89 and 26.50 mg/kg, respectively, confirming the impact of thermal processing, refining, and cultivar differences.
These trends were also observed in all phenolic fractions in EVOO cultivars before deep frying, probably due to differences in environmental conditions, agricultural practices, and the specific cultivar characteristics, which significantly affect phenolic content in terms of both quantity and antioxidant quality.
The same table shows that exogenous hydroxytyrosol supplementation markedly increased the phenolic content of the supplemented oils and Control 2 samples, as well as of the deep-fried samples supplemented with hydroxytyrosol extract, with the largest increases in hydroxytyrosol (HTyr) and tyrosol (Tyr). Both HTyr and Tyr decreased gradually as deep-frying progressed. The extract also played a significant role in improving the stability and antioxidant potential of olive oil before and after deep frying.
3.2. NIR Spectra Interpretation of Oil Samples
In this study, 396 spectra from 132 different olive oil types were recorded using NIR spectroscopy to quantify phenolic compounds, covering hydroxytyrosol supplementation and deep-frying as well as non-fried and non-supplemented oils. The spectra showed prominent peaks at 1204, 1208, 1210, 1212, 1214, and 1216 nm, which are useful for evaluating quality parameters, including free fatty acid content. Additionally, significant absorption was observed at 1388, 1390, 1392, 1394, and 1396 nm; these spectral regions are linked to O-H combination bands and are valuable for quality assessment (Figure 3).
The spectral range of 1350–1570 nm proved particularly effective for distinguishing different olive oils, aiding in authentication and quality control by identifying original olive oil and detecting primary oxidation compounds. Higher absorbance was recorded at 1408, 1410, 1412, 1414, 1416, and 1418 nm, with particularly strong signals at 1414 and 1416 nm, corresponding to overtones of C-H and O-H bonds (Figure 3).
Furthermore, the NIR spectra of various olive oils exhibited significant absorption around 1724 nm due to the first overtone of C-H vibrations. Similarly, absorption at 1760 nm was linked to the first overtone of C-H vibrations and lipid oxidation, enabling the detection of primary oxidation products. Higher absorbance was also detected at 1860 and 1862 nm, particularly at 1860 nm. Additionally, strong absorbance was noted at 1890, 1892, 1894, 1896, 1898, 1900, 1902, and 1904 nm, with a prominent peak at 1900 nm, indicating oxidation and degradation.
High absorbance around 2144 nm corresponded to C=O stretching (carbonyl compounds), indicating the formation of aldehydes and ketones from lipid degradation. Similarly, a distinct absorption peak at approximately 2226 nm was linked to O-H and C-H combination bands, making it valuable for assessing hydrolysis and secondary oxidation. Significant absorption at 2250, 2252, 2254, and particularly 2256 nm varied among the samples; this region is also associated with O-H and C-H combination bands and was useful for distinguishing oxidative stability.
The 1700–2500 nm region was strongly correlated with lipid oxidation and hydrolytic degradation (Figure 1). NIR devices also demonstrated effectiveness in classifying EVOO based on spectral variations. These findings confirm the broad applicability of NIR spectroscopy for ensuring the authenticity, freshness, and overall quality of EVOO [42,43,44].
These findings highlight the importance of selecting stable olive oils for frying and employing NIR spectroscopy for real-time monitoring to ensure the safety and quality of fried extra virgin olive oils, virgin olive oils, and their blends. NIR not only serves as a valuable tool for tracking EVOO oxidation during deep frying, as specific wavelength regions correspond to compounds formed during oil degradation, but it can also be used for rapid quantification of phenolic compounds in olive oils instead of conventional methods like Folin–Ciocalteu and HPLC approaches. This enables effective assessment of oil quality and safety. As shown in Table 4, these critical wavelengths play a key role in monitoring oil degradation and quality indices. Oxidation results in the formation of peroxides, aldehydes, and other degradation products that significantly impact the oil’s quality, taste, and safety during deep frying.
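As an illustration of how the band assignments in this section could be used in practice, the following minimal sketch averages the absorbance of a single spectrum in a narrow window around a few of the bands named above. It is a sketch under stated assumptions, not the procedure used in this study: the wavelength grid (700 points, 2 nm spacing, starting at 1100 nm) is an assumption, since the text states only that 700 variables were recorded, and the names WAVELENGTHS, BANDS, band_mean, and summarize_bands are hypothetical.

```python
import numpy as np

# ASSUMPTION: 700 points at 2 nm spacing starting at 1100 nm (1100 ... 2498 nm).
# The text only states that 700 variables were recorded.
WAVELENGTHS = 1100 + 2 * np.arange(700)

# Band assignments taken from the text (centre wavelength in nm -> description).
BANDS = {
    1724: "first overtone of C-H vibrations",
    1760: "first overtone of C-H vibrations; lipid oxidation",
    2144: "C=O stretching (carbonyl compounds)",
    2226: "O-H and C-H combination bands",
}


def band_mean(spectrum, centre_nm, half_width_nm=4):
    """Mean absorbance within centre_nm +/- half_width_nm on the assumed grid."""
    spectrum = np.asarray(spectrum, dtype=float)
    mask = np.abs(WAVELENGTHS - centre_nm) <= half_width_nm
    if not mask.any():
        raise ValueError(f"{centre_nm} nm is outside the assumed grid")
    return float(spectrum[mask].mean())


def summarize_bands(spectrum):
    """Return {centre_nm: mean absorbance} for every band listed in BANDS."""
    return {nm: band_mean(spectrum, nm) for nm in BANDS}
```

Comparing the output of summarize_bands for a fresh oil and for the same oil after frying would give a quick view of which band windows change most. The window half-width is a free parameter of this sketch.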
3.3. SELECT-OLS Models for Quantification of Phenolic Compounds
The findings reveal that the SELECT approach identified only 12 key near-infrared (NIR) wavelengths out of 700 for quantifying hydroxytyrosol (HTyr) across various olive oil categories (Table 5). Among these, 1962 nm ranked first, followed by 1856 nm. The 1962 nm wavelength exhibited strong absorbance in most EVOO varieties supplemented with HTyr and in non-fried olive oil. Comparing its absorption intensity with pure EVOO spectra can help detect dilution with olive fruit extract-supplemented oils. This suggests that NIR and SELECT can effectively distinguish between supplemented/non-fried olive oil and deep-fried samples.
EVOO samples subjected to continuous deep frying at 210 °C for 6 h showed significantly lower HTyr levels than those fried at 170 °C for 3 h. Higher HTyr content was observed in HTyr-supplemented samples, emphasizing the protective effect of lower frying temperatures against oxidation [51]. Moreover, HTyr in supplemented oils maintained a more stable phenolic composition during prolonged high-temperature frying than other phenolic fractions.
The SELECT algorithm identified distinct wavelengths, highlighting that spectral variations and oxidation features depended on EVOO variety, supplementation, frying conditions (temperature/duration), and HTyr stability. The optimization process achieved remarkable data compression, selecting only 12 wavelengths from 700 predictors to model HTyr content in EVOO, refined VOO, and mixed EVOO, whether supplemented or non-supplemented and whether deep-fried or non-fried.
The high reliability and robustness of the resulting OLS models underscore the effectiveness of SELECT-OLS for feature selection and correction. The progressive reduction in residual variance, tracked from the initial stage through successive decorrelation cycles, led to an optimal model with negligible residual variance (Figure 4A–D), using hydroxytyrosol, tyrosol, caffeic acid, and oleocanthal as response variables, respectively. The SELECT algorithms output the retained original variables, demonstrating their efficiency in refining predictive models [36,37].
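The selection procedure described above (wavelengths chosen in order, residual variance tracked through successive decorrelation cycles, and a final OLS model on the retained original variables) can be illustrated with a minimal sketch. The exact SELECT algorithm is given in refs. [36,37] and is not reproduced here. The code below assumes a generic forward selection in which the wavelength most correlated with the current response residual is chosen, and the remaining wavelengths are then decorrelated from it. The function name select_ols and its parameters max_vars and min_gain are hypothetical.

```python
import numpy as np


def select_ols(X, y, max_vars=12, min_gain=1e-6):
    """Forward selection with decorrelation, followed by an OLS fit.

    ASSUMPTION: a generic stepwise-decorrelation scheme, not necessarily
    identical to the SELECT implementation used in the study.
    Returns (selected column indices, residual-variance trace, OLS coefficients).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xr = X - X.mean(axis=0)            # working predictors, decorrelated each cycle
    r = y - y.mean()                   # working residual of the response
    total = float(r @ r)
    if total == 0.0:
        raise ValueError("response is constant")
    norms0 = np.linalg.norm(Xr, axis=0)
    selected, trace = [], [1.0]        # residual variance as a fraction of the total

    for _ in range(max_vars):
        norms = np.linalg.norm(Xr, axis=0)
        usable = norms > 1e-10 * norms0  # skip constant or fully explained columns
        usable[selected] = False
        if not usable.any():
            break
        score = np.zeros(X.shape[1])
        score[usable] = np.abs(Xr[:, usable].T @ r) / norms[usable]
        j = int(np.argmax(score))
        q = Xr[:, j] / norms[j]        # unit vector of the chosen variable
        if (q @ r) ** 2 / total < min_gain:
            break                      # negligible further reduction
        selected.append(j)
        r = r - (q @ r) * q            # remove the explained part of the response
        Xr = Xr - np.outer(q, q @ Xr)  # decorrelate the remaining predictors
        trace.append(float(r @ r) / total)

    # Final OLS on the ORIGINAL (not decorrelated) selected variables.
    A = np.column_stack([np.ones(len(y)), X[:, selected]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return selected, trace, coef
```

In this sketch, the trace list plays the role of the residual-variance curves across decorrelation cycles, and the order of the selected list corresponds to the order of selection reported for each compound.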
Regarding tyrosol, the findings indicate that the SELECT approach identified 14 key spectral variables out of 700 for predicting tyrosol content across various olive oil categories, considering hydroxytyrosol supplementation and different deep-frying conditions (Table 6). Among these, 1962 nm was most significant, followed by 1856 nm. The results show that the 1962 nm wavelength exhibited high absorbance in most EVOO varieties supplemented with HTyr and in blends of original olive oil with HTyr-supplemented olive oil. This suggests that NIR and SELECT can effectively distinguish between supplemented and non-supplemented olive oils, as well as between non-fried and deep-fried samples.
Furthermore, EVOO samples subjected to continuous deep frying at 210 °C for 6 h exhibited significantly lower tyrosol levels compared with those fried at 170 °C for 3 h, indicating the impact of frying temperature and duration on tyrosol degradation. A similar trend was observed in HTyr-supplemented samples, highlighting the protective effect of HTyr extract in preserving tyrosol content in olive oil. Additionally, pomace olive oil and olive oil with 0.4° acidity contained zero tyrosol, indicating their low phenolic compound content compared with EVOO varieties.
In addition, the SELECT approach identified 15 significant wavelengths out of 700 for quantifying caffeic acid (Table 7), with the 1968 nm wavelength selected as the first-order variable. The 1968 nm wavelength is associated with overtones and combination bands of molecular vibrations, primarily related to C-H, O-H, and aromatic functional groups found in phenolic compounds, including caffeic acid in olive oil. Caffeic acid, a key phenolic compound, contributes to the antioxidant properties, bitterness, and stability of olive oil [52,53]. The 1968 nm absorption can provide insights into phenolic content, oxidation status, and potential degradation due to thermal processing or prolonged storage. Additionally, it helps differentiate natural phenolic profiles across various EVOO varieties and can be used to assess adulteration or dilution with lower-quality oils. Comparing 1968 nm spectral data with reference EVOO spectra enables quality control, authenticity verification, and monitoring of phenolic stability under different processing and storage conditions.
The SELECT procedure efficiently narrowed down the relevant variables, selecting only a subset of wavelengths from the full NIR spectral range. The final model, which uses just these 15 wavelengths, has been optimized for prediction while avoiding overfitting and unnecessary complexity. This procedure provides a more streamlined model with better predictive performance than using the full 700-variable NIR dataset. By focusing on a relatively small number of key wavelengths, the model becomes both efficient and effective for predicting caffeic acid content in various olive oil categories. This method allows rapid, non-destructive analysis of olive oil quality and can be easily implemented in quality control and certification processes within the olive oil industry.
Secoiridoids, though rare in most plants, are abundant in Olea europaea leaves and fruits. However, due to their oil insolubility, only a small portion transfers to EVOO during extraction [54]. Despite this, they are key micronutrients, contributing to EVOO’s sensory properties and health benefits. The most common secoiridoids in EVOO include oleuropein and ligstroside aglycones [7,54,55].
The SELECT approach identified 30 significant wavelengths out of 700 for quantifying decarboxymethyl ligstroside aglycone in dialdehyde form (oleocanthal) (Table 8), with 2084 nm selected first, followed by 1220 nm. At the first-selected wavelength, lower-quality olive oils, such as pomace olive oil and olive oils with 1° and 0.4° acidity, as well as Hojiblanca, exhibited higher absorbance at 2084 nm, indicating increased oxidation and lower stability compared with EVOO varieties like Manzanilla, Picual, Koroneiki, Arbosana, and Royuela. This suggests that EVOOs may differ in stability due to their higher phenolic content, which contributes to both oxidative resistance and unique sensorial attributes, as reported earlier by Mehany et al. [40].
The wavelength around 2084 nm is also associated with –COOR and C–H stretching vibrations, along with C=O stretching, which are sensitive to oxidative changes and degradation in olive oil [20,46]. This further reinforces the role of 2084 nm in detecting oxidative stress and quality degradation, which is particularly relevant when monitoring the stability of olive oils under varying processing conditions.
By incorporating this wavelength into the chemometric model, the SELECT-OLS method enables reliable prediction of oleocanthal in various olive oil types, providing an efficient and accurate way to monitor this compound under different processing conditions, such as supplemented and non-supplemented oils and deep-fried versus non-fried samples. This demonstrates the potential of NIR spectroscopy combined with variable selection for quantifying specific phenolic compounds like oleocanthal in olive oils.
Moreover, the current results demonstrate that the SELECT approach successfully identified 30 spectral markers out of 700 variables for quantifying oleacein content in various olive oil categories (Table 9). The first-order selection was at 2006 nm, followed by 2216 nm. The results indicated that lower-quality olive oils, such as pomace olive oil and olive oils with 1° and 0.4° acidity, as well as Hojiblanca, exhibited higher absorbance at 2006 nm, suggesting increased oxidation and lower stability compared with EVOO varieties like Cornicabra, Picual, Manzanilla, Arbequina, and Royuela. These findings imply that EVOOs may vary in stability due to their higher phenolic content, which contributes to both oxidative resistance and distinctive sensorial attributes. Indeed, oleacein, a key secoiridoid compound in EVOO, has been shown to enhance mitochondrial function by increasing mitochondrial mass, DNA content, respiration, and ATP production in colorectal cancer cells. It triggers a protective cellular response involving antioxidant pathways mediated by AMPK, NRF2, and PGC-1α. Oleacein acts as a partial agonist of PPARγ, a receptor involved in regulating mitochondrial metabolism. Its beneficial effects on mitochondrial pathways and antioxidant defense are significantly mediated through PPARγ activation [56].
The wavelength around 2006 nm is associated with –COOR and C–H stretching vibrations, as well as C=O stretching, which are sensitive to oxidative changes and degradation in olive oil. Additionally, the same trend of high oxidation was observed in oil samples fried at a high temperature (210 °C) for an extended time (6 h). This reinforces the role of 2006 nm in detecting oxidative stress and quality degradation, which is particularly relevant for monitoring the stability of olive oils under different processing conditions.
Moreover, the continuous reduction in residual variance, observed from the initial stage before variable selection through each decorrelation cycle until the optimal model complexity was reached, further validates the method’s effectiveness in minimizing residual variance (Figure 5A–D), using oleacein, homovanillic acid, pinoresinol, and OAOAH as response variables, respectively.
Additionally, SELECT identified 21 key variables out of 700 wavelengths for quantifying homovanillic acid across various olive oil categories (Table 10). Among these, the 1962 nm wavelength was selected first, followed by 1862 nm. The 1962 nm wavelength in NIR is significant due to its association with the O–H (hydroxyl) group and C=O (carbonyl) stretching vibrations, which are sensitive to secondary oxidation products such as aldehydes and ketones. These functional groups are key indicators of oxidative degradation in olive oil, and their presence can be used for assessing the oil’s quality [45]. Thus, the 1962 nm wavelength plays a critical role in monitoring secondary oxidation, especially in detecting aldehydes and ketones that form during the oxidative process. These oxidation products contribute to the rancidity and degradation of olive oils, making this wavelength valuable for quality control.
The 1962 nm wavelength showed strong absorbance, especially in HTyr-supplemented and non-fried EVOO, making it effective for distinguishing between pure and supplemented oils. Variations in absorbance at this wavelength can indicate dilution effects from mixing non-virgin oils with fruit extract, which may affect quality. Comparing absorption intensities across pure, supplemented, fried, and non-fried samples confirmed the utility of NIR spectroscopy combined with SELECT for differentiating treatment conditions. This wavelength is also sensitive to oxidation products, aiding in the detection of oxidative degradation. Most EVOO varieties, including Picual, Cornicabra, Koroneiki, Royuela, Arbequina, and Manzanilla, showed lower oxidation values, reflecting higher stability that is likely due to their rich phenolic profiles and distinct sensory characteristics [40].
Regarding pinoresinol, the SELECT method identified 30 key wavelengths out of 700 for its quantification under varying supplementation and frying conditions. As detailed in Table 11, the most significant wavelengths were 1932 nm and 1922 nm. The 1932 nm wavelength played a dominant role, presumably due to its sensitivity to molecular vibrations associated with phenolic compounds like pinoresinol. The 1922 nm wavelength also contributed by capturing complementary spectral features, refining the model’s accuracy. These wavelengths, along with the others selected, form the foundation of the SELECT-OLS model for pinoresinol prediction, demonstrating high accuracy with minimal input variables. The method’s efficiency is further validated by the consistent decline in residual variance during variable selection and decorrelation cycles, confirming the model’s robustness and utility for precise olive oil quality assessment. In addition, our quantitative analysis (Table 3) confirmed that pinoresinol content was significantly higher in the EVOO compared with the refined and blended olive oils. For instance, pomace (orujo) oil contained no detectable pinoresinol, while olive oil 0.4° showed 2.58 mg/kg. These results align with previous findings by Cecchi et al. [57], who reported that pinoresinol is relatively stable and less susceptible to degradation during the refining process.
From the 700 NIR-recorded variables, the SELECT method identified 30 key wavelengths for quantifying OAOAH content in olive oils under different supplementation and frying conditions (Table 12). The most critical wavelength was 1854 nm, followed by 2028 nm. The 1854 nm wavelength is particularly significant, reflecting molecular interactions linked to oxidative degradation and phenolic content. Its high absorbance in the HTyr-supplemented oils suggests that it detected externally added antioxidants rather than those intrinsic to EVOO. This confirms the capability of NIR combined with SELECT to differentiate between supplemented and non-supplemented oils. The efficiency of this approach is further supported by the continuous reduction in residual variance through the decorrelation cycle. The SELECT-OLS model, using only the most relevant variables, delivers high predictive accuracy while remaining cost-effective and suitable for routine quality assessment. This method is especially valuable for predicting OAOAH levels across extra virgin, virgin, and refined olive oils, serving as a reliable marker of oxidative stability and the oil’s authenticity. Its rapid, non-destructive nature supports online and in-process quality control, with minimal preparation and high accuracy.
Furthermore, from the 700 variables recorded in the NIR system, the SELECT method identified 30 key variables for quantifying LAOAH in olive oils under different supplementation and frying conditions (Table 13). The results showed that 1860 nm was the most significant wavelength, followed by 1110 nm. Particularly high absorbance at 1860 nm was observed in olive oils supplemented with HTyr, suggesting that the extract was added during processing rather than being part of the original EVOO content. This underscores NIR and SELECT’s ability to differentiate between oils supplemented with external antioxidants and those that are not. The method’s efficiency is further validated by the continuous reduction in residual variance throughout the decorrelation cycle. This consistent decrease, observed from the initial stage before variable selection through each successive cycle until optimal model complexity was achieved, demonstrates the method’s effectiveness in minimizing residual variance (Figure 6A,B), using LAOAH and TPC as response variables, respectively.
In the quantification of total phenolic content, the SELECT method identified 18 key wavelengths from the 700 NIR-recorded variables across various olive oil supplementation and frying conditions (Table 14). The most significant wavelength was 1518 nm, followed by 2016 nm. The 1518 nm band is linked to C-H stretching vibrations in lipids and water and is commonly used in food oil analysis. In olive oil, this wavelength provides information about fatty acid composition and oxidative degradation, particularly the presence of oxidized lipids such as aldehydes and ketones. It is also sensitive to hydrogen bonding and can detect changes due to storage or heat exposure. High absorbance at 1518 nm was observed in HTyr-supplemented oils, reflecting their elevated phenolic content and allowing clear differentiation from pure EVOO. This highlights the effectiveness of this wavelength in identifying supplementation effects and oxidative changes. Moreover, pomace olive oil and Hojiblanca, when fried at high temperatures for extended durations, showed the highest levels of aldehydes and ketones, confirming their lower oxidative stability. In contrast, EVOOs, especially those with HTyr supplementation, maintained better stability due to their natural antioxidant content.
3.4. SELECT-OLS Models Validation Assessment
The statistical characteristics of the developed NIR and SELECT-OLS models for quantifying phenolic compounds in various olive oils, including HTyr supplementation and deep frying, are presented in Table 15. The standard deviation of the error (SDE) reflects the average deviation between observed and predicted values in the model, with a lower SDE indicating better model fit and more accurate predictions. For instance, hydroxytyrosol has an SDE of 19.51, attributed to the high HTyr content in the analyzed samples, particularly in olive oils supplemented with olive fruit extract, a rich source of HTyr and its derivatives. In contrast, compounds like homovanillic acid show better precision, with an SDE of 3.89. The mean absolute error (MAE), which measures the average magnitude of prediction errors, similarly indicates better accuracy when smaller. Caffeic acid, with the smallest MAE of 3.19, demonstrated more accurate predictions compared with compounds like LAOAH (MAE = 69.87). This difference is likely due to variations in the content of phenolic compounds and their dynamics under deep-frying conditions. The multiple correlation coefficient (R) assesses the strength of the linear relationship between observed and predicted values. An R value close to 1 indicates a strong correlation; the results for most of the compounds, including hydroxytyrosol, tyrosol, and caffeic acid (R = 0.98), demonstrated excellent prediction accuracy. Pinoresinol, with a slightly lower R value of 0.91, still demonstrated a good prediction model. Leave-one-out residual variance (LOORV) measures the model’s stability when one data point is left out at a time, with lower values indicating better generalizability. Caffeic acid had the lowest LOORV of 4.57%, indicating high stability. Similarly, the leave-one-out residual standard deviation (LOORSD), measured in standard deviation units, was relatively low for caffeic acid at 4.87%, suggesting consistent predictions. The leave-one-out explained variance (LOOEV) shows how much of the data’s total variance the model explains, with most compounds showing high values (above 79%) and hydroxytyrosol reaching 95%.
The leave-one-out mean prediction error (LOOMPE) measures the average error when one data point is excluded, with caffeic acid again performing best with a LOOMPE of 3.64. In conclusion, most phenolic compounds were predicted well by the models, as evidenced by their high correlation coefficients (R = 0.91–0.98) and low prediction errors. These compounds were quantified with high precision and low residual variance, making the models suitable for reliable quantification of phenolic compounds in olive oil. Total phenolic content (TPC), with an R value of 0.96, showed a high explained variance (90.14%) and low mean prediction error, suggesting a strong and accurate model. The models are particularly reliable for hydroxytyrosol, tyrosol, and caffeic acid.
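To make the validation statistics above concrete, the following minimal sketch computes fit and leave-one-out quantities for an OLS model built on the selected wavelengths. The exact formulas behind SDE, LOORV, LOOEV, and LOOMPE are not given in the text, so the definitions in the code are assumptions and are labelled as such. The function name loo_metrics and the dictionary keys are hypothetical. The leave-one-out residuals use the standard OLS identity e_loo = e / (1 − h), where h is the leverage, so no refitting is needed; this assumes the design matrix has full column rank and no observation has leverage equal to 1.

```python
import numpy as np


def loo_metrics(X_sel, y):
    """Fit OLS on the selected wavelengths; return fit and leave-one-out statistics.

    ASSUMPTION: the metric definitions below are common conventions and may
    differ from those used to produce the values reported in the study.
    """
    X_sel = np.asarray(X_sel, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X_sel])  # intercept + predictors
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ coef
    resid = y - fitted

    # Leverages (diagonal of the hat matrix) from a reduced QR factorisation.
    Q, _ = np.linalg.qr(A)
    h = (Q ** 2).sum(axis=1)
    loo_resid = resid / (1.0 - h)                  # exact LOO residuals for OLS

    ss_tot = float(((y - y.mean()) ** 2).sum())
    loo_rv = float((loo_resid ** 2).sum()) / ss_tot
    return {
        "R": float(np.corrcoef(y, fitted)[0, 1]),
        "SDE": float(resid.std(ddof=A.shape[1])),  # residual SD, dof-corrected
        "MAE": float(np.abs(resid).mean()),
        "LOO_residual_variance_pct": 100.0 * loo_rv,
        "LOO_explained_variance_pct": 100.0 * (1.0 - loo_rv),
        "LOO_mean_abs_error": float(np.abs(loo_resid).mean()),
    }
```

Because each leave-one-out residual is at least as large in magnitude as the corresponding fitted residual, a large gap between MAE and the leave-one-out mean absolute error is a simple warning sign of an overfitted calibration.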
Overall, the results demonstrate the robustness and flexibility of SELECT-OLS as an effective feature selection and correction method for quantifying phenolic content across various olive oil types, including both fried and non-fried oils, and those supplemented with exogenous phenolic compounds or without supplementation. By systematically reducing dimensionality while preserving high predictive performance, SELECT-OLS optimizes model complexity, minimizes residual variance, and improves the accuracy of calibration models for phenolic substances in complex oil matrices. The consistency observed across different phenolic compounds further highlights the reliability of SELECT-OLS for spectral data analysis and quantitative modeling.
The goal of this study was achieved by developing regression models to quantify the phenolic content of different olive oil samples based on measured spectral data. In SELECT-OLS, predictions are made by transforming inter-correlated variables into a set of independent factors, known as latent variables (LVs), which capture the maximum covariance between spectral data and response variables such as HTyr, Tyr, caffeic acid, oleocanthal, oleacein, homovanillic acid, pinoresinol, OAOAH, LAOAH, and TPC. Each SELECT-OLS model’s LVs are statistically independent (uncorrelated) and contain all relevant information necessary for stable predictions. As noted in recent studies, only the first few LVs account for the majority of variation in the original variables, while the remaining LVs primarily represent random noise or linear dependencies [58,59]. This also aids in understanding multivariate data analytics algorithms [60,61].