Article

Adulteration Detection of Multi-Species Vegetable Oils in Camellia Oil Using SICRIT-HRMS and Machine Learning Methods

1
Ganzhou General Inspection and Testing Institute, China National Quality and Inspection Center for Se-Rich and Camellia Oleifera Products (Jiangxi), Ganzhou 341000, China
2
School of Public Health and Health Management, Gannan Medical University, Ganzhou 341000, China
3
Key Laboratory of Development and Utilization of Gannan Characteristic Food Function Component of Ganzhou, Gannan Medical University, Ganzhou 341000, China
*
Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Foods 2026, 15(3), 434; https://doi.org/10.3390/foods15030434 (registering DOI)
Submission received: 16 December 2025 / Revised: 15 January 2026 / Accepted: 22 January 2026 / Published: 24 January 2026

Abstract

We aimed to establish a rapid and precise method for identifying and quantifying multi-species vegetable oil (corn oil, olive oil (OLO), soybean oil, and sunflower oil (SUO)) adulteration in camellia oil (CAO), using soft ionization by chemical reaction in transfer–high-resolution mass spectrometry (SICRIT-HRMS) and machine learning methods. The results showed that SICRIT-HRMS could effectively characterize the volatile profiles of pure and adulterated CAO samples, including binary, ternary, quaternary, and quinary adulteration systems. The low m/z region (especially 100–300) was consistently important for oil classification across multiple feature-selection methods. For qualitative detection, binary classification models based on convolutional neural network (CNN), Random Forest (RF), and gradient boosting trees (GBT) algorithms showed high accuracies (98.70–100.00%) for identifying CAO adulteration under no dimensionality reduction (NON), principal component analysis (PCA), and uniform manifold approximation and projection (UMAP) strategies. The RF algorithm exhibited relatively high accuracy (96.25–99.45%) in multiclass classification. Moreover, the five models (CNN, RF, support vector machines (SVM), logistic regression (LR), and GBT) performed differently in distinguishing pure and adulterated CAO. Among 1093 blind oil samples, under NON, PCA, and UMAP, respectively: 10, 5, and 67 samples were misclassified by the CNN model; 6, 7, and 41 samples were misclassified by the RF model; 8, 9, and 82 samples were misclassified by the SVM model; 17, 18, and 78 samples were misclassified by the LR model; and 7, 9, and 43 samples were misclassified by the GBT model.
For quantitative prediction, the PCA-CNN model performed optimally in predicting adulteration levels in CAO, especially with respect to OLO and SUO, exhibiting a high coefficient of determination for calibration ($R_C^2$, 0.9664–0.9974) and coefficient of determination for prediction ($R_p^2$, 0.9599–0.9963) values, low root mean square error of calibration (RMSEC, 0.9–5.3%) and root mean square error of prediction (RMSEP, 1.1–5.8%) values, and RPD (5.0–16.3) values greater than 3.0. These results indicate that SICRIT-HRMS combined with machine learning can rapidly and accurately identify and quantify multi-species vegetable oil adulterations in CAO, which provides a reference for developing non-targeted and high-throughput detection methods in edible oil authenticity.

1. Introduction

Camellia oil (CAO), extracted from the seeds of Camellia oleifera Abel, is rich in unsaturated fatty acids, with a content as high as 85–97% [1]. It is often termed “eastern olive oil” as its fatty acid composition is similar to that of olive oil [2]. It is a high-quality edible oil with high medicinal and nutritional value, as it also contains a multitude of bioactive components such as sterols, squalene, polyphenols, sasanquasaponin, and tocopherols (including γ-tocopherol) [1]. It has a variety of pharmacological effects and biological activities, such as hypolipidemic, neuroprotective, skin repair, anti-obesity, antioxidant, anti-inflammatory, antimicrobial, hepatoprotective, and anti-tumor activities [3]. These properties make CAO popular among consumers and give it a higher market price than other vegetable oils. For economic gain, CAO is often adulterated with cheaper vegetable oils, such as corn oil (COO), sunflower oil (SUO), soybean oil (SOO), and olive oil (OLO) [4]. This practice undermines CAO’s unique quality and nutritional value, and can even destabilize local CAO market economies. Therefore, it is crucial to detect and prevent such adulteration to guarantee the orderly development of the CAO industry and market health.
Various analytical methods have been designed and used to authenticate CAO, most of which are chromatographic [5,6,7,8] and spectroscopic [2,9,10,11,12,13,14,15] techniques and their combinations [16]. In terms of chromatographic techniques, characteristic volatile component gas chromatography–mass spectrometry (GC-MS) fingerprints combined with chemometrics have successfully been used for CAO grading adulteration detection [8]. Detection of CAO adulteration with palm superolein, refined OLO, high oleic-SUO, SUO, COO, rice bran oil, rice oil, peanut oil, sesame oil, SOO, and rapeseed oil has been performed using chemometrics based on fatty acid GC fingerprints and phytosterol GC-MS fingerprints [5]. Multiplex CAO adulteration with COO, SUO, SOO, and rapeseed oil has been identified and quantified based on 11 characteristic lipids using ultra-performance liquid chromatography (UPLC)-Q-Orbitrap-MS [6]. Multiplex CAO adulteration with COO, SUO, SOO, and peanut oil has been identified based on lipidomic fingerprints using laser-assisted rapid evaporative ionization mass spectrometry [7]. Regarding spectroscopic techniques, proton nuclear magnetic resonance [2], near-infrared spectroscopy [9,11,15,17,18], Fourier-transform infrared spectroscopy [12], Raman spectroscopy [10], Surface-enhanced Raman spectroscopy [14], and excitation–emission matrix fluorescence spectroscopy [13] have been used to detect CAO adulteration with COO, SUO, SOO, OLO, rapeseed oil, peanut oil, sesame oil, palm oil, rice oil, linseed oil, and/or maize oil. Although widely used for detecting CAO adulteration, chromatographic and spectroscopic techniques have limitations. For instance, chromatographic techniques have been reported to be tedious and destructive, and they often require complicated sample pretreatment [2,4].
Spectroscopic techniques are expensive and require time-consuming data analysis, generate only spatial information, require a reference technique for the calibration of equipment in every instance, cannot identify the individual chemical properties of the adulterants present, etc. [4,19]. Therefore, further efforts to discover novel techniques for CAO adulteration detection are needed. Yet, current studies mainly focus on binary adulteration detection (CAO and one other oil), and very few methods for detecting multi-species vegetable oil adulterations in CAO have been developed. Since it is more difficult to achieve precise and effective results with multi-species vegetable oil detection relative to binary adulteration detection [15], establishing novel methods for identifying and quantifying the adulteration of multi-species vegetable oils in CAO is of great significance for guaranteeing the orderly development of the CAO industry and market health.
Ambient ionization mass spectrometry (AIMS) has attracted significant attention in the field of food analysis due to its remarkable ability to analyze solid, liquid, and gas samples with simplicity and high throughput [20]. AIMS has been employed in determining food adulteration because it allows for fast and convenient analysis [21]. For example, AIMS with principal component analysis has been employed for the rapid characterization and classification of edible oils, in which non-edible and gutter oils could be clearly distinguished from edible oils [22]. Soft ionization by chemical reaction in transfer (SICRIT) is an ambient ionization technique based on flowing dielectric barrier discharge. It enables direct, high-sensitivity, and wide-coverage mass spectrometric detection of volatile organic compounds and semi-volatile substances without the need for sample pretreatment or the use of solvents or auxiliary gases. High-resolution mass spectrometry (HRMS) is a reliable tool for determining food authenticity, as HRMS detectors offer advantages in terms of their resolving power, sensitivity, robustness, extended dynamic range, easier mass calibration, and tandem mass capabilities [23]. HRMS has been used to detect whether OLO has been adulterated with SUO, SOO, hazelnut oil, and avocado oil [24], as well as with soft-refined oils [25]. Furthermore, the SICRIT-HRMS method might be a greener and faster alternative for food quality assessment relative to conventional chromatographic techniques [26]. Moreover, it has successfully been used for detailed chemical analysis of a fully formulated oil [27]. Therefore, based on the above background, the SICRIT-HRMS method might be a promising new approach for identifying CAO adulteration.
CAO adulteration (especially multi-component adulteration) detection usually involves a time-consuming data-processing stage. Machine learning, an advanced data analysis technique, provides a promising strategy for the automated data analysis of perplexing relationships between various samples [7]. Combinations of machine learning and spectroscopic techniques, including Raman spectroscopy [10], Surface-enhanced Raman spectroscopy [14], near-infrared spectroscopy [11,15,18], and laser-assisted rapid evaporative ionization mass spectrometry [7], have been used to efficiently identify various forms of CAO adulteration. Moreover, deep learning models have outperformed chemometric methods in quantitatively predicting the adulteration level of CAO [10]. Furthermore, SICRIT-HRMS and machine learning have been combined for the assessment of production waste samples from a dismantled synthetic drug laboratory [28]. Thus, combining SICRIT-HRMS and machine learning might be a suitable route for detecting CAO adulteration.
Hence, in this study, combinations of SICRIT-HRMS and machine learning methods were developed for identifying and quantifying multi-species vegetable oil (COO, OLO, SOO, and SUO) adulteration in CAO. A SICRIT-HRMS method combined with a volatile metabolomics approach was employed to capture fingerprints of CAO along with its adulterated oils. Moreover, machine learning models, including convolutional neural networks (CNN), Random Forest (RF), support vector machines (SVM), logistic regression (LR), and gradient boosting trees (GBT) models, were constructed for qualitative and classification detection of adulterated CAO. Furthermore, the CNN model was developed to quantitatively predict adulteration levels in adulterated CAO. This study establishes a non-targeted, rapid, and highly sensitive identification technology for CAO adulteration, which is expected to provide ideas and references for developing the authenticity identification method of CAO.

2. Materials and Methods

2.1. Samples and Preparation

CAO samples (n = 107), originating from the Jiangxi, Guangxi, and Hunan Provinces (three major CAO production regions in China), were obtained via hydraulic cold pressing in our laboratory or purchased from CAO production enterprises. Four other types of commercially available refined vegetable oils, including COO (n = 15, from twelve different brands), OLO (n = 15, from fourteen different brands), SOO (n = 15, from nine different brands), and SUO (n = 15, from twelve different brands), were bought from JD.com. Information regarding the samples of these five vegetable oils is listed in Table S1 (available in the Supplementary Materials).
To establish adaptable models, adulterated CAO samples with more extensive information were prepared. Specifically, thirty CAO samples were selected from among the above-mentioned CAO samples and then mixed to prepare a blended CAO sample. Meanwhile, COO, OLO, SOO, and SUO from different brands were mixed to prepare blended COO, OLO, SOO, and SUO samples, respectively. Finally, these blended CAO, COO, OLO, SOO, and SUO samples were used to prepare binary, ternary, quaternary, and quinary adulteration systems. In the binary adulteration system, the blended COO, OLO, SOO, or SUO samples were added to the blended CAO sample in percentages ranging from 3% to 90% (3%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, and 90%, v/v). In the ternary adulteration system, the blended COO, OLO, SOO, and SUO samples were first mixed in pairs at a volume ratio of 1:1, and then the mixtures were added to the blended CAO sample in the same percentages. For the quaternary adulteration system, combinations of three of the blended COO, OLO, SOO, and SUO samples were first mixed at a volume ratio of 1:1:1, and then the mixtures were added to the blended CAO sample in the same percentages. In the quinary adulteration system, all four blended COO, OLO, SOO, and SUO samples were first mixed together at a volume ratio of 1:1:1:1, and then the mixture was added to the blended CAO sample in the same percentages. Each adulteration level was prepared in triplicate, and the volume of each adulterated CAO sample was 10 mL. In total, 165, 198, 132, and 33 adulterated CAO samples were prepared in the four adulteration systems noted above, respectively. All the samples were subjected to SICRIT-HRMS measurement.
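The blending scheme can be illustrated with a minimal sketch. It assumes only what is stated above (equal-volume mixing of the adulterant oils, eleven adulteration levels, and a 10 mL final volume); the function and variable names are hypothetical and not taken from the study.

```python
from itertools import combinations

ADULTERANTS = ["COO", "OLO", "SOO", "SUO"]
LEVELS = [3, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90]  # adulteration level, % v/v
TOTAL_ML = 10.0  # final volume of each adulterated sample


def adulterant_sets(n_adulterants):
    """All equal-ratio adulterant mixtures built from n_adulterants oils."""
    return list(combinations(ADULTERANTS, n_adulterants))


def composition(oils, level_percent, total_ml=TOTAL_ML):
    """Volume (mL) of each oil in one adulterated CAO sample."""
    adulterant_ml = total_ml * level_percent / 100.0
    volumes = {"CAO": total_ml - adulterant_ml}
    for oil in oils:
        # Equal split of the adulterant volume (1:1, 1:1:1, or 1:1:1:1)
        volumes[oil] = adulterant_ml / len(oils)
    return volumes


# Number of distinct adulterant mixtures in each system
for name, k in [("binary", 1), ("ternary", 2), ("quaternary", 3), ("quinary", 4)]:
    print(name, len(adulterant_sets(k)))

print(composition(("COO", "SUO"), 30))
```

For a ternary sample at the 30% level with COO and SUO, this gives 7.0 mL of CAO and 1.5 mL of each adulterant.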

2.2. SICRIT-HRMS Measurement

Adulterated CAO samples were analyzed in positive-ion mode using an SICRIT-HRMS system consisting of an SICRIT SC-30X ion source (Plasmion GmbH, Augsburg, Germany) coupled to a Thermo Scientific Orbitrap Exploris 120 MS (Waltham, MA, USA), according to the study of Basham et al. [27] with slight modifications. Samples (2.0 mL) were placed in a 5 mL centrifuge tube and incubated at 50 °C for 5 min in a water bath. Then, the centrifuge tube was placed at the inlet end of the SICRIT ion source for 5 s to collect MS data. The operating conditions of the SICRIT ion source were as follows: a collection voltage of 1.5 kV and a frequency of 15 kHz. MS data were acquired in Full MS/data-dependent MS2 (Full MS/dd-MS2) mode over a scan range of m/z 75–1000. The instrument’s resolution was set to 60,000 full width at half-maximum (FWHM) for full MS scans and 15,000 FWHM for MS2 scans. The MS operating conditions were as follows: the spray voltage, sheath gas flow rate, and auxiliary gas flow rate were set to zero, and the capillary temperature was 320 °C. Dynamic exclusion was enabled to avoid redundant MS/MS acquisition.

2.3. Data Preprocessing

Raw MS files were converted into mzXML format using MSConvert from the Proteowizard Suite (proteowizard.sourceforge.net). Subsequent data processing and machine learning model training/testing were performed in Python (version 3.6.8). A schematic workflow of the data-processing pipeline is provided in Figure 1. The raw mass spectrometry data (m/z 75–1000) were first subjected to dynamic binning across 92 predefined intervals based on characteristic lipid compounds in edible oils. Each interval, defined by a start value, an end value, and a step size (e.g., 0.01 Da in the interval 74.99–75.11), was divided into equal sub-bins according to the specified step size. Signal intensities within each sub-bin were summed, yielding a total of 666 binned features. These features were then aggregated along the time axis using five statistical parameters (maximum, mean, median, standard deviation, and sum), resulting in a feature matrix with 666 × 5 = 3330 dimensions. Although SICRIT-HRMS is an ambient ionization technique, a total ion chromatogram (TIC) was still constructed from the time-series scanning data. Eight chromatographic features were extracted from the TIC, including the total peak area, the retention time of the main peak, the signal-to-noise ratio, the number of peaks, and shape descriptors (skewness and kurtosis). The final feature vector was obtained by concatenating the time-aggregated binned features (3330) with the chromatographic features (8), resulting in a total dimension of 3338. After feature augmentation and data shuffling, the dataset was split into training and test sets (8:2 ratio).
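The binning and time-aggregation steps can be sketched as follows. This is an illustration under stated assumptions rather than the study's code: the two intervals are placeholders (the study used 92), the population standard deviation is assumed, and the ordering of the output vector (all maxima, then all means, and so on) is a choice made here.

```python
import statistics

# Hypothetical intervals: (start m/z, end m/z, step in Da)
INTERVALS = [(74.99, 75.11, 0.01), (255.20, 255.26, 0.02)]


def n_sub_bins(start, end, step):
    """Number of equal-width sub-bins in one interval."""
    return max(1, round((end - start) / step))


def bin_scan(peaks, intervals=INTERVALS):
    """Sum the intensities of (mz, intensity) peaks into each sub-bin."""
    features = []
    for start, end, step in intervals:
        bins = [0.0] * n_sub_bins(start, end, step)
        for mz, intensity in peaks:
            if start <= mz < end:
                idx = min(int((mz - start) / step), len(bins) - 1)
                bins[idx] += intensity
        features.extend(bins)
    return features


def aggregate_over_time(scans, intervals=INTERVALS):
    """Collapse per-scan binned features using five statistics per bin."""
    per_scan = [bin_scan(scan, intervals) for scan in scans]
    columns = list(zip(*per_scan))  # one tuple of per-scan values per bin
    vector = []
    for stat in (max, statistics.mean, statistics.median,
                 statistics.pstdev, sum):
        vector.extend(stat(col) for col in columns)
    return vector


# Two made-up scans; the peak at m/z 300.0 falls outside every interval
demo_scans = [
    [(74.995, 10.0), (75.055, 5.0), (300.0, 99.0)],
    [(74.995, 30.0), (255.21, 4.0)],
]
print(len(aggregate_over_time(demo_scans)))  # 15 bins x 5 statistics
```

With 666 sub-bins instead of the 15 used here, the same procedure yields the 3330 time-aggregated features described above.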
Moreover, two distinct dimensionality reduction techniques, principal component analysis (PCA) and uniform manifold approximation and projection (UMAP), were implemented to enhance computational efficiency and optimize feature representation. The resulting reduced-dimension features were systematically stored as independent datasets, thereby establishing three comparative input configurations (original features, PCA-transformed, and UMAP-processed) to rigorously evaluate the influence of different feature representations on model performance.

2.4. Machine Learning Algorithms

CNN, RF, SVM, LR, and GBT models were run in Python (version 3.6.8), and deep model training was carried out using the Python library Keras.

2.4.1. CNN Model

CNN has emerged as a highly effective and popular tool for feature extraction and model building [29]. Convolutional layers, activation layers, pooling layers, and fully connected layers are all essential components of a CNN [30]. In this experiment, the CNN included an input layer, two convolutional layers (64 and 256 filters, with kernel sizes of 5 and 3, respectively), channel attention and spatial attention mechanisms for feature recalibration, a global max pooling layer, and a multilayer perceptron branch with residual connections. The input layer of the CNN was a one-dimensional vector with 3338 data points. After feature extraction through dual-branch interactions and attention-based refinement, the adulterated CAO was identified through hierarchical projection in fully connected layers with softmax activation.

2.4.2. RF Model

An RF is a collection of numerous trained decision trees, generally based on a bagging algorithm [31]. The RF algorithm presents interesting properties, such as a remarkable capacity for handling mixed or greatly unbalanced datasets, flexibility with no formal assumptions regarding data structure, and the ability to address complex nonlinear systems [32]. In this experiment, features underwent preprocessing, including label encoding, data expansion, normalization, and data splitting (test_size = 0.2, random_state = 42), before the RF algorithm was used. Then, an RF model was constructed (n_estimators = 100, random_state = 42), trained, and used for prediction, and the accuracy of the RF model was calculated.

2.4.3. SVM Model

The SVM model can exhibit good performance with relatively low computational overhead, making it a valuable tool in the machine learning toolkit for spectral data analysis [33]. After label encoding, data expansion, normalization, and data splitting, an SVM model was constructed (kernel = ‘linear’, class_weight = ‘balanced’, random_state = 42), trained, and used for prediction, and its performance was evaluated.

2.4.4. LR Model

The LR algorithm is a fundamental supervised learning model used to identify a dependent parameter based on the independent parameters [34]. After label encoding, data expansion, normalization, and data splitting were conducted, an LR model was created, trained, saved, and used for prediction, and its performance was evaluated. Finally, some of the prediction results were output and compared with the test labels.

2.4.5. GBT Model

GBT is an ensemble method that combines many “weak” learners, usually shallow decision trees, into a single “strong” learner [35]. The trees are added sequentially as weighted basis functions, so that multiple weak classifiers together form a strong classifier [36]. After label encoding, data expansion, normalization, and data splitting, a GBT model was constructed (n_estimators = 100, learning_rate = 0.1, max_depth = 3, random_state = 42), trained, and saved. Afterwards, this model was used for prediction, and its accuracy was calculated.
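The idea of building a strong learner from weak ones can be shown with a deliberately simplified sketch. It is not the classifier used in the study: it assumes a one-dimensional input, a squared-error loss, and depth-1 trees (stumps) instead of depth-3 trees, and all names are hypothetical. Only n_estimators = 100 and learning_rate = 0.1 mirror the settings quoted above.

```python
def fit_stump(x, residuals):
    """Find the single threshold on x that best fits the residuals."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # keeps both sides non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        error = (sum((r - left_value) ** 2 for r in left)
                 + sum((r - right_value) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    _, threshold, left_value, right_value = best
    return lambda xi: left_value if xi <= threshold else right_value


def gradient_boost(x, y, n_estimators=100, learning_rate=0.1):
    """Add stumps one at a time, each fitted to the current residuals."""
    base = sum(y) / len(y)
    stumps = []
    predictions = [base] * len(y)
    for _ in range(n_estimators):
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        predictions = [pi + learning_rate * stump(xi)
                       for xi, pi in zip(x, predictions)]
    return lambda xi: base + learning_rate * sum(s(xi) for s in stumps)


# Toy data: the response switches from 0 to 1 between x = 3 and x = 4
model = gradient_boost([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1])
print(round(model(2), 3), round(model(5), 3))
```

Each stump alone is a poor predictor, but because every new stump is fitted to what the previous ones left unexplained, the summed model approaches the training targets.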

2.5. Training and Testing of Machine Learning Models

2.5.1. Qualitative Models

Five classification models (CNN, RF, SVM, LR, and GBT) were constructed. The CNN employed 1D convolutional layers to extract local spectral features, along with global pooling and fully connected layers for probability output. The other four models directly processed the reduced-dimensional features. All the models were trained and evaluated on the same training and testing sets to ensure fair comparison.

2.5.2. Quantitative Models

A hybrid neural network model (Conv–Attention–MLP) was used to predict the percentage compositions of the five types of oils. The model processed the reduced-dimensional features through two parallel branches: a multilayer perceptron (MLP) branch for learning global patterns, and a 1D convolutional branch equipped with attention mechanisms to highlight important local spectral features. Features from both branches were fused and fed into a softmax output layer. The model was trained using the Adam optimizer with a mean squared error (MSE) loss function. To prevent overfitting, an independent validation set (20% of the training data) was used for hyperparameter tuning and early stopping. The reported performance is based on a held-out test set (20% of the total data) that was not used during training or validation.
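One way to see why a softmax output suits composition prediction is that its five outputs are non-negative and sum to one, so they can be read directly as volume fractions. The sketch below illustrates this together with the MSE loss; the logits are made-up numbers standing in for the fused network output, and the helper names are hypothetical.

```python
import math

OILS = ["CAO", "COO", "OLO", "SOO", "SUO"]


def softmax(logits):
    """Numerically stable softmax: non-negative outputs that sum to 1."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mse(predicted, target):
    """Mean squared error between predicted and true fractions."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


logits = [2.0, 0.5, -1.0, -1.0, 0.5]     # made-up network outputs
fractions = softmax(logits)
target = [0.70, 0.15, 0.0, 0.0, 0.15]    # 30% adulteration, COO:SUO = 1:1

print({oil: round(100 * f, 1) for oil, f in zip(OILS, fractions)})
print("MSE:", mse(fractions, target))
```

Because the softmax can never output an exact zero, oils that are absent from a sample are predicted as small positive fractions, which is one source of residual error at low adulteration levels in a model of this form.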

2.6. Classifier and Model Evaluation

2.6.1. Qualitative Model Evaluation

The performance of the classification models was evaluated using a set of standard metrics. For binary classification, four core metrics were calculated, in accordance with the study by Song et al. [7]: accuracy, recall (sensitivity), precision (positive predictive value), and F1-score (the harmonic mean of precision and recall). These were supplemented with the area under the curve (AUC) of the ROC (AUC-ROC) to evaluate model discrimination capability across all decision thresholds. In multiclass classification, class-specific metrics were computed alongside macro-averaged and micro-averaged aggregates. Multiclass AUC values were determined using a one-versus-rest (OvR) approach.
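The metrics named above can be computed from predicted labels and scores as in the following sketch. The labels and scores are invented for illustration, and the function names are hypothetical; the AUC is computed through its rank interpretation (the probability that a positive sample is scored above a negative one), which is equivalent to the area under the ROC curve.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 with `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores, positive=1):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_f1(y_true, y_pred):
    """One-versus-rest F1 per class, averaged with equal class weight."""
    classes = sorted(set(y_true))
    return sum(binary_metrics(y_true, y_pred, positive=c)["f1"]
               for c in classes) / len(classes)


# 1 = adulterated, 0 = pure (made-up labels and scores)
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.6]
print(binary_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))
```

Macro averaging weights every class equally, whereas micro averaging pools the counts over all classes first, so the two diverge when class sizes are unbalanced.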

2.6.2. Quantitative Model Evaluation

All samples were used to construct quantitative models for predicting the concentration levels of CAO, COO, OLO, SOO, and SUO. Based on the study of Song et al. [7], the quantitative models relating oil concentrations to spectral features were evaluated using the coefficient of determination for calibration ($R_C^2$), the root mean square error of calibration (RMSEC), the coefficient of determination for prediction ($R_p^2$), the root mean square error of prediction (RMSEP), and the ratio of performance to deviation (RPD = SD/RMSEP), where SD is the standard deviation of the reference values in the prediction set. The models were trained using Adam optimization with learning rate decay and early stopping, with performance metrics recorded over 10 independent runs to ensure robustness.
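For reference, these quantities can be computed as in the sketch below. Two details are assumptions, since the text does not specify them: the coefficient of determination is taken as 1 − SS_res/SS_tot, and SD is taken as the sample standard deviation of the reference values. Applying the same functions to the calibration set gives $R_C^2$ and RMSEC, and applying them to the prediction set gives $R_p^2$, RMSEP, and RPD.

```python
import math
import statistics


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_y = statistics.mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error, in the units of y (here % v/v)."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def rpd(y_true, y_pred):
    """SD of the reference values divided by the RMSE (RPD = SD / RMSEP)."""
    return statistics.stdev(y_true) / rmse(y_true, y_pred)


# Made-up adulteration levels (% v/v) and predictions
reference = [0, 10, 20, 30, 40]
predicted = [1, 9, 21, 29, 41]
print(r_squared(reference, predicted), rmse(reference, predicted),
      rpd(reference, predicted))
```

In this toy case every prediction is off by one percentage point, so the RMSE is 1.0, the coefficient of determination is 0.995, and the RPD is about 15.8.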

3. Results and Discussion

3.1. SICRIT-HRMS Fingerprint

The representative SICRIT-HRMS fingerprints of the five pure vegetable oils (CAO, COO, OLO, SOO, and SUO) are shown in Figure 2. The volatile and semi-volatile compounds in these oils can largely be found in the relative molecular mass range of 75–700 Da in the mass spectra (Figure 2a–e). Previous studies have shown that the main volatile compounds in CAO are hydrocarbons (10.35%), alcohols (31.71%), aldehydes (44.54%), and esters (3.93%) [37]. The four other vegetable oils (COO, OLO, SOO, and SUO) have also been found to be rich in volatile compounds in varying proportions [38,39]. Although some differences were observed in the mass spectra profiles of these five pure vegetable oils (CAO, COO, OLO, SOO, and SUO), further data mining is still required to improve the classification potential of volatile compounds in them. Notably, the low m/z region (approximately 100–300) was consistently identified as highly important in subsequent feature selection and regression analyses. This region predominantly corresponds to ions from low-molecular-weight volatile and semi-volatile organic compounds, which are key contributors to the aroma, flavor, and oxidative stability profiles of edible oils [37,39]. The distinct composition of CAO—particularly its higher proportion of alcohols and aldehydes such as nonanal and octanal—compared to common adulterants such as SOO and SUO (which generate different oxidation-derived volatiles, e.g., (E,E)-2,4-decadienal) [38], likely generates differentiable spectral patterns within this m/z window. Therefore, the feature importance observed in the 100–300 region is underpinned by tangible chemical differences in volatile composition between CAO and potential adulterants, aligning with information on known authenticity markers [3,4,37,38].
Thus, in response to the demand for MS-based discrimination of adulterated CAO, a comprehensive framework for feature extraction and importance analysis was established in this study. To ensure the robustness and relevance of the extracted features, dynamic binning, temporal feature aggregation, chromatographic behavior extraction, and multi-method feature importance evaluation were integrated in this framework. After dynamic binning processing, the binning results of all intervals were concatenated to form the final feature vector. Moreover, to precisely identify adulterated CAO, MS data-based multi-level feature extraction, combined with multiple feature importance analysis techniques, was adopted to ensure the validity and reliability of the features. After feature extraction, the time-aggregated and chromatographic features were fused to form a final feature vector with a total dimension of 3338. In the feature importance analysis stage, a full-range feature acquisition strategy was adopted: the m/z range from 1 to 1100 was divided into bins with a step size of 0.01, and different importance evaluation methods were utilized for the analysis. For each feature scoring file, only the top five features with the highest scores (corresponding to the m/z values) were retained, resulting in a total of 240 feature intervals being selected. Subsequently, all the feature points were expanded and sorted, and a small window of ±0.03 around each point was employed as the feature interval. Finally, these intervals were merged to form a final bin configuration with 92 settings.
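The last step, turning selected m/z points into a merged bin configuration, amounts to a standard interval merge. The sketch below assumes the ±0.03 window stated above and treats overlapping windows as one interval; the m/z values are invented, and the function name is hypothetical.

```python
def merge_windows(points, half_width=0.03):
    """Expand each m/z point to [p - w, p + w] and merge overlapping windows."""
    windows = sorted((p - half_width, p + half_width) for p in points)
    merged = []
    for start, end in windows:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Made-up top-ranked m/z values from several importance rankings
selected = [75.05, 75.08, 190.16, 256.24, 256.27]
for start, end in merge_windows(selected):
    print(f"{start:.2f}-{end:.2f}")
```

Here five selected points collapse into three intervals because two pairs of windows overlap, which is how 240 selected features can reduce to a smaller number of bin settings.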
Similarly, the SICRIT-HRMS fingerprints of the binary, ternary, quaternary, and quinary adulteration systems were acquired. The representative mass spectra (VB/VA = 50%:50%, with VA and VB representing the volumes of CAO and adulterated oil, respectively) of these systems generated in positive-ion mode via SICRIT-HRMS measurement are shown in Figures S1–S4 (available in the Supplementary Materials).

3.2. Binary Qualitative Modeling for the Identification of Adulterated CAO

The binary qualitative model is effective in identifying whether CAO is adulterated, and is thus suitable for large-scale preliminary screening. Analysis of variance (ANOVA) F-value selection and mutual information ranking have been applied for feature selection in optimizing the classification of pork adulteration in beef [40]. Meanwhile, the Random Forest binary classification method has exhibited considerable superiority in authentication problems relative to partial least squares discriminant analysis [32]. Thus, ANOVA F-value selection, mutual information ranking, and Random Forest importance methods were used to evaluate the m/z features for oil classification, as shown in Figure 3. ANOVA F-value selection (Figure 3a) showed that features in the low m/z region (0–400) had concentrated distributions with prominent maximum values, indicating that these features differ significantly among oil types and might correspond to specific molecular markers. In contrast, the high m/z region (>600) showed lower statistical significance and contributed little to classification. Mutual information ranking (Figure 3b) revealed strong feature–class associations in both the low m/z (<200) and high m/z (800–1000) regions, as evidenced by their elevated maximum values, suggesting that these ranges might contain small and large molecular biomarkers, respectively. The signals at high m/z (800–1000) could be attributed to triglycerides [22]. The intermediate region (m/z 400–600) displayed moderate mean values, indicating weaker associations. Random Forest importance (Figure 3c) showed that the maximum and mean values in the m/z 100–300 range were relatively high, making this range the key source of classification features, whilst the m/z > 500 region displayed limited discriminatory power.
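To make the ANOVA F-value criterion concrete, the sketch below computes the one-way F statistic for a single feature measured in several oil classes: the ratio of between-class to within-class variance. The intensities are invented, the function name is hypothetical, and the sketch does not guard against a zero within-class variance.

```python
import statistics


def anova_f(groups):
    """One-way ANOVA F statistic for one feature across several classes."""
    all_values = [v for group in groups for v in group]
    grand_mean = statistics.mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                     for g in groups)
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Made-up binned intensities of two features in three oil classes
discriminative = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
uninformative = [[1, 5, 3], [2, 4, 6], [3, 1, 5]]
print(anova_f(discriminative), anova_f(uninformative))
```

A feature whose class means differ strongly relative to its within-class scatter receives a high F-value and ranks near the top, which is the behaviour reported for the low m/z region.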
In general, the low m/z region (especially 100–300) was important for oil classification in all three methods, indicating that this region might be the key feature region for identifying oils. The m/z 256 mass chromatogram has been reported to mainly contain C2 chrysenes [41]. The typical ions m/z 190, 203, and 218 have been attributed to δ-amyrin [42]. Mass fragments of m/z 57, 69, 82, and 83 have been identified as straight-chain, branched, or cyclic alkanes [43]. Furthermore, the results obtained using the three methods differed, reflecting distinct analytical perspectives and the inherent characteristics of each statistical method.
Subsequently, the dataset obtained using the binary qualitative model was subjected to dimensionality reduction using PCA and UMAP methods, as shown in Figure 4a,b. PCA reduces the dimensionality of a complex dataset, extracts the most important information according to the spectral features of the tested samples, and identifies outliers in the dataset [44]. In Figure 4a, the adulterated CAO samples (red points) and pure CAO samples (blue points) revealed significant distributional overlap. The pure CAO samples are mainly distributed in regions with low principal component values, whereas the adulterated CAO samples display a stratified distribution along the principal component axes. Obviously, the PCA plot preserved key features while discarding some discriminative information. Thus, it is insufficient for the precise classification of adulterated and pure CAO samples. As PCA does not preserve the local structure, UMAP was developed as an innovative new practical strategy for dimensionality reduction that greatly preserves the local and global structure of the original data [45]. In the UMAP visual plot (Figure 4b), the distributions of the adulterated CAO samples (red points) and pure CAO samples (blue points) show clear cluster structures, and the boundaries for these two types of samples are clear. The pure CAO samples formed relatively independent and discrete clusters, whilst the adulterated CAO samples exhibited a more dispersed distribution. Notably, there are fewer overlapping regions in the UMAP plot than in the PCA plot. Therefore, the UMAP plot was effective in accentuating the feature-space differences between the adulterated and pure CAO samples, thereby allowing for the sample classifications to be intuitively distinguished.
The CNN, RF, SVM, LR, and GBT algorithms have been used to identify oil adulteration [46]. With binary classification, the performances of these five machine learning algorithms in distinguishing pure CAO and adulterated CAO under different dimensionality reduction methods (no dimensionality reduction (NON), PCA, and UMAP) were analyzed. The results are summarized in Table 1. In terms of the NON treatment, the RF, LR, and GBT algorithms achieved perfect classification with respect to the raw data, as they had accuracies of 100.00% and AUC values of 1.0000. These results indicate that the original feature space already possessed strong discriminative power. In contrast, the SVM algorithm performed poorly, with an accuracy of only 86.26%. The CNN algorithm exhibited a minor misclassification with an accuracy of 99.95%; specifically, one pure CAO sample was misclassified as an adulterated CAO sample. Regarding PCA dimensionality reduction, the CNN, RF, and GBT algorithms exhibited accuracies of 100.00%, suggesting that PCA effectively preserved critical linear discriminative information. Moreover, after feature simplification, the computational efficiencies of these three algorithms improved, whilst their overfitting risks were reduced. However, the accuracy of the LR algorithm slightly declined to 98.51%, with 18 false negatives, implying that some subtle nonlinear features were discarded. The accuracy of the SVM algorithm notably decreased to 81.79%, revealing its sensitivity to nonlinear feature loss. For UMAP dimensionality reduction, the RF and GBT algorithms possessed accuracies of 100.00%, F1-scores of 0.9951, and AUC values of 1.0000, demonstrating their strong adaptability to nonlinear low-dimensional spaces. The CNN algorithm performed slightly worse than the RF and GBT algorithms, with an accuracy of 98.70% and a recall of 98.21%. This finding can be explained by the convolutional layers’ localized focus on manifold features.
In contrast, the accuracies of the LR (81.51%) and SVM (52.21%) algorithms significantly decreased. These two algorithms are greatly dependent on global linear decision boundaries and incompatible with the nonlinear manifolds constructed by UMAP, leading to the failure of the decision boundary.
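The accuracy, recall, and F1-score values discussed above follow from the confusion-matrix counts of each model. A minimal sketch of those standard definitions is given below, treating adulterated CAO as the positive class; the counts are hypothetical and are not taken from Table 1.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts (positive class = adulterated CAO), for illustration only.
example = binary_metrics(tp=980, fp=2, fn=18, tn=200)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

In this toy case the false negatives (adulterated samples predicted as pure) lower the recall more than the accuracy, which is why the two metrics are reported side by side.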
Overall, the CNN, RF, and GBT algorithms showed high accuracies under NON, PCA, and UMAP dimensionality reductions. In previous studies, the optimal classification accuracy of fluorescence images paired with a CNN model was lower (94.2%) for identifying adulteration of OLO with other vegetable oils [47]. The classification accuracy of the RF algorithm has been proven to be 100% when using three-dimensional fluorescence spectroscopy and machine learning for the rapid detection of adulteration in CAO [48]. In spectral band selection for the nondestructive detection of edible oil adulteration using hyperspectral imaging and chemometric analysis, the GBT algorithm showed 100% training accuracy and 93% validation accuracy [49]. Meanwhile, the LR algorithm exhibited high accuracy under NON treatment and PCA dimensionality reduction. The LR algorithm has been reported to be able to classify pure oil (njangsa seed oil, palm kernel oil, or coconut oil) with an accuracy of 93% using Fourier-transform infrared spectroscopy under PCA dimensionality reduction [50]. In practical applications, if the data exhibit clear linear separability and efficiency and interpretability are prioritized, the combination of no (or PCA) dimensionality reduction with LR or an RF is preferable. Conversely, if the data demonstrate significant nonlinearity and require high accuracy, approaches such as UMAP dimensionality reduction combined with RF, GBT, or CNN are more suitable, and computational resources and training time should also be considered. If the sample size is large, dimensionality reduction (PCA or UMAP) should be prioritized to simplify computations, and then appropriate algorithms should be selected to balance accuracy and efficiency.

3.3. Multivariate Qualitative Modeling for the Identification of Adulterated CAO

Moreover, the dataset obtained using the multivariate qualitative model was also subjected to dimensionality reduction using the PCA and UMAP methods, as illustrated in Figure 4c,d. As shown in Figure 4c, after PCA dimensionality reduction, the pure oil samples (CAO, COO, OLO, SOO, and SUO) formed independent clusters. The clusters of the binary adulteration systems, including CAO-B1(COO), CAO-B1(OLO), CAO-B1(SOO), and CAO-B1(SUO), were distributed around the pure CAO cluster. The clusters of the ternary adulteration systems, including CAO-B2(COO, OLO), CAO-B2(COO, SOO), CAO-B2(COO, SUO), CAO-B2(OLO, SOO), CAO-B2(OLO, SUO), and CAO-B2(SOO, SUO), were located between the two corresponding pure oil clusters. For example, the cluster of CAO-B2(COO, OLO) was located between the cluster of COO and that of OLO. The clusters of the quaternary adulteration systems, including CAO-B3(COO, OLO, SOO), CAO-B3(COO, OLO, SUO), CAO-B3(COO, SOO, SUO), and CAO-B3(OLO, SOO, SUO), were distributed within the triangular area formed by the three corresponding pure oil clusters. In contrast, all oil samples in the binary, ternary, quaternary, and quinary adulteration systems showed more dispersed distributions after UMAP dimensionality reduction (Figure 4d).
With multiclass classification, the performances of the five machine learning algorithms (CNN, RF, SVM, LR, and GBT) in distinguishing pure CAO and adulterated CAO under NON, PCA, and UMAP dimensionality reduction methods were analyzed. The results are listed in Table 2. In terms of the NON treatment, the LR algorithm possessed the lowest accuracy (98.44%) among the algorithms, and the RF algorithm had the highest accuracy (99.45%). Regarding PCA dimensionality reduction, the LR algorithm also showed the lowest accuracy (98.35%), while the CNN algorithm exhibited the highest accuracy (99.54%). For UMAP dimensionality reduction, the SVM algorithm had the lowest accuracy (92.50%), whereas the RF algorithm revealed the highest accuracy (96.25%). Overall, the RF algorithm maintained relatively high accuracy while the LR algorithm showed relatively low accuracy under NON, PCA, and UMAP dimensionality reduction methods. The RF algorithm previously demonstrated exceptionally high accuracy (100%) by correctly classifying coconut oil and OLO mixed with COO, SOO, and peanut oil [51].

3.4. Data Fusion Combined with Machine Learning Analysis

Misclassification primarily occurs in multi-adulteration scenarios of CAO within the predefined adulteration settings [52]. To evaluate the performance of the developed models (CNN, RF, SVM, LR, and GBT), 1093 blind samples were used for testing. The confusion matrix results are shown in Figure 5, Figure 6, Figure 7, Figure 8 and Figure 9. Regarding the CNN model, 10, 5, and 67 samples were misclassified under NON (Figure 5a), PCA (Figure 5b), and UMAP (Figure 5c) dimensionality reductions, respectively. Under NON dimensionality reduction (Figure 5a), two pure CAO samples were misclassified as adulterated CAO samples, CAO-B2(SOO, SUO) and CAO-B3(COO, OLO, SUO), respectively. One pure COO sample and one pure SUO sample were misclassified as SOO and COO samples, respectively. Meanwhile, five samples in ternary adulteration systems (as CAO-B2(SOO, SUO), CAO-B2(COO, SOO), and CAO-B2(OLO, SUO)) and one sample in quaternary adulteration system (as CAO-B3(OLO, SOO, SUO)) were misclassified. Under PCA dimensionality reduction (Figure 5b), one pure COO sample and one pure SUO sample were also misclassified as SOO and COO samples, respectively. Three samples in ternary adulteration systems (as CAO-B2(SOO, SUO) and CAO-B2(OLO, SUO)) were misclassified. Under UMAP dimensionality reduction (Figure 5c), six pure CAO samples were misclassified as adulterated CAO samples (CAO-B2(SOO, SUO), CAO-B3(OLO, SOO, SUO), and CAO-B3(COO, SOO, SUO)). One pure SOO sample was misclassified as a CAO-B1(OLO) sample. Four pure COO samples were misclassified, two as SOO samples and two as SUO samples. Fifteen samples in binary adulteration systems (as CAO-B1(OLO), CAO-B1(COO), and CAO-B1(SUO)), twenty-four samples in ternary adulteration systems (as CAO-B2(OLO, SOO), CAO-B2(SOO, SUO), CAO-B2(OLO, SUO), and CAO-B2(COO, SUO)), sixteen samples in quaternary adulteration systems (as CAO-B3(COO, OLO, SOO), CAO-B3(OLO, SOO, SUO), CAO-B3(COO, SOO, SUO), and CAO-B3(COO, OLO, SOO)), and one sample in quinary adulteration system (as CAO-B4(COO, OLO, SOO, SUO)) were misclassified.
Regarding the RF model, 6, 7, and 41 samples were misclassified under NON (Figure 6a), PCA (Figure 6b), and UMAP (Figure 6c) dimensionality reductions, respectively. Under NON dimensionality reduction (Figure 6a), one pure COO sample and one pure SUO sample were misclassified as SOO and COO samples, respectively. One sample in binary adulteration system (as CAO-B1(OLO)) and three samples in ternary adulteration systems (as CAO-B2(SOO, SUO) and CAO-B2(OLO, SUO)) were misclassified. Under PCA dimensionality reduction (Figure 6b), one pure COO sample and one pure SUO sample were also misclassified as SOO and COO samples, respectively. One sample in binary adulteration system (as CAO-B1(OLO)) and four samples in ternary adulteration systems (as CAO-B2(SOO, SUO) and CAO-B2(OLO, SUO)) were misclassified. Under UMAP dimensionality reduction (Figure 6c), three pure CAO samples were misclassified as adulterated CAO samples, CAO-B2(SOO, SUO) and CAO-B3(OLO, SOO, SUO). One pure SOO sample was misclassified as CAO-B1(OLO). Two COO samples were misclassified as SOO, and one SUO sample was misclassified as COO. Thirteen samples in binary adulteration systems (as CAO-B1(OLO), CAO-B1(COO), and CAO-B1(SUO)), ten samples in ternary adulteration systems (as CAO-B2(COO, OLO), CAO-B2(COO, SOO), CAO-B2(COO, SUO), CAO-B2(OLO, SOO), CAO-B2(OLO, SUO), and CAO-B2(SOO, SUO)), ten samples in quaternary adulteration systems (as CAO-B3(COO, OLO, SOO), CAO-B3(COO, OLO, SUO), CAO-B3(COO, SOO, SUO), and CAO-B3(OLO, SOO, SUO)) and one sample in quinary adulteration system (as CAO-B4(COO, OLO, SOO, SUO)) were misclassified.
For the SVM model, 8, 9, and 82 samples were misclassified under NON (Figure 7a), PCA (Figure 7b), and UMAP (Figure 7c) dimensionality reductions, respectively. Under NON dimensionality reduction (Figure 7a), one pure COO sample and one pure SUO sample were misclassified as SOO and COO samples, respectively. One sample in binary adulteration system (as CAO-B1(SUO)), four samples in ternary adulteration systems (as CAO-B2(COO, SOO), CAO-B2(OLO, SUO), and CAO-B2(COO, OLO)), and one sample in quinary adulteration system (as CAO-B4(COO, OLO, SOO, SUO)) were misclassified. Under PCA dimensionality reduction (Figure 7b), one pure COO sample and one pure SUO sample were also misclassified as SOO and COO samples, respectively. One sample in binary adulteration system (as CAO-B1(SUO)), five samples in ternary adulteration systems (as CAO-B2(COO, SOO), CAO-B2(SOO, SUO), CAO-B2(OLO, SUO), and CAO-B2(COO, OLO)), and one sample in quinary adulteration system (as CAO-B4(COO, OLO, SOO, SUO)) were misclassified. Under UMAP dimensionality reduction (Figure 7c), six pure CAO samples were misclassified as adulterated CAO samples (CAO-B3(OLO, SOO, SUO), CAO-B3(COO, SOO, SUO), and CAO-B2(SOO, SUO)). One pure SOO sample was misclassified as CAO-B1(OLO). One pure COO sample and one pure SUO sample were also misclassified as SOO and COO samples, respectively. Sixteen samples in binary adulteration systems (as CAO-B1(OLO), CAO-B1(COO), and CAO-B1(SUO)), twenty-four samples in ternary adulteration systems (as CAO-B2(SOO, SUO), CAO-B2(OLO, SUO), and CAO-B2(COO, SUO)), thirty-two samples in quaternary adulteration systems (as CAO-B3(COO, OLO, SOO), CAO-B3(COO, OLO, SUO), CAO-B3(COO, SOO, SUO), and CAO-B3(OLO, SOO, SUO)), and one sample in quinary adulteration system (as CAO-B4(COO, OLO, SOO, SUO)) were misclassified.
Regarding the LR model, 17, 18, and 78 samples were misclassified under NON (Figure 8a), PCA (Figure 8b), and UMAP (Figure 8c) dimensionality reductions, respectively. Under NON dimensionality reduction (Figure 8a), one pure COO sample and one pure SUO sample were misclassified as SOO and COO samples, respectively. Two samples in binary adulteration systems (as CAO-B1(SUO)), six samples in ternary adulteration systems (as CAO-B2(OLO, SOO), CAO-B2(SOO, SUO), CAO-B2(OLO, SUO), CAO-B2(COO, OLO), and CAO-B2(COO, SUO)), five samples in quaternary adulteration systems (as CAO-B3(COO, OLO, SOO), CAO-B3(OLO, SOO, SUO), and CAO-B3(COO, SOO, SUO)), and two samples in quinary adulteration systems (as CAO-B4(COO, OLO, SOO, SUO)) were misclassified. Under PCA dimensionality reduction (Figure 8b), one pure CAO sample was misclassified as an adulterated CAO sample (CAO-B3(COO, OLO, SUO)), and one SUO sample was misclassified as CAO-B1(OLO). One pure COO sample and one pure SUO sample were also misclassified as SOO and COO samples, respectively. Two samples in binary adulteration systems (as CAO-B1(OLO) and CAO-B1(SUO)), six samples in ternary adulteration systems (as CAO-B2(SOO, SUO), CAO-B2(OLO, SUO), CAO-B2(COO, OLO), and CAO-B2(COO, SUO)), four samples in quaternary adulteration systems (as CAO-B3(COO, OLO, SOO), CAO-B3(OLO, SOO, SUO), and CAO-B3(COO, SOO, SUO)), and two samples in quinary adulteration systems (as CAO-B4(COO, OLO, SOO, SUO)) were misclassified. Under UMAP dimensionality reduction (Figure 8c), six pure CAO samples were misclassified as adulterated CAO samples (CAO-B3(OLO, SOO, SUO), CAO-B3(COO, SOO, SUO), and CAO-B2(SOO, SUO)), and one pure SOO sample was misclassified as CAO-B1(OLO). One pure COO sample and one pure SUO sample were also misclassified as SOO and COO samples, respectively. Seventeen samples in binary adulteration systems (as CAO-B1(OLO), CAO-B1(COO), and CAO-B1(SUO)), twenty-three samples in ternary adulteration systems (as CAO-B2(SOO, SUO), CAO-B2(OLO, SUO), and CAO-B2(COO, SUO)), twenty-nine samples in quaternary adulteration systems (as CAO-B3(COO, OLO, SOO), CAO-B3(COO, OLO, SUO), CAO-B3(COO, SOO, SUO), and CAO-B3(OLO, SOO, SUO)), and one sample in quinary adulteration system (as CAO-B4(COO, OLO, SOO, SUO)) were misclassified.
In terms of the GBT model, 7, 9, and 43 samples were misclassified under NON (Figure 9a), PCA (Figure 9b), and UMAP (Figure 9c) dimensionality reductions, respectively. Under NON dimensionality reduction (Figure 9a), one pure COO sample and one pure SUO sample were misclassified as SOO and COO samples, respectively. Four samples in ternary adulteration systems (as CAO-B2(SOO, SUO) and CAO-B2(OLO, SUO)) and one sample in quaternary adulteration system (as CAO-B3(COO, SOO, SUO)) were misclassified. Under PCA dimensionality reduction (Figure 9b), one pure CAO sample was misclassified as CAO-B4(COO, OLO, SOO, SUO). One pure COO sample and one pure SUO sample were also misclassified as SOO and COO samples, respectively. Three samples in ternary adulteration systems (as CAO-B2(SOO, SUO) and CAO-B2(OLO, SUO)) and three samples in quaternary adulteration systems (as CAO-B3(COO, OLO, SOO) and CAO-B3(COO, SOO, SUO)) were misclassified. Under UMAP dimensionality reduction (Figure 9c), three pure CAO samples were misclassified as adulterated CAO samples (CAO-B3(OLO, SOO, SUO) and CAO-B2(SOO, SUO)), and one pure SOO sample was misclassified as adulterated CAO-B1(OLO). Two pure COO samples and one pure SUO sample were misclassified as SOO and COO samples, respectively. Thirteen samples in binary adulteration systems (as CAO-B1(OLO), CAO-B1(COO), and CAO-B1(SUO)), eleven samples in ternary adulteration systems (as CAO-B2(COO, SOO), CAO-B2(SOO, SUO), CAO-B2(OLO, SUO), CAO-B2(COO, OLO), and CAO-B2(COO, SUO)), eleven samples in quaternary adulteration systems (as CAO-B3(COO, OLO, SOO), CAO-B3(COO, OLO, SUO), CAO-B3(COO, SOO, SUO), and CAO-B3(OLO, SOO, SUO)) and one sample in quinary adulteration system (as CAO-B4(COO, OLO, SOO, SUO)) were misclassified.
Overall, the RF model showed the fewest misclassifications (54 samples in total across the NON, PCA, and UMAP conditions), followed by the GBT model (59 samples). Regarding the binary adulteration system, the CNN and GBT models showed the best performances under NON or PCA dimensionality reductions, while the RF and GBT models exhibited the best performances under UMAP dimensionality reduction. For the ternary adulteration system, the RF, CNN/GBT (followed by RF), and RF models presented the best performances under NON, PCA, and UMAP dimensionality reductions, respectively. For the quaternary adulteration system, the RF and SVM models exhibited the best performances under NON dimensionality reduction; the CNN, RF, and SVM models showed the best performances under PCA dimensionality reduction; and the RF model showed the best performance under UMAP dimensionality reduction. For the quinary adulteration system, the CNN, RF, and GBT models showed the best performances under NON and PCA dimensionality reductions, whereas the CNN, RF, SVM, LR, and GBT models exhibited the same performance under UMAP dimensionality reduction. Therefore, the GBT model under NON, PCA, and/or UMAP dimensionality reductions is preferable for a binary adulteration system. Previously, the GBT model was shown to achieve a 100% recognition rate for identifying adulterated safflower seed oil via hyperspectral spectroscopy [53]. The RF model under NON, PCA, and/or UMAP dimensionality reductions can be tentatively recommended for ternary and quaternary adulteration systems. In a previous study, the RF model was found to outperform others in the qualitative detection of CAO adulteration using an electronic nose based on wavelet decomposition humidity correction [54]. The CNN, RF, and GBT models were effective for the quinary adulteration system under NON and PCA dimensionality reductions.
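The per-model totals quoted above can be re-derived from the misclassification counts reported for the 1093 blind samples. The short sketch below tallies them and converts each count into the implied blind-set accuracy; the counts are those stated in the text, while the variable names are chosen only for this example.

```python
BLIND_SAMPLES = 1093

# Misclassified blind samples per model and condition, as reported in the text.
misclassified = {
    "CNN": {"NON": 10, "PCA": 5, "UMAP": 67},
    "RF": {"NON": 6, "PCA": 7, "UMAP": 41},
    "SVM": {"NON": 8, "PCA": 9, "UMAP": 82},
    "LR": {"NON": 17, "PCA": 18, "UMAP": 78},
    "GBT": {"NON": 7, "PCA": 9, "UMAP": 43},
}

# Total errors per model across the three conditions, and the resulting ranking.
totals = {model: sum(counts.values()) for model, counts in misclassified.items()}
ranking = sorted(totals, key=totals.get)

# Implied accuracy (%) on the blind set for every model and condition.
implied_accuracy = {
    model: {cond: round(100 * (1 - n / BLIND_SAMPLES), 2) for cond, n in counts.items()}
    for model, counts in misclassified.items()
}

for model in ranking:
    print(model, totals[model], implied_accuracy[model])
```

The tally gives 54 errors for RF and 59 for GBT, and the implied RF accuracies (99.45% under NON and 96.25% under UMAP) agree with the multiclass accuracies quoted in Section 3.3.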

3.5. Quantitative Modeling for Adulteration Level Prediction of Adulterated CAO

For quantitative analysis of the oils, the ANOVA F-value selection (Figure 10a), mutual information ranking (Figure 10b), and Random Forest importance (Figure 10c) methods were used to score feature importance for regression. In addition, a Pearson’s correlation coefficient analysis (Figure 10d) was performed. As shown in Figure 10a, both CAO (blue) and SOO (orange) demonstrated a pronounced accumulation of high F-value signals within the m/z 200–400 range, indicating that this region is a core zone in which these two oils differ naturally. As illustrated in Figure 10b, all the oils exhibited notably higher M-values within the m/z 200–600 range, implying that this region is a core zone of informative association, in which the spectral features have the best discriminative capacity for oil classification. As shown in Figure 10c, both CAO (blue) and SOO (orange) showed clearly higher R-values within the m/z 200–400 range, suggesting that this region is particularly critical for the authentication and adulteration detection of these two oils. As displayed in Figure 10d, CAO (blue) exhibited consistently positive correlation coefficients (absolute values > 0.4) within the m/z 200–400 range, pointing to a linear relationship between the signal intensity in this region and the CAO content percentage. This quantitative correlation suggests that the spectral features could serve as reliable indicators for CAO quantification in potential adulteration scenarios.
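To make two of these scores concrete, the following sketch computes a one-way ANOVA F-value and a Pearson's correlation coefficient for a single m/z feature using their textbook definitions. The intensities and CAO percentages are invented for illustration, and the helper names are hypothetical; the software used by the authors is not specified in this section.

```python
import math


def anova_f(groups):
    """One-way ANOVA F statistic for a single feature measured in several groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical signal intensities of one m/z feature (illustration only).
pure_cao = [9.8, 10.1, 9.9]
adulterated = [7.9, 8.2, 8.0]
print("F-value:", anova_f([pure_cao, adulterated]))

# Hypothetical CAO content (%) against the same feature's intensity.
cao_percent = [100, 90, 80, 70, 60, 50]
intensity = [9.8, 9.1, 7.9, 7.2, 5.8, 5.1]
print("Pearson r:", pearson_r(cao_percent, intensity))
```

A feature with a large F-value separates the classes well, whereas a feature with a large absolute correlation tracks the CAO percentage roughly linearly, which is the property the regression models rely on.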
CNN models have been widely used to quantitatively predict adulteration levels in camellia-blended oil [48,55,56]. In order to evaluate the generalization ability of the CNN quantitative model, 80% of the samples were randomly selected as the training set, with the remaining 20% serving as the prediction set. Within the training set, 80% and 20% were used for training and validation, respectively. The programs were independently executed in 10 runs under three conditions: NON, PCA, and UMAP dimensionality reductions. The relevant statistical results regarding the CNN model are summarized in Table 3. As shown, the PCA-CNN model performed optimally in the quantitative analysis of adulteration levels of CAO.
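The nested split described above (80% training and 20% prediction, with the training portion further divided 80/20 into training and validation subsets) can be sketched as follows. The sample count, the seeds, and the function name are assumptions for illustration only and do not reproduce the authors' code.

```python
import random


def split_indices(n_samples, test_frac=0.2, val_frac=0.2, seed=0):
    """Shuffle sample indices and split them into train / validation / prediction sets."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_test = round(n_samples * test_frac)
    test, train_full = idx[:n_test], idx[n_test:]
    # The validation subset is carved out of the training portion only.
    n_val = round(len(train_full) * val_frac)
    val, train = train_full[:n_val], train_full[n_val:]
    return train, val, test


# Ten independent runs with different seeds; 1000 is a hypothetical sample count.
for run in range(10):
    train, val, test = split_indices(1000, seed=run)
    print(run, len(train), len(val), len(test))
```

With this scheme, 64% of all samples are used for fitting, 16% for validation, and 20% are held out for prediction in each run.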
The quantitative performance of the models was evaluated using several standard metrics, including $R_C^2$, $R_P^2$, RMSEC, RMSEP, and RPD. $R^2$ values ($R_C^2$ and $R_P^2$) close to 1 indicate that the model explains most of the variance in the data, with a high $R_P^2$ signifying strong generalizability to unseen samples. RMSEC and RMSEP quantify the average prediction error, with lower values denoting higher precision; notably, RMSEP is a more stringent measure of practical utility because it is calculated on an independent test set. According to widely accepted guidelines in chemometrics [57,58], an RPD value greater than 3.0 indicates that a model has excellent predictive ability suitable for practical applications. In this study, all RPD values reported in Table 3 and Table 4 substantially exceed this threshold (ranging from 5.0 to 16.3). This finding confirms that our PCA-CNN model, especially in terms of predicting OLO and SUO adulteration levels, meets the criteria for a robust quantitative predictive model. The high $R_P^2$ values (0.9599–0.9963) and low RMSEP values (1.1–5.8%) further support the model’s accuracy and precision in quantifying adulteration levels. These results demonstrate that the developed approach is not only statistically significant but also practically applicable for quality control purposes.
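For reference, these metrics can be computed from reference and predicted adulteration levels as in the sketch below. RPD is taken here as the standard deviation of the reference values divided by the RMSE, a common chemometric definition; the exact formula used in the study is not stated in this section, and the numbers are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Coefficient of determination, RMSE, and RPD for one set of predictions."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    rmse = math.sqrt(ss_res / n)
    sd = math.sqrt(ss_tot / (n - 1))  # sample standard deviation of the references
    return {"r2": 1 - ss_res / ss_tot, "rmse": rmse, "rpd": sd / rmse}


# Hypothetical adulteration levels (%) and model predictions, for illustration only.
reference = [0, 10, 20, 30, 40, 50]
predicted = [1, 9, 21, 29, 41, 49]
print(regression_metrics(reference, predicted))
```

Applied to the calibration samples, the same function would give the calibration counterparts of these metrics; applied to the held-out prediction samples, it gives the prediction counterparts, and an RPD above 3.0 corresponds to the threshold cited above.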
Based on the results in Table 3, the PCA-CNN model was further employed for the quantitative detection of adulterants (COO, OLO, SOO, and SUO) in CAO. The statistical results are shown in Table 4. Compared to the other models, the PCA-CNN model showed optimal performance in quantitatively detecting OLO in CAO, indicating the prediction accuracy of the developed PCA-CNN model for the adulteration ratio of OLO is extremely high. This superior performance can be attributed to the high compatibility between OLO’s distinctive volatile composition and the feature extraction mechanism of PCA. OLO contains high concentrations of characteristic compounds, such as C6 and C5 derivatives from the lipoxygenase pathway, which are closely associated with its sensory and chemical profiles [59]. These compounds produce strong, concentration-linear signals in the low m/z range (100–400 Da) in mass spectrometry (Figure 10d)—a linear trend effectively captured by PCA. Consequently, the principal components extracted by PCA can accurately reflect the compositional variations between OLO and CAO with minimal feature overlap, thereby enhancing the predictive performance of the PCA-CNN framework. Furthermore, the PCA-CNN model exhibited the second-highest precision for the quantitative detection of SUO in CAO. Similarly, the CNN model has been demonstrated to enhance the accuracy of OLO adulteration detection from spectral data [60]. In the study by Liu et al. [61], the SVM model performed best for predicting adulteration of CAO with SOO, and the RF model was optimal for CAO adulterated with COO, rapeseed oil, or peanut oil when utilizing UV-Vis-NIR spectroscopy combined with feature selection methods.
Regarding the practical detection limits of the proposed method, the lowest adulteration level prepared and tested in this study was 3% (v/v). As demonstrated by the high binary classification accuracies (Table 1) and the low RMSEP values achieved in quantitative prediction (Table 4), the method showed reliable detection and quantification abilities at this level.

4. Conclusions

Developing novel techniques for detecting CAO adulteration is of great significance for the orderly development of the CAO industry and the health of its market. In the present study, a rapid and precise method for identifying and quantifying multi-species vegetable oil adulteration in CAO was established using SICRIT-HRMS and machine learning methods. SICRIT-HRMS fingerprints of CAO, along with its adulterated oils (binary, ternary, quaternary, and quinary adulteration systems), were successfully acquired. In the mass spectra, the low m/z region (especially 100–300) was important for oil classification in the ANOVA F-value selection, mutual information ranking, and Random Forest importance methods. For qualitative detection, binary classification models based on the CNN, RF, and GBT algorithms showed high accuracies for identifying CAO adulteration under NON, PCA, and UMAP dimensionality reductions. UMAP dimensionality reduction was effective in accentuating the feature-space differences between adulterated and pure CAO samples. In multiclass classification, the RF algorithm exhibited relatively high accuracy in distinguishing pure and adulterated CAO under NON, PCA, and UMAP dimensionality reduction methods. The five developed models (CNN, RF, SVM, LR, and GBT) exhibited different performances: (i) the GBT model under NON, PCA, and/or UMAP dimensionality reductions is preferable for binary adulteration systems; (ii) the RF model under NON, PCA, and/or UMAP dimensionality reductions can largely be recommended for ternary and quaternary adulteration systems; (iii) the CNN, RF, and GBT models are effective for quinary adulteration systems under NON and PCA dimensionality reductions. For quantitative prediction, the PCA-CNN model performed optimally in the quantitative analysis of adulteration levels of CAO.
In particular, the PCA-CNN model exhibited optimal performance in the quantitative detection of OLO in CAO, and exhibited the second-highest precision in the quantitative detection of SUO in CAO. In summary, this study presents a non-targeted, efficient, and scalable framework for CAO authentication with multi-species vegetable oils. The findings offer a promising tool for real-world screening and quality control in the edible oil industry. While the models performed well on our current dataset, their broader applicability must be further validated. It should be noted that this method primarily relies on volatile and semi-volatile compound profiles. For oils that have undergone deep refining processes—which may significantly diminish characteristic volatile markers—or those possessing volatile fingerprints highly similar to those of CAO, the detection sensitivity of this approach could be compromised, leading to a potential risk of false negatives. In future work, we will explicitly incorporate oils corresponding to a wider range of geographical origins, cultivars, and processing conditions to rigorously test and enhance the global applicability and robustness of the proposed framework. Furthermore, recognizing that oil adulteration is a complex, multi-component challenge, we plan to extend our analysis to non-volatile constituents and integrate complementary analytical approaches. This integrated strategy is expected to enable a more comprehensive, multifaceted assessment, ultimately strengthening the reliability and applicability of the method for real-world quality control and regulatory screening.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/foods15030434/s1, Table S1: Sample information of the selected vegetable oils. Table S2: Details of all adulterated CAO samples with specific adulteration ratios (V/V, %). Figure S1: Representative mass spectra (VB/VA = 50%:50%) of binary adulteration system generated in positive ion mode by SICRIT-HRMS measurement. Figure S2: Representative mass spectra (VB/VA = 50%:50%) of ternary adulteration system generated in positive ion mode by SICRIT-HRMS measurement. Figure S3: Representative mass spectra (VB/VA = 50%:50%) of quaternary adulteration system generated in positive ion mode by SICRIT-HRMS measurement. Figure S4: Representative mass spectra (VB/VA = 50%:50%) of quinary adulteration system generated in positive ion mode by SICRIT-HRMS measurement.

Author Contributions

Conceptualization, M.W., T.L., Q.Z., and X.-Y.W.; methodology, M.W., T.L., Q.Z., and X.-Y.W.; investigation, M.W., T.L., H.L., X.-B.L., and H.-C.L.; formal analysis, H.L., X.-B.L., and H.-C.L.; writing—original draft preparation, M.W., and T.L.; writing—review and editing, Q.Z., and X.-Y.W.; supervision, Q.Z., and X.-Y.W.; project administration, M.W., and T.L.; funding acquisition, M.W., Q.Z., and X.-Y.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was financially supported by the Science and Technology Program Project of the State Administration for Market Regulation (2023MK073); the Science and Technology Innovation Program from Forestry Administration of Jiangxi Province (YCYJZX[2023]341); the Provincial Key R&D Program of Jiangxi (20233BBF64002); and the University-Level Scientific Research Projects of Gannan Medical University (QD201913, QD202128, TD202406-2, TD202313, TD202313-1).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding authors.

Conflicts of Interest

The authors declare that they have no known competing financial interests.

References

  1. Zou, Q.; Chen, A.Q.; Huang, J.; Wang, M.; Luo, J.H.; Wang, A.; Wang, X.Y. Edible plant oils modulate gut microbiota during their health-promoting effects: A review. Front. Nutr. 2024, 11, 1473648.
  2. Shi, T.; Zhu, M.T.; Chen, Y.; Yan, X.L.; Chen, Q.; Wu, X.L.; Lin, J.N.; Xie, M.Y. 1H NMR combined with chemometrics for the rapid detection of adulteration in camellia oils. Food Chem. 2018, 242, 308–315.
  3. Gao, L.; Jin, L.; Liu, Q.; Zhao, K.; Lin, L.; Zheng, J.; Li, C.; Chen, B.; Shen, Y. Recent advances in the extraction, composition analysis and bioactivity of Camellia (Camellia oleifera Abel.) oil. Trends Food Sci. Technol. 2024, 143, 104211.
  4. Shi, T.; Wu, G.; Jin, Q.Z.; Wang, X.G. Camellia oil authentication: A comparative analysis and recent analytical techniques developed for its assessment. A review. Trends Food Sci. Technol. 2020, 97, 88–99.
  5. Shi, T.; Wu, G.C.; Jin, Q.Z.; Wang, X.G. Detection of camellia oil adulteration using chemometrics based on fatty acids GC fingerprints and phytosterols GC–MS fingerprints. Food Chem. 2021, 352, 129422.
  6. Yang, X.M.; Zhang, M.J.; Koidis, A.; Liu, X.D.; Guo, C.Z.; Xu, Z.L.; Wei, X.Q.; Lei, H.T. Identification and quantitation of multiplex camellia oil adulteration based on 11 characteristic lipids using UPLC-Q-Orbitrap-MS. Food Chem. 2025, 468, 142370.
  7. Song, G.S.; Xiang, T.J.; Xu, Z.M.; Hou, H.N.; Ge, Y.C.; Lai, H.N.; Wang, D.L.; Yuan, T.L.; Li, L.; Wang, Z.Y.; et al. Rapid identification of multiplex camellia oil adulteration based on lipidomic fingerprint using laser assisted rapid evaporative ionization mass spectrometry and data fusion combined with machine learning. LWT 2025, 228, 118078.
  8. Shi, T.; Dai, T.H.; Wu, G.C.; Jin, Q.Z.; Wang, X.G. Camellia oil grading adulteration detection using characteristic volatile components GC-MS fingerprints combined with chemometrics. Food Control 2025, 169, 111033.
  9. Du, Q.W.; Zhu, M.T.; Shi, T.; Luo, X.; Gan, B.; Tang, L.J.; Chen, Y. Adulteration detection of corn oil, rapeseed oil and sunflower oil in camellia oil by in situ diffuse reflectance near-infrared spectroscopy and chemometrics. Food Control 2021, 121, 107577.
  10. Wang, J.H.; Qian, J.J.; Xu, M.T.; Ding, J.Y.; Yue, Z.; Ding, J.L.; Zhang, Y.; Dai, H.; Liu, X.D.; Pi, F.W. Adulteration detection of multi-species vegetable oils in camellia oil using Raman spectroscopy: Comparison of chemometrics and deep learning methods. Food Chem. 2025, 463, 141314.
  11. Deng, Z.W.; Zheng, Y.; Lan, T.; Zhang, L.X.; Yun, Y.H.; Song, W. Detection of camellia oil adulteration based on near-infrared spectroscopy and smartphone combined with deep learning and multimodal fusion. Food Chem. 2025, 472, 142930.
  12. Ye, Q.; Meng, X.H. Highly efficient authentication of edible oils by FTIR spectroscopy coupled with chemometrics. Food Chem. 2022, 385, 132661.
  13. Wang, T.; Wu, H.L.; Long, W.J.; Hu, Y.; Cheng, L.; Chen, A.Q.; Yu, R.Q. Rapid identification and quantification of cheaper vegetable oil adulteration in camellia oil by using excitation-emission matrix fluorescence spectroscopy combined with chemometrics. Food Chem. 2019, 293, 348–357.
  14. Xu, P.P.; Nie, Q.L.; Huang, R.B.; Shi, J.; Ren, J.J.; You, R.Y.; Wang, H.F.; Yang, Y.; Lu, Y.D. A fast and highly efficient strategy for detection of camellia oil adulteration using machine learning assisted SERS. LWT 2024, 213, 117069.
  15. Zhao, J.; Wang, R.N.; Zhang, Z.Q.; Yu, Y.; Ren, Z.Y.; Huang, Y.; Li, Z.M. Quantitative analysis of multi-component adulteration in camellia oil by near-infrared spectroscopy combined with long short-term memory neural networks algorithm. J. Food Compos. Anal. 2025, 148, 108359.
  16. Hu, B.K.; Zhang, D.Y.; Geng, Y.Y.; Zhang, S.X.; Liu, Y.N.; Wang, J.H. Chemometrics analysis of camellia oil authenticity using LF NMR and fatty acid GC fingerprints. J. Food Compos. Anal. 2024, 133, 106447.
  17. Wang, R.N.; Fang, Y.; Luo, W.F.; Chen, M.T.; Li, Z.M.; Yu, Y.; Ren, Z.Y.; Huang, Y.; Dong, H. Quantitative analysis of camellia oil binary adulteration using near infrared spectroscopy combined with chemometrics. Microchem. J. 2025, 217, 115018.
  18. Wang, X.R.; Wei, C.J.; Wang, W.; Wang, D.; Liu, Y.; Jia, B.B.; Jiao, Y.N. Identification of camellia oil adulteration by using near infrared spectroscopy combined with two dimensional correlation spectroscopy (2DCOS) analysis. Infrared Phys. Technol. 2025, 149, 105902.
  19. Rifna, E.J.; Pandiselvam, R.; Kothakota, A.; Rao, K.V.S.; Dwivedi, M.; Kumar, M.; Thirumdas, R.; Ramesh, S.V. Advanced process analytical tools for identification of adulterants in edible oils—A review. Food Chem. 2022, 369, 130898.
  20. Lv, Y.G.; Zhao, J.Y.; Xue, H.Y.; Ma, Q. Ambient ionization mass spectrometry for food analysis: Recent progress and applications. TrAC Trends Anal. Chem. 2024, 178, 117814.
  21. Dou, X.J.; Zhang, L.X.; Yang, R.N.; Wang, X.; Yu, L.; Yue, X.F.; Ma, F.; Mao, J.; Wang, X.P.; Zhang, W.; et al. Mass spectrometry in food authentication and origin traceability. Mass Spectrom. Rev. 2023, 42, 1772–1807.
  22. Lee, C.Y.; Su, H.; Chang, T.H.; Ponnusamy, V.K.; Sun, W.J.; Shiea, J. Rapid characterization and classification of edible oils with ambient ionization mass spectrometry combined with principal component analysis. J. Food Compos. Anal. 2025, 140, 107256.
  23. Rubert, J.; Zachariasova, M.; Hajslova, J. Advances in high-resolution mass spectrometry based on metabolomics studies for food—A review. Food Addit. Contam. A 2015, 32, 1685–1708.
  24. Quintanilla-Casas, B.Q.; Strocchi, G.; Bustamante, J.B.; Torres-Cobos, B.; Guardiola, F.I.; Moreda, W.; Martínez-Rivas, J.M.; Valli, E.; Bendini, A.; Toschi, T.G.; et al. Large-scale evaluation of shotgun triacylglycerol profiling for the fast detection of olive oil adulteration. Food Control 2021, 123, 107851.
  25. Cavanna, D.; Hurkova, K.; Džuman, Z.; Serani, A.; Serani, M.; Dall’Asta, C.; Tomaniova, M.; Hajslova, J.; Suman, M. A Non-Targeted High-Resolution Mass Spectrometry Study for Extra Virgin Olive Oil Adulteration with Soft Refined Oils: Preliminary Findings from Two Different Laboratories. ACS Omega 2020, 5, 24169–24178.
  26. Ju, Z.S.; Yang, N.; Guo, C.T.; Zhang, Q.; Chen, G.; Zhang, Q.Q.; Yu, J.J.; Zhang, H.Y.; Jiang, Y.Q.; Zhang, X.Y.; et al. Rapid and Eco-Friendly Quality Grading of Sauce-Aroma Baijiu Using Soft Ionization by Chemical Reaction in Transfer-Quadrupole Orbitrap HRMS Fingerprinting. ACS Food Sci. Technol. 2025, 5, 3293–3306. [Google Scholar] [CrossRef]
  27. Basham, V.; Hancock, T.; McKendrick, J.; Tessarolo, N.; Wicking, C. Detailed chemical analysis of a fully formulated oil using dielectric barrier discharge ionisation–mass spectrometry. Rapid Commun. Mass Spectrom. 2022, 36, e9320. [Google Scholar] [CrossRef]
  28. Greif, M.; Frömel, T.; Knepper, T.P.; Huhn, C.; Wagner, S.; Pütz, M. Rapid Assessment of Samples from Large-Scale Clandestine Synthetic Drug Laboratories by Soft Ionization by Chemical Reaction in Transfer–High-Resolution Mass Spectrometry. J. Am. Soc. Mass Spectrom. 2025, 36, 1254–1263. [Google Scholar] [CrossRef]
  29. Liu, Y.; Pu, H.B.; Sun, D.W. Efficient extraction of deep image features using convolutional neural network (CNN) for applications in detecting and analysing complex food matrices. Trends Food Sci. Tech. 2021, 113, 193–204. [Google Scholar] [CrossRef]
  30. Xue, Y.C.; Jiang, H. Monitoring of chlorpyrifos residues in corn oil based on Raman spectral deep-learning model. Foods 2023, 12, 2402. [Google Scholar] [CrossRef]
  31. Tian, H.X.; Wu, D.; Chen, B.; Yuan, H.B.; Yu, H.Y.; Lou, X.M.; Chen, C. Rapid identification and quantification of vegetable oil adulteration in raw milk using a flash gas chromatography electronic nose combined with machine learning. Food Control 2023, 150, 109758. [Google Scholar] [CrossRef]
  32. de Santana, F.B.; Borges Neto, W.; Poppi, R.J. Random forest as one-class classifier and infrared spectroscopy for food adulteration detection. Food Chem. 2019, 293, 323–332. [Google Scholar] [CrossRef] [PubMed]
  33. Bavali, A.; Rahmatpanahi, A.; Chegini, R.M. Quantitative detection of adulteration in avocado oil using laser-induced fluorescence and machine learning models. Microchem. J. 2025, 211, 113080. [Google Scholar] [CrossRef]
  34. Esmi, F.; Dalai, A.K.; Hu, Y.F. Comparison of various machine learning techniques for modeling the heterogeneous acid-catalyzed alcoholysis process of biodiesel production from green seed canola oil. Energy Rep. 2024, 12, 321–328. [Google Scholar] [CrossRef]
  35. Cucos, A.M.; Iantovics, L.B. Comparative study of random forest, gradient boosted trees, feedforward neural networks and convolutional neural networks using fingerprints and molecular descriptors for adverse drug reaction prediction. Procedia Comput. Sci. 2024, 246, 1895–1904. [Google Scholar] [CrossRef]
  36. Zhang, S.Z.; Hu, Y.R.; Sun, X.R.; Liu, C.L.; Yan, S.N.; Jiang, C.Z.; Zhou, X.P.; Liu, X.C.; Zhao, K. Identification and discrimination of olive oil adulteration by oblique-incidence reflectivity difference method. J. Food Compos. Anal. 2025, 144, 107692. [Google Scholar] [CrossRef]
  37. Yang, J.Y.; Wang, M.; Zou, X.G.; Peng, B.; Yin, Y.L.; Deng, Z.Y. A Novel Aqueous Extraction for Camellia Oil by Emulsified Oil: A Frozen/Thawed Method. Eur. J. Lipid Sci. Technol. 2019, 121, 1800431. [Google Scholar] [CrossRef]
  38. Yang, H.D.; Dong, Y.; Wang, D.Y.; Wang, X.D. Differences in Oxidative Stability, Sensory Properties, and Volatile Compounds of Pepper Aromatized Sunflower Oils Prepared by Different Methods during Accelerated Storage. Eur. J. Lipid Sci. Technol. 2023, 125, 2200099. [Google Scholar] [CrossRef]
  39. Liu, X.F.; Wang, S.; Tamogami, S.; Chen, J.Y.; Zhang, H. Volatile Profile and Flavor Characteristics of Ten Edible Oils. Anal. Lett. 2021, 54, 1423–1438. [Google Scholar] [CrossRef]
  40. Malikhah, M.; Sarno, R.; Sabilla, S.I. Ensemble Learning for Optimizing Classification of Pork Adulteration in Beef Based on Electronic Nose Dataset. Int. J. Intell. Eng. Syst. 2021, 14, 44–55. [Google Scholar] [CrossRef]
  41. Sun, P.Y.; Bao, K.W.; Li, H.S.; Li, F.J.; Wang, X.P.; Cao, L.X.; Li, G.M.; Zhou, Q.; Tang, H.X.; Bao, M.T. An efficient classification method for fuel and crude oil types based on m/z 256 mass chromatography by COW-PCA-LDA. Fuel 2018, 222, 416–423. [Google Scholar] [CrossRef]
  42. Damirchi, S.A.; Savage, G.P.; Dutta, P.C. Sterol fractions in hazelnut and virgin olive oils and 4,4′-dimethylsterols as possible markers for detection of adulteration of virgin olive oil. J. Am. Oil Chem. Soc. 2005, 82, 717–725. [Google Scholar] [CrossRef]
  43. Tang, R.Z.; Song, K.; Gong, Y.Z.; Sheng, D.Z.; Zhang, Y.; Li, A.; Yan, S.Y.; Yan, S.C.; Zhang, J.S.; Tan, Y.; et al. Detailed speciation of semi-volatile and intermediate-volatility organic compounds (S/IVOCs) in marine fuel oils using GC × GC-ms. Int. J. Environ. Res. Public Health 2023, 20, 2508. [Google Scholar] [CrossRef] [PubMed]
  44. Wang, Z.Z.; Wu, Q.Y.; Kamruzzaman, M. Portable NIR spectroscopy and PLS based variable selection for adulteration detection in quinoa flour. Food Control 2022, 138, 108970. [Google Scholar] [CrossRef]
  45. McInnes, L.; Healy, J.; Melville, J. Umap: Uniform manifold approximation and projection for dimension reduction. arXiv 2018, arXiv:1802.03426. [Google Scholar] [CrossRef]
  46. Aqeel, M.; Sohaib, A.; Iqbal, M.; Rehman, H.U.; Rustam, F. Hyperspectral identification of oil adulteration using machine learning techniques. Curr. Res. Food Sci. 2024, 8, 100773. [Google Scholar] [CrossRef]
  47. Jiao, Z.B.; Song, L.F.; Zhang, Y.L.; Dai, J.W.; Liu, Y.W.; Zhang, Q.; Qin, W.; Yan, J. A comparative study of fluorescence hyperspectral imaging and FTIR spectroscopy combined with chemometrics for the detection of extra virgin olive oil adulteration. J. Food Meas. Charact. 2025, 19, 1761–1776. [Google Scholar] [CrossRef]
  48. Hu, Y.T.; Wei, C.J.; Wang, X.R.; Wang, W.; Jiao, Y.N. Using three-dimensional fluorescence spectroscopy and machine learning for rapid detection of adulteration in camellia oil. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2025, 329, 125524. [Google Scholar] [CrossRef]
  49. Aqeel, M.; Munawar, H.; Sohaib, A.; Khan, K.B.; Deng, Y.M. Spectral band selection for nondestructive detection of edible oil adulteration using hyperspectral imaging and chemometric analysis. J. Food Meas. Charact. 2025. [Google Scholar] [CrossRef]
  50. Tachie, C.Y.E.; Obiri-Ananey, D.; Alfaro-Cordoba, M.; Tawiah, N.A.; Aryee, A.N.A. Classification of oils and margarines by FTIR spectroscopy in tandem with machine learning. Food Chem. 2024, 431, 137077. [Google Scholar] [CrossRef]
  51. Zhao, Y.M.; Wang, Z.Y.; Liu, Z.; Xue, M.M.; Yuan, Y.Z.; Shi, H.W. Machine learning-assisted classification and adulteration detection of fatty oils using fatty acid profiles obtained via supercritical fluid chromatography. J. Pharmaceut. Biomed. 2025, 265, 116993. [Google Scholar] [CrossRef]
  52. Wei, C.J.; Wang, W.; Jiao, Y.N.; Yoon, S.C.; Ni, X.Z.; Wang, X.R.; Song, Z.W. Identification of Camellia Oil Adulteration with Excitation-Emission Matrix Fluorescence Spectra and Deep Learning. J. Fluoresc. 2025, 35, 9175–9188. [Google Scholar] [CrossRef]
  53. Zou, Z.Y.; Long, T.; Chen, J.; Wang, L.; Wu, X.W.; Zou, B.; Xu, L.J. Rapid identification of adulterated safflower seed oil by use of hyperspectral spectroscopy. Spectrosc. Lett. 2021, 54, 675–684. [Google Scholar] [CrossRef]
  54. Li, D.P.; Jiang, H.; Yang, G.; Gong, Z.L.; Wen, T. Qualitative and quantitative detection of camellia oil adulteration using electronic nose based on wavelet decomposition humidity correction. LWT 2024, 210, 116822. [Google Scholar] [CrossRef]
  55. Chen, A.Q.; Wu, H.L.; Wang, T.; Wang, X.Z.; Sun, H.B.; Yu, R.Q. Intelligent analysis of excitation-emission matrix fluorescence fingerprint to identify and quantify adulteration in camellia oil based on machine learning. Talanta 2023, 251, 123733. [Google Scholar] [CrossRef] [PubMed]
  56. Liu, H.; Ma, S.A.; Liang, N.; Wang, X. Quantitatively Detecting Camellia Oil Products Adulterated by Rice Bran Oil and Corn Oil Using Raman Spectroscopy: A Comparative Study Between Models Utilizing Machine Learning Algorithms and Chemometric Algorithms. Foods 2024, 13, 4182. [Google Scholar] [CrossRef] [PubMed]
  57. Liang, S.X.; Chen, G.Q.; Ma, C.Q.; Zhu, C.; Li, L.; Gao, H.; Yang, T.Q. Quantitative determination of acid value in palm oil during thermal oxidation using Raman spectroscopy combined with deep learning models. Food Chem. 2025, 474, 143107. [Google Scholar] [CrossRef]
  58. Moraes, I.A.; Junior, S.B.; Villa, J.E.L.; Cunha, R.L.; Barbin, D.F. Predicting oleogels properties using non-invasive spectroscopic techniques and machine learning. Food Res. Int. 2025, 207, 116044. [Google Scholar] [CrossRef]
  59. Quintanilla-Casas, B.; Bustamante, J.; Guardiola, F.; García-González, D.L.; Barbieri, S.; Bendini, A.; Toschi, T.G.; Vichi, S.; Tres, A. Virgin olive oil volatile fingerprint and chemometrics: Towards an instrumental screening tool to grade the sensory quality. LWT 2020, 121, 108936. [Google Scholar] [CrossRef]
  60. Bandiera, A.; Camerlingo, A.; Sanna, N.; Zazza, C.; Benelli, A.; Massantini, R.; Moscetti, R. Comparing deep and classical Chemometrics: Can CNN enhance the accuracy of EVOO adulteration detection from spectral data? Food Control 2026, 179, 111608. [Google Scholar] [CrossRef]
  61. Liu, Q.; Gong, Z.L.; Li, D.P.; Wen, T.; Guan, J.W.; Zheng, W.F. Rapid and Low-Cost Quantification of Adulteration Content in Camellia Oil Utilizing UV-Vis-NIR Spectroscopy Combined with Feature Selection Methods. Molecules 2023, 28, 5943. [Google Scholar] [CrossRef]
Figure 1. Schematic workflow of the SICRIT-MS data processing pipeline.
Figure 2. Mass spectra of CAO (a), COO (b), OLO (c), SOO (d), and SUO (e) generated in positive-ion mode by SICRIT-HRMS measurement. CAO—camellia oil; COO—corn oil; OLO—olive oil; SOO—soybean oil; SUO—sunflower oil.
Figure 3. The feature importance score of classification calculated using ANOVA F-value selection (a), mutual information ranking (b), and Random Forest importance (c). Red points indicate selected feature points.
Figure 4. Visualization of PCA and UMAP dimensionality reduction techniques. (a,b) Binary classification; (c,d) multiclass classification. CAO—camellia oil; COO—corn oil; OLO—olive oil; SOO—soybean oil; SUO—sunflower oil; PCA—principal component analysis; UMAP—uniform manifold approximation and projection.
Figure 5. Confusion matrix results for the test set of CNN model with NON (a), PCA dimensionality reduction (b), and UMAP dimensionality reduction (c). CAO—camellia oil; COO—corn oil; OLO—olive oil; SOO—soybean oil; SUO—sunflower oil; PCA—principal component analysis; UMAP—uniform manifold approximation and projection.
Figure 6. Confusion matrix results for the test set of RF model with NON (a), PCA dimensionality reduction (b), and UMAP dimensionality reduction (c). CAO—camellia oil; COO—corn oil; OLO—olive oil; SOO—soybean oil; SUO—sunflower oil; PCA—principal component analysis; UMAP—uniform manifold approximation and projection.
Figure 7. Confusion matrix results for the test set of SVM model with NON (a), PCA dimensionality reduction (b), and UMAP dimensionality reduction (c). CAO—camellia oil; COO—corn oil; OLO—olive oil; SOO—soybean oil; SUO—sunflower oil; PCA—principal component analysis; UMAP—uniform manifold approximation and projection.
Figure 8. Confusion matrix results for the test set of LR model with NON (a), PCA dimensionality reduction (b), and UMAP dimensionality reduction (c). CAO—camellia oil; COO—corn oil; OLO—olive oil; SOO—soybean oil; SUO—sunflower oil; PCA—principal component analysis; UMAP—uniform manifold approximation and projection.
Figure 9. Confusion matrix results for the test set of GBT model with NON (a), PCA dimensionality reduction (b), and UMAP dimensionality reduction (c). CAO—camellia oil; COO—corn oil; OLO—olive oil; SOO—soybean oil; SUO—sunflower oil; PCA—principal component analysis; UMAP—uniform manifold approximation and projection.
Figure 10. The feature importance score of regression (ANOVA F-value selection (a), mutual information (MI) ranking (b), and Random Forest importance (c)) and the Pearson correlation coefficients (d). Red points indicate selected feature points.
Table 1. Results of binary classification models.
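| Dimensionality Reduction Method | Model | Accuracy/% | Precision/% | Recall/% | F1-Score | AUC |
|---|---|---|---|---|---|---|
| NON | CNN | 99.95 | 99.91 | 100.00 | 0.9996 | 1.0000 |
| NON | RF | 100.00 | 100.00 | 100.00 | 1.0000 | 1.0000 |
| NON | SVM | 86.26 | 83.35 | 91.94 | 0.8743 | 0.9449 |
| NON | LR | 100.00 | 100.00 | 100.00 | 1.0000 | 1.0000 |
| NON | GBT | 100.00 | 100.00 | 100.00 | 1.0000 | 1.0000 |
| PCA | CNN | 100.00 | 100.00 | 100.00 | 1.0000 | 1.0000 |
| PCA | RF | 100.00 | 100.00 | 100.00 | 1.0000 | 1.0000 |
| PCA | SVM | 81.79 | 87.49 | 75.81 | 0.8123 | 0.8967 |
| PCA | LR | 98.51 | 98.74 | 98.39 | 0.9856 | 0.9997 |
| PCA | GBT | 100.00 | 100.00 | 100.00 | 1.0000 | 1.0000 |
| UMAP | CNN | 98.70 | 99.28 | 98.21 | 0.9874 | 0.9958 |
| UMAP | RF | 99.49 | 99.20 | 99.82 | 0.9951 | 0.9952 |
| UMAP | SVM | 52.21 | 53.68 | 58.78 | 0.5612 | 0.5890 |
| UMAP | LR | 81.51 | 86.80 | 75.99 | 0.8103 | 0.8657 |
| UMAP | GBT | 99.49 | 99.20 | 99.82 | 0.9951 | 0.9965 |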
Note: NON—no dimensionality reduction; PCA—principal component analysis; UMAP—uniform manifold approximation and projection; CNN—convolutional neural networks; RF—Random Forest; SVM—support vector machines; LR—logistic regression; GBT—gradient boosting trees; AUC—area under the curve.
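As a quick consistency check on Table 1: for a binary classifier, the F1-score is the harmonic mean of precision and recall. The minimal sketch below recomputes F1 from the rounded precision and recall values of a few rows. The helper name `f1_from_precision_recall` and the `rows` dictionary are illustrative choices, not part of the authors' pipeline, and small last-digit differences are expected because the inputs are already rounded.

```python
def f1_from_precision_recall(precision_pct, recall_pct):
    """Harmonic mean of precision and recall, both given in percent."""
    p = precision_pct / 100.0
    r = recall_pct / 100.0
    if p + r == 0:
        return 0.0  # convention when both precision and recall are zero
    return 2 * p * r / (p + r)


# (precision %, recall %, reported F1) copied from Table 1
rows = {
    "NON-SVM": (83.35, 91.94, 0.8743),
    "PCA-SVM": (87.49, 75.81, 0.8123),
    "PCA-LR": (98.74, 98.39, 0.9856),
    "UMAP-SVM": (53.68, 58.78, 0.5612),
    "UMAP-LR": (86.80, 75.99, 0.8103),
}

for name, (p, r, reported) in rows.items():
    recomputed = f1_from_precision_recall(p, r)
    print(f"{name}: recomputed F1 = {recomputed:.4f}, reported = {reported:.4f}")
```

This identity applies only to the binary results. The averaged precision, recall, and F1 values in a multiclass table need not satisfy it.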
Table 2. Results of multiclass classification models.
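| Dimensionality Reduction Method | Model | Accuracy/% | Precision/% | Recall/% | F1-Score | AUC |
|---|---|---|---|---|---|---|
| NON | CNN | 99.09 | 97.40 | 97.68 | 0.9749 | 1.0000 |
| NON | RF | 99.45 | 99.47 | 99.45 | 0.9945 | 1.0000 |
| NON | SVM | 99.27 | 99.29 | 99.27 | 0.9927 | 1.0000 |
| NON | LR | 98.44 | 98.48 | 98.44 | 0.9845 | 0.9999 |
| NON | GBT | 99.36 | 99.39 | 99.36 | 0.9936 | 1.0000 |
| PCA | CNN | 99.54 | 97.80 | 98.03 | 0.9787 | 0.9998 |
| PCA | RF | 99.36 | 99.39 | 99.36 | 0.9936 | 1.0000 |
| PCA | SVM | 99.18 | 99.20 | 99.18 | 0.9918 | 1.0000 |
| PCA | LR | 98.35 | 98.42 | 98.35 | 0.9836 | 0.9999 |
| PCA | GBT | 99.18 | 99.20 | 99.18 | 0.9918 | 1.0000 |
| UMAP | CNN | 93.87 | 88.53 | 89.48 | 0.8867 | 0.9965 |
| UMAP | RF | 96.25 | 96.30 | 96.25 | 0.9624 | 0.9938 |
| UMAP | SVM | 92.50 | 93.26 | 92.50 | 0.9253 | 0.9957 |
| UMAP | LR | 92.77 | 93.54 | 92.77 | 0.9281 | 0.9955 |
| UMAP | GBT | 96.07 | 96.17 | 96.07 | 0.9607 | 0.9967 |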
Note: NON—no dimensionality reduction; PCA—principal component analysis; UMAP—uniform manifold approximation and projection; CNN—convolutional neural networks; RF—Random Forest; SVM—support vector machines; LR—logistic regression; GBT—gradient boosting trees; AUC—area under the curve.
Table 3. Statistics results of machine learning modeling of adulteration levels of CAO (mean ± SD).
| Model | Training: $R_C^2$ | Training: RMSEC (%) | Prediction: $R_P^2$ | Prediction: RMSEP (%) | RPD |
|---|---|---|---|---|---|
| NON-CNN | 0.9948 ± 0.0012 | 2.1 ± 0.2 | 0.9867 ± 0.0012 | 3.3 ± 0.2 | 8.7 ± 0.4 |
| PCA-CNN | 0.9958 ± 0.0004 | 1.9 ± 0.1 | 0.9937 ± 0.0012 | 2.3 ± 0.2 | 12.7 ± 1.1 |
| UMAP-CNN | 0.9664 ± 0.0017 | 5.3 ± 0.1 | 0.9599 ± 0.0022 | 5.8 ± 0.2 | 5.0 ± 0.1 |

Note: NON—no dimensionality reduction; PCA—principal component analysis; UMAP—uniform manifold approximation and projection; CNN—convolutional neural networks. $R_C^2$—coefficient of determination for calibration; RMSEC—root mean square error of calibration; $R_P^2$—coefficient of determination for prediction; RMSEP—root mean square error of prediction; RPD = SD/RMSEP.
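The regression metrics named in the note can be sketched as follows. The note defines RPD only as SD/RMSEP, so the choice of the population SD (ddof = 0) of the reference values is an assumption here, as are the function names and the toy data. Under that assumption, RPD = 1/sqrt(1 − R²) holds exactly, which gives a rough cross-check of the reported RPD values against the reported mean prediction R².

```python
import math
from statistics import fmean, pstdev


def regression_metrics(y_true, y_pred):
    """Return (R^2, RMSE, RPD) for paired reference and predicted values."""
    residual_ss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    mean_true = fmean(y_true)
    total_ss = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - residual_ss / total_ss
    rmse = math.sqrt(residual_ss / len(y_true))
    # Assumption: SD is the population SD (ddof = 0) of the reference values.
    rpd = pstdev(y_true) / rmse
    return r2, rmse, rpd


def rpd_from_r2(r2):
    """Under the population-SD assumption, RPD = 1 / sqrt(1 - R^2)."""
    return 1.0 / math.sqrt(1.0 - r2)


# Hypothetical adulteration levels (%) and predictions, for illustration only.
y_true = [0, 10, 20, 30, 40, 50]
y_pred = [1, 9, 22, 29, 41, 48]
r2, rmse, rpd = regression_metrics(y_true, y_pred)
print(f"toy data: R2 = {r2:.4f}, RMSE = {rmse:.3f}, RPD = {rpd:.2f}")

# Mean prediction R^2 and reported RPD copied from Table 3.
table3 = {"NON-CNN": (0.9867, 8.7), "PCA-CNN": (0.9937, 12.7), "UMAP-CNN": (0.9599, 5.0)}
for name, (r2_p, reported_rpd) in table3.items():
    print(f"{name}: RPD implied by R2 = {rpd_from_r2(r2_p):.1f}, reported = {reported_rpd}")
```

The implied values agree with the reported ones only approximately, since the table entries are means over repeated runs and the SD convention used by the authors is not stated.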
Table 4. Statistical analysis results of PCA-CNN modeling for quantitative detection of adulterants in CAO (mean ± SD).
| Types | Training: $R_C^2$ | Training: RMSEC (%) | Prediction: $R_P^2$ | Prediction: RMSEP (%) | RPD |
|---|---|---|---|---|---|
| COO | 0.9845 ± 0.0021 | 2.2 ± 0.2 | 0.9765 ± 0.0061 | 2.6 ± 0.3 | 6.7 ± 0.8 |
| OLO | 0.9974 ± 0.0001 | 0.9 ± 0.0 | 0.9963 ± 0.0002 | 1.1 ± 0.0 | 16.3 ± 0.4 |
| SOO | 0.9844 ± 0.0022 | 2.2 ± 0.2 | 0.9794 ± 0.0056 | 2.4 ± 0.3 | 7.1 ± 0.9 |
| SUO | 0.9965 ± 0.0002 | 1.0 ± 0.0 | 0.9901 ± 0.0062 | 1.7 ± 0.5 | 11.3 ± 3.1 |

Note: CAO—camellia oil; COO—corn oil; OLO—olive oil; SOO—soybean oil; SUO—sunflower oil; $R_C^2$—coefficient of determination for calibration; RMSEC—root mean square error of calibration; $R_P^2$—coefficient of determination for prediction; RMSEP—root mean square error of prediction; RPD = SD/RMSEP.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wang, M.; Liu, T.; Liao, H.; Liu, X.-B.; Zou, Q.; Liu, H.-C.; Wang, X.-Y. Adulteration Detection of Multi-Species Vegetable Oils in Camellia Oil Using SICRIT-HRMS and Machine Learning Methods. Foods 2026, 15, 434. https://doi.org/10.3390/foods15030434
