Article

Explainable Deep Learning to Predict Kelp Geographical Origin from Volatile Organic Compound Analysis

1
Key Laboratory of Testing and Evaluation for Aquatic Product Safety and Quality, Ministry of Agriculture and Rural Affairs, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Qingdao 266071, China
2
State Key Laboratory of Mariculture Biobreeding and Sustainable Goods, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Qingdao 266071, China
*
Author to whom correspondence should be addressed.
Foods 2025, 14(7), 1269; https://doi.org/10.3390/foods14071269
Submission received: 11 March 2025 / Revised: 25 March 2025 / Accepted: 3 April 2025 / Published: 4 April 2025

Abstract

In addition to its flavor and nutritional value, the origin of kelp has become a crucial factor influencing consumer choices. Nevertheless, research on kelp origin traceability by volatile organic compound (VOC) analysis is lacking, and the application of deep learning in this field remains scarce due to its black-box nature. To address this gap, we attempted to identify the origin of kelp by analyzing its VOCs in conjunction with explainable deep learning. In this work, we detected 115 VOC signals in kelp samples using gas chromatography coupled with ion mobility spectrometry (GC-IMS), of which 96 were identified, corresponding to 68 distinct compounds. We then developed an interpretable one-dimensional convolutional neural network (1D-CNN) model that incorporated the 107 VOCs exhibiting significant regional differences (p < 0.05). The model successfully discerned the origin of kelp, achieving perfect test-set metrics: accuracy (100%), precision (100%), recall (100%), F1 score (100%), and AUC (1.0). SHapley Additive exPlanations (SHAP) analysis highlighted the impact of features such as 1-octen-3-ol-M, (+)-limonene, allyl sulfide-D, 1-hydroxy-2-propanone-D, and (E)-2-hexen-1-al-M on the model output. This research provides deeper insights into how critical product features correlate with specific geographic information, which in turn may boost consumer trust and promote practical use in real settings.

1. Introduction

Seaweed has been consumed for centuries in coastal regions around the world and serves as a staple in the daily diets of many cultures. Approximately 600 species of seaweed are utilized for human consumption, with brown seaweed being the most consumed type [1]. Among brown seaweeds, kelp is the most extensively produced variety and plays a significant role in East Asian diets. Several studies have indicated that seaweeds, including kelp, are highly nutritious foods with elevated levels of proteins, essential amino acids, minerals, fiber, and phenolic compounds, while being low in fat and possessing a favorable Na/K ratio [1,2,3,4,5,6]. However, from an alternative perspective, numerous incidents related to seaweed quality have emerged, including both chemical contamination and food safety risks such as the presence of inorganic arsenic compounds, excessive iodine levels, and harmful pathogens [7,8,9]. Additionally, the occurrence of dyed kelp has also been documented [10]. In recent years, individuals selecting kelp have increasingly prioritized flavor in addition to its nutritional value and potential hazards [11]. These factors are somewhat related to the origin of kelp. In this context, misleading labeling regarding kelp’s origin is unjust to consumers and may pose potential health risks. Additionally, certain products with protected geographical indications (PGIs), such as Rongcheng kelp, are susceptible to counterfeiting due to inadequate geographical origin traceability technology. Therefore, it is imperative to develop effective methods for determining the origin of kelp. This underscores the necessity of enhancing traceability techniques for kelp origin, with an emphasis on ensuring product quality, safety, and the protection of trademarks.
To safeguard the quality and safety of seaweed and maintain consumer confidence, various methods have been implemented to determine its origin. For instance, mineral elemental fingerprinting has been employed to ascertain the origin of red seaweed (Neopyropia yezoensis), green seaweed (Ulva spp.), and brown seaweed (Fucus vesiculosus) [12,13]. Additionally, near-infrared spectroscopy has been employed as an efficient tool for the swift identification of brown seaweed (Sargassum fusiforme) [14]. Stable isotope technology has also been applied to determine the origin of brown seaweed (Undaria pinnatifida) [15]. Nonetheless, there has been no prior documentation on the use of volatile organic compounds (VOCs) for tracing the origins of seaweed, especially kelp. Gas chromatography–ion mobility spectrometry (GC-IMS) is a rapid and widely used technique for analyzing VOCs in various food products. This technology provides several significant advantages, including a straightforward device setup, non-destructive analysis, environmental friendliness, continuous operation, and no need for sample pretreatment [16,17]. Furthermore, GC-IMS has proven effective in identifying the origin and species of various foods, including honey, soy sauce, and rice [17,18]. Accordingly, the analysis of kelp’s VOCs using GC-IMS is essential. It not only meets consumer demand for the flavor of kelp but also addresses the gap in utilizing this technology for tracing the origin of kelp.
As artificial intelligence continues to evolve, machine learning and deep learning techniques have become increasingly prevalent for data processing applications in food chemistry, especially in the field of food authenticity and traceability [19,20]. Convolutional neural networks (CNNs), widely used deep learning algorithms, have achieved significant success compared to traditional machine learning in image recognition tasks and classification problems [21,22]. To date, CNNs have been utilized for the traceability of various food products, demonstrating excellent model performance in items such as sea cucumbers, wolfberries, and semen ziziphi spinosae [22,23,24]. Nevertheless, the black-box nature of deep learning models, which refers to the characteristic that the internal workings or decision-making processes of a model are hidden or difficult for external observers to understand, often restricts their practical implementation in real-world scenarios [25]. The SHapley Additive exPlanations (SHAP) framework, which is a post hoc interpretability technique, has been employed to clarify the outputs generated by machine learning models [26], all while preserving the original performance of the trained models [27]. Additionally, SHAP provides a comprehensive understanding of how feature inputs influence model outputs, thereby enhancing the transparency of black-box models. Nevertheless, studies focusing on explainable machine learning or deep learning within the context of origin traceability remain exceedingly rare to this day [20,25,28].
This study aimed to develop an effective method for the accurate and rapid identification of kelp origins by integrating GC-IMS with explainable deep learning techniques. The primary research activities are outlined as follows: (1) identifying the VOCs present in kelp from different origins; (2) designing deep learning models using a one-dimensional convolutional neural network (1D-CNN) framework and assessing their feasibility; and (3) elucidating the contributions of key VOCs to the model decision-making process through SHAP explainability analysis at both global and local levels. This study seeks to significantly enhance the practical application of origin traceability techniques and predictive modeling by improving the speed and accuracy of kelp origin identification, as well as increasing the transparency and interpretability of the models employed.

2. Materials and Methods

2.1. Sampling

In 2023, China’s kelp production was estimated to exceed 1.78 million tons, with Fujian, Liaoning, and Shandong as the leading provinces for kelp aquaculture in the country [29]. These provinces were therefore designated as key monitoring regions. To ensure the authenticity of the samples, dried kelp (Laminaria japonica) was collected in 2024 directly from different production enterprises in Dalian City (n = 30) in Liaoning Province, Rongcheng City (n = 30) in Shandong Province, and Xiapu City (n = 30) in Fujian Province (Figure 1). For each batch of samples, 1 kg of dried kelp was collected, placed in food-grade plastic bags, and sealed before transportation to the laboratory. All fresh kelp used for dried kelp production was sourced from coastal aquaculture farms near the production enterprises.

2.2. Sample Preparation and GC–IMS Analysis

The dried kelp was finely powdered using a grinding machine and then transferred into plastic bags. Subsequently, it was kept in a desiccator at ambient temperature until analysis. The VOCs in 90 kelp samples were analyzed using a GC-IMS system. This system consisted of a 490 GC unit from Agilent Technologies Inc. (Palo Alto, CA, USA) coupled with an IMS detector from Flavourspec® (G.A.S., Dortmund, Germany). Approximately 1.5 g of the specimen was transferred into a 20 mL headspace vial and maintained at 60 °C with orbital shaking (500 rpm) for 15 min using an automated sampling system (CTC Analytics AG, Zwingen, Switzerland; CTC-PAL 3 model). Following this, a volume of 500 μL of gas was injected using a syringe that was heated to 85 °C. The gas chromatography process utilized an MXT-WAX column (15 m × 0.53 mm × 1.0 μm, Restek, Bellefonte, PA, USA) for separation, operating at a column temperature of 60 °C and with a total run time of 30 min. Nitrogen (N2, purity ≥ 99.999%) served as the carrier gas. Its flow rate was initially set at 2.0 mL/min for the first 2 min, was subsequently ramped up to 10.0 mL/min over the next 8 min, and then was linearly increased to 100 mL/min within 10 min, where it was held constant for an additional 10 min. The IMS conditions were as follows: the drift tube length was 98 mm, the temperature was set at 45 °C, and the drift gas was nitrogen (N2, purity ≥ 99.999%) with a flow rate of 150 mL/min. The ionization source was deuterium, and the ionization mode was positive ion. The retention index (RI) for each VOC was determined using n-ketones (C4–C9) as external standards, chosen for their minimal response in ion mobility spectrometry (IMS). The target VOCs were qualitatively analyzed by referencing the built-in GC RI database (NIST, 2020) and the IMS drift time (Dt) database in VOCal software (0.4.07). 
The VOC concentrations were quantitatively compared using peak volume signal intensity from the Laboratory Analytical Viewer (LAV) [30]. Meanwhile, fingerprints and differential profiles of volatile molecules in kelp were generated using the Reporter and Gallery plugins in VOCal software [31,32].

2.3. Modeling Procedure

The primary procedure for model building is outlined as follows: Firstly, a dataset was assembled from the VOCs exhibiting significant regional differences. The feature matrix had 90 rows, one per sample, and 107 columns, one per VOC. The target matrix, configured as a 90 × 1 array, contained a column titled “origin label”, in which the cities of Rongcheng, Dalian, and Xiapu were designated with the identifiers 0, 1, and 2, respectively.
Secondly, the dataset (n = 90) was partitioned into training and testing subsets in a 7:3 ratio before any preprocessing operations were conducted. This order was adopted to avert data leakage during model development, as suggested by Kapoor and Narayanan [33]. Following this division, StandardScaler was chosen for normalization. The fit() function was applied exclusively to the training set to calculate the scaling parameters, specifically the mean (μ) and standard deviation (σ), of each feature. Subsequently, the transform() function was used to standardize both the training and testing sets with these parameters, ensuring that the data were normalized consistently across both subsets. For the training data, each feature value was transformed as X_train_normalized = (X_train − μ)/σ. For the test data, the same μ and σ values obtained from the training data were applied: X_test_normalized = (X_test − μ)/σ. This process mitigates the adverse effects caused by variations in the concentrations of VOCs in kelp [34].
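The split-then-scale order described above can be sketched as follows. This is a minimal standard-library illustration, not the study's code: the toy data, the seed, and the helper names (`train_test_split_indices`, `fit_scaler`, `transform`) are assumptions, and the split is a plain random one because the text does not say whether stratification was used.

```python
import random
import statistics


def train_test_split_indices(n_samples, test_fraction=0.3, seed=0):
    """Shuffle sample indices and split them into train and test index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def fit_scaler(rows):
    """Return per-feature mean and population standard deviation of the given rows."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Population standard deviation (ddof = 0); a zero value is replaced by 1.0
    # so that constant features do not cause a division by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def transform(rows, means, stds):
    """Standardize rows with previously fitted means and standard deviations."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Hypothetical toy data: 90 samples x 3 features (the study used 107 VOC features).
rng = random.Random(42)
X = [[rng.gauss(100.0, 20.0) for _ in range(3)] for _ in range(90)]

# 1) Split first, 2) fit the scaler on the training rows only, 3) apply it to both subsets.
train_idx, test_idx = train_test_split_indices(len(X))
X_train = [X[i] for i in train_idx]
X_test = [X[i] for i in test_idx]

means, stds = fit_scaler(X_train)
X_train_scaled = transform(X_train, means, stds)
X_test_scaled = transform(X_test, means, stds)
```

Because μ and σ come from the training rows alone, the scaled test columns will generally not have exactly zero mean or unit variance, which is the expected behaviour when leakage is avoided.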
Thirdly, the preprocessed training subset (n = 63) was utilized to train the models, while their performance was assessed using the test subset (n = 27). A range of statistical indices were employed to comprehensively evaluate the models, including accuracy, precision, recall, and F1 score. These metrics are defined as follows:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
$$\mathrm{F1\ score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
In this context, TP refers to true positives, FP indicates false positives, TN signifies true negatives, and FN denotes false negatives. Furthermore, AUC represents the Area Under the Curve, which relates to the receiver operating characteristic (ROC) curve. This curve is created by plotting the TP rate against the FP rate [28]. Additionally, the computational complexity of the proposed model was evaluated based on the number of model parameters and floating-point operations (FLOPs).
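For a three-class problem such as this one, the metrics above are usually computed per class in a one-vs-rest fashion and then averaged. The sketch below assumes macro averaging and overall accuracy as the fraction of correct predictions; the text does not state which averaging scheme was used, and the labels in the example are made up.

```python
def one_vs_rest_counts(y_true, y_pred, label):
    """Count TP, FP, FN, TN for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    tn = len(pairs) - tp - fp - fn
    return tp, fp, fn, tn


def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a class has no predictions or no samples."""
    return numerator / denominator if denominator else 0.0


def classification_report(y_true, y_pred, labels):
    """Overall accuracy plus macro-averaged precision, recall, and F1 score."""
    precisions, recalls, f1_scores = [], [], []
    for label in labels:
        tp, fp, fn, _tn = one_vs_rest_counts(y_true, y_pred, label)
        precision = safe_div(tp, tp + fp)
        recall = safe_div(tp, tp + fn)
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(safe_div(2 * precision * recall, precision + recall))
    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1_scores) / n,
    }


# Made-up labels (0 = Rongcheng, 1 = Dalian, 2 = Xiapu) with two deliberate errors.
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 1, 2, 2, 0]
report = classification_report(y_true, y_pred, labels=[0, 1, 2])
perfect = classification_report(y_true, y_true, labels=[0, 1, 2])
```

When every prediction is correct, as reported for the 27 test samples, all four values equal 1.0 regardless of the averaging scheme.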

2.4. One-Dimensional Convolutional Neural Network (1D-CNN)

One-dimensional convolutional neural networks (1D-CNNs) share similarities with their two-dimensional counterparts (2D-CNNs) in that both use convolutional operations to extract features from input data. However, 1D-CNNs are particularly well suited for processing one-dimensional signals, such as spectral data. This advantage is especially pronounced when the available training data are limited or when the application is highly specialized, as 1D-CNNs can effectively capture relevant features and patterns within such data [21,22]. In this research, a 1D-CNN model was formulated to ascertain the geographical origin of kelp based on VOC data. The architecture of the 1D-CNN is depicted in Figure 2. More precisely, it features two convolutional layers, two max pooling layers, one flatten layer, and two dense layers. The input layer receives processed VOC data in a (107 × 1) format. Each convolutional kernel and max pooling filter has dimensions of (3 × 1). The model employed the rectified linear unit (ReLU) activation function to introduce nonlinearity; ReLU is computationally inexpensive and alleviates the vanishing-gradient problem, which can impede the convergence of models during training [18,35]. The flatten layer converts the extracted and pooled VOC features into an 88-element one-dimensional vector, which is subsequently fed into a fully connected dense layer containing 32 neurons. At the network’s output, a softmax activation function is employed in the final fully connected layer to assign sample labels according to the predicted probabilities.
The Adam optimizer, an advanced variant of stochastic gradient descent [36], was utilized to train the 1D-CNN model for minimizing cross-entropy loss. The discrepancies between predicted and actual values were assessed using a categorical cross-entropy loss function [37].
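The layer sizes given in this section can be checked with simple shape and parameter arithmetic. In the sketch below, the input length (107), kernel and pool size (3), flattened length (88), dense width (32), and number of classes (3) come from the text. The filter counts (4 and 8), the unpadded stride-1 convolutions, and non-overlapping pooling are assumptions, chosen because they reproduce the 88-element flattened vector and a total close to the 3.1 K parameters reported in Section 3.2. The actual configuration is the one shown in Figure 2.

```python
def conv1d_output(length, kernel_size):
    """Output length of a stride-1 convolution without padding."""
    return length - kernel_size + 1


def pool1d_output(length, pool_size):
    """Output length of non-overlapping max pooling (stride equal to pool size)."""
    return length // pool_size


def conv1d_params(in_channels, filters, kernel_size):
    """Weights plus one bias per filter."""
    return kernel_size * in_channels * filters + filters


def dense_params(in_units, out_units):
    """Weights plus one bias per output unit."""
    return in_units * out_units + out_units


# Assumed filter counts (not stated in the text).
FILTERS_1, FILTERS_2 = 4, 8

length = 107                        # input: 107 standardized VOC intensities
length = conv1d_output(length, 3)   # 105 after the first convolution
length = pool1d_output(length, 3)   # 35 after the first max pooling
length = conv1d_output(length, 3)   # 33 after the second convolution
length = pool1d_output(length, 3)   # 11 after the second max pooling
flattened = length * FILTERS_2      # 11 positions x 8 filters = 88

total_params = (
    conv1d_params(1, FILTERS_1, 3)            # 16
    + conv1d_params(FILTERS_1, FILTERS_2, 3)  # 104
    + dense_params(flattened, 32)             # 2848
    + dense_params(32, 3)                     # 99
)
```

Under these assumptions the total is 3067 trainable parameters, dominated by the 88-to-32 dense layer, which is consistent with a model small enough to train on 63 samples.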

2.5. Shapley Additive Explanations (SHAP)

We utilize a game theory-based SHAP method for the interpretative analysis of our deep learning model [26]. This method enables us to assess the impact of each feature on both the overall model and each predicted class. In this research, DeepExplainer, which computes SHAP values for deep learning models by leveraging the connections between SHAP and the DeepLIFT algorithm, was utilized. The contribution of each variable to the model’s output is evaluated by the SHAP algorithm, utilizing both the model itself and the input dataset, as detailed below [38]:
$$\varphi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,\left(|F| - |S| - 1\right)!}{|F|!} \left[ f_{S \cup \{i\}}\left(x_{S \cup \{i\}}\right) - f_S\left(x_S\right) \right]$$
In this scenario, $\varphi_i$, $F$, and $S$ denote the contribution of each individual feature, the complete set of features, and the subset of features that excludes the $i$th feature, respectively. Subsequently, two models are retrained: $f_{S \cup \{i\}}$, with the inclusion of the $i$th feature, and $f_S$, with its exclusion. The predictions from these models are compared using the difference $f_{S \cup \{i\}}(x_{S \cup \{i\}}) - f_S(x_S)$, where $x_S$ represents the values of the input features within the set $S$.
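To make the weighting in this formula concrete, the sketch below evaluates it by brute force for a three-feature toy value function. The function `toy_value` and the feature names are hypothetical. Exact enumeration visits every subset, so it is only practical for a handful of features; with 107 features an approximation such as DeepExplainer is needed, and this code is not what that explainer does internally.

```python
from itertools import combinations
from math import factorial


def shapley_values(value, features):
    """Exact Shapley values for a set function `value` defined on subsets of `features`."""
    n = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for size in range(n):
            # Weight |S|! (|F| - |S| - 1)! / |F|! from the formula above.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to subset S.
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


def toy_value(subset):
    """Hypothetical model output: two additive effects plus an a-c interaction."""
    score = 0.0
    if "a" in subset:
        score += 2.0
    if "b" in subset:
        score += 1.0
    if "a" in subset and "c" in subset:
        score += 3.0
    return score


phi = shapley_values(toy_value, ["a", "b", "c"])
```

The interaction term of 3.0 is shared equally between "a" and "c", and the three values sum to the difference between the full-set output and the empty-set output, which is the additivity property that SHAP plots rely on.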

2.6. Computing Implementation

All the computational steps outlined above, which cover the preprocessing of raw data, model development, and interpretation, were carried out with Python 3.8 and PyCharm Community Edition 2021.3.1, along with the scikit-learn 0.24.2 (available at GitHub: https://github.com/scikit-learn (accessed on 10 October 2024)), TensorFlow 2.2.0 (available at GitHub: https://github.com/tensorflow/tensorflow (accessed on 11 October 2024)), keras-flops 0.1.0 (available at GitHub: https://github.com/tokusumi/keras-flops (accessed on 24 March 2025)), and SHAP 0.41.0 libraries (available at GitHub: https://github.com/slundberg/shap (accessed on 15 October 2024)).

3. Results and Discussion

3.1. Identification of VOCs in Kelp by GC-IMS

The VOCs present in kelp sourced from three cities were analyzed using GC-IMS. The abscissa represents the relative ion drift time (Dt), whereas retention time is plotted along the ordinate. The crimson reference marker positioned at x = 1.0 delineates reactive ion peak (RIP) baselines, while a chromatic gradient correlates with VOC signal magnitudes through an intensity-dependent colorimetric scale (blue < red progression) [39]. Our analysis demonstrates predominant spectral features are confined to retention time (100–1100 s) and drift time (Dt 1.0–1.8 ms) operational windows, while co-detected analyte signatures exhibit temporal synchronization across both separation dimensions (Figure 3, upper panel). This observation underscores the variations in properties including volatility, polarity, molecular weight, and charge state of the VOCs [40].
To achieve a more comprehensive grasp of the regional disparities in VOCs among kelp, the spectral data of Rongcheng kelp were used as a benchmark for comparison with other samples’ spectra. When the VOCs in the samples align with those of the reference, the resulting background after subtraction is white; in contrast, red denotes a VOC concentration surpassing the reference level, while blue indicates a reduced concentration, with darker hues signifying a more pronounced difference. Our analysis clearly illustrates that the VOC profiles vary among the three cities (Figure 3, bottom panel).
GC-IMS analysis reveals significant diversity in the VOCs present in kelp, with 115 distinct VOC signals detected. Each compound exhibits detection signals across all samples, albeit at varying concentration levels (Figure 4). Among the 115 detected VOCs, 96 were identified. These identified VOCs can be categorized into 68 distinct species, which include 19 aldehydes, 14 ketones, 12 alcohols, 6 esters, 6 acids, 3 furans, 2 pyrazines, 2 ethers, and 4 others (Table 1). However, because of insufficient data in the database, 19 VOCs could not be identified and are denoted by numerical labels. It is worth noting that several VOCs with higher levels, such as 3-methylbutanoic acid, 2,3-dimethyl-5-ethylpyrazine, (E,E)-2,4-heptadienal, 1-octen-3-ol, allyl sulfide, allyl isothiocyanate, and 1-hydroxy-2-propanone, generated monomer and dimer signals that had similar retention times but different drift times (Figure 4) [41]. The majority of VOCs identified in kelp align with those reported in prior research [39,42,43]. The identified allyl isothiocyanate in kelp is primarily generated from precursor glucosinolates through enzymatic hydrolysis [44]. Aldehydes are significant secondary products of lipid oxidation, originating from the decomposition of hydroperoxides. In addition, certain aldehydes can also be generated through the degradation of amino acids induced by the Maillard reaction [31]. Alcohols identified in seaweed are predominantly derived from the peroxidation of unsaturated fatty acids, while ketones are generated through the oxidation or breakdown of both unsaturated fatty acids and amino acids [11]. Esters are synthesized by esterases produced by Monascus spp., facilitating the esterification of acids and alcohols [45].
Flavor, a complex perceptual factor encompassing both odor and taste, is crucial for food acceptance. As a marine-derived food product, kelp exhibits a unique oceanic flavor shaped by its growth environment [39]. Although the VOCs detected in kelp vary across different studies, it is evident that these compounds significantly influence its flavor [43]. The odor descriptors assigned to each VOC detected via GC-IMS in this research are documented within Table 1 and are visually represented as word clouds in Figure 5. The word clouds visualize the frequency of descriptors through increasing size and font weight [46]. Overall, the odor of the detected VOCs in kelp is characterized by a dominant perception of “fruity” and “green”, along with moderate perceptions of “sweet”, “fresh”, and “pungent”. For instance, “fruity” and “green” were supplemented with a variety of descriptors, including those typically associated with acids, aldehydes, alcohols, esters, and ketones (e.g., cheese, banana, citrus, peas), as well as unexpected ones (e.g., foot sweat, sulfury, garlic). Similarly, “sweet”, “fresh”, and “pungent” were augmented by a range of descriptors, including those expected for aldehydes, alcohols, furan, and esters (e.g., citrus, wine, caramel), as well as unexpected ones (e.g., tobacco, acetone, grassy). According to Wei et al. [39], sensory attributes, including “green”, “fatty”, and “cucumber”, are identified as significant contributors to the development of fishy odors in kelp. This diversity and non-uniformity in odor descriptors often provide more references for consumers when selecting different origins of kelp.
Given that Figure 4 fails to clearly depict the variations in VOCs among kelp from different origins, the concentration intensity signals and standard deviations of the VOCs are displayed in Table 1. The one-way analysis of variance indicates that, with the exception of the volatile substances 3-methylbutanoic acid-M, 3-methylbutanoic acid-D, 2-methylbutanoic acid, 2-methylpropanoic acid-D, (E)-2-heptenal-M, 2,3-butanedione, and two unidentified VOCs denoted as numbers 3 and 12, the remaining 107 VOCs exhibit significant regional differences (p < 0.05). Previous research has identified factors including species, collection season, geographical origin, and pretreatment procedures as influencing the VOCs of seaweed samples [47]. Consequently, we hypothesize that the 107 VOCs, which exhibit significant regional differences, may serve as effective markers for identifying the origin of kelp. Chu et al. [25] adopted an inverse methodology, initially modeling the data via a data-driven approach and subsequently examining the learned weights to pinpoint features with substantial significance or potential for classification. Therefore, in the next section, we implemented an explainable, data-driven 1D-CNN model to validate our hypothesis.
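The per-VOC screening step described above rests on a one-way analysis of variance across the three origins. The sketch below computes only the F statistic for a single feature from made-up intensities; it is an illustration, not the authors' procedure. Turning F into a p-value requires the F distribution, which in practice is available through a routine such as `scipy.stats.f_oneway`.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups of measurements."""
    all_values = [x for group in groups for x in group]
    grand_mean = fmean(all_values)
    k, n = len(groups), len(all_values)
    group_means = [fmean(group) for group in groups]

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Variation of individual values around their own group mean.
    ss_within = sum(
        (x - mean) ** 2
        for group, mean in zip(groups, group_means)
        for x in group
    )

    ms_between = ss_between / (k - 1)  # k - 1 degrees of freedom
    ms_within = ss_within / (n - k)    # n - k degrees of freedom
    return ms_between / ms_within


# Made-up signal intensities of one VOC in three origins (three samples each).
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat = one_way_anova_f(groups)
```

In the study's setting each of the 115 VOCs would be tested in this way with three groups of 30 samples, giving 2 and 87 degrees of freedom.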

3.2. Model Performance of 1D-CNN

The 1D-CNN model was assessed using the test set described in Section 2.3. The evaluation results revealed that the model achieved perfect performance on this set, with accuracy, recall, and F1 score all reaching 100%. To further assess the 1D-CNN model’s performance and its ability to distinguish between geographical origins, we documented and visualized the training accuracy, cross-entropy loss, AUC, and confusion matrix for each origin in Figure 6. As depicted in Figure 6a, the training set’s accuracy improved as the number of epochs increased, while the cross-entropy loss declined, demonstrating that the 1D-CNN model converged within 100 epochs. As illustrated in Figure 6b, the confusion matrix reveals that all 27 test set samples were correctly classified. Additionally, the diagnostic ability of the 1D-CNN model was evaluated using the ROC curve presented in Figure 6c, with the AUC for samples from all three cities reaching 1.00. Furthermore, the proposed 1D-CNN model has low computational complexity, with only 16.1 K FLOPs and 3.1 K model parameters, making it easy to reproduce elsewhere. An earlier study using fatty acid fingerprinting to trace the origin of kelp reported a predictive accuracy of 95.8% [48], whereas our method achieved 100% on the test set. Consequently, the 1D-CNN model developed in this study is highly effective in identifying the origin of kelp. In summary, the model showed high precision, swift convergence, and good generalization to the held-out test samples in determining the geographical origins of kelp from three major producing cities in China.

3.3. Interpretation of 1D-CNN Model with SHAP Values

Evaluating the practicality of deep learning models hinges on transparency and interpretability. Yet, models employing complex nonlinear algorithms tend to be convoluted, potentially entailing intricate interactions among numerous factors or features, which complicates users’ grasp of the output rationale. Typically, post hoc explanations are utilized for sophisticated machine learning models [49]. Consequently, SHAP is utilized to discern the core of specific predictions, grasp the interplay between variables and model outcomes, and shed light on particular instances. By evaluating the mean incremental contribution of features through SHAP values, this method deepens our grasp of predictions within specific contexts. It also furnishes important insights into both singular cases and overarching trends at both global and local levels [50]. The utilization of SHAP interpretation has been crucial in building trust within food traceability systems, such as those for lotus and oysters [28,51]. Overall, this method is crucial for evaluating models and enhancing their practicality.

3.3.1. Global Feature Interpretation

The SHAP toolkit was applied to the trained 1D-CNN model, generating a matrix of SHAP values. To pinpoint the most influential features for the 1D-CNN model, we computed the mean absolute SHAP values for each feature and generated a stacked bar plot, with different geographical origins represented by distinct colors (Figure 7a). In this plot, the top 20 features were ranked according to the absolute sum of their impacts on the model. Notably, the features A22 (1-Octen-3-ol-M), A108 ((+)-limonene), A111 (allyl sulfide-D), A36 (1-hydroxy-2-propanone-D), and A46 ((E)-2-hexen-1-al-M) were identified as the top five features influencing the model output. Among these, feature A22 (1-Octen-3-ol-M) exhibited a dominant position, accounting for nearly 50% of the variance. This finding suggests that the concentration of A22 (1-Octen-3-ol-M) is the most crucial factor in the 1D-CNN model’s classification of kelp origin, particularly for Xiapu kelp.
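The ranking behind a stacked bar plot of this kind can be sketched as follows: take the mean absolute SHAP value of each feature within each class, then sum across classes to order the features. The SHAP values, class names, and feature names below are made up for illustration, and the helper names are hypothetical. In the study the matrix came from DeepExplainer applied to the trained model.

```python
def mean_abs_shap(shap_by_class):
    """Mean absolute SHAP value per feature for each class.

    shap_by_class maps a class name to a list of rows (one per sample),
    each row holding one SHAP value per feature.
    """
    importance = {}
    for cls, rows in shap_by_class.items():
        n_samples = len(rows)
        n_features = len(rows[0])
        importance[cls] = [
            sum(abs(row[j]) for row in rows) / n_samples
            for j in range(n_features)
        ]
    return importance


def rank_features(importance, names, top_k):
    """Rank features by their mean |SHAP| summed over all classes (bar length)."""
    totals = [
        sum(importance[cls][j] for cls in importance)
        for j in range(len(names))
    ]
    order = sorted(range(len(names)), key=lambda j: totals[j], reverse=True)
    return [(names[j], totals[j]) for j in order[:top_k]]


# Made-up SHAP values: two classes, two samples, three features.
shap_by_class = {
    "class_0": [[0.2, -0.1, 0.0], [-0.4, 0.1, 0.0]],
    "class_1": [[-0.1, 0.3, 0.1], [0.1, -0.1, -0.1]],
}
feature_names = ["f1", "f2", "f3"]

importance = mean_abs_shap(shap_by_class)
top_features = rank_features(importance, feature_names, top_k=2)
```

Each per-class entry in `importance` corresponds to one colored segment of a bar, so a feature can rank highly overall while mattering mainly for a single origin.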
Since SHAP values are derived from an individualized model interpretation approach, the contributions of features for each sample can be obtained from this interpretation. Figure 7 provides detailed plots illustrating the feature contributions to the model output for different origin classifications, with color representing the feature values. In the predictions for Rongcheng samples, A111 (allyl sulfide-D), A108 ((+)-limonene), A3 (1), A22 (1-octen-3-ol-M), and A58 (11) emerged as the top five features with the highest contributions. It is evident that samples with high feature values of A111 (allyl sulfide-D), A3 (1), and A58 (11) positively impacted the model, whereas contributions from features such as A56 (2-octanone) and A27 ((E, E)-2,4-hexadienal-M) were not significant (Figure 7b). For the predictions of Dalian samples, A108 ((+)-limonene), A111 (allyl sulfide-D), A36 (1-hydroxy-2-propanone-D), A3 (1), and A58 (11) again ranked among the top five features with the highest contributions. Here, samples with high feature values of A111 (allyl sulfide-D), A3 (1), and A58 (11) negatively impacted the model, while contributions from other features, such as A18 (2-acetylfuran) and A87 (methyl propanoate), were also minimal (Figure 7c). In the case of predictions for Xiapu samples, the top five features with the highest contributions were A22 (1-octen-3-ol-M), A46 ((E)-2-hexen-1-al-M), A100 (butanol-D), A48 (8), and A27 ((E, E)-2,4-hexadienal-M). Notably, samples with high feature values of A22 (1-octen-3-ol-M), A46 ((E)-2-hexen-1-al-M), and A100 (butanol-D) positively impacted the model, while high feature values of A27 ((E, E)-2,4-hexadienal-M) had a negative impact (Figure 7d). The findings suggest that the discriminative power of the 1D-CNN model for Rongcheng, Dalian, and Xiapu is affected by samples with high or low feature values.

3.3.2. Local Feature Interpretation

Individual predictions were interpreted with SHAP force plots (Figure 8), using samples drawn from the actual prediction results for each city. In each plot, the base value is the mean model prediction over the training dataset. When the model output lies to the right of the base value, the sample is assigned to the selected city; when it lies to the left, it is not. Only the top five features that most strongly influence the model output are discussed here. For all three instances, the model output lies to the right of the base value, implying that these samples belong to the selected city, although the importance of each feature differs between instances. For Rongcheng, the base value is 0.3334, while the selected sample has a higher prediction of 0.52 compared to the actual label of 0. The values of A30 (allyl isothiocyanate-M) and A113 (1-propanol, 2-methyl) are associated with negative SHAP values, which lower the likelihood that the sample belongs to Rongcheng. In contrast, the values of A3 (unidentified compound 1), A34 (unidentified compound 6), A108 ((+)-limonene), A47 ((E)-2-hexen-1-al-D), and A33 (unidentified compound 5) are associated with positive SHAP values, which raise that likelihood (Figure 8a). For the Dalian instance, the values of A108 ((+)-limonene), A58 (unidentified compound 11), A111 (allyl sulfide-D), A3 (unidentified compound 1), and A10 (unidentified compound 2) are associated with positive SHAP values, while A87 (methyl propanoate), A38 (1-octanal-M), A37 (2-butanone, 3-hydroxy-M), A88 (acetic acid propyl ester), and A76 (1-hexanal) exert a negative influence (Figure 8b). The selected Xiapu sample has a notably higher predicted value of 0.48, with A11 (1-butanoic acid), A48 (unidentified compound 8), A22 (1-octen-3-ol-M), A12 (propanoic acid-D), and A21 (acetic acid) associated with positive SHAP values (Figure 8c). In summary, the trends observed in the local interpretation are consistent with those of the global interpretation.
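
A force plot relies on the additivity of SHAP values: the model output for one sample equals the base value plus the sum of that sample's SHAP values. The sketch below illustrates this with the Rongcheng base value (0.3334) from the text. The individual SHAP values are made up so that they sum to the reported output of 0.52, and the function names are hypothetical.

```python
def model_output(base_value, shap_values):
    """Additivity: output = base value + sum of per-feature SHAP values."""
    return base_value + sum(shap_values.values())


def top_contributors(shap_values, k=5):
    """Features with the largest absolute SHAP values for one sample."""
    return sorted(shap_values, key=lambda f: abs(shap_values[f]), reverse=True)[:k]


base_value = 0.3334  # mean prediction over the training set (from the text)

# Hypothetical SHAP values for one Rongcheng sample (illustrative only).
shap_values = {
    "A3": 0.080,
    "A34": 0.060,
    "A108": 0.050,
    "A47": 0.040,
    "A33": 0.030,
    "A30": -0.045,
    "A113": -0.0284,
}

output = model_output(base_value, shap_values)
right_of_base = output > base_value  # right of the base value: assigned to the city
print(round(output, 4), right_of_base, top_contributors(shap_values))
```

Positive entries correspond to the red segments that push the prediction above the base value, and negative entries to the blue segments that pull it back.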

4. Conclusions

This research showed that combining GC-IMS with explainable deep learning offers a viable and efficient method for determining the geographical origin of kelp in China. A total of 115 VOCs were identified, of which 107 showed significant regional variation (p < 0.05). The proposed 1D-CNN model achieved an accuracy of 100%, good convergence, and excellent generalization ability when classifying kelp from three different cities. SHAP analysis of the 1D-CNN model highlighted the contributions of 1-octen-3-ol-M, (+)-limonene, allyl sulfide-D, 1-hydroxy-2-propanone-D, and (E)-2-hexen-1-al-M to the model's decisions. These findings show how key VOC features correlate with geographical origin, which can bolster user confidence and support dependable practical use of the model. This work goes beyond the conventional development and application of deep learning by promoting interpretability in seaweed origin identification. Ultimately, this study can aid in combating food fraud and safeguarding food security. Although the research goals were met, future studies should investigate the unidentified VOCs in greater depth, both qualitatively and quantitatively. Other techniques, such as Fourier transform infrared spectroscopy (FT-IR) and ultraviolet–visible spectrophotometry (UV-Vis), should also be considered for such investigations.
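
The reported classification metrics follow directly from the test-set confusion matrix. As a hedged illustration, the sketch below computes accuracy and macro-averaged precision, recall, and F1 from a 3 × 3 confusion matrix. The counts and the function name `classification_metrics` are hypothetical (the test-set size is not restated here); the point is that only a purely diagonal matrix yields 100% on every metric.

```python
def classification_metrics(cm):
    """Accuracy and macro-averaged precision, recall, and F1.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(k)) / total
    precisions, recalls, f1s = [], [], []
    for i in range(k):
        tp = cm[i][i]
        predicted_i = sum(cm[r][i] for r in range(k))  # column sum
        actual_i = sum(cm[i])                          # row sum
        p = tp / predicted_i if predicted_i else 0.0
        r = tp / actual_i if actual_i else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }


# Hypothetical test-set counts; rows and columns: Rongcheng, Dalian, Xiapu.
perfect = [[9, 0, 0], [0, 9, 0], [0, 0, 9]]
one_error = [[8, 1, 0], [0, 9, 0], [0, 0, 9]]

perfect_metrics = classification_metrics(perfect)
error_metrics = classification_metrics(one_error)
print(perfect_metrics)
print(error_metrics)
```

A single misclassified sample in the second matrix is enough to pull accuracy, precision, recall, and F1 below 1.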

Author Contributions

Conceptualization, X.K., Y.Z. and Y.G.; methodology, X.K.; software, X.K.; validation, X.K.; formal analysis, X.K.; investigation, X.K., Y.Z. and X.S.; resources, L.Y.; data curation, X.K.; writing—original draft preparation, X.K.; writing—review and editing, X.K. and Y.G.; visualization, X.K.; supervision, L.Y. and Z.T.; project administration, X.K. and Y.G.; funding acquisition, X.K., Y.G. and Z.T. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the National Natural Science Foundation of China (32202156), the Shandong Provincial Natural Science Foundation (ZR2022QC067), the earmarked fund for CARS (CARS-50, CARS-49), and the Central Public-interest Scientific Institution Basal Research Fund, CAFS (2023TD76, 2023TD28).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Salido, M.; Soto, M.; Seoane, S. Seaweed: Nutritional and gastronomic perspective. A review. Algal Res. 2023, 77, 103357. [Google Scholar]
  2. Gómez-Ordóñez, E.; Jiménez-Escrig, A.; Rupérez, P. Dietary fiber and physicochemical properties of several edible seaweeds from the northwestern Spanish coast. Food Res. Int. 2010, 43, 2289–2294. [Google Scholar] [CrossRef]
  3. Syad, A.N.; Shunmugiah, K.P.; Kasi, P.D. Seaweeds as nutritional supplements: Analysis of nutritional profile, physicochemical properties and proximate composition of G. acerosa and S. wightii. Biomed. Prev. Nutr. 2013, 3, 139–144. [Google Scholar] [CrossRef]
  4. Fernández-Segovia, I.; Lerma-Garcia, M.J.; Fuentes, A.; Barat, J.M. Characterization of Spanish powered seaweeds: Composition, antioxidant capacity and technological properties. Food Res. Int. 2018, 111, 212–219. [Google Scholar] [CrossRef]
  5. Wang, Y.X.; Guo, Y.Y.; Li, N.; Wang, L.Z.; Xu, J.C. Effects of processing methods on the nutritional quality of kelp. J. Food Saf. Qual. 2020, 11, 8229–8234. (In Chinese) [Google Scholar]
  6. Matos, J.; Cardoso, C.; Serralheiro, M.L.; Bandarra, N.M.; Afonso, C. Seaweed bioactives potential as nutraceuticals and functional ingredients: A review. J. Food Compos. Anal. 2024, 133, 106453. [Google Scholar]
  7. Aakre, I.; Solli, D.D.; Markhus, M.W.; Mæhre, H.K.; Dahlm, L.; Henjum, S.; Alexander, J.; Korneliussen, P.A.; Madsen, L.; Kjellevold, M. Commercially available kelp and seaweed products-valuable iodine source or risk of excess intake? Food Nutr. Res. 2021, 65, 7584. [Google Scholar] [CrossRef]
  8. Banach, J.L.; Koch, S.J.I.; Hoffmans, Y.; van den Burg, S.W.K. Seaweed value chain stakeholder perspectives for food and environmental safety hazards. Foods 2022, 11, 1514. [Google Scholar] [CrossRef]
  9. Prashant, N.; Sangwan, M.; Singh, P.; Das, P.; Srivastava, U.; Bast, F. Ati-nutritional factors and heavy metals in edible seaweeds: Challenges, health implications, and strategies for safer consumption. J. Food Compos. Anal. 2025, 140, 107283. [Google Scholar]
  10. Lin, H.; Zhou, Q.; Mu, Q.G.; Fu, X.T. Quality safety issues and countermeasures of Chinese seaweed food. J. Food Sci. Technol. 2015, 33, 8–11. [Google Scholar]
  11. Nie, J.L.; Fu, X.T.; Wang, L.; Xu, J.C.; Gao, X. Impact of Monascus purpureus fermentation on antioxidant activity, free amino acid profiles and flavor properties of kelp (Saccharina japonica). Food Chem. 2023, 400, 133990. [Google Scholar] [CrossRef] [PubMed]
  12. Duarte, B.; Mamede, R.; Caçador, I.; Melo, R.; Fonseca, V.F. Trust your seaweeds: Fine-scale multi-elemental traceability of edible seaweed species harvested within an estuarine system. Algal Res. 2023, 70, 102975. [Google Scholar] [CrossRef]
  13. Toyoda, K.; Wu, H.; Aktar, Z. Europium anomaly as a robust geogenic fingerprint for the geographical origin of aquatic products: A case study of nori (Neopyropia yezoensis) from the Japanese market. Food Chem. 2025, 464, 141719. [Google Scholar] [CrossRef] [PubMed]
  14. Yang, Y.; Yang, L.C.; He, S.Y.; Cao, X.Q.; Huang, J.M.; Ji, X.L.; Tong, H.B.; Zhang, X.; Wu, M.J. Use of near-infrared spectroscopy and chemometrics for fast discrimination of Sargassum fusiforme. J. Food Compos. Anal. 2022, 110, 104537. [Google Scholar] [CrossRef]
  15. Suzuki, Y.; Kokubun, A.; Edura, T.; Nakayama, K. Tracing the Geographical Origin of Blanched and Salted Wakame (Undaria Pinnatifida) from Japan (Naruto and Sanriku), China, and South Korea, Based on Stable Carbon, Nitrogen, and Oxygen Isotopic Composition. Nippon Shokuhin Kagaku Kogaku Kaishi 2013, 60, 1–10. (In Japanese) [Google Scholar] [CrossRef]
  16. Yao, L.Y.; Liang, Y.; Sun, M.; Song, S.Q.; Wang, H.T.; Dong, Z.B.; Feng, T.; Yue, H. Characteristic volatile fingerprints of three edible marine green algae (Ulva spp.) in China by HS-GC-IMS and evaluation of the antioxidant bioactivities. Food Res. Int. 2024, 162, 112109. [Google Scholar] [CrossRef]
  17. Zhao, Z.Y.; Lian, F.Y.; Jiang, Y.J. Recognition of rice species based on gas chromatography-ion mobility spectrometry and deep learning. Agriculture 2024, 14, 1552. [Google Scholar] [CrossRef]
  18. Feng, Y.H.; Wang, Y.; Beykal, B.; Qiao, M.Y.; Xiao, Z.L.; Luo, Y.C. A mechanistic review on machine learning-supported detection and analysis of volatile organic compounds for food quality and safety. Trends Food Sci. Technol. 2024, 143, 104297. [Google Scholar] [CrossRef]
  19. Tseng, Y.J.; Chuang, P.J.; Appell, M. When machine learning and deep learning come to the big data in food chemistry. ACS Omega 2023, 8, 15854–15864. [Google Scholar] [CrossRef]
  20. Deng, Z.W.; Wang, T.; Zheng, Y.; Zhang, W.L.; Yun, Y.H. Deep learning in food authenticity: Recent advances and future trends. Trends Food Sci. Technol. 2024, 144, 104344. [Google Scholar] [CrossRef]
  21. Kiranyaz, S.; Avci, O.; Abdeljaber, O.; Ince, T.; Gabbouj, M.; Inman, D.J. 1D convolutional neural networks and applications: A survey. Mech. Syst. Signal Process. 2021, 151, 107398. [Google Scholar]
  22. Sun, Y.; Liu, N.; Zhao, L.; Liu, Q.; Wang, S.S.; Sun, G.H.; Zhao, Y.F.; Zhou, D.Q.; Cao, R. Attenuated total reflectance-flourier transformed infrared spectroscopy (ATR-FTIR) coupled with deep learning: A rapid method for geographical origin identification of sea cucumber Apostichopus japonicas. Microchem. J. 2024, 204, 111037. [Google Scholar]
  23. Jiang, X.N.; Liu, Q.C.; Yan, L.; Cao, X.D.; Chen, Y.; Wei, Y.Q.; Wang, F.; Xiong, H. Hyperspectral imaging combined with spectral-imagery feature fusion convolutional neural network to discriminate different geographical origins of wolfberries. J. Food Compos. Anal. 2024, 132, 106259. [Google Scholar]
  24. Zhao, X.; Liu, X.; Xie, P.X.; Ma, J.Y.; Shi, Y.N.; Jiang, H.Z.; Zhao, Z.L.; Wang, X.Y.; Li, C.H.; Yang, J. Identification of geographical origin of semen ziziphi spinosae based on hyperspectral imaging combined with convolutional neural networks. Infrared Phys. Technol. 2024, 136, 104982. [Google Scholar] [CrossRef]
  25. Chu, Y.H.; Wu, J.J.; Yan, Z.; Zhao, Z.Z.; Xu, D.M.; Wu, H. Towards generalizable food source identification: An explainable deep learning approach to rice authentication employing stable isotope and elemental marker analysis. Food Res. Int. 2024, 179, 113967. [Google Scholar] [CrossRef]
  26. Lundberg, S.M.; Lee, S.L. A unified approach to interpreting model predictions. In Proceedings of the 31st Conference on Neural Information Processing Systems NIPS, Long Beach, CA, USA; 2017. [Google Scholar]
  27. Tan, A.D.; Zhou, F.T.; Chen, H. Post-hoc part-prototype networks. In Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria; 2024; Volume 235. [Google Scholar]
  28. Kang, X.M.; Zhao, Y.F.; Yao, L.; Tan, Z.J. Explainable machine learning for predicting the geographical origin of Chinese Oysters via mineral elements analysis. Curr. Res. Food Sci. 2024, 8, 100738. [Google Scholar]
  29. Ministry of Agriculture of China (MOAC). China Fisheries Yearbook; China Agriculture Publisher: Beijing, China, 2024. [Google Scholar]
  30. Wei, S.B.; Wu, Q.; Wang, Z.M.; Yu, X.L.; Jiao, J.; Dong, X.P. Determination of key volatile fishy substances of sea cucumber powder during the processing and their removal by supercritical fluid extraction. Food Res. Int. 2024, 190, 114603. [Google Scholar] [CrossRef]
  31. Li, W.Q.; Chen, Y.P.; Blank, I.; Li, F.Y.; Li, C.B.; Liu, Y. GC × GC-ToF-MS and GC-IMS based volatile profile characterization of the Chinese dry cured hams from different regions. Food Res. Int. 2021, 142, 110222. [Google Scholar]
  32. Yao, L.Y.; Sun, J.Y.; Liang, Y.; Feng, T.; Wang, H.T.; Sun, M.; Yu, W.G. Volatile fingerprints of Torreya grandis hydrosols under different downstream processes using HS-GC–IMS and the enhanced stability and bioactivity of hydrosols by high pressure homogenization. Food Control 2022, 139, 109058. [Google Scholar]
  33. Kapoor, S.; Narayanan, A. Leakage and the reproducibility crisis in machine learning-based science. Patterns 2023, 4, 100804. [Google Scholar] [CrossRef]
  34. Singh, D.; Singh, B. Feature wise normalization: An effective way of normalizing data. Pattern Recognit. 2022, 122, 108307. [Google Scholar]
  35. Li, Z.K.; Lan, Y.F.; Lin, W.W. Footbridge damage detection using smartphone-recorded responses of micromobility and convolutional neural networks. Autom. Constr. 2024, 166, 105587. [Google Scholar] [CrossRef]
  36. Lauritsen, S.M.; Kristensen, M.; Olsen, M.V.; Larsen, M.S.; Lauritsen, K.M.; Jørgensen, M.J.; Lange, J.; Thiesson, B. Explainable artificial intelligence model to predict acute critical illness from electronic health records. Nat. Commun. 2020, 11, 3852. [Google Scholar] [PubMed]
  37. Pradhan, B.; Lee, S.; Dikshit, A.; Kim, H. Spatial flood susceptibility mapping using an explainable artificial intelligence (XAI) model. Geosci. Front. 2023, 14, 101625. [Google Scholar]
  38. Lundberg, S.M.; Nair, B.; Vavilala, M.S.; Horibe, M.; Eisses, M.J.; Adams, T.; Liston, D.E.; Low, D.K.W.; Newman, S.F.; Kim, J.; et al. Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nat. Biomed. Eng. 2018, 2, 749–760. [Google Scholar]
  39. Wei, R.; Jiang, B.; Chen, J.J.; Xiang, L.B.; Liu, X.Y. Removal of fishy flavor in kelp (Laminaria japonica) by natural antioxidant soaking combined with microbial fermentation. Food Biosci. 2024, 60, 104212. [Google Scholar]
  40. Jiang, H.J.; Dai, A.N.; Yan, L.Q.; Zhang, Z.C.; Ding, B.; Bai, J.L.; Yang, J.T.; Gao, D.D.; Liu, H.N. Analysis of volatile organic compounds (VOCs) in yak ghee from different pastoral areas of China based on GC-IMS. Int. Dairy J. 2025, 160, 106098. [Google Scholar]
  41. Jiang, S.; Jiang, P.F.; Feng, D.D.; Jin, M.R.; Qi, H. Characterization of flavor substances in cooking and seasoned cooking brown seaweeds by GC-IMS and E-nose. Food Chem. X 2024, 22, 101325. [Google Scholar]
  42. Zhu, W.Y.; Jiang, B.; Zhong, F.; Chen, J.J.; Zhang, T. Effect of microbial fermentation on the fishy-odor compounds in kelp (Laminaria japonica). Foods 2021, 10, 2532. [Google Scholar] [CrossRef]
  43. Li, S.; Hu, M.J.; Tong, Y.P.; Xia, Z.Y.; Tong, Y.C.; Sun, Y.Q.; Cao, J.X.; Zhang, J.H.; Liu, J.L.; Zhao, S.; et al. A review of volatile compounds in edible macroalgae. Food Res. Int. 2023, 165, 112559. [Google Scholar]
  44. Seo, Y.S.; Bae, H.N.; Eom, S.H.; Lim, K.S.; Yun, I.H.; Chuang, Y.H.; Jeon, J.M.; Kim, H.W.; Lee, M.S.; Lee, Y.B.; et al. Removal of off-flavors from sea tangle (Laminaria japonica) extract by fermentation with Aspergillus oryzae. Bioresour. Technol. 2012, 121, 475–479. [Google Scholar] [CrossRef] [PubMed]
  45. Srianta, I.; Ristiarini, S.; Nugerahani, I.; Sen, S.K.; Zhang, B.B.; Xu, G.R.; Blanc, P.J. Recent research and development of Monascus fermentation products. Int. Food Res. J. 2014, 21, 1–12. [Google Scholar]
  46. Wang, C.M.; Yu, J.W.; Gallagher, D.L.; Byrd, J.; Yao, W.C.; Wang, Q.; Guo, Q.Y.; Dietrich, A.M.; Yang, M. Pyrazines: A diverse class of earthy-musty odorants impacting drinking water quality and consumer satisfaction. Water Res. 2020, 182, 115971. [Google Scholar] [CrossRef]
  47. Mirzayeva, A.; Castro, R.; Barroso, C.G.; Durán-Guerrero, E. Characterization and differentiation of seaweeds on the basis of their volatile composition. Food Chem. 2021, 336, 127725. [Google Scholar] [CrossRef]
  48. Zhang, H.W.; Li, Y.J.; Cao, W.; Li, J.W.; Zhou, X.J.; Qiu, Z.M.; Chen, T.T.; Xu, Y.J.; Sun, L.Q.; Cui, Y.M. Study on the origin traceability of Laminaria japonica based on fatty acid fingerprints. Lab. Test. 2023, 1, 11–20. (In Chinese) [Google Scholar]
  49. Ekanayake, I.U.; Meddage, D.P.P.; Rathnayake, U. A novel approach to explain the black-box nature of machine learning in compressive strength predictions of concrete using Shapley additive explanations (SHAP). Case Stud. Constr. Mater. 2022, 16, e01059. [Google Scholar] [CrossRef]
  50. Lundberg, S.M.; Erion, G.; Chen, H.; Degrave, A.; Prutkin, J.M.; Nair, B.; Katz, R.; Himmelfarb, J.; Bansal, N.; Lee, S.I. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2020, 2, 56–67. [Google Scholar] [CrossRef]
  51. Huang, J.X.; Li, Z.; Zhang, W.; Lv, Z.Y.; Dong, S.Y.; Feng, Y.; Liu, R.X.; Zhao, Y. Explainable machine learning-assisted origin identification: Chemical profiling of five lotus (Nelumbo nucifera Gaertn.) parts. Food Chem. 2023, 404, 134517. [Google Scholar] [CrossRef]
Figure 1. Spatial distribution of kelp samples collected from three provinces in China.
Figure 2. The architecture of the proposed 1D-CNN model for identification of kelp.
Figure 3. Gas chromatography–ion mobility spectrometry spectra of VOCs. The left panel: Rongcheng kelp; middle panel: Dalian kelp; right panel: Xiapu kelp. Note: The background of the figure is blue, and the color indicates the peak intensity of the VOCs, with deeper colors from blue to red indicating higher peak intensity. A comparison is given with reference sample spectra (Rongcheng kelp), with red or blue indicating VOC concentrations higher or lower than the reference, while white indicates the same concentration as the reference.
Figure 4. Gallery plot of VOCs in kelp from different origins. M: monomer; D: dimer. Each row represents one sample and each column represents a VOC. The upper 30 samples are from Rongcheng, the middle 30 samples are from Dalian, and the bottom 30 samples are from Xiapu.
Figure 5. Word clouds of odor descriptors for VOCs in kelp.
Figure 6. Performance of 1D-CNN model: (a) accuracy and cross-entropy loss curves during training; (b) receiver operating characteristic (ROC) curves of the model for each class; (c) confusion matrix of test set for each class.
Figure 7. Global interpretation of the 1D-CNN model based on SHAP values: (a) the feature importance for the 1D-CNN model; (b–d) the feature contributions for predicting samples from Rongcheng, Dalian, and Xiapu, respectively. The color bar, ranging from blue to red, indicates the magnitude of feature values from low to high. Additionally, the position of the points on the horizontal axis denotes the positive or negative association between the features and target variables (refer to Table 1 for feature names).
Figure 8. Local interpretation based on SHAP values for instances selected from (a) Rongcheng, (b) Dalian, and (c) Xiapu. Red feature attributions push the prediction higher than the base value (i.e., the mean model prediction over the training dataset), while blue feature attributions push the prediction lower. The size of the bars depicts importance (refer to Table 1 for feature names).
Table 1. GC-IMS integration parameters of volatile compounds in kelp from different cities.
| Features | Compound | RI | Rt [sec] | Dt [a.u.] | Signal Intensity (Rongcheng) | Signal Intensity (Dalian) | Signal Intensity (Xiapu) | p Value | Odor Description |
|---|---|---|---|---|---|---|---|---|---|
| Acids | | | | | | | | | |
| A6 | 3-Methyl butanoic acid-M | 1704.9 | 1517.861 | 1.23403 | 1755.2 ± 3254.1 a | 707.6 ± 102.0 a | 1758.4 ± 936.9 a | 0.062 | sour, foot sweat, cheese |
| A7 | 3-Methyl butanoic acid-D | 1705.2 | 1519.095 | 1.49786 | 1254.2 ± 1993.5 a | 1088.8 ± 191.3 a | 1232.0 ± 210.2 a | 0.837 | |
| A5 | 2-Methylbutanoic acid | 1704.2 | 1515.632 | 1.20697 | 917.1 ± 1550.6 a | 470.9 ± 96.8 a | 880.5 ± 445.2 a | 0.127 | pungent and spicy cheese, fruity |
| A8 | 2-Methyl propanoic acid-M | 1578.1 | 1136.913 | 1.15979 | 1705.9 ± 2698.7 ab | 864.3 ± 177.4 b | 2534.7 ± 1173.2 a | 0.001 | yogurt, rancid cream |
| A9 | 2-Methyl propanoic acid-D | 1579 | 1139.117 | 1.38345 | 1026.4 ± 1904.3 a | 615.5 ± 110.0 a | 815.3 ± 369.7 a | 0.370 | |
| A21 | Acetic acid | 1470.1 | 888.662 | 1.05409 | 5158.9 ± 378.3 b | 4404.3 ± 402.2 c | 8421.3 ± 728.6 a | <0.001 | spicy |
| A2 | Propanoic acid-M | 1548.1 | 1061.727 | 1.1132 | 3851.4 ± 1081.0 a | 1395.8 ± 210.0 b | 3565.2 ± 1331.5 a | <0.001 | yogurt, vinegar |
| A12 | Propanoic acid-D | 1548.1 | 1061.727 | 1.27443 | 698.1 ± 401.3 a | 248.5 ± 36.8 b | 548.7 ± 465.0 a | <0.001 | |
| A11 | 1-Butanoic acid | 1647.1 | 1330.54 | 1.17401 | 442.5 ± 247.0 b | 390.6 ± 71.5 b | 739.5 ± 483.0 a | <0.001 | strong acetic acid, cheese, butter, fruity |
| Pyrazines | | | | | | | | | |
| A16 | 2,3-Dimethyl-5-ethylpyrazine-D | 1507.9 | 968.664 | 1.75452 | 3989.6 ± 1749.1 a | 1442.4 ± 280.9 b | 994.2 ± 421.7 b | <0.001 | burnt popcorn, roasted cocoa |
| A15 | 2,3-Dimethyl-5-ethylpyrazine-M | 1510.5 | 974.378 | 1.23681 | 5998.3 ± 825.3 a | 4251.7 ± 379.3 b | 3192.7 ± 922.7 c | <0.001 | |
| A23 | 2,3,5,6-Tetramethylpyrazine | 1460.7 | 869.986 | 1.2092 | 1148.4 ± 203.0 a | 1002.0 ± 167.5 b | 374.9 ± 122.7 c | <0.001 | beef, fermented soy |
| Aldehydes | | | | | | | | | |
| A19 | (E, E)-2,4-Heptadienal-M | 1480.8 | 910.703 | 1.20277 | 4181.0 ± 489.5 a | 3231.2 ± 228.4 b | 1916.9 ± 776.0 c | <0.001 | fatty, oily, aldehyde, vegetable, cinnamon |
| A20 | (E, E)-2,4-Heptadienal-D | 1480.4 | 909.887 | 1.63091 | 3028.0 ± 1261.9 a | 1200.4 ± 236.6 b | 674.4 ± 375.4 c | <0.001 | |
| A17 | Benzaldehyde-M | 1500.4 | 952.337 | 1.15978 | 1759.6 ± 206.0 a | 1555.0 ± 118.3 b | 1395.3 ± 236.2 c | <0.001 | bitter almond, cherry, nutty |
| A55 | Benzaldehyde-D | 1498.5 | 948.255 | 1.47865 | 503.9 ± 184.6 a | 306.3 ± 35.2 b | 276.6 ± 86.8 b | <0.001 | |
| A24 | (E)-2-Octenal-M | 1427.8 | 807.027 | 1.33713 | 2655.6 ± 374.3 a | 2123.9 ± 236.8 b | 1372.5 ± 726.6 c | <0.001 | fresh cucumber, fatty, green herbal, banana, green leaf |
| A25 | (E)-2-Octenal-D | 1427.4 | 806.211 | 1.82976 | 1602.5 ± 668.1 a | 845.8 ± 247.3 b | 526.1 ± 459.1 c | <0.001 | |
| A28 | 1-Nonanal-M | 1397 | 752.332 | 1.48402 | 1931.3 ± 237.0 a | 1701.6 ± 271.8 b | 1782.3 ± 424.6 ab | 0.023 | rose, citrus, strong oily |
| A29 | 1-Nonanal-D | 1398.9 | 755.597 | 1.95516 | 741.1 ± 298.0 a | 454.5 ± 114.7 b | 463.1 ± 242.6 b | <0.001 | |
| A14 | 5-Methyl furfural | 1557.9 | 1085.791 | 1.13658 | 206.6 ± 54.4 b | 170.7 ± 26.8 b | 306.7 ± 177.0 a | <0.001 | spices, caramel wood |
| A27 | (E, E)-2,4-Hexadienal-M | 1405.7 | 767.443 | 1.12077 | 1291.0 ± 218.0 a | 902.4 ± 115.6 b | 498.5 ± 199.8 c | <0.001 | sweet, green, floral, citrus |
| A57 | (E, E)-2,4-Hexadienal-D | 1406.5 | 768.733 | 1.45577 | 595.3 ± 302.4 a | 216.9 ± 61.0 b | 113.6 ± 70.8 c | <0.001 | |
| A63 | (E)-2-Heptenal-M | 1336.2 | 654.959 | 1.25583 | 744.0 ± 124.3 a | 768.4 ± 140.0 a | 720.2 ± 177.3 a | 0.458 | spicy, green vegetables, fresh, fatty |
| A49 | (E)-2-Heptenal-D | 1336.2 | 654.959 | 1.67623 | 3101.0 ± 865.8 a | 1952.0 ± 390.4 b | 1001.2 ± 761.1 c | <0.001 | |
| A72 | Heptaldehyde-M | 1191.2 | 419.599 | 1.34079 | 1518.4 ± 210.9 b | 1708.4 ± 137.0 a | 1559.6 ± 153.1 b | <0.001 | fresh, aldehyde, fatty, green herbs, wine, fruity |
| A73 | Heptaldehyde-D | 1191.2 | 419.599 | 1.70179 | 2101.4 ± 269.1 a | 1725.6 ± 187.8 b | 1276.9 ± 539.7 c | <0.001 | |
| A76 | 1-Hexanal | 1094.8 | 297.552 | 1.56891 | 9942.1 ± 532.5 a | 10087.0 ± 215.3 a | 6551.3 ± 2462.1 b | <0.001 | fresh, green, fat, fruity |
| A82 | n-Pentanal | 994.8 | 219.957 | 1.42916 | 3180.9 ± 171.2 a | 3016.9 ± 121.4 a | 2361.7 ± 528.9 b | <0.001 | green grassy, faint banana, pungent |
| A101 | 3-Methyl butanal | 925.3 | 183.139 | 1.4087 | 63.9 ± 24.0 b | 71.6 ± 12.0 b | 1043.7 ± 991.7 a | <0.001 | chocolate, fat |
| A97 | (E)-2-Methyl-2-butenal-M | 1108.9 | 312.596 | 1.09283 | 581.3 ± 103.0 c | 678.0 ± 59.0 b | 919.1 ± 177.0 a | <0.001 | |
| A92 | (E)-2-Methyl-2-butenal-D | 1107.8 | 311.409 | 1.34962 | 3436.3 ± 1152.5 a | 2576.4 ± 718.5 b | 1381.0 ± 511.7 c | <0.001 | |
| A110 | (E)-2-Pentenal-M | 1143.2 | 353.373 | 1.1042 | 243.0 ± 58.7 c | 433.2 ± 57.0 b | 575.4 ± 151.6 a | <0.001 | potato, peas |
| A112 | (E)-2-Pentenal-D | 1142.2 | 352.186 | 1.36666 | 2885.9 ± 284.6 a | 2944.2 ± 215.2 a | 2250.4 ± 557.5 b | <0.001 | |
| A46 | (E)-2-Hexen-1-al-M | 1224.9 | 470.305 | 1.18185 | 2112.4 ± 282.8 c | 2444.0 ± 112.4 b | 2813.3 ± 334.5 a | <0.001 | green, banana, fat |
| A47 | (E)-2-Hexen-1-al-D | 1226.6 | 473.163 | 1.52571 | 5560.2 ± 1109.4 a | 3992.7 ± 766.5 b | 4026.1 ± 1971.6 b | <0.001 | |
| A70 | 3-Methyl-2-butenal-M | 1207.6 | 443.624 | 1.09281 | 688.1 ± 123.6 a | 536.8 ± 61.3 c | 604.1 ± 124.9 b | <0.001 | fruity |
| A71 | 3-Methyl-2-butenal-D | 1208.5 | 445.053 | 1.36405 | 628.5 ± 303.6 a | 250.7 ± 53.7 b | 253.5 ± 204.1 b | <0.001 | |
| A91 | Propanal | 800.4 | 131.726 | 1.1456 | 5194.1 ± 136.1 a | 5008.8 ± 217.8 a | 4134.7 ± 801.0 b | <0.001 | pungent, green grassy |
| A79 | (E)-2-Butenal | 1057.6 | 265.853 | 1.20228 | 4939.1 ± 931.4 a | 3604.7 ± 280.9 b | 2657.7 ± 818.8 c | <0.001 | |
| A38 | 1-Octanal-M | 1295.2 | 596.509 | 1.40975 | 1473.4 ± 229.0 a | 1288.3 ± 199.1 b | 899.4 ± 278.4 c | <0.001 | aldehyde, waxy, citrus, orange, fruity, fatty |
| A39 | 1-Octanal-D | 1297.6 | 599.763 | 1.82402 | 697.4 ± 316.2 a | 384.0 ± 118.7 b | 289.6 ± 167.4 b | <0.001 | |
| A93 | 2-Butenal, 2-methyl-D | 1120.2 | 325.542 | 1.3698 | 1038.4 ± 475.9 a | 611.4 ± 132.7 b | 184.7 ± 163.3 c | <0.001 | |
| A96 | 2-Butenal, 2-methyl-M | 1122.3 | 327.897 | 1.11449 | 463.3 ± 100.0 a | 514.6 ± 112.2 a | 275.9 ± 94.9 b | <0.001 | |
| Alcohols | | | | | | | | | |
| A22 | 1-Octen-3-ol-M | 1457 | 862.538 | 1.16157 | 4006.9 ± 129.7 b | 4000.7 ± 264.4 b | 4565.8 ± 390.3 a | <0.001 | mushroom, lavender, rose, hay |
| A59 | 1-Octen-3-ol-D | 1457 | 862.538 | 1.60583 | 1842.4 ± 383.1 a | 1265.9 ± 231.0 b | 1257.5 ± 474.7 b | <0.001 | |
| A4 | Linalool | 1549.8 | 1065.809 | 1.22785 | 1905.6 ± 582.5 a | 1153.8 ± 176.9 b | 1182.1 ± 286.8 b | <0.001 | citrus, rose, woody, blueberry |
| A80 | 1-Propanol | 1046.1 | 256.775 | 1.25872 | 1757.2 ± 330.5 a | 1844.2 ± 187.8 a | 1105.3 ± 409.1 b | <0.001 | alcohol, pungent |
| A107 | 1-Penten-3-ol | 1166.9 | 384.649 | 0.93944 | 2660.7 ± 253.7 c | 3093.6 ± 213.5 b | 3302.9 ± 204.0 a | <0.001 | ethereal, green, tropical fruity |
| A42 | 1-Pentanol-M | 1259 | 527.955 | 1.25583 | 2333.4 ± 400.3 c | 2918.4 ± 261.3 b | 3308.2 ± 547.3 a | <0.001 | balsamic |
| A43 | 1-Pentanol-D | 1258.3 | 526.526 | 1.5216 | 4883.0 ± 321.2 a | 4574.2 ± 552.3 ab | 4276.7 ± 860.1 b | 0.001 | |
| A85 | Ethanol | 940.9 | 190.799 | 1.13695 | 3377.6 ± 347.3 b | 3790.7 ± 595.2 a | 2907.5 ± 625.4 c | <0.001 | aromaticity |
| A113 | 1-Propanol, 2-methyl | 1102.8 | 305.806 | 1.17346 | 43.4 ± 6.17 b | 58.3 ± 17.1 b | 195.2 ± 114.6 a | <0.001 | fresh, alcoholic, leather |
| A75 | Butanol-M | 1153.4 | 366.495 | 1.18517 | 626.2 ± 103.7 c | 936.1 ± 132.4 b | 1324.6 ± 202.1 a | <0.001 | wine |
| A100 | Butanol-D | 1153 | 366.058 | 1.39075 | 431.5 ± 84.6 c | 584.5 ± 108.4 b | 690.2 ± 101.0 a | <0.001 | |
| A60 | 2-Furanmethanethiol-M | 1436.6 | 823.458 | 1.10216 | 340.1 ± 41.1 c | 583.2 ± 129.5 b | 1293.7 ± 510.0 a | <0.001 | sulfury, coffee, fat, smoky |
| A69 | 2-Furanmethanethiol-D | 1436.1 | 822.467 | 1.35824 | 111.4 ± 14.5 b | 126.0 ± 16.2 b | 346.4 ± 244.0 a | <0.001 | |
| A32 | 1-Hexanol | 1369.8 | 707.132 | 1.32947 | 613.9 ± 162.7 a | 466.2 ± 80.7 b | 548.7 ± 138.6 a | <0.001 | fresh, fruity, wine, sweet, green |
| A41 | 3-Methyl-3-buten-1-ol | 1267.3 | 542.825 | 1.17532 | 945.4 ± 190.5 a | 847.2 ± 198.1 a | 624.4 ± 212.7 b | <0.001 | sweet, fruity |
| A65 | 1-Butanol, 3-methyl | 1213.6 | 452.748 | 1.24885 | 379.4 ± 116.5 b | 373.0 ± 79.8 b | 606.4 ± 163.8 a | <0.001 | whiskey, banana, fruity |
| Ether | | | | | | | | | |
| A53 | 2-Furfurylmethylsulfide | 1529.9 | 1018.461 | 1.14007 | 669.3 ± 369.7 a | 374.0 ± 109.5 b | 256.2 ± 74.5 c | <0.001 | pungent, onion, garlic |
| A94 | Allyl sulfide-M | 1135.3 | 343.522 | 1.12292 | 534.2 ± 102.8 c | 657.0 ± 142.8 b | 727.3 ± 155.4 a | <0.001 | garlic |
| A111 | Allyl sulfide-D | 1136.9 | 345.475 | 1.32433 | 3624.2 ± 608.1 a | 2616.3 ± 410.0 b | 1835.6 ± 330.6 c | <0.001 | |
| Furan | | | | | | | | | |
| A18 | 2-Acetylfuran | 1489.8 | 929.479 | 1.1132 | 1897.4 ± 335.0 a | 1626.7 ± 113.4 b | 1778.7 ± 631.9 a | 0.047 | fatty, sweet, caramel, nutty, tobacco |
| A13 | 2-Acetyl-5-methylfuran | 1623.4 | 1260.667 | 1.16629 | 700.2 ± 146.5 a | 553.7 ± 76.2 b | 740.9 ± 200.1 a | <0.001 | nut |
| A45 | 2-Pentyl furan | 1235.8 | 488.056 | 1.254 | 2734.8 ± 552.9 a | 2467.0 ± 627.9 ab | 2174.8 ± 1138.2 b | 0.033 | bean, fruity, earthy, green, vegetable |
| Esters | | | | | | | | | |
| A30 | Allyl isothiocyanate-M | 1382.5 | 727.841 | 1.09708 | 2224.6 ± 509.5 a | 1789.8 ± 317.3 b | 1168.6 ± 388.0 c | <0.001 | sulfur, pungent, garlic |
| A31 | Allyl isothiocyanate-D | 1382 | 727.025 | 1.36758 | 802.9 ± 394.7 a | 353.6 ± 186.4 b | 201.9 ± 196.2 c | <0.001 | |
| A78 | Acetic acid butyl ester-M | 1080.8 | 285.259 | 1.23975 | 1187.5 ± 136.8 a | 1062.5 ± 112.8 b | 629.4 ± 347.9 c | <0.001 | fruity |
| A77 | Acetic acid butyl ester-D | 1080.2 | 284.688 | 1.62402 | 804.8 ± 349.3 a | 460.0 ± 148.3 b | 141.8 ± 111.0 c | <0.001 | |
| A104 | Butyl propanoate | 1148.6 | 360.367 | 1.7279 | 397.2 ± 249.3 a | 240.8 ± 71.5 b | 223.0 ± 143.2 b | <0.001 | earthy, sweet rose |
| A88 | Acetic acid propyl ester | 958.3 | 199.773 | 1.47595 | 623.0 ± 170.8 a | 351.2 ± 61.7 b | 182.8 ± 60.9 c | <0.001 | fruity, pear |
| A87 | Methyl propanoate | 922.7 | 181.873 | 1.32964 | 642.4 ± 165.9 b | 636.7 ± 55.9 b | 1541.0 ± 505.2 a | <0.001 | fruit, rum |
| A105 | Butyl 2-propenoate | 1180.4 | 403.735 | 1.69254 | 421.9 ± 236.4 a | 190.8 ± 53.8 b | 162.1 ± 94.7 b | <0.001 | pungent, fruity |
| Ketones | | | | | | | | | |
| A26 | D-Fenchone | 1410.5 | 775.827 | 1.29762 | 1482.5 ± 200.8 a | 1101.7 ± 159.2 b | 876.2 ± 228.1 c | <0.001 | |
| A35 | 1-Hydroxy-2-propanone-M | 1314.8 | 623.695 | 1.06882 | 2376.7 ± 500.7 b | 2584.8 ± 277.9 b | 3100.3 ± 547.8 a | <0.001 | pungent, caramel, fresh |
| A36 | 1-Hydroxy-2-propanone-D | 1315.2 | 624.321 | 1.23408 | 2912.7 ± 1080.5 b | 3798.6 ± 2480.9 a | 2493.5 ± 479.2 b | 0.007 | |
| A37 | 2-Butanone, 3-hydroxy-M | 1292.5 | 591.181 | 1.07027 | 1147.4 ± 267.3 b | 1091.4 ± 255.2 b | 1898.0 ± 402.0 a | <0.001 | butter, cream |
| A52 | 2-Butanone, 3-hydroxy-D | 1292.8 | 591.806 | 1.33411 | 867.2 ± 137.5 b | 693.9 ± 51.9 b | 1209.0 ± 612.9 a | <0.001 | |
| A74 | 2-Heptanone-D | 1187.5 | 414.183 | 1.63388 | 1386.2 ± 415.6 a | 1069.9 ± 337.6 b | 917.3 ± 553.1 b | <0.001 | pear, banana, fruity, slight medicinal fragrance |
| A99 | 2-Heptanone-M | 1187 | 413.41 | 1.26216 | 1115.5 ± 113.4 ab | 1176.8 ± 68.1 a | 1083.2 ± 203.7 b | 0.036 | |
| A81 | 1-Penten-3-one | 1034.7 | 248.066 | 1.31212 | 3760.1 ± 1135.9 a | 2606.2 ± 339.6 b | 1217.0 ± 861.9 c | <0.001 | strong pungent odors |
| A83 | 2-Pentanone | 989.3 | 216.79 | 1.3678 | 3361.5 ± 783.3 a | 2641.8 ± 589.5 b | 1946.6 ± 995.9 c | <0.001 | acetone, fresh, sweet fruity, wine |
| A95 | 3-Penten-2-one-M | 1135.9 | 344.268 | 1.07465 | 160.7 ± 54.4 b | 268.5 ± 47.8 a | 263.6 ± 39.5 a | <0.001 | fruity, turns into spicy during storage |
| A98 | 3-Penten-2-one-D | 1138.4 | 347.435 | 1.34621 | 5146.3 ± 484.6 a | 4336.9 ± 242.5 b | 2673.7 ± 890.8 c | <0.001 | |
| A90 | 2-Propanone | 831.1 | 142.855 | 1.11774 | 11380.0 ± 788.2 a | 11109.0 ± 518.0 a | 10040.0 ± 1425.8 b | <0.001 | acetone, fresh, sweet fruity, wine |
| A86 | 2-Butanone | 912.6 | 177.101 | 1.24839 | 5011.7 ± 897.3 a | 4299.3 ± 831.5 b | 3314.8 ± 1568.1 c | <0.001 | fruity, camphor |
| A50 | 1-Octen-3-one | 1313.6 | 621.99 | 1.68755 | 1028.1 ± 297.8 a | 676.2 ± 209.9 b | 381.8 ± 316.0 c | <0.001 | strong earthy, mushroom, vegetable, fishy, chicken |
| A56 | 2-Octanone | 1289.3 | 584.723 | 1.76483 | 257.4 ± 113.9 a | 165.5 ± 56.6 b | 139.2 ± 48.2 b | <0.001 | moldy, ketone, milk, cheese, mushroom |
| A64 | 2-Hydroxy-2-methyl-4-pentanone | 1369 | 705.786 | 1.13888 | 231.7 ± 31.4 b | 187.1 ± 26.1 b | 354.4 ± 155.3 a | <0.001 | mild, pleasant |
| A103 | 4-Methyl-2-pentanone | 1013.6 | 232.746 | 1.17473 | 501.6 ± 64.0 b | 535.1 ± 96.0 b | 781.5 ± 242.5 a | <0.001 | ketone |
| A115 | 2,3 Butanedione | 980.5 | 211.786 | 1.17268 | 325.6 ± 67.6 a | 367.8 ± 101.0 a | 338.2 ± 62.4 a | 0.110 | butter, popcorn, sweet taste, sour rice |
| Others | | | | | | | | | |
| A66 | Dimethyl trisulfide | 1376.2 | 717.487 | 1.30222 | 141.8 ± 48.4 b | 137.6 ± 65.4 b | 306.8 ± 250.8 a | <0.001 | fresh onion, mint, spicy |
| A67 | 2,4,5-Trimethylthiazole | 1389.7 | 739.957 | 1.15652 | 375.2 ± 137.0 a | 230.0 ± 36.8 b | 122.1 ± 31.0 c | <0.001 | cocoa, chocolate, caramel, nutty |
| A109 | 1,2-Dimethylbenzene | 1228.7 | 476.488 | 1.06399 | 252.5 ± 50.1 a | 279.8 ± 75.5 a | 140.3 ± 27.8 b | <0.001 | geranium |
| A108 | (+)-Limonene | 1202.9 | 436.658 | 1.21284 | 236.8 ± 59.9 b | 451.6 ± 99.5 a | 223.5 ± 32.7 b | <0.001 | lemon, sweet, orange, pine oil |
| A3 | 1 | | | | 1265.5 ± 955.5 a | 797.5 ± 120.4 b | 916.4 ± 207.9 b | 0.006 | |
| A10 | 2 | | | | 648.9 ± 155.0 b | 664.3 ± 91.7 b | 1495.3 ± 535.9 a | <0.001 | |
| A1 | 3 | | | | 2306.4 ± 440.9 a | 2437.1 ± 265.8 a | 2153.8 ± 872.4 a | 0.178 | |
| A44 | 4 | | | | 4517.7 ± 510.3 a | 3394.4 ± 454.0 b | 2735.5 ± 1034.8 c | <0.001 | |
| A33 | 5 | | | | 3059.3 ± 311.8 a | 2951.0 ± 133.8 a | 2306.8 ± 455.2 b | <0.001 | |
| A34 | 6 | | | | 3197.1 ± 890.5 a | 1972.7 ± 345.8 b | 1222.2 ± 669.1 c | <0.001 | |
| A40 | 7 | | | | 859.7 ± 167.3 a | 614.2 ± 83.7 b | 357.3 ± 155.4 c | <0.001 | |
| A48 | 8 | | | | 1113.8 ± 247.8 a | 884.4 ± 123.4 b | 448.4 ± 176.4 c | <0.001 | |
| A51 | 9 | | | | 694.9 ± 129.0 a | 498.8 ± 88.4 b | 344.5 ± 153.5 c | <0.001 | |
| A54 | 10 | | | | 825.4 ± 401.4 a | 380.7 ± 95.6 b | 267.3 ± 76.6 b | <0.001 | |
| A58 | 11 | | | | 1300.1 ± 524.9 a | 648.4 ± 136.5 b | 493.9 ± 156.0 b | <0.001 | |
| A62 | 12 | | | | 472.5 ± 27.4 a | 459.9 ± 97.4 a | 458.2 ± 132.5 a | 0.820 | |
| A61 | 13 | | | | 394.1 ± 89.1 b | 291.7 ± 46.1 b | 850.8 ± 608.7 a | <0.001 | |
| A68 | 14 | | | | 126.0 ± 22.5 b | 122.4 ± 16.0 b | 218.4 ± 125.8 a | <0.001 | |
| A84 | 15 | | | | 2092.8 ± 277.4 b | 2468.1 ± 109.6 a | 2074.8 ± 218.7 b | <0.001 | |
| A102 | 16 | | | | 251.9 ± 39.6 c | 358.6 ± 29.2 b | 434.1 ± 114.8 a | <0.001 | |
| A89 | 17 | | | | 1392.4 ± 120.9 a | 1366.5 ± 72.2 a | 1159.4 ± 326.2 b | <0.001 | |
| A106 | 18 | | | | 1105.1 ± 316.6 a | 783.6 ± 147.0 b | 532.8 ± 220.8 c | <0.001 | |
| A114 | 19 | | | | 446.1 ± 84.7 b | 367.7 ± 36.4 c | 521.9 ± 85.8 a | <0.001 | |
Note: RI represents relative retention index, Rt represents retention time, and Dt represents relative migration time. Signal intensities were shown as mean ± standard deviation (n = 30). Significant differences were determined by Duncan’s test (p < 0.05) and indicated by superscript letters. Odor description obtained from https://www.thegoodscentscompany.com/index.html (accessed on 20 June 2024).
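
The feature selection implied by the p value column can be illustrated with a short filter. This is a sketch under stated assumptions: the function name `is_significant` and the dictionary layout are hypothetical, the p values are copied from a few rows of Table 1, and entries reported as "<0.001" are treated as falling below the 0.05 threshold.

```python
def is_significant(p_text, alpha=0.05):
    """Return True when a p value reported as text is below alpha.

    Handles plain values such as '0.062' and bounded values such as '<0.001'.
    """
    p_text = p_text.strip()
    if p_text.startswith("<"):
        # '<0.001' means the p value lies below the stated bound.
        return float(p_text[1:]) <= alpha
    return float(p_text) < alpha


# p values for a few features, as reported in Table 1.
p_values = {
    "A6": "0.062",     # 3-methyl butanoic acid-M
    "A8": "0.001",     # 2-methyl propanoic acid-M
    "A21": "<0.001",   # acetic acid
    "A28": "0.023",    # 1-nonanal-M
    "A63": "0.458",    # (E)-2-heptenal-M
    "A108": "<0.001",  # (+)-limonene
    "A115": "0.110",   # 2,3-butanedione
}

selected = [name for name, p in p_values.items() if is_significant(p)]
print(selected)
```

Applied to the whole table, eight features have p values of 0.05 or higher, which is consistent with 107 of the 115 VOCs being retained as model inputs.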
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Kang, X.; Tan, Z.; Zhao, Y.; Yao, L.; Sheng, X.; Guo, Y. Explainable Deep Learning to Predict Kelp Geographical Origin from Volatile Organic Compound Analysis. Foods 2025, 14, 1269. https://doi.org/10.3390/foods14071269
