Article

Seismic Facies Recognition Based on Multimodal Network with Knowledge Graph

Department of Petroleum, China University of Petroleum-Beijing at Karamay, Karamay 834000, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(20), 11087; https://doi.org/10.3390/app152011087
Submission received: 15 September 2025 / Revised: 8 October 2025 / Accepted: 14 October 2025 / Published: 16 October 2025

Abstract

Seismic facies recognition constitutes a fundamental task in seismic data interpretation, playing an essential role in characterizing subsurface geological structures, sedimentary environments, and hydrocarbon reservoir distributions. Conventional approaches primarily depend on expert interpretation, which often introduces substantial subjectivity and operational inefficiency. Although deep learning-based methods have been introduced, most rely solely on unimodal data—namely, seismic images—and encounter challenges such as limited annotated samples and inadequate generalization capability. To overcome these limitations, this study proposes a multimodal seismic facies recognition framework named GAT-UKAN, which integrates a U-shaped Kolmogorov–Arnold Network (U-KAN) with a Graph Attention Network (GAT). The model accepts dual-modality inputs: seismic images and a structured knowledge graph. By fusing visual features with knowledge embeddings at intermediate network layers, the model achieves knowledge-guided feature refinement. This approach effectively mitigates issues related to limited samples and poor generalization inherent in single-modality frameworks. Experiments were conducted on the F3 block dataset from the North Sea. A knowledge graph comprising 47 entities and 12 relation types was constructed to incorporate expert knowledge. The results indicate that GAT-UKAN achieved a Pixel Accuracy of 89.7% and a Mean Intersection over Union of 70.6%, surpassing the performance of both U-Net and U-KAN. Furthermore, the model was transferred to the Parihaka field in New Zealand via transfer learning. After fine-tuning, the predictions exhibited strong alignment with seismic profiles, demonstrating the model's robustness under complex geological conditions. Although the proposed model demonstrates excellent performance in accuracy and robustness, it has so far been validated only on 2D seismic profiles. Its capability to characterize continuous 3D geological features therefore remains limited.

1. Introduction

Seismic facies represent a foundational concept in seismic data interpretation and serve as critical indicators for characterizing the geometry, depositional environment, and evolutionary history of subsurface geological bodies [1,2,3]. Through the analysis of seismic facies attributes, geoscientists can more accurately identify depositional facies belts, structural units, and the distribution of hydrocarbon reservoirs. Such analyses significantly improve the understanding of subsurface geological frameworks and sedimentary systems [4,5,6]. Conventional seismic facies analysis relies extensively on interpreters’ expertise and subjective judgment. This heavy dependence frequently results in interpretations that are neither unique nor consistent. These limitations become particularly evident when dealing with large volumes of complex geological data, where maintaining both efficiency and accuracy poses considerable challenges [7,8,9].
With the advancement of artificial intelligence, particularly deep learning methods, the automatic identification and intelligent analysis of seismic facies have emerged as pivotal research directions in seismic interpretation [10,11,12]. These approaches substantially enhance both processing efficiency and predictive accuracy [13,14,15]. Since Waldeland et al. [16] first introduced the U-Net architecture for seismic facies recognition, numerous enhancements have been proposed. Examples include Res-U-Net, which incorporates residual modules in the encoder, and the integration of multi-scale frequency-domain transformations within the U-Net framework [17,18,19]. These modifications have significantly improved network performance and generalization capability. Additionally, recent architectures leveraging multi-level wavelet transforms and multi-resolution Transformers have been developed to extract multi-scale seismic features, further boosting recognition accuracy [20]. Moreover, unsupervised deep domain adaptation techniques show promising potential in data-scarce scenarios by enabling effective prediction even in the absence of labeled data [1].
The methods described above primarily employ unimodal networks, relying exclusively on seismic images in a purely data-driven manner. Although effective in certain scenarios, such models exhibit critical limitations due to their inability to integrate supplementary geological, geophysical, or semantic information. This inadequacy hinders their practical application in industrial settings. The lack of explicit semantic modeling among different facies types further constrains their performance in complex geological settings [21]. These challenges are exacerbated in situations with limited training data or sparse annotations, where unimodal methods often demonstrate poor generalization capability and suboptimal segmentation accuracy.
To address these issues, researchers have increasingly turned to multimodal fusion strategies that integrate diverse data sources—such as seismic attributes, well logs, and geological interpretations—within a unified deep learning framework [22,23,24,25]. For example, Amendola et al. [26] incorporated both seismic attributes and facies annotations to improve the interpretation of complex structural features. Similarly, Yi et al. [27] utilized multiple modalities including well logs, petrophysical measurements, textual descriptions, and core images to enhance facies prediction, achieving promising results. The multimodal fusion paradigm not only strengthens the representational power of the model but also improves its geological interpretability, thereby offering a more comprehensive and reliable foundation for seismic facies segmentation.
Following this line of work, this study proposes a multimodal seismic facies recognition network that integrates knowledge graphs—named GAT-UKAN. The framework combines a U-shaped Kolmogorov–Arnold Network (U-KAN) backbone with a Graph Attention Network (GAT) to incorporate structured geological knowledge, thereby mitigating the limitations of conventional image-only perception methods [28,29]. In this paper, multimodal specifically denotes the integration of two heterogeneous data types: (1) the seismic image modality, which captures spatial and textural features from seismic profiles, and (2) the knowledge graph modality, which encodes geological semantics such as depositional environment, lithology, and facies relationships. By integrating these complementary modalities, the proposed network effectively combines data-driven visual features with domain knowledge, enhancing both classification accuracy and geological interpretability. This fusion allows the model to attain a degree of geological semantic awareness. The U-KAN serves as the image processing branch, extracting multi-scale semantic features from seismic data. Simultaneously, a structured seismic facies knowledge graph is constructed, incorporating entities and attributes such as seismic properties and depositional environments. A GAT is employed to learn embeddings of the entities and relations within this graph, producing high-level semantic representations. A mid-level fusion strategy [30] is adopted to combine intermediate features from the U-KAN backbone with the knowledge embeddings generated by the GAT. The incorporation of knowledge graphs not only supplies semantic priors for facies recognition but also enhances the model's capacity to represent spatial distributions and geological structures [31].
Variations in seismic data acquisition methods and geological conditions often lead to significant differences in feature distributions across datasets. These distribution shifts can reduce model sensitivity when applied to new domains, adversely affecting prediction accuracy—particularly in datasets with lower signal-to-noise ratios, where performance degradation is often pronounced. Moreover, seismic facies recognition, which involves semantic segmentation of seismic profiles, requires discriminating among multiple complex geological structures and typically entails a larger number of classes than conventional segmentation tasks. Direct application of models trained on one domain to another frequently results in substantial prediction errors. Therefore, integrating transfer learning into deep learning frameworks is highly valuable. By utilizing pre-trained models, transfer learning can considerably reduce training expenses for new tasks while maintaining high accuracy, even with limited annotated samples. Based on these considerations, this study employs a transfer learning strategy to improve model adaptability and performance across diverse seismic facies recognition tasks [32].
The remainder of this paper is structured as follows: Section 2 provides a detailed overview of related network architectures. Section 3 presents the experimental results and the application of transfer learning. Finally, Section 4 and Section 5 present the discussion and conclusions of the study, respectively.

2. Materials and Methods

2.1. Architecture Overview

Figure 1 illustrates the overall architecture of the proposed multimodal network. The backbone is structured as a two-stage encoder–decoder. The encoder comprises three convolutional modules followed by tokenized KAN modules. The decoder maintains a symmetric structure, consisting of two tokenized KAN modules and three convolutional modules. For knowledge graph encoding, a GAT is employed to extract semantic features from the graph and generate embedding vectors corresponding to target seismic facies. Feature fusion is conducted at the mid-level of the encoder, specifically after the convolutional modules. At this stage, the intermediate image features derived from the backbone are fused with the embedding vectors produced by the GAT. This integrated feature representation is then propagated to subsequent network layers. Through comprehensive multi-source information fusion, the architecture facilitates domain knowledge-guided feature enhancement, thereby improving semantic comprehension and enabling knowledge-driven recognition.
In this framework, the image branch takes 2D seismic sections or slices as input, whereas the graph branch receives a knowledge graph constructed from geological expertise, comprising entities and their relational connections. The output consists of pixel-wise predictions of seismic facies categories, aligning with the target of geological interpretation. The convolutional modules extract local textural features, while the tokenized KAN modules enhance nonlinear feature representation and high-order semantic modeling. Simultaneously, the GAT generates knowledge-aware embeddings by dynamically capturing interdependencies among geological entities. Feature fusion is performed at the intermediate network layers to preserve spatial detail while integrating semantic priors, thereby balancing low-level features with high-level abstractions. The overall architecture leverages the multi-scale contextual extraction capability of convolutional networks, the nonlinear approximation strengths of KAN, and the knowledge-guided constraints of GAT. This design achieves an effective integration of data-driven and knowledge-driven methodologies, significantly improving the accuracy, interpretability, and generalization capacity of seismic facies classification.
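For concreteness, the PyTorch sketch below shows one plausible realization of the mid-level fusion step, assuming the GAT yields one embedding vector per target facies. The module name, the projection layer, and the tiling-and-concatenation layout are illustrative assumptions; the paper does not specify the exact tensor arrangement.

```python
import torch
import torch.nn as nn

class MidLevelFusion(nn.Module):
    """Illustrative mid-level fusion: project the stacked facies embeddings
    from the GAT, tile the result over the spatial grid, and concatenate it
    with the intermediate image features along the channel axis."""

    def __init__(self, img_channels: int, kg_dim: int, n_facies: int = 6):
        super().__init__()
        self.proj = nn.Linear(n_facies * kg_dim, img_channels)

    def forward(self, img_feat: torch.Tensor, kg_emb: torch.Tensor) -> torch.Tensor:
        # img_feat: (B, C, H, W) intermediate encoder features
        # kg_emb:   (n_facies, kg_dim), one GAT embedding per target facies
        b, _, h, w = img_feat.shape
        sem = self.proj(kg_emb.flatten())                # (C,)
        sem = sem.view(1, -1, 1, 1).expand(b, -1, h, w)  # tile over space
        return torch.cat([img_feat, sem], dim=1)         # (B, 2C, H, W)
```

In this arrangement, doubling the channel count at the fusion point leaves the visual features untouched while exposing the knowledge-derived channels to every subsequent layer.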

2.2. Convolutional Block

This convolutional module consists of two consecutive 3 × 3 convolutional layers. Each convolutional layer is followed by Batch Normalization (BN) and a ReLU activation function. The convolutions use a stride of 1 and padding of 1, ensuring that the spatial resolution of the input is preserved throughout the module. The computation in each convolutional layer can be expressed as:
$$Y = \mathrm{ReLU}\left(\mathrm{BN}\left(\mathrm{Conv2D}(X; K, b)\right)\right)$$
where $X$ denotes the input feature map, $Y$ denotes the output feature map, $\mathrm{Conv2D}$ represents the convolution operation, and $K$ and $b$ refer to the kernel weights and biases, respectively.
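A minimal PyTorch sketch of this module follows; channel counts are left as parameters, and the layer ordering matches the equation above.

```python
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """Two 3x3 Conv-BN-ReLU stages; stride 1 and padding 1 preserve
    the spatial resolution of the input."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, stride=1, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )
```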

2.3. GAT

First, to obtain sufficient representational capacity for transforming input features into higher-level representations, the network performs a linear transformation on the input node features $h \in \mathbb{R}^{N \times C_{in}}$, where $N$ denotes the number of nodes. To achieve this, a shared linear transformation parameterized by a learnable weight matrix $W \in \mathbb{R}^{C_{in} \times C_{out}}$ is applied to each node. Here, $C_{in}$ and $C_{out}$ represent the input and output feature dimensions, respectively.
Next, we apply a self-attention mechanism to the nodes using the attention function $a$, in order to compute attention coefficients. Here, $\mathbf{a} \in \mathbb{R}^{2C_{out}}$ is a learnable attention weight vector. Nonlinearity is introduced through the LeakyReLU activation function:
$$e_{ij} = \mathrm{LeakyReLU}\left(\mathbf{a}^{T}\left[W h_i \,\|\, W h_j\right]\right)$$
where $(\cdot)^{T}$ denotes transposition and $\|$ denotes the concatenation operation.
These coefficients represent the importance of node $j$'s features to node $i$, where $j$ is a first-order neighbor of node $i$, and they are dynamically adjusted through the learnable attention parameters. This allows the network to adaptively focus on the most relevant neighbors, thereby enhancing feature extraction capability. To make the attention coefficients comparable across different nodes, a softmax function is applied over all neighbors of node $i$ to normalize the coefficients:
$$\alpha_{ij} = \mathrm{softmax}_{j}\left(e_{ij}\right) = \frac{\exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{T}\left[W h_i \,\|\, W h_j\right]\right)\right)}{\sum_{k \in \mathcal{N}_i} \exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{T}\left[W h_i \,\|\, W h_k\right]\right)\right)}$$
After obtaining the attention coefficients, we use them to compute a weighted linear combination of the corresponding features. This results in the final output feature representation for each node:
$$h_i' = \sum_{j \in \mathcal{N}_i} \alpha_{ij} W h_j$$
In this network, only the feature embedding vectors corresponding to the six target seismic facies are output. These embedding vectors are concatenated with the image feature vectors and then fed into the subsequent network layers.
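The following is a compact single-head sketch of the three equations above in plain PyTorch, using a dense adjacency matrix with self-loops; the paper's actual GAT configuration (number of heads, layers, and dimensions) is not restated here, so this is an illustration rather than the published implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    """Single-head graph attention layer. `adj` is a dense {0,1} adjacency
    matrix that must include self-loops so every softmax row is well defined."""

    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        self.W = nn.Linear(c_in, c_out, bias=False)           # shared linear map
        self.a = nn.Parameter(torch.randn(2 * c_out) * 0.1)   # attention vector

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        wh = self.W(h)                    # (N, C_out)
        c = wh.size(1)
        # e_ij = LeakyReLU(a^T [Wh_i || Wh_j]), computed for all pairs by
        # splitting a into the halves acting on Wh_i and on Wh_j.
        e = F.leaky_relu(
            wh @ self.a[:c].unsqueeze(1) + (wh @ self.a[c:].unsqueeze(1)).T,
            negative_slope=0.2,
        )                                 # (N, N)
        e = e.masked_fill(adj == 0, float("-inf"))  # keep first-order neighbors
        alpha = torch.softmax(e, dim=1)   # normalized attention coefficients
        return alpha @ wh                 # h'_i = sum_j alpha_ij W h_j
```

Reading out the rows of the final layer that correspond to the six facies entities yields the embedding vectors that are concatenated with the image features.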

2.4. Tokenized KAN Module

In this module, the fused feature vector $Z$, obtained by combining the knowledge graph embeddings and image feature vectors, is first flattened into a sequence of embedded feature units (tokens), denoted as $Z_0$.
Next, the embedded feature units are passed through a series of KAN layers. Each KAN layer is followed by a depthwise separable convolution (DwConv), BN and a ReLU activation function. A residual connection is applied by adding the original features to the processed features. Finally, the output is normalized using layer normalization (LN) before being passed to the next module. The output of the k-th tokenized KAN module can be expressed as:
$$Z_k = \mathrm{LN}\left(Z_{k-1} + \mathrm{DwConv}\left(\mathrm{KAN}\left(Z_{k-1}\right)\right)\right)$$
$$\mathrm{KAN}(Z) = \left(\Phi_{k-1} \circ \Phi_{k-2} \circ \cdots \circ \Phi_{1} \circ \Phi_{0}\right) Z$$
where $\Phi_i$ denotes a layer of learnable activation functions and $\circ$ denotes function composition.
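A structural sketch of this block follows: a residual DwConv-BN-ReLU path wrapped around a token-wise KAN layer, with layer normalization on the output. A plain linear layer stands in for the KAN layer so the sketch runs without an external KAN implementation; swapping in a real KAN layer would recover the intended module.

```python
from typing import Optional

import torch
import torch.nn as nn

class TokenizedKANBlock(nn.Module):
    """Update rule sketched above: Z_k = LN(Z_{k-1} + DwConv(KAN(Z_{k-1})))."""

    def __init__(self, dim: int, kan_layer: Optional[nn.Module] = None):
        super().__init__()
        self.kan = kan_layer if kan_layer is not None else nn.Linear(dim, dim)
        # Depthwise convolution (groups=dim) mixes spatial neighbors per channel.
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)
        self.bn = nn.BatchNorm2d(dim)
        self.act = nn.ReLU(inplace=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, z: torch.Tensor, h: int, w: int) -> torch.Tensor:
        # z: (B, N, D) token sequence with N = h * w
        y = self.kan(z)                                     # token-wise KAN
        y = y.transpose(1, 2).reshape(z.size(0), -1, h, w)  # tokens -> grid
        y = self.act(self.bn(self.dwconv(y)))               # DwConv + BN + ReLU
        y = y.flatten(2).transpose(1, 2)                    # grid -> tokens
        return self.norm(z + y)                             # residual + LN
```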

2.5. Decoder

In the backbone network, skip connections are employed to recover low-level details. For a given downsampled encoder feature $Z_k$ at layer $k$ and the decoder feature $Z_{k+1}$ from layer $k+1$, which is upsampled to the resolution of layer $k$, the output feature at layer $k$ is computed as:
$$Z_k' = \mathrm{Cat}\left(\mathrm{Up}\left(Z_{k+1}\right), Z_k\right)$$
where $\mathrm{Cat}(\cdot)$ denotes the feature concatenation operation and $\mathrm{Up}(\cdot)$ denotes upsampling.
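A one-line realization of this skip connection is shown below; bilinear upsampling with a scale factor of 2 is an assumption, as the paper does not state the upsampling operator.

```python
import torch
import torch.nn.functional as F

def decoder_step(z_deep: torch.Tensor, z_skip: torch.Tensor) -> torch.Tensor:
    """Upsample the deeper decoder feature and concatenate it with the
    encoder skip feature along the channel dimension."""
    up = F.interpolate(z_deep, scale_factor=2, mode="bilinear", align_corners=False)
    return torch.cat([up, z_skip], dim=1)
```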

3. Results

3.1. Experimental Setup and Evaluation Metrics

In this study, the seismic image dataset was obtained from the publicly available F3 block in the North Sea, offshore the Netherlands, covering an area of approximately 384 km². The dataset includes 3D post-stack seismic volumes and 26 well logs. The original data comprise 7 geological layers, which have been reinterpreted to define 6 seismic facies classes. The dataset has dimensions of 512 × 512 × 512.
As shown in Figure 2, the seismic facies in this area are categorized into six types, which from top to bottom are: the Upper North Sea Group, the Middle North Sea Group, the Lower North Sea Group, the Rijnland Group, the Scruff Group, and the Zechstein Group. For simplicity, these six facies are referred to here as Class 1 to Class 6. Each class is assigned a distinct color for visualization purposes.
The model utilized in this study is based on a 2D architecture; therefore, the 3D seismic volume was sliced along the inline and crossline directions to generate two-dimensional profiles. As the original number of profile images is insufficient for training deep neural networks, data augmentation was applied to the training set. Specifically, Gaussian noise injection, random rotation, and horizontal flipping were introduced to augment the training samples. This strategy reduces the model’s reliance on specific spatial locations and enhances its robustness and generalization capability. The augmentation process not only increases the diversity of the training data but also helps alleviate overfitting, leading to improved performance in seismic facies recognition.
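A sketch of this augmentation, applied on-the-fly to an image-label pair, is given below. Geometric transforms are applied identically to both tensors, while Gaussian noise is applied to the image only; the noise level and the restriction to 90-degree rotations (which avoid interpolating the label map) are illustrative choices.

```python
import torch

def augment(image: torch.Tensor, label: torch.Tensor, noise_std: float = 0.05):
    """Gaussian noise injection, random rotation, and horizontal flipping."""
    if torch.rand(1).item() < 0.5:                       # horizontal flip
        image, label = image.flip(-1), label.flip(-1)
    k = int(torch.randint(0, 4, (1,)))                   # random 90-degree rotation
    image = torch.rot90(image, k, dims=(-2, -1))
    label = torch.rot90(label, k, dims=(-2, -1))
    image = image + noise_std * torch.randn_like(image)  # noise on the image only
    return image, label
```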
The knowledge graph utilized in this study was constructed using publicly available data from the F3 block, offshore the Netherlands. It integrates seismic attributes, facies types, and geological knowledge—including depositional environments, lithology, and structural context. The graph comprises 47 entities and 12 relation types. To ensure accuracy and completeness, entities and relationships were systematically compiled and encoded based on geological reports, published literature, and expert interpretation. Furthermore, the graph captures semantic associations and hierarchical structures among entities, offering rich semantic support for multimodal fusion. This structure enables the model to better comprehend complex relationships among seismic facies, thereby enhancing recognition accuracy and interpretability.
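To make the graph modality concrete, the toy fragment below encodes a few triples as a dense adjacency matrix of the kind consumed by the GAT sketch in Section 2.3. The entity and relation names are examples only, not the actual 47-entity, 12-relation graph built for the F3 block.

```python
import torch

# Illustrative fragment of a facies knowledge graph.
entities = ["Zechstein Group", "evaporite", "shallow marine", "strong reflection"]
ent2id = {name: i for i, name in enumerate(entities)}

triples = [
    ("Zechstein Group", "has_lithology", "evaporite"),
    ("Zechstein Group", "deposited_in", "shallow marine"),
    ("Zechstein Group", "shows_response", "strong reflection"),
]

# Dense adjacency with self-loops, in the form the GAT layer expects.
n = len(entities)
adj = torch.eye(n)
for head, _, tail in triples:
    adj[ent2id[head], ent2id[tail]] = 1.0
    adj[ent2id[tail], ent2id[head]] = 1.0  # edges treated as undirected here
```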
In this study, the U-Net, U-KAN, and GAT-UKAN were implemented using the PyTorch framework and trained on a single NVIDIA RTX 3080 Ti GPU. The Adam optimizer was employed with a batch size of 4 and an initial learning rate of 0.0001 [33]. To ensure stable convergence, an exponential learning rate scheduler was applied, which reduced the learning rate by 4% after each epoch. The Binary Cross-Entropy (BCE) loss function was adopted due to its stable gradient updates and convenience in multi-class segmentation tasks [34]. Training was conducted over 50 epochs, which proved sufficient for convergence in all experiments. For the data partition, the training dataset was randomly divided into 90% for training and 10% for validation to ensure both diversity and representativeness of the two sets. Throughout the process, both training and validation losses were monitored to prevent overfitting and ensure model robustness. All experiments were performed under identical hardware and software conditions to guarantee reproducibility and fairness.
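This configuration translates to the following sketch. A small stand-in network replaces GAT-UKAN and random tensors replace the data loader so that the loop runs self-contained; the 4% per-epoch decay corresponds to an exponential scheduler with gamma = 0.96, and BCE over one-hot facies maps is realized with BCEWithLogitsLoss.

```python
import torch
from torch import nn, optim

model = nn.Conv2d(1, 6, kernel_size=3, padding=1)  # stand-in for GAT-UKAN

optimizer = optim.Adam(model.parameters(), lr=1e-4)
# "Reduce the learning rate by 4% after each epoch" -> gamma = 0.96.
scheduler = optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.96)
criterion = nn.BCEWithLogitsLoss()  # BCE over one-hot facies maps

for epoch in range(50):
    for _ in range(2):  # placeholder for the training loader, batch size 4
        images = torch.randn(4, 1, 64, 64)  # reduced size for the sketch
        targets = torch.randint(0, 2, (4, 6, 64, 64)).float()  # stand-in labels
        optimizer.zero_grad()
        loss = criterion(model(images), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()
```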
Four evaluation metrics—Pixel Accuracy (PA), Mean Class Accuracy (MCA), Mean Intersection over Union (MIU), and Frequency Weighted Intersection over Union (FIU)—were employed to assess the semantic segmentation performance of the models. PA measures the overall correctness of pixel-level predictions, offering an intuitive assessment of global accuracy. MCA computes the mean accuracy across all classes, thereby giving more weight to balanced performance among categories. MIU quantifies the spatial overlap between predicted and ground-truth regions, making it a stringent metric widely adopted in segmentation tasks. FIU extends MIU by incorporating pixel frequency per class as a weighting factor, emphasizing the influence of larger classes in the overall evaluation. Together, these metrics provide a comprehensive evaluation across multiple aspects, including global accuracy, class-balanced performance, spatial consistency, and sample-weighted fairness. The formal definitions of these metrics are given as follows:
$$PA = \frac{\sum_{k=1}^{K} n_{kk}}{\sum_{i=1}^{K}\sum_{j=1}^{K} n_{ij}}$$
$$MCA = \frac{1}{K}\sum_{k=1}^{K}\frac{n_{kk}}{n_k}$$
$$MIU = \frac{1}{K}\sum_{k=1}^{K}\frac{n_{kk}}{n_k + p_k - n_{kk}}$$
$$FIU = \frac{1}{\sum_{i=1}^{K} n_i}\sum_{k=1}^{K}\frac{n_k\, n_{kk}}{n_k + p_k - n_{kk}}$$
In these definitions, $n_{ij}$ denotes the number of pixels with true class $i$ and predicted class $j$, $n_{kk}$ denotes the number of pixels correctly predicted in class $k$, $n_k = \sum_{j=1}^{K} n_{kj}$ denotes the total number of true pixels in class $k$, $p_k = \sum_{j=1}^{K} n_{jk}$ denotes the total number of pixels predicted to belong to class $k$, $K$ denotes the total number of classes, and $\sum_{i=1}^{K}\sum_{j=1}^{K} n_{ij}$ denotes the total number of pixels in the dataset.
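All four metrics follow directly from the K × K confusion matrix, as in this sketch; it assumes every class appears in the ground truth and predictions, so no division by zero occurs.

```python
import numpy as np

def segmentation_metrics(conf: np.ndarray):
    """PA, MCA, MIU, FIU from a K x K confusion matrix, where conf[i, j]
    counts pixels of true class i predicted as class j (n_ij above)."""
    n_kk = np.diag(conf).astype(float)  # correctly classified pixels per class
    n_k = conf.sum(axis=1)              # true pixels per class
    p_k = conf.sum(axis=0)              # predicted pixels per class
    iou = n_kk / (n_k + p_k - n_kk)     # per-class intersection over union
    pa = n_kk.sum() / conf.sum()
    mca = float(np.mean(n_kk / n_k))
    miu = float(iou.mean())
    fiu = float((n_k * iou).sum() / conf.sum())
    return pa, mca, miu, fiu
```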

3.2. Experimental Results

To evaluate the effectiveness of the proposed method, we compared the performance of three networks—U-Net, U-KAN, and GAT-UKAN—on the same dataset.
Figure 3 depicts the loss curves of the three networks. U-Net shows a more rapid decrease in loss, which can be attributed to its simpler architecture, allowing faster convergence during training. In comparison, U-KAN and GAT-UKAN exhibit higher initial loss values owing to their structural complexity and greater parameter counts, resulting in slower gradient descent. Despite these differences, all three models converge effectively in both the training and validation phases.
As shown in Figure 4, the multimodal fusion strategy demonstrates evident advantages. In terms of PA, which reflects global classification performance, the GAT-UKAN curve remains consistently higher throughout the training process, exhibiting smoother convergence and ultimately reaching a superior accuracy plateau. This indicates that the model achieves the most robust holistic interpretation of seismic images. More notably, for the MCA, which assesses balanced recognition across classes, GAT-UKAN shows a particularly significant improvement. This result suggests that incorporating prior knowledge graphs via GAT enhances the model’s ability to recognize underrepresented seismic facies, effectively alleviating the performance bias typical in class-imbalanced settings. On the core segmentation metric MIU, GAT-UKAN delivers substantial gains, markedly outperforming the other two models. This demonstrates that the integration of geological knowledge graphs leads to predictions that align more closely with ground truth in spatial morphology, producing segmentation results with sharper boundaries, more coherent structures, and higher geological consistency. These improvements suggest enhanced knowledge-informed spatial reasoning rather than mere pixel-level optimization. Furthermore, the model’s superior performance on the FIU metric further confirms its overall effectiveness. Collectively, these results underscore that GAT-UKAN provides more accurate and reliable predictions for the challenging task of seismic facies recognition.
As presented in Table 1 and Table 2, GAT-UKAN achieves the best overall performance across all evaluation metrics. Notably, it attains the highest prediction accuracy for the fifth seismic facies type, representing an improvement of nearly 10 percentage points over U-KAN. This result indicates that the multimodal architecture effectively mitigates class imbalance issues. Furthermore, GAT-UKAN reaches an MIU exceeding 70%, outperforming U-KAN by 4% and U-Net by nearly 5%. These findings demonstrate that the proposed multimodal approach significantly enhances prediction accuracy in seismic facies classification.
Figure 5 displays the prediction results of the three networks on four representative profiles from the test dataset. Both U-Net and U-KAN produce prediction maps containing numerous discontinuities and misclassified regions, particularly between seismic facies and within homogeneous facies areas. These issues are most evident in predictions involving Class 5 and Class 6. U-Net exhibits inconsistent and spatially incoherent facies predictions. The presence of incorrect facies within otherwise accurately classified zones suggests that, without knowledge-guided constraints, the network struggles to identify less frequent or complex facies types. Similarly, U-KAN predictions show fragmented and blurred boundaries between adjacent facies, reflecting limited spatial continuity. These results indicate that unimodal architectures are insufficient for capturing fine-grained geological features, resulting in the loss of critical structural information. In contrast, the GAT-UKAN model generates more geologically plausible interpretations, characterized by continuous stratigraphic layers, accurate classification, and well-delineated facies boundaries. The consistent superiority of GAT-UKAN underscores the value of integrating graph-based knowledge representations for improved seismic facies interpretation.
Based on the aforementioned analysis, the GAT-UKAN demonstrates substantial advantages in seismic facies identification, primarily attributed to its efficient multimodal architecture. Throughout the training process, GAT-UKAN consistently maintained superior performance across all evaluation metrics, exhibiting smoother convergence and higher accuracy plateaus compared to both U-Net and U-KAN. In quantitative evaluations on test data, our proposed model achieved the best overall performance: it attained an MIU exceeding 70%, representing a 4% improvement over U-KAN and nearly 5% over U-Net. Analysis of Table 2 further confirms its effectiveness in mitigating class imbalance issues. Additionally, visual results in Figure 5 demonstrate that GAT-UKAN yields segmentation results characterized by enhanced continuity, accuracy, and boundary delineation. This multimodal integration not only improves model accuracy and robustness but also strengthens geological interpretability, thereby transcending the limitations of conventional unimodal architectures. The framework effectively operates as an intelligent system that synergistically utilizes both data-driven features and structured prior knowledge. Consequently, GAT-UKAN achieves a significant advancement in knowledge-based spatial reasoning capabilities for this challenging seismic facies identification task.

3.3. Transfer Learning

To enhance the model’s adaptability to varying geological settings and improve its generalizability across different field datasets, a transfer learning strategy was adopted. The Parihaka dataset from New Zealand was used as the target domain for prediction.
The similarity between the source and target tasks plays a crucial role in the effectiveness of transfer learning. In this study, both tasks pertain to seismic facies recognition and classification and exhibit a high degree of similarity in their inherent characteristics, thereby laying a robust foundation for knowledge transfer. Specifically, the source model was trained on the publicly available F3 Block seismic dataset from the North Sea, Netherlands, which has been expertly annotated. Training samples were augmented to improve the model’s feature extraction capacity. The target task consists of seismic facies prediction based on the Parihaka dataset from New Zealand, which also comprises six seismic facies categories that are highly comparable to those in the source domain. This high degree of categorical similarity motivated the selection of this dataset for transfer learning.
To ensure experimental consistency, the input seismic profiles within the target domain were resampled to a size of 512 × 512 pixels to align with the dimensions of the source dataset. All training parameters and hyperparameters, including batch size and learning rate, were maintained consistent with those employed in the original GAT-UKAN.
Figure 6 outlines the workflow employed in the transfer learning process. The prediction procedure consists of the following steps: First, the GAT-UKAN model is trained using the F3 block dataset from the North Sea, Netherlands. Subsequently, the Parihaka dataset is partitioned, and a subset of the data is selected as training samples to fine-tune the pre-trained model. This fine-tuning process enhances the model’s adaptation to the characteristics of the target domain, resulting in a specialized transfer learning model. Finally, the adapted model is applied to predict seismic facies on the remaining portions of the Parihaka dataset, producing the final prediction outcomes.
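Under the stated setup (unchanged hyperparameters, inputs resampled to 512 × 512), the fine-tuning step reduces to the sketch below. The checkpoint path, the stand-in network, the synthetic batch, and the choice to update all weights rather than freeze layers are assumptions made for illustration.

```python
import torch
from torch import nn

model = nn.Conv2d(1, 6, kernel_size=3, padding=1)     # stand-in for GAT-UKAN
model.load_state_dict(torch.load("gat_ukan_f3.pt"))   # hypothetical F3 checkpoint

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # same hyperparameters
criterion = nn.BCEWithLogitsLoss()

model.train()
for _ in range(10):  # placeholder for the Parihaka fine-tuning subset
    images = torch.randn(4, 1, 512, 512)               # resampled to 512 x 512
    targets = torch.randint(0, 2, (4, 6, 512, 512)).float()
    optimizer.zero_grad()
    loss = criterion(model(images), targets)
    loss.backward()
    optimizer.step()
```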
Figure 7 presents the prediction performance of the transfer learning model on the Parihaka dataset. Overall, the prediction results exhibit strong consistency with the stratigraphic structure of the original seismic profile. Most seismic facies boundaries coincide with abrupt changes in reflection amplitude, indicating the model’s capability to capture key variations in seismic attributes.
In the shallow section, thin-bedded strata are clearly resolved, and the transitional relationships between layers align well with the actual reflections, demonstrating the model's sensitivity to fine structural details. The middle layer group shows good lateral continuity, with predictions consistently following the trends of the reflection events, even in regions where lateral thickness gradually decreases. This suggests the model's robust ability to trace interlayer extensions. In structurally complex areas, such as near faults or where reflection events bend significantly, the predictions maintain continuous interfaces without large-scale misclassification. This outcome indicates that transfer learning effectively improves the model's adaptability to complex geometric features. Meanwhile, in deep sections characterized by weak reflection amplitudes and low textural contrast, the predictions remain highly consistent with the actual reflections. Key interface locations align accurately with subtle seismic variations. In summary, these results further validate the effectiveness and generalizability of the proposed method for cross-regional seismic facies analysis.

4. Discussion

This paper integrates spatial representations from seismic images with semantic prior knowledge derived from knowledge graphs through mid-level fusion within the network. The proposed approach exhibits more robust recognition performance in scenarios involving complex structures, weak textures, and imbalanced facies categories compared to unimodal methods. The knowledge graph explicitly encodes relationships among “sedimentary environment—rock type—seismic response—seismic facies”, which are incorporated via graph attention weighting and fused with multi-scale image features. This integration reduces semantic ambiguity associated with texture- and amplitude-only discrimination, while enhancing the spatial continuity of facies bands and the clarity of stratigraphic boundaries. Consequently, the model not only delivers improved classification accuracy but also enhances geological interpretability and supports causal analysis.
The results demonstrate that the proposed multimodal framework achieves higher overall accuracy compared to the unimodal baseline. On the same dataset, our network achieved superior performance, outperforming U-KAN and U-Net by 4.2 and 4.7 percentage points in MIU, and by 1.7 and 2.9 percentage points in PA, respectively. This enhancement can be attributed to the semantic prior, which imposes soft constraints on expected geological relationships, thereby suppressing the propagation of misclassified patches and local noise. Simultaneously, the image branch supplies observed imaging evidence, mutually complementing the semantic constraints to stabilize classification performance in challenging scenarios. Furthermore, transfer learning experiments confirm that the model maintains consistent performance across different survey areas and frequency-band characteristics. These findings suggest that the proposed paradigm exhibits strong adaptability to data distribution shifts commonly encountered in real-world settings, indicating considerable potential for practical application.
Although the present study has obtained promising results in seismic facies classification, several limitations remain to be addressed. First, the current model is designed and validated exclusively on 2D seismic sections and has not yet been extended to 3D seismic volumes. As a result, its ability to characterize spatially continuous geological features in practical applications is limited. For example, when processing seismic slices from different orientations—such as inline, crossline, and horizontal sections—the segmentation results may exhibit inconsistencies. Even within the same geological unit, boundaries identified from different directions may not align perfectly, underscoring the inherent constraints of 2D architectures in representing three-dimensional geologic structures.
Second, the current knowledge graph incorporates somewhat limited geological information. As it was constructed solely based on data from the Dutch F3 block, encoded attributes—such as lithology, structural setting, and depositional environment—remain relatively narrow, which limits its ability to fully represent complex geological systems. This constraint not only affects the model’s generalizability to broader and more diverse geological contexts but also restricts the depth of semantic reasoning supported by the graph. Future work should aim to extend the knowledge graph by incorporating geological data from multiple regions, integrating multi-source geophysical information, and encoding more refined seismic facies attributes. Such enhancements would improve the graph’s comprehensiveness and diversity, thereby supplying more robust prior knowledge to support model performance.
Future work will focus on the development of an enhanced model capable of interpreting three-dimensional seismic volumes. Additionally, efforts will be devoted to enriching and expanding the geological knowledge encoded within the graph to improve its completeness and diversity, thereby providing more robust prior constraints for the model. These directions represent valuable pathways toward achieving more robust and geologically consistent interpretation performance.

5. Conclusions

In this study, we propose a multimodal seismic facies recognition network based on a knowledge graph. This architecture employs mid-level fusion to integrate the image modality with the knowledge modality, thereby addressing the limitations of conventional unimodal recognition approaches. It effectively combines spatial features extracted from seismic images with semantic information derived from the knowledge graph, facilitating collaborative multimodal analysis. Experimental results demonstrate that our method achieves superior accuracy in seismic facies prediction, attaining a PA of 89.7% and an MIU of 70.6%; these results outperform all single-modal baseline models. This validates the effectiveness of incorporating a knowledge graph for enhancing the accuracy and robustness of identification. Furthermore, transfer learning experiments show that the model maintains strong adaptability across different survey areas and diverse geological settings, underscoring its considerable practical utility and application potential.
Notwithstanding these promising results, this study has certain limitations. First, the model was designed and validated solely on 2D seismic sections; consequently, it may not adequately capture continuous 3D geological features. Second, the knowledge graph was constructed using data from a single block (the Dutch F3 dataset), and thus contains limited geological information. Future work will focus on extending the framework to 3D seismic networks and enriching the knowledge graph to improve the model’s robustness and general applicability.
The improvements achieved through the fusion of image features and knowledge priors offer a new perspective for advancing seismic facies identification methods. Compared to traditional approaches that rely solely on a single image modality, our work not only elevates predictive accuracy but also significantly strengthens the geological interpretability of the results. This provides a valuable reference for promoting the application of multimodal methodologies in geophysical interpretation.

Author Contributions

Conceptualization, B.Y.; methodology, M.L.; software, R.P.; validation, M.L.; formal analysis, M.L. and B.Y.; resources, B.Y.; data curation, J.Z.; writing—original draft preparation, M.L.; writing—review and editing, B.Y.; visualization, J.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by National Natural Science Foundation of China, grant numbers 42204113 and 42564005, and Provincial Key Research and Development Plan of Xinjiang Uygur Autonomous Region, grant number 2024B01016.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw field seismic data underlying the conclusions of this study are available from the authors upon reasonable request.

Acknowledgments

The authors would like to thank Alaudah et al. for the synthetic seismic data and open-source code for comparison. They are also thankful to the provider of Netherlands offshore F3 seismic data used in this article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Nasim, M.Q.; Maiti, T.; Srivastava, A.; Singh, T.; Mei, J. Seismic Facies Analysis: A Deep Domain Adaptation Approach. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4508116. [Google Scholar] [CrossRef]
  2. Xu, G.; Haq, B.U. Seismic Facies Analysis: Past, Present and Future. Earth-Sci. Rev. 2022, 224, 103876. [Google Scholar] [CrossRef]
  3. Roksandić, M.M. Seismic Facies Analysis Concepts. Geophys. Prospect. 1978, 26, 383–398. [Google Scholar] [CrossRef]
  4. Wrona, T.; Pan, I.; Gawthorpe, R.L.; Fossen, H. Seismic Facies Analysis Using Machine Learning. Geophysics 2018, 83, O83–O95. [Google Scholar] [CrossRef]
  5. Owusu, B.A.; Boateng, C.D.; Asare, V.-D.S.; Danuor, S.K.; Adenutsi, C.D.; Quaye, J.A. Seismic Facies Analysis Using Machine Learning Techniques: A Review and Case Study. Earth Sci. Inf. 2024, 17, 3899–3924. [Google Scholar] [CrossRef]
  6. Song, C.; Liu, Z.; Wang, Y.; Li, X.; Hu, G. Multi-Waveform Classification for Seismic Facies Analysis. Comput. Geosci. 2017, 101, 1–9. [Google Scholar] [CrossRef]
  7. Wang, P.; Chen, X.; Wang, B.; Li, J.; Dai, H. An Improved Method for Lithology Identification Based on a Hidden Markov Model and Random Forests. Geophysics 2020, 85, IM27–IM36. [Google Scholar] [CrossRef]
  8. Wang, P.; Cui, Y.-A.; Zhou, L.; Li, J.-Y.; Pan, X.-P.; Sun, Y.; Liu, J.-X. Multi-Task Learning for Seismic Elastic Parameter Inversion with the Lateral Constraint of Angle-Gather Difference. Pet. Sci. 2024, 21, 4001–4009. [Google Scholar] [CrossRef]
  9. Chen, X.; Zou, Q.; Xu, X.; Wang, N. A Stronger Baseline for Seismic Facies Classification With Less Data. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5914910. [Google Scholar] [CrossRef]
  10. Zhang, H.; Chen, T.; Liu, Y.; Zhang, Y.; Liu, J. Automatic Seismic Facies Interpretation Using Supervised Deep Learning. Geophysics 2021, 86, IM15–IM33. [Google Scholar] [CrossRef]
  11. You, J.; Zhao, J.; Huang, X.; Zhang, G.; Chen, A.; Hou, M.; Cao, J. Explainable Convolutional Neural Networks Driven Knowledge Mining for Seismic Facies Classification. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5911118. [Google Scholar] [CrossRef]
  12. Kaur, H.; Pham, N.; Fomel, S.; Geng, Z.; Decker, L.; Gremillion, B.; Jervis, M.; Abma, R.; Gao, S. A Deep Learning Framework for Seismic Facies Classification. Interpretation 2023, 11, T107–T116. [Google Scholar] [CrossRef]
  13. Abid, B.; Khan, B.M.; Memon, R.A. Seismic Facies Segmentation Using Ensemble of Convolutional Neural Networks. Wirel. Commun. Mob. Comput. 2022, 2022, 7762543. [Google Scholar] [CrossRef]
  14. Wang, Z.; Wang, Q.; Yang, Y.; Liu, N.; Chen, Y.; Gao, J. Seismic Facies Segmentation via a Segformer-Based Specific Encoder–Decoder–Hypercolumns Scheme. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5903411. [Google Scholar] [CrossRef]
  15. Su-Mei, H.; Zhao-Hui, S.; Meng-Ke, Z.; San-Yi, Y.; Shang-Xu, W. Incremental Semi-Supervised Learning for Intelligent Seismic Facies Identification. Appl. Geophys. 2022, 19, 41–52. [Google Scholar] [CrossRef]
  16. Waldeland, A.U.; Jensen, A.C.; Gelius, L.-J.; Solberg, A.H.S. Convolutional Neural Networks for Automated Seismic Interpretation. Lead. Edge 2018, 37, 529–537. [Google Scholar] [CrossRef]
  17. Xu, T.; Zhou, H.; Liu, X.; Liu, C. Seismic facies identification based on Res-UNet and transfer learning. Processes 2024, 39, 319–333. [Google Scholar] [CrossRef]
  18. AlSalmi, H.; Elsheikh, A.H. Automated Seismic Semantic Segmentation Using Attention U-Net. Geophysics 2024, 89, WA247–WA263. [Google Scholar] [CrossRef]
  19. Chakraborty, S.; Routray, A.; Dharavath, S.B.; Dam, T. OrthoSeisnet: Seismic Inversion through Orthogonal Multi-Scale Frequency Domain U-Net for Geophysical Exploration. arXiv 2024, arXiv:2401.04393. [Google Scholar]
  20. Zhou, L.; Gao, J.; Chen, H. Seismic Facies Classification Based on Multilevel Wavelet Transform and Multiresolution Transformer. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5903412. [Google Scholar] [CrossRef]
  21. Ojha, S.; Sharma, M. U-Net Based Image Segmentation Drawbacks in Medical Images: A Review. In Innovations in Sustainable Technologies and Computing; Springer Nature: Singapore, 2024; pp. 361–372. ISBN 978-981-97-1110-9. [Google Scholar]
  22. Zhao, F.; Zhang, C.; Geng, B. Deep Multimodal Data Fusion. ACM Comput. Surv. 2024, 56, 1–36. [Google Scholar] [CrossRef]
  23. Acosta, J.N.; Falcone, G.J.; Rajpurkar, P.; Topol, E.J. Multimodal Biomedical AI. Nat. Med. 2022, 28, 1773–1784. [Google Scholar] [CrossRef] [PubMed]
  24. Liang, P.P.; Zadeh, A.; Morency, L.-P. Foundations & Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions. ACM Comput. Surv. 2024, 56, 1–42. [Google Scholar] [CrossRef]
  25. Lee, J.; Wang, Y.; Li, J.; Zhang, M. Multimodal Reasoning with Multimodal Knowledge Graph. arXiv 2024, arXiv:2406.02030. [Google Scholar] [CrossRef]
  26. Amendola, A.; Gabbriellini, G.; Dell’Aversana, P.; Marini, A.J. Seismic Facies Analysis through Musical Attributes. Geophys. Prospect. 2017, 65, 49–58. [Google Scholar] [CrossRef]
  27. Yi, Y.; Zhang, Y.; Hou, X.; Li, J.; Ma, K.; Zhang, X.; Li, Y. Sedimentary Facies Identification Technique Based on Multimodal Data Fusion. Processes 2024, 12, 1840. [Google Scholar] [CrossRef]
  28. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; Bengio, Y. Graph Attention Networks. arXiv 2018, arXiv:1710.10903. [Google Scholar]
  29. Li, C.; Liu, X.; Li, W.; Wang, C.; Liu, H.; Liu, Y.; Chen, Z.; Yuan, Y. U-KAN Makes Strong Backbone for Medical Image Segmentation and Generation. Proc. AAAI Conf. Artif. Intell. 2025, 39, 4652–4660. [Google Scholar] [CrossRef]
  30. Guarrasi, V.; Aksu, F.; Caruso, C.M.; Di Feola, F.; Rofena, A.; Ruffini, F.; Soda, P. A Systematic Review of Intermediate Fusion in Multimodal Deep Learning for Biomedical Applications. Image Vis. Comput. 2025, 158, 105509. [Google Scholar] [CrossRef]
  31. Wang, X.; Meng, B.; Chen, H.; Meng, Y.; Lv, K.; Zhu, W. TIVA-KG: A Multimodal Knowledge Graph with Text, Image, Video and Audio. In Proceedings of the 31st ACM International Conference on Multimedia, Ottawa, ON, Canada, 26 October 2023; ACM: New York, NY, USA, 2023; pp. 2391–2399. [Google Scholar]
  32. Yu, Y.; Zhang, Y.; Cheng, Z.; Song, Z.; Tang, C. Multi-Scale Spatial Pyramid Attention Mechanism for Image Recognition: An Effective Approach. Eng. Appl. Artif. Intell. 2024, 133, 108261. [Google Scholar] [CrossRef]
  33. Yan, B.; Zhao, J.; Peng, K.; Qian, L.; Li, M.; Pan, R. 3D Karst Cave Recognition Using TransUnet with Dual Attention Mechanisms in Seismic Images. Geophysics 2025, 90, IM133–IM143. [Google Scholar] [CrossRef]
  34. Ben-Baruch, E.; Ridnik, T.; Zamir, N.; Noy, A.; Friedman, I.; Protter, M.; Zelnik-Manor, L. Asymmetric Loss For Multi-Label Classification. In Proceedings of the IEEE/CVF International Conference On Computer Vision, Montreal, QC, Canada, 10–17 October 2021. [Google Scholar]
Figure 1. The architecture of the GAT-UKAN.
Figure 2. Seismic facies classification of the dataset.
Figure 3. Comparison of training loss functions for U-Net, U-KAN, and GAT-UKAN.
Figure 4. Evaluation metrics for U-Net, U-KAN and GAT-UKAN. (a) PA. (b) MCA. (c) MIU. (d) FIU.
Figure 5. Visualization results of the test set. First row: seismic data. Second row: ground truth labels. Third row: predictions from the U-Net. Fourth row: predictions from the U-KAN. Fifth row: predictions from the GAT-UKAN.
Figure 6. Seismic facies prediction workflow of the GAT-UKAN model integrated with transfer learning.
Figure 7. Prediction results of the transfer model. (a–d) Seismic data. (e–h) Prediction results.
Table 1. Comparison of evaluation metrics for U-Net, U-KAN, and GAT-UKAN.

Model     | PA    | MCA   | MIU   | FIU
----------|-------|-------|-------|------
U-Net     | 0.868 | 0.799 | 0.659 | 0.773
U-KAN     | 0.880 | 0.808 | 0.664 | 0.806
GAT-UKAN  | 0.897 | 0.824 | 0.706 | 0.826
Table 2. Comparison of class accuracy (CA) for U-Net, U-KAN, and GAT-UKAN.

Model     | Class 1 | Class 2 | Class 3 | Class 4 | Class 5 | Class 6
----------|---------|---------|---------|---------|---------|--------
U-Net     | 0.980   | 0.877   | 0.950   | 0.885   | 0.546   | 0.556
U-KAN     | 0.986   | 0.884   | 0.973   | 0.886   | 0.549   | 0.577
GAT-UKAN  | 0.988   | 0.899   | 0.976   | 0.889   | 0.643   | 0.578