1. Introduction
Driven by the dual-carbon strategy, natural gas, an efficient and clean energy source, is widely used in industrial manufacturing and urban energy systems. As intelligent and intensive extraction technologies continue to improve the performance of natural gas wells, ensuring their safe and stable operation has become increasingly important.
Typical faults include freeze plugging at the wellhead or in the wellbore, fluid accumulation in the wellbore, and formation energy depletion. If not addressed in time, these faults can lead to capacity loss, equipment damage, unplanned shutdowns, and other serious consequences. Therefore, developing an intelligent fault-diagnosis system is essential to support safe and stable well operations.
In natural gas well fault diagnosis, traditional methods rely heavily on manual experience and fixed threshold alarms, and they struggle with the dynamic, nonlinear, and high-dimensional nature of multi-source time series wellsite data, causing delayed and false alarms. Although plunger-lift monitoring in unconventional gas development provides ample production and operational data for ML/AI-based anomaly diagnosis, field data suffer from scarce failure cases, hard-to-capture pre-fault features, and strong inter-variable correlation, all of which limit model accuracy [1]. Complex production environments and experience-dependent detection lead to reduced output, unreliable operation, and high labor costs [2]. Nine advanced ML methods have been evaluated on sensor data from four wells, identifying key features [3], yet the shortcomings of existing logging tools in measuring downhole temperature and pressure still hinder modern monitoring and accurate fault judgment [4]. Challenges remain, including scarce samples, weak-signal detection, poor model adaptability under complex conditions, and the lack of methods that integrate time series augmentation with classification models for highly dynamic, non-stationary well environments.
In recent years, data-driven diagnostic techniques have made progress in industrial health monitoring. Among them, Diffusion Probabilistic Models (DPMs) have shown strong generative capabilities for enhancing small-sample datasets. Ho et al. (2020) proposed the DDPM, which models data distributions through a forward noising process and a backward denoising process [5]. Nichol and Dhariwal (2021) later improved the model's sampling efficiency and convergence [6]. Zhao et al. (2025) introduced DDPM into mechanical fault-diagnosis tasks, enhancing classification accuracy and robustness under small-sample conditions [7]. Yang et al. (2024) proposed a class-wise diffusion strategy to mitigate category imbalance and reduce diagnostic bias [8]. Zhang et al. (2024) developed an interpretable diffusion model using latent-variable techniques for weakly supervised industrial scenarios [9]. Zhang et al. (2025) also applied DDPM to HVAC systems, demonstrating its applicability in real-world industrial environments [10].
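For reference, the sketch below illustrates the DDPM forward-noising step and the noise-prediction training loss that these works build on, written in PyTorch; the schedule length, beta range, and the `model(x_t, t)` signature are illustrative assumptions, not settings from the cited papers.

```python
import torch
import torch.nn.functional as F

T = 1000                                    # number of diffusion steps (assumed)
betas = torch.linspace(1e-4, 0.02, T)       # linear noise schedule (assumed)
alphas_bar = torch.cumprod(1.0 - betas, 0)  # cumulative product of (1 - beta_t)

def q_sample(x0, t, noise):
    """Forward process: x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps."""
    a = alphas_bar[t].sqrt().view(-1, 1, 1)          # per-sample scale for x_0
    s = (1.0 - alphas_bar[t]).sqrt().view(-1, 1, 1)  # per-sample noise scale
    return a * x0 + s * noise

def ddpm_loss(model, x0):
    """Train the network to predict the injected noise eps from (x_t, t)."""
    t = torch.randint(0, T, (x0.shape[0],))
    noise = torch.randn_like(x0)
    return F.mse_loss(model(q_sample(x0, t, noise), t), noise)
```

At sampling time, the learned noise predictor is applied step by step to invert this process, turning Gaussian noise into synthetic sequences.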
Meanwhile, Convolutional Neural Network (CNN) and Transformer architectures have become mainstream in deep learning-based diagnosis tasks. CNNs are effective for local feature extraction, while Transformers excel at modeling long-range temporal dependencies, and their fusion has become a trend in fault diagnosis. For instance, Zhu et al. (2023) proposed a multi-scale Transformer-CNN model for robust knowledge transfer across devices [11]. Chen et al. (2024) designed the BCTGAN framework, which integrates CNN and Transformer to handle imbalanced data [12]. Lai et al. (2025) constructed a parallel CNN-Transformer encoder to balance diagnostic accuracy and lightweight model design [13]. Zhang et al. (2023) proposed TSViT, showing the Transformer's ability to capture complex temporal fault patterns [14]. In addition, Lu et al. (2022) and Pei et al. (2023) introduced mechanisms to improve feature representation and adaptability in complex industrial tasks [15,16].
Although diffusion models and deep fusion networks have shown great potential, building an end-to-end diagnostic system for natural gas wells remains challenging. These wells operate in highly dynamic and non-stationary environments, often under small-sample and weak-signal conditions. Moreover, current research lacks integrated methods that combine time series data augmentation with fault-classification models for complex field conditions.
To address the dual challenges of sample scarcity and feature complexity in natural gas well diagnosis, this paper proposes an intelligent fault-diagnosis method that integrates diffusion-based data augmentation with a CNN-Transformer classification model. The key contributions are as follows:
A time series diffusion model is introduced to generate high-quality and diverse synthetic fault samples, thereby alleviating the small-sample problem in real-world scenarios;
A CNN-Transformer-based diagnostic model is constructed to perform multi-class fault identification on enhanced data, capturing both local details and global temporal patterns;
The proposed method is validated on a real-world dataset from natural gas wells. Results show improved classification accuracy, robustness under weak-signal conditions, and better adaptability to complex working environments.
The remainder of this paper is organized as follows:
Section 2 details the natural gas well production process, data characteristics, and common diagnostic challenges.
Section 3 presents the proposed method based on diffusion enhancement and CNN-Transformer modeling.
Section 4 describes comparative experiments and performance evaluations.
Section 5 concludes the paper and discusses future directions.
2. Brief Introduction of Natural Gas Well System
2.1. Gas Well Production and Monitoring System Overview
A natural gas well is a complex operating system combining underground reservoirs, wellbore structures, surface gathering and transportation facilities, and online monitoring systems. In a typical gas-extraction process, natural gas is extracted from the formation, rises to the wellhead through the wellbore, and is processed by separation, dehydration, pressurization, and other equipment before being transported to the downstream pipeline network. In intelligent oil and gas fields, SCADA (Supervisory Control and Data Acquisition) systems have been widely deployed to collect multi-dimensional monitoring data in real time, including oil pressure, casing pressure, well temperature, and production.
Such multi-source time series data are characterized by non-stationarity, strong coupling, and high noise; they are continuously affected by environmental perturbations and operational adjustments and exhibit pronounced dynamic patterns. An intelligent diagnosis system for natural gas wells must therefore be able to handle subtle fluctuations and latent anomalies in high-dimensional time series features, and must possess strong dynamic-modeling and pattern-recognition capabilities [17].
2.2. Typical Fault Modes and Data Signatures
Drawing on historical operation data and expert experience, the common failure modes of natural gas wells fall mainly into the following categories:
Wellhead hydrate freezing and plugging: hydrates form as gas cools at the wellhead, blocking the outlet. It is characterized by rising oil pressure, a marked drop in well temperature within 2 h, a sudden drop in gas production, and falling outgoing pressure, and it typically develops rapidly over a short period;
Wellbore hydrate plugging: hydrates form in the middle or lower part of the wellbore. It is characterized by drops in production, oil pressure, and well temperature within 2–6 h, with no obvious change in the differential pressure between oil pressure and casing pressure, and must be distinguished from other plugging types [18];
Fluid accumulation in the wellbore (liquid loading): liquid accumulating in the well impedes gas flow. It is manifested by a rapid decline in production to nearly zero within 3–5 days, a drop in oil pressure, and a significant increase in the differential pressure between oil pressure and casing pressure, while casing pressure changes little; it is a gradual type of failure [19];
Normal production status: all monitoring parameters, such as oil pressure, well temperature, and production, remain within reasonable fluctuation ranges, with no significant sudden changes or abnormal trends, and the system runs smoothly.
These faults exhibit characteristic change patterns at different time scales in the multidimensional monitoring data, but their early signs involve weak signals and overlapping features and are easily masked by noise, making timely recognition difficult for traditional methods.
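To make concrete why fixed thresholds struggle here, the sketch below hard-codes the qualitative signatures above as naive rules; all numeric thresholds are purely hypothetical illustrations, not field-calibrated values.

```python
def naive_rule_diagnosis(d_oil_p, d_temp, d_prod, d_diff_p):
    """Naive threshold rules over recent changes in oil pressure, well
    temperature, production, and oil/casing differential pressure.
    Thresholds are hypothetical; early-stage faults with weak, overlapping
    signatures routinely defeat this kind of logic."""
    if d_oil_p > 0.1 and d_temp < -0.1 and d_prod < -0.1:
        return "Wellhead freeze plugging"      # rapid onset, within ~2 h
    if d_oil_p < -0.1 and d_temp < -0.1 and abs(d_diff_p) < 0.05:
        return "Wellbore freeze plugging"      # evolves over 2-6 h
    if d_prod < -0.3 and d_oil_p < -0.1 and d_diff_p > 0.2:
        return "Wellbore fluid accumulation"   # gradual, 3-5 days
    return "Normal production"
```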
2.3. Problem Statement
Given multidimensional time series monitoring data $X = (x_1, x_2, \ldots, x_T) \in \mathbb{R}^{T \times d}$, where $T$ denotes the length of the time series and $d$ is the sensor dimension, the goal is to construct a classification function
$$f_\theta : \mathbb{R}^{T \times d} \rightarrow \mathcal{Y}$$
that maps an input time series $X$ to its corresponding operational state $y \in \mathcal{Y}$, where $\mathcal{Y} = \{\text{NP}, \text{WFA}, \text{WFP}, \text{HFP}\}$ denotes normal production and the three fault modes described in Section 2.2.
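For concreteness, the following minimal sketch fixes the tensor shapes of this formulation, assuming the experimental dataset of Section 4.1.1 (windows of $T = 120$ timesteps over $d = 3$ pressure channels).

```python
import torch

CLASSES = ["NP", "WFA", "WFP", "HFP"]  # the four operational states, y in {0,...,3}
X = torch.randn(120, 3)                # one input window X in R^{T x d}, T=120, d=3
# f_theta : R^{T x d} -> {0,...,3} is the classifier to be learned from
# (augmented) training pairs (X, y).
```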
The task faces the following two major challenges:
Small-sample problem: real faults occur infrequently, fault samples are scarce, and the class distribution is severely imbalanced;
Multi-scale time series feature extraction: faults evolve at different rates, which requires the model to have both local detail awareness and global dependency-modeling capabilities.
Therefore, this paper aims to design an intelligent diagnostic framework with small-sample augmentation capabilities and complex time series-modeling capability to achieve accurate fault identification in natural gas wells.
4. Experiments and Analysis
This section is dedicated to the comprehensive experimental validation of our proposed framework. Our primary objective is to demonstrate the framework’s effectiveness, particularly in addressing the critical industrial challenge of data scarcity in fault diagnosis. To this end, we have structured our evaluation into four distinct parts. First, we will detail the experimental setup (
Section 4.1), including the real-world dataset, preprocessing pipeline, evaluation metrics, and implementation specifics. Second, we will assess the quality of the synthetic data generated by our diffusion model and showcase its powerful application in data augmentation under various data-scarce scenarios (
Section 4.2). Third, a thorough comparison with other mainstream diagnostic models (Section 4.2.3) will be conducted to benchmark our method's performance. Finally, a series of ablation studies (
Section 4.3) will be presented to dissect our model’s architecture and validate the contribution of its key components, followed by a discussion of the broader implications and limitations of our work.
4.1. Experimental Setup
This section outlines the foundational setup for all subsequent experiments, including a detailed description of the dataset, the metrics used for evaluation, and the specifics of the model implementations.
4.1.1. Dataset and Preprocessing
The dataset for this study was sourced from a real-world monitoring system deployed in a natural gas field. It consists of multivariate time series data capturing three key physical parameters: casing pressure, oil pressure, and external transmission pressure. Based on operational logs, each sample is labeled into one of four classes: Normal Production (NP) and three distinct fault conditions—Wellbore Fluid Accumulation (WFA), Wellbore Freeze Plugging (WFP), and Wellhead Freeze Plugging (HFP). A critical characteristic of this dataset is the significant class imbalance, with fault instances being substantially rarer than normal operational data. This inherent data scarcity poses a major challenge for training robust diagnostic models and serves as the primary motivation for our data augmentation strategy.
During data preprocessing, missing values were filled via linear interpolation to preserve temporal continuity, and the Hampel filter was applied to remove outliers owing to its robustness against impulsive noise. Following common practice, min-max normalization was then employed to scale values to the [0, 1] interval. Finally, a sliding-window technique was used to slice the original signals into time series segments of 120 timesteps, rendering them suitable for model input. The resulting dataset was partitioned into training, validation, and testing sets at a 70%/15%/15% ratio. The specific sample distribution, which highlights the aforementioned class imbalance, is detailed in
Table 1. The validation set was strictly reserved for hyperparameter tuning and model selection.
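For illustration, the following is a minimal sketch of this preprocessing pipeline; the Hampel window length and threshold and the window stride are our assumptions, as the text does not specify them.

```python
import numpy as np
import pandas as pd

def hampel(x: pd.Series, window: int = 11, n_sigma: float = 3.0) -> pd.Series:
    """Replace outliers with the rolling median (MAD-based Hampel filter)."""
    med = x.rolling(window, center=True, min_periods=1).median()
    mad = (x - med).abs().rolling(window, center=True, min_periods=1).median()
    outlier = (x - med).abs() > n_sigma * 1.4826 * mad
    return x.mask(outlier, med)

def preprocess(df: pd.DataFrame, win: int = 120, stride: int = 120) -> np.ndarray:
    df = df.interpolate(method="linear")                 # fill missing values
    df = df.apply(hampel)                                # suppress outliers per channel
    df = (df - df.min()) / (df.max() - df.min() + 1e-8)  # min-max scale to [0, 1]
    segments = [df.values[i:i + win]                     # sliding windows
                for i in range(0, len(df) - win + 1, stride)]
    return np.stack(segments)                            # shape (N, 120, d)
```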
4.1.2. Evaluation Metrics
To conduct a robust and quantitative evaluation of our framework, we employ a series of evaluation metrics, which are categorized into two types: one for assessing the quality of the generated data, and the other for evaluating the performance of the final fault diagnosis model.
First, to evaluate the quality of the generated data, we rigorously quantify the similarity between synthetic and real samples using the following metrics:
Earth Mover’s Distance (EMD): Also known as the first Wasserstein distance ($W_1$), EMD measures the similarity between the global distributions of the generated and real data by calculating the minimum “cost” of transforming one distribution into the other. A lower EMD value signifies a closer match between the two distributions. It is defined as:
$$W_1(P_r, P_g) = \inf_{\gamma \in \Pi(P_r, P_g)} \mathbb{E}_{(x, y) \sim \gamma}\big[\lVert x - y \rVert\big]$$
where $P_r$ and $P_g$ represent the real and generated data distributions, respectively, and $\Pi(P_r, P_g)$ is the set of all joint distributions whose marginals are $P_r$ and $P_g$.
Kullback-Leibler (KL) Divergence: KL divergence quantifies the difference between the probability distributions of the generated ($Q$) and real ($P$) data. It effectively measures the information lost when approximating the real distribution with the generated one. A smaller KL divergence indicates a better approximation. It is defined as:
$$D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}$$
Mean Squared Error (MSE): MSE assesses the direct, point-wise similarity between generated and real samples by calculating the average squared difference between corresponding data points. It provides a straightforward measure of reconstruction accuracy; a smaller MSE indicates higher fidelity. It is defined as:
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( x_i - \hat{x}_i \right)^2$$
where $x_i$ and $\hat{x}_i$ are the $i$-th data points of the real and generated samples, respectively, and $n$ is the number of data points.
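For reproducibility, a compact sketch of how these three generation-quality metrics can be computed is given below; the histogram-based KL estimator and its bin count are our assumptions, since the text does not fix an estimator.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def emd(real: np.ndarray, gen: np.ndarray) -> float:
    """First Wasserstein distance between the pooled value distributions."""
    return wasserstein_distance(real.ravel(), gen.ravel())

def kl_divergence(real: np.ndarray, gen: np.ndarray, bins: int = 50) -> float:
    """Histogram estimate of D_KL(P_real || Q_gen) over a shared range."""
    lo, hi = min(real.min(), gen.min()), max(real.max(), gen.max())
    p, _ = np.histogram(real, bins=bins, range=(lo, hi))
    q, _ = np.histogram(gen, bins=bins, range=(lo, hi))
    p = p / p.sum() + 1e-10                  # normalize, avoid log(0)
    q = q / q.sum() + 1e-10
    return float(np.sum(p * np.log(p / q)))

def mse(real: np.ndarray, gen: np.ndarray) -> float:
    """Mean squared point-wise difference between paired samples."""
    return float(np.mean((real - gen) ** 2))
```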
Additionally, we adopt the F1-Score as the primary metric for evaluating the fault-diagnosis model. This metric is particularly suitable for our task because it is the harmonic mean of Precision and Recall, offering a balanced assessment under class imbalance. A higher F1-Score represents better and more robust classification performance. It is calculated as:
$$F_1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
where $\mathrm{Precision} = \frac{TP}{TP + FP}$ and $\mathrm{Recall} = \frac{TP}{TP + FN}$, with $TP$, $FP$, and $FN$ denoting true positives, false positives, and false negatives. For our multi-class task we report the macro-averaged F1-Score, i.e., the unweighted mean of the per-class F1-Scores.
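A minimal usage sketch with scikit-learn follows; the integer label encoding is illustrative.

```python
from sklearn.metrics import f1_score

y_true = [0, 1, 2, 3, 0, 1]   # integer labels: 0=NP, 1=WFA, 2=WFP, 3=HFP
y_pred = [0, 1, 2, 2, 0, 1]   # classifier outputs on the test set
macro_f1 = f1_score(y_true, y_pred, average="macro")  # unweighted per-class mean
```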
4.1.3. Implementation Details
The proposed framework was implemented in Python 3.10 using the PyTorch 2.1 library. All models were trained and evaluated on a workstation equipped with an Intel Core Ultra 7 155H CPU, 32 GB of RAM, and a single NVIDIA GeForce RTX 4060 GPU.
For the diagnostic models, training was conducted for a maximum of 50 epochs with a batch size of 512. We employed the Adam optimizer with the initial learning rate listed in Table 3 and used cross-entropy loss as the objective function. To prevent overfitting, an early-stopping mechanism with a patience of 10 epochs was implemented, monitoring the validation loss. The detailed structural parameters of both our proposed diagnostic model and the data-augmentation module are provided in
Table 2. The final hyperparameters used for training are presented in
Table 3.
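The stated protocol can be summarized in the short training-loop sketch below; `model`, `train_loader`, and `val_loader` are assumed to exist, and the learning-rate default is a placeholder for the value listed in Table 3.

```python
import copy
import torch

def train(model, train_loader, val_loader, lr=1e-3, epochs=50, patience=10):
    opt = torch.optim.Adam(model.parameters(), lr=lr)  # lr value is a placeholder
    ce = torch.nn.CrossEntropyLoss()
    best_loss, best_state, wait = float("inf"), None, 0
    for _ in range(epochs):                            # at most 50 epochs
        model.train()
        for x, y in train_loader:                      # batches of size 512
            opt.zero_grad()
            ce(model(x), y).backward()
            opt.step()
        model.eval()
        with torch.no_grad():                          # monitor validation loss
            val_loss = sum(ce(model(x), y).item() for x, y in val_loader)
        if val_loss < best_loss:
            best_loss, wait = val_loss, 0
            best_state = copy.deepcopy(model.state_dict())
        else:
            wait += 1
            if wait >= patience:                       # early stopping, patience 10
                break
    model.load_state_dict(best_state)                  # restore best checkpoint
    return model
```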
4.2. Data Generation Evaluation and Application
This section evaluates the core contribution of our work: the diffusion-based data-augmentation framework. We first assess the quality and fidelity of the synthetic data generated by our DPM. Subsequently, we demonstrate its practical application and effectiveness in enhancing fault-diagnosis performance, particularly under challenging data-scarce conditions.
4.2.1. Evaluation of Generated Sample Quality
Before applying the synthetic data for downstream diagnostic tasks, it is imperative to first validate its quality and fidelity. A high-quality generative model must produce samples that are not only statistically congruent with the real data but also capture the characteristic temporal dynamics inherent to each operational condition. To this end, we conducted a comprehensive evaluation involving both qualitative visual inspection and rigorous quantitative analysis.
For a qualitative assessment,
Figure 4 provides a direct visual comparison between authentic time series samples randomly selected from our dataset and synthetic samples generated by our trained DPM for each of the four operational classes. As can be observed, the generated signals (bottom row) adeptly replicate the distinct morphological patterns, amplitudes, and underlying structures of their real counterparts (top row). The synthetic data closely mirrors the subtle fluctuations and characteristic trends present in the authentic signals, rendering them virtually indistinguishable by visual inspection alone. This provides strong initial evidence of our model’s ability to capture the complex data-generating process.
To complement this qualitative assessment with a more rigorous, objective evaluation, we benchmarked our DPM against three representative deep generative models from the recent literature, selected to cover advanced GAN and VAE variants tailored for fault-diagnosis tasks: the 2D-CNN conditional GAN (2D-CNNGAN) [28], which combines a CGAN with a 2D CNN on image-like data representations; the multilabel one-dimensional GAN (ML1D-GAN) [29], a fully 1D convolutional GAN noted for strong generalization on time series signals; and the CVAE with centroid loss (CL-VAE) [30], a modified VAE that uses a penalization mechanism to improve sample fidelity. All models were trained on the same dataset to generate fault samples, and the quality of their outputs was quantitatively evaluated against real samples using the metrics defined in Section 4.1.2.
The comprehensive results of this comparison are summarized in
Table 4. The quantitative analysis confirms the superiority of our proposed DPM, which consistently achieved the best (lowest) scores across all evaluation metrics. Its low EMD and KL divergence scores indicate that the data it generates match the real data's global and probabilistic distributions most faithfully. Its leading performance in MSE and in Dynamic Time Warping (DTW) distance, a shape-similarity measure reported in Table 4 alongside the metrics of Section 4.1.2, further demonstrates its ability to reconstruct both the point-wise values and the complex morphological shapes of the time series signals. This evaluation confirms that our DPM is a reliable, high-quality data source, laying a solid foundation for its subsequent application in data augmentation.
4.2.2. Effectiveness of Data Augmentation
Having established the high fidelity of the synthetic data generated by our DPM, we now evaluate its primary application: addressing the class imbalance inherent in our real-world dataset. As detailed in
Table 1, the training set is characterized by a significant disparity in sample counts, with the Normal Production (NP) class vastly outnumbering the three fault classes (WFA, WFP, and HFP). This imbalance can bias a diagnostic model towards the majority class, leading to poor performance on the critical, underrepresented fault conditions.
To assess the effectiveness of our data-augmentation strategy in mitigating this bias, we designed the following experiment. We first established a baseline by training the CNN-Transformer model solely on the original, imbalanced training set. We then progressively augmented the training set by adding varying quantities of synthetic samples (100, 200, 500, and 1000) for each of the three minority fault classes (WFA, WFP, and HFP), thereby rebalancing the class distribution. The diagnostic model was evaluated on the real-data test set at each level of augmentation, allowing us to quantify the benefits of the synthetic data directly.
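Schematically, the augmentation protocol looks as follows; `dpm.sample`, `build_cnn_transformer`, `make_loader`, `evaluate`, and `CLASS_TO_ID` are hypothetical stand-ins for this paper's components, not an actual published API, and `train` refers to the loop sketched in Section 4.1.3.

```python
import numpy as np

MINORITY = ["WFA", "WFP", "HFP"]
for n in [0, 100, 200, 500, 1000]:             # augmentation levels tested
    xs, ys = [X_train], [y_train]              # start from the imbalanced set
    for cls in MINORITY:
        xs.append(dpm.sample(condition=cls, num_samples=n))  # hypothetical API
        ys.append(np.full(n, CLASS_TO_ID[cls]))
    model = train(build_cnn_transformer(),
                  make_loader(np.concatenate(xs), np.concatenate(ys)),
                  val_loader)
    print(n, evaluate(model, test_loader))     # macro F1 on the real test set
```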
The results of this experiment are visualized in
Figure 5, which plots the macro
F1-Score on the test set as a function of the number of synthetic samples added. The findings unequivocally demonstrate the profound and positive impact of our data-augmentation strategy. The baseline model, trained without any augmentation (0 added samples), achieved a macro
F1-Score of 84.56%. While respectable, this performance is susceptible to biases from the imbalanced data. As synthetic fault samples were introduced, a clear and substantial trend of performance improvement was observed. The curve shows a steep ascent, with the
F1-Score surging to 99.52% after the training set was augmented with 1000 synthetic samples per minority class. This marked improvement of nearly 15 percentage points underscores the ability of the high-quality synthetic data to compensate effectively for the underrepresentation of fault classes. The rising curve validates our framework's capacity to mitigate classification bias and yield a more reliable and equitable fault-diagnosis system, highlighting its potential for practical industrial applications where collecting balanced fault data is often infeasible.
4.2.3. Comparison with Other Diagnostic Models
To further validate the superiority of our proposed CNN-Transformer architecture, we conducted a comparative analysis against several widely used deep learning models for time series classification. This experiment isolates the performance of the diagnostic model itself, demonstrating its capability to capture complex fault patterns from the augmented data.
For this comparison, we replaced our CNN-Transformer model with three alternative architectures: a standard 1D-CNN, a Long Short-Term Memory (LSTM) network, and a Gated Recurrent Unit (GRU) network. To ensure a fair and rigorous benchmark, all models were trained on the exact same augmented dataset, which consists of the original imbalanced training data supplemented with 1000 synthetic samples per minority class, as this configuration yielded the best performance in our previous experiment. All models were trained using the same protocol, with their hyperparameters individually optimized on the validation set.
The performance of each diagnostic model on the real-data test set is summarized in
Table 5. Our proposed CNN-Transformer model achieved the highest macro
F1-Score of 99.52%, outperforming all other architectures. The 1D-CNN, while effective, achieved a lower score of 95.83%, likely due to its limitations in modeling long-range dependencies. The recurrent models, LSTM and GRU, scored 93.54% and 94.98%, respectively, indicating proficiency in handling sequential data but weaker hierarchical feature extraction compared with our hybrid approach.
A more granular analysis of the classification behavior is provided in
Figure 6 and
Figure 7. The confusion matrices in
Figure 6 show that while all models perform well, the baseline models (b–d) exhibit noticeable confusion between the WFP and HFP fault types. In contrast, our proposed CNN-Transformer (a) displays a nearly diagonal matrix, indicating significantly higher classification accuracy and minimal errors across all classes.
This performance difference is further explained by the t-SNE feature visualizations in
Figure 7. The feature space of our CNN-Transformer (a) shows four distinct and well-separated clusters, demonstrating a highly discriminative feature representation. The baseline models (b–d), however, produce feature spaces with more overlap and less defined clusters, particularly for the fault classes. This provides compelling evidence that the superior feature learning capability of our hybrid architecture is the key driver behind its state-of-the-art diagnostic performance.
4.3. Ablation Studies and Discussion
To further dissect our proposed framework and validate the contribution of its key components, we conducted a series of ablation studies. This section presents these findings and concludes with a broader discussion of our method’s implications, limitations, and potential future directions.
4.3.1. Ablation Study
To validate the individual contributions of the key components within our proposed framework, we conducted a comprehensive ablation study. The study was designed to answer two critical questions: (1) To what extent does the DPM-based data augmentation contribute to the final diagnostic performance? (2) What is the synergistic effect of combining CNN and Transformer modules in the diagnostic model? To this end, we evaluated four distinct model configurations:
Full Framework: Our complete, proposed method (DPM Augmentation + CNN-Transformer).
No Augmentation: The CNN-Transformer model trained solely on the original, imbalanced dataset, removing the effect of the DPM.
Transformer-Only: A diagnostic model using only the Transformer component, to isolate the contribution of the CNN module.
CNN-Only: A diagnostic model using only the CNN component, to isolate the contribution of the Transformer module.
To ensure a fair comparison, all variants were trained under identical hyperparameter settings and, where applicable, on the same augmented dataset. The results, measured by the macro F1-Score on the test set, are presented in
Table 6.
The results of the ablation study provide clear insights. The largest performance degradation, a drop of 14.96 percentage points in macro F1-Score, occurred when data augmentation was removed. This starkly demonstrates that the DPM-based augmentation is the most critical component for overcoming the class imbalance problem and is fundamental to achieving high performance.
Furthermore, the necessity of the hybrid architecture is also evident. Removing the CNN or the Transformer module reduced performance by 3.69 and 2.38 percentage points, respectively. The CNN-Only model, while adept at extracting local features, struggles to capture long-range temporal dependencies; conversely, the Transformer-Only model may overlook the fine-grained local patterns the CNN is designed to capture. The superior performance of the full framework confirms that the two components work synergistically: the CNN provides a powerful local feature representation that the Transformer then contextualizes globally, yielding a more robust and discriminative feature space for diagnosis.
4.3.2. Discussion
The comprehensive experimental results presented in this study robustly validate the effectiveness of our proposed framework, which synergistically integrates a high-fidelity diffusion model for data augmentation with a powerful CNN-Transformer architecture for fault diagnosis. Our findings highlight two key contributions. First, the DPM proved superior to other generative models in creating realistic, high-fidelity time series data, and its application as an augmentation tool was shown to be highly effective in mitigating the diagnostic bias caused by class imbalance. Second, the hybrid CNN-Transformer model consistently outperformed other standard architectures by effectively learning both local feature details and global temporal patterns, a capability confirmed by our ablation studies.
5. Conclusions
This paper proposes a multi-fault-diagnosis method for natural gas wells that integrates a diffusion model with a CNN-Transformer architecture, addressing key challenges in field data such as complex multivariate coupling, scarce fault samples, and dynamic operating conditions. Through a combination of data augmentation and time series modeling, the following critical findings are achieved:
In terms of data augmentation, the 1D diffusion generative network based on diffusion probabilistic modeling successfully generates high-fidelity and diverse time series data for small-sample fault states. Qualitative comparisons show that synthetic samples closely replicate the morphological patterns, amplitudes, and underlying structures of real samples, making them nearly indistinguishable by visual inspection. Quantitative metrics further confirm the superiority of the diffusion model: it achieves a lower Earth Mover's Distance (EMD = 0.087), Kullback-Leibler divergence (KL = 0.245), and Mean Squared Error (MSE = 0.298) than baseline models such as 2D-CNNGAN, ML1D-GAN, and CL-VAE, validating its effectiveness in capturing subtle time series patterns.
For fault diagnosis, the hybrid CNN-Transformer model enables in-depth analysis and accurate classification of key operational parameters by fusing local feature extraction and global temporal modeling capabilities. Experimental results demonstrate that on the augmented dataset with 1000 synthetic samples added per minority class, the model achieves a macro F1-Score of 99.52%, significantly outperforming 1D-CNN (95.83%), LSTM (93.54%), and GRU (94.98%). Confusion matrices and t-SNE feature visualizations further reveal that the proposed model achieves near-perfect classification accuracy across all fault types, with four distinct and well-separated clusters in the feature space, whereas baseline models exhibit noticeable class confusion.
Ablation studies verify the necessity of each component: removing the diffusion-based augmentation module reduces performance by 14.96 percentage points, highlighting its critical role in mitigating class imbalance. Removing the CNN or the Transformer module (i.e., using the Transformer or the CNN alone) leads to declines of 3.69 and 2.38 percentage points, respectively, confirming the synergistic effect of the two components in capturing local details and global dependencies.
In summary, the proposed data-augmentation mechanism and temporal classification framework outperform traditional methods in terms of structural consistency of generated samples, augmentation effectiveness, and multi-class fault-diagnosis accuracy. The framework is highly transferable and can be extended to other industrial monitoring scenarios with nonlinear and dynamic characteristics. Future work will focus on developing an online learning mechanism to adapt to concept drift from evolving well conditions and enhancing model interpretability to provide operators with actionable insights into fault causality, ultimately advancing a more practical and deployable intelligent fault-diagnosis system.