Article

Research on Oil and Gas Pipeline Leakage Detection Based on MSCNN-Transformer

1 College of Mathematics and Statistics, Northeast Petroleum University, Daqing 163319, China
2 Artificial Intelligence Energy Research Institute, Northeast Petroleum University, Daqing 163319, China
3 School of Electrical Information Engineering, Northeast Petroleum University, Daqing 163319, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(1), 480; https://doi.org/10.3390/app16010480
Submission received: 7 December 2025 / Revised: 31 December 2025 / Accepted: 31 December 2025 / Published: 2 January 2026
(This article belongs to the Section Energy Science and Technology)

Abstract

Leakage detection is critical to the safe operation of oil and gas pipelines. Existing working-condition recognition methods are limited in processing and capturing the characteristics of complex, multi-category leakage signals. To improve the accuracy of oil and gas pipeline leakage detection, a multi-scale convolutional neural network-Transformer (MSCNN-Transformer)-based leakage condition recognition method is proposed. Firstly, to capture the global information and nonlinear characteristics of the time-series signal, the short-time Fourier transform (STFT) is used to generate time-frequency images. Furthermore, to enrich the feature information from different dimensions, the one-dimensional signal and the two-dimensional time-frequency image are both sampled by multi-scale convolution, and global relationships are established through the multi-head attention mechanism of the Transformer module. Finally, the leakage signal is accurately identified by fusing the features and feeding them to a classifier. The experimental results show that the proposed method performs well on the GPLA-12 data set, with a recognition accuracy of 96.02%, and has clear advantages over other leakage signal recognition methods.

1. Introduction

With the rapid development of the oil and gas industry, the proportion of natural gas, a clean and efficient energy source, in China's energy structure continues to rise. Because of its large capacity, stable operation and low cost, pipeline transportation has become the main mode of natural gas transportation [1]. However, during long-term operation, natural gas pipelines are susceptible to natural corrosion, geological activity, man-made damage and other factors, and carry a high risk of leakage [2]. Once a leak occurs, it not only causes serious waste of resources and economic losses, but may also lead to major safety accidents such as fire, explosion and environmental pollution, threatening people's lives and property. Therefore, rapid and accurate identification of natural gas pipeline leakage is of great practical significance for ensuring the safety of energy transmission and reducing operation and maintenance costs [3].
In recent years, pipeline leakage detection technology has been continuously developed, and a variety of technical routes have been gradually formed, represented by traditional signal processing, machine learning and deep learning. In early studies, machine learning methods have been widely used in pipeline condition identification and leakage judgment. For example, Li et al. [4] used six typical models such as random forest, AdaBoost and support vector machine (SVM) to classify and identify the valve switching state and internal leakage in product oil pipelines, and verified the adaptability of different algorithms in practical applications. However, these methods usually rely on manual design features, and the model structure is shallow, so it is difficult to fully capture the deep rules of complex faults, and the performance is limited in the face of variable working conditions [5].
With the rise of deep learning technology, methods based on convolutional neural network (CNN) have received extensive attention due to their powerful automatic feature extraction capabilities. Zhang Ruicheng et al. [6] used one-dimensional CNN to directly process the acoustic emission signal generated by gas pipeline leakage, and realized adaptive learning of features. Ning et al. [7] improved the AlexNet network by adjusting the convolution kernel to a flat shape, which better matched the narrow-band line spectrum characteristics of the valve leakage ultrasonic signal and improved the recognition accuracy. Zhong et al. [8] proposed a one-dimensional multi-scale deep separable convolution model, which enhanced the generalization ability while reducing the model parameters, and realized the end-to-end detection from the original signal to the classification result, with an accuracy rate of 90.72%.
In addition, according to the characteristics of time series signals, recurrent neural network (RNN) and its improved structure also show good potential. Nie Wei et al. [9] combined CNN with long short-term memory network (LSTM) to construct CNN-LSTM model, which effectively captured the time evolution law of leakage signal and provided a new idea for pipeline leakage detection. Mohamed et al. [10] proposed an improved bidirectional LSTM (BiLSTM) model, which performs well in bearing fault diagnosis under variable load and multiple working conditions. Its powerful time series modeling ability also provides a useful reference for pipeline leakage analysis.
In order to further improve the detection performance, researchers have carried out in-depth exploration in signal preprocessing, model fusion and multi-source information integration. In terms of feature extraction, time-frequency analysis tools such as variational mode decomposition (VMD) and wavelet transform are widely used for signal denoising and feature enhancement. Zheng et al. [11] combined VMD with deep learning to effectively solve the problem that the small leakage characteristics of water supply pipelines are difficult to extract. Ma Liang et al. [12] used the frost ice optimization algorithm (RIME) to optimize the key parameters of VMD, combined with the Bubble entropy feature and the RIME-ELM model, successfully realized the high-precision identification of weak leakage of water supply pipelines, and the signal-to-noise ratio reached 23.922 dB. Tang Yongjun et al. [13] combined wavelet transform with singular value decomposition (SVD) to extract the waveform features of the swing signal of hydropower units, and fused with the image features extracted by CNN, which significantly improved the accuracy of fault identification. Song Hongzhi et al. [14] innovatively used the “energy difference method + synthetic spectral kurtosis method” to optimize the VMD parameters, and screened the effective intrinsic modal components through multi-dimensional evaluation. The accurate reconstruction of fault signals in strong noise environment provides a new solution to solve the interference problem in pipeline leak detection.
In terms of model fusion, the combination of CNN and Transformer has become a research hotspot. Li Lang et al. [15] proposed CNN-Transformer architecture to process distributed acoustic sensing (DAS) data. CNN is used to extract local spatial features, and then Transformer is used to capture global dependencies, which enhances the ability to characterize complex leakage signals. Liang Xiaolong et al. [16] applied this structure to the leakage diagnosis of heating pipe network. The accuracy of the test set was 13.21% and 7.49% higher than that of LSTM and GRU, respectively, which verified its effectiveness under complex working conditions. Meng Hongyu et al. [17] constructed a CNN-BiLSTM-Attention hybrid model. Firstly, CNN extracts detailed features, BiLSTM captures two-way time dynamics, and finally focuses on key information through the attention mechanism, which improves the detection accuracy while ensuring speed, and provides a reference for spatio-temporal feature fusion. Zhu Jin et al. [18] designed a cascade framework of ‘wavelet packet analysis-time series image transformation-CNN + GRU’, using CNN to extract spatial patterns and GRU to model time series, which significantly improved the speed and accuracy of fault diagnosis.
In terms of multi-source information fusion, researchers have tried to integrate data from multiple sensors to improve detection robustness. Satterlee et al. [19] proposed a parallel multi-layer sensor fusion strategy that combines hydrophone, acoustic emission and vibration signals with small-sample learning technology, effectively improving leakage detection and positioning accuracy. Song et al. [20] integrated transient electromagnetic method (TEM) and weak magnetic detection method (WMDM) signals to construct a triple-attention dual-branch fusion network (TA-DBF), which alleviated the data imbalance problem of buried pipelines in complex environments. Wu et al. [21] used a Bayesian optimization algorithm to optimize stacking drop.
At the same time, in view of the special needs of pipeline leakage detection, some innovative methods are constantly emerging: Lizhong et al. [2] proposed an integrated scheme of acoustic feature processing and reconstruction, which integrates frequency domain denoising, time domain enhancement and 1D-CNN feature reconstruction, and achieves 95.17% fault recognition accuracy on the GPLA-12 data set; Mishra et al. [22] innovatively transformed the time-frequency domain features into a visibility graph network, combined with a graph convolutional network (GCN) for modeling, which further enhanced the ability to distinguish complex leakage patterns. Fangli et al. [23] proposed SE-CNN model, which introduces spectrum enhancement (SE) technology to strengthen leakage signal, suppress background noise, and realize data compression. The detection accuracy is 94.3%, and the training time is 90.6% shorter than that of traditional CNN, which meets the needs of real-time detection in industrial field. Song L et al. [24] used piecewise aggregate approximation (PAA) filtering combined with continuous wavelet transform (CWT) to transform one-dimensional acoustic signal into two-dimensional acoustic image, and constructed CWT-CNN model, which achieved 95% accuracy in small-scale leakage identification of non-metallic pipelines. Wang et al. [25] proposed a CNN + Mamba dual-modal fusion model, which converts the signal into a spectrogram by short-time Fourier transform (STFT). CNN extracts spatial features, Mamba captures temporal dynamics, and cooperates with the CosFace loss function, which significantly improves the recognition ability of fine-grained leakage levels. Luo Zhiyong et al. [26] proposed the EDViT network. 
Through kurtosis-weighted multi-sensor fusion and STFT image processing, combined with a dual-attention convolution module and a dual-branch patch vision transformer, the EDViT network realizes the collaborative extraction of local and global features, showing excellent generalization and anti-noise ability. Dong et al. [27] proposed the SGAN model, using the sparrow search algorithm to optimize the GRU hyperparameters and introducing an anomaly attention mechanism to amplify the feature difference between the leakage and normal states, which significantly improved detection accuracy and reduced false positives and false negatives. Du et al. [28] proposed the CABC-BP model, using Logistic chaotic mapping to improve the artificial bee colony algorithm (ABC) and optimize the weights and thresholds of a BP neural network; the accuracy of the first-level diagnosis was 98.33% and that of the second level 95.83%, providing a high-precision solution for leakage location in heating pipe networks. Zhang Ruirui et al. [29] proposed an identification method for classifying leakage acoustic signals in water supply networks, which provides a practical reference for the intelligent processing of pipeline acoustic signals. Nkemeni Valery et al. [30] applied a distributed Kalman filter to pipeline leakage detection.
Although the existing technology has made significant progress, the leakage detection of natural gas pipelines still faces many challenges: small leakage signals are weak and easily submerged by environmental noise, variable working conditions lead to unstable signal distribution and limited model generalization ability, and some scenarios have sample imbalance or data scarcity problems. It is difficult for a single sensor or model to fully reflect the leakage characteristics, which can easily lead to misjudgment and omission. Fine-grained leakage degree identification and size estimation still need to be improved. In addition, in the actual leakage scenario, the factors such as the heterogeneity of underground strata (such as natural fractures) and the interaction between fluid and medium will make the leakage fluid migration law and signal propagation characteristics more complicated [31]. However, the existing detection methods often do not fully consider such factors closely related to the actual working conditions, resulting in a deviation between the model prediction and the real situation (for example, the flow of the leakage fluid in the underground medium is affected by the pore structure and ion composition of the formation, which indirectly changes the signal propagation path and intensity and increases the difficulty of feature extraction) [32]. In summary, the current technology has accumulated rich achievements in model design, feature engineering, and information fusion. However, in complex practical scenarios such as small samples, multiple working conditions and strong noise, there are still problems such as insufficient feature recognition and weak model robustness. 
Therefore, based on the actual operation requirements of natural gas pipelines and the advantages of existing technologies, this paper focuses on the key directions of deep learning model optimization, multi-scale feature fusion, and anti-interference ability improvement, and proposes a multi-scale convolution-Transformer method. The purpose is to construct a leakage detection scheme with high precision, strong robustness and good engineering applicability, and provide technical support for the safe and efficient operation of natural gas pipelines.
The main innovation of this paper:
1. To address the limited feature content of a one-dimensional leakage signal, STFT is used to transform the one-dimensional signal into a two-dimensional time-frequency diagram, overcoming the difficulty that a one-dimensional signal alone cannot capture time-domain and frequency-domain features simultaneously; a multi-scale convolutional neural network then extracts local details from both the one-dimensional signal and the two-dimensional time-frequency diagram to obtain richer features.
2. Although CNN is sensitive to the local details of the leakage signal, it cannot correlate key features that are far apart. A multi-head attention mechanism is therefore introduced to capture the long-distance dependencies of the signal, enabling the model to capture its global features.
3. The traditional classification task relies only on the cross-entropy loss to optimize the classification head, which easily leads to a loose feature distribution in high-dimensional space and, in particular, weak recognition of small-sample categories. A contrastive learning loss is therefore introduced and jointly optimized with the classification loss end to end.
The structure of this paper is as follows: Section 2 "Related Work" introduces the main structures of each module involved in this paper. Section 3 "Methodology" elaborates the complete architecture and design logic of the multi-scale convolutional neural network-Transformer (MSCNN-Transformer) model. Section 4 "Experimental Design and Result Analysis" comprehensively verifies the model's performance and engineering adaptability: comparative experiments verify the core performance of the model under standard working conditions, ablation experiments quantify the contribution of each core module, and robustness tests are also implemented. Section 5 "Discussion" objectively discusses the limitations of the data set and key considerations for industrial deployment, and outlines optimization directions for technology implementation based on preliminary test results. Section 6 "Conclusions" summarizes the core findings and innovations of this study, clarifying the model's advances in feature complementarity, global dependency modeling, and small-sample optimization. Future work will promote on-site testing on operating pipelines and extend the method to leak localization and aperture quantification.

2. Related Work

2.1. Short Time Fourier Transform (STFT)

The traditional Fourier transform only reflects the global frequency distribution of a signal and cannot capture the time-varying frequency characteristics of non-stationary signals. The STFT segments the signal with a sliding time window and performs a Fourier transform within each window, achieving a synchronous representation of two-dimensional time-frequency information. The formula is expressed as
X(\tau, f) = \int_{-\infty}^{+\infty} x(t)\, w(t - \tau)\, e^{-j 2 \pi f t}\, dt
where x(t) denotes the input one-dimensional time-domain signal; w(t − τ) is the sliding time window; τ represents the center position of the window (time parameter); f denotes frequency; X(τ, f) denotes the complex frequency-domain result of the STFT, and its modulus |X(τ, f)| constitutes the time-frequency image, which reflects the frequency distribution intensity of the signal at different times.
For the pipeline leakage signal, STFT can clearly capture the frequency mutation when the leakage occurs by adjusting the window length. At the same time, the localization characteristics of the window function can suppress the influence of external sudden noise and retain the time-frequency characteristics of the leakage signal.
The time domain can only reflect the change in signal amplitude with time, while the STFT can decompose the signal into time-frequency domain representation, revealing the frequency components of the signal in different time periods. This time-frequency domain signal and the time domain signal complement each other, which helps to capture more signal characteristics. Therefore, the short-time Fourier transform is applied to the normalized signal, and the generated spectrum map is input into the network model together with the original signal, thereby improving the model‘s ability to extract the characteristics of the leakage signal and classification performance.
Steps of STFT:
Step 1: Select the window function. In preliminary grid-search experiments, the Hanning window outperformed the Gaussian and Blackman windows, so this article adopts the Hanning window.
Step 2: Divide the whole signal into multiple short segments at a fixed interval using a sliding window; each segment is called a frame.
Step 3: Windowing: multiply each frame by the window function to reduce edge effects.
Step 4: Fourier transform: perform a discrete Fourier transform on each windowed frame to obtain its spectral information.
Step 5: Result storage: store the transformed frames in order to form a two-dimensional representation, with one axis representing time and the other representing frequency.
The flow chart of STFT is shown in Figure 1.
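The five steps above can be sketched with `scipy.signal.stft`, which performs the framing, windowing, per-frame DFT and 2-D assembly in one call. The sampling rate, window length and synthetic chirp signal below are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np
from scipy.signal import stft

# Synthetic stand-in for a leakage acoustic signal: a chirp plus noise
# (fs and the signal itself are assumptions for illustration only).
fs = 8000                                  # sampling rate, Hz
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * (500 + 1500 * t) * t) + 0.1 * np.random.randn(fs)

# Steps 1-5: Hann window, framing with 50% overlap, windowing,
# per-frame DFT, and ordered storage into a 2-D array.
f, tau, X = stft(x, fs=fs, window="hann", nperseg=256, noverlap=128)

# |X(tau, f)| is the time-frequency image fed to the 2-D CNN branch.
tf_image = np.abs(X)
print(tf_image.shape)                      # (frequency bins, time frames)
```

One axis of `tf_image` indexes frequency bins and the other indexes frames, matching the two-dimensional representation described in Step 5.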

2.2. Transformer

The Transformer network structure is a deep learning model architecture for sequence-to-sequence learning tasks. It has achieved great success in natural language processing tasks such as machine translation and dialog generation. It is now gradually applied to the field of time series data processing. Its core is the multi-head self-attention mechanism. This mechanism can effectively capture long-distance dependencies and integrate information in global time series data.
Self-attention computes attention through the triple (Q, K, V), all of which are derived from the input features. The dot product of the query vector Q and the key vector K is scaled and passed through a Softmax to obtain the attention weights. Q represents the queried elements, K represents the positional encoding of the features, and V represents the value of each element. The mechanism consists mainly of two parts: linear transformation and scaled dot-product attention. The overall calculation process is
\Psi(X) = \mathrm{SDP}(Q_X, K_X, V_X) = \mathrm{SDP}(X W^Q, X W^K, X W^V) = \mathrm{softmax}\!\left( \frac{X W^Q (X W^K)^T}{\sqrt{d_k}} \right) X W^V
where Ψ(·) represents the whole self-attention process; SDP denotes scaled dot-product attention; X represents the input features; W^Q, W^K, W^V are the learnable parameters of the linear projections; and d_k is the dimension of the key vectors.
Common attention maps the features to a single high-dimensional space to compute similarity, whereas multi-head attention maps the features to different subspaces of that space. Specifically, Q, K and V are each linearly mapped h times, the h mappings are attended to in parallel to produce separate output heads, and finally all output heads are concatenated and projected again.
The multi-headed attention mechanism is shown in Figure 2.
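The mechanism in Figure 2 can be sketched in NumPy: scaled dot-product attention per the formula above, with Q, K and V split into h subspaces, attended in parallel, and the output heads concatenated. Shapes, head count and random weights are illustrative assumptions.

```python
import numpy as np

def scaled_dot_product(Q, K, V):
    """SDP: softmax(Q K^T / sqrt(d_k)) V, with a numerically stable softmax."""
    d_k = K.shape[-1]
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def multi_head_attention(X, W_Q, W_K, W_V, n_heads):
    """Project X to Q, K, V, split each into n_heads subspaces, attend in
    parallel, then concatenate the output heads."""
    N, L, d = X.shape
    d_h = d // n_heads
    Q, K, V = X @ W_Q, X @ W_K, X @ W_V

    def split(A):  # (N, L, d) -> (N * n_heads, L, d_h)
        return A.reshape(N, L, n_heads, d_h).transpose(0, 2, 1, 3).reshape(N * n_heads, L, d_h)

    out = scaled_dot_product(split(Q), split(K), split(V))
    # (N * n_heads, L, d_h) -> concatenate heads back to (N, L, d)
    return out.reshape(N, n_heads, L, d_h).transpose(0, 2, 1, 3).reshape(N, L, d)

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 16, 64))          # (batch, sequence, feature)
W = [rng.standard_normal((64, 64)) / 8 for _ in range(3)]
Y = multi_head_attention(X, *W, n_heads=4)
print(Y.shape)                                # (2, 16, 64)
```

The final re-projection (the output linear layer of a full Transformer block) is omitted here for brevity.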

3. Methodology

3.1. Multi-Scale CNN-Transformer Model

In this paper, a multi-scale CNN-Transformer network architecture is proposed to identify the working conditions of pipeline leakage signals. The model fully combines the advantages of multi-scale CNN in local feature extraction and Transformer’s ability to model long-term temporal dependencies, which can effectively improve the accuracy of leakage identification. Specifically, the model first transforms the one-dimensional signal into the corresponding two-dimensional time-frequency signal map through STFT. In the feature extraction stage, the structure of multi-scale parallel convolution module combined with Transformer encoder is constructed. Multi-scale parallel convolution includes multi-scale 1DCNN branch and multi-scale 2DCNN branch.
The multi-scale 1DCNN branch uses three convolution blocks with small, medium and large convolution kernels (3 × 1, 7 × 1, 15 × 1). Each convolution block contains a convolution layer, batch normalization (BatchNorm) and a Rectified Linear Unit (ReLU) [33] activation function, and extracts different time-domain features of the one-dimensional signal. Similarly, the multi-scale 2DCNN branch uses three convolution blocks with small, medium and large convolution kernels (3 × 3, 5 × 5, 7 × 7), each containing a convolution layer, BatchNorm and ReLU, to extract different frequency-domain features from the two-dimensional time-frequency diagram. The outputs of the multi-scale 1DCNN branch are then concatenated and encoded by a Transformer encoder to capture the long-term dependencies of the one-dimensional signal, after which global features are extracted by global average pooling and global max pooling. The outputs of the multi-scale 2DCNN branch are concatenated, flattened and fed into another Transformer encoder to capture the long-term dependencies of the two-dimensional time-frequency map, and global features are likewise extracted by global average pooling and global max pooling. Finally, all pooled features are concatenated to obtain the fused CNN-Transformer feature.
In order to clearly present the configuration details of each core module of the proposed MSCNN-Transformer model (whose parameter setting is the key to guarantee the performance of feature extraction and modeling), Table 1 lists the basic structural parameters of the model, covering the multi-scale convolution kernel specifications of 1DCNN and 2DCNN, the attention mechanism and feedforward network parameters of Transformer module, and the full connection layer dimension setting of classification module, which is convenient for intuitive understanding of the architecture design logic of the model.
The parameter configuration of signal preprocessing and the parameter selection of model training are the key experimental links to ensure the effective extraction of leakage signal features and the stable convergence of the model. In order to clearly present the preprocessing settings of short-time Fourier transform (STFT) and the core parameters of model training in this study, Table 2 summarizes the relevant configuration items and corresponding values, covering the key parameters of STFT such as window type and length, as well as the training parameters such as optimizer and learning rate strategy, which provides a clear basis for the reproducibility of the experiment.
In order to visually present the complete processing flow of the proposed MSCNN-Transformer oil and gas pipeline leakage detection method, and clearly show the logical correlation of each core link from leakage signal acquisition to working condition identification, Figure 3 shows the structural block diagram of the method. The block diagram covers key modules such as data preprocessing, multi-scale CNN feature extraction, Transformer feature extraction and classification decision, and fully presents the full-link detection logic of “signal preprocessing-multi-dimensional feature extraction-global dependency modeling-condition classification”, which is convenient for clarifying the synergy and data flow path of each functional module.

3.2. Multi-Scale CNN Module

The traditional CNN is a deep neural network for processing high-dimensional data, initially used widely in image recognition. As research deepened, CNN applications gradually expanded to one-dimensional signal processing. Its basic structure comprises an input layer, convolution layers, pooling layers, activation layers and fully connected layers, which work together to model and classify signals efficiently. The convolution layer processes one-dimensional data such as time series: sliding convolution kernels perform weighted summation over local regions to achieve local feature extraction, and the parameter sharing and local perception of the convolution operation help capture temporal patterns in the sequence. The pooling layer reduces dimensionality and compresses the feature map; common methods include max pooling and average pooling, which extract the maximum or mean value in a local window, respectively, while global average pooling takes the mean of the whole feature map and is often used for classification output. Pooling reduces overfitting and improves the robustness of the model.
Traditional CNN usually uses fixed-size convolution kernels (such as 3 × 3) for feature extraction, which is difficult to take into account local details and wider context information at the same time. This paper proposes a multi-scale CNN module. Multi-scale convolution performs multi-scale convolution operations on the same input feature map by setting up multiple convolution kernels of different sizes in parallel to obtain representations within different receptive fields. These multi-scale features are then fused to achieve comprehensive utilization of information.
1DCNN multi-scale convolution operation: let the input one-dimensional signal be x ∈ ℝ^L (L is the length of the signal). The convolution kernel of the i-th convolution block is denoted k_i ∈ ℝ^{ks_i × 1} (ks_i is the kernel size; i = 1, 2, 3 corresponds to 3 × 1, 7 × 1, 15 × 1, respectively), and the output feature is x′_i ∈ ℝ^{L × 64}. The formula is
x'_{i,l,c} = \sum_{j=0}^{ks_i - 1} x_{l+j}\, k_{i,j,c} + b_{i,c}
where l is the temporal position index (0 ≤ l ≤ L − 1), c is the channel index (0 ≤ c ≤ 63), b_{i,c} is the bias term, and k_{i,j,c} is the j-th kernel parameter of the c-th output channel in the i-th convolution block.
2DCNN multi-scale convolution operation: let the input two-dimensional time-frequency image be X ∈ ℝ^{H × W} (H and W are the height and width of the time-frequency diagram). The convolution kernel of the i-th convolution block is denoted K_i ∈ ℝ^{ks_i × ks_i × 64} (ks_i × ks_i is the kernel size; i = 1, 2, 3 corresponds to 3 × 3, 5 × 5, 7 × 7, respectively), and the output feature is X′_i ∈ ℝ^{H × W × 64}. The formula is
X'_{i,h,w,c} = \sum_{a=0}^{k_h - 1} \sum_{b=0}^{k_w - 1} X_{h+a,\, w+b}\, K_{i,a,b,c} + B_{i,c}
where h and w are the spatial location indices, c is the channel index, B_{i,c} is the bias term, and K_{i,a,b,c} is the kernel parameter at position (a, b) of the c-th output channel in the i-th convolution block.
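As a concrete instance of the 1DCNN formula above, the following NumPy sketch applies three parallel convolution scales (kernel sizes 3, 7, 15) with 64 output channels each and concatenates the results. Zero padding to preserve the signal length and the omission of BatchNorm are simplifying assumptions, not the paper's exact configuration.

```python
import numpy as np

def conv1d_same(x, kernel, bias):
    """The formula x'_{l,c} = sum_j x_{l+j} k_{j,c} + b_c, with zero padding
    so the output keeps the input length L."""
    ks, C = kernel.shape
    pad = ks // 2
    xp = np.pad(x, (pad, ks - 1 - pad))
    L = x.shape[0]
    out = np.empty((L, C))
    for l in range(L):
        out[l] = xp[l:l + ks] @ kernel + bias   # weighted sum over the window
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)                   # one-dimensional leakage signal (synthetic)
features = []
for ks in (3, 7, 15):                           # small / medium / large kernels
    k = rng.standard_normal((ks, 64)) / ks      # 64 output channels per scale
    b = np.zeros(64)
    features.append(np.maximum(conv1d_same(x, k, b), 0.0))   # Conv + ReLU
fused = np.concatenate(features, axis=1)        # concatenate the three scales
print(fused.shape)                              # (1024, 192)
```

Each scale sees a different receptive field over the same input, so the concatenated output mixes local detail with wider context, which is the point of the multi-scale design.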
Multi-scale convolution is the core module to fully mine different scale features in two-dimensional time-frequency images. Through the parallel operation of convolution kernels of different sizes, it can simultaneously capture the local details and global distribution information of the signal, which lays a foundation for the effective fusion and modeling of subsequent features. In order to clearly present the specific implementation logic of the two-dimensional multi-scale convolutional neural network in the proposed method, Figure 4 shows its structural block diagram, which intuitively reflects the feature extraction process of convolution kernels of different sizes, as well as the connection relationship between subsequent activation, pooling and full connection.

3.3. Loss Function

To improve the classification accuracy and generalization ability of the model, this paper designs a joint loss function combining a classification loss and a contrastive learning loss, optimizing the training process through this double constraint.
The classification loss uses cross-entropy loss to directly optimize the classification prediction results of the model and minimize the gap between the prediction probability and the real label. The formula is
L_1 = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} y_{i,c} \log(p_{i,c})
where N represents the number of batch samples; y i , c denotes the true label that the i -th sample belongs to the c -th class; p i , c denotes the prediction probability that the i -th sample belongs to the c -th class.
By constraining the distribution of samples in the feature space, the contrastive learning loss narrows the feature distance between same-class samples and widens the feature distance between different-class samples, improving feature discrimination. The formula is as follows:
L_2 = -\frac{1}{N} \sum_{i=1}^{N} \frac{1}{|P_i|} \sum_{j \in P_i} \log \frac{\exp(s(f_i, f_j) / \tau)}{\sum_{k \in N_i} \exp(s(f_i, f_k) / \tau)}
where f i represents the fusion feature vector of the i -th sample; P i represents the same category sample set of the i -th sample; N i denotes the set of samples of different classes of the i -th sample; s ( f i , f j ) represents the cosine similarity between the feature vector f i and f j ; τ represents the temperature coefficient, which is used to adjust the weight distribution of similarity.
The total loss function is the weighted sum of the classification loss and the contrastive learning loss. The formula is
\min_{\theta} L(\theta) = \alpha L_1 + \beta L_2
where $\alpha$ and $\beta$ are balance coefficients that adjust the relative importance of the two losses during training, keeping the classification objective and the feature optimization objective consistent.
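As an illustration, the joint objective above can be sketched in NumPy as follows. The values of α, β, and τ here are placeholders, not the paper's tuned settings, and the denominator of L2 sums over the negatives N_i only, matching the paper's formula.

```python
import numpy as np

def cross_entropy(probs, labels):
    """L1: mean negative log-probability of the true class."""
    n = len(labels)
    return -np.mean(np.log(probs[np.arange(n), labels]))

def contrastive_loss(features, labels, tau=0.1):
    """L2: pull same-class features together, push different classes apart."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T  # cosine similarities s(f_i, f_j)
    n = len(labels)
    total = 0.0
    for i in range(n):
        pos = [j for j in range(n) if labels[j] == labels[i] and j != i]
        neg = [k for k in range(n) if labels[k] != labels[i]]
        if not pos or not neg:
            continue
        denom = np.sum(np.exp(sim[i, neg] / tau))  # negatives N_i only
        total -= np.mean(np.log(np.exp(sim[i, pos] / tau) / denom))
    return total / n

def joint_loss(probs, features, labels, alpha=1.0, beta=0.5):
    """Total objective: alpha * L1 + beta * L2 (alpha/beta are placeholders)."""
    return alpha * cross_entropy(probs, labels) + beta * contrastive_loss(features, labels)

probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.8, 0.2], [0.3, 0.7]])
features = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]])
labels = np.array([0, 1, 0, 1])
total = joint_loss(probs, features, labels)
```

In training, the gradient of this combined objective simultaneously sharpens the class predictions and shapes the feature space, which is the double constraint the section describes.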

4. Experimental Design and Result Analysis

4.1. Experimental Settings

The experiments use the GPLA-12 data set, which contains 12 classes of labeled acoustic gas leakage signals recorded under different conditions, covering three gas pressures (0.2 MPa, 0.4 MPa, 0.5 MPa), two noise environments (no noise, strong noise), and two microphone settings (microphone 1, microphone 2), for a total of 780 samples; the laboratory sensor setup is shown in Figure 5. Unlike existing studies that divide the data set into three pressure-based categories to reduce interference, this paper directly detects all 12 working conditions, so as to better simulate leakage detection in complex working scenarios. To make full use of the limited data set and evaluate the generalization ability of the model, 10-fold cross-validation is adopted: the data set is randomly divided into 10 non-overlapping subsets, and in each run 9 subsets form the training set and 1 subset the test set. The experiment is repeated 10 times, and the average of the 10 results is taken as the final performance index to ensure the stability and reliability of the experimental results.
The experiments are carried out under Windows 11. All programs run on a computer with an AMD Ryzen 9 8945HX processor, an NVIDIA RTX 5060 graphics card, and 32 GB of memory, using Python 3.10.18. The model has about 8.7 M trainable parameters, and a single forward pass requires about 1.2 GFLOPs.
The data set includes two noise conditions, 'no noise' and 'strong noise'. By adjusting the distance between the microphone and the leakage point, the attenuation of the leakage signal with propagation distance in a real pipeline is simulated; the attenuation characteristics of different leakage intensities correspond to the different pressure conditions (0.2 MPa/0.4 MPa/0.5 MPa) in the data set.
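Before entering the 2D branch, each recorded signal is converted to a time-frequency image by STFT. A minimal NumPy sketch using the configuration later listed in Table 2 (Hanning window, 256-point frames, 50% overlap) might look like this; in practice a library routine such as scipy.signal.stft provides the same functionality.

```python
import numpy as np

def stft_magnitude(signal, n_fft=256, hop=128):
    """Hanning-windowed STFT with 50% overlap; returns the magnitude
    time-frequency image that would feed the 2D-CNN branch."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    # rows = frequency bins, columns = time frames
    return np.abs(np.fft.rfft(frames, axis=1)).T

sig = np.sin(2 * np.pi * 0.1 * np.arange(2048))  # toy stand-in for a leakage record
tf_image = stft_magnitude(sig)
```

The resulting magnitude matrix is the two-dimensional image in which the multi-scale 2D convolutions then look for leakage signatures.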

4.1.1. Data Processing

Stratified folding strategy: the 10-fold cross-validation is stratified by class proportion, so that the class distribution within each fold matches the overall data set (for example, if strong-noise samples make up 50% of the data, that proportion is maintained in every fold), avoiding folds that lack minority-class samples. Data-leakage avoidance measures: the random seed is fixed before partitioning so the splits are reproducible; the training and test sets are strictly non-overlapping, and the 10 cross-validation subsets do not intersect; the parameters of data preprocessing (standardization, feature selection) are computed on the training set only, and the validation/test sets reuse the training-set parameters without participating in parameter updates. Data augmentation (slight noise addition) is applied only to the training set, while the validation/test sets keep the original data distribution; model hyperparameters (such as the learning rate and the loss weights) are tuned only on validation-set performance and are never optimized on the test set.
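The stratified folding described above can be sketched as follows; this is a simplified stand-in for a library routine such as sklearn's StratifiedKFold, dealing each class's shuffled indices round-robin so every fold preserves the class proportions.

```python
import numpy as np

def stratified_kfold(labels, n_splits=10, seed=42):
    """Yield (train_idx, test_idx) pairs in which every fold preserves
    the per-class proportions of the full data set."""
    rng = np.random.default_rng(seed)  # fixed seed -> reproducible partition
    labels = np.asarray(labels)
    folds = [[] for _ in range(n_splits)]
    for c in np.unique(labels):        # split each class separately
        idx = rng.permutation(np.where(labels == c)[0])
        for pos, i in enumerate(idx):
            folds[pos % n_splits].append(int(i))  # deal round-robin across folds
    for k in range(n_splits):
        test = np.array(sorted(folds[k]))
        train = np.array(sorted(i for j in range(n_splits) if j != k
                                for i in folds[j]))
        yield train, test
```

Because the folds partition each class separately, a minority class is guaranteed representation in every fold whenever it has at least one sample per fold.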
As shown in Figure 6, the confusion matrix indicates that the model has excellent discriminative ability in the multi-class task. Of the 12 categories, the model achieves 100% classification accuracy on 9, verifying its accuracy in capturing the key time-frequency features. The remaining misclassifications are concentrated between adjacent categories with high time-frequency similarity. The distinct diagonal of the confusion matrix demonstrates the framework's strong category discrimination and prediction stability on complex signals.

4.1.2. Evaluating Indicator

In order to comprehensively evaluate the performance of the model in the pipeline leakage diagnosis task, four indicators, Accuracy, Precision, Recall, and F1-Score, are selected as evaluation criteria to measure the classification effect of the model.
Accuracy: the proportion of correctly identified samples among all samples, reflecting the overall recognition ability:
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
Precision: the proportion of correct predictions among all samples predicted as a given category, reflecting the prediction accuracy:
\text{Precision} = \frac{TP}{TP + FP}
Recall: the proportion of correctly predicted samples among the samples actually belonging to a given category, reflecting the ability to avoid missed detections:
\text{Recall} = \frac{TP}{TP + FN}
F1-Score: the harmonic mean of precision and recall, which comprehensively measures the performance of the model:
\text{F1-Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}
Here TP is the number of leakage samples correctly classified as leakage; TN is the number of normal samples correctly classified as normal; FP is the number of normal samples misclassified as leakage; and FN is the number of leakage samples misclassified as normal.
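The four indicators follow directly from the confusion-matrix counts; the counts in the example below are hypothetical, chosen only to illustrate the computation.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the four evaluation indicators from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# hypothetical counts: 90 leaks found, 5 leaks missed, 3 false alarms, 102 correct normals
acc, prec, rec, f1 = classification_metrics(tp=90, tn=102, fp=3, fn=5)
```

For multi-class evaluation as in this paper, the same formulas are applied per class (treating that class as "positive") and then averaged.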

4.1.3. Experimental Results Contrasting

In order to verify the performance of the proposed model, five mainstream leak detection models are selected for comparison, including two traditional machine learning models (SVM, Random Forest) and three deep learning models (TCN, ResNet18, CNN-BiLSTM). All comparison models use the same data set, training parameters, and evaluation indicators to ensure the fairness of the experiment.
Accuracy is the core index of the model's overall recognition ability. Figure 7 shows the accuracy distribution of each model as box plots: the median, the box span, and the upper and lower whiskers correspond to the central tendency, fluctuation range, and extremes of the accuracy, directly reflecting both the accuracy and the stability of each model. As Figure 7 shows, the box of the proposed model sits highest overall, with a median accuracy of 96.02%; its box span is the smallest, its whiskers are the shortest, and it has no outliers. The model therefore not only has the highest average recognition accuracy but also the strongest stability, with minimal performance fluctuation.
Accuracy alone cannot fully reflect the classification performance of the model. For example, a model may obtain higher accuracy by biasing its predictions toward the majority classes while recognizing minority-class samples poorly, i.e., with a high missed-detection rate. Table 3 therefore supplements the evaluation with precision, recall, and F1 score to comprehensively measure the classification effect of each model.
As Table 3 shows, the recall of the proposed model is 94.8%, ranking first. A high recall means that, among the samples actually belonging to a given leakage condition, a high proportion is correctly recognized and the missed-detection rate is low. Since a missed pipeline leak can cause serious accidents such as explosions and environmental pollution, high recall is a key guarantee of the model's reliability, and the high recall of the proposed model effectively reduces safety risk. The F1 score of the proposed model is 95.6%, higher than those of the other models (TCN: 94.7%; CNN-BiLSTM: 92.5%). As the harmonic mean of precision and recall, the F1 score balances the false alarm rate against the missed-detection rate; the high value indicates that the proposed model achieves the best trade-off between avoiding false positives and avoiding false negatives, giving the best overall classification performance. The precision of the proposed model is 95.7%, higher than that of all comparison models. High precision means that, among the samples predicted as a given leakage condition, a high proportion actually belongs to that condition, so the false alarm rate is low. In industrial scenarios, a low false alarm rate reduces unnecessary maintenance costs and improves operation and maintenance efficiency, making the proposed model more practical in engineering.

4.2. Ablation Experimental Results and Analysis

In order to verify the necessity of each core component of the proposed model (STFT time-frequency transformation, multi-scale CNN, Transformer), ablation experiments are carried out, removing the core components individually or in combination and observing the change in accuracy to quantify each component's contribution. Six groups of ablation experiments were designed, all using the same training parameters and evaluation indicators to keep the comparisons consistent.
Figure 8 compares the accuracy of each ablation scheme, directly reflecting the influence of removing different core components on the overall recognition ability. The full proposed model attains the highest accuracy, 96.02%, verifying the rationality of fusing all components.
To reflect more clearly the impact of removing different components on the comprehensive performance of the model, the ablation experiments are analyzed with all four evaluation indicators, avoiding the limitations of any single indicator.
The heat map in Figure 9 shows that the proposed model scores highest on all four indicators, confirming its high comprehensive performance. All four indicators of the No-Transformer experiment are lower than those of the proposed model, indicating that the Transformer contributes to the modeling of global features. The indicators of the No-STFT experiment drop further, indicating that the loss of frequency-domain features affects the model across the board: the time-domain signal only reflects how the signal changes over time, whereas the frequency domain characterizes the leakage signal in more detail through features such as frequency content. The multi-component removal experiments show that, without these modules, the model loses its feature extraction ability and cannot effectively distinguish the working conditions.
The results in Figure 10 show that the accuracy curve decreases as the noise intensity increases, but the model remains robust under moderate interference and maintains more than 90% recognition accuracy even under high-intensity interference. The confidence interval (shaded area) in the figure further confirms that the model is stable under different random noise realizations. This noise resistance is mainly attributed to the synergy between multi-scale time-frequency feature fusion and the Transformer attention mechanism.
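A robustness test of this kind perturbs the input at a controlled signal-to-noise ratio. The sketch below shows one common way to inject white Gaussian noise at a target SNR; the SNR level used here is illustrative, not the paper's exact setting.

```python
import numpy as np

def add_noise(signal, snr_db, seed=0):
    """Add white Gaussian noise so the result has (approximately) the
    requested signal-to-noise ratio in dB."""
    rng = np.random.default_rng(seed)
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / (10 ** (snr_db / 10))  # noise power for target SNR
    return signal + rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)

clean = np.sin(np.linspace(0, 8 * np.pi, 1024))
noisy = add_noise(clean, snr_db=10)  # moderate-interference test input
```

Sweeping snr_db from high to low and recording the classification accuracy at each level produces a degradation curve of the kind plotted in Figure 10.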

5. Discussion

Although this study has achieved promising results, it still has the following limitations. Limited scene coverage of the data set: the GPLA-12 data set used in the experiments was generated in a controlled laboratory environment. Although some noise and pressure conditions are simulated, it does not cover extreme industrial conditions such as complex soil stratification, deeply buried pipelines, strong electromagnetic interference, or multiple simultaneous leakage points.
Insufficient industrial deployment verification: long-term operational validation on large-scale real pipelines has not yet been completed, and the stability of the model in industrial environments (e.g., resistance to sensor aging and to drastic temperature and humidity changes) and its response speed under extreme conditions still need further testing.
In view of the above limitations and the needs of industrial applications, future work will proceed along the following lines. Field validation: the method has practical potential, and the next step is to verify and improve it on actual pipeline leakage problems, promoting field tests on real operating pipelines in cooperation with enterprises, collecting real data containing complex soil conditions, flow fluctuations, and extreme noise, and constructing a mixed laboratory-plus-industrial data set to improve the model's scene adaptability. Multi-source data fusion: cross-modal fusion of acoustic signals with existing sensor data such as pressure and flow will be introduced to construct a multi-modal Transformer model, further improving detection robustness in complex interference scenarios.

6. Conclusions

Aiming at the limitations of traditional methods in the identification of pipeline leakage signal conditions, such as incomplete feature extraction, insufficient global dependence capture, and poor adaptability of small sample scenarios, this paper constructs a leakage detection model based on multi-scale convolutional neural network-Transformer (MSCNN-Transformer). The model innovatively combines the advantages of multi-scale convolutional neural network (MSCNN) in local feature extraction and the ability of Transformer in long-term temporal dependency modeling. The one-dimensional original leakage signal is transformed into a two-dimensional time-frequency diagram by short-time Fourier transform (STFT), which realizes the synchronous processing of time-domain signal and frequency-domain signal, and effectively compensates for the lack of single signal dimension feature expression.
The feature complementary mechanism significantly improves the recognition ability: the combination of STFT time-frequency conversion and multi-scale CNN realizes the dual feature capture of ‘time domain detail (1DCNN extraction) + frequency domain distribution (2DCNN extraction)’, which solves the problem that the traditional single-dimensional signal is difficult to fully describe the leakage mutation characteristics, and provides more abundant feature support for the model.
Global dependence modeling optimizes the detection effect: Transformer's multi-head attention mechanism effectively compensates for the locality of CNN perception and can capture correlations between distant key features in the signal, so that the model can still accurately identify the leakage condition under signal attenuation and noise interference, significantly reducing the false negative rate.
The comprehensive performance advantage is outstanding: on the GPLA-12 data set, the model achieves a recognition accuracy of 96.02%, a recall of 94.8%, an F1 score of 95.6%, and a precision of 95.7%. Compared with traditional machine learning methods such as SVM and Random Forest, as well as mainstream deep learning methods such as TCN, ResNet18, and CNN-BiLSTM, the model not only leads comprehensively on the core indicators but also has a 10-fold cross-validation standard deviation of only 0.87%, showing stronger stability.

Author Contributions

Methodology—Validation, Y.Z., H.W. and Y.W.; Writing—First Draft Preparation, Y.Z.; Writing—Review and Editing, W.L.; Funding Acquisition, W.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Heilongjiang Province Postdoctoral Science Foundation (LBH-Z23259).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy.

Acknowledgments

Thanks to the reviewers for their helpful comments and suggestions. The authors would like to thank Northeast Petroleum University for providing technical and administrative support during this research. The authors also appreciate the constructive feedback provided by colleagues, which greatly improved the quality of this study.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

MSCNN-Transformer: multi-scale convolutional neural network-Transformer
SVM: support vector machine
CNN: convolutional neural network
RNN: recurrent neural network
LSTM: long short-term memory network
BiLSTM: bidirectional long short-term memory network
VMD: variational mode decomposition
SVD: singular value decomposition
DAS: distributed acoustic sensing
TEM: transient electromagnetic method
WMDM: weak magnetic detection method
TA-DBF: triple attention dual-branch fusion network
GCN: graph convolutional network
SE: spectrum enhancement
PAA: piecewise aggregate approximation
CWT: continuous wavelet transform
ABC: artificial bee colony algorithm
ReLU: rectified linear unit
GELU: Gaussian error linear unit

References

  1. Gao, Y.; Wang, B.; Hu, Y.D.; Yu, Z. Review of China’s natural gas development in 2024 and outlook for 2025. Int. Pet. Econ. 2025, 33, 55–67. [Google Scholar]
  2. Yao, L.; Zhang, Y.; He, T.; Luo, H. Natural gas pipeline leak detection based on acoustic signal analysis and feature reconstruction. Appl. Energy 2023, 352, 121975. [Google Scholar] [CrossRef]
  3. Wen, J.T.; Fu, L.; Sun, J.D.; Wang, T.; Zhang, G.Y.; Zhang, P.C. Leakage aperture identification of natural gas pipeline based on compressed sensing combined with convolutional network. Vib. Shock 2020, 39, 17–23. [Google Scholar]
  4. Li, M.; Li, L.; Zuo, Z.; Zhang, L.; Jiang, L.; Su, H. Identification of operating conditions of refined oil pipelines based on machine learning. Chin. J. Saf. Sci. 2024, 34, 127–135. [Google Scholar]
  5. Li, X.; Zhang, W.; Ding, Q. Understanding and improving deep learning-based rolling bearing fault diagnosis with attention mechanism. Signal Process. 2019, 161, 136–154. [Google Scholar] [CrossRef]
  6. Zhang, R.C.; Wang, X.Y.; Hu, L.L.; Lin, Z.Y.; Huang, X.A.; Zhao, B. Acoustic emission signal recognition of gas pipeline leakage based on one-dimensional convolutional neural network. China Saf. Prod. Sci. Technol. 2021, 17, 104–109. [Google Scholar]
  7. Ning, F.L.; Han, P.C.; Duan, S.; Li, H.; Wei, J. Valve leakage ultrasonic signal recognition method based on improved CNN. J. Beijing Univ. Posts Telecommun. 2020, 43, 38–44. [Google Scholar]
  8. Zhong, M.S.; Li, Z.F.; Xiong, X.Z.; Wang, X.G. A natural gas pipeline leak detection model based on one-dimensional multi-scale deep separable convolution. In Proceedings of the 34th China Process Control Conference, Hangzhou, China, 28–30 July 2015. [Google Scholar]
  9. Nie, W.; Jiang, Z.; Liu, B.X. One-dimensional convolutional and long short-term memory neural network method for pipeline leak detection. J. China Rural Water Hydropower 2022, 63, 147–152+157. [Google Scholar]
  10. Nacer, S.M.; Nadia, B.; Abdelghani, R.; Mohamed, B. A novel method for bearing fault diagnosis based on BiLSTM neural networks. Int. J. Adv. Manuf. Technol. 2023, 125, 1477–1492. [Google Scholar] [CrossRef]
  11. Zheng, S.M.; Yan, J.G.; Guo, P.C.; Xu, Y.; Li, J.; Liu, Z.X. Research on small-scale leakage detection of water supply pipelines based on VMD and deep learning. Hydraul. J. 2024, 55, 999–1008. [Google Scholar]
  12. Ma, L.; An, P.F.; Liu, W.L.; Li, D.E. Research on small leakage detection technology of water supply pipe based on acoustic signal. J. Electron. Meas. Instrum. 2024, 38, 113–123. [Google Scholar]
  13. Tang, Y.J.; Liu, D.; Xiao, Z.H.; Hu, X.; Lai, X. Research on fault diagnosis method of hydropower units based on convolutional neural network and singular value decomposition. China Rural Water Conserv. Hydropower 2021, 175–181. [Google Scholar]
  14. Song, H.Z.; Li, X.J.; Qiu, Z.G.; Xing, Q.Y.; Yang, X.K.; Liu, Y.Y. A fault identification method for axle box bearings of high-speed EMUs based on improved VMD. China Railw. Sci. 2023, 44, 146–154. [Google Scholar]
  15. Li, L.; Jiang, Z.S.; Hu, Z.C.; Yang, S.; Tang, Y.; Jiang, X.; Sun, M.; Zhang, Z.; Qiu, X.; Wang, S. Leakage signal identification method of DAS pipeline based on CNN-Transformer. Opt. J. 2025, 45, 122–131. [Google Scholar]
  16. Liang, X.L.; Li, J.G.; Xu, P.P.; Wang, J.; Liu, J.; Chen, T.; Meng, X. Leakage fault diagnosis of heating pipe network based on convolutional neural network and Transformer. Sci. Eng. 2025, 25, 5589–5601. [Google Scholar]
  17. Meng, H.Y.; Zhang, J.L.; Cai, Z.L.; Li, C. Research on DC microgrid fault diagnosis based on CNN-BiLSTM-Attention. Chin. J. Electr. Eng. 2025, 45, 1369–1381. [Google Scholar]
  18. Zhu, J.; Cheng, Q.M.; Cheng, Y.M. Fault diagnosis of modular multilevel matrix converter based on CNN-GRU deep learning. China South. Power Grid Technol. 2024, 18, 13–22. [Google Scholar]
  19. Satterlee, N.; Zuo, X.; Lee, C.W.; Park, C.W.; Kang, J.S. Parallel multi-layer sensor fusion for pipe leak detection using multi-sensors and machine learning. Eng. Appl. Artif. Intell. 2025, 153, 110923. [Google Scholar] [CrossRef]
  20. Song, F.; Zhao, H.; Miao, X.; Zhao, Y. Identification method of buried pipeline complex conditions based on TA-DBF model and information fusion technique. Measurement 2025, 255, 118085. [Google Scholar] [CrossRef]
  21. Wu, D.H.; Zhu, Z.C.; Han, X.H. Fault diagnosis of wind turbine bearings based on BO-SDAE multi-source signals. J. Syst. Simul. 2021, 33, 1148–1156. [Google Scholar]
  22. Mishra, A.; Dhebar, J.; Rai, A. Leak detection in pipelines based on a multilayer network of time–frequency domain cross-visibility graphs and graph convolutional networks. Mech. Syst. Signal Process. 2025, 238, 113250. [Google Scholar] [CrossRef]
  23. Fangli, N.; Zhanghong, C.; Di, M.; Duan, S.; Wei, J. Enhanced spectrum convolutional neural architecture: An intelligent leak detection method for gas pipeline. Process Saf. Environ. Prot. 2021, 146, 726–735. [Google Scholar]
  24. Song, L.; Cui, X.; Han, X.; Gao, Y.; Liu, F.; Yu, Y.; Yuan, Y. A Non-Metallic pipeline leak size recognition method based on CWT acoustic image transformation and CNN. Appl. Acoust. 2024, 225, 110180. [Google Scholar] [CrossRef]
  25. Wang, N.; Du, W.; Liu, H.; Zhang, K.; Li, Y.; He, Y.; Han, Z. Fine-Grained Leakage Detection for Water Supply Pipelines Based on CNN and Selective State-Space Models. Water 2025, 17, 1115. [Google Scholar]
  26. Luo, Z.Y.; Li, M.Z.; Dong, X. Effective diagnosis of rolling bearing fault diagnosis method of Vision Transformer network. J. Chongqing Univ. Posts Telecommun. (Nat. Sci. Ed.) 2025, 1–9. [Google Scholar]
  27. Dong, H.L.; Sun, T.; Wang, C.; Yang, F.; Shang, R. A new method for natural gas pipeline leak detection based on gated attention network model. Nat. Gas Ind. 2025, 45, 25–36. [Google Scholar]
  28. Du, Y.F.; Duan, P.F.; Zhao, B.X.; Hao, J.; Song, K. Leakage diagnosis of heating pipe network based on CABC-BP model. J. Guangxi Univ. (Nat. Sci. Ed.) 2023, 48, 835–846. [Google Scholar]
  29. Zhang, R.R.; Fu, S.M.; Wei, Y.Y.; Wang, Y.; Chang, Q. Classification and identification of leakage acoustic signals in water supply network. Acoust. Technol. 2025, 44, 629–639. [Google Scholar]
  30. Valery, N.; Fabien, M.; Pierre, T. Evaluation of the Leak Detection Performance of Distributed Kalman Filter Algorithms in WSN-Based Water Pipeline Monitoring of Plastic Pipes. Computation 2022, 10, 55. [Google Scholar] [CrossRef]
  31. Debossam, J.G.; de Freitas, M.M.; de Souza, G.; Souto, H.P.A.; Pires, A.P. Numerical Simulation of Non-Darcy Flow in Naturally Fractured Tight Gas Reservoirs for Enhanced Gas Recovery. Gases 2024, 4, 253–272. [Google Scholar] [CrossRef]
  32. Hamouda, A.A.; Gupta, S.; Bahadori, A. Enhancing Oil Recovery from Chalk Reservoirs by a Low-Salinity Water Flooding Mechanism and Fluid/Rock Interactions. Energies 2017, 10, 576. [Google Scholar] [CrossRef]
  33. Nithikarnjanatharn, J.; Pimollukanakul, J.; Nutkhum, W.; Rongcha, K.; Arunchai, T.; Satuwong, P. Predicting the mechanical properties of polypropylene and recycled polypropylene by the application of sigmoid and ReLU functions in neural networks. Results Eng. 2026, 29, 108459. [Google Scholar] [CrossRef]
  34. Li, J.; Yao, L. GPLA-12: An Acoustic Signal Dataset of Gas Pipeline Leakage. arXiv 2021, arXiv:2106.10277. [Google Scholar] [CrossRef]
Figure 1. STFT flow chart.
Figure 2. Multi-headed attention mechanism.
Figure 3. The structure diagram of the proposed method.
Figure 4. Two-dimensional multi-scale convolutional neural network structure.
Figure 5. GPLA-12 data set experimental device [34].
Figure 6. Confusion matrix for the 12 categories.
Figure 7. Accuracy of the comparative test.
Figure 8. Comparison of the accuracy of the ablation experiments.
Figure 9. Comprehensive evaluation of the ablation experiments.
Figure 10. Classification accuracy under different noises.
Table 1. Basic structural parameters of the model.

Module | Structure | Structural Parameter
1DCNN | Convolution kernel 1 | kernel size = 3 × 1
1DCNN | Convolution kernel 2 | kernel size = 7 × 1
1DCNN | Convolution kernel 3 | kernel size = 15 × 1
2DCNN | Convolution kernel 1 | kernel size = 3 × 3
2DCNN | Convolution kernel 2 | kernel size = 5 × 5
2DCNN | Convolution kernel 3 | kernel size = 7 × 7
Transformer | Multi-head attention | heads: 8; input dimension: 192
Transformer | Feedforward network | input dimension: 192; hidden dimension: 512; activation: Gaussian Error Linear Unit (GELU)
Classification | Linear layer 1 | input channels: 832; output channels: 512
Classification | Linear layer 2 | input channels: 512; output channels: 256
Classification | Linear layer 3 | input channels: 256; output channels: 12
Table 2. Training parameter settings.

Configuration Item | Value | Training Parameter | Value
Window type | Hanning window | Optimizer | AdamW
Window length | 256 sampling points | Initial learning rate | 0.0005
Overlap | 50% (128 sampling points) | Learning rate scheduling strategy | ReduceLROnPlateau
FFT size | 256 points | Epochs | 80
Sampling rate assumption | 10 Hz (fixed) | Batch size | 32
Table 3. Comprehensive evaluation of the comparative test (%).

Model | Accuracy | Recall | F1 | Precision
SVM | 82.1 | 79.8 | 82.2 | 81.6
Random Forest | 84.7 | 83.2 | 84.9 | 83.6
TCN | 94.71 | 94.7 | 94.7 | 95.2
ResNet18 | 83.03 | 83.0 | 80.6 | 84.9
CNN-BiLSTM | 92.7 | 92.7 | 92.5 | 94.1
CNN-Transformer | 96.02 | 94.8 | 95.6 | 95.7

Share and Cite

MDPI and ACS Style

Zhang, Y.; Li, W.; Wu, Y.; Wei, H. Research on Oil and Gas Pipeline Leakage Detection Based on MSCNN-Transformer. Appl. Sci. 2026, 16, 480. https://doi.org/10.3390/app16010480
