1. Introduction
Urban underground utility tunnels house electricity, communication, water supply, and thermal pipelines in a shared, centralized corridor, which makes them an important component of modern urban infrastructure. Gas pipelines, which are critical to urban operations, occasionally suffer damage and leakage due to mechanical stress, external impact, corrosion and aging, and the complex working environment of underground space. Leak holes of different sizes lead to different degrees of gas diffusion and explosion risk. In small-hole leakage, gas escapes continuously at a low rate that is not easily detected immediately; after long-term accumulation, contact with an ignition source may trigger a flash explosion or fire. In medium-hole leakage, the gas flow rate increases significantly and easily forms a directional jet. In large-hole leakage, a large amount of gas is released within a short time and is highly likely to form a large-area explosive gas cloud, posing a serious threat to the tunnel structure and to public safety at ground level. Accurate identification of the leak hole size therefore gives operation and maintenance personnel a basis for differentiated emergency response and disposal decisions, and it is an urgent problem in the safety management of gas pipelines in urban underground utility tunnels [1,2].
Passive detection based on acoustic signals is highly sensitive and allows non-intrusive installation, which makes it particularly suitable for continuous online monitoring in underground utility tunnels. However, the complex acoustic environment of a utility tunnel, with strong reverberation and interference from multiple noise sources, makes accurate identification of leakage signals difficult. To address this challenge, researchers have introduced signal processing techniques into pipeline leakage detection, moving the field gradually from manual inspection toward automated analysis. The Short-Time Fourier Transform (STFT), a basic signal processing method, converts time-domain leakage signals into time-frequency representations; for example, Lay-Ekuakille et al. used STFT for spectral analysis of acoustic leakage signals in urban water supply networks and verified its effectiveness for pipeline leakage detection [3]. Beyond STFT, other time-frequency tools such as the wavelet transform (WT) and the Hilbert-Huang transform (HHT) have also been applied successfully to pipeline leakage detection; their multi-resolution capabilities and adaptive basis functions make them particularly effective for non-stationary leakage signals [4,5]. To overcome the fixed window of the STFT, the Empirical Wavelet Transform (EWT) adaptively partitions the signal spectrum to extract frequency bands more finely [6]. Building on this, Xiao Qiyang et al. introduced EWT into the analysis of pipeline leakage acoustic signals; by extracting leakage-sensitive components after EWT decomposition of leakage acoustic/vibration signals, they achieved good localization accuracy and good detection performance for minor leaks [7,8]. Meanwhile, Empirical Mode Decomposition (EMD) and its improved variants (such as EEMD and CEEMDAN) have shown distinct advantages in non-stationary signal processing, and an EEMD-PRT-based denoising algorithm for pipeline leakage detection improves the stability and reliability of the IMF components by introducing phase randomization [9,10]. Although these signal processing methods significantly raise the level of automation in leakage detection, the features they extract are mostly statistics geared toward binary detection (leakage versus normal), and their representational capacity is clearly insufficient for distinguishing the subtle acoustic differences between leak hole sizes. The final risk level still has to be determined by manual experience or fixed-threshold rules, which makes automated, fine-grained grading of leak hole size difficult.
In recent years, with the rise of deep learning and other artificial intelligence techniques, pipeline leakage detection has moved further toward intelligence and automation. Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and their variants can extract leakage features directly from input data for classification. Among CNN applications, Wang Xiufang et al. proposed a pipeline leakage aperture identification method based on a dynamic depthwise separable convolutional neural network, which gains stronger feature expression through dynamic convolutional layers and a channel attention mechanism [11]; the leakage identification model of Luo Zhengshan et al., which combines EEMD with an Improved Convolutional Neural Network (ICNN), achieved an average recognition accuracy of 98.25% under multiple working conditions [12]. For acoustic-image fusion, a multi-condition pipeline leakage diagnosis method based on a convolutional neural network evolved by whale optimization accurately distinguished leakage states under five working conditions, with diagnostic accuracy significantly higher than that of conventional CNN models [13]. Among RNN and LSTM applications, Mishra et al. proposed a leakage detection technique based on acoustic emission feature sequences and LSTM networks, which captures the temporal dynamics triggered by leakage through temporally modulated short-time descriptors and achieved high classification accuracy on experimental signals [14]; a PCA-Bi-LSTM strategy for identifying leakage levels in non-metallic pipelines reached 98.6% accuracy, significantly higher than a unidirectional LSTM network [15]. These deep learning methods have laid a solid foundation for automated pipeline leakage detection. More recently, Transformer-based models [16], generative adversarial networks (GANs) [17], and self-supervised learning frameworks have also been introduced into pipeline leakage detection [18], showing promising potential for handling long-range dependencies, imbalanced data, and scenarios with limited labels.
However, existing deep learning-based pipeline leakage detection methods still have two key limitations. First, treating continuous acoustic signals merely as time-series samples loses the spatial–temporal correlations that underlie the acoustic differences between leak hole sizes. Second, converting the data into two-dimensional images for convolutional neural networks allows time-frequency spatial features to be extracted, but restricting the data to grid-like structures may discard the non-Euclidean topological relationships present in the original signal. Because of these limitations, under weak-feature working conditions the leakage features obtained at different pressures are easily confused with one another, which degrades level identification accuracy and limits field applicability [19]. In fact, many real-world signals, including pipeline acoustic emissions, exhibit inherent graph-like or non-Euclidean structures that regular grids or sequences cannot represent adequately [20]. Graph neural networks have been increasingly adopted for industrial anomaly detection [21], further motivating their use for modeling pipeline leakage.
As a graph neural network architecture developed in recent years, the Spatial–Temporal Graph Neural Network (ST-GNN) can effectively process non-Euclidean spatial data and offers distinct advantages for pipeline leakage detection. Şahin and Yüce employed a Graph Convolutional Network (GCN) for leakage and blockage detection in water supply pipeline networks, achieving 91% fault detection accuracy across five scenarios [22]. Zhang et al. proposed a natural gas pipeline leakage detection and localization method based on a deep probabilistic graph neural network; by modeling the spatial dependencies among sensors with an attention-based graph neural network and combining it with variational Bayesian inference, the method localized leaks without labeled anomaly data and reached an AUC of 0.9484 [23]. For leakage detection in water distribution networks, Zhang et al. proposed an Algorithm-Informed Graph Neural Network (AIGNN); by pre-training the model to imitate the Ford–Fulkerson maximum-flow algorithm, they extracted more generalizable task-related features and obtained better generalization than conventional GNNs in leakage detection and localization tasks [24]. Lu et al. proposed an end-to-end graph neural network framework based on intelligent graphs that converts acoustic emission signals into graph representations, achieving a localization accuracy of over 93% under 2–5 MPa pressure conditions [25]. In addition, graph attention networks (GATs) and dynamic spatial–temporal graph networks have been applied successfully to leak detection in water distribution networks and to real-time condition monitoring of industrial pipelines [26,27], confirming the general advantage of GNNs in modeling spatiotemporal dependencies. These studies demonstrate the considerable potential of graph neural networks for modeling spatial–temporal dependencies in pipeline systems. Notably, however, current applications of graph neural networks in the pipeline leakage domain concentrate mainly on binary detection or leakage localization. No study has yet applied them to fine-grained identification of pipeline leakage levels (such as a three-level classification into small, medium, and large holes), and this direction remains unexplored in the particular setting of underground utility tunnels, which are enclosed, highly reverberant, and subject to interference from multiple sources.
With the development of edge computing and the Internet of Things, lightweight deep learning models have become a key direction for industrial field deployment because of their low latency and low power consumption [28,29]. To address the research gaps identified above, this paper applies ST-GNN to identifying gas pipeline leakage risk levels in urban underground utility tunnels, with the aim of separating the features of leak holes of different levels under complex working conditions. The main contributions of this paper are as follows:
- (1) For the first time, a spatial–temporal graph construction method is applied to gas pipeline acoustic signals in underground utility tunnels. Experimentally collected acoustic signals are converted into graph-structured data whose structure encodes both the spatial–temporal topological relationships among the time frames of each sample and the time-frequency features of the signal, filling the research gap in using graph neural networks for pipeline leakage level identification.
- (2) A lightweight spatial–temporal graph convolutional feature extraction structure is proposed to jointly model spatial–temporal coupling relationships; its small size favors subsequent embedded deployment.
- (3) The effectiveness of the model in multi-class classification is verified on a dataset collected from a physical simulation platform, and the effects of the number of layers and the number of training samples on detection performance are analyzed, providing a basis for engineering deployment.
The structure of this paper is organized as follows:
Section 2 introduces the collection method of pipeline gas leakage acoustic signals, the grading standards, and the traditional time-frequency features of leakage signals at different levels.
Section 3 presents the specific theoretical basis, architecture, and implementation of the spatial–temporal graph network model.
Section 4 demonstrates and analyzes the numerical experimental results of the lightweight spatial–temporal graph neural network.
Section 5 summarizes the entire paper and provides an outlook on future work.
2. Acoustic Signals of Gas Pipeline Leakage in Utility Tunnel Environments
2.1. Construction of Gas Pipeline Leakage Acoustic Signals in Simulated Utility Tunnel Environments
Obtaining a high-quality dataset of pipeline leakage acoustic signals is a necessary prerequisite for research on leakage detection and level identification. Two mainstream approaches to acquiring these signals have emerged in recent years: collection of real-world leakage data and numerical simulation.
For real leakage data, the GPLA-12 dataset released by Li and Yao is one of the larger public datasets in pipeline leakage acoustics, comprising 684 training/testing acoustic signals in 12 categories, and it has been widely used in research on gas pipeline leakage detection based on acoustic emission signals. Li Yuxing et al. designed a high-pressure gas transmission pipeline leakage test device based on the transient model method and the acoustic wave leakage detection method, and verified the generalizability of the model tests through similarity analysis [30]. Liu et al. designed a complete high-pressure gas pipeline leakage experimental device and acoustic data acquisition system based on the acoustic wave leakage detection method, using CFD simulation to guide the selection of experimental parameters [31]. Although these experiments provided a large amount of valuable measured data, most were conducted on relatively idealized open pipeline structures with differing sensor deployment methods, and they did not fully reproduce real engineering scenarios such as multiple parallel pipelines and confined-space reverberation in underground utility tunnels.
For numerical simulation, combining Computational Fluid Dynamics (CFD) with aeroacoustic theory provides another important tool for studying pipeline leakage acoustic signals. Using commercial CFD software to simulate the leakage flow field at high precision and extracting sound source characteristics through acoustic analogy methods mitigates the limited coverage of experimental data and allows flexible simulation of various combinations of leak hole size, internal pressure, and medium conditions. On leakage aeroacoustic source modeling, Liu et al. studied the mechanism by which aeroacoustic noise is generated during leakage and built a simulation model in CFD software to obtain flow field and sound field data [32]. Recent studies have further used Large Eddy Simulation (LES) to compute the unsteady flow field during gas pipeline leakage and combined it with the Möhring acoustic analogy to extract aeroacoustic sources, finding that the leakage source exhibits quadrupole characteristics [33]. Li et al. used CFD methods to study acoustic wave propagation during pipeline leakage, constructing an amplitude attenuation model and a waveform diffusion model that provide a theoretical basis for acoustic leak localization [34]. Ayyildiz et al. used CFD simulation to analyze micro-hole leaks of 1.27–3.3 mm under low-pressure conditions, studying flow field characteristics including leakage flow, pressure distribution, and velocity profiles, and used Power Spectral Density (PSD) and the Fast Fourier Transform (FFT) to predict sound pressure changes, acoustic oscillations, and turbulence behavior around the leak hole [35].
Despite its operational flexibility, numerical simulation faces fundamental challenges. The simplifying assumptions that CFD makes about the actual physical processes introduce systematic biases, so the authenticity and reliability of its results always require calibration and validation against experimental data [36]. More critically, multi-path propagation and strong reverberation in utility tunnels are often oversimplified or ignored entirely in existing simulation models, so simulated data can hardly reflect the true sound propagation characteristics of the semi-enclosed space of an underground utility tunnel [37]. Furthermore, CFD simulations typically require substantial computational resources, especially with high-fidelity turbulence models such as LES, which makes them very expensive and difficult to apply at scale in engineering practice [38].
The review above indicates that, although each existing data acquisition method has its own application scenarios, none adequately meets the practical requirements of gas pipeline leakage detection in urban underground utility tunnels. Pipeline leakage is a low-probability event, so operational data collection struggles to cover complex environments and multiple working conditions, while numerical simulation cannot accurately reproduce sound propagation in enclosed tunnel spaces, particularly the humidity differences caused by seasonal changes. To solve these problems, this study built a gas pipeline leakage acoustic signal simulation platform based on a real utility tunnel space in Southern China. As shown in Figure 1, the city where this utility tunnel is located lies in a subtropical humid monsoon climate zone with large temperature differences across the four seasons, rainy summers, and relatively dry winters, so the site covers many environmental influencing factors. Compressed air was used as the leakage gas instead of natural gas to ensure experimental safety.
The experimental platform consists of four main parts, as shown in Figure 2: (A) an air dryer, which counteracts the effect of the high-humidity tunnel environment on the air compressor; (B) an air compressor, which provides the high-pressure gas source for the experiment; (C) a large buffer gas tank, which provides a stable pressure source while the compressor is off, so that compressor noise does not interfere with acquisition; (D) pipeline samples with man-made defects, which simulate the damaged state of a leaking pipeline.
As shown in Figure 3, two leak types were used in the experiment: a near-circular breach, simulating defects caused by conventional corrosion, and a slit-like notch, which typically arises from stress corrosion or mechanical impact. To ensure sample diversity and representativeness, defects of multiple sizes were designed, as listed in Table 1.
2.2. Classification of Pipeline Leakage Levels
A sound classification of pipeline leakage levels is the foundation for fine-grained leakage level identification. At present, academia and industry mainly classify gas pipeline leakage levels in one of two ways: by leakage aperture size or by the proportion of leaked gas volume.
The leakage aperture directly determines the leakage flow rate, jet velocity, and sound source intensity, so aperture-based classification is closely related to the severity of leakage consequences. For high-pressure urban gas pipelines, for example, leakage modes are typically divided into four levels: small-hole leakage (5 mm), medium-hole leakage (25 mm), large-hole leakage (100 mm), and pipeline rupture (300 mm). Finer-grained schemes, such as pinhole leakage (1–3 mm) and micro-hole leakage (3–10 mm), are used to identify minor leaks in equipment, pipeline flanges, and instrument joints. Classification based on leaked gas volume typically uses the ratio of leaked gas to total transmitted volume as the criterion: small leaks are less than 3% of the total transmission, medium leaks range from 3% to 10%, and large leaks exceed 10%.
Considering the research objectives and practical engineering requirements, this paper adopts an aperture-based classification method. Accounting for the diversity of leak hole shapes, leakage area is used as the basis for division, classifying gas pipeline leakages into three levels:
Level 1 (leakage area < 4 mm²): Gas escapes continuously. In practice, this usually corresponds to early-stage defects such as local corrosion perforation or micro-cracks. Because of rust spots or uneven pipe surfaces, the actual leak hole is difficult to find visually; however, strong ventilation with explosion-proof axial fans can effectively reduce the risk of explosion.
Level 2 (leakage area approximately 4 mm² to 8 mm²): The gas jet velocity increases significantly and easily forms a directional jet. This typically corresponds to enlarged local damage or moderate damage from external forces. Leakage rate and volume rise significantly, and forced ventilation or water-mist dilution can hardly mitigate the explosion risk effectively, posing a direct threat to the structural safety of the utility tunnel.
Level 3 (leakage area > 8 mm²): A large amount of gas is released rapidly within a short time. Even with forced ventilation, the discharged gas easily accumulates on the surface and forms a large-area explosive gas cloud, posing a severe threat to the tunnel structure and to public safety at ground level. This usually corresponds to major accident conditions such as severe pipeline damage or joint failure, requiring immediate activation of emergency plans and external support.
This three-level classification system fully considers the engineering consequences, providing a physically meaningful classification basis for subsequent acoustic-signal-based leakage level identification, as well as clear and distinguishable supervision labels for model training. In the following experiments, Level 1, Level 2, and Level 3 are, respectively, mapped to Class 0, Class 1, and Class 2 as model labels.
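The three-level scheme can be expressed as a small labeling helper. The sketch below is illustrative only: the function names are hypothetical, slit-like notches are approximated as rectangles (length × width), and because the text gives Level 2 as approximately 4 to 8 mm², areas of exactly 4 or 8 mm² are assigned to Level 2 by assumption.

```python
import math

# Thresholds from Section 2.2 (leakage area in mm^2). Whether an area of exactly
# 4 or 8 mm^2 belongs to Level 2 is not stated; both are treated as Level 2 here.
LEVEL1_MAX = 4.0
LEVEL2_MAX = 8.0


def circular_hole_area(diameter_mm: float) -> float:
    """Area of a circular leak hole, pi * d^2 / 4, in mm^2."""
    return math.pi * diameter_mm ** 2 / 4.0


def slit_area(length_mm: float, width_mm: float) -> float:
    """Area of a slit-like notch, approximated as a rectangle (assumption)."""
    return length_mm * width_mm


def leak_class(area_mm2: float) -> int:
    """Map leakage area to the model label: Level 1/2/3 -> Class 0/1/2."""
    if area_mm2 < LEVEL1_MAX:
        return 0
    if area_mm2 <= LEVEL2_MAX:
        return 1
    return 2


for name, area in [("round, d=2 mm", circular_hole_area(2.0)),
                   ("round, d=3 mm", circular_hole_area(3.0)),
                   ("slit, 10 x 1 mm", slit_area(10.0, 1.0))]:
    print(f"{name}: {area:.2f} mm^2 -> class {leak_class(area)}")
```

Under these assumptions, a 2 mm round hole (about 3.14 mm²) maps to Class 0, a 3 mm round hole (about 7.07 mm²) to Class 1, and a 10 mm × 1 mm slit (10 mm²) to Class 2.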
2.3. Time-Frequency Characteristics of Acoustic Signals Under Different Leakage Conditions
The acoustic signals generated by pipeline leakage are a generalized acoustic emission phenomenon formed by turbulence shear and wall interaction as gas is ejected from the leak hole at high speed. The time-frequency characteristics of these acoustic signals are closely related to factors such as leakage aperture, internal pipeline pressure, and leak hole type.
Using the experimental platform described above, this study systematically analyzed how internal pipeline pressure and leakage level affect the time-frequency characteristics of the acoustic signals. Across the pressure gradient, as shown in Figure 4, the signals exhibit pronounced nonlinear evolution in both the time and frequency domains. In the time domain, signals under low-pressure conditions are dominated by stable random noise with low relative-strength peaks; the waveform oscillations are gentle and the transitions between peaks and valleys are smooth, indicating low vibration excitation energy and a system operating within its linear vibration range. As pressure increases, the peak relative strength grows exponentially and reaches its highest level under high-pressure conditions; the proportion of sharp peaks and valleys in the waveform increases significantly, and the oscillations become markedly more intense.
As shown in Figure 5, in the frequency domain the main spectral peak under low-pressure conditions lies in the 20–30 kHz range, with a peak relative strength of approximately 150. The spectrum is relatively broad with dispersed energy and a low proportion of high-frequency harmonic components, consistent with the spectral characteristics of linear vibration signals. As pressure increases, the main spectral peak grows, with relative-strength peaks exceeding 12,000 under high-pressure conditions. At the same time, the spectrum gradually broadens and secondary peaks increase in number and intensity. This indicates that rising pressure nonlinearly amplifies the excitation energy at the vibration source, shifting the system's vibration mode from linear to nonlinear, increasing the proportion of high-frequency harmonics, making the waveform more intermittent and bursty, and spreading spectral energy into higher frequency regions.
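The pressure trends described above (location of the main peak, energy spreading to higher frequencies) can be summarized with two simple spectral descriptors. The following NumPy sketch is not part of the authors' pipeline: it computes the dominant frequency and the spectral centroid (magnitude-weighted mean frequency) of a Hann-windowed FFT, assuming the 80 kHz sampling rate given in Section 3.2. The function name is hypothetical and the signals are synthetic tones, not measured data.

```python
import numpy as np


def spectral_summary(x, fs=80_000):
    """Return (dominant frequency, spectral centroid) in Hz for a 1-D signal.

    A Hann taper reduces spectral leakage; the centroid is the magnitude-weighted
    mean frequency, so energy spreading into higher bands pushes it upward.
    """
    x = np.asarray(x, dtype=float)
    mag = np.abs(np.fft.rfft(x * np.hanning(x.size)))
    freqs = np.arange(mag.size) * fs / x.size      # bin k -> k * fs / N Hz
    dominant = float(freqs[np.argmax(mag)])
    centroid = float(np.sum(freqs * mag) / np.sum(mag))
    return dominant, centroid


# Synthetic stand-ins for a "low-pressure" and a "high-pressure" signal.
fs = 80_000
t = np.arange(8_000) / fs
low = np.sin(2 * np.pi * 25_000 * t) + 0.1 * np.sin(2 * np.pi * 35_000 * t)
high = np.sin(2 * np.pi * 25_000 * t) + 0.8 * np.sin(2 * np.pi * 35_000 * t)
print(spectral_summary(low, fs), spectral_summary(high, fs))
```

Both signals peak at 25 kHz, but the second has a higher centroid because more of its energy sits at 35 kHz, which is the kind of shift the text attributes to rising pressure.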
Pressure and leak hole size act synergistically on the signal. Under the combined conditions of high pressure and large leak holes, the time-domain signal exhibits both high-intensity amplitudes and strong regular fluctuations; the peak relative strength can reach 8000, and the repetition frequency of periodic pulse clusters increases significantly. In the frequency domain, the main spectral peak reaches its maximum intensity, while secondary peaks grow markedly in number and intensity, further broadening the spectrum. This indicates that the joint action of pressure and leak hole size amplifies the nonlinear vibration characteristics of the system and strengthens the coupling between random disturbances and periodic excitations, making anomalous signal features even more prominent in both the time and frequency domains.
3. Lightweight Spatial–Temporal Graph Network Model for Leakage Sound Classification
The time-frequency analysis results presented above indicate that acoustic signals corresponding to different leakage levels exhibit certain distributional differences in both the time and frequency domains. However, these differences manifest complex nonlinear coupling characteristics under varying pressure conditions, making it difficult to achieve refined leakage level identification using only traditional time-frequency features. On one hand, although the energy distributions of small-hole and large-hole leakages differ, their frequency bands overlap, and ambient noise together with tunnel reverberation further obscure feature boundaries. On the other hand, signal features of the same leakage level may drift under different pressure conditions, rendering fixed-threshold classification methods ineffective. Therefore, a modeling approach that can simultaneously capture intra-signal temporal dependencies and spectral structural similarities is required. Graph neural networks, with their powerful representation capability for non-Euclidean data, offer a new perspective for modeling complex relationships among time frames. Motivated by this, we propose a lightweight spatial–temporal graph neural network that independently constructs a graph structure for each acoustic sample and employs Chebyshev graph convolutions to extract deep discriminative features, thereby achieving high-precision leakage level identification.
3.1. Graph Convolutional Networks
A graph convolutional network is a class of neural networks designed specifically to process graph-structured data. A graph can be represented as $G = (V, E, X, A)$, where $V$ is the set of nodes, $E$ is the set of edges, $X \in \mathbb{R}^{n \times d}$ is the node feature matrix ($n$: number of nodes, $d$: feature dimension per node), and $A \in \mathbb{R}^{n \times n}$ is the adjacency matrix. In the leakage sound classification task of this work, each sample is modeled as an independent graph whose nodes correspond to the time frames of the short-time Fourier transform and whose node features are frequency-domain magnitude vectors.
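Spectral graph convolution performs convolution through the eigendecomposition of the graph Laplacian, $L = U \Lambda U^{\top}$, where $U$ contains the eigenvectors and $\Lambda$ is the diagonal matrix of eigenvalues. Traditional spectral convolution has two limitations: the convolution kernel is global, with as many parameters as there are nodes ($n$), and its computational complexity is as high as $O(n^2)$. To address these issues, Defferrard et al. [39] approximated the graph convolution kernel with a Chebyshev polynomial expansion. First, the filter is parameterized as a polynomial:

$$g_{\theta}(\Lambda) = \sum_{k=0}^{K-1} \theta_k \Lambda^{k} \tag{1}$$

Here, $K-1$ is the highest order of the polynomial, which reduces the number of parameters from $n$ to $K$, but the complexity remains $O(n^2)$ because the signal must still be transformed with the eigenvector matrix $U$. To further reduce the computational cost, Chebyshev polynomials $T_k(\cdot)$ are used for the approximation:

$$g_{\theta'}(\Lambda) \approx \sum_{k=0}^{K-1} \theta'_k \, T_k(\tilde{\Lambda}) \tag{2}$$

Here, $\theta'_k$ are trainable Chebyshev coefficients, and $T_k(x)$ is defined recursively:

$$T_0(x) = 1, \qquad T_1(x) = x, \qquad T_k(x) = 2x\,T_{k-1}(x) - T_{k-2}(x) \tag{3}$$

The matrix $\tilde{\Lambda}$ is the normalized diagonal matrix of eigenvalues:

$$\tilde{\Lambda} = \frac{2}{\lambda_{\max}} \Lambda - I_n \tag{4}$$

Here, $\lambda_{\max}$ is the largest eigenvalue of the Laplacian matrix. This scaling ensures that the eigenvalues of $\tilde{\Lambda}$ lie in the interval $[-1, 1]$, the domain on which the Chebyshev recurrence is defined. In practice, this normalization and the subsequent Chebyshev spectral convolution are implemented efficiently by the `ChebConv` layer in PyTorch Geometric (PyTorch 2.11.0), which internally computes the scaled Laplacian and performs the recursive filtering. Applying the Chebyshev polynomial filter to the input feature matrix $X$ achieves multi-order neighborhood feature aggregation:

$$g_{\theta'} \star X \approx \sum_{k=0}^{K-1} \theta'_k \, T_k(\tilde{L}) \, X, \qquad \tilde{L} = \frac{2}{\lambda_{\max}} L - I_n \tag{5}$$

Because (5) requires only repeated multiplications by the sparse matrix $\tilde{L}$, no eigendecomposition is needed and the cost is $O(K|E|)$; moreover, $T_k(\tilde{L})$ is a polynomial of degree $k$ in $\tilde{L}$, so each term aggregates information from nodes at most $k$ hops away.

Graph convolution can be viewed as filtering the graph signal without changing the feature dimension. To change the feature dimension, a trainable weight matrix is multiplied after the graph convolution:

$$H^{(l)} = \mathrm{ChebConv}\big(H^{(l-1)}\big)\, W^{(l)} \tag{6}$$

Here, $W^{(l)}$ is the trainable weight matrix of the $l$-th layer, and $\mathrm{ChebConv}(\cdot)$ denotes the Chebyshev convolution function.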
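The Chebyshev recursion and the filtered output described above can be checked numerically against the eigendecomposition-based definition. This is a minimal NumPy sketch, not the PyTorch Geometric implementation: it assumes an unweighted path graph, the symmetric normalized Laplacian $L = I - D^{-1/2} A D^{-1/2}$ (the normalization used by Defferrard et al.; the text does not specify one), an exactly computed $\lambda_{\max}$, and arbitrary coefficients $\theta'_k$. All helper names are hypothetical.

```python
import numpy as np
from numpy.polynomial.chebyshev import chebval


def scaled_laplacian(adj):
    """L_tilde = 2 L / lambda_max - I, with L = I - D^{-1/2} A D^{-1/2}."""
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg, dtype=float)
    d_inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5
    lap = np.eye(n) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    lam_max = np.linalg.eigvalsh(lap).max()
    return 2.0 * lap / lam_max - np.eye(n)


def cheb_filter(l_tilde, x, theta):
    """sum_k theta[k] T_k(L_tilde) x via T_k = 2 L_tilde T_{k-1} - T_{k-2}; K = len(theta)."""
    t_prev, t_curr = x, l_tilde @ x               # T_0 x and T_1 x
    out = theta[0] * t_prev
    if len(theta) > 1:
        out = out + theta[1] * t_curr
    for k in range(2, len(theta)):
        t_prev, t_curr = t_curr, 2.0 * (l_tilde @ t_curr) - t_prev
        out = out + theta[k] * t_curr
    return out


def cheb_filter_spectral(l_tilde, x, theta):
    """Same filter via U g(Lambda_tilde) U^T x (requires a full eigendecomposition)."""
    lam, u = np.linalg.eigh(l_tilde)
    g = chebval(lam, theta)                       # sum_k theta_k T_k(lambda) per eigenvalue
    return u @ (g[:, None] * (u.T @ x))


# Path graph with 6 nodes (e.g. 6 consecutive STFT frames), 4 features per node.
n = 6
adj = np.zeros((n, n))
for i in range(n - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
rng = np.random.default_rng(0)
x = rng.normal(size=(n, 4))
theta = np.array([0.5, -0.2, 0.1])                # K = 3 arbitrary coefficients
l_tilde = scaled_laplacian(adj)
print(np.allclose(cheb_filter(l_tilde, x, theta), cheb_filter_spectral(l_tilde, x, theta)))
```

The recursive version never forms $U$ and uses $\tilde{L}$ only through matrix products, which is what makes the Chebyshev approximation cheap on sparse graphs; the spectral version is included purely as a reference.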
Based on the above theoretical foundation, we design a lightweight Chebyshev graph convolutional classifier using the `ChebConv` implementation from PyTorch Geometric, which handles the Laplacian scaling and Chebyshev polynomial recursion internally. The classifier consists of three Chebyshev graph convolutional layers. The first layer maps the input node features (dimension $d$, i.e., the number of STFT frequency bins) to a 64-dimensional hidden space, the second layer further compresses them to 32 dimensions, and the third layer outputs a 16-dimensional node representation. A global mean pooling operation then aggregates all node features into a graph-level vector, which is fed into a linear layer to produce classification scores for the three leakage levels; finally, a log-softmax is applied to obtain the (log-)probability distribution. The entire model can be formalized as:

$$
\begin{aligned}
H^{(l)} &= \mathrm{ChebConv}\big(\tilde{H}^{(l-1)};\, \Theta^{(l)}\big) \in \mathbb{R}^{n \times d_l}, \qquad \tilde{H}^{(0)} = X,\quad l = 1, 2, 3,\\
\tilde{H}^{(l)} &= \mathrm{Dropout}\big(H^{(l)}\big),\\
h_G &= \frac{1}{n} \sum_{i=1}^{n} H^{(3)}_{i,:} \in \mathbb{R}^{16}, \qquad \tilde{h}_G = \mathrm{Dropout}(h_G),\\
\hat{y} &= \mathrm{log\_softmax}\big(W \tilde{h}_G + b\big)
\end{aligned}
\tag{7}
$$

$H^{(l)}$ denotes the node feature matrix output by the $l$-th Chebyshev graph convolutional layer, and $d_l$ is the output feature dimension of that layer, with $(d_1, d_2, d_3) = (64, 32, 16)$. $\tilde{H}^{(l)}$ is the result of applying Dropout. $\Theta^{(l)}$ represents all trainable parameter tensors of the $l$-th layer, which encapsulate both the Chebyshev polynomial coefficients and the feature projection weights. Global mean pooling compresses the node features of the third layer along the node dimension into a graph-level representation $h_G \in \mathbb{R}^{16}$, which is then also passed through Dropout to produce $\tilde{h}_G$. The weight matrix $W \in \mathbb{R}^{3 \times 16}$ and bias $b \in \mathbb{R}^{3}$ of the linear classification layer map the graph-level feature to classification scores $W \tilde{h}_G + b$, and finally log_softmax yields the log-probability prediction $\hat{y}$ for the three leakage levels.
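A shape-level sketch of the three-layer classifier's inference pass is given below, written in plain NumPy so that it runs without PyTorch. Several details are assumptions not stated in the text: $K = 3$ Chebyshev terms, a ReLU after each graph convolution, one weight matrix per Chebyshev order (the way PyTorch Geometric's `ChebConv` parameterizes a layer), random untrained weights, and no Dropout (inference mode). The helper names are hypothetical.

```python
import numpy as np


def scaled_laplacian(adj):
    """L_tilde = 2 L / lambda_max - I with the symmetric normalized Laplacian (assumption)."""
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg, dtype=float)
    d_inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5
    lap = np.eye(n) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    return 2.0 * lap / np.linalg.eigvalsh(lap).max() - np.eye(n)


def cheb_layer(l_tilde, h, weights, bias):
    """sum_k T_k(L_tilde) H W_k + b, one weight matrix W_k per Chebyshev order."""
    t_prev, t_curr = h, l_tilde @ h
    out = t_prev @ weights[0] + bias
    if len(weights) > 1:
        out = out + t_curr @ weights[1]
    for w_k in weights[2:]:
        t_prev, t_curr = t_curr, 2.0 * (l_tilde @ t_curr) - t_prev
        out = out + t_curr @ w_k
    return out


def init_params(d_in, dims=(64, 32, 16), n_classes=3, K=3, seed=0):
    """Random weights for the 3-layer classifier (K = 3 is a hypothetical choice)."""
    rng = np.random.default_rng(seed)
    layers = []
    for d_out in dims:
        ws = [rng.normal(0.0, d_in ** -0.5, (d_in, d_out)) for _ in range(K)]
        layers.append((ws, np.zeros(d_out)))
        d_in = d_out
    return layers, rng.normal(0.0, d_in ** -0.5, (n_classes, d_in)), np.zeros(n_classes)


def forward(x, adj, params):
    """Inference pass: 3 Chebyshev layers -> mean pooling -> linear -> log-softmax."""
    layers, w_cls, b_cls = params
    l_tilde = scaled_laplacian(adj)
    h = x
    for ws, b in layers:
        h = np.maximum(cheb_layer(l_tilde, h, ws, b), 0.0)  # ReLU (assumed activation)
    h_g = h.mean(axis=0)                                   # global mean pooling -> R^16
    z = w_cls @ h_g + b_cls                                # scores for the 3 levels
    m = z.max()
    return z - (m + np.log(np.exp(z - m).sum()))           # numerically stable log-softmax


# Toy input: T = 61 frames (nodes) with F = 129 magnitude bins each, chained in time.
T, F = 61, 129
adj = np.zeros((T, T))
for i in range(T - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
x = np.abs(np.random.default_rng(1).normal(size=(T, F)))
log_probs = forward(x, adj, init_params(F))
print(log_probs.shape, np.exp(log_probs).sum())
```

The output is a vector of three log-probabilities for Classes 0, 1, and 2; training (loss, optimizer, Dropout) is deliberately left out of this sketch.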
It should be noted that the effective operation of graph convolutional layers heavily depends on the reasonable construction of the input graph structure. Therefore, the following subsection elaborates on the generation of graph data from raw acoustic signals.
3.2. Spatial–Temporal Graph Construction Method for Single-Sample Acoustic Signals
In contrast to most graph neural network approaches that treat all samples as nodes in a global graph, this paper adopts a per-sample independent graph construction strategy. Each leakage acoustic sample is preprocessed to generate an independent graph data structure. This inductive graph construction method is more suitable for online monitoring scenarios and preserves the internal time-frequency structure of each sample. Specifically, the graph construction process begins with time-frequency analysis.
For a given leakage acoustic signal $x(n)$, the short-time Fourier transform (STFT) is first applied. A Hanning window of length $N$ is used as the window function, with an overlap length of 128 samples and a sampling rate of 80 kHz. The STFT is defined as

$$
S(m,k)=\sum_{n=0}^{N-1} x(n+mR)\,w(n)\,e^{-j2\pi kn/N}, \tag{8}
$$

where $S(m,k)$ is the complex time-frequency matrix, $k$ denotes the frequency bin, $m$ denotes the time frame, $R=N-128$ is the hop size between consecutive frames, and $w(n)$ is the window function. The Hanning window is expressed as

$$
w(n)=0.5\left[1-\cos\left(\frac{2\pi n}{N-1}\right)\right],\qquad 0\le n\le N-1, \tag{9}
$$

where $N$ is the window length. The transformation yields a complex time-frequency spectrum matrix $\mathbf{S}\in\mathbb{C}^{T\times F}$, where $T$ is the number of time frames and $F$ is the number of frequency bins. Taking the magnitude gives a real matrix $\mathbf{X}=|\mathbf{S}|\in\mathbb{R}^{T\times F}$, each row of which is the frequency-domain feature vector of one time frame. In this work, the STFT magnitude spectrum is used directly as the node feature matrix: experiments showed that the magnitude spectrum alone provides sufficient discriminative information to distinguish the leakage levels, which avoids the extra computational overhead of explicit power spectral density estimation.
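The feature-extraction step of Eqs. (8) and (9) can be sketched in NumPy as follows. The window length $N=256$ is an assumption (its value is not given in the text), the spectrum is taken one-sided with `np.fft.rfft`, and frames that do not fit entirely inside the signal are dropped rather than zero-padded; the function names are hypothetical and the authors' actual toolbox and padding convention are not stated.

```python
import numpy as np


def hanning(n_win):
    """Symmetric Hanning window w(n) = 0.5 * (1 - cos(2*pi*n / (N - 1))), Eq. (9)."""
    n = np.arange(n_win)
    return 0.5 * (1.0 - np.cos(2.0 * np.pi * n / (n_win - 1)))


def stft_magnitude(x, n_win=256, n_overlap=128):
    """Return X = |S| with shape (T, F): one row per frame, F = n_win // 2 + 1 bins.

    n_win = 256 is an assumed window length; the 128-sample overlap follows the text.
    """
    hop = n_win - n_overlap                        # R = N - 128
    w = hanning(n_win)
    n_frames = 1 + (len(x) - n_win) // hop         # frames that fit without padding
    frames = np.stack([x[m * hop:m * hop + n_win] * w for m in range(n_frames)])
    spec = np.fft.rfft(frames, axis=1)             # complex S(m, k), one-sided
    return np.abs(spec)                            # node feature matrix X


# Toy usage: 0.1 s of a 5 kHz tone sampled at 80 kHz.
fs = 80_000
t = np.arange(8_000) / fs
X = stft_magnitude(np.sin(2.0 * np.pi * 5_000 * t))
print(X.shape)                                     # (61, 129)
```

With a 256-point window at 80 kHz the bin spacing is 312.5 Hz, so the 5 kHz tone peaks in bin 16 of every frame. Library routines such as `scipy.signal.stft` apply their own padding, window symmetry and scaling defaults, so their output differs from this sketch in the boundary frames and by a constant factor.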
Once the node features are obtained, the connections between nodes, i.e., the graph edges, need to be defined. Each time frame is treated as a node in the graph, so the total number of nodes is $T$. Two complementary strategies are employed to define the edges. The first is temporal neighborhood edges: each node is connected to its two preceding and two succeeding frames, i.e., node $i$ is connected to nodes $i-2$, $i-1$, $i+1$, and $i+2$ (where these exist). This bidirectional edge structure enables the model to capture short-term dynamic evolution patterns of the leakage acoustic signal along the time axis. The second is $K$-nearest neighbor edges: to capture non-local similarities between different time frames (e.g., harmonic structures or formant patterns), the cosine similarity between all pairs of node feature vectors is computed, defined as $s_{ij}=\dfrac{\mathbf{x}_i^{\top}\mathbf{x}_j}{\lVert\mathbf{x}_i\rVert\,\lVert\mathbf{x}_j\rVert}$, where $\mathbf{x}_i$ is the $i$-th row of $\mathbf{X}$. Each node is then connected to the $K$ other nodes with the highest similarity. The union of the two edge types forms the final edge set $\mathcal{E}=\mathcal{E}_{\mathrm{temp}}\cup\mathcal{E}_{\mathrm{kNN}}$, which fully defines the graph topology. Ultimately, each sample is transformed into a graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$ with $|\mathcal{V}|=T$ nodes and node feature matrix $\mathbf{X}$, whose adjacency relations are recorded in an edge index matrix. This per-sample graph construction strategy avoids the large computational cost of a global graph and is naturally suited to inductive learning in online monitoring scenarios.
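A compact NumPy sketch of the two edge types follows. The function name `build_edge_index` and the value $K=5$ are hypothetical (the paper's $K$ is not given here), and adding each $K$-nearest neighbor edge only in the direction $i \to j$ is an assumption, since the text does not say whether these edges are symmetrized.

```python
import numpy as np


def build_edge_index(x, k_nn=5, radius=2):
    """Edge index (2, E) for one sample: temporal +/-radius edges united with cosine kNN edges.

    k_nn = 5 is a hypothetical value; kNN edges are added in one direction (i -> j).
    """
    t = x.shape[0]
    edges = set()
    for i in range(t):                                  # temporal neighborhood edges
        for d in range(1, radius + 1):
            for j in (i - d, i + d):
                if 0 <= j < t:
                    edges.add((i, j))
    xn = x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1e-12)
    sim = xn @ xn.T                                     # cosine similarity s_ij
    np.fill_diagonal(sim, -np.inf)                      # a node is never its own neighbor
    for i in range(t):                                  # K most similar other frames
        for j in np.argsort(-sim[i])[:k_nn]:
            edges.add((i, int(j)))
    return np.array(sorted(edges)).T                    # union of E_temp and E_kNN


rng = np.random.default_rng(0)
X = rng.random((10, 129))                               # 10 frames, 129 bins (toy data)
edge_index = build_edge_index(X)
print(edge_index.shape)
```

Storing the edges in a set makes the union of the two edge types automatic: a pair that is both a temporal neighbor and a nearest neighbor appears only once.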
3.3. Model Training and Evaluation Strategy
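After obtaining the graph-structured data and determining the network architecture, the training procedure and evaluation methods must be specified. Model training aims to minimize the discrepancy between predicted and true labels. This paper adopts the negative log-likelihood (NLL) loss, defined as

$$
\mathcal{L}=\frac{1}{|\mathcal{D}|}\sum_{i\in\mathcal{D}}\ell\left(y_i,\hat{\mathbf{y}}_i\right),\qquad \ell\left(y_i,\hat{\mathbf{y}}_i\right)=-\hat{y}_{i,y_i}, \tag{10}
$$

where $\mathcal{D}$ denotes the set of training samples, $\ell$ is the per-sample cross-entropy term, $y_i$ is the true label, and $\hat{\mathbf{y}}_i$ is the log-probability vector output by the model, with $\hat{y}_{i,y_i}$ its entry for the true class. Because $\hat{\mathbf{y}}_i$ is already a log-softmax output, this loss is equivalent to the cross-entropy loss computed on the raw classification scores.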
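The equivalence between NLL on log-softmax outputs and cross-entropy on raw scores can be checked numerically with a few lines of plain Python. The scores and labels below are toy values, and the helper names are hypothetical.

```python
import math


def log_softmax(z):
    m = max(z)                                            # shift for numerical stability
    lse = m + math.log(sum(math.exp(v - m) for v in z))
    return [v - lse for v in z]


def nll_loss(log_probs, labels):
    """Eq. (10): mean of -y_hat[i][y_i] over the training set."""
    return -sum(lp[y] for lp, y in zip(log_probs, labels)) / len(labels)


def cross_entropy(scores, labels):
    """Cross-entropy -log softmax(z)[y], computed directly from the raw scores."""
    total = 0.0
    for z, y in zip(scores, labels):
        p = [math.exp(v) for v in z]
        total -= math.log(p[y] / sum(p))
    return total / len(labels)


scores = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0], [0.0, 1.5, 0.3]]  # toy scores z_i
labels = [0, 2, 1]                                            # true leakage levels
nll = nll_loss([log_softmax(z) for z in scores], labels)
ce = cross_entropy(scores, labels)
print(nll, ce)                                                # equal up to rounding
```

Computing the loss from log-softmax outputs is the numerically safer route, because it never exponentiates large scores directly.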
All samples were collected on the simulated utility tunnel platform and cover the three leakage levels. Each sample is independently converted into a graph following the method described in Section 3.2. The Adam optimizer is employed with an initial learning rate of 0.001, and a cosine annealing scheduler gradually reduces the learning rate over a total of 300 training epochs.
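The cosine annealing schedule has a simple closed form, sketched below. The minimum learning rate $\eta_{\min}=0$ and a single cosine cycle spanning all 300 epochs are assumptions; the text specifies neither.

```python
import math


def cosine_annealing_lr(epoch, base_lr=1e-3, t_max=300, eta_min=0.0):
    """eta_t = eta_min + 0.5 * (base_lr - eta_min) * (1 + cos(pi * epoch / t_max))."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + math.cos(math.pi * epoch / t_max))


schedule = [cosine_annealing_lr(e) for e in range(300)]
print(schedule[0], schedule[150], schedule[299])
```

The rate starts at 0.001, halves by epoch 150, and approaches zero at the end of training. If the implementation uses PyTorch (not stated), the corresponding configuration would be `torch.optim.Adam(model.parameters(), lr=1e-3)` combined with `torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=300)`, stepped once per epoch.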
Model performance is evaluated on the test set using multiple metrics, including accuracy, the macro-averaged F1 score, the macro-averaged AUC, and the Matthews correlation coefficient (MCC). The confusion matrix and receiver operating characteristic (ROC) curves are also computed to analyze the model's classification behavior across the three leakage levels in detail. Together, these metrics characterize the model's ability to distinguish the different levels of leakage risk.
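For reference, accuracy, macro-averaged F1, and the multiclass MCC can all be derived from the confusion matrix. The sketch below applies the multiclass MCC generalization (the same formula implemented in scikit-learn's `matthews_corrcoef`) to toy labels; the helper names are hypothetical and this is not the authors' evaluation code.

```python
import math


def confusion_matrix(y_true, y_pred, n_classes=3):
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1                               # rows: true level, columns: predicted level
    return cm


def summarize(cm):
    """Return (accuracy, macro-F1, multiclass MCC) from a square confusion matrix."""
    n = len(cm)
    s = sum(map(sum, cm))                           # number of samples
    c = sum(cm[i][i] for i in range(n))             # correctly classified samples
    t_k = [sum(cm[i]) for i in range(n)]            # true count per class
    p_k = [sum(cm[r][i] for r in range(n)) for i in range(n)]  # predicted count per class
    f1 = []
    for i in range(n):
        tp, fp, fn = cm[i][i], p_k[i] - cm[i][i], t_k[i] - cm[i][i]
        f1.append(2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0)
    num = c * s - sum(p * t for p, t in zip(p_k, t_k))
    den = math.sqrt((s * s - sum(p * p for p in p_k)) * (s * s - sum(t * t for t in t_k)))
    return c / s, sum(f1) / n, (num / den if den else 0.0)


y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2]                 # toy labels, not the paper's data
y_pred = [0, 0, 1, 1, 1, 1, 2, 2, 0]
acc, macro_f1, mcc = summarize(confusion_matrix(y_true, y_pred))
print(acc, macro_f1, mcc)
```

Macro-averaged AUC needs the predicted class probabilities rather than hard labels; with scikit-learn it could be computed as `roc_auc_score(y_true, np.exp(y_hat), multi_class="ovr", average="macro")`, where `y_hat` holds the model's log-probabilities, although the one-vs-rest scheme is an assumption since the paper does not state which averaging it used.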
The constructed neural network model is illustrated in Figure 6. Based on the model design, graph construction method, and training strategy described above, the next section comprehensively validates the proposed method through experiments.
5. Conclusions and Future Directions
This paper addressed the challenge of fine-grained identification of gas pipeline leakage levels in urban underground utility tunnels by proposing a lightweight spatial–temporal graph neural network (ST-GNN). The core innovation lies in the per-sample graph construction strategy: unlike conventional approaches that treat acoustic signals as fixed-grid spectrograms or sequential time series, each signal is independently transformed into a graph in which STFT time frames serve as nodes, while temporal neighborhood edges and K-nearest neighbor edges explicitly encode local temporal dynamics and non-local spectral similarities, respectively. This graph representation captures the non-Euclidean topological structure inherent in leakage acoustics, which CNN- and RNN-based methods, operating on fixed grids or sequences, do not model explicitly. Building on this, a three-layer Chebyshev graph convolutional network with only 32,611 parameters and an inference time of 2.58 ms per sample is designed, achieving an accuracy-efficiency balance suitable for edge deployment.
Experimental results on a physical utility tunnel simulation platform demonstrate that ST-GNN achieves a test accuracy of 96.73%, an F1 score of 0.9653, and an AUC of 0.9892 in the three-level leakage classification task. t-SNE visualization shows features progressing from complete mixing to distinct clusters across the layers, while ablation experiments confirm that the three-layer architecture performs best among the configurations tested. Comparative experiments with 1D-CNN, 2D-CNN, 2D-CNN-AT, TCN, and F-SVM further indicate that the graph-based paradigm overcomes the structural mismatch between fixed-grid convolutions and the non-Euclidean spectral topology of leakage signals, delivering superior convergence stability and classification accuracy. Multi-scale training statistics also verify the model's robustness under limited data conditions.
The main contribution of this work is the introduction of per-sample spatial–temporal graph construction and Chebyshev graph convolution into gas pipeline leakage level identification in underground utility tunnels, filling a research gap in graph-based methods for this specific scenario. The proposed lightweight design reconciles high accuracy with low computational cost, facilitating subsequent deployment on edge computing platforms. Future work will proceed in two directions: first, extending the method to leakage detection in other municipal pipelines, such as water supply pipelines, to verify its cross-domain transferability; and second, refining and optimizing the algorithm and embedding it into the RK3588 device platform for operational use.